Several research efforts have sought to convert or compress large-scale pretrained transformer models into efficient inference models that facilitate downstream applications. This task has become important because a variety of autoregressive transformers have substantially improved performance baselines in NLP applications. Similar to recurrent neural networks (RNNs), those models represent the context by a
In this blog, I demonstrate the use of XLNet for a simple emotion classifier. The dataset used in this task contains four emotion classes (0-Anger, 1-Fear, 2-Joy, and 3-Sadness), and I use the pretrained XLNet model to classify its examples. Before going into the details, please check the XLNet basics (covered in the following
As data grows more complex and accuracy requirements tighten, people have started to prefer ensemble classifiers. However, selecting an ensemble strategy is not easy. There are many to choose from, such as: (1) Model Averaging, (2) Weighted Model Averaging, (3) Majority Voting, (4) Bagging, (5) Boosting, (6) Stacking, (7) Blending,
An LSTM takes three-dimensional input: (samples, time steps, features). Samples: one complete sequence is considered one sample. A batch may contain one or more samples. In NLP, if we are dealing with text at the sentence level (that is, taking one sentence at a time), then our sample size will be one.
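To make the (samples, time steps, features) layout concrete, here is a minimal sketch in plain Python. The helper names `sentence_to_sample` and `shape` and the toy two-dimensional embedding values are assumptions made for illustration only; in practice the nested list would typically be converted to a NumPy array or a framework tensor before being passed to an LSTM layer.

```python
from typing import Dict, List, Tuple


def sentence_to_sample(
    tokens: List[str],
    embeddings: Dict[str, List[float]],
    feature_dim: int,
) -> List[List[float]]:
    """Map one tokenized sentence to a (time steps, features) nested list.

    Tokens missing from the embedding table get an all-zero vector.
    """
    unknown = [0.0] * feature_dim
    return [list(embeddings.get(token, unknown)) for token in tokens]


def shape(batch: List[List[List[float]]]) -> Tuple[int, int, int]:
    """Return (samples, time steps, features) for a rectangular nested list."""
    return (len(batch), len(batch[0]), len(batch[0][0]))


# Toy embedding table with hypothetical values: each word is one time step
# described by two features.
embeddings = {"i": [0.1, 0.2], "am": [0.3, 0.4], "happy": [0.5, 0.6]}

# One sentence is one sample, so the batch holds a single sequence.
sentence = ["i", "am", "happy"]
batch = [sentence_to_sample(sentence, embeddings, feature_dim=2)]

print(shape(batch))  # (1, 3, 2): 1 sample, 3 time steps, 2 features
```

Here the single sentence gives one sample, its three words give three time steps, and the embedding size gives two features per step.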
We can divide multi-task learning into four layers. Here the Shared layer jointly learns important features from the text input and plays a very important role. Finally, the Task layer uses these jointly learned features for different task-specific predictions. However, in complex multi-task learning, the Task layer can use additional features (in addition to those learned from