Several research efforts have been made to convert/compress the large-scale pretrained transformer models models into efficient inference models that facilitate downstream applications. This task becomes important, as variety of autoregressive transformers have shown very high improvements in the NLP application performance baselines. Similar to recurrent neural networks (RNNs), those models represent the context by a
Tag: Deep Learning
Actually LSTM supports three-dimensional input. They are – (samples, time steps, features) Samples. One complete sequence is considered as one sample. A batch may contains one or more samples. In NLP, if we are dealing with the text at sentence level (means taking one sentence at a time), then our sample size will be one.
We can divide the Multi-task learning into four layers. Here Shared layer learns jointly learns important features from text input and plays a very important role. Finally, Task-Layer uses this jointly learned features for different task specific predictions. However, in complex Multi-Task learning, the Task layer can use additional features (additional to that learned from