Actually LSTM supports three-dimensional input. They are – (samples, time steps, features) Samples. One complete sequence is considered as one sample. A batch may contains one or more samples. In NLP, if we are dealing with the text at sentence level (means taking one sentence at a time), then our sample size will be one.
We can divide the Multi-task learning into four layers. Here Shared layer learns jointly learns important features from text input and plays a very important role. Finally, Task-Layer uses this jointly learned features for different task specific predictions. However, in complex Multi-Task learning, the Task layer can use additional features (additional to that learned from