Question : What do Deep Learning models do?
Answer : They take in raw inputs and learn representations. For images, the input is raw pixels; in NLP, it is raw words or raw characters.

Question: Why Deep Learning?
Answer: Manually designed features are often over-specific, incomplete, and take a long time to design and validate. Projects that once took a team of 20-30 data science professionals plus domain experts 2-3 years can now be learned by deep learning models in a matter of hours.
a. Deep learning can learn from unsupervised data (relations between text) as well as from supervised data.

Question : How does Deep Learning work with NLP?
Answer : In DL, every word is represented as a vector, and hence a phrase (such as a noun phrase) can also be represented as a vector. A neural network combines the two word vectors into one phrase vector, which can then be used to identify such noun phrases.
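The combination step above can be sketched as follows, in the style of a recursive neural network: the two child word vectors are stacked and passed through a learned transformation. The 3-dimensional vectors and the random weight matrix here are purely illustrative assumptions, not trained values.

```python
import math
import random

random.seed(0)

DIM = 3
# Hypothetical word vectors (toy values for illustration only)
w_the = [0.1, 0.4, 0.2]
w_boat = [0.7, 0.1, 0.5]

# Weight matrix W of shape DIM x (2*DIM); in practice W is learned,
# here it is randomly initialised for the sketch.
W = [[random.uniform(-1, 1) for _ in range(2 * DIM)] for _ in range(DIM)]

def combine(a, b):
    """Combine two child vectors into one phrase vector: tanh(W @ [a; b])."""
    concat = a + b  # stack the two child vectors into a 2*DIM vector
    return [math.tanh(sum(W[i][j] * concat[j] for j in range(2 * DIM)))
            for i in range(DIM)]

np_vector = combine(w_the, w_boat)  # vector for the noun phrase "the boat"
print(len(np_vector))  # → 3: same dimensionality as a single word vector
```

The key design point is that the output has the same dimensionality as a word vector, so the combination can be applied recursively to build vectors for longer phrases.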

Question : How do word vectors capture relationships and solve the discrete representation problem?
Answer : In traditional NLP methods, even similar words are represented as distinct one-hot vectors, e.g. hotel = [0 0 0 1] and motel = [1 0 0 0]; the two vectors are orthogonal, so hotel and motel show no relation.
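The orthogonality problem is easy to demonstrate. In this sketch (with a toy 4-word vocabulary chosen for illustration), the dot product between any two distinct one-hot vectors is zero, so no similarity is captured:

```python
# Toy vocabulary; each word gets its own axis in a one-hot encoding.
vocab = ["motel", "cat", "dog", "hotel"]

def one_hot(word):
    return [1 if w == word else 0 for w in vocab]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

hotel = one_hot("hotel")   # [0, 0, 0, 1]
motel = one_hot("motel")   # [1, 0, 0, 0]

print(dot(hotel, motel))   # → 0: "hotel" and "motel" look unrelated
print(dot(hotel, hotel))   # → 1: a word is only similar to itself
```

This is exactly why dense, context-derived word vectors are needed: similarity must come from shared context, not from the encoding itself.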
The core idea of the modern statistical NLP is
“You shall know a word by the company it keeps” – J.R. Firth, 1957
However, for word vectors, a word is defined by its context window, and that context is used to find relationships.
For example, in “Banking crisis is mainly due to debt problem.”,
the context words crisis and debt problem are used to define banking. Hence, in the next sentence, “Cooperatives and Finance have major crisis due to debt problems.”,
the shared context of crisis and debt problem tells us that “Cooperative” and “Finance” are similar to “Banks”.
Similarly, for the next example,
“Boat has rudders. Rudder helps the boat to steer around in water.”
with a co-occurrence matrix we can easily find that “Boat” and “Rudder” are related.
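A window-based co-occurrence count over the two example sentences can be built in a few lines. The tokenisation (lowercase whitespace split) and window size of 2 are simplifying assumptions for the sketch:

```python
from collections import defaultdict

# The two example sentences, tokenised by a simple lowercase split.
sentences = [
    "boat has rudders".split(),
    "rudder helps the boat to steer around in water".split(),
]

WINDOW = 2  # count co-occurrences within 2 words on either side
cooc = defaultdict(int)

for tokens in sentences:
    for i, word in enumerate(tokens):
        # Look at neighbours within the window around position i
        for j in range(max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)):
            if i != j:
                cooc[(word, tokens[j])] += 1

print(cooc[("boat", "rudders")])  # → 1: "boat" and "rudders" co-occur
print(cooc[("boat", "water")])    # → 0: never within the same window
```

In practice the counts would come from a large corpus, and each word's row of counts would serve as its (sparse) vector.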

Question : Nice that it can capture the neighborhood, but what size should we consider? Should we use the whole document, or only a window of it?
Answer: The two approaches capture different aspects, so both are valid depending on the scenario.
– Context over full document : a full-document co-occurrence matrix tells us what the document is about, while
– Context over window : a window-based co-occurrence matrix captures syntactic and semantic information.

Question : What does document co-occurrence give?
Answer: It gives a general topic. For example, boat, wet, swimming, and ship will all be similar to each other and will often occur around some boating topic. It is great at capturing broad context, such as whether a text is about real estate, science, politics, etc.

To know more on “Document co-occurrence”, see Converting Word to Vectors Using the Continuous Bag of Words Approach – Full Document Context Approach.

Question: What does window-based context capture?
Answer : It captures semantic and syntactic analogies, such as synonyms and antonyms.
Syntactic information refers to grammatical information. For example:
1. A noun is followed by a verb, i.e. “Boat has”, “Rudder helps”.
2. Noun : “Boat” and “Rudder” are somehow related.
3. Verb : “has” and “helps” are somehow related.
Semantic information means capturing the meaning, i.e. that “Boat” and “Rudder” are related.
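How "relatedness" is actually measured from such counts can be sketched with cosine similarity over co-occurrence rows. The rows and context columns below are illustrative assumptions, not real corpus counts:

```python
import math

# Toy window-based co-occurrence rows for three words.
# Context columns (assumed): [has, helps, steer, water, price]
vectors = {
    "boat":   [2, 1, 1, 1, 0],
    "rudder": [1, 2, 1, 1, 0],
    "house":  [1, 0, 0, 0, 2],
}

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Words sharing context columns score higher:
print(round(cosine(vectors["boat"], vectors["rudder"]), 2))  # → 0.86
print(round(cosine(vectors["boat"], vectors["house"]), 2))   # → 0.34
```

Because "boat" and "rudder" share the same boating contexts, their rows point in similar directions, while "house" mostly co-occurs with unrelated contexts and scores lower.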

See Window Based Word Vectorisation for more details.

See Full Document based word vectorisation for more details. (Coming Soon)