Question : Can you explain the loss function used for word vectors?
Answer : Let's consider an example sentence,
“Today I am giving a lecture”.
And we know that our word vectors loss function, represented mathematically, is
[Figure: word vectors loss function]
Let’s say,
our window size, n = 3,
and we start at “I”:
Vc (i.e. the vector of the center word) = Vi, the vector of “I”
Uo (i.e. the vector U of an outside word) = Uo(today) and Uo(am)
Now we want to predict the word “I” from the context (outside) words, i.e.
Uo(today) and Uo(am).
Now, for the 1st window, “Today I am”, our terms will be
1st term => exp(Vi * Uo(today)^T)
2nd term => exp(Vi * Uo(am)^T)
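To make these two terms concrete, here is a minimal Python sketch. It assumes the loss function referred to above is the usual softmax, p(o | c) = exp(Uo · Vc) / sum over all words w of exp(Uw · Vc). The random vectors, the dimension of 4 and the helper names window_terms and softmax_probability are made up for illustration only.

```python
import numpy as np

# Toy setup (assumption): random 4-dimensional vectors for the example sentence.
rng = np.random.default_rng(0)
dim = 4
vocab = ["today", "I", "am", "giving", "a", "lecture"]
v_center = {"I": rng.normal(size=dim)}                # Vc for the center word "I"
u_outside = {w: rng.normal(size=dim) for w in vocab}  # Uo for every word


def window_terms(center, outside_words):
    """Return exp(Vc . Uo) for each outside word in the window."""
    vc = v_center[center]
    return {w: float(np.exp(vc @ u_outside[w])) for w in outside_words}


def softmax_probability(center, outside_word):
    """exp(Vc . Uo) divided by the same term summed over the whole vocabulary."""
    vc = v_center[center]
    denominator = sum(np.exp(vc @ u) for u in u_outside.values())
    return float(np.exp(vc @ u_outside[outside_word]) / denominator)


# 1st window: "Today I am", center word "I", outside words "today" and "am".
terms = window_terms("I", ["today", "am"])
print("1st term:", terms["today"])
print("2nd term:", terms["am"])
print("p(today | I):", softmax_probability("I", "today"))
```

The two printed terms are the numerators for the first window. Dividing by the sum over the whole vocabulary is what turns them into probabilities.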

Question : Why is using the windowed approach for word vectors inefficient?
Answer: If we use the windowed approach, we would be computing the vector for the same word multiple times. For example, in the sentence,
“I like learning Machine Learning.”
With window size = 1 (one word on each side of the center), we would be computing the following word vectors,
1st window = “I like learning”, center word = V.like, outside words = {V.I, V.learning}
2nd window = “like learning machine”, center word = V.learning, outside words = {V.like, V.machine}
Here, we observe that for each word we are computing the vector three times: once as the center word, once as the word to the left of the center, and once as the word to the right of the center. Imagine doing this over all the documents and we can easily understand why the windowed approach to word vectors is expensive.
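The repeated work can be counted with a short Python sketch. The helper count_vector_uses is a hypothetical name. Unlike the two full windows listed above, it also lets windows at the edges of the sentence be cut short, which is an assumption made here so that every word gets to be a center word.

```python
from collections import Counter


def count_vector_uses(tokens, window=1):
    """Count how many windows touch each position (as center or outside word)."""
    uses = Counter()
    for center in range(len(tokens)):
        # Windows at the sentence edges are truncated (an assumption of this sketch).
        lo = max(0, center - window)
        hi = min(len(tokens), center + window + 1)
        for pos in range(lo, hi):
            uses[pos] += 1
    return [uses[i] for i in range(len(tokens))]


tokens = "I like learning machine learning".split()
uses = count_vector_uses(tokens, window=1)
for word, n in zip(tokens, uses):
    print(f"{word}: used in {n} windows")
```

Every word away from the sentence edges is touched by three windows, which matches the observation above.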

Question: How can we overcome this problem?
Answer: We overcome this by considering the whole document.

Question : But isn’t it expensive, since there are millions or billions of words in a training corpus, e.g. all of Wikipedia?
Answer: If we had to go over the entire training set before doing each vector update, it would certainly be expensive. Hence what we always do is update our parameters after looking at only a small set of examples.
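Updating after a small set of examples can be sketched in Python as follows. This is only an illustration under assumptions: it uses a full softmax loss over a tiny vocabulary, and the names make_pairs, avg_loss and train, the batch size, the learning rate and the vector dimension are all invented here rather than taken from the lecture.

```python
import numpy as np


def make_pairs(ids, window=1):
    """Build (center, outside) id pairs from a token id sequence."""
    pairs = []
    for i in range(len(ids)):
        for j in range(max(0, i - window), min(len(ids), i + window + 1)):
            if j != i:
                pairs.append((ids[i], ids[j]))
    return pairs


def avg_loss(V, U, pairs):
    """Average of -log softmax(U @ V[c])[o] over all pairs."""
    total = 0.0
    for c, o in pairs:
        scores = U @ V[c]
        scores = scores - scores.max()  # for numerical stability
        total -= scores[o] - np.log(np.exp(scores).sum())
    return total / len(pairs)


def train(pairs, vocab_size, dim=8, batch_size=4, lr=0.5, epochs=200, seed=0):
    rng = np.random.default_rng(seed)
    V = rng.normal(scale=0.1, size=(vocab_size, dim))  # center vectors
    U = rng.normal(scale=0.1, size=(vocab_size, dim))  # outside vectors
    initial = avg_loss(V, U, pairs)
    for _ in range(epochs):
        order = rng.permutation(len(pairs))
        for start in range(0, len(pairs), batch_size):
            batch = [pairs[k] for k in order[start:start + batch_size]]
            grad_V = np.zeros_like(V)
            grad_U = np.zeros_like(U)
            for c, o in batch:
                scores = U @ V[c]
                probs = np.exp(scores - scores.max())
                probs /= probs.sum()
                probs[o] -= 1.0                  # gradient of the loss w.r.t. the scores
                grad_V[c] += U.T @ probs
                grad_U += np.outer(probs, V[c])
            # Update right after this small batch, not after the whole corpus.
            V -= lr * grad_V / len(batch)
            U -= lr * grad_U / len(batch)
    return V, U, initial, avg_loss(V, U, pairs)


tokens = "i like learning machine learning".split()
word_to_id = {w: k for k, w in enumerate(dict.fromkeys(tokens))}
pairs = make_pairs([word_to_id[w] for w in tokens], window=1)
V, U, initial, final = train(pairs, vocab_size=len(word_to_id))
print("loss before:", initial, "loss after:", final)
```

Each parameter update here needs only batch_size examples, so its cost does not grow with the size of the corpus.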

Question : What sort of subtleties should we be aware of?
Answer : Lots of them. A simple one is whether we should replace all digits with a placeholder such as d or treat each number as a different token. For example: “1 dog live here”, “2 dog live here”, “3 dog live here”. Should we replace 1, 2 and 3 with d, since they play no part in defining “dog”?
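As a small illustration of the digit question, here is a Python sketch that maps every run of digits to the placeholder d. The function name normalize_digits is hypothetical, and whether to normalize at all remains a design choice.

```python
import re


def normalize_digits(text, placeholder="d"):
    """Replace each run of digits with a single placeholder token."""
    return re.sub(r"\d+", placeholder, text)


sentences = ["1 dog live here", "2 dog live here", "3 dog live here"]
normalized = [normalize_digits(s) for s in sentences]
print(normalized)
```

After normalization the three sentences become identical, so the context of “dog” is no longer split across three different number tokens.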
