Question : Wow, CBOW and skip-gram can capture semantic and syntactic information? But wait, what about polysemy, i.e., the same word having multiple meanings?
Answer : Unfortunately, CBOW and skip-gram cannot capture polysemy, because they represent each word as a single vector. e.g.
“tie”
– (v.) attach or fasten with string. “He is tied to the bed by a strong rope.”
– (v.) restrict or limit. “She didn’t want to be like her mother, tied to a feckless man.”
– (v.) relate to or connect with. “Is allergy tied to dairy products?”
– (v.) finish equal. “Jane and I tied (for first place) in the test.”
– (n.) a necktie. “He always wears a jacket and a tie to work.”

But with the CBOW and skip-gram models, there is a single vector for “tie”, which tries to represent all five of the above meanings, and that is not possible with a single vector.
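To see this limitation concretely, here is a minimal sketch using gensim’s word2vec (the tiny corpus, vector size, and hyperparameters are purely illustrative): the model stores exactly one vector per word type, so all senses of “tie” share a single row in the embedding matrix.

```python
# Minimal sketch (assuming gensim is installed): word2vec keeps exactly
# one vector per surface form, so every sense of "tie" collapses into
# the same embedding.
from gensim.models import Word2Vec

sentences = [
    ["he", "is", "tied", "to", "the", "bed", "by", "the", "rope"],
    ["jane", "and", "i", "tied", "for", "first", "place"],
    ["he", "always", "wears", "a", "jacket", "and", "a", "tie"],
]

# sg=1 selects skip-gram; sg=0 would select CBOW.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# One row in the embedding matrix per word type, regardless of sense:
print(model.wv["tie"].shape)   # (50,) -- a single vector for all senses
```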


Question : Is there any other drawback with the CBOW / skip-gram models?
Answer : Both the CBOW and skip-gram models fail to identify multi-word phrases. e.g. “New York” is a single entity, but the models treat it as two separate words, “New” and “York”.
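A common workaround (a preprocessing step, not part of CBOW/skip-gram itself) is to detect frequent collocations first, so that “New York” becomes the single token “new_york” before training. A hedged sketch using gensim’s Phrases; the counts and threshold here are illustrative:

```python
# Sketch (assuming gensim): merge frequent collocations such as
# "new york" into one token "new_york" before training embeddings.
from gensim.models.phrases import Phrases, Phraser

sentences = [
    ["i", "live", "in", "new", "york"],
    ["new", "york", "is", "a", "big", "city"],
    ["she", "moved", "to", "new", "york", "last", "year"],
]

# Low min_count/threshold only because this toy corpus is tiny.
bigram = Phraser(Phrases(sentences, min_count=1, threshold=0.1))

print(bigram[["i", "live", "in", "new", "york"]])
# expected: ['i', 'live', 'in', 'new_york'] -- now trainable as one word
```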

Question : So, is there something we can do so that such polysemous words can be effectively represented?
Answer : To solve the problem of polysemy, sense embeddings are used. While with word embeddings we represent a single word with a single vector, with sense embeddings we represent a single word with multiple vectors, one for each sense it carries. e.g.

Word Vector Embedding: “tie” (1 word, 5 senses) ~ 1 vector representation
Sense Vector Embedding: “tie” (1 word, 5 senses) ~ 5 vector representations
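A toy sketch of what this difference looks like as a data structure (the dimension and the random vectors are purely illustrative):

```python
# Toy sketch: a word-embedding table maps word -> one vector, while a
# sense-embedding table maps word -> a list of vectors, one per sense.
# (All numbers here are made up for illustration.)
import numpy as np

dim = 50
rng = np.random.default_rng(0)

word_embedding  = {"tie": rng.normal(size=dim)}                       # 1 word ~ 1 vector
sense_embedding = {"tie": [rng.normal(size=dim) for _ in range(5)]}   # 1 word ~ 5 vectors

print(word_embedding["tie"].shape)    # (50,) -- one vector total
print(len(sense_embedding["tie"]))    # 5     -- five sense vectors
```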

Question : Awesome, sense embedding is able to capture the polysemy of a single word. How can we achieve it?
Answer : There are three different ways to represent word senses:
a. Clustering-based Word Sense Representation
b. Non-parametric Word Sense Representation
c. Ontology-based Word Sense Representation

Clustering-based Word Sense Representation

Question : How does clustering-based word sense representation work?
Answer : Clustering-based word sense methods work on the philosophy that “word meaning is reflected by the context words”. Reisinger and Mooney (2010) and Huang et al. (2012) follow this approach, as sketched below. e.g.
            “Bank is a financial institution.” vs. “A river bank is the land along the river’s edge.”
The two occurrences of “bank” refer to two different meanings, and we can tell them apart via the context words around “bank”.
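A minimal sketch of this idea, in the multi-prototype spirit of Reisinger and Mooney (2010): represent each occurrence of “bank” by the average of its context-word vectors and cluster those occurrence vectors, so that each cluster centroid becomes one sense vector. The context-word vectors below are random stand-ins, and k = 2 is chosen by hand:

```python
# Sketch of clustering-based sense induction. Each occurrence of "bank"
# is represented by the average of its context-word vectors; clustering
# these occurrence vectors yields one centroid (sense vector) per cluster.
import numpy as np
from sklearn.cluster import KMeans

def context_vector(context_words, word_vecs):
    """Average the embeddings of the context words around one occurrence."""
    return np.mean([word_vecs[w] for w in context_words], axis=0)

# Hypothetical pre-trained context-word vectors (random here for brevity):
rng = np.random.default_rng(0)
word_vecs = {w: rng.normal(size=50) for w in
             ["financial", "institution", "money", "river", "land", "edge"]}

occurrences = np.stack([
    context_vector(["financial", "institution", "money"], word_vecs),  # money sense
    context_vector(["river", "land", "edge"], word_vecs),              # river sense
    context_vector(["money", "institution", "financial"], word_vecs),
])

# A fixed number of senses per word -- the key limitation discussed next:
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(occurrences)
sense_vectors = kmeans.cluster_centers_   # one vector per induced sense of "bank"
print(sense_vectors.shape)                # (2, 50)
```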

Question : Excellent, but what if one word has 2 meanings and another has 6? Can clustering-based word sense representation handle such variation in the number of senses?
Answer : Unfortunately not. Clustering-based word sense representation assigns a fixed number of senses to every word; it cannot assign word senses dynamically. Only the non-parametric methods can do so.


Non-parametric Word Sense Representation

Question : How does non-parametric word sense representation work?
Answer : Non-parametric word sense representation works on the same philosophy as the clustering-based approach above: word meaning is reflected by the context words.

Question : Some words have 2 meanings, and others have 6. Can non-parametric word sense representation handle such variation in the number of senses?
Answer : Yes, the non-parametric models can assign word senses dynamically. They do so with a non-parametric process such as the “Chinese Restaurant Process (CRP)”, sketched below.
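Here is a toy sketch of the CRP idea (the concentration parameter and everything else are illustrative, not taken from any specific paper): each new occurrence of a word either joins an existing sense, with probability proportional to how many occurrences that sense already has, or opens a brand-new sense, with probability proportional to a parameter gamma. The number of senses therefore grows with the data instead of being fixed in advance.

```python
# Toy sketch of a Chinese Restaurant Process for sense assignment.
# An occurrence joins an existing sense with probability proportional to
# that sense's popularity, or creates a new sense with probability
# proportional to gamma. (In real models the weights are also scaled by
# how well the context fits each sense.)
import random

def crp_assign_sense(sense_counts, gamma=1.0, rng=random.Random(0)):
    """Return a sense index; len(sense_counts) means 'open a new sense'."""
    weights = sense_counts + [gamma]          # existing senses + "new table"
    r = rng.uniform(0, sum(weights))
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if r <= cum:
            return i
    return len(sense_counts)

sense_counts = []                             # no senses yet
for _ in range(20):                           # 20 occurrences of some word
    s = crp_assign_sense(sense_counts)
    if s == len(sense_counts):
        sense_counts.append(1)                # open a new sense
    else:
        sense_counts[s] += 1

print(sense_counts)   # e.g. [14, 4, 2] -- sense count grows with the data
```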

Ontology-based Word Sense Representation

Question : How does ontology-based word sense representation work?
Answer : Ontology-based word sense representation uses an existing ontology, such as “WordNet”, as a ready-made sense inventory, and learns one vector for each sense in that inventory. Chen et al. (2014) use this ontology-based approach.
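For instance, WordNet can be queried directly as a sense inventory. A quick sketch using NLTK (this assumes the wordnet corpus has been downloaded once via nltk.download("wordnet")):

```python
# Sketch: using WordNet (via NLTK) as a ready-made sense inventory.
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

# Each synset is one sense of "tie"; its gloss describes that sense.
for synset in wn.synsets("tie"):
    print(synset.name(), "->", synset.definition())
# prints one line per sense, e.g. necktie.n.01, tie.v.01, ...
```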

Ref : https://www.cs.rochester.edu/~lsong10/papers/area.pdf
