Question: Wow, CBOW and skip-gram can capture semantic and syntactic information? But wait, what about polysemy, i.e., the same word having multiple meanings?
Answer: Unfortunately, CBOW and skip-gram cannot capture polysemy, because they represent each word with a single vector. Consider the word “tie”:
– (v.) attach or fasten with string. “He is tied to the bed by the strong rope”
– (v.) restrict or limit. “She didn’t want to be like her mother, tied to a feckless man”
– (v.) relate or connect to. “Is the allergy tied to dairy products?”
– (v.) finish equal. “Jane and I tied (for first place) in the test.”
– (n.) a necktie. “He always wears a jacket and a tie to work.”
But the CBOW and skip-gram models learn a single vector for “tie”, and that one vector has to represent all five meanings above. A single vector cannot faithfully capture five distinct meanings, so it ends up as a mixture of them rather than a good representation of any one sense.
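To make the “mixture” point concrete, here is a toy sketch. It assumes a simple count-based stand-in in which a word's single vector is the average of the context vectors of all its occurrences; real CBOW and skip-gram vectors are learned by gradient descent rather than averaged. The 3-dimensional context vectors are made-up values for three of the five senses of “tie”.

```python
import math

# Made-up 3-d context vectors, one list per sense of "tie" (illustrative only).
# Dimensions loosely mean: [clothing, competition, rope/binding].
occurrences = {
    "necktie": [[0.9, 0.0, 0.1], [1.0, 0.1, 0.0]],
    "finish equal": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
    "fasten": [[0.0, 0.0, 1.0], [0.1, 0.0, 0.9]],
}


def mean(vectors):
    """Component-wise average of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Single-vector stand-in: pool every occurrence of "tie", whatever its sense.
single_tie = mean([v for vs in occurrences.values() for v in vs])

for sense, vs in occurrences.items():
    print(f"{sense:>12}: cosine(single 'tie', sense centroid) = "
          f"{cosine(single_tie, mean(vs)):.2f}")
```

Each sense centroid is only moderately similar to the pooled vector (roughly 0.6 to 0.65 with these numbers), whereas a dedicated vector per sense would match its own centroid exactly.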
Question: Are there any other drawbacks of the CBOW / skip-gram models?
Answer: Both the CBOW and skip-gram models fail to identify multi-word phrases. For example, “New York” should be treated as a single unit, not as the two separate words “New” and “York”.
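One way to see why phrases are missed: skip-gram is trained on (center word, context word) pairs drawn from a sliding window over the token sequence, and CBOW uses the same windows in the other direction. The minimal sketch below assumes plain whitespace tokenization and a window of one word on each side (both illustrative choices). “New” and “York” come out as separate tokens that merely predict each other, and no “New York” unit exists to receive its own vector.

```python
def skipgram_pairs(tokens, window=1):
    """Return the (center, context) training pairs a skip-gram model sees."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Whitespace tokenization splits the phrase into two independent tokens.
tokens = "I moved to New York last year".split()
pairs = skipgram_pairs(tokens, window=1)
for center, context in pairs:
    print(center, "->", context)
```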
Question: So is there something we can do so that such polysemous words can be represented effectively?
Answer: To solve the problem of polysemy, sense embeddings are used. Whereas a word embedding represents each word with a single vector, a sense embedding represents a single word with several vectors, one for each sense the word can take. For example:
| Word Vector Embedding | Sense Vector Embedding |
|---|---|
| tie (1 word, 5 meanings) is 1 word ~ 1 vector representation | tie (1 word, 5 senses (meanings)) is 1 word (5 senses) ~ 5 vector representations |
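The difference in the table can be sketched as a data structure. In this minimal, hypothetical example a word embedding maps each word to one vector, while a sense embedding maps each word to several sense vectors. Only three of the five senses of “tie” are included, the sense names such as `tie#necktie` are made up, and the 3-dimensional values are toy numbers. Picking a sense by cosine similarity to a context vector is one simple way to use such a table, not a specific published method.

```python
import math

# Word embedding: 1 word ~ 1 vector (toy values).
word_embedding = {
    "tie": [0.35, 0.33, 0.33],
}

# Sense embedding: 1 word ~ 1 vector per sense (hypothetical names and values).
# Dimensions loosely mean: [clothing, competition, rope/binding].
sense_embedding = {
    "tie": {
        "tie#fasten": [0.05, 0.0, 0.95],
        "tie#finish_equal": [0.05, 0.95, 0.0],
        "tie#necktie": [0.95, 0.05, 0.05],
    },
}


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def lookup_sense(word, context_vec):
    """Return (sense name, vector) of the sense of `word` closest to the context."""
    senses = sense_embedding[word]
    best = max(senses, key=lambda s: cosine(senses[s], context_vec))
    return best, senses[best]


# Toy context vector for "He always wears a jacket and a tie to work."
print(lookup_sense("tie", [0.9, 0.0, 0.1])[0])
```

With a word embedding the lookup for “tie” always returns the same vector; with a sense embedding the context decides which of the word's vectors is used.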
Question: Awesome, sense embeddings can capture the polysemy of a single word. How can we obtain them?
Answer: There are three different ways to represent word senses:
a. Clustering-based word sense representation
b. Non-parametric word sense representation
c. Ontology-based word sense representation
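Clustering-Based Word Sense Representation
Question: How does clustering-based word sense representation work?
Answer: Clustering-based word sense methods rest on the idea that “word meaning is reflected by the context words”. The contexts in which a word occurs are grouped into clusters, and each cluster is treated as one sense of the word, with its own vector. Reisinger and Mooney (2010) and Huang et al. (2012a) follow this approach. For example:
“A bank is a financial institution.” vs. “A river bank is the land along the edge of a river.”
The two occurrences of “bank” refer to two different meanings, and we can tell them apart from the words around “bank”: “financial” and “institution” in the first sentence, “river” and “land” in the second.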
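The clustering idea can be sketched in a few lines. This is a simplified illustration, not the exact procedure of Reisinger and Mooney (2010) or Huang et al. (2012a): each occurrence of “bank” is represented by the unweighted average of made-up 2-dimensional vectors of its context words, the occurrences are grouped with plain k-means using a fixed number of clusters `K`, and each cluster centroid serves as one sense prototype.

```python
# Made-up 2-d word vectors: dimension 0 ~ "finance", dimension 1 ~ "nature".
word_vecs = {
    "financial": [1.0, 0.0], "institution": [0.8, 0.1], "loan": [0.9, 0.0],
    "money": [1.0, 0.1], "river": [0.0, 1.0], "land": [0.2, 0.8],
    "edge": [0.1, 0.7], "water": [0.0, 0.9],
}

# Each occurrence of "bank" is represented only by its context words.
bank_contexts = [
    ["financial", "institution"],
    ["loan", "money"],
    ["river", "land", "edge"],
    ["water", "river"],
    ["money", "institution"],
]


def average(vectors):
    """Component-wise average of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10):
    """Plain k-means with deterministic farthest-first initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        # Assign each point to its nearest center, then move centers to the mean.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = average(members)
    return labels, centers


K = 2  # number of senses, fixed in advance and the same for every word
points = [average([word_vecs[w] for w in ctx]) for ctx in bank_contexts]
labels, prototypes = kmeans(points, K)
for ctx, label in zip(bank_contexts, labels):
    print(f"bank#{label}: {ctx}")
```

The financial contexts and the river contexts land in different clusters, giving two prototypes for “bank”. Note that `K` is chosen before clustering and applies to every word, which is the limitation discussed in the next question.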
Question: Excellent. But what if one word has 2 meanings and another has 6? Can clustering-based word sense representation handle such variation in the number of senses?
Answer: Unfortunately not. Clustering-based word sense representation assigns the same fixed number of senses to every word, so it cannot adapt the number of senses to each word. Among the methods that learn senses from text alone, only the non-parametric ones can do so.
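Non-Parametric Word Sense Representation
Question: How does non-parametric word sense representation work?
Answer: Non-parametric word sense representation rests on the same idea as the clustering-based approach above: the senses of a word are distinguished by its context words.
Question: Some words have 2 meanings and others have 6. Can non-parametric word sense representation handle such variation in the number of senses?
Answer: Yes. Non-parametric models can decide the number of senses for each word dynamically. They do so with a non-parametric process such as the Chinese Restaurant Process (CRP): each new occurrence of a word either joins an existing sense, with probability proportional to the number of occurrences already assigned to that sense, or opens a new sense, with probability proportional to a concentration parameter. Combined with how well the occurrence's context matches each existing sense, this lets a word whose contexts vary widely accumulate more senses than a word whose contexts are uniform.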
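A minimal sketch of the CRP prior itself, assuming a concentration parameter named `alpha` (the name and the values are illustrative). Sense-embedding models combine this prior with how well an occurrence's context fits each sense; the sketch leaves that part out and shows only how the number of senses is allowed to grow as occurrences arrive.

```python
import random


def crp_probabilities(counts, alpha):
    """CRP prior over senses for the next occurrence.

    counts[k] is the number of earlier occurrences assigned to sense k;
    alpha is the concentration parameter (larger -> new senses more likely).
    Returns one probability per existing sense, followed by the probability
    of opening a new sense.
    """
    total = sum(counts) + alpha
    return [c / total for c in counts] + [alpha / total]


def sample_crp(n, alpha, seed=0):
    """Assign n occurrences one at a time; return the occurrence count per sense."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        probs = crp_probabilities(counts, alpha)
        k = rng.choices(range(len(probs)), weights=probs)[0]
        if k == len(counts):
            counts.append(1)  # open a new sense
        else:
            counts[k] += 1  # join an existing sense
    return counts


print(crp_probabilities([3, 1], alpha=1.0))  # [0.6, 0.2, 0.2]
print(sample_crp(50, alpha=1.0))
```

No number of senses is fixed in advance: it is whatever `sample_crp` ends up with, and a larger `alpha` tends to produce more senses.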
Ontology-Based Word Sense Representation
Question: How does ontology-based word sense representation work?
Answer: Ontology-based word sense representation takes its senses from an existing ontology such as WordNet: the ontology serves as the sense inventory, fixing which senses each word has, and a vector is learned for each listed sense. Chen et al. (2014) use this ontology-based approach.
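A toy sketch of the ontology-based idea, under clearly hypothetical assumptions: a two-sense mini inventory for “bank” stands in for WordNet (the glosses are paraphrases, not real WordNet entries), the word vectors are made-up 2-dimensional values, and each sense vector is initialized as the average of its gloss words' vectors. Gloss averaging is one simple way to seed sense vectors from an ontology; the sketch then picks the sense closest to an occurrence's context.

```python
import math

# Made-up 2-d word vectors: dimension 0 ~ "finance", dimension 1 ~ "nature".
word_vecs = {
    "financial": [1.0, 0.0], "institution": [0.8, 0.1], "money": [1.0, 0.1],
    "deposit": [0.9, 0.2], "sloping": [0.1, 0.6], "land": [0.2, 0.8],
    "river": [0.0, 1.0], "water": [0.0, 0.9],
}

# Hypothetical mini sense inventory standing in for an ontology such as WordNet:
# sense id -> gloss words (paraphrased, not actual WordNet glosses).
inventory = {
    "bank": {
        "bank#1": ["financial", "institution", "money", "deposit"],
        "bank#2": ["sloping", "land", "river", "water"],
    },
}


def average(vectors):
    """Component-wise average of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# The inventory fixes the senses; one vector per listed sense, seeded from its gloss.
sense_vecs = {
    sense: average([word_vecs[w] for w in gloss])
    for sense, gloss in inventory["bank"].items()
}


def disambiguate(context_words):
    """Return the sense of "bank" closest to the context.

    Assumes at least one context word has a vector.
    """
    ctx = average([word_vecs[w] for w in context_words if w in word_vecs])
    return max(sense_vecs, key=lambda s: cosine(sense_vecs[s], ctx))


print(disambiguate(["river", "land"]))
print(disambiguate(["money", "deposit"]))
```

Unlike the clustering and non-parametric sketches, the number of senses here comes straight from the inventory rather than from the data.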
Ref : https://www.cs.rochester.edu/~lsong10/papers/area.pdf