Question : Wow, so CBOW and skip-gram can capture semantic and syntactic information? But wait, what about polysemy, i.e. the same word having multiple meanings?
Answer : Unfortunately, CBOW and skip-gram cannot capture polysemy, because they represent each word as a single vector. For example, the word “tie” has several meanings:
– (v.) attach or fasten with string. “He is tied to the bed by the strong rope”
– (v.) restrict or limit. “She didn’t want to be like her mother, tied to a feckless man”
– (v.) relate to or connect to. “Is the allergy tied to dairy products?”
– (v.) finish equal. “Jane and I tied (for first place) in the test.”
– (n.) a wearable tie. “He always wears a jacket and a tie to work.”
But with the CBOW and skip-gram models, there is a single vector for “tie”, which tries to represent all five of the meanings above. That is not possible with a single vector.
Question : Is there any other drawback of the CBOW / skip-gram models?
Answer : Both the CBOW and skip-gram models fail to identify multi-word phrases. For example, “New York” is a single unit of meaning and should not be treated as the two separate words “New” and “York”.
Question : So is there something we can do so that such polysemous words can be represented effectively?
Answer : To solve this problem of polysemy, sense embedding is used. With word embedding, we represent a single word with a single vector. With sense embedding, we represent a single word with several vectors, one for each sense it can take. For example:
|Word Vector Embedding|Sense Vector Embedding|
|a. tie (1 word, 5 meanings) is 1 word ~ 1 vector representation|a. tie (1 word, 5 senses (meanings)) is 1 word (5 senses) ~ 5 vector representations|
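The difference between the two columns can be sketched as two lookup tables. This is only a toy illustration: the vector values, the sense labels, and the helper name `vectors_for` are made up for the example and do not come from any trained model.

```python
# Toy illustration only: all vector values below are made up, not trained.

# Word embedding: one vector per word, shared by every meaning of "tie".
word_embedding = {
    "tie": [0.2, 0.7],
}

# Sense embedding: one vector per sense of the word (labels are hypothetical).
sense_embedding = {
    "tie": {
        "fasten": [0.9, 0.1],
        "restrict": [0.7, 0.3],
        "relate": [0.5, 0.5],
        "finish_equal": [0.2, 0.8],
        "necktie": [0.1, 0.9],
    },
}

def vectors_for(word, table):
    """Return the list of vectors stored for `word` in either kind of table."""
    entry = table[word]
    if isinstance(entry, dict):
        return list(entry.values())
    return [entry]

print(len(vectors_for("tie", word_embedding)))   # 1
print(len(vectors_for("tie", sense_embedding)))  # 5
```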
Question : Awesome, sense embedding is able to capture the polysemy of a single word. How can we achieve it?
Answer : There are three different ways to represent word senses.
a. Clustering-based word sense representation
b. Non-parametric word sense representation
c. Ontology-based word sense representation
Question : Well, some words have two meanings, some have three, and so on, i.e. each word has a different number of meanings associated with it. Are all three methods listed above able to handle this?
Answer : No. The clustering-based approach cannot infer a different number of meanings for different words. It can only learn a fixed number of senses for each polysemous word.
The non-parametric approach is able to do so.
The ontology-based methods learn the senses according to an existing sense inventory.
Question : Another important aspect of learning word senses is the ability to learn a new sense of a word when it is introduced. How do the three sense representations perform in this regard?
Answer : The clustering-based and non-parametric approaches are able to handle such cases. The ontology-based approach, however, is not able to learn new senses because it relies on a predefined word sense ontology. Until the ontology is updated, it cannot learn them.
Clustering based Word Sense Representation
Question : How does clustering-based word sense representation work?
Answer : Clustering-based word sense methods work on the philosophy that “word meaning is reflected by the context words”. Reisinger and Mooney (2010) and Huang et al. (2012a) follow this approach.
Question : I still don’t understand. How does it work? Can you explain it with a simple example?
Answer : Sure, let’s consider two articles.
Article 1 : “Bank is a financial institution. It accepts deposits from public and creates credit. Lending activities can be performed either directly or indirectly through capital markets…”
Article 2 : “River bank is land along river edge. It consists of the sides of the channel, between which the flow is confined. It is also of interest in navigation, where the term can refer either to a barrier island or a submerged plateau.”
As humans, we can easily tell from the context that “bank” in Article 1 refers to a financial bank and that “bank” in Article 2 refers to a river bank. Clustering-based word sense generation uses the same idea: the contexts in which a word occurs are grouped, and each group is treated as one sense of the polysemous word.
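The idea of grouping contexts can be sketched in a few lines of Python. This is a minimal toy, not the method of Reisinger and Mooney or Huang et al.: it assumes bag-of-words context vectors, plain k-means with a fixed `k=2`, and a deterministic farthest-point initialisation. The context word lists are hand-picked from the two articles above, and all function names are hypothetical.

```python
from collections import Counter

def bow(context, vocab):
    """Bag-of-words count vector for one context of the target word."""
    counts = Counter(context)
    return [float(counts[w]) for w in vocab]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, k, iterations=10):
    """Plain k-means; k (the number of senses) is fixed in advance."""
    # Deterministic start: first point, then the point farthest from chosen ones.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every context to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        # Move each centroid to the mean of its assigned contexts.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

# Context words seen around "bank" (hand-picked for the example).
contexts = [
    ["financial", "institution", "deposits", "credit"],
    ["deposits", "credit", "lending", "capital", "markets"],
    ["financial", "lending", "credit"],
    ["river", "land", "edge", "channel"],
    ["river", "channel", "flow", "navigation"],
    ["land", "river", "flow"],
]
vocab = sorted({w for c in contexts for w in c})
points = [bow(c, vocab) for c in contexts]

labels, centroids = kmeans(points, k=2)
print(labels)  # contexts 0-2 share one label, contexts 3-5 share the other
```

Each centroid can then be read as one sense vector for “bank”. Note that `k` has to be chosen before clustering starts, so every word gets the same number of senses.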
Question : Well, should I always take the whole document as the context, or only some portion of the document, to determine the word sense?
Question : Excellent. What if one word has 2 meanings and another has 6 meanings? Can clustering-based word sense representation handle such variation in the number of senses?
Answer : Unfortunately not. Clustering-based word sense representation assigns a fixed number of senses to each word and cannot choose the number of senses dynamically. Only the non-parametric methods can do so.
Non-Parametric based Word Sense Representation
Question : How does non-parametric word sense representation work?
Answer : Non-parametric word sense representation works on the same philosophy as the clustering-based word sense representation above: word meaning is reflected by the context words.
Question : Some words have 2 meanings and some others have 6 meanings. Can non-parametric word sense representation handle such variation in the number of senses?
Answer : Yes, the non-parametric models can assign word senses dynamically. They do so with a non-parametric process such as the “Chinese Restaurant Process (CRP)”.
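To give a feel for why a CRP does not need a fixed number of senses, here is a minimal sketch of the process on its own. Think of each occurrence of a word as a customer and each sense as a table: the customer joins an existing table with probability proportional to the number of customers already there, or opens a new table with probability proportional to a concentration parameter. The function name `crp_assignments` and the parameter name `alpha` are assumptions for this example, and the sketch deliberately leaves out how a real model would combine this prior with context similarity.

```python
import random

def crp_assignments(n_customers, alpha, seed=0):
    """Sample table (sense) assignments from a Chinese Restaurant Process.

    With n customers already seated, the next customer joins existing
    table k with probability size_k / (n + alpha) and opens a new table
    with probability alpha / (n + alpha).
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table chosen by customer i
    for _ in range(n_customers):
        # The last weight stands for "open a new table".
        weights = table_sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)      # a new sense is created
        else:
            table_sizes[choice] += 1   # an existing sense is reused
        assignments.append(choice)
    return assignments, table_sizes

assignments, table_sizes = crp_assignments(n_customers=50, alpha=1.0)
print(len(table_sizes), table_sizes)
```

The number of tables is an outcome of the sampling rather than an input, which is how different words can end up with different numbers of senses. A larger `alpha` tends to produce more tables.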
Ontology based Word Sense Representation
Question : How does ontology-based word sense representation work?
Answer : Ontology-based word sense representation uses an existing ontology such as “WordNet” as its sense inventory and learns a representation for each sense listed there. Chen et al. (2014) use this ontology-based word sense representation.
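A rough sketch of the idea, under loudly stated assumptions: `SENSE_INVENTORY` below is a tiny hand-written stand-in for an ontology (it is not real WordNet data), `WORD_VECTORS` holds made-up 2-dimensional vectors, and each sense vector is simply the average of the vectors of its gloss words. That averaging step is one simple choice for illustration and is not claimed to be the exact procedure of Chen et al.

```python
# Hypothetical, hand-written sense inventory standing in for an ontology.
SENSE_INVENTORY = {
    "bank": {
        "bank.financial": "institution that accepts deposits and creates credit",
        "bank.river": "land along the edge of a river",
    },
}

# Made-up 2-d word vectors, for illustration only.
WORD_VECTORS = {
    "institution": [1.0, 0.0],
    "deposits": [0.9, 0.1],
    "credit": [0.8, 0.0],
    "land": [0.0, 1.0],
    "edge": [0.1, 0.9],
    "river": [0.0, 0.8],
}

def sense_vectors(word):
    """Build one vector per sense that the inventory lists for `word`."""
    senses = {}
    for sense_id, gloss in SENSE_INVENTORY.get(word, {}).items():
        # Keep only gloss words for which a word vector is available.
        known = [WORD_VECTORS[t] for t in gloss.split() if t in WORD_VECTORS]
        if known:
            senses[sense_id] = [sum(col) / len(known) for col in zip(*known)]
    return senses

print(sorted(sense_vectors("bank")))  # ['bank.financial', 'bank.river']
print(sense_vectors("selfie"))        # {}
```

The second call shows the limitation mentioned earlier: a word or sense that is missing from the inventory gets no representation until the inventory itself is updated.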
Ref : https://www.cs.rochester.edu/~lsong10/papers/area.pdf