Word Sense Embedded in Geometric Spaces - From Induction to Applications using Machine Learning


Licentiate seminar

Date & time: December 2, 2016
Location: HC1, Hörsalsvägen 14, Chalmers

Discussion leader

Richard Socher - Chief scientist at Salesforce and lecturer at Stanford University


Main supervisor: Devdatt Dubhashi
Co-supervisors: Richard Johansson, Shalom Lappin

Thesis abstract

Words are not detached individuals but part of an interconnected web of related concepts, and to capture the full complexity of this web they need to be represented in a way that encapsulates all the semantic and syntactic facets of the language. Further, to enable computational processing they need to be expressed in a consistent manner, so that common properties, e.g. plurality, are encoded in a similar way for all words sharing that property. In this thesis, dense real-valued vector representations, i.e. word embeddings, are extended and studied for their applicability to natural language processing (NLP).
Word embeddings of two distinct flavors are presented as part of this thesis: sense-aware word representations, where different word senses are represented as distinct objects, and grounded word representations, which are learned using multi-agent deep reinforcement learning to explicitly express properties of the physical world while the agents learn to play Guess Who?. The empirical usefulness of word embeddings is evaluated by employing them in a series of NLP-related applications, namely word sense induction, word sense disambiguation, and automatic document summarisation. The results show great potential for word embeddings: they outperform previous state-of-the-art methods in two out of three applications, and achieve a statistically equivalent result in the third application while using a much simpler model than previous work.

Included papers and my contributions:

Paper I: Neural context embeddings for automatic discovery of word senses [pdf]

  • Main author.
  • Developed the main idea.
  • Wrote ~50% of the text.
  • Implemented ~50% of the experiments.

Paper II: Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence [pdf]

  • Initiated the project.
  • Supervised the main author.
  • Contributed towards the manuscript (abstract, introduction, and conclusions).
  • Contributed towards the technical contribution of the paper.

Paper III: Word Sense Disambiguation using a Bidirectional LSTM [Accepted to Coling workshop]

  • Developed the main idea.
  • Wrote 90% of the text.

Paper IV: Extractive Summarization using Continuous Vector Space Models [pdf]

  • Main author.
  • Developed the main idea.
  • Wrote ~80% of the text.
  • Implemented ~50% of the experiments.

Paper V: Extractive Summarization by Aggregating Multiple Similarities [pdf]

  • Second author.
  • Contributed the multiplicative interaction between kernels.
  • Wrote ~20% of the text.
  • Implemented ~20% of the experiments (the parts relating to word embeddings).


Summarisation demo

Future Direction of Research

As the licentiate thesis, to a large extent, represents a milestone on the way to a PhD, some thoughts on current and future work that will lead up to the dissertation are presented next. The general direction being taken is towards sequences of words and emergent properties captured through the interaction between agents. At the time of writing, this translates to the following list of ongoing projects:

  • Symbolic input sequence optimization - Taking an optimization approach to the sequence-to-sequence decoding problem by using the gradient to optimize over a one-hot input space.
  • Grounded word embeddings of human language - Connecting the grounded embeddings described in Paper II with existing human language, to learn grounded embeddings of real words.
  • Waveform translation - Realizing that the models behind neural machine translation are independent of the underlying data, we try to connect the spectral voiceprint of the source sentence directly to the voiceprint of the target sentence. Though challenging, this approach has the potential to produce a speech-to-speech translation system far superior to approaches that are constrained by having to transcode the spoken language into text, since a lot of information is lost in that step.
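
To make the first project in the list a little more concrete, the following is a minimal sketch of gradient-based optimization over a relaxed one-hot input space. It is an illustration under stated assumptions, not the method developed in the project: the one-hot constraint is relaxed with a softmax over per-position logits, and the score being maximized is a hypothetical linear stand-in (a weight matrix W), where a real setting would presumably use the score of a trained sequence-to-sequence model. The names optimize_one_hot, W, steps and lr are invented for this example.

```python
import numpy as np


def softmax(z):
    # Row-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def optimize_one_hot(weights, steps=200, lr=1.0):
    """Gradient ascent over a softmax relaxation of a one-hot sequence.

    weights: (T, V) array. Hypothetical stand-in score: choosing token v
    at position t earns weights[t, v]. A trained model would replace this.
    """
    # One row of logits per sequence position, one column per vocabulary item.
    logits = np.zeros_like(weights, dtype=float)
    for _ in range(steps):
        p = softmax(logits)
        # Relaxed score is sum(p * weights). Its gradient with respect to
        # the logits is p * (weights - expected weight at that position).
        expected = (p * weights).sum(axis=1, keepdims=True)
        grad = p * (weights - expected)
        logits += lr * grad
    # Project the relaxed solution back onto discrete symbols.
    return logits.argmax(axis=1)


rng = np.random.default_rng(0)
W = rng.normal(size=(5, 8))   # 5 positions, vocabulary of 8 symbols
tokens = optimize_one_hot(W)  # one symbol index per position
```

With this linear stand-in the optimum is simply the best-scoring symbol at each position, which makes the sketch easy to check; the relaxation becomes interesting only when the score couples the positions, as a real decoder would.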


Learning to Play Guess Who? and Inventing a Grounded Language as a Consequence


Learning your first language is an incredible feat and not easily duplicated. Doing this using nothing but a few pictureless books, a corpus, would likely be impossible even for humans. As an alternative, we propose to use situated interactions between agents as a driving force for communication, and the framework of Deep Recurrent Q-Networks (DRQN) for learning a common language grounded in the provided environment. We task the agents with interactive image search in the form of the game Guess Who?. The images from the game provide a non-trivial environment for the agents to discuss and a natural grounding for the concepts they decide to encode in their communication. Our experiments show that it is possible to learn this task using DRQN and, even more importantly, that the words the agents use correspond to physical attributes present in the images that make up the agents' environment.