Children learn to understand language while learning to perceive, interact with, explain and make predictions about the world around them. Our linguistic knowledge depends critically on sensory-motor and perceptual processes, which in turn are influenced and shaped by our language. My work simulates this process of acquiring language jointly with perceptual and motor processes as a path to realistic language understanding in fully embodied systems. With many brilliant collaborators at DeepMind, I have developed agents that can learn the meaning of words and short phrases as they pertain to perceptual stimuli and complex action sequences in continuous 3D worlds. These agents naturally compose known words to successfully interpret never-seen-before phrases, a trait that matches the productivity of human language understanding. Further analysis showed that learning is much more efficient if agents exploit multiple complementary learning algorithms, another property of human language learning. We developed AGILE, an algorithm that jointly learns an instruction-conditioned reward function and a policy for realising that instruction. This provides a solution to a critical challenge for training agents to follow language commands: the truth conditions of most linguistic expressions are very hard to express formally (e.g. in a programmed reward function).
A child might learn what growing means by observing a sibling, a pet or a plant get physically bigger, but once understood, the same idea of growing can be applied to pocket money, a tummy ache or Dad’s age. This ability to represent relations, principles or ideas like growing with sufficient abstraction that they can be flexibly (re-)applied in disparate, and potentially unfamiliar, contexts and domains is central to human cognition, analogical reasoning and language. My work has shown that neural networks that combine raw perception from pixels with components for reasoning across discrete sets of images can exhibit strong analogical reasoning and impressive generalisation if trained in a particular way. Similar models can also be trained to solve visual reasoning tasks that challenge even the most able humans; our dataset of these problems is available for further research here.
I don’t think it makes sense to attribute good or bad generalisation to a particular model, model class or functional form. Models with strong inductive bias suited to a particular problem can exhibit impressive generalisation on domains related to that problem. On the other hand, more general architectures with greater variance may be effective on a wider range of problems, but may need a specific curriculum of training experience in order to exhibit strong generalisation. Hybrid approaches, such as our Neural Arithmetic Logic Unit, can represent the best of both worlds; the unit itself introduces a strong inductive bias suited to the extrapolation of numerical quantities, but models that include NALUs alongside conventional architectural components can retain general applicability to non-numerical problems as well.
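As a rough illustration of that inductive bias, here is a minimal NumPy sketch of the forward pass of a single NALU, following the formulation in the published paper: an additive path, a multiplicative path computed in log space, and a sigmoid gate that mixes them. The function and parameter names are illustrative assumptions rather than names from any released code, and the weights are set by hand instead of being learned.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def nalu_forward(x, w_hat, m_hat, g_weights, eps=1e-7):
    """Forward pass of one NALU layer (illustrative sketch, no training).

    x          : (batch, n_in) inputs
    w_hat,
    m_hat      : (n_in, n_out) unconstrained parameters of the shared weights
    g_weights  : (n_in, n_out) parameters of the gate
    """
    # tanh * sigmoid biases each effective weight towards -1, 0 or 1,
    # so the unit tends to add, ignore or subtract its inputs.
    w = np.tanh(w_hat) * sigmoid(m_hat)

    add_path = x @ w                               # addition / subtraction
    mul_path = np.exp(np.log(np.abs(x) + eps) @ w)  # multiplication / division

    gate = sigmoid(x @ g_weights)                  # 1 -> additive, 0 -> multiplicative
    return gate * add_path + (1.0 - gate) * mul_path


# Hand-set parameters (assumption: chosen to saturate the weights near 1).
w_hat = np.full((2, 1), 20.0)
m_hat = np.full((2, 1), 20.0)
gate_add = np.full((2, 1), 50.0)    # gate saturates at 1 for positive inputs
gate_mul = np.full((2, 1), -50.0)   # gate saturates at 0 for positive inputs

small = np.array([[2.0, 3.0]])
large = np.array([[200.0, 300.0]])  # far outside the "small" range

sum_small = nalu_forward(small, w_hat, m_hat, gate_add)   # close to 2 + 3
prod_small = nalu_forward(small, w_hat, m_hat, gate_mul)  # close to 2 * 3
sum_large = nalu_forward(large, w_hat, m_hat, gate_add)   # close to 200 + 300
prod_large = nalu_forward(large, w_hat, m_hat, gate_mul)  # close to 200 * 300
```

Because the same near-discrete weights are applied whatever the magnitude of the inputs, the unit computes the same arithmetic function on the large inputs as on the small ones, which is the sense in which it supports extrapolation. In a full model the parameters would be learned by gradient descent alongside conventional layers.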
During my PhD, I worked with Anna Korhonen on ways to extract and represent meaning from text and other language data in distributed representations. I developed FastSent and Sequential Denoising Auto-Encoders, ways to learn sentence representations from unlabelled text. With Yoshua Bengio and Kyunghyun Cho, I noticed you can train a network on dictionary definitions to solve general-knowledge crossword clues. With Jase Weston and Antoine Bordes, I applied neural networks with external memory components to answer questions about passages in books. I also made SimLex-999, a way to measure how well distributed representations of words reflect human semantic intuitions, and recently helped to develop the GLUE benchmark for evaluating models of language understanding.
With Steve Clark, I taught a Master’s course, Deep Learning for NLP, at the Computer Laboratory, Cambridge University in 2018. If you follow that link you can find the synopsis, lecture slides and TensorFlow code for training neural networks on dictionary definitions. We got nice feedback, and hope to do the course again (somewhere) soon.
Keynote address, First Mexican International Meeting on Artificial Intelligence (in Spanish), August 2018.
Keynote address, 39th TabuDag meeting of linguists, Groningen, Netherlands, June 2018.
Video talks @ MSR Redmond (2016) and ICLR 2016, San Juan, Puerto Rico.
See Google Scholar for a list of publications.