Google Research Blog
The latest news from Research at Google
Chat Smarter with Allo
Wednesday, May 18, 2016
Posted by Pranav Khaitan, Google Research
At Google, we are continuously building products powered by
Machine Learning
to delight our users and simplify their lives. Today, we are excited to talk about the technology behind
Allo
, a new smart messaging app that uses the power of
neural networks
and Google Search to make your text conversations easier and more productive.
Just like
Smart Reply for Inbox
, Allo understands the conversation history to generate a set of suggestions that the user will likely want to respond with. In addition to understanding the context of your conversation, Allo learns your individual style, so the responses are personalized for you.
How does it work?
About a year ago, we started exploring how we can make communication easier and more fun to use. The idea of Smart Reply for Allo came from my teammate Sushant Prakash who, along with Ori Gershony, led their teams to build this technology. We began by experimenting with neural network based model architectures which had proven to be successful for sequence prediction, including the encoder-decoder model used in Smart Reply.
One challenge we faced was that response generation in online conversations have very strict latency requirements. To address this, Pavel Sountsov and Sushant came up with an innovative two-stage model that works as follows. First, a
recurrent neural network
looks at the conversation context one word at a time and encodes it in the hidden state of a
long short term memory
(LSTM). Below, we show an example with a context ‘Where are you?’. The context has three tokens, each of which is embedded into a continuous space and input to the LSTM. The LSTM state now encodes the context as a continuous vector. This vector is used to generate the response as a discretized semantic class.
Each semantic class is associated with a set of possible messages that belong to it. We use a second recurrent network to generate a specific message from that set. This network also converts the context into a hidden LSTM state but this time the hidden state is used to generate the full message of the reply one token at a time. For example, now the LSTM after seeing the context “Where are you?” generates the tokens in the response: “I’m at work”.
A
beam search
is used to efficiently select the top-N highest scoring responses from among the very large set of possible messages that a LSTM can generate. A snippet of the search space explored by such a beam-search technique is shown below.
As with any large-scale product, there were several engineering challenges we had to solve in generating a set of high-quality responses efficiently. For example, in spite of the two staged architecture, our first few networks were very slow and required about half a second to generate a response. This was obviously a deal breaker when we are talking about real time communication apps! So we had to evolve our neural network architecture further to reduce the latency to less than 200ms. We moved from using a softmax layer to a hierarchical softmax layer which traverses a tree of words instead of traversing a list of words thus making it more efficient.
Another interesting challenge we had to solve when generating predictions is controlling for message length. Sometimes none of the most probable responses are appropriate - if the model predicts too short a message, it might not be useful to the user, and if we predict something too long, it might not fit on the phone screen. We solved this by biasing the beam search to follow paths that lead to higher utility responses instead of favoring just the responses that are most probable. That way, we can efficiently generate appropriate length response predictions that are useful to our users.
Personalized for you
The best part about these suggestions is that over time they are personalized to you so that your individual style is reflected in your conversations. For example, if you often reply to “How are you?” with “Fine.” instead of “I am good.”, it will learn your preference and your future suggestions will take that into account. This was accomplished by incorporating a user's "style" as one of the features in a Neural Network that is used to predict the next word in a response, resulting in suggestions that are customized for your personality and individual preferences. The user's style is captured in a sequence of numbers that we call the user embedding. These embeddings can be generated as part of the regular model training, but this approach requires waiting for many days for training to be complete and it cannot handle more than a handful of millions of users. To solve this issue, Alon Shafrir implemented a
L-BFGS
based technique to generate user embeddings quickly and at scale. Now, you'll be able to enjoy personalized suggestions after only a short time of using Allo.
More than just English
The neural network model described above is language agnostic so building separate prediction models for each language works quite well. To make sure that responses for each language benefit from our semantic understanding of other languages, Sujith Ravi came up with a graph-based machine learning technique that can connect possible responses across languages. Dana Movshovitz-Attias and Peter Young applied this technique to build a graph that connects responses to incoming messages and to other responses that have similar word embeddings and syntactic relationships. It also connects responses with similar meaning across languages based on the
machine translation
models developed by our
Translate team
.
With this graph, we use
semi-supervised learning
, as described in this
paper
, to learn the semantic meaning of responses and determine which are the most useful clusters of possible responses. As a result, we can allow the LSTM to score many possible variants of each possible response meaning, allowing the personalization routines to select the best response for the user in the context of the conversation. This also helps enforce diversity as we can now pick the final set of responses from different semantic clusters.
Here’s an example of how the graph might look for a set of messages related to greetings:
Beyond Smart Reply
I am also very excited about the Google assistant in Allo with which you can converse and get information about anything that Google Search knows about. It understands your sentences and helps you accomplish tasks directly from the conversation. For example, the Google assistant can help you discover a restaurant and reserve a table from within the Allo app when chatting with your friends. This has been made possible because of the cutting-edge research in natural language understanding that we have been doing at Google. More details to follow soon!
These smart features will be part of the Android and iOS apps for Allo that will be available later this summer. We can’t wait for you to try and enjoy it!
We wish to acknowledge the hard work of the following in building Smart Reply:
Ryan Cassidy, Dave Citron, Ori Gershony, Max Gubin, Pranav Khaitan, Harini Krishnamurthy, Patrick McGregor, Dana Movshovitz-Attias, Sergey Nazarov, Hung Pham, Sushant Prakash, Vivek Ramavajjala, Sujith Ravi, Sunita Sarawagi, Alon Shafrir, Pavel Sountsov, Peter Young, Shu Zhang
Labels
accessibility
ACL
ACM
Acoustic Modeling
Adaptive Data Analysis
ads
adsense
adwords
Africa
Android
API
App Engine
App Inventor
April Fools
Art
Audio
Australia
Automatic Speech Recognition
Awards
Cantonese
China
Chrome
Cloud Computing
Collaboration
Computational Photography
Computer Science
Computer Vision
conference
conferences
Conservation
correlate
Course Builder
crowd-sourcing
CVPR
Data Center
data science
datasets
Deep Learning
DeepDream
DeepMind
distributed systems
Diversity
Earth Engine
economics
Education
Electronic Commerce and Algorithms
EMEA
EMNLP
Encryption
entities
Entity Salience
Environment
Exacycle
Faculty Institute
Faculty Summit
Flu Trends
Fusion Tables
gamification
Genomics
Gmail
Google Books
Google Brain
Google Cloud Platform
Google Drive
Google Genomics
Google Science Fair
Google Sheets
Google Translate
Google Voice Search
Google+
Government
grants
HCI
Health
High Dynamic Range Imaging
ICLR
ICML
ICSE
Image Annotation
Image Classification
Image Processing
Inbox
Information Retrieval
internationalization
Internet of Things
Interspeech
IPython
Journalism
jsm
jsm2011
K-12
KDD
Klingon
Korean
Labs
Linear Optimization
localization
Machine Hearing
Machine Intelligence
Machine Learning
Machine Translation
MapReduce
market algorithms
Market Research
ML
MOOC
NAACL
Natural Language Processing
Natural Language Understanding
Network Management
Networks
Neural Networks
Ngram
NIPS
NLP
open source
operating systems
Optical Character Recognition
optimization
osdi
osdi10
patents
ph.d. fellowship
PiLab
Policy
Professional Development
Proposals
Public Data Explorer
publication
Publications
Quantum Computing
renewable energy
Research
Research Awards
resource optimization
Robotics
schema.org
Search
search ads
Security and Privacy
SIGCOMM
SIGMOD
Site Reliability Engineering
Software
Speech
Speech Recognition
statistics
Structured Data
Systems
TensorFlow
Translate
trends
TTS
TV
UI
University Relations
UNIX
User Experience
video
Vision Research
Visiting Faculty
Visualization
VLDB
Voice Search
Wiki
wikipedia
WWW
YouTube
Archive
2016
May
Apr
Mar
Feb
Jan
2015
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2014
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2013
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2012
Dec
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2011
Dec
Nov
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2010
Dec
Nov
Oct
Sep
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2009
Dec
Nov
Aug
Jul
Jun
May
Apr
Mar
Feb
Jan
2008
Dec
Nov
Oct
Sep
Jul
May
Apr
Mar
Feb
2007
Oct
Sep
Aug
Jul
Jun
Feb
2006
Dec
Nov
Sep
Aug
Jul
Jun
Apr
Mar
Feb
Feed
Follow @googleresearch
Give us feedback in our
Product Forums
.