Duolingo AI Research

NAACL 2018 • Duolingo Shared Task

Second Language Acquisition Modeling

We present the task of second language acquisition (SLA) modeling. Given a history of errors made by learners of a second language, the task is to predict errors that they are likely to make at arbitrary points in the future. We describe a large corpus of more than 7M words produced by more than 6k learners of English, Spanish, and French using Duolingo, a popular online language-learning app. Then we report on the results of a shared task challenge aimed at studying the SLA task via this corpus, which attracted 15 teams and synthesized work from various fields including cognitive science, linguistics, and machine learning.

B. Settles, C. Brust, E. Gustafson, M. Hagiwara and N. Madnani

ACL 2016

A Trainable Spaced Repetition Model for Language Learning

We present half-life regression (HLR), a novel model for spaced repetition practice with applications to second language acquisition. HLR combines psycholinguistic theory with modern machine learning techniques, indirectly estimating the "half-life" of a word or concept in a student’s long-term memory. We use data from Duolingo, a popular online language learning application, to fit HLR models, reducing error by more than 45% compared to several baselines at predicting student recall rates. HLR model weights also shed light on which linguistic concepts are systematically challenging for second language learners. Finally, HLR was able to improve Duolingo daily student engagement by 12% in an operational user study.

B. Settles and B. Meeder
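
To make the "half-life" idea concrete, here is a minimal sketch of how a predicted recall rate could be computed from a half-life. It assumes an exponential forgetting curve, p = 2^(-lag/h), with the half-life itself modeled as h = 2^(weights · features), which is our reading of the published HLR formulation. The weights and feature values below are hypothetical, chosen only to keep the arithmetic easy to follow; they are not fitted values from the paper.

```python
def half_life(weights, features):
    """Estimated half-life in days, assuming h = 2 ** (weights . features)."""
    return 2.0 ** sum(w * x for w, x in zip(weights, features))


def recall_probability(lag_days, h):
    """Predicted recall after lag_days, assuming p = 2 ** (-lag / h)."""
    return 2.0 ** (-lag_days / h)


# Hypothetical weights and features (bias, times correct, times wrong).
weights = [1.0, 0.5, -0.5]
features = [1.0, 2.0, 2.0]

h = half_life(weights, features)      # 2 ** (1.0 + 1.0 - 1.0) = 2 days
p = recall_probability(2.0, h)        # lag equals the half-life, so p = 0.5
print(f"half-life = {h} days, recall after 2 days = {p}")
```

Under these assumptions, recall is 1.0 immediately after practice and drops to 0.5 when the lag equals the half-life, which is what makes the half-life a convenient quantity for scheduling the next review.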

Cognitive Science 2016

Self-directed Learning Favors Local, Rather Than Global, Uncertainty

Collecting (or "sampling") information that one expects to be useful is a powerful way to facilitate learning. However, relatively little is known about how people decide which information is worth sampling over the course of learning. We describe several alternative models of how people might decide to collect a piece of information inspired by "active learning" research in machine learning. We additionally provide a theoretical analysis demonstrating the situations under which these models are empirically distinguishable, and we report a novel empirical study that exploits these insights. Our model-based analysis of participants’ information gathering decisions reveals that people prefer to select items which resolve uncertainty between two possibilities at a time rather than items that have high uncertainty across all relevant possibilities simultaneously. Rather than adhering to strictly normative or confirmatory conceptions of information search, people appear to prefer a "local" sampling strategy, which may reflect cognitive constraints on the process of information gathering.

D.B. Markant, B. Settles, and T.M. Gureckis

EDM 2015 • Best Paper Award

Mixture Modeling of Individual Learning Curves

We show that student learning can be accurately modeled using a mixture of learning curves, each of which specifies error probability as a function of time. This approach generalizes Knowledge Tracing, which can be viewed as a mixture model in which the learning curves are step functions. We show that this generality yields order-of-magnitude improvements in prediction accuracy on real data. Furthermore, examination of the learning curves provides actionable insights into how different segments of the student population are learning. To make our mixture model more expressive, we allow the learning curves to be defined by generalized linear models with arbitrary features. This approach generalizes Additive Factor Models and Performance Factors Analysis, and outperforms them on a large, real-world dataset.

M. Streeter
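
As an illustration of the mixture idea in the abstract above, the sketch below predicts a student's next error from a mixture of learning curves. It is a simplified reading, not the paper's implementation: each component is assumed to be a fixed list of per-trial error probabilities, trials are assumed independent given the component, and the curve values and priors are hypothetical. The first curve is a step function, loosely in the spirit of the Knowledge Tracing special case.

```python
def posterior(priors, curves, errors):
    """Posterior weight of each component given observed errors (1 = error)."""
    joint = []
    for prior, curve in zip(priors, curves):
        likelihood = prior
        for t, made_error in enumerate(errors):
            likelihood *= curve[t] if made_error else 1.0 - curve[t]
        joint.append(likelihood)
    total = sum(joint)  # assumed nonzero for the histories considered here
    return [j / total for j in joint]


def predict_next_error(priors, curves, errors):
    """Error probability at the next trial, averaged over the posterior."""
    weights = posterior(priors, curves, errors)
    t = len(errors)
    return sum(w * curve[t] for w, curve in zip(weights, curves))


# Hypothetical components: a step-function curve and a gradual curve.
curves = [
    [0.8, 0.8, 0.1, 0.1],
    [0.8, 0.7, 0.6, 0.5],
]
priors = [0.5, 0.5]
history = [1, 1, 0]  # two errors, then a correct response

weights = posterior(priors, curves, history)
next_error = predict_next_error(priors, curves, history)
print(weights, next_error)
```

With this history the step-function component receives most of the posterior weight, so the predicted error probability for the fourth trial sits much closer to that curve's value than to the gradual curve's.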

Data Sets

SLAM Shared Task Data

Spaced Repetition Data

Our Team

Ready to work with us?

Senior Research Scientist

Machine Learning Engineer (NLP)

ML Research Intern (NLP)