This site accompanies the latter half of ART.T458: Machine Learning at Tokyo Institute of Technology, which focuses on deep learning for natural language processing (NLP). Slides, interactive demos, and implementations are available below.

Lecture #1: Feedforward Neural Network (I)

Slides for lecture #1
Keywords: binary classification, Threshold Logic Units (TLUs), Single-layer Perceptron (SLP), Perceptron algorithm, sigmoid function, Stochastic Gradient Descent (SGD), Multi-layer Perceptron (MLP), Backpropagation, Computation Graph, Automatic Differentiation, Universal Approximation Theorem. Interactive demos are available for the single-layer and multi-layer perceptrons. Implementation available on GitHub and Colaboratory.
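To make the first two keywords concrete, here is a minimal sketch (not the course's reference implementation) of a single-layer perceptron with a sigmoid output trained by stochastic gradient descent on a toy binary classification task; the data and hyperparameters are made up for illustration.

```python
# Minimal sketch: sigmoid single-layer perceptron trained with SGD.
# Toy data and hyperparameters are assumptions, not course material.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian blobs, labels y in {0, 1}.
X = np.vstack([rng.normal(-1.0, 0.7, size=(50, 2)),
               rng.normal(+1.0, 0.7, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(2)   # weights
b = 0.0           # bias
eta = 0.1         # learning rate

for epoch in range(20):
    for i in rng.permutation(len(X)):      # SGD: one example at a time
        p = sigmoid(X[i] @ w + b)          # predicted probability
        grad = p - y[i]                    # d(cross-entropy)/d(logit)
        w -= eta * grad * X[i]             # gradient step
        b -= eta * grad

acc = np.mean((sigmoid(X @ w + b) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```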

Lecture #2: Feedforward Neural Network (II)

Slides for lecture #2
Keywords: multi-class classification, linear multi-class classifier, softmax function, Stochastic Gradient Descent (SGD), mini-batch training, loss functions, activation functions, dropout. Implementation available on GitHub and Colaboratory.
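The following is a minimal sketch (toy synthetic data, made-up hyperparameters, not the course's code) of a linear multi-class classifier with a softmax output and cross-entropy loss, trained by mini-batch SGD.

```python
# Minimal sketch: softmax classifier with mini-batch SGD on toy data.
import numpy as np

rng = np.random.default_rng(0)
K, D, N = 3, 2, 150                         # classes, features, examples

# Toy data: three Gaussian blobs, one per class.
centers = np.array([[0, 2], [-2, -1], [2, -1]], dtype=float)
X = np.vstack([rng.normal(c, 0.6, size=(N // K, D)) for c in centers])
y = np.repeat(np.arange(K), N // K)

W = np.zeros((D, K))
b = np.zeros(K)
eta, batch = 0.1, 32

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)    # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for epoch in range(50):
    idx = rng.permutation(N)
    for s in range(0, N, batch):
        B = idx[s:s + batch]
        P = softmax(X[B] @ W + b)           # (batch, K) class probabilities
        P[np.arange(len(B)), y[B]] -= 1.0   # dL/dlogits for cross-entropy
        W -= eta * X[B].T @ P / len(B)      # averaged gradient step
        b -= eta * P.mean(axis=0)

print("training accuracy:", np.mean(np.argmax(X @ W + b, axis=1) == y))
```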

Lecture #3: Word embeddings

Slides for lecture #3
Keywords: word embeddings, distributed representation, distributional hypothesis, pointwise mutual information, singular value decomposition, word2vec, word analogy, GloVe, fastText. Demos with word vectors pre-trained on English newspapers and trained on Japanese Wikipedia are available.
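As a pocket-sized illustration of the PMI + SVD route to word vectors (not the pre-trained vectors used in the demos), the sketch below builds a co-occurrence matrix from a tiny made-up corpus, takes positive PMI, and factorizes it with SVD.

```python
# Minimal sketch: PPMI co-occurrence matrix + SVD word vectors on a toy corpus.
import numpy as np

corpus = ["i like deep learning", "i like nlp", "i enjoy flying"]
tokens = [s.split() for s in corpus]
vocab = sorted({w for s in tokens for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Co-occurrence counts within a +/-1 word window.
C = np.zeros((len(vocab), len(vocab)))
for s in tokens:
    for i, w in enumerate(s):
        for j in range(max(0, i - 1), min(len(s), i + 2)):
            if j != i:
                C[idx[w], idx[s[j]]] += 1

# Positive pointwise mutual information.
total = C.sum()
p_w = C.sum(axis=1, keepdims=True) / total
p_c = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log((C / total) / (p_w * p_c))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

# Low-dimensional word vectors from a truncated SVD.
U, S, Vt = np.linalg.svd(ppmi)
vectors = U[:, :2] * S[:2]                  # 2-dimensional embeddings
print({w: np.round(vectors[idx[w]], 2) for w in vocab})
```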

Lecture #4: DNN for structured data

Slides for lecture #4
Keywords: Recurrent Neural Networks (RNNs), vanishing and exploding gradients, Long Short-Term Memory (LSTM), Gated Recurrent Units (GRUs), Recursive Neural Network, Tree-structured LSTM, Convolutional Neural Networks (CNNs). Implementation available on GitHub and Colaboratory.
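To make the LSTM gating equations concrete, here is a minimal sketch of one LSTM cell step with random weights and made-up sizes; it is an illustration, not the course's implementation.

```python
# Minimal sketch: a single LSTM cell step (forward pass only) in NumPy.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid = 4, 3                             # input and hidden sizes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate: input (i), forget (f), output (o), candidate (g).
W = {k: rng.normal(scale=0.1, size=(d_hid, d_in + d_hid)) for k in "ifog"}
b = {k: np.zeros(d_hid) for k in "ifog"}

def lstm_step(x, h, c):
    z = np.concatenate([x, h])                 # concatenated input and state
    i = sigmoid(W["i"] @ z + b["i"])           # input gate
    f = sigmoid(W["f"] @ z + b["f"])           # forget gate
    o = sigmoid(W["o"] @ z + b["o"])           # output gate
    g = np.tanh(W["g"] @ z + b["g"])           # candidate cell state
    c = f * c + i * g                          # new cell state
    h = o * np.tanh(c)                         # new hidden state
    return h, c

# Run the cell over a random sequence of 5 input vectors.
h, c = np.zeros(d_hid), np.zeros(d_hid)
for x in rng.normal(size=(5, d_in)):
    h, c = lstm_step(x, h, c)
print("final hidden state:", np.round(h, 3))
```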

Lecture #5: Encoder-decoder models

Slides for lecture #5
Keywords: language modeling, Recurrent Neural Network Language Model (RNNLM), encoder-decoder models, sequence-to-sequence models, attention mechanism, reading comprehension, question answering, headline generation, multi-task learning, character-based RNN, byte-pair encoding, Convolutional Sequence to Sequence (ConvS2S), Transformer, coverage.
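The sketch below illustrates the scaled dot-product attention at the heart of attention-based encoder-decoder models and the Transformer; the random toy tensors standing in for decoder queries and encoder keys/values are assumptions made purely for illustration.

```python
# Minimal sketch: scaled dot-product attention over toy tensors.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                          # model dimension
Q = rng.normal(size=(2, d))                    # 2 decoder positions (queries)
K = rng.normal(size=(5, d))                    # 5 encoder positions (keys)
V = rng.normal(size=(5, d))                    # encoder values

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)      # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Attention(Q, K, V) = softmax(Q K^T / sqrt(d)) V
weights = softmax(Q @ K.T / np.sqrt(d))        # (2, 5) attention distribution
context = weights @ V                          # (2, d) context vectors
print("attention weights:\n", np.round(weights, 2))
```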