These are notes for my own use.

The only selection criterion is the order in which I remembered the methods.

What is dimensionality reduction?

From Akaho-sensei's 朱鷺の杜Wiki.

次元削減 (dimensionality reduction) - 機械学習の「朱鷺の杜Wiki」

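Dimensionality reduction + Data visualization

Data is very often reduced to two or three dimensions precisely so that it can be visualized, so I cover the two topics together. While writing this memo I came across the following article by ALBERT Inc.: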

blog.albert2005.co.jp

PCA

Principal Component Analysis(PCA)

qiita.com

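Principal component analysis is the method I use most often. It is an unsupervised method that computes the eigendecomposition of the data's covariance matrix. However, it can only project the data onto a linear subspace.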
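As a note to myself, the eigenvalue computation can be sketched in a few lines of NumPy. This is only a minimal sketch: the function name `pca` and the toy data are my own assumptions, and in practice a library implementation would be the safer choice.

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    # Sample covariance matrix of the features (n_features x n_features).
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]  # columns are the principal axes
    return X_centered @ components, eigvals[order]

rng = np.random.default_rng(0)
# Toy data (assumed example): 200 points in 3-D, stretched along the first axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])
Z, variances = pca(X, n_components=2)
print(Z.shape)    # (200, 2)
print(variances)  # variance captured by each component, largest first
```

Because the projection is just a matrix product with `components`, the result always lies in a linear subspace of the original space, which is the limitation mentioned above.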

randomized PCA

Warmuth, Manfred K., and Dima Kuzmin. “Randomized PCA algorithms with regret bounds that are logarithmic in the dimension.” Advances in neural information processing systems. 2006.

What is randomized PCA? - Quora

Online Robust Principal Component Analysis(OR-PCA)

videolectures.net

papers.nips.cc

Manifold learning

Manifold learning with application to object recognition from zukun

www.slideshare.net

Methods of Manifold Learning for Dimension Reduction of Large Data Sets from Ryan B Harvey, CSDP, CSM

www.slideshare.net

These slides give a brief explanation.

t-Distributed Stochastic Neighbor Embedding(t-SNE)

pdf: http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf

lvdmaaten.github.io

Maaten, Laurens van der, and Geoffrey Hinton. “Visualizing data using t-SNE.” Journal of Machine Learning Research 9.Nov (2008): 2579-2605.

blog.albert2005.co.jp

Visualizing Data Using t-SNE from Tomoki Hayashi

www.slideshare.net

Multidimensional Scaling(MDS)

Wiki: 多次元尺度構成法 (multidimensional scaling) - Wikipedia

https://documents.software.dell.com/statistics/textbook/multidimensional-scaling

This is multidimensional scaling. It rearranges the data so that "similar" points end up close together and "dissimilar" points end up farther apart.
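One concrete variant is classical (metric) MDS, which recovers coordinates from a matrix of pairwise distances. The sketch below assumes Euclidean distances as the dissimilarity measure; the function name `classical_mds` and the toy square are my own choices, and other MDS variants (for example non-metric MDS) work differently.

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points given an (n x n) matrix D of pairwise distances."""
    n = D.shape[0]
    # Double-centre the squared distances to obtain a Gram (inner product) matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding before taking the root.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy check (assumed example): distances between the corners of a unit square.
P = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
Y = classical_mds(D, n_components=2)
D_embedded = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
print(np.abs(D - D_embedded).max())  # distance error of the embedding
```

The embedding is only determined up to rotation and reflection, so `Y` need not equal `P`; what should be preserved is the distance matrix.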

Isomap

The Isomap Algorithm and Topological Stability | Science

Balasubramanian, Mukund, and Eric L. Schwartz. “The isomap algorithm and topological stability.” Science 295.5552 (2002): 7-7.

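A kind of manifold learning that can be viewed as an extension of Multidimensional Scaling: instead of straight-line distances, it feeds MDS the geodesic distances measured along a neighborhood graph.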
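Viewed as an extension of MDS, Isomap can be sketched in three steps: build a nearest-neighbor graph, take shortest-path lengths through the graph as geodesic distances, and run classical MDS on them. The code below is a rough sketch under those assumptions; the function name `isomap`, the Floyd-Warshall loop (fine only for small data), and the toy arc are my own choices.

```python
import numpy as np

def isomap(X, n_neighbors=5, n_components=2):
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Keep only edges to each point's nearest neighbours; others start at infinity.
    G = np.full((n, n), np.inf)
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]  # column 0 is the point itself
    rows = np.repeat(np.arange(n), n_neighbors)
    G[rows, nearest.ravel()] = D[rows, nearest.ravel()]
    G = np.minimum(G, G.T)  # make the graph symmetric
    np.fill_diagonal(G, 0.0)
    # Floyd-Warshall: geodesic distance = shortest path through the graph.
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")
    # Classical MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (G ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

# Toy data (assumed example): points along three quarters of a circle.
t = np.linspace(0.0, 1.5 * np.pi, 40)
X = np.column_stack([np.cos(t), np.sin(t)])
Y = isomap(X, n_neighbors=3, n_components=1)
print(np.corrcoef(Y[:, 0], t)[0, 1])  # close to +1 or -1 if the arc was unrolled
```

The sign of the embedding is arbitrary, so only the magnitude of the correlation with the arc parameter `t` is meaningful.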

Locally Linear Embedding (LLE)

pdf: https://www.cs.nyu.edu/~roweis/lle/papers/lleintro.pdf

Saul, Lawrence K., and Sam T. Roweis. “An introduction to locally linear embedding.” unpublished. Available at: http://www.cs.toronto.edu/~roweis/lle/publications.html (2000).

Laplacian Eigenmaps(LE)

Semidefinite Embedding (SDE)

Explanatory slides: http://www.cs.columbia.edu/~jebara/4772/proj/oldprojects/bs2018-adv-ml-pres.pdf

Wiki: Semidefinite embedding - Wikipedia

Latent Dirichlet Allocation(LDA)

pdf: http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf

Blei, David M., Andrew Y. Ng, and Michael I. Jordan. “Latent dirichlet allocation.” Journal of machine Learning research 3.Jan (2003): 993-1022.

LDA入門 (Introduction to LDA) from 正志坪坂

www.slideshare.net

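This is the so-called topic model. Each word in a document is first assigned a random topic, and the topic assignments are then updated one at a time (this describes inference by Gibbs sampling).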
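The "random topic, then update one by one" procedure corresponds to collapsed Gibbs sampling. The sketch below is my own minimal version of it, not the variational method from the Blei et al. paper; the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, and the toy corpus of word ids are all assumptions for illustration.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA. docs is a list of lists of word ids."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))    # topic counts per document
    topic_word = np.zeros((n_topics, vocab_size))  # word counts per topic
    topic_total = np.zeros(n_topics)               # total words per topic
    # Step 1: give every word occurrence a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = rng.integers(n_topics, size=len(doc))
        assignments.append(z_d)
        for w, z in zip(doc, z_d):
            doc_topic[d, z] += 1
            topic_word[z, w] += 1
            topic_total[z] += 1
    # Step 2: repeatedly resample each assignment given all the others.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove the current assignment from the counts.
                doc_topic[d, z] -= 1
                topic_word[z, w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic for this word.
                p = ((doc_topic[d] + alpha) * (topic_word[:, w] + beta)
                     / (topic_total + vocab_size * beta))
                z = rng.choice(n_topics, p=p / p.sum())
                assignments[d][i] = z
                doc_topic[d, z] += 1
                topic_word[z, w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

# Toy corpus (assumed example): word ids 0-2 and 3-5 never appear together.
docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3],
        [4, 5, 3, 5, 4], [0, 1, 2, 1], [5, 4, 3, 3]]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=6)
print(doc_topic)  # per-document topic counts after sampling
```

Normalizing each row of `doc_topic` (after adding `alpha`) gives a low-dimensional topic representation of each document, which is why LDA shows up in a list of dimensionality reduction methods.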

Labeled LDA

pdf: Labeled LDA

Ramage, Daniel, et al. “Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora.” Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1-Volume 1. Association for Computational Linguistics, 2009.

A supervised version of LDA.

Partially Labeled Topic Models for Interpretable Text Mining

pdf: http://131.107.65.14/en-us/um/people/sdumais/KDD2011-pldp-final.pdf

Ramage, Daniel, Christopher D. Manning, and Susan Dumais. “Partially labeled topic models for interpretable text mining.” Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2011.

I could not find slides devoted to this model alone. The following slides cover topic models in general.

Topic Models from Claudia Wagner

www.slideshare.net

[Kim+ ICML2012] Dirichlet Process with Mixed Random Measures : A Nonparametric Topic Model for Labeled Data from Shuyo Nakatani

www.slideshare.net

Diffusion Map

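A graph-theory-based dimensionality reduction method. As far as I understand, it builds a lower-dimensional representation while preserving the local relationships between data points (I am not sure about this).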

Diffusion maps is a dimensionality reduction or feature extraction algorithm introduced by R. R. Coifman and S. Lafon. It computes a family of embeddings of a data set into Euclidean space (often low-dimensional) whose coordinates can be computed from the eigenvectors and eigenvalues of a diffusion operator on the data.

Source: Diffusion map - Wikipedia

http://anomaly.hatenablog.com/entry/2015/03/29/211528
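To make the quoted description a bit more concrete for myself: the "diffusion operator" can be built as a random-walk matrix over a Gaussian affinity graph, and the embedding coordinates come from its eigenvectors scaled by its eigenvalues. The sketch below is a simplified version under my own assumptions (Gaussian kernel with width `epsilon`, no density normalization, function name `diffusion_map`); normalization conventions differ between papers.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=1.0, t=1):
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)  # Gaussian affinity between points
    d = K.sum(axis=1)
    # The Markov matrix P = D^-1 K is not symmetric, so diagonalise the
    # similar symmetric matrix A = D^-1/2 K D^-1/2, which has the same eigenvalues.
    A = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Convert to right eigenvectors of P; the first one is constant
    # (eigenvalue 1) and carries no information, so it is dropped.
    psi = eigvecs / np.sqrt(d)[:, None]
    coords = psi[:, 1:n_components + 1] * eigvals[1:n_components + 1] ** t
    return coords, eigvals

# Toy data (assumed example): two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
cluster_b = rng.normal(loc=[4.0, 0.0], scale=0.3, size=(20, 2))
X = np.vstack([cluster_a, cluster_b])
Y, eigvals = diffusion_map(X, n_components=2, epsilon=1.0)
print(Y[:3, 0], Y[-3:, 0])  # first diffusion coordinate for each cluster
```

In this toy case the first diffusion coordinate should take opposite signs on the two clusters, since a random walk rarely crosses between them.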

Kernel PCA

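Combines PCA, which can only find linear projections, with a kernel that implicitly computes inner products in a high-dimensional feature space, which makes nonlinear representations possible.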
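A minimal sketch of the idea, assuming an RBF kernel: the covariance matrix of ordinary PCA is replaced by a centered kernel (Gram) matrix, and its eigenvectors give the nonlinear components. The function name `kernel_pca`, the `gamma` value, and the concentric-circle toy data are my own assumptions.

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    # RBF kernel: inner products in an implicit high-dimensional feature space.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq_dists)
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    # Centre the data in feature space without ever computing the features.
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Projections of the training points onto the kernel principal components.
    return eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

# Toy data (assumed example): two concentric circles, which no linear
# projection of the raw coordinates can pull apart.
angles = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
circle = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([0.3 * circle, 1.0 * circle])
Z = kernel_pca(X, n_components=2, gamma=2.0)
print(Z.shape)  # (100, 2)
```

How well the components separate the two circles depends on `gamma`, so it is worth plotting `Z` for a few values rather than trusting a single setting.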

関東CV勉強会 Kernel PCA (2011.2.19) from Akisato Kimura

www.slideshare.net

Autoencoder

www.beam2d.net

An article by Tokui-san of Preferred Infrastructure, Inc.

Autoencoder

AutoEncoderで特徴抽出 (Feature extraction with AutoEncoder) from Kai Sasaki

www.slideshare.net

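This is an explanation by someone from PFI. Someday I would like to understand the equations to some degree as well, not just the applications.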

Denoising Autoencoder

pdf: Pascal Vincent, Hugo Larochelle, Yoshua Bengio and Pierre-Antoine Manzagol. Extracting and Composing Robust Features with Denoising Autoencoders. Proc. of ICML, 2008.

WSDM2016読み会 Collaborative Denoising Auto-Encoders for Top-N Recommender Systems from Kotaro Tanahashi

www.slideshare.net

The slides above are an example of using a Denoising Autoencoder.

Stacked Denoising AutoEncoders (SDAE)

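The slides above and [1409.2944] Collaborative Deep Learning for Recommender Systems use a Denoising Autoencoder as part of building a recommender system. "Stacked" means roughly the same as in "Stacked LSTM": several layers piled on top of each other, that is, "multi-layer". The network is trained to restore the original matrix from a noise-corrupted copy, and the activations of the middle layer can be regarded as a dimensionality-reduced version of that matrix.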
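The "restore the original from a noisy copy, then read off the middle layer" idea can be sketched with a single hidden layer in NumPy. This is a toy sketch under my own assumptions (masking noise, tanh hidden units, squared error, plain gradient descent, function name `train_denoising_autoencoder`), not the model from the papers above; a stacked version would train several such layers on top of each other.

```python
import numpy as np

def train_denoising_autoencoder(X, n_hidden=2, noise=0.3, lr=0.02,
                                n_epochs=1000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(n_epochs):
        # Corrupt the input by zeroing a random subset of entries (masking noise).
        X_noisy = X * (rng.random(X.shape) > noise)
        H = np.tanh(X_noisy @ W1 + b1)  # low-dimensional code (middle layer)
        X_hat = H @ W2 + b2             # reconstruction
        err = X_hat - X                 # compared against the CLEAN input
        losses.append(np.mean(np.sum(err ** 2, axis=1)))
        # Backpropagation of the squared reconstruction error.
        grad_out = 2.0 * err / n
        grad_W2 = H.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_H = (grad_out @ W2.T) * (1.0 - H ** 2)
        grad_W1 = X_noisy.T @ grad_H
        grad_b1 = grad_H.sum(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2

    def encode(A):
        """Map clean inputs to the learned low-dimensional code."""
        return np.tanh(A @ W1 + b1)

    return encode, losses

# Toy data (assumed example): 8 observed features driven by 2 latent factors.
rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 2))
X = np.tanh(latent @ rng.normal(size=(2, 8)))
encode, losses = train_denoising_autoencoder(X)
codes = encode(X)
print(codes.shape)            # (200, 2)
print(losses[0], losses[-1])  # reconstruction error before and after training
```

`codes` plays the role of the dimensionality-reduced matrix: each row of `X` is summarized by the activations of the middle layer.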

Methods not covered

MVU, LLE, Laplacian Eigenmaps, LTSA, Sammon mapping, LLC, and so on? I have not studied these enough.

Other methods added later

SOM(Self organizing map/Kohonen map)

T. Kohonen. Self-organization and associative memory: 3rd edition. Springer-Verlag New York, Inc., New York, NY, USA, 1989.

Self-organizing map from Tarat Diloksawatdikul

GTM

My understanding is that this is an extension of SOM. I thought there were no materials on it, but it turned out there were.

Bishop, Christopher M., Markus Svensén, and Christopher KI Williams. “GTM: The generative topographic mapping.” Neural computation 10.1 (1998): 215-234.

Dimension Reduction And Visualization Of Large High Dimensional Data Via Interpolation from wl820609

Independent Component Analysis (ICA)

Hyvärinen, Aapo, and Erkki Oja. “Independent component analysis: algorithms and applications.” Neural networks 13.4 (2000): 411-430.

Independent Component Analysis: A Tutorial

Gaussian Process Latent Variable Models(GPLVM)

The Gaussian Process Latent Variable Model (GPLVM) from James McMurray

Generalized Discriminant Analysis

Baudat, Gaston, and Fatiha Anouar. “Generalized discriminant analysis using a kernel approach.” Neural computation 12.10 (2000): 2385-2404.

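統計的学習の基礎 ―データマイニング・推論・予測― (Japanese edition of The Elements of Statistical Learning: Data Mining, Inference, and Prediction)

Authors: Trevor Hastie, Robert Tibshirani, Jerome Friedman. Translators: 杉山将, 井手剛, 神嶌敏弘, 栗田多喜夫, 前田英作, 井尻善久, 岩田具治, 金森敬文, 兼村厚範, 烏山昌幸, 河原吉伸, 木村昭悟, 小西嘉典, 酒井智弥, 鈴木大慈, 竹内一郎, 玉木徹, 出口大輔, 冨岡亮太, 波部斉, 前田新一, 持橋大地, 山田誠
Publisher: 共立出版
Release date: 2014/06/25
Format: 単行本 (book)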
