LibGUNDAM : A library for General UNsupervised DAta Mining

Ming-Hen Tsai

Introduction

Many machine learning problems require tedious data processing before training can begin.
This project, written in C, aims to provide some state-of-the-art feature extraction methods to ease the pain of analyzing data and generating useful data.
A discussion forum for the software can be found HERE.

News

Using lib-gundam to mine the features, we took third place in the Kaggle R Package Recommendation Engine Competition (Feb. 2011).

Function

The current version (1.03) includes the following functions:
1. Polynomial expansion on possibly correlated sets of features
2. Adding calibrating features
3. Automatic detection of categorical features
4. Adding arbitrary features by applying common arithmetic operations in a given dimension
5. Format conversion between categorical and numerical data, and between CSV and svm-light (libsvm) formats
6. Demo code for running the Kaggle R competition
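
To make items 1 and 5 above concrete, here is a minimal sketch in C of a degree-2 polynomial expansion followed by conversion of a dense row to one svm-light (libsvm) line. It is not taken from the LibGUNDAM source: the function names poly2_size, poly2_expand and dense_to_libsvm are hypothetical, and the choices to include squared terms and to omit zero-valued features are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: number of outputs of a degree-2 expansion of n inputs,
 * i.e. the n original values plus n*(n+1)/2 pairwise products (squares included). */
size_t poly2_size(size_t n)
{
    return n + n * (n + 1) / 2;
}

/* Hypothetical helper: copy x[0..n-1] into out, then append every product
 * x[i]*x[j] with i <= j. out must have room for poly2_size(n) doubles.
 * Returns the number of values written. */
size_t poly2_expand(const double *x, size_t n, double *out)
{
    size_t k = 0;
    assert(x != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[k++] = x[i];
    for (size_t i = 0; i < n; i++)
        for (size_t j = i; j < n; j++)
            out[k++] = x[i] * x[j];
    return k;
}

/* Hypothetical helper: format one dense row as a sparse svm-light/libsvm line,
 * "<label> <index>:<value> ...", with 1-based indices and zero values omitted.
 * Returns the number of characters written (excluding the terminator),
 * or -1 if buf is too small. */
int dense_to_libsvm(double label, const double *x, size_t n,
                    char *buf, size_t buflen)
{
    int len = snprintf(buf, buflen, "%g", label);
    if (len < 0 || (size_t)len >= buflen)
        return -1;
    for (size_t i = 0; i < n; i++) {
        if (x[i] == 0.0)
            continue; /* sparse format: skip zero features */
        int w = snprintf(buf + len, buflen - (size_t)len,
                         " %zu:%g", i + 1, x[i]);
        if (w < 0 || (size_t)w >= buflen - (size_t)len)
            return -1;
        len += w;
    }
    return len;
}
```

For the input row (2, 0, 3) with label 1, the expansion yields nine values and the formatted line keeps only the five non-zero ones. Restricting the expansion to a chosen subset of columns, as item 1 suggests, would amount to passing only those columns as x.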

In the next version (1.04), we may include the following functions:
1. PCA on possibly related dimensions
2. Adding indicators for missing values
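
As an illustration of what missing-value indicators could look like, the sketch below pairs every feature with a 0/1 flag. It is only an assumption about one possible design, not the planned LibGUNDAM implementation: the function name add_missing_indicators is hypothetical, missing entries are assumed to be encoded as NaN, and the interleaved output layout is chosen purely for simplicity.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: for each input value, write two outputs:
 *   out[2*i]     = the value, or `fill` if the value is missing (NaN)
 *   out[2*i + 1] = 1.0 if the value was missing, otherwise 0.0
 * out must have room for 2*n doubles. Returns the number of missing values. */
size_t add_missing_indicators(const double *x, size_t n, double fill,
                              double *out)
{
    size_t missing = 0;
    assert(x != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        int is_missing = isnan(x[i]) != 0;
        out[2 * i] = is_missing ? fill : x[i];
        out[2 * i + 1] = is_missing ? 1.0 : 0.0;
        if (is_missing)
            missing++;
    }
    return missing;
}
```

The indicator lets a linear model distinguish a genuinely observed value equal to the fill value from one that was imputed.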

Software

Download a zip or tgz.
Or download a Windows installer.
If the installer does not work, please download a zip file containing libgundam HERE.
All versions are available on SourceForge.

User Manual

See a working draft of the user manual HERE. It is not yet complete; I will finish it as soon as possible.

Reference

1. Hung-Yi Lo, Kai-Wei Chang, Shang-Tse Chen, Tsung-Hsien Chiang, Chun-Sung Ferng, Cho-Jui Hsieh, Yi-Kuang Ko, Tsung-Ting Kuo, Hung-Che Lai, Ken-Yi Lin, Chia-Hsuan Wang, Hsiang-Fu Yu, Chih-Jen Lin, Hsuan-Tien Lin and Shou-de Lin. An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes. In G. Dror et al., eds., Proceedings of KDD-Cup 2009 competition, vol. 7 of JMLR Workshop and Conference Proceedings, 57-64, 2009.
2. Yin-Wen Chang, Cho-Jui Hsieh, Kai-Wei Chang, Michael Ringgaard, and Chih-Jen Lin. Training and Testing Low-degree Polynomial Data Mappings via Linear SVM. Journal of Machine Learning Research, 11(2010), 1471-1490.
3. Johan Suykens and Carlos Alzate. Support vector machines and kernel methods: new approaches in unsupervised learning. Tutorial at IEEE World Congress on Computational Intelligence WCCI 2010, Barcelona Spain.

Comments

I would be glad to hear about any good methods that the software does not yet include, so that I can implement and add them. I would also like to know how I can improve the software. If you have any comments, please e-mail scan33scan33 AT gmail.com.

Last modified: June 22, 2011 16:07:44 (UTC+8)