DynaML A Scala ML library

Build Status

Aim

DynaML is a scala library/repl for implementing and working with general Machine Learning models. Machine Learning/AI applications make heavy use of various entities such as graphs, vectors, matrices etc as well as classes of mathematical models which deal with broadly three kinds of tasks, prediction, classification and clustering.

The aim is to build a robust set of abstract classes and interfaces, which can be extended easily to implement advanced models for small and large scale applications. But the library can also be used as an educational/research tool for multi scale data analysis.

Currently DynaML has implementations of Least Squares Support Vector Machine (LS-SVM) for binary classification and regression. For further background consider Wikipedia or the book.

A good general introduction to Probabilistic Models for Machine Learning can be found here in David Barber’s text book. The LS-SVM is equivalent to the class of models discussed in Chapter 18 (Bayesian Linear Models) of the book.

Installation

Prerequisites: Maven

mvn clean compile
mvn package
chmod +x target/bin/DynaML
target/bin/DynaML

You should get the following prompt.

___       ___       ___       ___       ___       ___   
   /\  \     /\__\     /\__\     /\  \     /\__\     /\__\  
  /::\  \   |::L__L   /:| _|_   /::\  \   /::L_L_   /:/  /  
 /:/\:\__\  |:::\__\ /::|/\__\ /::\:\__\ /:/L:\__\ /:/__/   
 \:\/:/  /  /:;;/__/ \/|::/  / \/\::/  / \/_/:/  / \:\  \   
  \::/  /   \/__/      |:/  /    /:/  /    /:/  /   \:\__\  
   \/__/               \/__/     \/__/     \/__/     \/__/  

Welcome to DynaML v 1.2
Interactive Scala shell

DynaML>

Getting Started

The data/ directory contains a few sample data sets, and the root directory also has example scripts which can be executed in the shell.

val config = Map("file" -> "data/ripley.csv", "delim" -> ",", "head" -> "false", "task" -> "classification")
val model = LSSVMModel(config)
val rbf = new RBFKernel(1.025)
model.applyKernel(rbf)
15/08/03 19:07:42 INFO GreedyEntropySelector$: Returning final prototype set
15/08/03 19:07:42 INFO SVMKernel$: Constructing key-value representation of kernel matrix.
15/08/03 19:07:42 INFO SVMKernel$: Dimension: 13 x 13
15/08/03 19:07:42 INFO SVMKernelMatrix: Eigenvalue decomposition of the kernel matrix using JBlas.
15/08/03 19:07:42 INFO SVMKernelMatrix: Eigenvalue stats: 0.09104374173019622 =< lambda =< 3.110068839504519
15/08/03 19:07:42 INFO LSSVMModel: Applying Feature map to data set
15/08/03 19:07:42 INFO LSSVMModel: DONE: Applying Feature map to data set
DynaML>
model.setRegParam(1.5).learn
val configtest = Map("file" -> "data/ripleytest.csv", "delim" -> ",", "head" -> "false")
val met = model.evaluate(configtest)
met.print
15/08/03 19:08:40 INFO BinaryClassificationMetrics: Classification Model Performance
15/08/03 19:08:40 INFO BinaryClassificationMetrics: ============================
15/08/03 19:08:40 INFO BinaryClassificationMetrics: Accuracy: 0.6172839506172839
15/08/03 19:08:40 INFO BinaryClassificationMetrics: Area under ROC: 0.2019607843137254
met.generatePlots

Plots

import com.tinkerpop.blueprints.Graph
import com.tinkerpop.frames.FramedGraph
import io.github.mandar2812.dynaml.graphutils.CausalEdge
val (optModel, optConfig) = KernelizedModel.getOptimizedModel[FramedGraph[Graph],
Iterable[CausalEdge], model.type](model, "csa",
"RBF", 13, 7, 0.3, true)

We see a long list of logs which end in something like the snippet below, the Coupled Simulated Annealing model, gives us a set of hyper-parameters and their values.

optModel: io.github.mandar2812.dynaml.models.svm.LSSVMModel = io.github.mandar2812.dynaml.models.svm.LSSVMModel@3662a98a
optConfig: scala.collection.immutable.Map[String,Double] = Map(bandwidth -> 3.824956165264642, RegParam -> 12.303758608075587)

To inspect the performance of this kernel model on an independent test set, we can use the model.evaluate() function. But before that we must train this ‘optimized’ kernel model on the training set.

optModel.setMaxIterations(2).learn()
val met = optModel.evaluate(configtest)
met.print()
met.generatePlots()

And the evaluation results follow …

15/08/03 19:10:13 INFO BinaryClassificationMetrics: Classification Model Performance
15/08/03 19:10:13 INFO BinaryClassificationMetrics: ============================
15/08/03 19:10:13 INFO BinaryClassificationMetrics: Accuracy: 0.8765432098765432
15/08/03 19:10:13 INFO BinaryClassificationMetrics: Area under ROC: 0.9143790849673203

Documentation

You can refer to the project documentation for getting started with DynaML. Bear in mind that this is still at its infancy and there will be many more improvements/tweaks in the future.