May 18, 2015

How Machine Learning Is Eating the Software World

Alex Woodie

Marc Andreessen famously said in 2011 that software was eating the world. Four years later, that trend has accelerated, only now it appears that machine learning technology is on the cusp of eating software, and that algorithms will take over the world, with a little help from their friends: the APIs.

Not that this is a bad thing, at least not in the way Elon Musk envisions it, with AI-powered overlords enslaving the human race (a separate story for another day). But if you recognize the points that Andreessen made in his famous Wall Street Journal article, namely that outfits like Amazon, Netflix, Flickr, and Pandora, which are essentially software companies, eviscerated the "brick-and-mortar" giants that had dominated the markets for books, movies, photos, and radio up to that point, then you have probably recognized that the trend has only intensified here in 2015.

In today's big data world, the focus is all about building "smart applications." The intelligence in those apps, more often than not, doesn't come from adding programmatic responses to the code; it comes from allowing the software itself to recognize what's happening in the real world, how it differs from what happened yesterday, and to adjust its response accordingly.

Computers are learning to think, read, and write, says Bloomberg Beta investor Shivon Zilis. "They're also picking up human sensory function, with the ability to see and hear (arguably to touch, taste, and smell, though those have been of a lesser focus)," she writes on her blog. "Machine intelligence technologies cut across a vast array of problem types (from classification and clustering to natural language processing and computer vision) and methods (from support vector machines to deep belief networks). All of these technologies are reflected on this landscape." (See below for Zilis' informative 2014 graphic of players in the ML space.)

Armed with ever-increasing volumes of data and sophisticated machine learning modeling environments, we're able to discern patterns that were never detectable before. The next step, deploying those machine learning models into real-world applications, can be tricky. The question, then, becomes how to deploy machine learning technology in the quickest, most efficient, and most impactful way. Of course, that is easier said than done.

Checking the ML Box

One person who has puzzled over this challenge more than most is Sri Ambati, the co-founder and CEO of H2O.ai, which develops tools that data scientists use to build machine learning models. "Up until now, most of the focus was this offline analysis," Ambati says. "The way data analytics was done historically was you'd have a statistician or a mathematician sitting in the corner trying to see information from analysis."

The prototypical data scientist (one part math genius, one part Java developer, and one part business expert) would be your typical go-to person for deploying a predictive app with a machine learning algorithm at its heart. Of course, there's a huge shortage of data scientists, so naturally people (and venture capitalists like Andreessen and Zilis) are looking to software to solve the problem.

The Flow UI in H2O v3 aims to make building and deploying ML models easy.

H2O's Ambati hopes to make the power of machine learning accessible to standard programmers with the next iteration of the H2O product, which was launched today and is free under an open source license. With version 3, H2O is allowing Java, Python, and Scala developers to access the H2O machine learning technology directly from their integrated development environment (IDE). What's more, they can call the machine learning routines with a simple REST API.
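To give a rough sense of what that looks like from a developer's chair, here is a minimal Python sketch using H2O's Python binding, which talks to a running H2O instance over its REST API; the CSV path and the use of a local instance are illustrative assumptions, not details from H2O's announcement.

```python
# Minimal sketch: reaching H2O from an ordinary Python IDE session.
# The h2o package communicates with an H2O instance over REST under the hood.
# The CSV path below is a placeholder for illustration.
import h2o

h2o.init()  # start or attach to a local H2O instance

frame = h2o.import_file("training_data.csv")  # data is parsed cluster-side
print(frame.nrow, frame.ncol)                 # basic shape of the loaded frame
```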

"What we see as the vision for the space is that these analyses happen on the fly as things are happening, and as this is happening, you're changing your models and building new business solutions," Ambati tells Datanami. "The future of software engineering is going to be transformed with data science and machine learning. Software is eating the world and machine learning is eating software."

Ambati aims to make it easy for regular software developers to work with the various machine learning algorithms that H2O includes in its library, including gradient boosting machine, deep learning, generalized linear model, K-Means, distributed random forests, and naïve Bayes. Once you have the data, it's a relatively simple matter to turn the machine learning algorithms loose on it to find the patterns for you. Then those patterns can be encapsulated in software code and made available for embedding into smarter apps by exposing them to multiple IDE environments via APIs.
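As a hedged illustration of that workflow, the sketch below trains one of the bundled algorithms (gradient boosting) through H2O's Python API; the dataset path, the "label" column, and the parameter values are placeholder assumptions rather than anything prescribed by H2O.

```python
# Sketch: training and scoring a gradient boosting model via H2O's Python API.
# The dataset path, the "label" column name, and the parameters are illustrative.
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()

data = h2o.import_file("training_data.csv")
train, valid = data.split_frame(ratios=[0.8])            # 80/20 train/validation split

predictors = [c for c in data.columns if c != "label"]   # every column except the target

model = H2OGradientBoostingEstimator(ntrees=50, max_depth=5)
model.train(x=predictors, y="label", training_frame=train, validation_frame=valid)

# Score held-out rows; the trained model can also be exported as plain Java code
# (a POJO) so the learned patterns can be embedded directly into an application.
predictions = model.predict(valid)
predictions.head()
```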

"It lowers the bar of building machine learning-based code," Ambati says. "Behind the scenes, H2O is now focused on making sure we build all sorts of good models on your data, and coming up with a single end point from a REST call or a JavaScript call that makes the best result, or a cumulative or ensemble result of the model [available], so modeling becomes more automated... To really boil it down, a regular developer is able to take advantage of machine learning out of the box."

ML for the Masses

H2O, of course, isn't the only big data software company chasing this goal. The big cloud players, Amazon, Microsoft Azure, and Google Cloud, have all launched cloud-based machine learning services that allow developers to call machine learning tasks through an API. But if you want to keep your apps on premises, or need more sophisticated capabilities than the cloud players provide, you'll have to look elsewhere.

Cask is an open source software outfit that's seeking to build higher-order application templates for Apache Hadoop that take the sting out of tedious data science and app development work. As Cask CEO Jonathan Gray recently explained, the company is seeking to bundle sophisticated machine learning capabilities behind a simple API.

"We can hide the machine learning modeling task as a template, so a developer...can be writing their code against a domain-specific API, against a higher-level API," Gray told Datanami in a recent interview. "Our mission in life is to get a developer as far down the path of solving the problem as they can, so when they get to our platform, instead of being all the way at the other end zone, they're on the five-yard line coming in, and you really just need to tweak this and that, write a little code here, some custom logic there, and you're done."

Hadoop may be the standard bearer for the types of large-scale data analytics work that took place over the last decade. But increasingly, organizations are looking to Apache Spark to give them an edge. Databricks, the company behind the open source framework, is looking to put the power of big data analytics and machine learning modeling into the hands of as many developers as it can.

"A very important part of Spark is the productivity the user gains," Databricks co-founder Reynold Xin told Datanami in a recent interview. Whereas Java was the go-to language for first-gen Hadoop and MapReduce jobs, Spark opens that world up to Scala and Python. "We just want them to be using what they're comfortable with."
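For a flavor of that productivity gap, here is the canonical word-count job written against Spark's Python API; the HDFS input path is a placeholder, and the point of comparison is that an equivalent hand-written Java MapReduce job typically runs to several dozen lines of mapper, reducer, and driver boilerplate.

```python
# Sketch: word count in PySpark. The HDFS input path is a placeholder.
# A few chained transformations stand in for the mapper, reducer, and driver
# classes that a first-generation Java MapReduce job would require.
from pyspark import SparkContext

sc = SparkContext(appName="WordCount")

counts = (sc.textFile("hdfs:///data/input.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

for word, count in counts.take(10):   # peek at a handful of results
    print(word, count)

sc.stop()
```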

Machine learning is all around us, and its use will only accelerate as time goes on. Luckily, as the software gets better, as the distributed systems get faster, and as more data sets become available, you won't need a data scientist to monetize it, just the ability to code an application and call an API. The possibilities for application developers to take advantage of machine learning are tantalizing. Only time will tell what they make of it.

Related Items:

What Police Can Learn from Deep Learning

Deep Dive Into Databricks' Big Speedup Plans for Apache Spark

Five Reasons Machine Learning Is Moving to the Cloud

 

Shivon Zilis' 2014 machine intelligence landscape graphic

 
