AnalyticBridge

A Data Science Central Community

12 Statistical and Machine Learning Methods that Every Data Scientist Should Know

Below is my personal list of statistical and machine learning methods that every data scientist should know in 2016.

Statistical Hypothesis Testing (t-test, chi-squared test & ANOVA)
Multiple Regression (Linear Models)
Generalized Linear Models (GLM: Logistic Regression, Poisson Regression)
Random Forest
XGBoost (eXtreme Gradient Boosting, i.e. gradient boosted trees)
Deep Learning
Bayesian Modeling with MCMC
word2vec
K-means Clustering
Graph Theory & Network Analysis

(A1) Latent Dirichlet Allocation & Topic Modeling
(A2) Factorization (SVD, NMF)

Based on my four years of experience in the data science industry, I think these 12 methods are currently the most popular and useful, and that they suit a wide range of problems that call for data science.

As far as I know, quite a few lists of "representative methods in data science" already exist. However, I feel some of them are out of date because they appear to neglect the latest advances of data science in industry. So I made this list from the viewpoint of a business practitioner who knows the practical problems in industry and how data science, including statistics and machine learning, solves them.

In addition to the list itself, I provide R or Python scripts that run an experiment on a sample dataset for each method, so that readers can try each one easily.
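To give a rough idea of what such a small experiment can look like, here is a minimal sketch for one item on the list, K-means clustering. It is not one of the scripts from the original post: the toy data, the hand-picked starting centroids and the function name `kmeans` are all assumptions made for illustration, and it uses only the Python standard library rather than a dedicated package.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm for K-means on a list of coordinate tuples."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Toy data invented for this sketch: two visibly separated groups in 2-D.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
# Starting centroids are chosen by hand so that the run is deterministic.
centroids, labels = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the starting centroids are usually chosen at random and the algorithm is run several times, which is what library implementations typically do; fixing them here only keeps the output reproducible.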

The original post is here, including R or Python scripts and experiments on sample datasets.


Comment


Comment by Priya on March 8, 2017 at 4:23am: Please suggest a link for budding data scientists
