12 Statistical and Machine Learning Methods that Every Data Scientist Should Know

Below is my personal list of statistical and machine learning methods that every data scientist should know in 2016.

  1. Statistical Hypothesis Testing (t-test, chi-squared test & ANOVA)
  2. Multiple Regression (Linear Models)
  3. Generalized Linear Models (GLM: Logistic Regression, Poisson Regression)
  4. Random Forest
  5. XGBoost (eXtreme Gradient Boosting)
  6. Deep Learning
  7. Bayesian Modeling with MCMC
  8. word2vec
  9. K-means Clustering
  10. Graph Theory & Network Analysis
  • (A1) Latent Dirichlet Allocation & Topic Modeling
  • (A2) Factorization (SVD, NMF)

Based on my four years of experience in the data science industry, I think these 12 methods are currently the most popular, useful, and broadly applicable to problems that call for data science.

As far as I know, quite a few lists of "representative methods in data science" have already been published. However, I feel some of them are out of date because they appear to neglect the latest advances of data science in industry. So I made this list from the viewpoint of a business practitioner who knows the practical problems, and the solutions that data science (including statistics and machine learning) offers for them in industry.

In addition to the list itself, I provide R or Python scripts that run an experiment on a sample dataset for each method, so that readers can try each one easily.

The original post is here, including R or Python scripts and experiments on sample datasets.
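
For readers who want a quick feel for one item before opening the original scripts, here is a minimal sketch of K-means clustering (item 9) in plain Python. It is not the author's script: the function name `kmeans`, the `seed` parameter, and the toy points are assumptions made for illustration, and a real analysis would more likely use an established library implementation.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Cluster equal-length numeric tuples with Lloyd's algorithm (illustrative sketch)."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance; ties go to the lower index).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On data this cleanly separated, the first three points should share one label and the last three the other; which group receives label 0 depends on the random start.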

Comment

Comment by Priya on March 8, 2017 at 4:23am
Please suggest a link for budding data scientists
