All 22 comments

RocketSurgeon85 (Scientific Computing) 32 points (4 children)

While my official title is not "Data Scientist" (I'm a postdoc at a US DOE national lab), about 75% of my day-to-day work involves what I would consider data science, using numpy, scipy, scikit-image, some pandas, matplotlib, etc.

I would suggest finding something you are interested in and doing some "data science" on it. My personal opinion (which is worth what you have paid for it) is that it is best to learn by doing, rather than just reading. The reading and courses will help, but that is only a tiny fraction of it. Things you may be able to do:

  • Analyze stock tick data
  • Find out some information about sports players and their statistics
  • Look at currency market data (there is a lot of historical data for bitcoin readily available for various exchanges)
  • Analyze ebook data (for common words, sentence length, ...)
  • Analyze Twitter feeds/trends (similar stuff to ebooks, and you can throw in some info about geospatial location)
  • Look at price data for one or more products as a function of time on something like Amazon or Newegg (you can learn some simple URL scraping with this too)
  • Learn something about your local region with weather data.
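
As a rough illustration of the ebook idea in the list above (common words, sentence length), here is a minimal sketch using only the Python standard library. The function name `text_stats` and the sample text are made up for the example, and the sentence splitting is deliberately naive (it just splits on `.`, `!` and `?`), so treat it as a starting point rather than a proper text pipeline.

```python
import re
from collections import Counter

def text_stats(text, top_n=3):
    """Return the top_n most common words and the mean sentence length in words."""
    # Naive sentence split on ., ! and ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercase the text and keep runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    common = Counter(words).most_common(top_n)
    mean_len = len(words) / len(sentences) if sentences else 0.0
    return common, mean_len

# Hypothetical sample; in practice you would read the text of an ebook file.
sample = "The cat sat. The cat ran! Did the dog run?"
common, mean_len = text_stats(sample)
print(common)
print(mean_len)
```

Swapping the sample string for the contents of a plain-text ebook gives you something real to plot or compare across authors.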

I'm sure there are more options that others can think of too.

Good Luck!

rhiever 1 point (1 child)

This is exactly how I started. Then I started posting my results publicly. Sometimes people would give feedback, I'd go learn about what they suggested, fix the problems, and post it again. Then the cycle began anew.

Check out the problems on Kaggle if you can't think of anything at first. You don't even have to submit an entry -- just find something cool, play with it, and learn.


That said, just aimlessly playing around isn't going to make you an expert data scientist. Find some highly rated books on machine learning, data mining, statistics, etc. and work through them. Make up a cool project that applies what you learn in every chapter. You're not going to really learn something until you apply it.

tHEbigtHEb 1 point (0 children)

Any books that you can recommend?

Trylks 1 point (0 children)

I think the hard part of that approach is choosing the tools. Depending on that decision you may find one day that you have to start nearly from scratch again to continue with something new. It's nice to have a starting point, but it's important to have a roadmap. IMHO.

shirtandtieler 1 point (0 children)

Sorry for hijacking your post, but I just wanted to add to your list.

I've found games to also be a great source for gathering data and doing data science. Plus it can be really helpful to both other users and devs.

IOvOI_owl 9 points (0 children)

http://datasciencemasters.org/ seems to be a good collection of resources and books.

jstutters 7 points (0 children)

You don't say much about what your background is. Do you already know some Python, stats and a bit about scientific method? If not, you might be better off doing more focussed courses on those first. If you've got the background then I would complete (including the exercises and projects) any of the courses you listed above and then grab something easy off Kaggle or come up with your own manageably small project and see it through to completion.

TL;DR: learn something, figure out what you still don't know, iterate.

fnord123 5 points (0 children)

For many people, the best way to learn is to do. So find a project and get to work. Enter a competition on Kaggle. Or scrape reviews from a website and write a sentiment analyser. Pull down financial data off Yahoo and use it to determine a trading portfolio where you only change positions each week (or month). Write a spam filter.
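
To give a feel for how small a first "spam filter" project can be, here is a minimal sketch of a naive Bayes text classifier with add-one (Laplace) smoothing, written with only the standard library. The class name `NaiveBayes`, the `tokenize` helper and the four training messages are all invented for illustration; a real project would train on an actual labelled corpus and would probably use a library implementation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its words."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of word occurrences
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every word seen during training

    def fit(self, docs):
        for text, label in docs:
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus summed log likelihoods, with add-one smoothing
            # so unseen words do not zero out the whole score.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data, made up for this example.
training = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch tomorrow at noon", "ham"),
]
clf = NaiveBayes()
clf.fit(training)
print(clf.predict("claim your free money"))
print(clf.predict("notes from the meeting"))
```

The same structure works for a simple sentiment analyser if the labels are "positive" and "negative" instead of "spam" and "ham".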

rishsriv 3 points (0 children)

Harvard's CS109 is one of the most comprehensive data science courses out there, IMO. It's rigorous, but you'll learn a lot from it if you follow it till the very end.

StringyLow 3 points (0 children)

I found this a couple weeks ago:

How to become a data scientist

shaggorama 2 points (0 children)

I mean, the absolute best course of action would be to get a master's in CS, math or statistics with a focus on applied methods and machine learning.

The Coursera machine learning course is really good and is what got me started down the path. I took the inaugural course a few years ago, studied a bunch on my own, went to grad school, and now I'm working as a data scientist. It can be done.

Notre1 2 points (0 children)

For some more ideas, check out the curriculum linked below and some of the comments on it from Hacker News. I don't have anything to do with it, but I just remembered it looked really well put together when I saw it a few months ago.

toodim 1 point (0 children)

If you are into MOOCs, Coursera's Machine Learning with Andrew Ng and edX's Machine Learning course from Caltech (just ended...) go into greater depth. Udacity released an intro machine learning course last month that uses scikit-learn; it doesn't go into much mathematical depth, but it covers a lot of different topics and uses Python.

Prinkster 1 point (0 children)

Depends on the kind of thing you want to do! If you're more interested in the statistics aspect, one way to start would be to get a copy of an old edition of a textbook (can usually be bought for under $20 used on Amazon, for example) for something like Econometrics (the Wooldridge book is highly recommended) and work through all of the computer examples. This would be a nice way to start if you're interested in modeling and statistics.

pybokeh 1 point (0 children)

Without knowing your background, I'd start with the basics or foundations:
Learn statistics, linear/matrix mathematics, and all the ancillary skills centered around the data analysis life cycle:
- obtaining data
- cleaning or transforming data
- analyzing data
- visualizing data
- presenting data to draw conclusions or drive business decisions

The above was just a tool- and skill-agnostic point of view.

Now for the tools and skills that are used to perform the above:
Obtaining data: often requires SQL and database knowledge
Cleaning or transforming data: Python, R, Excel, etc.
Analyzing data: Python, R, Excel, etc.
Visualizing data: Python, R, Excel, etc.

For practical uses, others have already provided good suggestions:
- create a simple database using sqlite, then progress to MySQL or Postgres
- web scrape data (maybe use scraped data to populate your database)
- if you're already into Raspberry Pi or Arduino, use data collected from sensors and analyze and chart that too
- check out /r/datasets for data set ideas or /r/pystats
- check out Kaggle competitions
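
For the "simple database using sqlite" suggestion in the list above, something like the following is enough to get going. It is only a sketch using Python's built-in `sqlite3` module; the table name `readings`, its columns, and the sensor values are invented for the example (imagine they came from a sensor or a scraper).

```python
import sqlite3

# An in-memory database keeps the example self-contained;
# pass a filename instead to keep the data on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, temp_c REAL)")

# Hypothetical rows standing in for collected data.
rows = [("kitchen", 21.5), ("kitchen", 22.5), ("garage", 12.0)]
conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
conn.commit()

# Obtain data with SQL: average temperature per sensor.
averages = conn.execute(
    "SELECT sensor, AVG(temp_c) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(averages)
conn.close()
```

Once the queries feel comfortable, the same SQL should carry over to MySQL or Postgres with a different driver and minor syntax changes.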

Check out blogs or github accounts from prominent people or organizations in the data science fields:
- Rob Story's Python visualization stack
- www.gregreda.com
- Simple to follow exploratory data analysis example
- yhat

Hope this helps!

iNeverHaveNames 1 point (0 children)

You may have already found this, and it's not really a structured course, but YouTuber SentDex does a ton of hands-on videos involving several types of big data analysis.

shashwat986 -1 points (3 children)

Learn Machine Learning, not Data Science. Data Science is basically an application of ML.

https://www.coursera.org/course/ml

https://www.edx.org/course/learning-data-caltechx-cs1156x#.VIqibTGUcms

laMarm0tte 22 points (0 children)

I'm not really a data scientist, but I believe Data Science is more than machine learning: it encompasses more techniques from statistics (descriptive statistics, testing, ...), data visualization, big database problems, and so on.

nameBrandon 13 points (0 children)

IMO, you have that backwards. ML is really just one application of Data Science.

How are you going to understand when to use a linear vs non-linear kernel in an SVM, or even troubleshoot error messages, without a basic understanding of Linear Algebra?

What if you've got results from multiple ML algorithms and want to compare them for correlation or statistical significance against other data? You should have some statistical training and familiarity with a stats package (R, SAS, SPSS, etc.).

What about visualizations and turning those results into a simple infographic for C-level execs? Experience with Tableau or other packages would be very helpful.

If you're asked to extract your own dataset based on existing data from a DB or warehouse, you need some basic knowledge of SQL and relational databases.

Or if you're given a really messy dataset and need to clean it up, you want to have some knowledge of awk/sed/python, etc.

To me, all those skills are core skills for data science, and you wouldn't be very successful in a lot of ML tasks without them.
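
To make the "really messy dataset" point concrete, here is a small sketch of the kind of cleanup meant, using only Python's standard library. The CSV content, the column names and the `clean_rows` helper are all made up for the example; it trims stray whitespace, normalizes name capitalization, turns unusable ages into missing values, and drops exact duplicates.

```python
import csv
import io

# Hypothetical messy input: stray spaces, a blank age, "n/a", a duplicate row.
raw = """name, age ,city
 Alice ,30,Boston
Bob,,  Chicago
 Alice ,30,Boston
carol,n/a,Denver
"""

def clean_rows(text):
    reader = csv.reader(io.StringIO(text))
    header = [h.strip().lower() for h in next(reader)]
    seen, cleaned = set(), []
    for row in reader:
        values = [v.strip() for v in row]
        record = dict(zip(header, values))
        record["name"] = record["name"].title()
        # Treat blank or non-numeric ages as missing (None).
        record["age"] = int(record["age"]) if record["age"].isdigit() else None
        key = tuple(record.values())
        if key not in seen:  # skip exact duplicates
            seen.add(key)
            cleaned.append(record)
    return cleaned

cleaned = clean_rows(raw)
for record in cleaned:
    print(record)
```

The same steps could be done with awk/sed or pandas; the point is knowing what "clean" should mean for your data before any modeling starts.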

elelias 2 points (0 children)

You have that backwards.