I left my postdoc job in May 2016 and started preparing for data scientist jobs. Since the best way to learn is to teach, I started organizing learning materials. Unfortunately (fortunately?), I landed a non-data-scientist job before all the materials were finished. In this post I will share them anyway.
If you are interested, you can also read about my transition to an industry job.
“If you fail to plan, you are planning to fail.” (Benjamin Franklin, 1706-1790)
According to O’Reilly’s 2015 survey, data scientists in the US earn a median base salary of $104k (add $13k if you have a PhD). Of course, there are other perks such as stock and bonuses.
It is also a relatively easy transition for STEM PhDs. Many of my former classmates and colleagues (math, physics, CS, and biology PhDs) work as data scientists in industry. Two of them even wrote a book on data science interviews.
The responsibilities of a data scientist vary significantly from company to company. According to my friends in industry, there are two prototypes of data scientists:
- Business analyst: report-oriented
- Machine learning software development engineer: product-oriented
  - machine learning
  - software development
A mix of the two prototypes is also possible. The book above provides an overview of these positions and guidance on interpreting job advertisements.
The guiding principle
For a job hunter, the guiding principle is to build data science credentials. Specifically, you can:
- compete on Kaggle and DrivenData and earn top rankings
- write a blog to showcase your data science knowledge and projects
- build web and mobile apps to showcase your software development skills
- maintain a GitHub account to show that you know git
- get a data science degree
You can also take online courses and earn certificates, but they may carry less weight as credentials.
The ideal reader
When planning the learning materials, I pictured the reader as someone with a quantitative background, for example a STEM PhD. Specifically, you should be familiar with the following concepts:
- linear algebra
  - e.g., matrix inverse, eigenvalues and eigenvectors, SVD
- multivariable calculus
  - e.g., Jacobian matrix, inverse function theorem
- probability and statistics
  - e.g., Bayes' rule, chain rule, hypothesis testing
- programming
  - e.g., C, C++, Python, Matlab
The learning materials
- hypothesis testing
- bias-variance trade-off
- machine learning
- portfolio projects
  - The best way to learn data science is to do projects. Take a look at public data sets and think of projects that test out your newly learned machine learning algorithms.
  - The best way to learn pandas is also through projects. Wes McKinney’s pandas book is nicely written but can feel dry if you only read it. There is also a 2012 tutorial by Wes McKinney.
Data science programs
There are many short-term programs (around 7 weeks) that help people break into data scientist jobs, e.g., Insight and The Data Incubator. I interviewed with Insight but didn’t get in. Here are the demos that didn’t get me into the Insight Data Science program.
According to the Insight program managers, the program gives trainees an overview of the whole data science industry. I think attending would be helpful.
I discovered Dataquest in late July and wish I had found it earlier. It is an online learning platform that prepares people for data analyst, data scientist, and data engineer jobs. The premium subscription is $49/month. A nice perk is a weekly 10-minute video chat with the founder, Vik Paruchuri, as office hours, where you can ask him anything.
Git is a must-learn
Some afterthoughts
- Do let people know that you are looking for a job
- Do participate in Kaggle competitions ASAP
- Don’t waste time on deep learning (even though it’s fun)
Finally, there is the question of how much preparation is needed. I thought about this question often while preparing and finally figured it out: when you stop regarding an offer as luck or a favor, you are ready. (I haven’t reached that point, though.)