A partial guide for data scientist jobs

The motive

I left my postdoc job in May 2016 and started preparing for data scientist jobs. Since the best way to learn is to teach, I started to organize learning materials. Unfortunately (fortunately?), I landed a non-data-scientist job before all the materials were finished. In this post I will share them anyway.

You can also read about my transition to an industry job if you are interested.

If you fail to plan, you are planning to fail. - Benjamin Franklin (1706-1790)

The job

According to O’Reilly’s 2015 survey, data scientist jobs pay a median base salary of $104k in the US (add $13k if you have a PhD). Of course, there are other perks such as stock and bonuses.

It is also a relatively easy transition for STEM PhDs. Many of my former classmates and colleagues (math, physics, CS, biology PhDs) work as data scientists in industry. Two of them even wrote a book on data science interviews.

The responsibilities of a data scientist vary significantly from company to company. According to my friends in industry, there are two prototypes of data scientists:

Business analyst: report-oriented
- statistics
- SQL
Machine learning software development engineer: product-oriented
- machine learning
- software development

A mixture of the two prototypes is also likely. The book above provides an overview of the positions and guidance on interpreting job advertisements.

The guiding principle

For the job hunter, the guiding principle is to add data science credentials. Specifically, you can:

compete on Kaggle and DrivenData and aim for a top ranking
write a blog to showcase your data science knowledge and projects
make web apps and mobile apps to showcase your software development skills
maintain a GitHub account to show that you know git
get a data science degree

You can also take online courses and earn certificates, but these may carry less weight as credentials.

The ideal reader

When planning out the learning materials, I pictured the reader as someone with a quantitative background, for example a STEM PhD. Specifically, you should be familiar with the following concepts.

linear algebra
- e.g., matrix inverse, eigenvalues and eigenvectors, SVD
multivariable calculus
- e.g., Jacobian matrix, inverse function theorem
probability and statistics
- e.g., Bayes' rule, chain rule, hypothesis testing
programming
- e.g., C, C++, Python, MATLAB
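
As a quick self-check on the probability item, here is a minimal sketch of Bayes' rule in plain Python. The function name `posterior` and all the numbers (a 1% base rate, a 95% true-positive rate, a 5% false-positive rate) are made up for illustration. If the result surprises you, that is a good sign to review the topic.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' rule.

    prior:               P(condition)
    sensitivity:         P(positive | condition)
    false_positive_rate: P(positive | no condition)
    """
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1.0 - prior)
    # The denominator is P(positive), by the law of total probability.
    return true_pos / (true_pos + false_pos)


# Hypothetical numbers, chosen only to illustrate the calculation.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")  # prints 0.161
```

Even with a fairly accurate test, the posterior is only about 16% because the condition is rare to begin with.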

The learning materials

overview

coding
- Python
  - basics
  - packages: pandas, sklearn, numpy, scipy
    - Python Machine Learning by Sebastian Raschka
    - Python for Data Analysis by Wes McKinney
- SQL
  - Stanford database course
- git and GitHub
- interview problems
  - LeetCode and LeetCode solutions
  - Codewars
statistics
- hypothesis test
- bias-variance trade-off
machine learning
portfolio projects
- Kaggle and DrivenData

notes

The best way to learn data science is to do projects. You can take a look at the public data sets and think of some projects to test out your newly learned machine learning algorithms.
The best way to learn pandas is also through projects. Wes McKinney’s pandas book is nicely written but will feel dry if you only read it. Here is a 2012 tutorial by Wes McKinney as well.

data science programs

There are many short-term programs (say 7 weeks) that help people break into data scientist jobs, e.g., Insight and The Data Incubator. I interviewed with Insight but didn’t get in. Here are the demos that didn’t get me into the Insight data science program.
- youtube project
- NYC restaurant project
According to the Insight program managers, the program gives trainees an overview of the whole data science industry. I think attending would be helpful.
Dataquest

I discovered this site in late July and wish I had found it earlier. It is an online learning platform for data analyst, data scientist, and data engineer jobs. The premium subscription is $49/month. The nice thing is that you can video chat with the founder, Vik Paruchuri, for 10 minutes every week during office hours, and you can ask him anything.
git is a must-learn

Some afterthoughts

Do let people know that you are looking for a job
Do participate in Kaggle competitions ASAP
Don’t waste time on deep learning (even though it’s fun)

Finally, there is the question of how much preparation is needed. I thought about this question often while preparing and finally figured it out: when you stop thinking of an offer as getting lucky or receiving a favor, you are ready. (I haven’t reached that point yet, though.)