
The World’s #1 Data Scientist Talks about Data Science Skills and Tools
[youtube https://www.youtube.com/watch?v=dpzxW6buh9Y]
Owen Zhang is ranked #1 on Kaggle, the online stadium for data science competitions. An engineer by training, Zhang says that data science is finding “practical solutions to not very well-defined problems,” similar to engineering. He believes that good data scientists, “otherwise known as unicorn data scientists,” have three types of expertise. Since data science deals with practical problems, the first one is being familiar with a specific domain and knowing how to solve a problem in that domain. The second is the ability to distinguish signal from noise, or understanding statistics. The third skill is software engineering.
[youtube https://www.youtube.com/watch?v=7YnVZrabTA8]
Zhang, Chief Product Officer at DataRobot, shares in this talk his experience with open source tools in data science competitions. Slides here.
9 Categories of Data Scientists

- Those strong in statistics: they sometimes develop new statistical theories for big data that even traditional statisticians are not aware of. They are experts in statistical modeling, experimental design, sampling, clustering, data reduction, confidence intervals, testing, predictive modeling and other related techniques.
- Those strong in mathematics: NSA (National Security Agency) or defense/military people working on big data, astronomers, and operations research people doing analytic business optimization (inventory management and forecasting, pricing optimization, supply chain, quality control, yield optimization) as they collect, analyse and extract value out of data.
- Those strong in data engineering, Hadoop, database/memory/file systems optimization and architecture, APIs, Analytics as a Service, optimization of data flows, data plumbing.
- Those strong in machine learning / computer science (algorithms, computational complexity)
- Those strong in business, ROI optimization, decision sciences, involved in some of the tasks traditionally performed by business analysts in bigger companies (dashboard design, metric mix selection and metric definitions, high-level database design)
- Those strong in production code development, software engineering (they know a few programming languages)
- Those strong in visualization
- Those strong in GIS, spatial data, data modeled by graphs, graph databases
- Those strong in a few of the above. After 20 years of experience across many industries, big and small companies (and lots of training), I’m strong in stats, machine learning, business, and mathematics, and more than just familiar with visualization and data engineering. This could happen to you as well over time, as you build experience. I mention this because so many people still think that it is not possible to develop a strong knowledge base across multiple domains that are traditionally perceived as separate (the silo mentality). Indeed, that’s the very reason why data science was created.
Advancing Your AI Career

“AI Career Pathways” is designed to guide aspiring AI engineers in finding jobs and building a career. The table above shows Workera’s key findings about AI roles and the tasks they perform. You’ll find more insights like this in the free PDF.
From the report:
People in charge of data engineering need strong coding and software engineering skills, ideally combined with machine learning skills to help them make good design decisions related to data. Most of the time, data engineering is done using database query languages such as SQL and object-oriented programming languages such as Python, C++, and Java. Big data tools such as Hadoop and Hive are also commonly used.
Modeling is usually programmed in Python, R, Matlab, C++, Java, or another language. It requires strong foundations in mathematics, data science, and machine learning. Deep learning skills are required by some organizations, especially those focusing on computer vision, natural language processing, or speech recognition.
People working in deployment need to write production code, possess strong back-end engineering skills (in Python, Java, C++, and the like), and understand cloud technologies (for example AWS, GCP, and Azure).
Team members working on business analysis need an understanding of mathematics and data science for analytics, as well as strong communication skills and business acumen. They sometimes use programming languages such as R, Python, and Tableau, although many tasks can be carried out in a spreadsheet, PowerPoint or Keynote, or A/B testing software.
Working on AI infrastructure requires broad software engineering skills to write production code and understand cloud technologies.
Salaries of Data Scientists
Top Skills and Backgrounds of Data Scientists on LinkedIn
A new study of LinkedIn profiles by RJMetrics has found that the number of data scientists has doubled over the last 4 years. This reflects the increasing demand for sophisticated data analysis skills, combining computer programming with statistics, and the growth in the popularity of the term “data science” both in job openings and in the words people use to describe their work on LinkedIn. At least 52% of all current 11,400 data scientists on LinkedIn have added that title to their profiles within the past 4 years.

In the chart above, the cumulative number of data scientists in any given year corresponds to the number of present-day data scientists who started their first job in or before that year. We can safely assume that those who started their first jobs between 1995 and 2009 were not then called “data scientists,” but the data shows the cumulative growth in the number of professionals who have this title today.
Here are the other highlights of the study:
The high-tech industry (LinkedIn classification: Information Technology and Services industry, Internet and Computer Software industries) employs 44.9% of the professionals identified on LinkedIn as data scientists, followed by education (8.3%, probably employed mostly by universities), Banking and Financial Services (7.2%), and Marketing and Advertising (5.2%).
The top ten companies employing data scientists are Microsoft, Facebook, IBM, GlaxoSmithKline, Booz Allen Hamilton, Nielsen, GE, Apple, LinkedIn, and Teradata. Note that Google is not in the top ten, possibly because the data science Googlers on LinkedIn adhere to the title Google bestows on them: quantitative analyst.

Both Microsoft and Facebook, according to RJMetrics’ analysis, appear to be on a hiring spree, accelerating their data scientist recruiting during the 2014 calendar year by at least 151% and 39%, respectively, when compared to 2013. But given the scarcity of experienced data scientists, it’s a revolving door, with Microsoft also losing the largest number of data scientists over that period.
So how do you become one of these unicorn data scientists, commanding annual salaries of $200,000 plus? The study provides fresh data on the skills and background of data scientists.
RJMetrics analyzed 254,600 skill records of the data scientists on LinkedIn and ranked each skill by the number of people listing it on their profile. In addition to the catch-all categories of “data analysis,” “data mining,” and “analytics,” the top skills are R, Python, machine learning, statistics, SQL, MATLAB, Java, statistical modeling, and C++. As a specific skill, Hadoop (20.9%) is at the bottom of the top 20, behind SAS (22.78%).
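The ranking described above (count how many people list each skill, then sort) can be sketched in a few lines of Python. This is only an illustration of the general approach, not RJMetrics' actual method or data: the `profiles` list and the `rank_skills` helper are hypothetical names invented for this example.

```python
from collections import Counter

# Hypothetical sample: each entry is the set of skills one person lists.
# These are made-up profiles, not data from the RJMetrics study.
profiles = [
    {"R", "Python", "SQL"},
    {"Python", "Machine Learning"},
    {"R", "Python", "Hadoop"},
    {"SQL", "Python"},
]


def rank_skills(profiles):
    """Return (skill, share_of_profiles) pairs, most-listed skill first."""
    # Count each skill once per profile that lists it.
    counts = Counter(skill for skills in profiles for skill in skills)
    total = len(profiles)
    # Sort by count (descending), then by name so ties are ordered predictably.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(skill, count / total) for skill, count in ranked]


ranking = rank_skills(profiles)
for skill, share in ranking:
    print(f"{skill}: {share:.0%}")
```

Storing each profile as a set means a skill listed twice by the same person is counted only once, which matches ranking by "the number of people listing it."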

An analysis of skills by job level revealed that chief data scientists appear to be less technical on average: Only 27% and 26% listed Python and R, respectively, compared to 52% and 53% of junior data scientists, along with 38% and 43% of senior practitioners. Those in higher-level jobs may not need to emphasize their technical skills or may not need them in positions where management experience and knowledge of a business domain are valued more than technical proficiency.
Over 79% of data scientists listing their education have earned a graduate degree, with 38% of all data scientists who had an education record earning a PhD, and close to 42% listing a Master’s degree as the highest degree attained.
Computer Science is the dominant field of study among data scientists, followed by business administration/management, statistics, mathematics, and physics. Only 4.6% of data scientists list “machine learning/data science” as their graduate degree, a number that will probably increase in coming years due to the proliferation of new Master in Data Science programs, supplanting the older Master in Analytics programs.

Note that RJMetrics included in their sample only data scientists associated with specific companies, assuming that those listing “data scientist” in their profile without an association with an actual company may only have aspirations about a career in data science, but not actual experience. They analyzed 60,200 records of professional experiences, 27,700 records of education, and 254,600 records of skills, and information about 6,200 unique companies that employed self-identified data scientists as of June 1, 2015.
For other recent studies of the skills and salaries of data scientists see here and here.
2 New Surveys About the Market for Data Scientists
Two new surveys tell us a lot about both the supply and demand sides of the hot market for data scientists, “the sexiest job of the 21st Century.”
On the demand side—the challenges of recruiting, training, and integrating data scientists—we have the MIT Sloan Management Review and SAS fifth annual survey of 2,719 business executives, managers and analytics professionals worldwide. On the supply side—the talent available and what salaries it commands—we have the second annual Burtch Works Study, surveying 371 data scientists in the U.S. (see also the video presentation at the end of this post).
The median salary of a junior-level data scientist is $91,000, but those managing a team of ten or more data scientists earn base salaries of well over $250,000, according to Burtch Works. Supply is still tight, and over the last year top managers enjoyed an eight percent increase in base salary and median bonuses of over $56,000. When changing jobs, data scientists see a 16 percent increase in their median base salary.
Who are these data scientists that are so much in demand? The vast majority have at least a master’s degree and probably a Ph.D., and one in three are foreign-born. But with a younger generation of data scientists, freshly minted from more than 100 graduate programs worldwide, the median years of experience dropped from 9 in 2014 to 6 in 2015.
As data science is increasingly adopted by companies in all industries, the proportion of data scientists employed by startups, the firms that have dominated the application of big data analytics, declined from 29 percent in 2014 to 14 percent in 2015.
It is the mainstreaming of data science and the specific challenges of acquiring and benefiting from this still-scarce talent pool that is the focus of the MIT Sloan Management Review survey. Four in ten (43%) companies report their lack of appropriate analytical skills as a key challenge but only one in five organizations has changed its approach to attracting and retaining analytics talent.
As a result of the scarcity of data scientists, 63 percent of the companies surveyed are providing formal or on-the-job training in-house. “One big plus of developing analytics skills among current employees,” says the report, “is that they already know the business.” These companies are also doing more to train existing managers to become more analytical (49%) and train their new data scientists to better understand their business (34%). Still, half of the survey respondents cited turning analytical insights into business actions as one of their top analytics challenges.
To better manage these challenges, the study recommends giving preference to people with analytical skills when hiring and promoting, developing analytical skills through formal in-house training, and integrating new talent with more traditional data workers.
“Infusing new analytics talent without proper support and guidance can alienate traditional data workers and undermine everyone’s contributions,” says the report. Yet only 27% of companies report that they successfully integrate new analytics talent with more traditional data workers. So even after managing to find (and pay for) the data science talent, there is no guarantee for the desired results, either because of the lack of understanding of the business by the new recruits, resistance from current employees engaged in data preparation and analysis, or failure to translate new insights into meaningful action.
Many companies have responded to these challenges by creating new roles and responsibilities and devising new organizational structures. The report points out that the range of analytics skills, roles and titles within organizations has broadened in recent years. What’s more, new executive roles, such as chief data officers, chief analytics officers and chief medical information officers, have emerged to ensure that analytical insights can be applied to strategic business issues.
Whether the work is centralized or decentralized, data science and analytics should be perceived and managed by companies as a professional function with its own clear career path and well-defined roles. Tom Davenport asked in a recent essay: “When was the last time you saw a job posting for a ‘light quant’ or an ‘analytical translator’? But almost every organization would be more successful with analytics and big data if it employed some of these folks.”
Davenport defines a “light quant” as someone who knows something about analytical and data management methods, and a lot about specific business problems, and can connect the two. An “analytical translator” is someone who is extremely skilled at communicating the results of quantitative analyses.
Data science is a team sport that requires the right blending of people with different skills, expertise, and experiences. Data science itself is an emerging discipline, drawing people with diverse educational backgrounds and work experiences. Typical of the requirements for a graduate degree is what we find in a recent announcement from the University of Wisconsin’s first system-wide online master’s degree in data science: “The Master of Science in Data Science program is intended for students with a bachelor’s degree in math, statistics, analytics, computer science, or marketing; or three to five years of professional experience as a business intelligence analyst, data analyst, financial analyst, information technology analyst, database administrator, computer programmer, statistician, or other related position.”
As with any team sport, there are stars that are paid more than the average player. According to Glassdoor (HT: Illinois Institute of Technology Master of Data Science program), the average salary for data scientists is a bit more than what Burtch Works reported, at over $118,000 per year. (By the way, Glassdoor reports that the average salary is $75,000 for a statistician and $92,000 for a senior statistician.)
It’s possible that the Glassdoor numbers include more of what Burtch Works calls “elite data scientists.” Do we know who is in the elite of top data science players? The closest we get to identifying the MVP of data science is the Kaggle ranking of the data scientists participating in its competitions. Currently, Owen Zhang is number one. Zhang says on his profile that “the answer is 42” and his bio section tells us that he is “trying to find the right question to ask.” He lists his skills as “Excessive Effort, Luck, and Other People’s Code.”
Zhang is currently the Chief Product Officer at DataRobot, a startup helping other data scientists build better predictive models in the cloud. He is also yet another example of how experience and skills still matter today more than formal data science education. His educational background? Master of Applied Science in Electrical Engineering from the University of Toronto.
This Burtch Works webinar provides highlights from the 40+ pages of compensation and demographic data in the report, which is available for free download here: http://goo.gl/RQX1xd
[youtube https://www.youtube.com/watch?v=aEkpVr8Q6oI?rel=0]
Data Scientists Still Hot, Salaries Cool Off


The third annual Burtch Works Study: Salaries of Data Scientists April 2016 is out, documenting the continuation of a very favorable market for those with the sexiest job of the 21st century. However, the salaries of data scientists appear to be leveling off: Every job category except one (entry-level individual contributors) experienced a marginal single-digit shift in median base salary over the past year. This compares to the overall increase in compensation of 14% in last year’s report.
The Burtch Works Study is based on compensation and demographic data for 374 data scientists collected in interviews conducted by Burtch’s recruiting staff during the 12 months ending March 2016. It focuses on data scientists as distinguished from other analytics professionals, defining them as follows:
Data scientists apply sophisticated quantitative and computer science skills to both structure and analyze massive unstructured datasets or continuously streaming data, with the intent to derive insights and prescribe action. The depth and breadth of their coding skills distinguishes them from other predictive analytics professionals and allows them to exploit data regardless of its source, size, or format. Through the use of one or more general-purpose coding languages and data infrastructures, data scientists can tackle problems made very difficult by the size and disorganization of the data.
Here are the highlights of the new report.
Individual contributors: Median base salaries range from $97,000 at level 1 to $152,000 at level 3 plus bonuses ranging from $10,000 to $21,000 (over 73% of all individual contributors are eligible for bonuses).
Managers: Median base salaries range from $140,000 at level 1 to $240,000 at level 3 plus bonuses ranging from $15,000 to $80,000 (over 80% of managers are eligible for bonuses).
Salary changes from last year’s study: Base salaries for individual contributors have increased 7% at level 1 and 1% at level 3, while salaries remained steady at level 2. For managers, salaries remained steady at level 1 while those at level 2 increased 3%. At level 3, the median base salary decreased by 4% ($10,000).
Data scientists continue to get top compensation for analytics professionals: Data scientists earn base salaries up to 39% higher than other predictive analytics professionals depending on job category.

A shift in the educational background of data scientists: 59% of level 1 individual contributors’ highest degree is a Master’s, a significant increase from last year’s 48%.
An increase in the number of U.S. citizens in the data science talent pool: Among level 1 individual contributors, only 43% of this year’s professionals are foreign-born vs. 53% last year.
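The salary-change figures in these highlights can be checked with simple arithmetic. The sketch below works through the level 3 manager case; the prior-year median is not stated in the highlights and is inferred here from the reported $10,000 drop, and the function and variable names are illustrative only.

```python
def percent_change(previous, current):
    """Percentage change from previous to current (negative means a decrease)."""
    return (current - previous) / previous * 100


# Level 3 managers: current median base salary, as reported in the highlights.
current_median = 240_000
# Reported year-over-year decrease in dollars.
reported_drop = 10_000
# Assumption: last year's median is the current median plus the reported drop.
previous_median = current_median + reported_drop

change = percent_change(previous_median, current_median)
print(f"Previous median: ${previous_median:,}")
print(f"Change: {change:.1f}%")
```

Under that assumption the prior-year median works out to $250,000, and the computed change agrees with the 4% decrease quoted in the report.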
It appears that the increase in the number of graduate-level programs in data science has started to make its mark and is contributing to an increase in the supply of entry-level data scientists with a Master’s degree. Other trends Burtch Works has observed in its recent conversations with data scientists are increased desire to work for “more mission-driven organizations attempting to make an impact on society” rather than large companies such as Facebook or Google and “the increasing pressure on many startups to show their value,” otherwise known as the coming burst of the Unicorn Bubble.
If we do see a contraction in startup activity and attractiveness over the next year, it may well be that larger and more stable companies, even in traditional industries, will become more desirable for budding—and even experienced—data scientists, regardless of their desire to “change the world.” The job opportunities—and the high compensation—will certainly be there as the practice of data science spreads into all corners of the economy. As Burtch Works predicts: “The use of data science will become more ubiquitous, the talent supply will improve, and there will be even more use cases for these techniques.”
Originally published on Forbes.com
Current Salaries for AI Professionals and Data Scientists
How to Become a Unicorn Data Scientist and Make More than $240,000
What makes a good data scientist? And if you are a good data scientist, how much should you expect to get paid?
Owen Zhang, ranked #1 on Kaggle, the online stadium for data science competitions, lists his skills on his Kaggle profile as “excessive effort,” “luck,” and “other people’s code.” An engineer by training, Zhang says in this ODSC interview that data science is finding “practical solutions to not very well-defined problems,” similar to engineering. He believes that good data scientists, “otherwise known as unicorn data scientists,” have three types of expertise. Since data science deals with practical problems, the first one is being familiar with a specific domain and knowing how to solve a problem in that domain. The second is the ability to distinguish signal from noise, or understanding statistics. The third skill is software engineering.
Not having formal education in statistics or software engineering, Zhang explains that he acquired his data science skills by competing in Kaggle and learning from its community. No doubt being very good at learning on your own is a required skill, to say nothing about hanging out with the right people, preferably unicorn data scientists. Galit Shmueli, Professor of Business Analytics at NTHU, told RJMetrics that her one piece of advice for data scientists just getting started is to “attend a conference or two, see what people are working on, what are the challenges, and what’s the atmosphere.”
Recent data shows that unicorn data scientists can make more than $240,000 annually. This is according to the 2015 Data Science Salary Survey, where O’Reilly Media’s John King and Roger Magoulas report the results of a survey of 600 “data practitioners” (reflecting the recency of the term, only one-quarter of the respondents have job titles that explicitly identify them as “data scientists”).
The median annual base salary of the survey sample is $91,000, and among U.S. respondents is $104,000, similar to last year’s results. 23% said that it would be “very easy” for them to find another position.
Keep in mind that “23% of the sample hold a doctorate degree,” and an additional 44% hold a master’s. The word “sample” here means, as it does in almost all other surveys today, “the people that wanted to answer our survey.” But unlike other survey report authors, King and Magoulas make sure to issue this warning: “We should be careful when making conclusions about survey data from a self-selecting sample—it is a major assumption to claim it is an unbiased representation of all data scientists and engineers… the O’Reilly audience tends to use more newer, open source tools, and underrepresents non-tech industries such as insurance and energy.”
Still, we can learn quite a lot about the background and skills required for admission into this well-paid group of data masters. Two-thirds of respondents had academic backgrounds in computer science, mathematics, statistics, or physics.
Beyond the initial training, it is important to keep abreast of the ever-changing landscape of data science tools: “It seems likely that in the long run knowing the highest paying tools will increase your chances of joining the ranks of the highest paid,” say King and Magoulas. And the most recent additions to the data science tool pantheon provide the greatest boost to salaries: “…learning Spark could apparently have more of an impact on salary than getting a PhD. Scala is another bonus: those who use both are expected to earn over $15,000 more than an otherwise equivalent data professional.”
The bad news is that the more time spent in meetings (even for non-managers), the more money a data scientist makes. Another widely discussed unpleasant part of the job—data cleaning—is the #2 task on which data scientists spend the most time, with 39% of survey participants spending at least one hour per day on this task. The good news is that exploratory data analysis is what occupies them most, with 46% spending one to three hours per day on this task and 12% spending four hours or more.
More data on the skills employed by practicing data scientists comes from an AnalyticsWeek survey of 410 data professionals. In Optimizing Your Data Science Team, Bob E. Hayes reports that respondents were asked to indicate their level of proficiency for 25 different skills. “Solving problems with data,” says Hayes, “requires expertise across different skill areas: 1) Business, 2) Technology, 3) Programming, 4) Math & Modeling and 5) Statistics. Proficiency in each skill area is related to job role.”
All of these skills may not present themselves in a single data scientist but it’s possible to assemble all of them by putting together a top-notch data science team. In “Tips for building a data science capability” from consulting firm Booz Allen Hamilton, we learn that “rather than illuminate a single data science rock star, it is important to highlight a diversity of talent at all levels to help others self-identify with the capability. It is also a more realistic version of the truth. Very rarely will you find ‘magical unicorns’ that embody the full breadth of math and computer science skills along with the requisite domain knowledge. More often, you will build diverse teams that when combined provide you with the ‘triple-threat’ (computer science, math/statistics, and domain expertise) model needed for the toughest data science problems.”
The concept of a data science team, combining various skills and educational backgrounds, is high on the agenda of the 175-year-old American Statistical Association (ASA) which is probably looking in dismay at the oodles of funds going to establishing new data science programs and research centers at American universities, to say nothing about the salaries of data scientists as opposed to the salaries of statisticians.
The ASA issued a “policy statement” on October 1, reminding the world that statistics is one of the three disciplines “foundational to data science” (the other two being database management and distributed and parallel systems, providing a “computational infrastructure”). The statement concludes with “The next generation [of statisticians] must include more researchers with skills that cross the traditional boundaries of statistics, databases and distributed systems; there will be an ever-increasing demand for such ‘multi-lingual’ experts.”
In other words, if you aspire to a $200,000+ salary, better call yourself a data scientist and start coding.

