The Data Science Interview: Edwin Chen, Twitter

I don’t do pure research—my analysis enables real-world functionality

Currently mining terabytes of tweets as a data scientist with Twitter, Edwin Chen studied math and linguistics at MIT and then crunched numbers at Peter Thiel’s hedge fund, Clarium Capital Management. He blogs on topics of interest to data scientists such as crowdsourcing text analysis with Amazon’s Mechanical Turk or ggplot2, a data visualization tool. The following is an edited transcript of our recent phone conversation.

When you went to MIT, what were your future plans?

Posted in Data Science | Leave a comment

DataKind’s Jake Porway on Data Science

[youtube=http://www.youtube.com/watch?v=Mm1RplOU0cQ&w=560&h=315]

“If you leave an excited data scientist on his own to solve a problem, he’s going to solve his own problem – which is usually parking his car, or finding a bar to drink at. The trick that we worked on was actually less about data and more about translation, about finding a way for data scientists to speak the language of the people who were trying to solve the big problems… the biggest [challenge] is actually the framing of the problem: really finding the question. As any good data scientist will tell you, it’s not so much about the data, it’s the question you start with.” – Jake Porway, DataKind

More here

Posted in Data Science | Leave a comment

Where should you put your data scientists?

[slideshare id=61486991&doc=whereshouldyouputyourdatascientists-160429025555]

Posted in Data Science | Leave a comment

Data Science Skills

datascience_skills1

datascience_skills_proficiency

Source: Bob Hayes

Posted in Data Science, Misc | Leave a comment

What Makes a Good Data Scientist?

Data_Scientist_infographic_WR

Posted in Data Science | Leave a comment

Tom Davenport on Managing Data Scientists (Video)

[youtube https://www.youtube.com/watch?v=VK4-ASEUmgE?rel=0]

Posted in Data Science | Leave a comment

Survey: The Hunt for Unicorn Data Scientists Boosts the Salaries of Predictive Analytics Professionals

Base Salaries for Individual Contributors

Base Salaries for Managers

Unicorn Data Scientists (upgraded from “sexy data scientists”) are hard to find and are paid more than $200,000 per year. A new survey finds that the rising data science tide lifts the compensation of all other data analytics professionals, even if they don’t know how to code.

The Burtch Works Study: Salaries for Predictive Analytics Professionals is based on interviews with 1,757 data analytics professionals conducted over the 12 months ending April 2015 by executive recruiting firm Burtch Works. It is a unique source of information in that it does not rely on self-reporting or data provided by human resources departments. It also provides insights into how the demand for data scientists impacts the salaries of other data analytics professionals because it excludes data scientists, who are covered in a separate Burtch Works study published earlier this year (I wrote about that study here).

Burtch Works defines predictive analytics professionals as those who can “apply sophisticated quantitative skills to data describing transactions, interactions, or other behaviors of people to derive insights and prescribe actions.” Data scientists are a subset of this group—they have the “computer science skills necessary to acquire and clean or transform unstructured or continuously streaming data, regardless of its format, size, or source.”

The additional computer science skills put data scientists on top in terms of compensation, regardless of their levels of experience and managerial responsibilities, but predictive analytics professionals are keeping up, seeing their salaries and bonuses rise. For example, the median base salary for the most experienced individual contributors rose from $115,250 last year to $125,000 this year, and for managers leading teams of ten or more, the median base salary rose from $225,000 to $235,000.
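To put those two raises on a common scale, here is a minimal sketch of the arithmetic. The salary figures are the ones quoted above; the helper name `percent_change` is an illustrative assumption and not part of the Burtch Works study.

```python
def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100.0


# Median base salaries quoted in the paragraph above (US dollars).
contributors = percent_change(115_250, 125_000)  # most experienced individual contributors
managers = percent_change(225_000, 235_000)      # managers of teams of ten or more

print(f"Individual contributors: {contributors:.1f}%")
print(f"Managers: {managers:.1f}%")
```

Under these figures, the raise for senior individual contributors is roughly twice as large in relative terms as the raise for senior managers, even though the dollar amounts are nearly identical.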

Predictive analytics professionals continue to benefit from the increasing demand for, and short supply of, their quantitative analysis skills. The median base salary of individual contributors varies from $76,000 for those at level 1 (0 to 3 years of experience) to $125,000 for those at level 3 (9+ years of experience). The median bonus received varies from $8,100 to $18,100, depending on job level.

The median base salary of managers varies from $125,500 for those at level 1 (1 to 3 reports) to $235,000 for those at level 3 (10+ reports). The median bonus received by managers varies from $23,000 to $75,000 depending on job level.

More and more people are attracted by the demand for data analytics professionals and the potential to become a unicorn. Data recently released by the National Center for Education Statistics, according to Phys.org, shows bachelor’s degrees in statistics grew 17% from 2013 to 2014. This marks 15 consecutive years the number of undergraduates in statistics has risen, increasing by more than 300% since the 1990s. In addition, from 2000 to 2014, master’s and doctorate degrees in statistics also grew significantly at 260% and 132%, respectively.

“The Bureau of Labor Statistics projects job growth for statisticians will increase 27% between 2012 and 2022, outpacing the projected 11% rate for all other occupations. The number of graduates in statistics each year—approximately 2,000 bachelor’s degrees, 3,000 master’s degrees and 575 doctorate degrees—seems unlikely to match this demand,” says Phys.org.

Originally published on Forbes.com

Posted in Data Science, Predictive analytics | Leave a comment

A Career in Data Science: Bob Rogers, Chief Data Scientist, Intel

Bob Rogers, Chief Data Scientist, Intel

 

“Business leaders want ‘the answer,’” says Bob Rogers, Chief Data Scientist for Big Data Solutions at Intel. But data scientists must understand what “the answer” means in the specific business context and communicate the expected impact in the language of the business executives.  They need to explain the results of their analysis in “terms of the risk to the business” and “translate uncertainty into outcomes,” says Rogers. “If you show error bars on a number in a business presentation, you are probably going down the wrong path.”

When the data scientist emerged as a new business role about a decade ago, the emphasis was on how it combined two disciplines and skill sets: computer science and statistics. More recently, the discussion of this evolving role has been along the lines of Rogers’ observation, as one combining technical and business expertise, emphasizing the importance of communication skills. Drew Conway’s 2010 definition of a data scientist as a Venn diagram of computer science, statistics and domain expertise has now been updated to include communication as a stand-alone set of required skills.

“Statisticians have missed the initial boat of data science,” says Rogers. “They tend to be very specific about the way they discuss data, ways that are not necessarily amenable to a broader discussion with a business audience.”

What we have here is a redefinition of what was previously perceived as a highly technical job as a more generalized business role. The rise of the Sexiest Job of the 21st Century has spawned numerous undergraduate and graduate programs focusing on imparting technical skills and knowledge, aiming to relieve the widely discussed shortage of experts in managing and mining the avalanche of big data. We now see business schools (e.g., Wharton) establishing a major in analytics, combining data science training with general business education. The next step, I would argue, will be the complete integration of the two types of training: business education as data science education.

Rogers’ varied work experience over the last twenty-five years is a prime example of the amalgam of skills and expertise that will be the hallmark of successful business leaders in the years to come. It’s a unique combination of scientific curiosity and acumen, facility with computer programming and data manipulation, entrepreneurial drive and experimental inclination. All of these are wrapped in a deep understanding, derived from direct experience, of the business context: the requirements, challenges, human motivations and attitudes that drive business success.

Like some of the leading data scientists of recent vintage, Rogers started his working life after earning a PhD in Physics. But in 1991, when he got his degree from Harvard University, there was not much data to support his thesis work in astrophysics, so he and others like him “were doing a lot more simulations.”  Today, “there is a lot of data associated with cosmology,” says Rogers, but then and now, knowing how “to model the data” has been a crucial requirement in this and other scientific fields. A new training ground today for budding data scientists, according to Rogers, is computational neuroscience, where the “amount and shape of data” coming from functional MRI requires “advanced modeling thinking.”

While doing a post-doc at a research institute, his own experience with computer modeling and simulations led Rogers to co-author a book on using artificial neural networks for time series forecasting. All of a sudden he was getting phone calls from people asking him about forecasting the stock market, a subject he didn’t know much about.

Serendipity plays a major role in many illustrious careers and Rogers’ was no exception. The husband of a friend of his wife’s owned a trading firm in Chicago, and with his help, Rogers started a company rather than pursue an academic career, just like many latter-day data scientists. “I was 28 at the time,” he explained when I asked him why he made such a risky career switch.

In another similarity to today’s data scientists, Rogers did not limit his involvement with the startup to developing forecasting models for the Chicago futures market, but also got down and dirty building a research platform for collecting data on transactions and the back-office systems for executing trades, accounting, and other functions.

This went on for about a dozen years. In the last four, Rogers switched from R&D work to selling the company’s services when it opened up to new, international investors.

“What was really profound for me as a data scientist,” says Rogers, “was actually the marketing side—I started to appreciate that there was a huge difference between having a technology that performed well and having a product that was tailored to fit the specific business needs of the customer. International investors had very specific needs around how the product was configured.”

Recalling his own experience leads Rogers to yet another observation about how understanding the business context and being able to communicate with business leaders are such important components of the data scientist’s job today:

“What I’ve seen changed between the pre-data science period and the current era is that analytics in the enterprise used to be very focused on a business leader asking a business analyst for a report on X—that was the process. Now, it’s much more of a conversation. Story telling skills, sensitivity to what the business needs are—successful data scientists tend to have this conversation.” In addition, there is more sensitivity to the uncertainty associated with data—“awareness that a number is not just a number”—even data that comes from a structured database should be handled with care.

By 2006, it was time to move on and “get into something that was more personally satisfying to me,” says Rogers, as “our computational and technological advantages have started to decline.” Healthcare turned out to be the more personally satisfying domain and he became the global product manager for the Humphrey Visual Field Analyser, widely used in glaucoma care.

In yet another application of adding a time dimension to data, Rogers worked with a research team to move beyond a single, one-time measurement of the patient’s peripheral vision and compute the rate of change and the progression of blindness over time.  “It became an important tool for tracking these patients and their response to therapy,” he says. And in yet another immersion in practical, hands-on computing, the solution involved adding networking software (licensed from Apple) to multiple devices in a clinic to facilitate the collection of data from past measurements.
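To make the idea of a rate of change concrete, here is a minimal sketch that fits a straight line through repeated measurements and reports its slope. It is an ordinary least-squares illustration only, not the method used in the device Rogers managed; the function name `rate_of_change` and the visit data are hypothetical.

```python
from statistics import mean


def rate_of_change(times, values):
    """Least-squares slope of values against times (units of value per unit of time)."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired measurements")
    t_bar = mean(times)
    v_bar = mean(values)
    # Slope = covariance(times, values) / variance(times)
    numerator = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    denominator = sum((t - t_bar) ** 2 for t in times)
    if denominator == 0:
        raise ValueError("all measurements share the same time point")
    return numerator / denominator


# Hypothetical patient: years since the first visit and a made-up sensitivity index.
years = [0.0, 1.0, 2.0, 3.0]
index = [-2.0, -2.5, -3.0, -3.5]

slope = rate_of_change(years, index)
print(f"Estimated change per year: {slope:.2f}")
```

A single visit yields only a snapshot; the slope across visits is what indicates whether a patient is stable or declining, and how quickly.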

Better access to data, Rogers understood from that experience, was crucial for improving healthcare. In 2009, when the US federal government started to give incentives for healthcare providers to use Electronic Medical Records (EMR) systems, he saw how the original paper silos were simply replaced with electronic silos, with each EMR system becoming a stand-alone database. Not only was there no physical connection, there was also no interoperability “from a semantic point of view: descriptions in one system could not be directly compared with those in another.”

The solution was a cloud-based system that pulled data from a variety of sources, together with machine learning software that constructed a table of all the codes and concepts in the clinical data and mapped them to each other. “The more data we got, the better we got at mapping these concepts and building a robust set of associations,” says Rogers. “That allowed us to build a clinically intelligent search engine.”

You may think that this is “big data” in a nutshell—more data equals better learning or as some have called it, “the unreasonable effectiveness of data.” But you may want to reconsider admiring data quantity for quantity’s sake, given what Rogers and his colleagues found out while mining electronic medical records.

“63% of the key information that the doctor needs to know about you is not in your coded data at all,” says Rogers. “And 30% of the time, if you have a heart failure in your code, it’s not heart failure” and could have been a mistake or a related entry (e.g., a test for heart failure) in the billing system. As a result, most of the learning in Rogers’ machine learning system was dedicated to analysis of the text to “understand what information about the patient is actually correct.” It is an important big data lesson, or what one may call the unreasonable effectiveness of data quality.

That system became the foundation of another startup, Apixio, which has recently raised $19.3 million in Series D venture capital funding. After serving there as Chief Scientist for 5 years, Rogers moved on again, in January 2015, this time from the world of startups to the corporate world and his current role at Intel.

As Chief Data Scientist he works internally on product road maps, providing input related to his expertise and the trends he sees. Externally, he works with the customers of Intel’s customers, helping them in “conceptualizing their entire analytics pipeline.” Providing free advice to consumers of analytics “helps keep Intel at the center of the computational world” and helps keep Rogers abreast of the latest data mining trends and developments. He learns about on-going concerns regarding whether a “new architecture” is required to accommodate the most recent data science tools and observes the rise of new challenges such as “monitoring many different real-time data streams.” And he reports that recently there has been a lot of interest in deep learning. Here, too, a key concern is integration–is it possible to build these new capabilities within the existing big data infrastructure?

Rogers’ role as a trusted advisor also includes working with partners. One example is the Collaborative Cancer Cloud, an Intel-developed precision medicine analytics platform. Currently, it is used by the Knight Cancer Institute at Oregon Health & Science University, Dana-Farber Cancer Institute and Ontario Institute for Cancer Research to securely share patient genomic, imaging and clinical data and accelerate their research into potentially lifesaving discoveries.

Extrapolating from his current and previous work, Rogers sees the future of AI as “the development of machine learning systems that are good at figuring out the context.” A lot of the recent AI news has been about what he perceives as immature work—“image captioning is a sort of parlor trick,” says Rogers. “We will start to see an emerging AI capability” when we have machine or deep learning capable of identifying the context of the image.

Unlike others who see the machines as potentially replacing humans, Rogers envisions human-machine collaboration: “AI capabilities are most interesting when they are used to amplify human capabilities. There are things that we are good at cognitively but we cannot do at scale. We [should use machine learning] to surface the information from a large volume of data so we can do the next level of inference ourselves,” he says.

Understanding the context. Accepting and managing uncertainty. Linking pieces of data to uncover new insights. Like good data scientists, future business leaders will not look for “the answer.” With the right attitude, experience, and training, they will actively search for data to refute their assumptions, question their most beloved initiatives, and challenge their established career trajectories.

Originally published on Forbes.com

Posted in Data Science | Leave a comment

Data Science: Ranking Online Influencers

Data science is the defining specialty of the business of big data and an emerging career path for those who love to find new insights in the gazillion bytes of data created each day. It’s where you find fierce competition for talent, the jobs of the future, new training programs and courses, new ventures, and new products. But where to find the data science-relevant online conversations with the most impact?

Posted in Data Science | Leave a comment

The $250K Median Salary of Data Scientists Managers is Why Google and Salesforce Invested $20B in Self-Service Data Science

2019 salaries of data scientists–managers (Burtch Works)

Earlier this month, Salesforce announced the acquisition of data visualization and analytics leader Tableau for $15.7 billion and Google announced the acquisition of data discovery and analytics platform Looker for $2.6 billion. Both acquired companies will beef up the acquiring companies’ Data Science as a Service (DSaaS) capabilities, providing their enterprise customers with a wide range of easy (or easier) to use tools that “democratize” data preparation, integration, analysis, and presentation.

With self-service data science, business users who do not have a statistical analysis background and don’t know how to code can make data-driven decisions instead of relying on expensive and hard-to-find data scientists.

How expensive? The average annual base salary for an experienced data scientist in a management position is currently $257,443 according to Burtch Works.

Read more here

Posted in Data Science | Leave a comment