Data Scientist: 6 Definitions

From Simon Rogers, “What is a Data Scientist?”:

“Someone who can bridge the raw data and the analysis – and make it accessible. It’s a democratising role; by bringing the data to the people, you make the world just a little bit better.”–Simon Rogers

“A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data.”–DJ Patil

“A data scientist is someone who blends math, algorithms, and an understanding of human behavior with the ability to hack systems together to get answers to interesting human questions from data.”–Hilary Mason

“A data scientist is a rare hybrid, a computer scientist with the programming abilities to build software to scrape, combine, and manage data from a variety of sources and a statistician who knows how to derive insights from the information within. S/he combines the skills to create new prototypes with the creativity and thoroughness to ask and answer the deepest questions about the data and what secrets it holds.”–Jake Porway

“The four qualities of a great data scientist are creativity, tenacity, curiosity, and deep technical skills. They use skills in data gathering and data munging, visualization, machine learning, and computer programming to make data driven decisions and data driven products. They prefer to let the data do the talking.”–Jeremy Howard

“By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo – starry eyed explorers and skeptical detectives.”–Monica Rogati

Posted in Data Science, Data Scientists

Management Education in the Age of Big Data

What if business schools based their entire curriculum on the fundamentals of business analytics?

McKinsey estimates that the demand for “deep analytical positions” in the U.S. will exceed supply by 140,000 to 190,000 positions and that there will be a need for 1.5 million additional “managers and analysts who can ask the right questions and consume the results of the analysis of big data effectively.”

Posted in Misc

Big Data Bytes: “Information technology has entered a big-data era”

“From social media to medical revolutions anchored in metadata analyses, wherein astronomical feats of data crunching enable heretofore unimaginable services and businesses, we are on the cusp of unimaginable new markets.”–Mark Mills and Julio Ottino, “The Coming Tech-Led Boom,” The Wall Street Journal, January 30, 2012

“The data fabric is the next middleware”–Todd Papaioannou of http://continuuity.com/ quoted in Derrick Harris, “5 low-profile startups that could change the face of big data,” GigaOm, January 28, 2012

“You can’t have a conversation in today’s business technology world without touching on the topic of Big Data….companies such as Yahoo, Amazon, comScore and AOL have turned to Hadoop to both scale. According to some recent research from Infineta Systems, a WAN optimisation startup, traditional data storage runs $5 per gigabyte, but storing the same data costs about 25 cents per gigabyte using Hadoop.”–Michael Friedenberg, “Why Big Data Means a Big Year for Hadoop,” techworld.com, January 29, 2012
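
To make the quoted prices concrete, here is a back-of-the-envelope sketch in Python. It takes the $5 and 25-cent per-gigabyte figures from the Infineta research at face value. It applies them to a hypothetical 100 TB dataset; that size and the decimal convention (1 TB = 1,000 GB) are my assumptions. The helper `storage_cost` is likewise made up for illustration. The sketch ignores everything else that goes into total cost of ownership, such as hardware refresh, replication, power, and staff.

```python
# Per-gigabyte prices as quoted in the Friedenberg article (Infineta Systems research).
TRADITIONAL_USD_PER_GB = 5.00
HADOOP_USD_PER_GB = 0.25


def storage_cost(gigabytes: float, usd_per_gb: float) -> float:
    """Hypothetical helper: flat per-GB storage cost, no other costs included."""
    return gigabytes * usd_per_gb


# Hypothetical dataset size, chosen only for illustration: 100 TB in decimal units.
dataset_gb = 100 * 1_000

traditional = storage_cost(dataset_gb, TRADITIONAL_USD_PER_GB)
hadoop = storage_cost(dataset_gb, HADOOP_USD_PER_GB)
ratio = TRADITIONAL_USD_PER_GB / HADOOP_USD_PER_GB

print(f"Traditional storage: ${traditional:,.0f}")  # $500,000
print(f"Hadoop storage:      ${hadoop:,.0f}")       # $25,000
print(f"Price ratio:         {ratio:.0f}x")         # 20x
```

Whatever the dataset size, the quoted prices differ by a factor of 20. Real-world savings would depend on the costs this sketch leaves out.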

Posted in Misc

What’s a Data Scientist? One More Definition

Shawn Hessinger at AllAnalytics.com summarizes yesterday’s e-chat with Gartner’s Doug Laney on what data scientists do and who they are. Gartner’s definition of a data scientist:

Responsible for mining, modeling, interpreting, blending, and extracting information from large datasets and then presenting something of use to non-data experts. These experts combine expertise in mathematics-based semantics in computer science with knowledge of the physics of digital systems.

Laney also believes that “a good data scientist could probably be a good data scientist in any industry and with almost any problem.”

Posted in Data Scientists

Big Data Bytes: More on What’s a Data Scientist?

Chuck Hollis calls Data Scientists “rock stars” and argues that they are “a fundamentally different profession with a different profile than the BI analysts that came before [them].  They’re more likely to have advanced degrees, frequently have a background in the sciences (vs. business) and they interact with data in more ways — and using different tools.”

Over at Vator they call them “rocket scientists” and “data junkies.” And an article in the November/December issue of IEEE Intelligent Systems explores a “what if” scenario in which data scientists are criminals. No quotation marks.

Posted in Data Scientists

Asking Good Questions is What Will Make Big Data Work for You

The idea that asking good questions is the key to unleashing the potential of big data got significant blog time this past week.

Posted in Data Scientists

Big Data Bytes of the Week: What’s a Data Scientist?

What’s a Data Scientist? Joshua Konkle, Vice President at DCIG, quoted a few definitions earlier this week.

Posted in Misc

The First Law of Big Data

EMC today released the fifth annual Digital Universe study from IDC. So now we have five years’ worth of estimates, made with a consistent methodology, of the amount of data created and copied annually in the world. It turns out that the amount of digital data created each year has grown by a factor of 9 in the last five years. And since IDC uses the same methodology to forecast the next five years, it looks like data will grow by a factor of 61 over the ten-year period from 2005 to 2015.
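
The two IDC growth factors also imply yearly growth rates. Here is a minimal Python sketch of that arithmetic. It assumes growth compounds at a constant rate within each period, which is a simplification, since the study reports totals rather than a smooth curve. It also assumes that the "last five years" are 2005 to 2010, so that 2010 to 2015 accounts for the remaining factor of 61/9. The function name `implied_annual_growth` is hypothetical.

```python
def implied_annual_growth(factor: float, years: float) -> float:
    """Constant yearly growth rate that multiplies a quantity by `factor` over `years`."""
    return factor ** (1 / years) - 1


# Figures from the post: x9 over five years, x61 over the ten years 2005-2015.
# Assumption: the five-year window is 2005-2010, so 2010-2015 contributes x(61/9).
past_five = implied_annual_growth(9, 5)
full_decade = implied_annual_growth(61, 10)
next_five = implied_annual_growth(61 / 9, 5)

print(f"2005-2010: {past_five:.1%} per year")    # 55.2% per year
print(f"2005-2015: {full_decade:.1%} per year")  # 50.8% per year
print(f"2010-2015: {next_five:.1%} per year (factor {61 / 9:.1f})")  # 46.6% per year (factor 6.8)
```

Read this way, the forecast still has data growing by close to half again every year. That rate is somewhat slower than the roughly 55% a year implied for the period already measured.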

Posted in Data Growth

Crowdsourcing and Big Data

The Wikipedia article on Big Data says it “requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times.” The examples given (Hadoop, MapReduce, Cloud Computing, etc.) do not include one very exceptional technology, the human brain, and a new way to harness its power, “crowdsourcing.” In the 2006 Wired article in which he coined the term, Jeff Howe wrote: “Just as distributed computing projects like UC Berkeley’s SETI@home have tapped the unused processing power of millions of individual computers, so distributed labor networks are using the Internet to exploit the spare processing power of millions of human brains.” Isn’t crowdsourcing one of the “exceptional technologies” required by Big Data?

To find out more about crowdsourcing and its role in the service of Big Data, I attended a Crowdsortium Meetup yesterday. Karim Lakhani from the Harvard Business School opened with a brief keynote, reminding us of (Bill) Joy’s Law: “No matter where you are, most smart people work for someone else.” He was followed by a panel with the aforementioned Howe, Dwayne Spradlin (CEO of Innocentive), Doron Reuveni (CEO of uTest), and Dan Sullivan (CEO of Appswell), expertly moderated by Jim Savage, partner and co-founder of Longworth Venture Partners.

Posted in Data Scientists

Big Data News Roundup

IBM’s Watson visited a few conferences last week. Watson’s lead developer, David Ferrucci, delivered a keynote at the ACM’s 2011 Federated Computing Research Conference in San Jose, CA.

Posted in AI