“The startup’s three co-founders have backgrounds in engineering and data science, but not weather, and there are no meteorological models involved. By keeping weather predictions within a two-hour window, they believe statistics are sufficient.”–Mashable in “Can Statistics Predict Weather Without Meteorologists? This App Thinks So” on Ourcast, a new app that uses real-time radar data and crowdsourcing to predict how weather at a given location will change within the next two hours.
Domain Expertise vs. Machine Learning: The Debate Continues
Top Ten Kaggle Data Scientists
1. Alexander D’yakonov
An academic in the Faculty of Computational Mathematics and Cybernetics at Moscow State University, Alexander modestly describes his favorite problem-solving technique as “luck.” Despite this, the 33-year-old Russian has earned a reputation for using methods known for their theoretical rigor and elegant simplicity. This helped him to win the dunnhumby Shopper Challenge, which asked competitors to predict the amount and timing of supermarket shoppers’ next spends.
Kirk Borne on Data Science: Start Small, Think Big
Domain Expertise vs. Machine Learning: The Debate Continues
By starting to rank all the data scientists participating in its competitions, Kaggle today further advanced its argument that data science is a generic set of skills that can be applied to any problem without prior domain expertise. Talking to The New York Times’ Quentin Hardy, Jeremy Howard, Kaggle’s president and chief scientist, said that “it makes little difference for a top performer if the problem is public health or essays in Arabic. The argument that great data science is just about letting the data talk holds true.”
For a (short, recent) history of the debate, see Mike Driscoll’s summary of the deliberations of the panel arguing for and against machine learning and domain expertise at the recent Strata conference (video here), the results of a KDnuggets poll, and Mike Loukides’ passionate defense of expertise, which identifies “the real value of a subject matter expert: not just asking the right questions, but understanding the results and finding the story that the data wants to tell. Results are good, but we can’t forget that data is ultimately about insight, and insight is inextricably tied to the stories we build from the data.”
Big Data Bytes: Data Scientists Wanted
“Businesses now looking for talent with deep analytical and statistical backgrounds include big publishers, portals, ad networks, and e-commerce sites – just about any company that possesses massive amounts of data. Salaries range from $75,000 to $100,000 for someone starting out with strong analytical skills and background to as much as $150,000 to $300,000 for experienced professionals.”–“Wanted: Data Scientist With a Human Touch”
“[Former Vertica CEO] Lynch told his staff during the February meeting that he has no intention of retiring. Indeed, he pledged to his staff that he would assist in starting-up or otherwise supporting no less than 20 Big Data start-ups in the Boston area over the next five years.”–“HP Lead Big Data Exec Chris Lynch Resigns”
“This is the time to be super aggressive.”–Chris Lynch
“As the amount of data in the world grows, the only certainty is that there will need to be more qualified people to make sense of it. That should be good news as we stop and salute our machine overlords.”–“The Age of Big Data”
Big Data Startups News
Wikibon’s Jeff Kelly bravely put a stake in the ground recently, first among IT market observers, by estimating the big data market at $5 billion, growing to $50 billion in five years. Kelly’s 5/50/5 plan is a great guide to the initial jostling for market position in this very promising, still-emerging market. It shows that most, if not all, of the innovation in big data has come from startups, and that some have already been acquired by established IT firms.
The big data market, as defined by Wikibon, includes the hardware, software, and services designed to address the shortcomings of traditional database technologies in handling large data sets. This makes the $5 billion estimate a conservative one, since it covers a narrow market: what we could call the hardware, software, and services platforms for big data.
Data Scientist: 6 Definitions
From Simon Rogers, “What is a Data Scientist?”:
“Someone who can bridge the raw data and the analysis – and make it accessible. It’s a democratising role; by bringing the data to the people, you make the world just a little bit better.”–Simon Rogers
“A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data.”–DJ Patil
“A data scientist is someone who blends math, algorithms, and an understanding of human behavior with the ability to hack systems together to get answers to interesting human questions from data.”–Hilary Mason
“A data scientist is a rare hybrid, a computer scientist with the programming abilities to build software to scrape, combine, and manage data from a variety of sources and a statistician who knows how to derive insights from the information within. S/he combines the skills to create new prototypes with the creativity and thoroughness to ask and answer the deepest questions about the data and what secrets it holds.”–Jake Porway
“The four qualities of a great data scientist are creativity, tenacity, curiosity, and deep technical skills. They use skills in data gathering and data munging, visualization, machine learning, and computer programming to make data driven decisions and data driven products. They prefer to let the data do the talking.”–Jeremy Howard
“By definition all scientists are data scientists. In my opinion, they are half hacker, half analyst, they use data to build products and find insights. It’s Columbus meet Columbo – starry eyed explorers and skeptical detectives.”–Monica Rogati
Management Education in the Age of Big Data
What if business schools based their entire curriculum on the fundamentals of business analytics?
McKinsey estimates that the demand for “deep analytical positions” in the U.S. will exceed supply by 140,000 to 190,000 positions and that there will be a need for 1.5 million additional “managers and analysts who can ask the right questions and consume the results of the analysis of big data effectively.”
Big Data Bytes: “Information technology has entered a big-data era”
“From social media to medical revolutions anchored in metadata analyses, wherein astronomical feats of data crunching enable heretofore unimaginable services and businesses, we are on the cusp of unimaginable new markets.”–Mark Mills and Julio Ottino, “The Coming Tech-Led Boom,” The Wall Street Journal, January 30, 2012
“The data fabric is the next middleware”–Todd Papaioannou of http://continuuity.com/ quoted in Derrick Harris, “5 low-profile startups that could change the face of big data,” GigaOm, January 28, 2012
“You can’t have a conversation in today’s business technology world without touching on the topic of Big Data….companies such as Yahoo, Amazon, comScore and AOL have turned to Hadoop to both scale. According to some recent research from Infineta Systems, a WAN optimisation startup, traditional data storage runs $5 per gigabyte, but storing the same data costs about 25 cents per gigabyte using Hadoop.”–Michael Friedenberg, “Why Big Data Means a Big Year for Hadoop,” techworld.com, January 29, 2012
What’s a Data Scientist? One More Definition
Shawn Hessinger at AllAnalytics.com summarizes yesterday’s e-chat with Gartner’s Doug Laney on what data scientists do and who they are. Gartner’s definition of a data scientist:
Responsible for mining, modeling, interpreting, blending, and extracting information from large datasets and then presenting something of use to non-data experts. These experts combine expertise in mathematics-based semantics in computer science with knowledge of the physics of digital systems.
And Laney thinks that “a good data scientist could probably be a good data scientist in any industry and with almost any problem.”