Data visualization: Plotting life expectancy against income for 200 countries over 200 years
[youtube https://www.youtube.com/watch?v=jbkSRLYSojo?rel=0]
Hans Rosling’s famous lectures combine enormous quantities of public data with a sports commentator’s style to reveal the story of the world’s past, present and future development. Now he explores stats in a way he has never done before – using augmented reality animation. In this spectacular section of ‘The Joy of Stats’ he tells the story of the world in 200 countries over 200 years using 120,000 numbers – in just four minutes. Plotting life expectancy against income for every country since 1810, Hans shows how the world we live in is radically different from the world most of us imagine.
A Practical Introduction to Data Science Skills (Video)
Google’s Michael Manoochehri at DataEDGE 2013 presenting an introduction to data analysis and suggestions for how to become a data scientist (his notes for the presentation are here).
[youtube=http://www.youtube.com/watch?v=rpwZ_i-9U0o&w=560&h=315]
What’s a Data Scientist? One More Definition
Shawn Hessinger at AllAnalytics.com summarizes yesterday’s e-chat with Gartner’s Doug Laney on what data scientists do and who they are. Gartner’s definition of a data scientist:
Responsible for mining, modeling, interpreting, blending, and extracting information from large datasets and then presenting something of use to non-data experts. These experts combine expertise in mathematics-based semantics in computer science with knowledge of the physics of digital systems.
And Laney thinks that “A good data scientist could probably be a good data scientist in any industry and with almost any problem.”
Top Ten Kaggle Data Scientists
1. Alexander D’yakonov
An academic in the Faculty of Computational Mathematics and Cybernetics department at Moscow State University, Alexander modestly describes his favorite problem-solving technique as “luck.” Despite this, the 33-year-old Russian has earned a reputation for using methods known for their theoretical rigor and elegant simplicity. This helped him to win the dunnhumby Shopper Challenge, which asked competitors to predict the amount and timing of supermarket shoppers’ next spends.
Domain Expertise vs. Machine Learning: The Debate Continues
By starting to rank all the data scientists participating in its competitions, Kaggle today further advanced its argument that data science is a generic set of skills that can be applied to any problem without prior domain expertise. Talking to The New York Times’ Quentin Hardy, Jeremy Howard, Kaggle’s president and chief scientist, said that “it makes little difference for a top performer if the problem is public health or essays in Arabic. The argument that great data science is just about letting the data talk holds true.”
For a (short, recent) history of the debate, see Mike Driscoll’s summary of the deliberations of the panel arguing for and against machine learning and domain expertise at the recent Strata conference (video here), the results of a KDnuggets poll, and Mike Loukides’ passionate defense of expertise, concluding that “the real value of a subject matter expert: not just asking the right questions, but understanding the results and finding the story that the data wants to tell. Results are good, but we can’t forget that data is ultimately about insight, and insight is inextricably tied to the stories we build from the data.”
Big Data Bytes: Data Scientists Wanted
“Businesses now looking for talent with deep analytical and statistical backgrounds include big publishers, portals, ad networks, and e-commerce sites – just about any company that possesses massive amounts of data. Salaries range from $75,000 to $100,000 for someone starting out with strong analytical skills and background to as much as $150,000 to $300,000 for experienced professionals.”–“Wanted: Data Scientist With a Human Touch”
“[Former Vertica CEO] Lynch told his staff during the February meeting that he has no intention of retiring. Indeed, he pledged to his staff that he would assist in starting-up or otherwise supporting no less than 20 Big Data start-ups in the Boston area over the next five years.”–“HP Lead Big Data Exec Chris Lynch Resigns”
“This is the time to be super aggressive.”–Chris Lynch
“As the amount of data in the world grows, the only certainty is that there will need to be more qualified people to make sense of it. That should be good news as we stop and salute our machine overlords.”–“The Age of Big Data”
Crowdsourcing and Big Data
The Wikipedia article on Big Data says it “requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times.” The examples given (Hadoop, MapReduce, Cloud Computing, etc.) do not include one very exceptional technology, the human brain, and a new way to harness its power, “crowdsourcing.” In the 2006 Wired article in which he coined the term, Jeff Howe wrote: “Just as distributed computing projects like UC Berkeley’s SETI@home have tapped the unused processing power of millions of individual computers, so distributed labor networks are using the Internet to exploit the spare processing power of millions of human brains.” Isn’t crowdsourcing one of the “exceptional technologies” required by Big Data?
To find out more about crowdsourcing and its role in the service of Big Data, I attended a Crowdsortium Meetup yesterday. Karim Lakhani from the Harvard Business School opened with a brief keynote, reminding us of (Bill) Joy’s Law: “No matter where you are, most smart people work for someone else.” Following him was a panel with the aforementioned Howe, Dwayne Spradlin (CEO of Innocentive), Doron Reuveni (CEO of uTest), and Dan Sullivan (CEO of Appswell), moderated expertly by Jim Savage, partner and co-founder of Longworth Venture Partners.
Asking Good Questions is What Will Make Big Data Work for You
Asking good questions as the key to unleashing the potential of big data got significant blog time this past week.
Big Data Bytes: More on What’s a Data Scientist?
Chuck Hollis calls Data Scientists “rock stars” and argues that they are “a fundamentally different profession with a different profile than the BI analysts that came before [them]. They’re more likely to have advanced degrees, frequently have a background in the sciences (vs. business) and they interact with data in more ways — and using different tools.”
Over at Vator they call them “rocket scientists” and “data junkies.” And an article in the November/December issue of IEEE Intelligent Systems explores a “what if” scenario in which data scientists are criminals. No quotation marks.

