Boston’s new data science-related meetup, The Data Scientist, got off to a great start yesterday with a presentation titled “The Scientist, The Team and The Purpose,” entertainingly delivered by Mingsheng Hong, Chief Data Scientist at Hadapt.
The event was live-tweeted by Tammy Kahn Fennell, CEO of MarketMeSuite.com, a Boston-based start-up providing a (free!) social media dashboard. The Storify digest of all the tweets is here. For me, the highlight of the presentation was Mingsheng’s assertion that the data scientist is the new product manager. “Data scientists are taking a data-driven position to make the product better,” he said. He qualified the claim somewhat by saying that this is true “especially for the platform vendors, where data scientists can help develop a product that is really easy to use.” This way, data scientists in these companies (such as Hadapt, I guess) do not just empower one user at a time, but “empower a wave of new users and companies that benefit from data science.”
This is an interesting (and, I believe, original) view of the role of the data scientist, and I would argue that it may also apply to companies that are not “platform vendors” and even outside of tech, in industries such as retail, financial services, and consumer goods. Good product managers have always been data-driven, bringing the voice of the market to bear on innovative ideas that are sometimes an engineering or manufacturing marvel but have no chance of market acceptance. Of course, it could also be the other way around: great breakthroughs are sometimes shot down because of “gut feelings” or biased use of market data. So having data scientists as some or all of your product managers may help make product-related decision-making more rigorous than it currently is in your company.
But you have to be careful not to replace good human judgment with an automated data science process. Unlike some of his fellow data scientists (and many scientists today), Mingsheng warned about over-reliance on machine learning. “The machines will never take over,” he stated emphatically, “because machines don’t have domain knowledge.” Machine learning is a great tool, but it is still (and maybe forever) humans who issue the high-level instructions. In his meetings with customers, Mingsheng said, “I don’t expect anyone to say, ‘I don’t know anything about my business, tell me.’”
In his presentation, Mingsheng briefly covered the data lifecycle, or (my words) the big data analytics process, as an introduction to the next few meetings of the group. In a somewhat unusual move for a meetup, the organizers, John Baker and Jason Sroka, decided to launch the meetup with a pre-planned series of six meetings, covering the big data analytics process from finding and organizing data all the way to data visualization. They say: “To promote the exchange of ideas, we hope to provide a framework of key areas in which we as Data Scientists work, focused initially on the stages we go through in planning and executing a data analytics effort. We seek to solicit from you the anecdotes and experiences that will spur us all to increased understanding of the field and our collective efforts in it… To facilitate this exchange of ideas, we have organized six seminars covering topics related to the entire data lifecycle… We hope it will be a venue where we can all share the trials and tribulations of Data Science, where each of us can learn from one another and a place to build support groups as each of us tackle these really hard data problems.”
Mingsheng concluded his presentation magnificently by telling us about his work with the St. Baldrick’s Foundation, “a volunteer-driven charity committed to funding the most promising research to find cures for childhood cancers and give survivors long and healthy lives.” He encouraged all attendees to go to his page on St. Baldrick’s website and donate in return for the free presentation and pizza. Just another example of how BigDataBoston is leading the way!