A Very Short History of Data Science

Source: http://compsocsci.blogspot.com/

I’m in the process of researching the origin and evolution of data science as a discipline and a profession. Here are the milestones that I have picked up so far, tracking the evolution of the term “data science,” attempts to define it, and some related developments. I would greatly appreciate any pointers to additional key milestones (events, publications, etc.).

[An updated version of this timeline is at Forbes.com]

1974 Peter Naur publishes Concise Survey of Computer Methods, in Sweden and the United States. The book is a survey of contemporary data processing methods used in a wide range of applications. It is organized around the concept of data as defined in the IFIP Guide to Concepts and Terms in Data Processing, which defines data as “a representation of facts or ideas in a formalized manner capable of being communicated or manipulated by some process.” The Preface to the book tells the reader that a course plan was presented at the IFIP Congress in 1968, titled “Datalogy, the science of data and of data processes and its place in education,” and that in the text of the book, “the term ‘data science’ has been used freely.” Naur offers the following definition of data science: “The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.”


Imagination and Data Science

Today in 1833, Ada Byron (later Countess Lovelace) met Charles Babbage when visiting his house to see a portion of the Difference Engine, or what her mother, Lady Byron, called his “thinking machine.” James Gleick writes in The Information: “Babbage saw a sparkling, self-possessed young woman with porcelain features and a notorious name, who managed to reveal that she knew more mathematics than most men graduating from university. She saw an imposing forty-one-year-old, authoritative eyebrows anchoring his strong-boned face, who possessed wit and charm and did not wear these qualities lightly. He seemed a kind of visionary–just what she was seeking. She admired the machine, too.”

With the Analytical Engine, Babbage imagined the modern computer. Gleick quotes Ada on imagination, from an essay she wrote in 1841: “It is that which penetrates into the unseen worlds around us, the worlds of Science. It is that which feels & discovers what is, the real which we see not, which exists not for our senses. Those who have learned to walk the threshold of the unknown worlds… may then with the fair white wings of Imagination hope to soar further into the unexplored amidst which we live.”

In this she anticipated Albert Einstein’s much-quoted observation: “Imagination is more important than knowledge. For knowledge is limited to all we now know and understand, while imagination embraces the entire world, and all there ever will be to know and understand.”

Note to Data Scientists (or more specifically, those making exaggerated claims about IBM’s Watson or the promise of “data-driven” science): Without our imagination, machines can’t learn.


The Big Data Landscape Revisited

Bruce Reading, CEO of VoltDB, has an interesting and original take on the big data landscape.

Last year, Dave Feinleib published the Big Data Landscape, “to organize this rapidly growing technology sector.” One prominent data scientist told me “it’s just a bunch of logos on a slide,” but it has become a popular reference point for categorizing the different players in this bustling market. Sqrrl, a big data start-up, recently published its own version of Feinleib’s chart, its “take on the big data ecosystem.” Sqrrl’s eleven big data “buckets” are somewhat different from Feinleib’s, demonstrating a lack of agreement, understandable at this stage, on what exactly the different segments of the big data market are and what to call them. Furthermore, Sqrrl positions itself “at the intersection of four of these boxes,” which raises questions about the accuracy of its positioning of other big data companies inside just one or two boxes.

Another interesting recent attempt to make sense of the big data landscape comes from The 451’s Matt Aslett in the form of a “Database Landscape Map.” Taking its inspiration from the map of the London Underground and a content technology map from the Real Story Group, it charts the links between an ever-expanding database market and the data storing/organizing/mining technologies and tools (Hadoop, NoSQL, NewSQL…) that now form the core of the big data market.

Which brings me to Bruce Reading, VoltDB, and their take on the big data landscape. “It’s a very noisy market,” Bruce told a packed room at a recent VoltDB event. “It’s like shopping in a mall at Christmas time when there’s a lot of noise and a lot of information about a lot of technologies. We are trying to work with the marketplace to understand what you are trying to accomplish. Instead of using market maps based on technologies, we are looking at use cases.”

“Use case” is technology-speak for the list of requirements for achieving a specific goal, requirements that are embodied in the software that allows the user to achieve that goal. In other words, specialized software focused on addressing some unique need. VoltDB is focused on time (or data velocity) and believes, to quote Bruce, that “the whole world is trying to get as close to real-time as possible because that’s where the greatest value is of a single point of data.” Or, in the words of VoltDB’s website, companies are “devising new ways to identify and act on fast-moving, valuable data,” and VoltDB helps them “narrow the ‘ingestion-to-decision’ gap from minutes, or even hours, to milliseconds.” Which is why they see the “Data Value Chain” like this:

And describe the “Database Universe” like this:

This is the first attempt I’ve seen to map big data technologies based on what these technologies are trying to achieve and the type of data involved (is it unique, an individual item, or is it part of a collection of data?) along three dimensions: time, the value of the data, and application complexity.

The insight behind these charts is that the value of an individual piece of data goes down with time, while the value of a collection of data goes up with time. Maybe this should be called the “Stonebraker Law.” Mike Stonebraker is the database legend (forty years and counting) behind VoltDB and other big data startups. You can watch him, Bruce, and John Piekos, VoltDB’s VP of Engineering, here.
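To make the shape of that tradeoff concrete, here is a minimal toy model in Python. The decay curve, the growth curve, and all the constants are my own illustrative assumptions, not VoltDB’s or Stonebraker’s numbers; the only point is that one curve falls with time while the other rises.

    import math

    # Toy model: a single data point loses value as it ages (exponential
    # decay), while the accumulated collection gains value as it grows.
    # Constants are arbitrary, chosen only to make the two curves visible.

    def individual_value(age_seconds, half_life=60.0):
        """Value of one data point, halving every half_life seconds."""
        return math.exp(-age_seconds * math.log(2) / half_life)

    def collection_value(n_records):
        """Value of the whole collection, growing with its size."""
        return n_records * math.log(1 + n_records)

    # Assume one record arrives per second, so after t seconds the
    # collection holds t records.
    for t in [1, 60, 3600, 86400]:
        print(f"t={t:>6}s  point value={individual_value(t):.4f}  "
              f"collection value={collection_value(t):,.1f}")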

[Originally published on Forbes.com]


A Very Short History of Big Data

In addition to researching A Very Short History of Data Science, I have also been looking at the history of how data became big. Here I focus on the history of attempts to quantify the growth rate in the volume of data or what has popularly been known as the “information explosion” (a term first used in 1941, according to the OED). The following are the major milestones in the history of sizing data volumes plus other “firsts” or observations pertaining to the evolution of the idea of “big data.”

[An updated version of this timeline is at Forbes.com]


The Internet of Things: Why now and how big?

Now that it has been established that the Internet of Things is the most hyped “emerging technology” today, and that the term—and the associated technologies—is far from new, the only question left to answer is: why the sudden surge in interest in 2014?

That’s the question I put to a number of tech luminaries earlier this year. Bob Metcalfe, inventor of the Ethernet and now Professor of Innovation at the University of Texas at Austin, is familiar with technologies that suddenly come to prominence after lengthy incubation periods. Metcalfe points to scribblers like me as the main culprit: “It’s a media phenomenon. Technologies and standards and products and markets emerge slowly, but then suddenly, chaotically, the media latches on and BOOM!—It’s the year of IoT.” Hal Varian, Chief Economist at Google, believes Moore’s Law has something to do with the newfound interest in the IoT: “The price of sensors, processors, and networking has come way down. Since WiFi is now widely deployed, it is relatively easy to add new networked devices to the home and office.”

Janusz Bryzek, known as “the father of sensors” (and a VP at Fairchild Semiconductor), thinks there are multiple factors “accelerating the surge” in interest. First, there is the new version of the Internet Protocol, IPv6, “enabling almost unlimited number of devices connected to networks.” Another factor is that four major network providers—Cisco, IBM, GE and Amazon—have decided “to support IoT with network modification, adding Fog layer and planning to add Swarm layer, facilitating dramatic simplification and cost reduction for network connectivity.” Last but not least, Bryzek mentions new forecasts regarding the IoT opportunity, with GE estimating that the “Industrial Internet” has the potential to add $10 to $15 trillion (with a “T”) to global GDP over the next 20 years, and Cisco increasing to $19 trillion its forecast for the economic value created by the “Internet of Everything” in the year 2020. “This is the largest growth in the history of humans,” says Bryzek.
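To put the IPv6 point in perspective, here is a quick back-of-the-envelope calculation in Python (my own arithmetic, not Bryzek’s): IPv6’s 128-bit addresses dwarf both IPv4’s 32-bit space and any of the device forecasts cited below.

    # Back-of-the-envelope: IPv4 vs. IPv6 address space.
    ipv4_addresses = 2 ** 32    # about 4.3 billion: fewer than one per person
    ipv6_addresses = 2 ** 128   # about 3.4 x 10^38

    print(f"IPv4 addresses: {ipv4_addresses:,}")
    print(f"IPv6 addresses: {ipv6_addresses:,}")

    # Even against ABI Research's forecast of 40.9 billion connected
    # devices in 2020, IPv6 leaves an absurd surplus per device.
    devices_2020 = 40.9e9
    print(f"IPv6 addresses per device: {ipv6_addresses / devices_2020:.3e}")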

These mind-blowing estimates from companies developing and selling IoT-related products and services have no doubt helped fuel the media frenzy. But what do the professional prognosticators say? Gartner estimates that IoT product and service suppliers will generate incremental revenue exceeding $300 billion in 2020. IDC forecasts that the worldwide market for IoT solutions will grow from $1.9 trillion in 2013 to $7.1 trillion in 2020.

Other research firms focus on slices of this potentially trillion-dollar market such as connected cars, smart homes, and wearables. Here’s a roundup of estimates and forecasts for various segments of the IoT market:

ABI Research: The installed base of active wireless connected devices will exceed 16 billion in 2014, about 20% more than in 2013. The number of devices will more than double from the current level, with 40.9 billion forecast for 2020. 75% of the growth between today and the end of the decade will come from non-hub devices: sensor nodes and accessories. The chart above is from ABI’s research on smart cars.

Acquity Group (Accenture Interactive): More than two-thirds of consumers plan to buy connected technology for their homes by 2019, and nearly half say the same for wearable technology. Smart thermostats are expected to reach 43% adoption in the next five years (see chart below).

IHS Automotive: The number of cars connected to the Internet worldwide will grow more than sixfold to 152 million in 2020 from 23 million in 2013.

Navigant Research: The worldwide installed base of smart meters will grow from 313 million in 2013 to nearly 1.1 billion in 2022.

Morgan Stanley: Driverless cars will generate $1.3 trillion in annual savings in the United States, with over $5.6 trillion in savings worldwide.

Machina Research: Consumer Electronics M2M connections will top 7 billion in 2023, generating $700 billion in annual revenue.

ON World: By 2020, there will be over 100 million Internet-connected wireless light bulbs and lamps worldwide, up from 2.4 million in 2013.

Juniper Research: The wearables market will exceed $1.5 billion in 2014, double its 2013 value.

Endeavour Partners: As of September 2013, one in ten U.S. consumers over the age of 18 owns a modern activity tracker. More than half of U.S. consumers who have owned a modern activity tracker no longer use it. A third of U.S. consumers who have owned one stopped using the device within six months of receiving it.

[Originally published on Forbes.com]


When Will Human-Level AI Arrive? Ray Kurzweil (2029) and Rodney Brooks (2117++)

Source: IEEE Spectrum

See also:

AI Researchers Predict Automation of All Human Jobs in 125 Years

Robot Overlords: AI At Facebook, Amazon, Disney And Digital Transformation At GE, DBS, BNY Mellon


Why Ones and Zeros Are Eating the World

30 years ago today, Steve Jobs unveiled the Macintosh. More accurately, The Great Magician took it out of a bag and let it talk to us. The Macintosh, as I learned from first-hand experience in 1984, was a huge leap forward compared to the PCs of the time. But I couldn’t have written and published the previous words and shared a digitized version of Jobs’ performance so easily, to a potential audience of 2.5 billion people, without two other inventions, the Internet and the Web.

45 years ago this year (October 29, 1969), the first ARPANET (later to be known as the Internet) link was established between UCLA and SRI. 25 years ago this year (March 1989), Tim Berners-Lee circulated a proposal for “Mesh” (later to be known as the World Wide Web) to his management at CERN.

The Internet started as a network for linking research centers. The World Wide Web started as a way to share information among researchers at CERN. Both have expanded and today touch a third of the world’s population, because they have been based on open standards. The Macintosh, while a breakthrough in human-computer interaction, was conceived as a closed system and did not break from the path established by its predecessors: It was a desktop/personal mainframe. One ideology was replaced by another, with very little (and very controlled) room for outside innovation. (To paraphrase Search Engine Land’s Danny Sullivan, the Big Brother minions in Apple’s “1984” Super Bowl ad remind one of the people in Apple stores today.)

This is not a criticism of Jobs, nor is it a complete dismissal of closed systems. It may well be that the only way for his (and his team’s) design genius to succeed was by keeping complete ownership of their proprietary innovations. But the truly breakthrough products they gave us—the iPod (and iTunes), and especially the iPhone (and “smartphones”)—were highly dependent on the availability and popularity of an open platform for sharing information, based on the Internet and the Web.

Creating a closed and proprietary system has been the business model of choice for many great inventors and some of the greatest inventions of the computer age. That’s where we were headed in the early 1990s: The establishment of global proprietary networks owned by a few computer and telecommunications companies, whether old (IBM, AT&T) or new (AOL). Tim Berners-Lee’s invention and CERN’s decision to offer it to the world for free in 1993 changed the course of this proprietary march, giving a new—and much expanded—life to the Internet (itself a response to proprietary systems that did not inter-communicate) and establishing a new, open platform for a seemingly infinite number of applications and services.

As Bob Metcalfe told me in 2009: “Tim Berners-Lee invented the URL, HTTP, and HTML standards… three adequate standards that, when used together, ignited the explosive growth of the Web… What this has demonstrated is the efficacy of the layered architecture of the Internet. The Web demonstrates how powerful that is, both by being layered on top of things that were invented 17 years before, and by giving rise to amazing new functions in the following decades.”
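As a small aside, the way those three standards compose is still visible in a few lines of code. Here is a minimal Python sketch (my illustration, not Metcalfe’s or Berners-Lee’s): a URL names a resource, HTTP fetches it, and HTML gives structure to what comes back. It points at the copy of the first web page that CERN keeps online.

    from html.parser import HTMLParser
    from urllib.request import urlopen

    class TitleParser(HTMLParser):
        """Collect the text inside the <title> element of an HTML page."""
        def __init__(self):
            super().__init__()
            self.in_title = False
            self.title = ""

        def handle_starttag(self, tag, attrs):
            if tag == "title":
                self.in_title = True

        def handle_endtag(self, tag):
            if tag == "title":
                self.in_title = False

        def handle_data(self, data):
            if self.in_title:
                self.title += data

    # The URL names the resource; urlopen issues an HTTP GET for it.
    url = "http://info.cern.ch/hypertext/WWW/TheProject.html"
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")

    # HTML structures the response; the parser pulls out one element.
    parser = TitleParser()
    parser.feed(html)
    print(parser.title)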

Metcalfe also touched on the power and potential of an open platform: “Tim Berners-Lee tells this joke, which I hasten to retell because it’s so good. He was introduced at a conference as the inventor of the World Wide Web. As often happens when someone is introduced that way, there are at least three people in the audience who want to fight about that, because they invented it or a friend of theirs invented it. Someone said, ‘You didn’t. You can’t have invented it. There’s just not enough time in the day for you to have typed in all that information.’ That poor schlemiel completely missed the point that Tim didn’t create the World Wide Web. He created the mechanism by which many, many people could create the World Wide Web.”

“All that information” was what the Web gave us (and what was also on the mind of one of the Internet’s many parents, J.C.R. Licklider, who envisioned it as a giant library). But this information comes in the form of ones and zeros; it is digital information. In 2007, when Jobs introduced the iPhone, 94% of storage capacity in the world was digital, a complete reversal from 1986, when 99.2% of all storage capacity was analog. The Web was the glue and the catalyst that would speed up the spread of digitization to all analog devices and channels for the creation, communication, and consumption of information. It has been breaking down proprietary and closed systems, one by one, with the force of its ones and zeros.

Metcalfe’s comments were first published in ON magazine, which I created and published for my employer at the time, EMC Corporation. For a special issue (PDF) commemorating the 20th anniversary of the invention of the Web, we asked some 20 members of the Inforati how the Web has changed their and our lives and what it will look like in the future. Here’s a sample of their answers:

Guy Kawasaki: “With the Web, I’ve become a lot more digital… I have gone from three or four meetings a day to zero meetings per day… Truly the best will be when there is a 3-D hologram of Guy giving a speech. You can pass your hand through him. That’s ultimate.”

Chris Brogan: “We look at the Web as this set of tools that allow people to try any idea without a whole lot of expense… Anyone can start anything with very little money, and then it’s just a meritocracy in terms of winning the attention wars.”

Tim O’Reilly: “This next stage of the Web is being driven by devices other than computers. Our phones have six or seven sensors. The applications that are coming will take data from our devices and the data that is being built up in these big user-contributed databases and mash them together in new kinds of services.”

John Seely Brown: “When I ran Xerox PARC, I had access to one of the world’s best intellectual infrastructures: 250 researchers, probably another 50 craftspeople, and six reference librarians all in the same building. Then one day to go cold turkey—when I did my first retirement—was a complete shock. But with the Web, in a year or two, I had managed to hone a new kind of intellectual infrastructure that in many ways matched what I already had. That’s obviously the power of the Web, the power to connect and interact at a distance.”

Jimmy Wales: “One of the things I would like to see in the future is large-scale, collaborative video projects. Imagine what the expense would be with traditional methods if you wanted to do a documentary film where you go to 90 different countries… with the Web, a large community online could easily make that happen.”

Paul Saffo: “I love that story of when Tim Berners-Lee took his proposal to his boss, who scribbled on it, ‘Sounds exciting, though a little vague.’ But Tim was allowed to do it. I’m alarmed because at this moment in time, I don’t think there are any institutions out there where people are still allowed to think so big.”

Dany Levy (founder of DailyCandy): “With the Web, everything comes so easily. I wonder about the future and the human ability to research and to seek and to find, which is really an important skill. I wonder, will human beings lose their ability to navigate?”

Howard Rheingold: “The Web allows people to do things together that they weren’t allowed to do before. But… I think we are in danger of drowning in a sea of misinformation, disinformation, spam, porn, urban legends, and hoaxes.”

Paul Graham: “[With the Web] you don’t just have to use whatever information is local. You can ship information to anyone anywhere. The key is to have the right filter. This is often what startups make.”

How many startups have flourished on the basis of the truly great products Apple has brought to the world? And how many startups and grown-up companies today are entirely based on an idea first fleshed out in a modest proposal 25 years ago? And there is no end in sight for the expanding membership of the latter camp, which now increasingly includes the analogs of the world. All businesses, all governments, all non-profits, all activities are being eaten by ones and zeros. Tim Berners-Lee has unleashed an open, ever-expanding system for the digitization of everything.

We also interviewed Berners-Lee in 2009. He said that the Web has “changed in the last few years faster than it changed before, and it is crazy for us to imagine this acceleration will suddenly stop.” He pointed out the ongoing tendency to lock what we do with computers in a proprietary jail: “…there are aspects of the online world that are still fairly ‘pre-Web.’ Social networking sites, for example, are still siloed; you can’t share your information from one site with a contact on another site.” But he remained both realistic and optimistic, the hallmarks of an entrepreneur: “The Web, after all, is just a tool…. What you see on it reflects humanity—or at least the 20 percent of humanity that currently has access to the Web… No one owns the World Wide Web, no one has a copyright for it, and no one collects royalties from it. It belongs to humanity, and when it comes to humanity, I’m tremendously optimistic.”

[Originally published on Forbes.com]


The Startup Unicorn Explosion (Infographic)

CB Insights:

We looked at all still-private unicorns since 2011 and charted them based on when they first joined the unicorn club. While initially the chart shows unicorns being created at a relatively calm pace, the rhythm accelerates noticeably in late 2013 (right around the time Aileen Lee wrote her famous post coining the term unicorn in November 2013). Since then, there has been an explosion in unicorn creation, with over 60 new unicorns in 2015 alone.

See also The Unicorn List (updated in real time)


The Big Data Explosion (Infographic)

Lotsa data in this Infographic about data growth


The Data Science Interview: Mingsheng Hong, Hadapt

Data scientists are data junkies: when they see a new data set, they are just naturally excited and can’t wait to explore.

Mingsheng Hong is Chief Data Scientist at Hadapt, a Boston-based startup that offers an analytical platform that integrates structured and unstructured data in one cloud-optimized system. Before joining Hadapt, Mingsheng was Field CTO for Vertica. He holds a Ph.D. in Computer Science from Cornell and a BSc in Computer Science from Fudan University. Mingsheng is president of NECINA and is active in St. Baldrick’s Foundation, a volunteer-driven charity that funds research to find cures for childhood cancers. I talked to Mingsheng just before he shaved his head, a visual indicator and act of solidarity expected from successful St. Baldrick’s fundraisers.

As a graduate student, were you thinking of an academic career?

At Cornell, I explored both academic and private industry career tracks. I love research and innovation, and discovered my passion for explaining ideas to people from various backgrounds and getting them excited about these ideas. While that aligns with a more academic track, in the end I decided the private sector was a better fit for me. I’m driven by the challenge of taking an idea and carrying it end-to-end, from idea to product development to sales. During graduate school, I had the opportunity to visit Microsoft for a few summers, and I got a lot of exposure to database R&D and came away with a good feel for the industry. My research work there was commercialized in SQL Server 2008 and 2012, which was very exciting.
