Predicting the Presidential Election: What Went Wrong? (Part 2)

[Image: polls_exitchange]

Wall Street Journal:

When asked about what qualities matter most, about four in 10 people picked the ability to bring about change, and Mr. Trump won more than 80% of their votes. Mrs. Clinton was heavily favored by voters who put more value on someone who “cares about people like me,” has good judgment and, especially, has the right experience.

Bloomberg Businessweek

Long before election night, Trump’s data operatives, in particular those contracted from Cambridge Analytica, understood that his voters were different. And to better understand how they differed from Ryan-style Republicans, they set off to study them.

The firm called these Trump supporters “disenfranchised new Republicans”: younger than traditional party loyalists and less likely to live in metropolitan areas. They share Bannon’s populist spirit and care more than other Republicans about three big issues: law and order, immigration, and wages.

They also harbored a deep contempt for the reigning political establishment in both parties, along with a desire to return the country to happier times. Trump was the key that fit in this lock.

Posted in Data Science, Misc

Predicting the Presidential Election: What Went Wrong?

[Image: polls_preselections16]

KDnuggets:

…a good lesson for Data Scientists is to question their assumptions and to be especially skeptical when predicting a rare event with limited history using human behavior.

Posted in Data Science, Misc

Machine Learning and AI Market Landscape, 2016

[Image: mi-landscape-3-7]

Shivon Zilis and James Cham, O’Reilly:

For the first time, a “one stop shop” of the machine intelligence stack is coming into view—even if it’s a year or two off from being neatly formalized. The maturing of that stack might explain why more established companies are more focused on building legitimate machine intelligence capabilities. Anyone who has their wits about them is still going to be making initial build-and-buy decisions, so we figured an early attempt at laying out these technologies is better than no attempt.

Shivon Zilis and James Cham, Harvard Business Review:

If this year’s landscape shows anything, it’s that the impact of machine intelligence is already here. Almost every industry is already being affected, from agriculture to transportation. Every employee can use machine intelligence to become more productive with tools that exist today. Companies have at their disposal, for the first time, the full set of building blocks to begin embedding machine intelligence in their businesses.

And unlike with the internet, where latecomers often bested those who were first to market, the companies that get started immediately with machine intelligence could enjoy a lasting advantage.

Posted in AI, Machine Learning

Country Ranking of IoT Preparedness

[Image: iot_idcindex]

IDC:

[This is an] updated index ranking the Group of 20 (G20) nations on their preparedness for Internet of Things (IoT) development. The original index was first published in 2013 but this updated index is now comprised of 13 criteria that IDC views as necessary for sustained development of the IoT and reflects each nation’s economic stature, technological preparedness, and business readiness to benefit from the efficiencies linked to IoT solutions.

The United States, South Korea, and the United Kingdom ranked as the three countries most ready to generate and benefit from the IoT. The U.S. scored particularly well on measures such as ease of doing business, government effectiveness, innovation, and cloud infrastructure, as well as GDP and technology spending as a percent of GDP. South Korea, despite a modest GDP, scored extremely well on IoT-specific spending and has a business environment that fosters innovation and promotes attractive investment opportunities. Similarly, the U.K. scored very highly on measures of ease of doing business, government effectiveness, regulatory quality, start-up procedures, innovation, and broadband penetration.

The standout country in the ranking proved to be Australia, which, despite its relatively small GDP, scored exceptionally high on ease of doing business and start-up procedures, government effectiveness and regulatory quality, and innovation and education. Australia’s scores point to a country that has the necessary ingredients for a business environment that is ready for the growth of IoT.
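
For readers curious about the mechanics, a composite readiness index like this is usually built by normalizing each criterion and combining the results with weights. The sketch below illustrates only that general pattern; it is not IDC's methodology, and the criteria, weights, countries, and scores are made-up placeholders.

```python
# Generic composite-index sketch (NOT IDC's methodology or data):
# min-max normalize each criterion across countries, then take a weighted average.
import numpy as np

criteria = ["ease_of_doing_business", "govt_effectiveness", "innovation", "broadband"]
weights = np.array([0.3, 0.2, 0.3, 0.2])         # assumed weights, summing to 1

raw_scores = {                                   # hypothetical raw scores per country
    "Country A": np.array([82.0, 1.5, 60.0, 95.0]),
    "Country B": np.array([75.0, 1.2, 55.0, 88.0]),
    "Country C": np.array([68.0, 0.9, 62.0, 80.0]),
}

matrix = np.array(list(raw_scores.values()))
normalized = (matrix - matrix.min(axis=0)) / (matrix.max(axis=0) - matrix.min(axis=0) + 1e-9)
index = normalized @ weights                     # one composite score per country

for country, score in zip(raw_scores, index):
    print(f"{country}: {score:.2f}")
```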

Posted in Internet of Things

Mobile advertising now accounts for nearly half of online ad budgets

[Image: mobile_ad]

Financial Times:

Spending on mobile advertising in the US soared 89 per cent to $15.5bn in the first half of the year, taking up nearly half of online ad budgets, new data show. Mobile makes up 47 per cent of all online ad expenditures — up from 30 per cent a year ago and far surpassing the 19 per cent share taken by banner ads, according to a report from the Interactive Advertising Bureau and PwC, the professional services firm.
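
A quick back-of-the-envelope check of the quoted figures (a sketch, not IAB/PwC's own calculation) gives the implied size of the total online ad market and of mobile spending a year earlier.

```python
# Arithmetic implied by the quoted figures (a sketch, not IAB/PwC's calculation).
mobile_h1 = 15.5        # $bn of mobile ad spend in the first half of the year
mobile_share = 0.47     # mobile's share of all online ad spend
yoy_growth = 0.89       # year-over-year growth in mobile ad spend

total_online = mobile_h1 / mobile_share      # implied total online ad spend
mobile_prev = mobile_h1 / (1 + yoy_growth)   # implied mobile spend a year earlier

print(f"Implied total online ad spend: ${total_online:.1f}bn")     # ~$33.0bn
print(f"Implied mobile ad spend a year ago: ${mobile_prev:.1f}bn")  # ~$8.2bn
```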

Posted in Misc

9 Categories of Data Scientists

[Image: DataScience_categories.PNG]

DataViz:

  • Those strong in statistics: they sometimes develop new statistical theories for big data that even traditional statisticians are not aware of. They are experts in statistical modeling, experimental design, sampling, clustering, data reduction, confidence intervals, testing, predictive modeling, and other related techniques.
  • Those strong in mathematics: NSA (national security agency) or defense/military people working on big data, astronomers, and operations research people doing analytic business optimization (inventory management and forecasting, pricing optimization, supply chain, quality control, yield optimization) as they collect, analyse and extract value out of data.
  • Those strong in data engineering, Hadoop, database/memory/file systems optimization and architecture, API’s, Analytics as a Service, optimization of data flows, data plumbing.
  • Those strong in machine learning / computer science (algorithms, computational complexity)
  • Those strong in business, ROI optimization, decision sciences, involved in some of the tasks traditionally performed by business analysts in bigger companies (dashboards design, metric mix selection and metric definitions, ROI optimization, high-level database design)
  • Those strong in production code development, software engineering (they know a few programming languages)
  • Those strong in visualization
  • Those strong in GIS, spatial data, data modeled by graphs, graph databases
  • Those strong in a few of the above. After 20 years of experience across many industries, big and small companies (and lots of training), I'm strong in stats, machine learning, business, and mathematics, and more than just familiar with visualization and data engineering. This could happen to you as well over time, as you build experience. I mention this because so many people still think that it is not possible to develop a strong knowledge base across multiple domains that are traditionally perceived as separate (the silo mentality). Indeed, that's the very reason why data science was created.

Posted in Data Science Careers

Fintech Financing Worldwide 2010-2015

[Images: fintech_segment, fintech_financing]

Posted in Misc

Updated Laws of Robotics

[Image: futurism_roboticslaws]

Posted in Robotics

The Evolution of Data Scientists, 2006-2016

[Images: datascientist_2006, datascientist_2011, datascientist_2016]

Gaurav Vohra, CEO & Co-founder of Jigsaw Academy:

Here are my predictions for 2021.

  • Machine learning and deep learning will become much more popular. More data and better processing power will enable a lot more analysis of different data. Those who develop these skills will be in demand.
  • There will be a lot more data generated through the Internet of Things. This data will be bigger and messier. Data scientists who develop skills to work with IoT data will have an advantage.
  • Specialized roles will continue to evolve. Specializations will become more logical and some of the confusion around them today will disappear in the next 5 years.
  • Analytics will play an important role in hiring for analytics (and all other roles). We are already seeing evidence of this, and I think data-driven hiring is coming very soon.

Posted in Data Science Careers, Data Scientists

DeepBench from Baidu: Benchmarking Hardware for Deep Learning

[Image: baidu3]

Source: Greg Diamos and Sharan Narang, “The need for speed: Benchmarking deep learning workloads,” O’Reilly AI Conference

At the O’Reilly Artificial Intelligence conference, Baidu Research announced DeepBench, an open source benchmarking tool for evaluating the performance of deep learning operations on different hardware platforms. Greg Diamos and Sharan Narang of Baidu Research’s Silicon Valley AI Lab talked at the conference about the motivation for developing the benchmark and why faster computers are crucial to the continued success of deep learning.

The harbinger of the current AI Spring, deep learning is a machine learning method using “artificial neural networks,” moving vast amounts of data through many layers of hardware and software, each layer coming up with its own representation of the data and passing what it “learned” to the next layer. As a widely publicized deep learning project demonstrated four years ago, feeding such an artificial neural network with images extracted from 10 million videos can result in the computer (in this case, an array of 16,000 processors) learning to correctly identify and label an image of a cat. One of the leaders of that “Google Brain” project was Andrew Ng, who is today the Chief Scientist at Baidu and the head of Baidu Research.
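
To make the layered-representation idea concrete, here is a minimal sketch, in plain NumPy rather than anyone's production code, of data flowing through a stack of layers, each producing a new representation that is handed to the next; the layer widths and random, untrained weights are assumptions chosen purely for illustration.

```python
# Minimal sketch of data passing through stacked layers (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def layer(x, w, b):
    """One fully connected layer with a ReLU nonlinearity."""
    return np.maximum(0.0, x @ w + b)

# Hypothetical sizes: a batch of 8 inputs with 1024 features,
# pushed through three layers of shrinking width.
widths = [1024, 512, 256, 10]
x = rng.standard_normal((8, widths[0]))

for n_in, n_out in zip(widths[:-1], widths[1:]):
    w = rng.standard_normal((n_in, n_out)) * 0.01  # untrained weights
    b = np.zeros(n_out)
    x = layer(x, w, b)                             # this layer's representation

print(x.shape)  # (8, 10): the final layer's representation of the batch
```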

Research areas of interest to Baidu Research include image recognition, speech recognition, natural language processing, robotics, and big data. Its Silicon Valley AI Lab has deep learning and systems research teams that work together “to explore the latest in deep learning algorithms as well as find innovative ways to accelerate AI research with new hardware and software technologies.”

DeepBench is an attempt to accelerate the development of the hardware foundation for deep learning, by helping hardware developers optimize their processors for deep learning applications, and specifically, for the “training” phase in which the system learns through trial and error. “There are many different types of applications in deep learning—if you are a hardware manufacturer, you may not understand how to build for them. We are providing a tool for people to help them see if a change to a processor [design] improves performance and how it affects the application,” says Diamos.  One of the exciting things about deep learning for him (and no doubt for many other researchers) is that “as the computer gets faster, the application gets better and the algorithms get smarter.”

A case in point is speech recognition, or more specifically DeepSpeech, Baidu Research’s “state-of-the-art speech recognition system developed using end-to-end deep learning.” The most important aspect of this system is its simplicity, says Diamos: audio on one end, text on the other, and a single learning algorithm (a recurrent convolutional neural network) sitting in the middle. “We can take exactly the same architecture and apply it to both English and Mandarin with greater accuracy than systems we were building in the past,” says Diamos.
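
As a rough illustration of what “audio on one end, text on the other, a single model in the middle” can look like, here is a toy end-to-end model in PyTorch. It is not Baidu's DeepSpeech architecture or code; the layer sizes, spectrogram feature dimension, and character vocabulary are assumptions.

```python
# Toy end-to-end speech model sketch (not DeepSpeech): spectrogram frames in,
# per-frame character log-probabilities out, suitable for a CTC-style loss.
import torch
import torch.nn as nn

class TinySpeechModel(nn.Module):
    def __init__(self, n_features=161, n_chars=29, hidden=256):
        super().__init__()
        self.conv = nn.Conv1d(n_features, hidden, kernel_size=11, stride=2, padding=5)
        self.rnn = nn.GRU(hidden, hidden, num_layers=2, batch_first=True,
                          bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_chars)   # per-frame character scores

    def forward(self, x):                   # x: (batch, time, n_features)
        x = self.conv(x.transpose(1, 2))    # -> (batch, hidden, time/2)
        x, _ = self.rnn(x.transpose(1, 2))  # -> (batch, time/2, 2*hidden)
        return self.out(x).log_softmax(-1)  # log-probs over characters per frame

model = TinySpeechModel()
dummy_audio = torch.randn(4, 200, 161)      # 4 utterances, 200 spectrogram frames
print(model(dummy_audio).shape)             # torch.Size([4, 100, 29])
```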

In Mandarin, the system is more accurate in transcribing audio to text than native speakers, who may have difficulty understanding what is said because of noise or accent. The data set used by DeepSpeech is also very large because it was created by mixing hours of synthetic noise with the raw audio, explains Narang. The largest publicly available data set is about 2,000 hours of audio recordings, while the one used by DeepSpeech clocks in at 100,000 hours, or 10 terabytes, of data.
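
The kind of noise mixing described can be sketched in a few lines: scale a noise clip so that it sits at a chosen signal-to-noise ratio below the speech, then add the two. This is an illustrative stand-in, not Baidu's actual pipeline; the sampling rate and SNR below are assumed values.

```python
# Sketch of additive noise augmentation (illustrative, not Baidu's pipeline).
import numpy as np

def add_noise(speech, noise, snr_db):
    """Mix `noise` into `speech` at the given signal-to-noise ratio (in dB)."""
    noise = noise[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    # Scale the noise so that speech_power / (scale^2 * noise_power) == 10^(snr/10).
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16_000)   # stand-in for 1 second of 16 kHz audio
noise = rng.standard_normal(16_000)    # stand-in for synthetic noise
augmented = add_noise(speech, noise, snr_db=10)
```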

The approach taken by the developers of DeepSpeech is superior to other approaches, argue Narang and Diamos. Traditional speech recognition systems, built around a “hand-designed algorithm,” get more accurate with more data but eventually saturate, requiring a domain expert to develop a new algorithm. The hybrid approach adds a deep convolutional neural network; the result is better scaling, but performance again eventually saturates. DeepSpeech uses deep learning as the entire algorithm and achieves continuous improvement in performance (accuracy) with larger data sets and larger models (more and bigger layers).

Bigger is better. But to capitalize on this feature (pun intended) of deep learning, you need faster computers. “The biggest bottleneck,” says Narang, “is training the model.” He concludes: “Large data sets, a complex model with many layers, and the need to train the model many times is slowing down deep learning research. To make rapid progress, we need to reduce model training time. That’s why we need tools to benchmark the performance of deep learning training. DeepBench allows us to measure the time it takes to perform the underlying deep learning operation. It establishes a line in the sand that will encourage hardware developers to do better by focusing on the right issues.”
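
In that spirit, a toy micro-benchmark might simply time one core training operation, such as a dense matrix multiply, at large sizes. This is not the DeepBench harness itself, which exercises GEMM, convolution, and recurrent kernels through vendor libraries on the target hardware; the shapes below are assumptions for illustration.

```python
# Toy micro-benchmark of a dense matrix multiply (not the DeepBench harness).
import time
import numpy as np

def time_gemm(m, n, k, repeats=10):
    a = np.random.rand(m, k).astype(np.float32)
    b = np.random.rand(k, n).astype(np.float32)
    a @ b                                          # warm-up run
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    elapsed = (time.perf_counter() - start) / repeats
    gflops = 2.0 * m * n * k / elapsed / 1e9       # ~2*M*N*K flops per multiply
    return elapsed, gflops

for shape in [(1024, 1024, 1024), (4096, 512, 4096)]:   # assumed, training-like sizes
    secs, gflops = time_gemm(*shape)
    print(f"GEMM {shape}: {secs * 1e3:.2f} ms, {gflops:.1f} GFLOP/s")
```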

Originally published on Forbes.com

Posted in deep learning