Big Data Friday: Borasky’s Law

  • Murphy’s Law: Anything that can go wrong, will go wrong.
  • O’Toole’s Corollary: Murphy was an optimist.
  • Sturgeon’s Law: 95 percent of everything is crap.
  • Mencken’s Law: Nobody ever went broke underestimating the intelligence of the American public.

Borasky’s Law: Sturgeon and Mencken were optimists, too.

Source: What Hath Von Neumann Wrought?

Posted in Misc

LinkedIn’s Daniel Tunkelang on How to Interview a Data Scientist

Tunkelang: The O’Reilly Strata Conference brings together an incredible community of people working on big data. This year, I decided to do something different for my presentation. Rather than talk about science or technology, I addressed the practical problem of interviewing the candidates to build teams of data scientists.

[slideshare id=16798687&w=427&h=356&sc=no]

Posted in Big Data Analytics, Big Data Jobs, Data Science, Data Science Careers, Data Scientists

Vincent Granville’s 66 job interview questions for data scientists

 

  1. What is the biggest data set that you have processed? How did you process it, and what were the results?
  2. Tell me two success stories about your analytic or computer science projects. How was lift (or success) measured?
  3. What is: lift, KPI, robustness, model fitting, design of experiments, 80/20 rule?
  4. What is: collaborative filtering, n-grams, map reduce, cosine distance?
  5. How to optimize a web crawler to run much faster, extract better information, and better summarize data to produce cleaner databases?
  6. How would you come up with a solution to identify plagiarism?
  7. How to detect individual paid accounts shared by multiple users?
  8. Should click data be handled in real time? Why? In which contexts?
  9. What is better: good data or good models? And how do you define “good”? Is there a universal good model? Are there any models that are definitely not so good?
  10. What is probabilistic merging (AKA fuzzy merging)? Is it easier to handle with SQL or other languages? Which languages would you choose for semi-structured text data reconciliation?

To see the other 56 questions assessing “the technical horizontal knowledge of a senior candidate at a high level” go here 

Posted in Data Science, Data Science Careers, Data Scientists

Data Science at Netflix with Elastic MapReduce

[youtube http://www.youtube.com/watch?v=oGcZ7WVx6EI]

Posted in Data Science

DJ Patil at LeWeb, December 2012

[youtube http://www.youtube.com/watch?v=J_CYKk8q1Ao]

Summary of the presentation by Ben Rooney here

Update: Ben Rooney interviews DJ Patil

[youtube http://www.youtube.com/watch?v=0LtzMhr0ZCM]

Posted in Data Science, Data Scientists

Past Courses in Big Data Analytics and Data Science: Content Online

Analyzing Big Data with Twitter (UC Berkeley, School of Information) (Fall 2012)

Introduction to Data Science (Columbia University, Statistics Department) (Fall 2012)

Introduction to Data Science (UC Berkeley, Computer Science) (Spring 2011)

Posted in Big Data Analytics, Big Data Education, Data Science, Data Science Education

Social Media & Web Analytics Innovation Summit

Join me in Boston this September 13 & 14 for the exclusive Social Media & Web Analytics Innovation Summit – bringing together the industry’s most innovative leaders and professionals for two days in an open and interactive environment.

The event will combine keynote presentations from over 35 industry experts, with interactive breakout sessions and open discussion. There will also be networking opportunities and workshops to share industry insights and innovation with your peers.

Confirmed Speakers include:

– Vice President, Digital Marketing & Analytics, Discovery
– Vice President, Web Analytics, Amazon
– Head, Digital Marketing, Siemens
– Senior Vice President, Research, NBCUniversal
– Director, Product Intelligence, Salesforce
– Senior Director, Personalization & Targeting, CBS Interactive
– Director, Global Social Media, Ancestry
– Director, Business Intelligence, KPMG
– And many more…

Register online at http://analytics.theiegroup.com/social-boston/registration

Posted in Misc

Big Data Quotes of the Week: August 10, 2012

“With big data, you have only two concerns, but they are, naturally, big ones: where the data will come from and what your company will do with it. Solve these and you have big data licked… IT projects have to be fully buzzword-compliant or they’ll fail. For a big data project, this means Hadoop. If you don’t want to invest staff time and energy learning this technology, do what my client did: Build a virtual server, install MySQL on it, and assign the name ‘Hadoop’ to the server. When your BDSC (big data steering committee) asks if you’ve installed Hadoop, you can answer in the affirmative with a clear conscience.” – Bob Lewis

Posted in Big Data Analytics, Data Scientists, Quotes

Big Data Events

Big Data and Data Science Events

August – November 2012

Last updated August 6, 2012

Highlights: Partner Events

Big Data Innovation Summit September 13-14, Boston

Predictive Analytics World–Government September 17-18, Washington DC

To get 15% off the 2-Day and Combo passes, use this code: WTBDBP12

Predictive Analytics World September 30-October 4, Boston

To get 15% off the 2-Day and Combo passes, use this code: WTBDBP12

Text Analytics World October 3-4, Boston

To get 15% off the 2-Day and Combo passes, use this code: WTBDBP12

Predictive Analytics World November 6-7, Düsseldorf

To get 15% off the 2-Day and Combo passes, use this code: WTBDBP12

Predictive Analytics World November 27-28, London

To get 15% off the 2-Day and Combo passes, use this code: WTBDBP12

The 13th Annual International Conference on Information Reuse and Integration August 8-10, Las Vegas

The 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining August 12-16, Beijing, China

Posted in Big Data Analytics, Predictive analytics

Top 5 Data Science Influencers: July 22, 2012

1. Vincent Granville, AnalyticBridge, @analyticbridge

2. Alex Popescu, MyNoSQL, @al3xandru

3. Gregory Piatetsky, KDnuggets, @kdnuggets

4. Harish Kotadia, Infosys, @HKotadia

5. David Smith, Revolution Analytics, @revodavid

Source: Traackr

Posted in Data Science