Open Data Science Workshops


Big Data and the Internet of Things

By Melanie Jutras


It seems hard to believe, but Big Data is about to become even bigger. This is due in part to the growing volume of data produced by the Internet of Things (IoT). The IoT could involve as many as 50 billion connected devices by the year 2020.(1) Consider the staggering amount of data that this many devices might produce. How can we access this data, how can we analyze it, and how can we put it to good use? Dr. Kirk Borne, one of the most knowledgeable data science speakers in the world, will be speaking at ODSC East about his views on open data and how we can put it to good use.


Although Dr. Borne is an expert in the field of data science, he also feels strongly that people from all professions should become data literate.(2) Everybody needs to understand what Big Data is and how we can use it. Now, at the advent of a data surge driven by the Internet of Things, data literacy and user-friendly tools will become increasingly important.

When we talk about the Internet of Things, people tend to think of personal devices and gadgets used by individuals. While gadgets such as wearable technology and smart home monitoring devices are part of the picture, they barely scratch the surface. The IoT is essentially made up of sensors that can be placed anywhere in order to collect data. Sensors will collect data in industries such as agriculture, automotive and retail.

Among other things, they will be used for security, monitoring and automation. If you stop to think about how much data 50 billion connected devices might produce, you will soon realize that the Internet of Things is not so much about the devices that are connected. Rather, it is about the enormous data stream that these “things” will produce.
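To get a feel for the scale, here is a rough back-of-the-envelope sketch. Only the 50 billion device count comes from the estimate cited above; the reading size (100 bytes) and the reporting rate (one reading per minute) are purely illustrative assumptions, and real devices vary enormously.

```python
# Back-of-the-envelope estimate of daily IoT data volume.
# Only DEVICES comes from the article; the other two values are
# hypothetical assumptions chosen for illustration.

def daily_volume_bytes(devices, bytes_per_reading, readings_per_day):
    """Return the total bytes produced per day by all devices."""
    return devices * bytes_per_reading * readings_per_day

DEVICES = 50_000_000_000      # 50 billion connected devices (cited estimate)
BYTES_PER_READING = 100       # assumption: one small sensor reading
READINGS_PER_DAY = 24 * 60    # assumption: one reading per minute

total_bytes = daily_volume_bytes(DEVICES, BYTES_PER_READING, READINGS_PER_DAY)
petabytes = total_bytes / 1e15  # decimal petabytes

print(f"{petabytes:.1f} PB per day")
```

Even with these modest assumptions the total comes to several petabytes every day. Change the reading size or the reporting rate and the figure scales proportionally.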

A recent blog post by Dr. Vincent Granville highlights a number of sensor data set repositories.(3) Data sets collected across many sectors, including energy, healthcare, weather and transportation, are available for viewing and analysis. One of these data sets, published by Microsoft Research, provides 15 million data points of sensor data collected from taxi cabs in order to research driving directions.(4)

More information on this data can be found in the research paper, T-Drive: Driving Directions based on Taxi Traces.(5)

Another sensor data set involves 160 million observations recorded by 20 thousand weather stations, published on datahub by the Linking Open Data Cloud organization.(6) These are just a couple of examples of data that is being collected every day and is available for analysis.

While it is powerful to have the ability to gather millions of pieces of sensor data, the next obvious problem is dealing with the data. How does one go about managing and analyzing such a large data set? There are various products available for analysis and visualization of enterprise data. A list of some of these can be found in a recent Data Science Central blog post, Eight IoT Analytics Products.(7)

Popular commercial products such as Dell Statistica and IBM IoT Platform are highlighted. These are valuable for professionals in many different fields who may not be data science researchers, yet still need to deal with Big Data. Another of the analytics products highlighted is the Intel IoT Analytics Platform.

Intel’s IoT cloud analytics site is provided as a service to the IoT development community. Intel’s Internet of Things group has played a key role in an ongoing project building an open cloud-based platform to accelerate cancer research.(8) This is just one example of using data for social good. We all ought to be thinking about what types of questions can be answered from Big Data and how we can put it to good use. With the amount of data expected from connected devices, the possibilities for analysis are endless.



This blog was originally posted at