Big Data Buzzwords for 2015 And What They Mean

This was applicable five years ago and we are still introducing people to these terms. We are knee-deep in implementations, but the buzzwords still hide the real meaning and value.

Vin Vashishta | Originally Published: December 22nd, 2014

It is the end of the year and everywhere you turn there are predictions for 2015. The predictions for the future of data are some of the toughest to decipher. Analytics and big data easily have the most obscure vernacular with new terms and acronyms being tossed around all the time. If you are looking for some clarity, you are not alone. I have compiled a list of the big data buzzwords for the coming year. Let me know which ones you want more clarity on in the comments and I will get those added.

IoT (Internet of Things) and IoE (Internet of Everything) Now 5G

The Internet of Things is the term used for giving electronics and machines access to the internet and the ability to gather data.

Thousands of startups and the top names in tech are jumping on this trend now and into 2015. This one is not a fad and it will have a BIG impact on 2015 and beyond. It is starting in homes, offices, and manufacturing plants. Everything from thermostats to refrigerators to projectors to assembly line robotics is gathering information and then transmitting that data over the internet for analysis.

What can you get from a refrigerator? Grocery buying and diet habits are among the data points businesses would be interested in. From a feature standpoint, a consumer would be able to auto-generate a shopping list from that data. The hypothetical applications are endless, which is why it is generating so much buzz.

The Internet of Everything is the next step for the internet of things, connecting people, data, cities, coffee mugs…literally everything together.

Although companies like Cisco are preparing for this now, the Internet of Everything is probably a few years out (welcome to a few years out). The infrastructure and protocols for the Internet of Everything are being built now because the ramp-up needed will be huge. Everything from internet connection speeds to the architecture of the web needs to be modified to accommodate that many connections. Many are speculating that the Internet of Everything is the end of the web. In its current form, the World Wide Web cannot support the Internet of Everything and its potential uses. Large infrastructure companies are investing heavily in preparing for the transition. Why? Cisco estimates the Internet of Everything to be “a $19 trillion global opportunity over the next decade.” This trend will be much more transformative than the Internet of Things but is also further out in the future.

Quantified Self

Quantified Self is the term used to describe the personal, consumer use of data about ourselves and our habits.

Athletes are the biggest participants in Quantified Self. They track data about their diet, exercise regime, weight, body fat, hydration levels, oxygen levels and more to maximize their progress towards specific fitness goals. The same techniques which are being automated for business optimization are also being automated for consumer use. In 2015, apps that allow people to track data points about themselves for fitness, general health, stress management, career planning, education and much more will take off. Sensors and wearables will enable the collection of data for these apps and analytics will help people optimize their daily lives to achieve their goals.

Artificial Intelligence, Neural Networks, and Machine Learning

If you are a data scientist, these terms are part of your job. In 2015, they will enter the vocabulary of nearly everyone who consumes analytics as well. It is good to understand them at a high level so conversations about software that uses them make more sense.

Machine Learning is just what it sounds like: teaching a machine how to learn something specific and then apply it in a useful way.

Machine learning is used to predict an unknown based on what we do know. If I have some demographic information about a person and I want to predict what political party they associate with, I would use a machine learning algorithm (algorithm is a $10 word for equation or set of equations or model). You will hear two subcategories of machine learning: supervised and unsupervised learning. Supervised machine learning uses large datasets to train and hone the model. These datasets already have the correct answer filled in so the model can be trained to fit to the data. Unsupervised learning uses data as well, but the answers are not known in those datasets. We use a few techniques to help a machine learn to spot meaningful patterns in large datasets that help predict a specific trait or steps that lead to the desired outcome.
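To make the supervised versus unsupervised distinction concrete, here is a minimal sketch in plain Python. It is an illustration, not a production approach: the dataset, the two features (age and income), the party labels, and the function names `predict_party` and `two_means` are all invented for this example. The supervised part uses a simple nearest-neighbor vote, and the unsupervised part uses a bare-bones k-means with two groups.

```python
from collections import Counter
from math import dist

# Hypothetical labeled data: (age, income in $1000s) -> party label.
# Every value here is invented for illustration only.
LABELED = [
    ((22, 30), "Party A"),
    ((25, 35), "Party A"),
    ((30, 40), "Party A"),
    ((55, 90), "Party B"),
    ((60, 110), "Party B"),
    ((65, 95), "Party B"),
]


def predict_party(person, labeled=LABELED, k=3):
    """Supervised: vote among the k labeled examples closest to `person`."""
    nearest = sorted(labeled, key=lambda row: dist(row[0], person))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def two_means(points, iterations=10):
    """Unsupervised: split unlabeled points into two groups (k-means, k=2)."""
    centers = [min(points), max(points)]  # simple deterministic starting guess
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            nearer = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            groups[nearer].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return groups


# Supervised: the answers (labels) were available during "training".
print(predict_party((27, 38)))

# Unsupervised: same people, but the labels are thrown away first.
unlabeled = [features for features, _ in LABELED]
print(two_means(unlabeled))
```

In the supervised half the model gets the answers up front; in the unsupervised half it only gets the raw points and has to discover that they fall into two groups on its own. A real project would also scale the features and hold out data for testing, which this sketch skips.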

Think of machine learning in terms of your own learning. In school we learn by seeing example after example of a concept until we can recognize the pattern behind it and apply the concept to novel situations. Later in life we do not always have a teacher and we learn concepts through experience. That is, more or less, how machine learning works too.

Neural Networks are a machine learning tool which works well for complex problem solving.

Neural Networks’ architectures used to be inspired by how biological neural networks function. We have come a long way since 2015. They are incredibly good at solving problems that other machine learning models struggle with. It is a complex topic but that is really all you need to know to understand it at a high level.
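For readers who want to see what "a problem other models struggle with" can look like, here is a small sketch. The exclusive-or (XOR) function is a classic case that a single linear unit cannot represent, but a tiny network with one hidden layer can. The weights below are set by hand purely for illustration (a real network would learn them from data), and the function names are my own.

```python
def step(x):
    """Threshold activation: fire (1) when the weighted input is positive."""
    return 1 if x > 0 else 0


def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus a bias, then activation."""
    return step(sum(i * w for i, w in zip(inputs, weights)) + bias)


def xor_network(a, b):
    """Two hidden neurons feed one output neuron; weights are hand-picked."""
    h_or = neuron((a, b), (1, 1), -0.5)   # fires if a OR b is on
    h_and = neuron((a, b), (1, 1), -1.5)  # fires only if a AND b are on
    return neuron((h_or, h_and), (1, -1), -0.5)  # "OR but not AND"


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

The point is the layering: each neuron on its own is very simple, but stacking them lets the network express a pattern that no single neuron can.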

Machine Learning and Neural Networks are what data science applications use under the hood to turn data into insights. Many players are working on reducing the amount of effort involved in data science. Applications which use Neural Networks and Machine Learning automate what data scientists are creating manually now. It will bring advanced analytics within reach of a lot more businesses in 2015 while driving the costs down significantly.

Artificial Intelligence is a hotly contested term. When does a neural network or machine learning algorithm start to qualify as an Artificial Intelligence? That is under debate. Artificial Intelligence in a practical, business application is quite a way off. I know you will hear it thrown around a lot in 2015 especially in ethical discussions. It is also going to be big at the box office with over a half-dozen AI movies coming out next year. Stephen Hawking and Elon Musk believe that AI poses a threat to humanity while others present more rational views on the impacts of AI. It is such a compelling subject with a deep connection to our understanding of consciousness as well as what it means to be intelligent. This term has legs and practical applications. That combination will make for colorful conversations in 2015 about AI.

Data Wrangling

Data Wrangling is what data scientists have to do with raw data to make it manageable and useful.

Data typically starts out in a variety of forms. With sources ranging from spreadsheets to tweets to emails to third-party sources, the way data comes to a data scientist is frequently unusable without a significant amount of work. That is what we refer to as Data Wrangling. Some estimates say that Data Wrangling takes up about 70% to 80% of a data scientist’s time. I can speak from personal experience to say that is not far off and I deeply hate Data Wrangling.
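Here is a rough sketch of what that work tends to look like in practice. The records, field names, and date formats are hypothetical, invented to mimic three sources that disagree on capitalization, date style, and number formatting. The helper names `parse_date` and `clean` are my own.

```python
from datetime import datetime

# Hypothetical raw records as they might arrive from different sources.
RAW = [
    {"Name": "  Pat Lee ", "signup": "2014-12-22", "spend": "1,200.50"},
    {"name": "SAM RIVERA", "signup": "12/22/2014", "spend": "300"},
    {"name": "alex  kim", "signup": "22.12.2014", "spend": ""},
]

# Date formats assumed for this example; real data usually needs more.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_date(text):
    """Try each known format; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean(record):
    """Map one messy record onto a single consistent schema."""
    row = {key.lower(): value for key, value in record.items()}
    spend = row.get("spend", "").replace(",", "").strip()
    return {
        "name": " ".join(row.get("name", "").split()).title(),
        "signup": parse_date(row.get("signup", "")),
        "spend": float(spend) if spend else None,
    }


for record in RAW:
    print(clean(record))
```

Three records needed three date formats, whitespace cleanup, case normalization, and a decision about missing values. Multiply that by dozens of fields and millions of rows and the 70% to 80% estimate starts to look reasonable.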

It is an expensive and painful problem, which means there are several companies working on a solution. 2015 will see these apps save time and sanity for data science teams.

NoSQL

NoSQL is a type of database that can handle data that is not strictly structured.

The variety of data sources requires new types of databases to handle unstructured data. That can be free text data like tweets or emails. It can also be data that defies traditional relational definitions, where the relationship between one point of data and another is not a straight line. These relationships are significant because they help establish patterns for machine learning so having a database that preserves them is a big help.
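To show what "not strictly structured" means, here is a small sketch using plain Python dictionaries as stand-ins for documents. This is not the API of any particular NoSQL database; the sample documents, their fields, and the `find` helper are all made up for illustration.

```python
import json

# Hypothetical documents: each record carries only the fields its source provides.
documents = [
    {"type": "tweet", "user": "@sample_user", "text": "Love the new thermostat!"},
    {"type": "email", "from": "a@example.com", "subject": "Invoice", "attachments": 2},
    {
        "type": "sensor",
        "device": "fridge-01",
        "readings": {"temp_c": 3.5, "door_open": False},
    },
]


def find(docs, **criteria):
    """Return documents whose top-level fields match every criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


print(json.dumps(find(documents, type="sensor"), indent=2))
```

A traditional relational table would force all three records into the same set of columns. Here each document keeps its own shape, including nested values, and can still be queried.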

As the IoT ramps up, the number of data sources will increase in 2015, making applications that can handle diverse types of data useful to data scientists. Cassandra, MongoDB, Couchbase, HBase and many others fall into this category.

Sentiment Analysis and Intent Analysis

Sentiment Analysis mines what people say on social sites, comments on articles, surveys, and reviews as well as behavioral data to determine how they feel about a product, company, or policy.

Sentiment Analysis has a lot of practical business applications. For marketing departments, it is a window into how customers really feel about products, marketing campaigns and brands. For HR departments it helps build a picture of employee engagement. It can be used to gauge how investors feel about a business. 2015 will be filled with new uses for sentiment analysis as well as new tools to help businesses get it done.
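At its simplest, sentiment analysis can be sketched as counting positive and negative words. The example below is a deliberately naive illustration: the word lists and sample reviews are invented, and real tools handle negation, sarcasm, and context, which this does not.

```python
import re

# Tiny hand-made lexicon, invented for illustration.
POSITIVE = {"love", "great", "excellent", "happy", "recommend"}
NEGATIVE = {"hate", "terrible", "broken", "slow", "refund"}


def sentiment_score(text):
    """Positive minus negative word count; >0 leans positive, <0 leans negative."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


reviews = [
    "I love this product, great battery life!",
    "Terrible support and the app is slow.",
    "It arrived on Tuesday.",
]
for review in reviews:
    print(sentiment_score(review), review)
```

Run across thousands of reviews or comments, even a crude score like this starts to show which products or campaigns people are happy or unhappy with.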

Intent Analysis predicts what people are likely to do based on what they are saying and patterns of activity.

An exciting field of predictive analytics is Intent Analysis. By looking at how groups behave and what they are saying it is possible to paint a picture of what they are most likely to do next. It is also possible to model how they will respond to an ad campaign, a product, or a new company policy. In 2015 intent analysis will be used by IT, marketing, HR, compliance, product management, and many other business groups. Businesses looking for a competitive advantage are quickly adopting technologies and hiring the people who can bring the benefits of intent analysis to life. That trend will continue into 2015 and beyond as this technology becomes more widely available and more accurate.
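One simple way to picture "what they are most likely to do next" is to count which action tends to follow which. The sketch below does exactly that with a handful of invented customer sessions. The action names, the data, and the helpers `build_transitions` and `likely_next` are hypothetical, and real intent models draw on far richer signals than this.

```python
from collections import Counter, defaultdict

# Hypothetical activity logs: one ordered list of actions per customer.
SESSIONS = [
    ["view_product", "read_reviews", "add_to_cart", "purchase"],
    ["view_product", "read_reviews", "add_to_cart", "abandon"],
    ["view_product", "add_to_cart", "purchase"],
    ["view_product", "read_reviews", "leave"],
]


def build_transitions(sessions):
    """Count how often each action is followed by each next action."""
    transitions = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            transitions[current][following] += 1
    return transitions


def likely_next(transitions, action):
    """Return (most common next action, its observed share), or None if unseen."""
    counts = transitions.get(action)
    if not counts:
        return None
    following, seen = counts.most_common(1)[0]
    return following, seen / sum(counts.values())


model = build_transitions(SESSIONS)
print(likely_next(model, "view_product"))
print(likely_next(model, "add_to_cart"))
```

With enough history, the same counting idea can suggest where to step in, for example by offering help at the point where customers most often abandon a cart.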