Data Mining

What is Data Mining?

Data mining is the use of programming languages, machine learning, statistical algorithms, databases, and computing power to discover patterns and trends in large (big) data sets.

The term is something of a misnomer, so it’s easy to assume that data mining is about extracting (mining) large data sets themselves. It’s actually the extraction of patterns, trends, and predictive indicators from large data sets.

It’s considered a subset of the broader Business Intelligence world, but it is an area receiving a great deal of focus, where technologies and processes continue to be developed, refined, and matured. It has roots in the older world of “Data Warehousing” and “Decision Support” but is evolving into the more complex, modern world of “Artificial Intelligence”, “Machine Learning”, “Deep Learning”, and “Predictive Analytics”.

At a high level, data mining has three phases that span various tools, technologies, and processes. The first phase is Data Exploration & Staging, the second is Model Building & Validation, and the third is Analytics Deployment.

Data Exploration & Staging Phase

Source active and/or inactive data; cleanse, transform, and enrich it; and stage it in centralized data warehouses and/or data lakes. Then use tools such as CrossEngage or Qlik Sense to carry out and highlight exploratory analyses using statistical and graphical (data visualization) methods.
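The cleanse, transform, enrich, and stage steps can be sketched in a few lines of Python. This is a minimal, standard-library-only sketch and does not reflect how CrossEngage or Qlik Sense work: the record fields (customer_id, region, spend), the high-value threshold, and the function names are hypothetical, and a real pipeline would load into a warehouse or data lake rather than an in-memory list.

```python
import statistics

# Hypothetical raw records; field names and values are illustrative only.
RAW = [
    {"customer_id": "1", "region": " north ", "spend": "120.50"},
    {"customer_id": "2", "region": "South", "spend": ""},        # missing value
    {"customer_id": "3", "region": "north", "spend": "80"},
    {"customer_id": "3", "region": "north", "spend": "80"},      # exact duplicate
    {"customer_id": "4", "region": "EAST", "spend": "300.00"},
]

def cleanse(records):
    """Drop rows with a missing spend and exact duplicates; normalize types and text."""
    seen = set()
    clean = []
    for r in records:
        if not r["spend"].strip():
            continue
        row = {
            "customer_id": int(r["customer_id"]),
            "region": r["region"].strip().lower(),
            "spend": float(r["spend"]),
        }
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        clean.append(row)
    return clean

def enrich(records, high_value_threshold=100.0):
    """Add a derived field; the threshold is an arbitrary example value."""
    return [dict(r, high_value=r["spend"] >= high_value_threshold) for r in records]

def explore(records):
    """Simple exploratory statistics: row count and mean spend per region."""
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["spend"])
    return {
        region: {"count": len(values), "mean": statistics.mean(values)}
        for region, values in sorted(by_region.items())
    }

staged = enrich(cleanse(RAW))
print(explore(staged))
```

In practice the exploration output would feed charts in a visualization tool; here it is just a dictionary of per-region summaries.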

Machine & Deep Learning Model Building & Validation

This phase involves building and validating the right types of models to answer the forward-looking questions your organization needs answered. Evaluate predictive performance (results that stay stable and accurate as the input data varies) to select the models that best answer the types of questions being asked. This can be a complicated process: it involves applying various models to the same large data sets and analyzing the resulting output to find the model that gives the best and most consistent results.
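One common way to measure predictive performance is k-fold cross-validation: each candidate model is trained on k-1 folds and scored on the held-out fold, so both its average error and the spread of its errors across folds (a rough stability measure) are visible. The sketch below assumes a toy synthetic data set and two hypothetical candidates, a constant baseline and a least-squares line; it uses unshuffled folds for determinism, which a real evaluation would usually randomize.

```python
import statistics

def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds (no shuffling, for determinism)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = statistics.mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Ordinary least squares fit of y = a + b*x (assumes xs are not all equal)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def cross_validate(fit, xs, ys, k=5):
    """Return the mean absolute error on each held-out fold."""
    errors = []
    for test_idx in kfold_indices(len(xs), k):
        test = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in test]
        train_y = [y for i, y in enumerate(ys) if i not in test]
        model = fit(train_x, train_y)
        errors.append(statistics.mean(abs(model(xs[i]) - ys[i]) for i in test_idx))
    return errors

def select_model(candidates, xs, ys, k=5):
    """Pick the lowest mean error; report (mean error, std dev across folds) per model."""
    report = {}
    for name, fit in candidates.items():
        errs = cross_validate(fit, xs, ys, k)
        report[name] = (statistics.mean(errs), statistics.pstdev(errs))
    best = min(report, key=lambda name: report[name][0])
    return best, report

xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]  # synthetic, roughly linear
best, report = select_model({"mean": fit_mean, "linear": fit_linear}, xs, ys)
print(best)  # linear
```

Reporting the spread alongside the mean lets you prefer a slightly less accurate model whose results vary less from fold to fold, if consistency matters more for the question at hand.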

Stacking (stacked generalization), meta-learning, boosting, and bagging (combined through voting or averaging) are all examples of ensemble techniques that can be employed to uncover patterns and answer the desired questions.
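Bagging is the simplest of these ensemble techniques to show compactly. The sketch below assumes one-dimensional toy data and a decision stump as the base learner (both illustrative choices). It trains each stump on a bootstrap resample drawn with replacement and combines them by majority vote; for a regression problem the vote would be replaced by averaging. Stacking, meta-learning, and boosting are not shown.

```python
import random
from collections import Counter

def fit_stump(points):
    """Base learner: a one-feature threshold classifier (decision stump).

    points is a list of (x, label) pairs with labels 0 or 1. Each observed x is
    tried as a threshold, and the rule with the fewest training errors is kept.
    """
    best = None
    for t, _ in points:
        for low_label in (0, 1):
            errors = sum(
                1 for x, y in points
                if (low_label if x <= t else 1 - low_label) != y
            )
            if best is None or errors < best[0]:
                best = (errors, t, low_label)
    _, t, low_label = best
    return lambda x: low_label if x <= t else 1 - low_label

def bagging(points, n_models=25, seed=0):
    """Bootstrap aggregating: train each stump on a resample drawn with replacement."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models

def vote(models, x):
    """Combine the base models by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]

# Synthetic 1-D data: label is 1 when x > 5, with one noisy label at x = 3.
data = [(x, int(x > 5)) for x in range(11)]
data[3] = (3, 1)
ensemble = bagging(data)
print([vote(ensemble, x) for x in (1, 4, 8, 10)])
```

An odd number of base models avoids ties in a two-class vote.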

On the programmatic side of predictive analysis, machine learning can be implemented. It is based on a set of algorithms that attempt to model high-level abstractions in data. Identified models are turned into algorithms aligned with business and industry insights and strategies, which expert data scientists and programmers then script into actual programs using leading tools and coding languages. These models can then be set up to run on triggers or particular events in a system such as a website or a mobile or software application, or to run continuously in real time as data is streamed into the data mining models.
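Event-triggered, real-time scoring can be pictured as a loop that scores each incoming event and fires actions when a score crosses a threshold. In this sketch the churn_score function is a hypothetical stand-in for a trained model, and StreamScorer, the event fields, and the 0.5 threshold are likewise illustrative assumptions; a production system would read from a message queue or webhook instead of a Python list.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, float]

def churn_score(event: Event) -> float:
    """Stand-in for a trained model: a hypothetical churn-risk score in [0, 1]."""
    score = 0.1
    if event.get("days_since_login", 0) > 30:
        score += 0.5
    if event.get("support_tickets", 0) >= 3:
        score += 0.3
    return min(score, 1.0)

class StreamScorer:
    """Scores each incoming event and fires registered triggers at or above a threshold."""

    def __init__(self, model: Callable[[Event], float], threshold: float):
        self.model = model
        self.threshold = threshold
        self.triggers: List[Callable[[Event, float], None]] = []

    def on_high_score(self, action: Callable[[Event, float], None]) -> None:
        self.triggers.append(action)

    def process(self, events: Iterable[Event]) -> List[float]:
        scores = []
        for event in events:  # a list here; a stream or queue consumer in practice
            s = self.model(event)
            scores.append(s)
            if s >= self.threshold:
                for action in self.triggers:
                    action(event, s)
        return scores

alerts = []
scorer = StreamScorer(churn_score, threshold=0.5)
scorer.on_high_score(lambda e, s: alerts.append((e["user"], s)))

stream = [
    {"user": 1, "days_since_login": 2, "support_tickets": 0},
    {"user": 2, "days_since_login": 45, "support_tickets": 1},
    {"user": 3, "days_since_login": 60, "support_tickets": 4},
]
scorer.process(stream)
print(alerts)
```

Keeping the model behind a plain callable means a retrained model can be swapped in without changing the trigger logic.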

Analytics Deployment

Now that the models have been built and validated, this phase generates the analytics, patterns, trends, and predictive results that the data mining system was created for. It is all about making the models available to users so they can apply the predictive models and make specific business decisions; leverage or embed the models into web or mobile applications; and create self-service reports where users request predictions, compare models, view patterns and trends, or get the most likely answers and predictions for business questions.
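Making validated models available to users can be as simple as a registry that serves predictions on request and compares published models on the same labelled data. The ModelRegistry class, the two published models, and the mean-absolute-error comparison below are illustrative assumptions rather than any specific product’s API; a real deployment would usually put this behind a web service or reporting tool.

```python
import statistics

class ModelRegistry:
    """Minimal in-memory registry: publish models, serve predictions, compare them."""

    def __init__(self):
        self._models = {}

    def publish(self, name, predict_fn):
        self._models[name] = predict_fn

    def predict(self, name, x):
        if name not in self._models:
            raise KeyError(f"unknown model: {name}")
        return self._models[name](x)

    def compare(self, xs, ys):
        """Mean absolute error of every published model on the same labelled data."""
        return {
            name: statistics.mean(abs(fn(x) - y) for x, y in zip(xs, ys))
            for name, fn in self._models.items()
        }

registry = ModelRegistry()
registry.publish("baseline", lambda x: 10.0)        # hypothetical models
registry.publish("trend", lambda x: 2.0 * x + 1.0)

print(registry.predict("trend", 4))                   # 9.0
print(registry.compare([1, 2, 3], [3.0, 5.0, 7.0]))   # {'baseline': 5.0, 'trend': 0.0}
```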