Machine Learning Overview

Machine learning is a form of data analysis that automates analytical model building. Algorithms are designed to build models that uncover connections organizations can use to make better decisions without human intervention. Although machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data is a recent development.

At Intersys, our highly skilled consultants will design and build analytical models to extract valuable insights from your data. These models can be designed to perform a variety of tasks. Some of the use cases we have recently developed for organizations include predictive modeling using deep learning, data segmentation/clustering, and A/B testing implementations.

Machine Learning Use Case #1 – Predictive Modeling Using Deep Learning


Outline of the Concept

In certain cases, important relations embedded in your data are hard to extract by means of traditional methods. In such cases, neural networks can be leveraged to develop classifiers. Once trained, the network can serve any business unit with low latency.

Customer/Business Need

Massive volumes of data are stored in data lakes while individual business units own particular datasets, making it difficult to combine diverse datasets into actionable insight. One example comes from goods distribution: a common challenge for distributors is determining the appropriate quantity of a certain product to order. This is an optimization problem in which you want to avoid both inventory shortages and excess stock. By collecting the transaction history of similar products and training a neural network to determine the level of demand for that product, it is possible to inform the decision of how many units to order.
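To make the framing concrete, here is a minimal sketch of how a transaction history could be turned into a labeled dataset for demand classification. The thresholds, feature choices, and data are illustrative assumptions, not Intersys specifics.

```python
# Illustrative sketch (hypothetical data): bucketing weekly unit sales into
# demand levels so that a classifier can be trained on product history.

def demand_label(units_sold, low=50, high=150):
    """Bucket weekly unit sales into a demand level (thresholds assumed)."""
    if units_sold < low:
        return "low"
    elif units_sold < high:
        return "medium"
    return "high"

# Each row: (avg weekly sales, unit price, promotion flag) -- assumed features
transactions = [
    (30, 9.99, 0),
    (120, 7.49, 1),
    (210, 5.99, 1),
]

# Pair each feature row with its demand label
dataset = [(row, demand_label(row[0])) for row in transactions]
```

A real pipeline would derive richer features from the data lake, but the labeling idea is the same.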

Intersys Solution and Application

The first step is to focus attention on a Key Performance Indicator and understand how it is affected. A process of variable selection is conducted, and the selected data is moved to a different layer. Several network topologies are tested using combinations of key parameters such as the learning rate and cost function. The resulting network is deployed in the production environment by wrapping it inside a RESTful service.
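The topology search described above can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data; the candidate layer sizes, learning rates, and data are assumptions, not a production configuration.

```python
# Sketch: try several network topologies and learning rates, keep the best.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))                    # selected variables (synthetic)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # demand level (binary here)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

best_score, best_net = -1.0, None
for hidden in [(8,), (16, 8)]:                   # candidate topologies (assumed)
    for lr in [0.001, 0.01]:                     # candidate learning rates
        net = MLPClassifier(hidden_layer_sizes=hidden, learning_rate_init=lr,
                            max_iter=500, random_state=0)
        net.fit(X_train, y_train)
        score = net.score(X_test, y_test)
        if score > best_score:
            best_score, best_net = score, net
```

The winning `best_net` can then be serialized and exposed behind a REST endpoint (for example via a Spring or Flask service, matching the stack listed below).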

Technologies Used

R, Python, Java, Spring


Machine Learning Use Case #2 – Data Segmentation/Clustering


Outline of the Concept

Important individual differences exist among items within your datasets. Understanding and exploiting these differences is key to extracting meaningful insights that support decision making.

Customer/Business Need

Organizations apply data segmentation/clustering to more easily identify meaningful segments in their data lakes: best clients, potential new clients, best geographical areas, best products, and opportunities for up-selling to name a few. Once segments are identified, actions such as marketing campaigns can be deployed to more valuable segments rather than to the whole population.

Intersys Solution and Application

Once your data is properly stored in a data lake, an analysis is performed over the different data sources to identify the variables that best separate items into groups. An extraction job then moves the selected data to a different layer, where a clustering analysis classifies each item into a single group. The newly enriched data is then sent to the right business unit.
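The clustering step might look like the following minimal sketch, assuming the extracted layer has already been reduced to a numeric feature matrix. The features, cluster count, and data are illustrative assumptions.

```python
# Sketch: assign each client to a single segment with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Assumed per-client features: annual spend, orders per year
clients = np.vstack([
    rng.normal([100, 2], 5, size=(50, 2)),     # occasional buyers
    rng.normal([900, 24], 20, size=(50, 2)),   # best clients
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(clients)
labels = km.labels_    # each item classified into exactly one group
```

In practice the number of clusters is chosen with diagnostics such as the elbow method or silhouette scores, and the resulting labels are written back for the business unit to act on.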

Technologies Used

Clustering (Spark ML, R, Python), relational or NoSQL databases


Machine Learning Use Case #3 – A/B Testing: UX Optimization Driven by Solid Science


Outline of the Concept

A/B testing compares two versions of a web page to determine which one performs better. Performance is measured by conversion events, which can be macro-conversions (creating an account, downloading a demo, purchasing, etc.) or micro-conversions (spending more than a minute on a particular page, navigating more than three pages, etc.).
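A small sketch of how conversion events translate into rates, assuming event logs arrive as (visitor, event) pairs; the event names and figures are illustrative.

```python
# Sketch: compute macro- and micro-conversion rates from an event log.
MACRO = {"create_account", "download_demo", "purchase"}
MICRO = {"time_on_page_gt_1min", "viewed_gt_3_pages"}

def conversion_rate(events, visitors, kinds):
    """Share of visitors with at least one event of the given kinds."""
    converted = {v for v, e in events if e in kinds}
    return len(converted) / visitors

# Hypothetical log: visitor 1 purchased, visitors 1 and 2 browsed deeply
events = [(1, "purchase"), (2, "viewed_gt_3_pages"), (1, "viewed_gt_3_pages")]
macro = conversion_rate(events, visitors=10, kinds=MACRO)   # 0.1
micro = conversion_rate(events, visitors=10, kinds=MICRO)   # 0.2
```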

Customer/Business Need

Improve UX to increase conversion. There are three stages where solid science is needed to properly leverage this technique:

  1. Hypothesis definition: Running A/B tests is expensive. Rather than randomly designing experiments, it’s much better to find data patterns that can be translated into well-defined hypotheses.
  2. Estimation of test duration: Several variables must be considered when estimating test duration. It’s also important to understand the difference between the frequentist and the Bayesian approaches.
  3. Implementation: It’s paramount to have in place the required pipelines to properly assign test conditions and to record test outcomes.
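To illustrate the duration-estimation stage, here is a standard frequentist sample-size calculation for comparing two conversion rates. The baseline rate, target lift, and traffic figure are assumptions chosen for the example, not client numbers.

```python
# Sketch: sample size per variant for a two-proportion test, then the
# implied test duration given daily traffic (frequentist approach).
from statistics import NormalDist

def sample_size_per_variant(p1, p2, alpha=0.05, power=0.8):
    """Visitors needed per variant to detect p1 -> p2 at the given power."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return num / (p1 - p2) ** 2

n = sample_size_per_variant(0.05, 0.06)   # baseline 5%, hoping for 6%
days = n * 2 / 1000                       # two variants, ~1000 visitors/day
```

A Bayesian approach replaces this fixed-horizon calculation with a stopping rule based on the posterior probability that one variant is better, which is why it can finish sooner when strong prior data exists.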

Intersys Solution and Application

  1. Hypothesis definition: Techniques such as funnel analysis, classification trees, and Goals, Operators, Methods, and Selection rules (GOMS) are applied to back up promising experiments.
  2. Estimation of test duration: A data profile is run to understand the traffic to the different pages. The expected change is estimated based on the hypothesis. If there is enough prior data, a Bayesian approach is used, as it requires less time to identify a winning condition.
  3. Implementation: We use feature flags and bucketing to properly assign visitors to the right experimental conditions. Conversions are recorded per condition. When one condition wins over the other, an alert is sent to notify users.
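The bucketing idea in step 3 can be sketched with deterministic hashing, so that a returning visitor always lands in the same condition. This is a generic technique shown for illustration, not LaunchDarkly's internal algorithm; the experiment name is hypothetical.

```python
# Sketch: stable assignment of visitors to experimental conditions by
# hashing the visitor id together with the experiment name.
import hashlib

def assign_condition(visitor_id, experiment="checkout_cta", variants=("A", "B")):
    """Deterministically map a visitor to one of the variants."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Repeated calls give the same answer for the same visitor.
cond = assign_condition("user-42")
```

Because assignment is a pure function of the ids, no assignment table is needed; conversions streamed through Kafka can be joined to conditions at read time.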

Technologies Used

Bayesian and frequentist modeling (hypothesis design), LaunchDarkly (feature management & bucketing), Kafka (streaming), Cassandra (storage), Java/Scala (development)

Ready to Get Started?

To learn more about how Intersys can help you with your next data or digital initiative, please fill out the information below and a member of our team will get back to you shortly.
