Data Automation Overview

What is Data Automation?

Modern data and analytics are redefining the way companies go to market and gain competitive advantage in their space. To realize value from data, companies need to quickly move large amounts of structured and unstructured data to multiple data warehouses, data lakes, and enterprise search systems. Once data is gathered, it must be managed efficiently so that teams can act on it and build data products on top of it. The need to juggle on-premises and cloud systems at the same time only complicates this process for many companies. One important question that needs to be asked as this trend continues is: How can data ingestion keep pace with new and changing data sources? The answer: Data Automation.

At Intersys, our highly skilled consultants will create data automation for you by designing and building the infrastructure and data pipelines needed to move your data to the right systems using purpose-fit technologies. Being able to reliably provide this data for analytics, search, reporting, IoT, and machine learning use cases is key to an enterprise in the age of big data.

Data Automation Use Case #1 – Speed Build of Data Pipelines

 

Outline of the Concept

Big Data tools and technologies have virtually removed the technical limitations to data processing. While this has been a game changer for many organizations, the capability is still constrained when data engineers must hand-code every data ingestion job.

At Intersys, we understand that data automation is critical to keep up with ever-changing data sources. We create and leverage tools – such as a code generator – to automate the discovery and creation of data ingestion jobs. Our goal is to discover source metadata and automate the build of data ingestion pipelines to a data lake or any other required endpoint. These processes may still need some enhancement and optimization by the data engineering team, but we can significantly reduce development time and create scalability, getting data to information consumers faster than ever. Automation is a key tenet of the Intersys Continuous Analytics Framework.
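To make the discovery step concrete, here is a minimal sketch of how source metadata might be gathered from a relational system using standard JDBC calls. The connection URL, credentials, and console output are illustrative assumptions, not the Intersys generator itself.

    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.DriverManager;
    import java.sql.ResultSet;

    // Hypothetical example: discover tables and columns from a JDBC source
    // so that ingestion jobs can later be generated from the metadata.
    public class SourceMetadataDiscovery {

        public static void main(String[] args) throws Exception {
            // Assumed connection details; substitute the real source system.
            String url = "jdbc:postgresql://source-db:5432/sales";
            try (Connection conn = DriverManager.getConnection(url, "reader", "secret")) {
                DatabaseMetaData meta = conn.getMetaData();

                // List every table in the default schema.
                try (ResultSet tables = meta.getTables(null, "public", "%", new String[] {"TABLE"})) {
                    while (tables.next()) {
                        String table = tables.getString("TABLE_NAME");
                        System.out.println("table: " + table);

                        // List each column with its type name, the raw material
                        // for generating an ingestion job definition.
                        try (ResultSet cols = meta.getColumns(null, "public", table, "%")) {
                            while (cols.next()) {
                                System.out.println("  " + cols.getString("COLUMN_NAME")
                                        + " : " + cols.getString("TYPE_NAME"));
                            }
                        }
                    }
                }
            }
        }
    }

In practice, the discovered tables and columns would be captured as a model that a generator consumes to build ingestion jobs, as described below.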

Customer/Business Need

Imagine yourself as a business stakeholder who has just launched a new product. There is a new web and mobile application to support this product. Questions are pouring into a newly created chatbot, and there is a lot of buzz about the product on a popular social network. As the primary product owner, you want access to all this data (and more) without delay.

Many times, you’ll have to wait for data models and data pipelines to be created. This can take days, weeks, or even months depending on your development processes. Instead, data automation makes these raw data sources available in hours, on the same day of the request, enabling reporting, visualization, and exploration for stakeholders, analysts, and data scientists.

Intersys Solution and Application

Our mission at Intersys is to use our time-tested principles plus new tools, technologies, and techniques to reduce your time to data and insight. We use a model-driven approach (Generator) to automate complex, repetitive development tasks. The Intersys data ingestion pipeline framework achieves this goal by enabling automation of the ingestion of data from many different sources. Leveraging our generator, you can easily set up the ingestion of data by type and location to be fed into your data lake or data warehouse.
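To illustrate the model-driven idea, the hedged sketch below merges a simple table model (like one produced by the discovery step above) into a text template to produce an ingestion job definition. The TableModel record and the template text are illustrative stand-ins, not the actual templates or model schema used by the Intersys generator.

    import java.util.List;

    // Hypothetical model-driven generation: a small table model is merged into
    // a job template to produce an ingestion job definition.
    public class IngestionJobGenerator {

        // Minimal stand-in for a model schema entry produced by metadata discovery.
        record TableModel(String sourceTable, List<String> columns, String targetPath) {}

        // Illustrative template; a real framework would keep templates in separate files.
        private static final String TEMPLATE = """
                job:
                  name: ingest_${table}
                  source:
                    table: ${table}
                    columns: ${columns}
                  target:
                    path: ${target}
                """;

        static String generate(TableModel model) {
            return TEMPLATE
                    .replace("${table}", model.sourceTable())
                    .replace("${columns}", String.join(", ", model.columns()))
                    .replace("${target}", model.targetPath());
        }

        public static void main(String[] args) {
            TableModel orders = new TableModel(
                    "orders", List.of("order_id", "customer_id", "amount"), "/lake/raw/orders");
            System.out.println(generate(orders));
        }
    }

Emitting a declarative job definition rather than hand-written code keeps the generated output easy to review, version, and re-generate as sources change.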

Technologies Used

Intersys has implemented variants of our data ingestion pipeline automation framework at several client locations. While primarily utilized for big data applications (Storm topologies, Spark jobs, Kafka components, Oozie flows, Sqoop jobs), it has also been deployed as a lightweight development tool that can generate a significant portion of a component for almost any given software architecture. Wherever there is a repetitive data development task – data ingestion, APIs, services, and more – Intersys can significantly reduce cost and time associated with these efforts and, more importantly, get data to the people that need it most without delay.


Data Automation Use Case #2 – Fast Access to Processed Data

 

Outline of the Concept

The goal of modern data engineering is to process and store vast amounts of data, often in near-real time. Once data has been produced, making it accessible is just as crucial. A typical approach is to provide data access through web services or APIs, which must be tested thoroughly to provide low-latency access at sufficient throughput, with the reliability levels that consumers require.
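As a rough illustration of such an access layer, the sketch below shows a JAX-RS resource of the kind a Dropwizard service might expose over a click-stream store. The path, query parameters, and the ClickEventStore interface are assumptions for illustration, not an Intersys API, and the JAX-RS annotations require the usual dependency on the classpath.

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.QueryParam;
    import javax.ws.rs.core.MediaType;
    import java.util.List;

    // Hypothetical JAX-RS resource (the style used by Dropwizard services) that
    // exposes processed click-stream events to downstream consumers.
    @Path("/events")
    @Produces(MediaType.APPLICATION_JSON)
    public class ClickEventResource {

        // Illustrative read-side abstraction over the distributed store
        // (e.g., Cassandra or MongoDB); not a real Intersys interface.
        public interface ClickEventStore {
            List<ClickEvent> findByCustomer(String customerId, int limit);
        }

        public record ClickEvent(String customerId, String page, long timestampMillis) {}

        private final ClickEventStore store;

        public ClickEventResource(ClickEventStore store) {
            this.store = store;
        }

        @GET
        public List<ClickEvent> recentEvents(@QueryParam("customerId") String customerId,
                                             @QueryParam("limit") int limit) {
            // Keep the handler thin: query the store, cap the page size,
            // and let JSON serialization produce the response body.
            return store.findByCustomer(customerId, Math.min(limit, 1000));
        }
    }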

Customer/Business Need

Click-stream event data that continually streams into distributed databases can help a business gauge customer behavioral patterns in near-real time and identify trends and anomalies as they emerge. To develop web services and APIs to access this data, detailed knowledge of the domain events is required, as well as know-how for building performant code, queries, and load tests.

Intersys Solution and Application

Intersys has developed a method that has been tested and proven at major banks, insurance companies, retailers, and software development firms. This method can achieve a 4x to 40x improvement in development time, reduce defects, and dramatically improve architectural governance over the development process. We start by identifying a candidate architecture (if one is not already selected), and then build a single end-to-end implementation using best practices for aspects such as logging, exception handling, monitoring, and performance.

We then use a model-driven development approach to analyze all of the development work-products that were involved in creating the solution: code artifacts, documentation, scripts, unit and integration tests, load/performance tests, and any other text-based artifacts that are required. This set of artifacts is analyzed and transformed into templates and a model schema. We use specialized tooling for this step, which reduces the effort from weeks to one or two days. The end product is an automation tool that uses the model schema and templates to produce web services and APIs in a fraction of the time it would normally take. In some cases, 100% of the code, tests, documentation, and related artifacts that would otherwise take over a week to write by hand can be produced in minutes.

A crucial feature of our tooling is that the resulting code can be re-generated whenever templates are updated or improved, while any manual changes that developers have made are preserved. This allows architects to focus on keeping projects updated to the latest infrastructure and developers to focus on business logic. The result is more code, delivered faster, with fewer defects, and a governance process that works with modern development teams.
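One common way to implement this kind of regeneration – and an assumption on our part, not a description of the actual Gramar-based tooling – is to merge marked "protected" regions from the previously generated file into the fresh output. The sketch below shows the idea with hypothetical PROTECTED-START/PROTECTED-END markers.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical re-generation merge: content between PROTECTED markers in the
    // previously generated file replaces the corresponding region in fresh output,
    // so hand-written business logic survives template updates.
    public class ProtectedRegionMerger {

        private static final Pattern REGION = Pattern.compile(
                "// PROTECTED-START:(\\S+)\\n(.*?)// PROTECTED-END", Pattern.DOTALL);

        static String merge(String freshlyGenerated, String previousVersion) {
            // Collect the manually edited bodies from the previous file, keyed by region id.
            Map<String, String> preserved = new HashMap<>();
            Matcher prev = REGION.matcher(previousVersion);
            while (prev.find()) {
                preserved.put(prev.group(1), prev.group(2));
            }

            // Re-emit the fresh output, swapping in preserved bodies where they exist.
            StringBuilder out = new StringBuilder();
            Matcher fresh = REGION.matcher(freshlyGenerated);
            int last = 0;
            while (fresh.find()) {
                out.append(freshlyGenerated, last, fresh.start());
                String id = fresh.group(1);
                String body = preserved.getOrDefault(id, fresh.group(2));
                out.append("// PROTECTED-START:").append(id).append("\n")
                   .append(body)
                   .append("// PROTECTED-END");
                last = fresh.end();
            }
            out.append(freshlyGenerated.substring(last));
            return out.toString();
        }
    }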

Technologies Used

Dropwizard, JMeter, MongoDB, Elasticsearch, Cassandra, Redis, Gramar


Data Automation Use Case #3 – Automate the Load and Management of Cloud/SaaS Applications for Your Customer 360

 

Outline of the Concept

Most organizations today use third-party cloud-based applications to manage a significant part of their business and customer data. Data integration can be more challenging than ever, since these black-box applications mean fewer standards can be defined and enforced. Data engineering should not only enable this type of integration, but also provide a high degree of automation to ingest critical data from these applications.

Customer/Business Need

Customer data and related events help provide a complete view of customer interactions, one that can enable a better experience and help you understand and predict behaviors. Companies must be able to quickly scour their enterprise data and stitch together the true and complete picture of their customers. This can even be a necessity due to regulations such as the General Data Protection Regulation (GDPR). Whether used for competitive advantage or to meet regulatory requirements, you must be able to locate, collect, and manage your customer data.

Intersys Solution and Application

At Intersys, we’ve created Skye, a platform with a connector library for popular third-party applications that store both structured and unstructured data. This library includes document repositories like Microsoft SharePoint, content management systems like OpenText TeamSite, project management and collaboration tools like Atlassian Confluence and Jira, and many others. Skye leverages an open microservices data pipeline that uses these connectors to acquire data and deliver it to an endpoint. Our standard flow indexes data into Elasticsearch so that all types of content can be easily and quickly queried across many collections. However, the Skye data ingestion pipeline can load data to many different targets such as a data lake, NoSQL data store, RDBMS, or other data platform. Because the data ingestion pipeline is open, Intersys can easily add data enrichment services to provide even greater value, such as sentiment analysis, data masking, data cleansing, or probabilistic matching.

Skye simplifies and automates the data ingestion processes and provides critical administrative and reporting features for control of and visibility into these processes. The Skye connector library is continually expanding, and Intersys also provides a connector SDK so that additional connectors to niche third-party or in-house applications can be created and managed in the Skye platform.
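For a rough sense of what a connector and the Elasticsearch flow involve, here is a hedged sketch: the Connector interface and class names are hypothetical stand-ins for the Skye SDK, and the sink simply calls Elasticsearch's standard document index endpoint over HTTP using the JDK's built-in client.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.Map;

    // Hypothetical shape of a connector and an Elasticsearch sink.
    // The interface and names are illustrative, not the actual Skye SDK.
    public class ConnectorPipelineSketch {

        // A connector acquires documents from a third-party application.
        public interface Connector {
            Iterable<Map<String, Object>> fetchDocuments();
        }

        // The sink indexes each document via Elasticsearch's standard REST API.
        static void indexDocument(HttpClient http, String indexName, String id, String json)
                throws Exception {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:9200/" + indexName + "/_doc/" + id))
                    .header("Content-Type", "application/json")
                    .PUT(HttpRequest.BodyPublishers.ofString(json))
                    .build();
            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("indexed " + id + " -> HTTP " + response.statusCode());
        }

        public static void main(String[] args) throws Exception {
            HttpClient http = HttpClient.newHttpClient();
            // Illustrative document; a real connector would pull these from SharePoint,
            // Confluence, Jira, etc., and enrichment services could transform them first.
            String doc = "{\"title\":\"Quarterly plan\",\"source\":\"confluence\"}";
            indexDocument(http, "skye-content", "doc-1", doc);
        }
    }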

Technologies Used

AWS/Azure, Skye Search, Kafka/Confluent/Kinesis, Elasticsearch, Kibana, Java/JavaScript

Ready to Get Started?

To learn more about how Intersys can help you with your next data or digital initiative, please fill out the information below and a member of our team will get back to you shortly.