Apache NiFi as an Orchestration Engine
Orchestration of services is a pivotal part of Service Oriented Architecture (SOA). Apache NiFi provides a highly configurable simple Web-based user interface to design orchestration framework that can address enterprise level data flow and orchestration needs together. While most other frameworks primarily are for service orchestration only, NiFi can be leveraged for additional benefits such as data provenance and data processing along with secure and durable data flow for IOT, Big Data, and SOA. Even though there are a variety of challenges associated with each type of problem, NiFi can address most of them in fairly similar manner. This article takes a closer look at such scenario from an implementation point of view.
Apache NiFi Introduction
Apache NiFi came out of United States National Security Agency (NSA) and originally named Niagarafiles. In 2014 it was released as open-source software.
Apache NiFi is a data flow system based on the concepts of flow-based programming. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. In simple words, NiFi is built to automate the flow of data between systems. In NiFi’s context, data flow means the automated and managed flow of information between systems.
Refer NiFi documentation for key definitions and component description –
At very high level below diagram shows all core components of NiFi –
NiFi architecture can be understood easily by taking an example of how data flow happens through NiFi –
- When data is first created or pushed through NiFi processor, a FlowFile is created with three attributes (UUID, filename, and path). FlowFile Repository is where NiFi keeps track of the state of what it knows about a given FlowFile that is presently active in the flow.
- The content of FlowFile is stored in Content Repository which is a pluggable and partitioned disk.
- Attributes of FlowFile reside in-memory while content on disk (in Content Repository).
- Provenance Repository stores provenance’s events such as file creation, modification, new components, change of state, etc. At each provenance event, the content of FlowFile is also available if it is alive.
- Content of FlowFile can be accessed with the help of index that is available through FlowFile Repository’s index. If you look at a particular provenance event, you can see value of section, identifier and offset. As shown below, content claim (i.e. content of file) can be located at section 2, identifier 1498408069668-2 and offset 5779 on Content of Repository.
- When FlowFile moves from one processor to other without a change in data, content remains at same partition (i.e. same section, identifier, and offset) and it just adds another provenance event. If content changes, a new FlowFile is created therefore NiFi UI allows looking at the content before and after changes occur. If FlowFile gets split, then only offset changes, and there is no change in file at Content Repository.
- FlowFiles moves from one processor to other through NiFi queues. If the downstream process is busy or disabled, FlowFile remains in queue. As long as a FlowFile is in queue or being processed, it is called Live. A live FlowFile’s content remains persisted in the content repository. Once the FlowFile is in no longer use, it is not Live, and its content is deleted from Content Repository. Although provenance still shows the FlowFile you just can’t access the content (and hence you can also not replay the message).
- FlowFile Repository by default uses Write Ahead Log so if NiFi server goes down, all Live FlowFiles will come back up from their last known state.
Our hypothetical scenario is that we want to use NiFi as orchestration engine for web services, IOT, and message components. For our demo, we will take web services for example. IOT and message components will have a similar architecture with different processor components.
- Download latest version of NiFi from link here – https://nifi.apache.org/download.html
- Unzip or Untar the binary. Under Nifi-home/conf directory there is a nifi.properties file. Change nifi.web.http.port value to a port number that is not already in use.
- To start NiFI, run bin/ run-nifi.bat for windows or bin/nifi.sh start for Mac or Linux. You can also run it as a service
- After few minutes NiFi UI will be available at localhost:<your port>/nifi
NiFi DataFlow Design
For our example, we will be using three applications and two microservices. Below are the main implementation steps (refer template screenshot below for component name) –
- Orchestration Process Group is the main point of communication between application and microservices.
- Each application sends callback URL and Target URL to orchestrator.
- Orchestrator calls Target URL and sends callback URL as a header property to microservice component.
- Orchestrator also generates orchestration correlation ID (OrchestratorCorrId) if it doesn’t exist. This correlation ID will be same throughout the process. The application sends a source correlation ID as well.
- Orchestration correlation ID is indexed via NiFi property file to make it searchable in UI.
- HTTP requests are handled by HTTP-Handler process group. In order to make calls asynchronously, response codes are returned immediately after some basic check. In our demo, we are hard coding it for success. HTTP-Handler process group also extracts and reassign all attributes sent as headers by the caller.
- Microservice Process Group and Post Process group of Microservice component are merely placeholder processors in this demo. They only assign attributes. However, they can be used for service specific processing. Similarly, Failure process group is also a placeholder which can be modified to create general purpose failure handling process.
- Most of the processor’s relationship is auto terminated if not used. They can be used for event specific routing depending on use case.
- Microservices processor group assigns callback URL received from orchestrator to Target URL and call orchestrator. Orchestrator handles this call the same way it handles calls from the application, i.e. call Target URL. For this demo purpose, applications listen to the call from orchestrator and simply log the attributes.
NiFi Template “NiFi-Orchestration-Demo” for this demo is available at link below –
You can upload above template in NiFi and open it on canvas. All hardcoded ports might need to be modified if they are already in use.
This section explains components of the template. Below is the screenshot of the template-
As shown above there are three types of process groups – Application, Orchestrator, and Microservices. Each is similar to their type except the message content and end points.
Expanded view of Application Process Group –
Application process group generates a FlowFile every 10 seconds, assigns attributes such as callback URL, Target URL, etc. and calls orchestrator endpoint. It also has a Listener component to listen to the response sent from orchestrator after microservices have processed it.
Expanded view of Orchestrator Process Group –
Orchestrator handles requests from application and microservices same way. It looks for Target URL and calls that end point along with other attributes and FlowFile content. As shown below, orchestrator also assigns correlation ID, if doesn’t already exist –
Advance properties of Update Attribute Processor –
Expanded view of HTTP-Handler Process Group –
HTTP handler process group is used wherever an end point needs to be exposed for asynchronous call handling. It extracts the attributes from the header and based on some conditions it sends a status code to the caller. If successful it outputs the FlowFile at the same time.
Expanded view of Microservices Process Group –
Microservices process group consists of HTTP-Handler similar to Orchestrator. It also has three other process groups. One for service specific processing, one general post-processing such as assigning attributes, and one for failure handling.
NiFi stores provenances for each event out of the box. You can look at provenances at each processor by right clicking and select Data Provenance. To look at provenance from end to end for a call, we can index orchestrator correlation ID. To creat an e index on an attribute set below property in nifi.properties file.
Once indexed it can be searchable like below (by OrchestratorCorrId)-
After setting up all required configurations, start all process groups. Each application will send a FlowFile every 10 seconds which will get processed by respective microservice and will end up in Log Attribute of application. All calls will be handled by Orchestrator process group.
NiFi is easy to use DataFlow engine that can be designed to work as an orchestrator. It’s out of box data provenance capability, security, open source community support along with high configurability (concurrency, scheduling, durable queue, back pressure, etc.) make it a good choice as a general purpose orchestration engine. In our demo, we looked at REST services only, but AMQP, Kafka or MQTT (for IOT) can be added and plugged in with orchestrator in similar manner. NiFi also has MiNiFi (A subproject of Apache NiFi to collect data where it originates) which can be used along with MQTT to address IOT needs.
In addition to usage as orchestrator, NiFi is very well suited for real-time ETL and DataFlow. It has various parsers (XML, JSON, etc.), converters, executors (SQL, Scripts, etc.), and connectors out of the box that can address pretty much all day-to-day ETL use cases. For Hadoop ecosystems, NiFi has plenty of in-built processors such as GetHDFS, PutHDFS, GetHDFSSequenceFile, PutParquet, FetchParquet, ConvertAvroToORC, StoreKiteDataset, etc. that can help integrate ETL\DataFlow with Big Data solutions.
NiFi DataFlow development is done on UI which can add some inconveniences in terms code reusability and deployment as one has to go to UI to make modifications or deploy the template to the new instance. NiFi Rest API (https://nifi.apache.org/docs/nifi-docs/rest-api/ ) can be useful in addressing this issue by automating deployment and build process.