Cloudera Director – Automating Big Data Needs

OMG! Big Data! Big Data is not as scary as it sounds. Data enthusiasts often describe Big Data as simply a huge amount of data, but that isn’t quite an accurate definition. For example, a medium-sized organization that is trying to gather insights from its data sets, which are a combination of structured and unstructured …

Apache NiFi as an Orchestration Engine

Orchestration of services is a pivotal part of Service Oriented Architecture (SOA). Apache NiFi provides a simple, highly configurable web-based user interface for designing an orchestration framework that can address enterprise-level data flow and orchestration needs together. While most other frameworks are primarily for service orchestration only, NiFi can …

Spark RDD vs Spark SQL Performance comparison using Spark Java APIs

The Resilient Distributed Dataset (RDD) is the main abstraction of the Spark framework, while Spark SQL (a Spark module for structured data processing) gives Spark more information about the structure of both the data and the computation being performed, and uses this extra information to perform additional optimizations. Up until Spark 1.6, RDDs used to perform …

Driven by Big Data – Governance & Security

My definition of this topic is “a necessary component of any big data solution with a perfect blend of skepticism and confidence by all of those involved”. Think of this not just as securing a file with permissions and capturing its name and glossary, but as designing sustainable architectures. Our recent blog “Make Data your BAE …

Driven by Big Data – Beyond Enterprise Search

Enterprise Search is yet another rapidly growing ecosystem. Combining state-of-the-art technologies has redefined what “search” means to an organization and its global customers. Being one of the prominent themes in the services business, building a mature knowledge management architecture has always …

Observer Design Pattern – Java

When we write data validations, credit card validations, phone number validations, and string validations, we mostly choose third-party common utilities, mainly the libraries from Apache. This avoids reinventing the wheel; technically we call it “CODE REUSE” or just “REUSABILITY.” When it comes to design patterns, the way I think of it is, it’s still …

Driven by Big Data – IT Transformation Strategy

IT transformation is inevitable, and the technology refresh cycle is becoming more and more aggressive and competitive. Open source has gained the trust not only of public sector enterprises but also of more regulated businesses and organizations. The CIO’s office is constantly pushing for more innovative ideas and cost savings, and auditing its existing systems. Their guiding principles focus on evaluating open …

Driven by Big Data – Blockchain and Device Democracy

The whole concept of a decentralized distributed database, a shared ledger, and a singleton computation framework makes Blockchain one of the most prominent technological discoveries of today. It is easy to relate Blockchain to Bitcoin, as it was an early adopter and a starting point for various organizations to conduct proofs of concept and build private networks of …

Driven by Big Data – Design Patterns

The Big Data ecosystem is a never-ending list of open source and proprietary solutions, and in my view, nearly all of them share the common roots and fundamentals of the good old platforms that we grew up with. With that as the basis, our topic for today is about architecture and design patterns in the Big Data …

My First Week in Big Data

Musings of a Java Dude. Written by: Vinodh Thiagarajan, Sr. Java Consultant. I am a developer and spend all of my time on non-big-data items. That is why I felt somewhat lost when I entered the Intersys premises. My mission is to become a Hadoop Developer within a short time period along with a …