OMG! Big Data!

 

Big Data is not as scary as it sounds. Data enthusiasts often describe Big Data simply as a huge amount of data, but that definition isn’t quite accurate. For example, a medium-sized organization trying to gather insights from data sets that combine structured and unstructured data faces a Big Data challenge, one that cannot always be overcome with traditional analytic techniques. As Lisa Arthur, former Chief Marketing Officer of Teradata Applications, puts it, “Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.”

 

What is Cloudera?

 

Cloudera was one of the first companies founded to commercialize Apache Hadoop and is currently a leading provider of Hadoop software and services, with customers ranging from startups to large enterprises. Cloudera offers software for business-critical data challenges including storage, access, management, analysis, security, and search. In short, Cloudera offers a one-of-a-kind unified platform for Big Data: the Enterprise Data Hub.

Learn more about Cloudera here.

 

What is Cloudera Director?

 

Cloudera Director helps you deploy, scale, and manage Apache Hadoop clusters in the cloud of your choice. Its enterprise-grade features provide a simple, reliable, automated way to stand up production-ready clusters in the cloud for big data workloads and applications.

 

Why Cloudera Director?

  • Create and dismantle clusters on demand: Cloudera Director makes it much easier to manage Cloudera Manager instances, especially their allocation and configuration. Organizations can build clusters as needed and automatically tear them down once they have served their purpose.
  • Multi-cloud support: Cloudera Director provides a plugin architecture that supports creating clusters on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform. Because the plugin interface is open source, any organization can build its own plugins for additional environments, which lets a single Cloudera Director instance work with multiple environments at once.
  • Template Models: Director can consume cluster configurations submitted through the Director CLI or as JSON through the Cloudera Director API. These configurations can be customized in fine detail and saved for future builds, so the user saves a lot of time by not repeating a cluster build procedure that is already largely automated. Another major advantage of configuration scripts is the ability to install custom packages and services on the cluster automatically as soon as it is ready.
  • Grows and Shrinks: Director adds instances as demand increases and terminates them when they are no longer needed. This saves a lot of money for organizations that operate at large scale.
  • Pay for Usage: The user is billed only for running CDH services. When the cluster is terminated, billing stops. This is a key feature for transient clusters.
  • Security: Cloudera Director enables secure deployments of applications. Director’s database is encrypted by default, and Director helps the user configure CDH clusters with Kerberos authentication.
  • Slick Web UI: The web interface provides a consolidated dashboard for monitoring the health of all clusters under one Director instance, and it lets the user grow, shrink, or terminate clusters as needs change.
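
To make the template idea from the list above more concrete, here is a minimal Python sketch of saving a base cluster configuration once and deriving new builds from it. Every key and value in it (`provider`, `workers`, `post_create_scripts`, the instance type, the script name) is hypothetical and chosen only for illustration; it does not reproduce Cloudera Director's actual configuration schema, so consult the sample configurations for the real format.

```python
import copy
import json

# Hypothetical base template. These keys are illustrative only and are NOT
# Cloudera Director's real configuration schema.
BASE_TEMPLATE = {
    "name": "analytics-cluster",
    "provider": {"type": "aws", "region": "us-east-1"},
    "workers": {"count": 3, "instance_type": "m4.xlarge"},
    "post_create_scripts": [],
}


def render_template(base, overrides):
    """Return a deep copy of base with overrides merged in recursively."""
    result = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            result[key] = render_template(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Reuse the saved template for a larger cluster with a custom package step.
big_cluster = render_template(
    BASE_TEMPLATE,
    {
        "name": "analytics-cluster-large",
        "workers": {"count": 10},
        "post_create_scripts": ["install_custom_packages.sh"],
    },
)

print(json.dumps(big_cluster, indent=2))
```

Only the settings that differ are spelled out for the new build; everything else, such as the region and instance type, is inherited from the saved template, and the base template itself is left unchanged.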

 

Drawbacks!

  • Cloudera Director uses an embedded H2 database to store cluster data. There is no automated backup procedure for it, so it must be backed up manually to avoid losing environment and cluster data.
  • Cloudera Director cannot manage a cluster that was Kerberized through Cloudera Manager. A possible solution is to deploy a new Kerberized cluster and copy the data from the old cluster to the new one.
  • Cloudera Director sometimes fails to bootstrap a cluster because of DNS errors. A possible solution is to configure the VPC for forward and reverse DNS resolution.
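
Two of the workarounds above are easy to script. The Python sketch below is one possible approach, not an official Cloudera tool: `backup_h2` copies a database file to a timestamped backup, and `dns_round_trip_ok` checks that a hostname resolves to an IP address that resolves back to the same name. Both function names are made up for this post. The location of the H2 file depends on your installation, so it is passed in as a parameter, and the sketch assumes the Director server is stopped before the copy, since copying a database file while it is being written can produce an inconsistent backup.

```python
import shutil
import socket
import time
from pathlib import Path


def backup_h2(db_file, backup_dir):
    """Copy the H2 database file into backup_dir under a timestamped name."""
    src = Path(db_file)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}.bak"
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest


def dns_round_trip_ok(hostname,
                      forward=socket.gethostbyname,
                      reverse=socket.gethostbyaddr):
    """Return True if hostname -> IP -> hostname resolves consistently."""
    try:
        ip = forward(hostname)
        name, aliases, _ = reverse(ip)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return False
    candidates = {name.lower(), *(alias.lower() for alias in aliases)}
    return hostname.lower() in candidates


if __name__ == "__main__":
    # Check the machine this script runs on; run it on each cluster node.
    print(dns_round_trip_ok(socket.getfqdn()))
```

The `forward` and `reverse` parameters default to the standard library resolvers and exist only so the check can be exercised without a live DNS server. Note that the comparison expects the fully qualified name, so a short hostname may report a mismatch even when DNS is configured correctly.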

 

Try Cloudera Director:

  • Prerequisites: Beginner-level experience with AWS and Bash scripting, and a good understanding of Hadoop architecture and cluster setup.
  • Download: https://www.cloudera.com/downloads.html
  • Build a Sample Cluster: https://www.cloudera.com/documentation/director/latest/topics/director_get_started_aws.html#concept_td3_wk5_ht
  • Sample Configuration: https://github.com/cloudera/director-scripts

 

Conclusion:

With Cloudera Director, you can run production-ready Apache Hadoop clusters on Amazon Web Services, Microsoft Azure, or Google Cloud Platform, paying only for what you use.

 
