NoSQL Solutions

There are several NoSQL solutions available. In this blog post, we briefly look at Elasticsearch, Cassandra, MongoDB and Hive: what they support and what their limitations are.

NoSQL solutions are an option for problems that are hard or impossible to solve with relational databases. One of the main issues is scalability. Scaling a relational database is expensive, since it usually means increasing the performance of a single server; relational databases also don’t support searches over unstructured data.
Both can be done easily with NoSQL solutions, which are also considerably less expensive than relational engines.

Below we compare some popular NoSQL engines, listing some of the features they support and some they don’t.

 

Elasticsearch: Open Source Search & Analytics

  • Open source engine
  • Document-oriented
  • Distributed, RESTful modern search and analytics engine built on Apache Lucene and Java
  • Can store massive amounts of data
  • Supports near real-time search on top of that data

Supports:

  • Fuzzy and non-fuzzy search queries.
    It can sustain hundreds of searches per second on a collection of millions of documents.
  • Aggregations, which help to analyze the data.
    You can use them in tools like Kibana.
  • Different analyzers on the data, one use of which is autocomplete: search with a partial word and get results instantly.
  • Schema-free or user-defined schema.
    Elasticsearch supports dynamic mapping, where it creates the schema itself; alternatively, the user can create templates or explicit mappings for each field in the document.
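
The search and aggregation features above are requested through a JSON query body sent over REST. Here is a minimal sketch of such a body, built as a plain Python dictionary. The field names `title` and `category` and the aggregation name `by_category` are assumptions made for illustration, and actually sending the request to a cluster is left out.

```python
import json


def build_search_body(term: str) -> dict:
    """Build a search body combining a fuzzy query with a terms aggregation."""
    return {
        # Fuzzy match: tolerates small typos in the search term.
        "query": {
            "fuzzy": {
                # "title" is a hypothetical field name.
                "title": {"value": term, "fuzziness": "AUTO"}
            }
        },
        # Terms aggregation: counts matching documents per distinct value.
        "aggs": {
            # "by_category" and "category" are hypothetical names.
            "by_category": {"terms": {"field": "category"}}
        },
    }


body = build_search_body("elasticsaerch")
print(json.dumps(body, indent=2))
```

A body like this would typically be posted to the `_search` endpoint of an index; the aggregation part is the kind of request a tool like Kibana issues to draw a chart.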

Disadvantages / limitations

  • Near real-time rather than real-time: newly indexed data becomes available for search after about 1 second.
  • No SQL-like joins.
    It uses parent-child or nested relations instead.
  • No transactions or rollback.
    It offers document versioning to make sure an update is applied to the latest version of a document.
  • Updates are expensive, because the whole document is re-indexed.
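
To show how a version check can stand in for transactions, here is a toy in-memory store written in plain Python. It is only a sketch of the idea (optimistic concurrency), not Elasticsearch's API; the class and method names are made up for this example.

```python
class VersionConflict(Exception):
    """Raised when an update is based on a stale version of a document."""


class VersionedStore:
    """Toy in-memory store: every write bumps the document's version."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # Reject the write if the document changed since the caller read it.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected v{expected_version}, found v{current_version}"
            )
        # The whole document is replaced, not patched in place.
        self._docs[doc_id] = (current_version + 1, body)
        return current_version + 1


store = VersionedStore()
v1 = store.put("1", {"title": "draft"})
v2 = store.put("1", {"title": "final"}, expected_version=v1)
try:
    # This writer still believes the document is at version 1.
    store.put("1", {"title": "stale edit"}, expected_version=v1)
except VersionConflict as err:
    print("rejected:", err)
```

The stale writer is rejected instead of silently overwriting the newer document; it has to re-read and retry, which is the trade-off compared with a rollback-capable transaction.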

Links – Official web: https://www.elastic.co/products/elasticsearch Manual: https://www.elastic.co/guide/index.html

 

Cassandra: Distributed Database for structured data

  • Created by Facebook and released in 2008
  • Open source
  • Column store support
  • Distributed database support
  • Runs on commodity servers
  • Provides high availability of data with no single point of failure

Supports:

  • High data availability
    The details of the replication mechanics are abstracted from the user, which makes the database easier to interact with.
  • Masterless data nodes: there is no master-slave model for replication, so after a catastrophic failure it is much easier to replace a node.
  • No single point of failure in a cluster; all nodes are created equal.
    Data is distributed across the cluster, and each node can handle reads and writes.
  • A data model that is well suited to write-heavy workloads.
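
The masterless layout can be pictured as a ring of nodes on which every key hashes to a position. The sketch below is a toy version of that idea in plain Python. The MD5 hash, the node names and the `Ring` class are assumptions chosen for illustration and do not reproduce Cassandra's actual partitioner or replication strategies.

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumption here)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    """Toy masterless ring: any node can compute which peers own a key."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        # First node clockwise from the key's token, then its successors.
        start = bisect_right(self._tokens, token(partition_key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c"], replication_factor=2)
owners = ring.replicas("user:42")
print(owners)  # two distinct nodes; the exact pair depends on the hash
```

Because ownership is derived from the key itself, no coordinator node is needed to route a read or a write, and a failed node only affects the ranges it held.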

Disadvantages / limitations

  • Limited aggregation support: newer versions support aggregations, but they are best kept within a single partition. Because Cassandra is at heart a key-value store, operations such as sum, min, max or avg are very resource-intensive, even though they are possible.
  • Unpredictable performance.
    Cassandra runs many background jobs on a cluster that are not defined by the user. Queries are sometimes slower because of those jobs, which makes slowdowns hard to debug.
  • Limited CQL query language.
    CQL looks very much like SQL but has some limitations, so someone with an SQL background can be confused about its syntax and capabilities.
  • Query-level schema model.
    You need to model your data around the queries you are going to serve rather than around the structure of the data itself.
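
Query-driven modeling usually means writing the same data into one table per query. The sketch below imitates this with plain Python dictionaries, where each dictionary key plays the role of a partition key. The table and field names are hypothetical.

```python
from collections import defaultdict

# One "table" per query; the dict key plays the role of the partition key.
orders_by_customer = defaultdict(list)  # query: "orders for a customer"
orders_by_day = defaultdict(list)       # query: "orders placed on a day"


def record_order(order: dict) -> None:
    """Write the same order into every query-specific table (denormalization)."""
    orders_by_customer[order["customer"]].append(order)
    orders_by_day[order["day"]].append(order)


record_order({"id": 1, "customer": "alice", "day": "2024-01-05", "total": 30})
record_order({"id": 2, "customer": "bob", "day": "2024-01-05", "total": 12})
record_order({"id": 3, "customer": "alice", "day": "2024-01-06", "total": 7})

# Each read is a single lookup by partition key; no join or full scan is needed.
print([o["id"] for o in orders_by_customer["alice"]])  # [1, 3]
print([o["id"] for o in orders_by_day["2024-01-05"]])  # [1, 2]
```

The cost is extra writes and duplicated data, which fits an engine whose data model favors heavy writes.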

Links – Official web: http://cassandra.apache.org/  Manual: http://cassandra.apache.org/doc/latest/

 

MongoDB: Open Source Documents Database

  • One of the most popular NoSQL solutions.
  • Stores data flexibly.
  • Uses the BSON (Binary JSON) format, which is similar to JSON.
  • Multi-schema documents: if the schema changes over time, it is a good fit.
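
To make "multi-schema" concrete, here is a small sketch in plain Python that keeps documents of different shapes in the same collection. The `products` collection, its fields and the `find` helper are made up for this example and are not the MongoDB driver API.

```python
import json

# Two documents in the same (hypothetical) "products" collection.
products = [
    {"_id": 1, "name": "laptop", "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "t-shirt", "sizes": ["S", "M", "L"]},  # newer shape
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


print(json.dumps(find(products, name="t-shirt")))
```

Neither document had to be migrated when the second shape appeared; code that reads the collection just has to tolerate missing fields.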

Supports:

  • A high volume of data and the MapReduce algorithm
    Allows processing large amounts of data by running the work in parallel.
  • Horizontal scalability
    Horizontal scaling splits the load among servers; adding servers to the pool can increase performance.
  • Distributed at its core
    High availability, scalability, and geographic distribution are built in.
  • High write speed

Disadvantages / limitations

  • Limited support for relationships between documents
  • Index limit
    A collection can have a maximum of 64 indexes.
  • Document size is limited to 16 MB
  • Geospatial indexes cannot cover a query (that is, answer it from the index alone)
  • Multikey indexes cannot cover queries over array fields
  • Naming restrictions on collection and database names
    Restrictions include case sensitivity of names; field names must not contain the null character.
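
Because relationships are not enforced by the database, the application usually either embeds related data or stores an id and resolves it itself. Here is a minimal sketch of both options in plain Python; the `authors` and `posts` data and the `resolve_author` helper are hypothetical.

```python
# Hypothetical "authors" collection, keyed by _id for quick lookup.
authors = {10: {"_id": 10, "name": "Ada"}}

posts = [
    {"_id": 1, "title": "Intro", "author_id": 10},            # reference by id
    {"_id": 2, "title": "Notes", "author": {"name": "Bob"}},  # embedded document
]


def resolve_author(post: dict) -> str:
    """Return the author's name, following a reference when it is not embedded."""
    if "author" in post:
        return post["author"]["name"]
    # A second lookup performed by the application, not by a database join.
    return authors[post["author_id"]]["name"]


for post in posts:
    print(post["title"], "by", resolve_author(post))
```

Embedding keeps a read to a single document but counts toward the document size limit; referencing keeps documents small at the price of an extra lookup.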

Links – Official web: https://www.mongodb.com/ Manual: https://docs.mongodb.com/manual/

 

Hive: NoSQL for Big Data

  • Used for querying and analyzing large datasets stored on HDFS.
  • Hive was initially implemented by Facebook.
  • Hive runs on top of Hadoop and enables you to run SQL-like queries.
  • When you run a query, it is transformed into a MapReduce job on the cluster.
  • Hive helps us query distributed data and retrieve results in parallel.
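
As an illustration of how a SQL-like query maps onto MapReduce, the sketch below runs the equivalent of `SELECT country, COUNT(*) FROM visits GROUP BY country` as map, shuffle and reduce steps in plain Python. The `visits` rows are made-up sample data, and the code only mimics the phases in a single process; it is not what Hive generates.

```python
from collections import defaultdict

# Made-up sample rows for a hypothetical "visits" table.
rows = [
    {"user": "u1", "country": "US"},
    {"user": "u2", "country": "IN"},
    {"user": "u3", "country": "US"},
]


def map_phase(row):
    """Emit a (key, 1) pair per row, keyed by the GROUP BY column."""
    yield row["country"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Aggregate all values for one key (COUNT(*) becomes a sum of ones)."""
    return key, sum(values)


pairs = [pair for row in rows for pair in map_phase(row)]
result = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(result)  # {'US': 2, 'IN': 1}
```

On a cluster, the map calls run in parallel on the nodes that hold the data and the reduce calls run in parallel per key.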

Supports:

  • External tables: data can be processed without actually being stored in HDFS, so the storage can be HDFS or S3.
  • HiveQL doesn’t require additional knowledge if you are familiar with SQL.
  • Partitioning of data, which helps to improve query performance.
  • A rule-based optimizer that turns logical queries into MapReduce jobs.
  • An easy way to query, analyze and summarize data.
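
Partitioning helps because a filter on the partition column lets the engine skip whole directories. Here is a toy sketch of that pruning in plain Python, assuming a hypothetical `visits` table partitioned by a `dt` column and laid out as `dt=<value>` directories.

```python
# Hypothetical partition directories for a table partitioned by `dt`.
partitions = {
    "visits/dt=2024-01-05": ["u1", "u2"],
    "visits/dt=2024-01-06": ["u3"],
    "visits/dt=2024-01-07": ["u4", "u5", "u6"],
}


def scan(table: str, dt: str):
    """Read only the partition matching the filter.

    Returns the rows found and the list of partitions that were touched.
    """
    wanted = f"{table}/dt={dt}"
    touched = [path for path in partitions if path == wanted]
    rows = [row for path in touched for row in partitions[path]]
    return rows, touched


rows, touched = scan("visits", "2024-01-06")
print(rows, touched)  # ['u3'] ['visits/dt=2024-01-06']
```

Only one of the three directories is read; without the filter on `dt`, every partition would have to be scanned.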

Disadvantages / limitations

  • Not a search engine: full-text search is not possible.
  • No real-time query results: the query is converted into a MapReduce job, which adds latency before results are returned.
  • No row-level updates, inserts or deletes.
  • Hive is not able to differentiate between NULL and null values.

Links – Official web: https://hive.apache.org/ Manual: https://cwiki.apache.org/confluence/display/Hive/LanguageManual

 

If you would like to implement relational databases, Intersys would be glad to guide you to a successful implementation.
