Recently, I’ve been reading a lot about “small data.” The term was new to me. It is defined as the amount of data a human mind can comprehend; when the data exceeds what the mind can comprehend, the term used is “Big Data.”
We are all aware of how fast technology changes: today you buy a device and tomorrow a newer and better version is released. Big Data looks like this new device. Everyone wants it, but many people don’t understand what it is and what it does.
One of the main problems I personally faced when starting to learn about Big Data was knowing where to begin. IT professionals and companies want to know when or where it is advisable to switch from traditional SQL databases to Big Data (NoSQL or schema-less), but the truth is there is no such breaking point.
There is a huge amount of data generated on a daily basis from devices all around us. Combine this with ever-evolving infrastructure and software to manage data and a common set of questions begins to emerge, such as:
- Where do we store all this new data?
- How do we process it?
- How do we analyze it?
It would be awesome to answer with certainty: “We need Big Data for this.” But that is impossible before you have a clear understanding of what Big Data is, what it does, and what it could offer you. Keep in mind that Big Data solutions are already being used by many people who have no technical knowledge and do not work at large enterprises. We all use these Big Data solutions daily, often without noticing.
One of the most typical misconceptions about Big Data is that when your SQL database gets too big and its performance becomes poor, you should switch to Big Data. Most people think that, as a result, the performance of their systems will magically improve. However, size alone is not what pushes companies to switch to Big Data solutions. The truth is that Big Data was designed to provide cheaper and better storage, and to quickly process huge and varied amounts of data. In other words, Big Data was created to solve different problems than many people assume.
Let’s Talk About What Big Data Is, and Is Not
It is not a big database
OK, it could be. But, as we said, there is no threshold at which a database starts being called Big Data. It’s true that Big Data is better at handling large sets of data than SQL databases, but there also exist Big Data solutions that involve a relatively small amount of data.
It is a set of processes and techniques
Big Data is a set of processes and techniques used to process and analyze large and complex sets of data. These processes include data capture, storage, extraction, analysis, sharing, visualization, and all the steps in between.
It is not a database engine
Big Data is not a single database engine such as MySQL, Oracle or SQL Server. It is better described as a set of tools that use specific techniques to manage data storage, analysis and representation.
It is a system able to handle huge amounts of data
A Big Data system combines lots of data coming from lots of sources, stores it in the most suitable way, and uses distributed computing to provide faster and better answers to lots of destinations.
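To make the distributed computing idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the sources and events are made up, the names `SOURCES`, `count_events`, `merge_counts` and `distributed_count` are hypothetical, and threads on a single machine stand in for the many machines a real cluster would use. Each source is counted independently (the "map" step) and the partial results are then merged (the "reduce" step).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical event records coming from three made-up sources.
SOURCES = {
    "web": ["click", "view", "click"],
    "mobile": ["view", "purchase"],
    "sensors": ["reading", "reading", "reading"],
}


def count_events(events):
    """Map step: count the event types within one chunk of data."""
    return Counter(events)


def merge_counts(partial_counts):
    """Reduce step: merge the per-chunk counts into a single total."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def distributed_count(sources, workers=3):
    # Each source is processed independently, which is what allows the work
    # to be spread across workers (threads here, machines in a real cluster).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_events, sources.values()))
    return merge_counts(partials)


totals = distributed_count(SOURCES)
```

Because no chunk depends on another, adding more workers (or more machines) lets the same logic handle more data, which is the property Big Data tools build on.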
It is not required for a top-of-the-line system
There are systems that work fine with structured databases, and some might not get any significant improvement from using Big Data. This could be due to business logic, constraints, the amount of processed data, etc. A system that does not use Big Data tools is not necessarily obsolete. There are environments where Big Data is not the best solution.
It is a real-time response solution
One of the features Big Data offers is real-time responses, even when the amount of data to process is huge. This is widely used to suggest searches, things people may like, things to buy, places to visit, articles to read, etc.
It is not just unstructured data
Big Data commonly uses unstructured data, but not always. It is fed by all types of data: structured data coming from a SQL database as well as pictures, text files, sensor information, etc. Big Data is better described as “multi-structured” data.
It is multi-structured data
As already mentioned, Big Data can draw on almost any known data source. Some of the data is structured and some is unstructured: SQL tables, files, images, audio, video, etc. The combination of all these sources is what makes it strong enough to provide insights, trends and metrics that traditional SQL data engines are not able to produce.
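As a small illustration of combining structured and unstructured sources, the sketch below joins totals from a SQL table with a keyword count taken from free-text reviews. Everything in it is an assumption made for the example: the `orders` table, the sample rows, the `REVIEWS` list, the `build_profiles` helper and the naive keyword match are hypothetical, and a real system would use far more capable text analysis.

```python
import sqlite3

# Structured source: an in-memory SQL table (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 40.0), ("ana", 10.0), ("luis", 25.0)],
)

# Unstructured source: free-text reviews (made-up examples).
REVIEWS = [
    ("ana", "Great product, but delivery was late."),
    ("luis", "Works as expected."),
    ("ana", "Second order arrived late again."),
]


def build_profiles(connection, reviews, keyword="late"):
    """Combine SQL totals with a naive keyword count taken from raw text."""
    profiles = {}
    rows = connection.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    for customer, total in rows:
        profiles[customer] = {"total_spent": total, "complaints": 0}
    for customer, text in reviews:
        # Simple substring match; it would also hit words such as "later".
        if keyword in text.lower():
            profiles.setdefault(customer, {"total_spent": 0.0, "complaints": 0})
            profiles[customer]["complaints"] += 1
    return profiles


profiles = build_profiles(conn, REVIEWS)
```

Neither source alone shows that the customer who spends the most is also the one reporting late deliveries; the insight only appears once both are combined.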
It is not the replacement for SQL Databases
There is a belief that any newer or better system needs to be developed using Big Data instead of SQL databases, but the truth is that SQL databases will not disappear anytime soon. SQL engine usage is moving toward cheaper or free options rather than licensed ones. SQL engine providers are also releasing better free versions and adding more integration and features related to Big Data tools. Additionally, there are tools that work as SQL on Big Data, allowing people who already manage data with SQL databases to work in a familiar way and shortening the learning curve on these systems.
It is a set of tools to be used concurrently
The truth is that there are hundreds or maybe thousands of tools that can be used to work with Big Data. The next generations of these tools are expected to cover the whole process and to work together to provide the solutions needed. For example, you need an infrastructure tool (Hadoop, Cassandra, MongoDB, etc.) to store the data in a way that improves reads, writes and disk usage. To perform the reads, filters and analytics, you can use tools such as Spark, ElasticSearch and SkyTree, together with programming languages that have data features, like Scala, Python and R. Used together, these tools let you:
- Provide suggestions to your customers
- Forecast trends for better decision making
- Present statistics to be used by information stakeholders
- And much more
To summarize, Big Data is not about the size of the data you are processing; it is about the tools and techniques you can use to create a specific solution. It’s also not exclusive to large or high-tech companies handling huge amounts of data. There are lots of use cases for “small” systems and they are growing continuously.
Big Data is the right choice if you have needs around:
- Faster and better statistics and insights
- Strategic prediction
- Real-time data inference solutions
- Customer profiling strategies
- Processing social media and website data
- Data process automation
- Monitoring and analyzing customer inputs (emails, comments, reviews, etc.)
- Data visualization
In light of all this, the answer to the when-to-start question is: now. It is true that designing and developing Big Data systems and solutions requires professionals with this kind of knowledge. Here at Intersys, we can deliver the skills and technology necessary to start your company on the path towards Big Data.