So, how’s your Data Warehouse doing? Yes, I know Data Warehousing is so 2007; it’s all about your Data Lake now. What does that even mean? Let me ask you, did you spend any time with anyone other than IT managers and other technologists pondering the question, “Why do I need a Data Lake?” Or did you just wrangle some budget, call a trusted consulting firm because you couldn’t find resources with the right skillset (our operators are standing by) and start hammering away?
Think back to when you built your Data Warehouse (fade in the flashback music). What was the justification? Was it because every kid on the block already had one? Was it because your technology stack was outmoded and slow and you needed to modernize your team’s skills? Or were you just sick of analysts in every department peppering their company PCs with Microsoft Access databases?
Whatever the reason, if you were the one selling the Data Warehouse to the business (“We need a Data Warehouse! It’ll be great! For the first time, we’ll have a unified view of the enterprise! Conformed dimensions and star schemas … just imagine what we could do!”), then you probably settled for one of these rationalizations, and your Data Warehouse is now an expensive dinosaur that people rarely use … or if they do, it’s grudgingly … by necessity … because you took their Access databases away. (“You can’t have those anymore,” you said. “There’s no Governance!”)
Then again, maybe you spent a few weeks talking to people outside of your tech bubble – executives, upper-level managers, line-level managers, even the worker bees who assembled products or reports. Maybe you listened to their praises and frustrations and gained a deeper understanding of how your business works and how data could help people do their jobs more efficiently and effectively. Maybe you even estimated how you could save the company money, lower its prices and improve its product. Then perhaps you compiled that information and ran it up against your data assets. What data did they need? What data did you have? Where did the two intersect?
Nah, not likely. Too hard. Too expensive. Too time-consuming. Too much work!
Now let’s cue the organist and fade back to the present. How expensive is the Data Warehouse now? Is the cost justified? Do you even know the return? Do you have all that single-view-of-the-enterprise jazz? A standard view of your customers, cross-departmental reporting, meaningful dashboards and effective trending analysis? Do people tell you the Warehouse is great, but it could be even better if we could only <insert your analytical wish list here>?
If they do, good for you. You are a great asset to your organization.
If they don’t … beware! You are on the precipice of a deep, dark chasm.
There is a boundless ocean of data at your fingertips today – log-level data, event-level data, streaming social media data, cloud data, Internets of Things data, the ever-elusive real-time data – data way beyond the confines of your puny servers. If you’re bold enough to grab a piece of it, you’d better know what your organization needs and why they need it.
Note that I said “they” and not “you.” It’s not about you. If it is, then all the Hadoops and Sqoops and Storms and Flumes and Flinks and Flunks and Plinks and Plunks and Oozies and Floozies can’t help you. Your pristine Data Lake will quickly deteriorate into naught but a Data Landfill or a Data Cesspool. And then you’ll really need a Superfund to clean it up.
Don’t say I didn’t warn you.