As I meet with different customers, teams, and internal experts to deliver our BI and Data Management engagements, I cannot help but notice a prominent concern about the survival of data warehousing. The argument goes that, with the rise of Big Data (and Hadoop), there are now solutions that are faster, more scalable, and real-time, while the old solutions are slow, difficult to use, and expensive, which makes the new solutions an obvious choice. People seem to be debating how long it will take for data warehouses to go to the grave.
Before we delve into the possibilities of data warehousing living up to the doomsday prediction, let’s get some basic questions straightened out.
What is a Data Warehouse (DWH)?
In the computing world, technically speaking, a data warehouse (DWH), a.k.a. an enterprise data warehouse (EDW), is a system used for reporting and data analysis. The DWH is one of the core components of a Business Intelligence ecosystem. DWHs are the central repositories where current and historical data are stored. These data are sourced from one or more disparate sources, integrated, cleaned, and semantically unified. The information is used for creating analytical reports, which help in understanding the business and making decisions based on business data. Typical examples of reports include daily/monthly/yearly product sales, specific trends, etc.
In other words, a DWH is an abstraction: a logical representation of integrated, clean, and semantically unified data that an organization uses to make decisions. It is primarily a business process that unites information so that it can function as a single entity. We then need a set of technologies to implement it, i.e. to extract the data, integrate and clean it, store it, and present it.
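To make the four steps concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the two sources, the field names, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for whatever storage technology a real warehouse would use.

```python
import sqlite3

# Hypothetical raw records from two disparate sources (illustrative only).
STORE_SALES = [
    {"sku": "A1", "qty": "2", "day": "2024-01-05"},
    {"sku": "B2", "qty": "1", "day": "2024-01-05"},
]
WEB_ORDERS = [
    {"product": "a1 ", "units": 3, "date": "2024-01-06"},
    {"product": "B2", "units": None, "date": "2024-01-06"},  # incomplete row
]


def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    return [("store", r) for r in STORE_SALES] + [("web", r) for r in WEB_ORDERS]


def integrate_and_clean(tagged_rows):
    """Integrate and clean: map both layouts onto one schema, drop bad rows."""
    unified = []
    for source, row in tagged_rows:
        if source == "store":
            product, qty, day = row["sku"], row["qty"], row["day"]
        else:
            product, qty, day = row["product"], row["units"], row["date"]
        if qty is None:
            continue  # skip rows with no quantity
        unified.append((product.strip().upper(), int(qty), day, source))
    return unified


def store(rows):
    """Store: load the unified rows into a single table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (product TEXT, qty INTEGER, day TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def present(conn):
    """Present: a simple report of total quantity sold per product."""
    cur = conn.execute(
        "SELECT product, SUM(qty) FROM sales GROUP BY product ORDER BY product"
    )
    return cur.fetchall()


def build_report():
    """Run the four steps end to end."""
    return present(store(integrate_and_clean(extract())))


if __name__ == "__main__":
    for product, total in build_report():
        print(product, total)
```

The point of the sketch is that the warehouse is the combination of these steps, not the storage engine: the `store` function could be swapped for another technology without changing what the report means to the business.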
What is Big Data? What are its origins?
Big Data, by definition, means storing huge volumes of data, usually in raw rather than processed form. The data can come from different types of sources (structured and unstructured), for example from operations (log reports), user activity (website tracking), or other real-world usage (IoT devices).
Because there is no target model into which the raw data must be transformed, the data is left as it is. Big Data systems are meant to hold more current information that retains its original context and can better support line managers and executives. Economics drive Big Data: it costs about a third, or less, of what a traditional data warehouse costs. And getting a Big Data system up and running on a public cloud takes about one-tenth of the time.
Big Data systems evolved from systems built at companies like Facebook and Twitter, whose primary need was massive data storage, not intelligence.
So, going by the initial definition, a DWH is a "system" that can be implemented with a variety of technologies and tools. No particular technology is itself a DWH; Hadoop, or just a data dump, is not a DWH. You can build one in multiple ways, including on Hadoop, a relational database, an appliance, etc. We have also seen that Big Data technologies will give us more speed and newer dimensions than we are used to getting from structured data.
Moving a warehouse to another platform cannot really be called DWH migration; it only moves the "data" into Hadoop, Big Data appliances, NoSQL databases, columnar databases, and/or the cloud. The movement towards hybrid data warehouse environments, which in TechM parlance is called Hybrid BI (I will cover this in my next blog), is one of the key trends in the DWH world. A hybrid or multi-platform environment is a warehouse that serves a dual purpose: business reporting and analytics. In the future, as Big Data technologies advance further, perhaps with established players offering relational support through abstraction along with insert and update capabilities, we will see a lot more innovation and upgrades.
The DWH, as a system that serves business needs, is here to stay. The other question, whether we will move away from relational databases completely and still get the reports that businesses need, will be determined by future technology advancements.
Do you agree? If not, please let me know your opinions in the comments section. I am curious to hear your reasoning on this topic.