Posted by: Manish Goenka, Vice President and Head of ASEAN, Tech Mahindra.

Modernizing Data Architectures: The Needed Tools and Services

Modernizing the data architecture is the need of the hour, but no one seems to agree on exactly what is needed to bring this about. On May 5th, as part of a YouTube Live session, I sat down with my colleague, Deb Mukherji, Practice Head – Data and Analytics (ASEAN), Tech Mahindra, and Alexander Hoehl, Senior Director Business Development (APAC), Denodo, to discuss this critical topic.

It was an enriching discussion, and I captured the highlights below. If you have any questions, please ask them in the Comments field below, and I will forward them to the appropriate person. 

First, how would you define “data architecture”?

Deb: Data architecture is like the set of rules that govern how we use data. It’s a kind of framework that runs from data acquisition to data ingestion to data management. Then comes data delivery, followed by data consumption.

Let’s say an organization has a very standard, traditional data architecture. You have source systems, the data comes in through a staging area, and from there it moves into a data warehouse, which is where the data models live; finally, the data is consumed through BI and analytical reporting.

Then we might have a slightly different architecture, one built around big data and Hadoop. Again, data comes into a landing zone from the source systems, and from there it goes into a processing zone. From there, instead of building a standard, third-normal-form data warehouse, you can create data marts straight away for end-user consumption of the data.
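
To make that flow concrete, here is a minimal sketch in Python of the traditional pipeline Deb describes first (source system, staging area, warehouse, BI consumption), using an in-memory SQLite database as a stand-in for all three tiers; every table and column name is illustrative, not taken from any specific implementation.

    # Minimal sketch: source -> staging -> warehouse -> BI consumption.
    # All names are illustrative only.
    import sqlite3

    conn = sqlite3.connect(":memory:")

    # "Source system": raw orders as they arrive
    conn.execute("CREATE TABLE src_orders (order_id INT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO src_orders VALUES (?, ?, ?)",
                     [(1, "APAC", 120.0), (2, "EMEA", 80.0), (3, "APAC", 45.5)])

    # Staging area: land the data as-is, before any modeling
    conn.execute("CREATE TABLE stg_orders AS SELECT * FROM src_orders")

    # Data warehouse: apply the data model (here, a simple fact table)
    conn.execute("""CREATE TABLE dw_fact_sales AS
                    SELECT region, SUM(amount) AS revenue
                    FROM stg_orders GROUP BY region""")

    # Consumption: a BI-style query against the warehouse
    for row in conn.execute("SELECT region, revenue FROM dw_fact_sales ORDER BY region"):
        print(row)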

Why do we need to modernize the data architecture? Why is the traditional BI/data warehousing architecture not able to keep pace with modern business?

Alex: Over the last 20 or 30 years we have seen significant advances in data visualization tools and also in storage. You have big data, you have IoT data, you have data lakes, you have unstructured data, etc.

But there have not been many developments in the way that data is delivered. Today, we’re using a lot of traditional ETL/ELT processes, which were developed 30 or 40 years ago, at a time when data volumes were relatively low, when data was mostly structured, and when all data integration technologies made copies of data.

This was a fair approach, but with new data sources and ever-faster data creation, IoT data, social media data, sensor data, etc., these kinds of processes started to become too slow, and now they’re starting to break. At the same time, the speed of business has increased significantly. You can’t base decisions anymore on four-week-old data; even data that’s an hour or two old might be out of date.

That’s where data virtualization comes in. With data virtualization, you leave data at the data source, and it doesn’t get delivered to the consumer until the moment that the data is needed.

Data virtualization provides you with an abstraction layer, which sits between your data storage and your data consumers, separating the different technologies. This means that data consumers do not have to worry about SQL, about different dialects of SQL, or any details of data access. A data virtualization layer hides all these data complexities from the data consuming applications, and it makes it look like a single database to them. It’s also very easy to add new data sources because you don’t have to worry about the different technologies that are in use.
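
As a rough illustration of that abstraction-layer idea, the toy sketch below (not Denodo’s actual API, just a simplified stand-in) hides two differently shaped sources behind one query interface, so the consumer only ever deals with logical view names and never sees which source, or which dialect, answers the query.

    # Toy illustration of a virtualization layer; not a real product API.
    import sqlite3

    class VirtualLayer:
        def __init__(self):
            self.sources = {}  # logical view name -> (connection, native query)

        def register(self, view, conn, native_query):
            self.sources[view] = (conn, native_query)

        def query(self, view):
            conn, native_query = self.sources[view]
            return conn.execute(native_query).fetchall()

    # Two "sources" with different schemas, hidden behind one layer
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id INT, name TEXT)")
    crm.execute("INSERT INTO customers VALUES (1, 'Acme')")

    erp = sqlite3.connect(":memory:")
    erp.execute("CREATE TABLE invoices (cust_id INT, total REAL)")
    erp.execute("INSERT INTO invoices VALUES (1, 500.0)")

    layer = VirtualLayer()
    layer.register("customer", crm, "SELECT id, name FROM customers")
    layer.register("invoice", erp, "SELECT cust_id, total FROM invoices")

    print(layer.query("customer"))  # consumer sees logical views, not source details
    print(layer.query("invoice"))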

You also get a metadata management layer. With access to all the different data sources, you can expose information about your data sources in a structured way to your end users and applications, so you have a strong foundation for data governance, as well as strong data cataloging capabilities.

Because you have that layer between the consumers and data storage, you also get a very strong security layer. Everybody who accesses data has to go through the data virtualization layer, so you can implement a strong, centralized data security management layer that can include features like data masking. The data virtualization layer provides you with a full audit trail of who accessed which information, which enables companies to easily manage data compliance and governance.
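
Here is a hedged sketch of that single-point-of-access idea, assuming a hypothetical masking policy and a purely illustrative audit log; real platforms configure this declaratively rather than in application code.

    # Illustrative only: one access layer that masks sensitive columns and
    # records an audit trail. Policy and column names are hypothetical.
    import datetime

    AUDIT_LOG = []
    MASKED_COLUMNS = {"account_number"}  # columns masked for ordinary users

    def mask(value):
        return "****" + str(value)[-4:]

    def query_through_layer(user, rows, columns):
        # Every request is logged: who asked for which columns, and when
        AUDIT_LOG.append((datetime.datetime.now(datetime.timezone.utc).isoformat(),
                          user, columns))
        out = []
        for row in rows:
            out.append({c: (mask(v) if c in MASKED_COLUMNS else v)
                        for c, v in zip(columns, row)})
        return out

    rows = [(101, "1234567890", 250.0)]
    print(query_through_layer("analyst", rows,
                              ["customer_id", "account_number", "balance"]))
    print(AUDIT_LOG)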

How can data virtualization help to modernize the architecture?

Deb: Organizations spend a lot of time bringing data into a central repository so that people can effectively use it to derive insights. This involves time and effort not only in terms of human resources but also in terms of physical resources such as storage and maintenance. So when it comes to modernization, they ask, “Is there any way to avoid physically moving data from place to place?” And this is where data virtualization comes up in the conversation.

Alex: When we look at modernizing the data architecture, one of the key benefits that customers can get by implementing data virtualization is agility. We’re talking about up to 90% time savings. With a traditional data warehouse it might take 2 or 3 months to add a data source, but with data virtualization you can do it in less than a week. You don’t need to upload all your data for ad-hoc reporting; you just connect to the data source.

Large organizations do not want to maintain 6 or 7 different data models for the different applications that use data, and data virtualization provides them with one centralized data model that they can share across all of those consuming applications. This has a number of benefits: it’s not only the time you save by maintaining one data model instead of 6 or 7, but also the consistency. One of the biggest problems I hear when I talk to customers is “We have three departments reporting to the CEO, and all three report the same numbers, but they all get different results.” Different data models are a primary cause of these problems. You might have started off with one consistent data model, but over time, things may have changed. At that point, it’s very hard to track where the inconsistencies came from, and data virtualization can help you with that. You also get a more complete picture of your data, and it’s very easy to integrate new data sources.

We see 40 to 45% cost savings over traditional data integration methods, and Gartner makes similar statements.

With one centralized data model, aren’t we compromising on performance?

Alex: That doesn’t have to do with just the data model; it also has to do with the fact that we’re talking about distributed databases, and about executing in an environment in which data sits both on-premises and in the cloud and has to be brought together.

Alex: Concerns about performance actually stem from the traditional approach of data federation. In such a scenario you might have 3 data sources and need to run a report of revenue by region by product hierarchy. Say you have sales transaction records, product hierarchy records, and maybe territory information in different data sources. What you would traditionally do in such an environment is first move all of the data over the network into memory. If you had millions of transactional sales records, it would take a very long time to move that data across the network; the execution in memory itself is actually very quick.

What data virtualization can actually do is aggregate the data before it’s transmitted across the network, and that makes all the difference. Imagine you have 5 million sales records and you can aggregate them down to just 10,000 records. The volume of data crossing the network drops dramatically, and that’s where you get incredible performance. I’m proud to say that Denodo has the best performance optimization engine on the market, so we can provide performance that is almost comparable to single-database execution.
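
The snippet below sketches that aggregation-pushdown idea in simplified form, with SQLite databases standing in for the remote sources and purely illustrative schemas: the heavy GROUP BY runs where the sales data lives, so only a small summary has to travel and be joined with the product information.

    # Simplified sketch of aggregation pushdown; schemas are illustrative.
    import sqlite3

    sales_db = sqlite3.connect(":memory:")   # "remote" source; millions of rows in real life
    sales_db.execute("CREATE TABLE sales (product_id INT, region TEXT, amount REAL)")
    sales_db.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                         [(1, "APAC", 10.0), (1, "APAC", 20.0), (2, "EMEA", 5.0)] * 1000)

    product_db = sqlite3.connect(":memory:")  # small dimension source
    product_db.execute("CREATE TABLE products (product_id INT, category TEXT)")
    product_db.executemany("INSERT INTO products VALUES (?, ?)", [(1, "CAD"), (2, "PLM")])

    # Pushdown: the heavy GROUP BY runs at the source; only the summary is "transmitted"
    summary = sales_db.execute("""SELECT product_id, region, SUM(amount)
                                  FROM sales GROUP BY product_id, region""").fetchall()
    categories = dict(product_db.execute("SELECT product_id, category FROM products"))

    # The final join happens over a handful of aggregated rows, not every transaction
    report = [(categories[pid], region, total) for pid, region, total in summary]
    print(report)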

Alex, can you share a few customer examples where this technology has been used successfully?

Alex: Let me share the story of Autodesk, the design software maker. The company traditionally sold software on CDs, and later as downloads. This meant that Autodesk had very little interaction with its customers. They had sales records, they had support requests, and perhaps some ongoing maintenance, but the interaction with customers was limited.

At some stage, with the emergence of cloud, Autodesk moved to also providing software on-demand, and the company realized that this was a completely different business model, one that provided it with incredible insights into how customers were using the software. But to support this new business model, Autodesk also needed a modern data architecture.

Autodesk was a traditional SAP shop, so for its “old” business, the company ran an SAP ERP solution, had a data warehouse, and used BusinessObjects. Once Autodesk introduced the new business model, the company had to introduce a data lake and move to the cloud to capture all this new data. Going forward, the company could not just give up its old business, so Autodesk kept the existing environment for its existing customers and added the new environment for the cloud business, but the company needed an integration layer.

So Autodesk looked at data virtualization to bring data from the old and the new businesses together and provide a single view across the complete business. With data virtualization, Autodesk established a single data model, so the company kept a consistent view across the different data sources, and Autodesk also got a single access point that enabled the company to put very strong security and compliance in place. Data virtualization enabled Autodesk to smoothly move to the new business model without interrupting daily operations, and at the company’s preferred pace.

Another company is Seacoast Bank, a community bank in Florida. Seacoast had been growing very quickly through acquisitions and was supporting a variety of different banking-related applications. The bank had an online data warehouse in place, but with the acquisitions, as new data sources came in, it became clear that the data warehouse was relatively inflexible. Setting up new data sources and creating reports from them took two to three days, which was too slow for the bank.

So Seacoast decided to introduce data virtualization. The bank kept the data warehouse and created virtual data marts to replace the physical data marts it had before. The structure actually stayed very much the same: the bank still had a data mart, now a logical data mart, for loans, credit cards, accounting, assets, etc. By introducing data virtualization, Seacoast cut the time required to create reports from new data sources down to 2 to 3 hours. The bank saved over 95% of its time and was also able to provide self-service capabilities across the whole organization, boosting productivity for each department. The bank gained real-time data-delivery capabilities, which for loan processing, for example, is absolutely critical, as banks need to make these decisions based on the latest data.

Denodo’s role in data virtualization is very well established. What is Tech Mahindra’s role in this partnership with Denodo, and how will Tech Mahindra help the joint customers?

Deb: Denodo provides the solution, the product. Tech Mahindra is a systems integrator, a solution provider. So we implement the Denodo Platform, and we accomplish that with Denodo-certified employees and engineers. When we engage with customers, we understand exactly what they want to do from a business and technology perspective, and we advise them on how best to use data virtualization to meet their goals.

We are now in a COVID-19 situation and we can’t ignore that. How should the modern CDO navigate pandemics like this?

Deb: Everything is changing dramatically, and the way we look at data, and the way we handle data, is also changing. We will need to be more agile. We will need to be more pragmatic in our approach to data. So how do we become more agile in getting insights out of data if not by modernization?

AI will also significantly change the way we think about analysis. People are looking at chatbots and AI-driven transport, and they are investing in these areas, as well as in advanced analytics, AI/ML, data science, and deep learning, but such activities can only function when you have your data in place, properly managed, curated, and processed for consumption.

Alex: With COVID-19, we are all seeing these portals with case figures, new infections, recovery numbers, ICU beds available, etc. In the fight against COVID-19, data is actually a critical component. All across the world we see governments and health organizations struggling to get this data in place.

Denodo is talking to many healthcare organizations and agencies about this issue, and as a result, we built our own COVID-19 portal (www.denodo.com/en/page/coronavirus-data-portal) to bring live data from across the world together. We also provide our software free of charge to anyone who would like to leverage our portal in that fight.

COVID-19 is not going to go away in the next couple of weeks, so we have to prepare for the long run. And when we’re talking about data, yes, in the short term you can build solutions, probably with a lot of manual labor, that collect the data you need for your daily dashboards. But as this is probably a marathon that we’re talking about, you might want to switch to platforms that can automate the whole process, making it less labor-intensive to collate this data while still keeping the best data available. Data virtualization could be a key part of any solution that can help, in the mid or the long run, in the fight against this virus.

About The Author

Manish Goenka

Manish Goenka, Vice President & Head of ASEAN, Tech Mahindra, is a performance-oriented business leader with over three decades of experience in the IT industry, including global exposure and more than two decades of experience working in the Asia Pacific markets. Proficient in enabling businesses to achieve high growth across enterprise industry segments and operations in the region by combining strategic planning with tactical solutioning expertise, Manish adds value to organizations in the retail, healthcare, BFSI, and public sectors by facilitating digital transformation through the implementation of advanced technologies.