A needle in a haystack? Yes, that is exactly what analysts do when they process and analyze millions of financial transactions to find the tiny patterns that reveal terrorist funding. The same techniques are used to identify the few sentences and key words, across multiple languages, that terrorists use to communicate and plan attacks, so that attacks can be anticipated and innocent lives saved.
You must be wondering what this trivia about terrorism has to do with our topic, Big Data. Actually, those algorithms, databases and scripting languages are the original Big Data applications. The big difference is that these applications are now available to civilian enterprises and start-ups at very affordable prices, to analyze and profile customers and provide personalized services!
We can categorize Big Data into three broad categories –
- Traditional transaction data coming out of CRM, billing and POS-type systems;
- Human-generated data – e-mails, social media posts, Twitter-type messaging, photos posted, etc.; and
- Machine-generated data.
Machine-generated data is not new to us. We have long dealt with system-generated logs from operating systems, storage, middleware and applications. What has changed is the sheer size and complexity, and with it the need to store, retrieve and quickly find patterns and predict future faults. We need newer Big Data tools such as Hadoop and Google BigQuery, and warehousing technologies such as Amazon Redshift, to manage this vast amount of data. With AI algorithms available in open source (even IBM Watson APIs will be available soon), using predictive algorithms to anticipate future issues, and to correct them without human intervention, has become of paramount importance.
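To make the idea concrete, here is a minimal sketch, not any specific product, of the simplest form of fault prediction on machine logs: count WARN/ERROR events per component and flag the ones trending toward failure. The log format, component names and threshold are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical log lines; in practice these would stream in from
# OS, storage, middleware and application sources.
LOG_LINES = [
    "2016-01-04 10:00:01 disk0 WARN sector remap",
    "2016-01-04 10:00:05 disk0 WARN sector remap",
    "2016-01-04 10:00:09 net0 INFO link up",
    "2016-01-04 10:00:12 disk0 ERROR read timeout",
    "2016-01-04 10:00:20 disk0 WARN sector remap",
]

WARN_THRESHOLD = 3  # flag a component once WARN/ERROR events cross this count

def components_at_risk(lines, threshold=WARN_THRESHOLD):
    """Count WARN/ERROR events per component and flag frequent offenders."""
    counts = Counter()
    for line in lines:
        m = re.match(r"\S+ \S+ (\S+) (WARN|ERROR)", line)
        if m:
            counts[m.group(1)] += 1
    return [comp for comp, n in counts.items() if n >= threshold]

print(components_at_risk(LOG_LINES))  # disk0 has crossed the threshold
```

A real predictive system would of course learn patterns rather than use a fixed threshold, but the shape of the problem, turning raw log streams into early warnings, is the same.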
Several new types of data have been added to this machine category: data from the thousands of sensors embedded in Internet of Things devices, from power meters, from household sensors that measure temperature, from the wellness bands used by millions of consumers, and so on. Each reading may be small, but the volumes are very high. To manage this we need not only Big Data technologies such as Hadoop, but also fast, real-time decision making and in-memory technologies such as SAP HANA. With the advent of sensors in trucks, the whole logistics and transportation industry will produce large amounts of sensor data that will also need to be analyzed in real time.
Video and audio data coming from millions of CCTV surveillance cameras is another huge source in our kitty. Image processing, image recognition and low-resolution imaging are some of the new algorithms and technologies we need to make sense of this data, along with effective compression technologies for storing such humongous volumes. GPS and location data coming from billions of smartphones is also a major source in this category. Here again we need real-time analysis and processing: whether it is couponing for a product or an invitation to visit the car showroom for a test drive, the technology has to keep pace with the amount of data flowing in.
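The location-triggered couponing mentioned above boils down to a geofence check: is this phone's GPS fix within some radius of the showroom? A minimal sketch, with an assumed showroom location and radius purely for illustration:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two GPS fixes, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Hypothetical showroom location and coupon radius.
SHOWROOM = (18.5204, 73.8567)  # illustrative coordinates
COUPON_RADIUS_KM = 2.0

def should_send_coupon(phone_lat, phone_lon):
    """Fire the offer only when the phone is inside the geofence."""
    return haversine_km(phone_lat, phone_lon, *SHOWROOM) <= COUPON_RADIUS_KM
```

The check itself is trivial; the Big Data challenge is running it continuously against location updates from millions of phones, which is exactly why the real-time stream-processing technologies above matter.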
The volume of data coming from industrial robots, planes, trains, traffic signals and other machines is growing each day. And it won't be a surprise if, very soon, a customer service agent takes a call from a robot complaining about a trouble ticket it raised over the non-availability of an IT system!
Now, what about human-generated data? Is it less complex? No, it is more complex. Machine-generated data, though huge in size, is structured and amenable to deterministic algorithms; human-generated data is not. Multiple languages, different ways of saying the same thing, sarcastic comments, the maze of websites and of Facebook and Twitter friend/follower connections, millions of e-mails with key messages buried under lots of useless text… and the list continues. Here the challenges are language processing, pattern recognition, keyword extraction and link-list generation, and to tackle them we have to move from deterministic to heuristic algorithms. TechM's Health Care and BI groups developed a neat algorithm that makes sense of Facebook postings by the first users of a new drug, finding patterns and connections that reveal which group is more susceptible to side effects. A formal customer survey, with the associated approvals for processing customer data, would have taken months. We humans are very happy to share feedback, personal interests and likes/dislikes on the Internet, yet we are always guarded when asked to fill in a customer survey.
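To give a flavour of the keyword-extraction step (only a flavour: the posts, stopword list and scoring here are illustrative assumptions, not TechM's actual algorithm), the crudest heuristic is to strip the "useless text" and count what remains:

```python
import re
from collections import Counter

# A tiny hand-picked stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "is", "was", "i", "my", "and", "to", "of",
             "it", "this", "that", "for", "on", "with", "no", "me", "again"}

# Hypothetical social-media posts from early users of a new drug.
POSTS = [
    "Started the new drug this week, mild headache on day two",
    "No side effects for me, feeling fine",
    "Headache again today, second day on this drug",
]

def top_keywords(posts, n=3):
    """Heuristic keyword extraction: lowercase, drop stopwords and
    very short tokens, then keep the n most frequent words."""
    words = Counter()
    for post in posts:
        for word in re.findall(r"[a-z]+", post.lower()):
            if word not in STOPWORDS and len(word) > 2:
                words[word] += 1
    return [w for w, _ in words.most_common(n)]
```

Even this toy version surfaces "headache" and "drug" as the recurring themes; real heuristic pipelines add stemming, phrase detection and the friend/follower link analysis described above on top of this basic counting idea.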
So, what exactly are we at TechM doing to tackle this situation? We have a dedicated DES (Digital Enterprise Services) group, supported jointly by our Networks, Mobility, Big Data/Analytics, Cloud and Sensors/Social groups, all working together to help customers use these fantastic technologies at an affordable cost. You will hear more and more news from TechM in this space as we move along in this exciting area…
(L Ravichandran is the Chief Operating Officer (Communication Solutions: Americas & RoW) and Global Head of Competencies & IMS at Tech Mahindra. He has over 35 years of experience in building and delivering businesses for organizations across the globe. Ravi's strength lies in the creation of new business lines, and he is known for creating high-performance teams, delivering quality solutions and nurturing strong client relationships.)