Posted by: Chitra Subramanian On June 11, 2020.

METADATA in GDPR compliance and enforcement

While the industry moves towards data democratization, giving everyone access to data in order to extract more value from it, compliance remains essential to keep critical and sensitive data secure.

GDPR (General Data Protection Regulation) is the European Union's data privacy legislation. It was approved by the European Parliament in April 2016 and became enforceable across the European Union on 25 May 2018.

Among other things, GDPR requires that:

  • Companies protect the personal data of individuals in the EU
  • Personal data is not transferred outside the EU without adequate safeguards

Examples of sensitive personal data:

  • Social Security Number
  • Driving license number
  • Passport number
  • Biometric information
  • Etc...

Steps to ensure GDPR rules are followed within a company:

  • Identify PII (personally identifiable information) across the enterprise and have data stewards approve the identification
  • Restrict access to this data to authorized personnel only
  • Set up alerts that automatically notify of any new PII data or new access to it

Enterprise-level companies should appoint a data protection officer (DPO) to oversee data privacy.

What is metadata:

At the enterprise level, metadata provides insight into the following:

  • The inventory of data assets
  • Data movement metrics
  • Execution metrics
  • Usage metrics across the enterprise

How metadata helps in identifying PII data:

The capture, storage, processing, and usage of PII need additional care because of its sensitivity.

“Metadata search” helps locate and identify PII across enterprise systems. Semantic metadata search goes further and identifies PII even when the metadata is only similar to known PII terms rather than an exact match.
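As a rough illustration of metadata search, the sketch below scans a catalog of column names for PII-related terms. The `PII_TERMS` vocabulary, the catalog layout, and the 0.8 threshold are all assumptions made for this example, and plain string similarity from Python's `difflib` stands in for true semantic search, which a real metadata tool would implement differently.

```python
import re
from difflib import SequenceMatcher

# Hypothetical PII vocabulary; in practice this would be steward-approved.
PII_TERMS = ["social_security_number", "ssn", "passport_number",
             "driving_license_number", "biometric"]

def normalize(name):
    """Lower-case a column name and collapse separators to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")

def find_pii_columns(catalog, threshold=0.8):
    """Return (table, column, matched_term) for columns that look like PII.

    catalog maps a table name to a list of its column names.
    """
    hits = []
    for table, columns in catalog.items():
        for column in columns:
            norm = normalize(column)
            for term in PII_TERMS:
                # Exact match on whole underscore-delimited tokens.
                exact = f"_{term}_" in f"_{norm}_"
                # Fuzzy match catches misspelled or abbreviated names.
                similar = SequenceMatcher(None, norm, term).ratio() >= threshold
                if exact or similar:
                    hits.append((table, column, term))
                    break
    return hits

# Hypothetical catalog extract used only for illustration.
catalog = {
    "customers": ["Cust_SSN", "PassportNumbr", "order_total"],
    "hr": ["Driving License Number"],
}
for table, column, term in find_pii_columns(catalog):
    print(f"{table}.{column} looks like PII (matched '{term}')")
```

Candidates found this way would still go to a data steward for approval, since name-based matching can produce both false positives and misses.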

“Metadata lineage” helps identify all the places where PII data is used, both backward and forward. It traces the chain of data movement all the way from the source system to the consumption layer, which can span multiple layers of the BI data value chain in a multi-vendor environment. Attribute-level lineage also reveals any manipulation or calculation that uses PII data in any layer. This ensures that no layer holding sensitive data is missed from compliance.
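Forward and backward lineage can be pictured as a traversal of a directed graph whose nodes are attributes and whose edges are data movements. The following is a minimal sketch under that assumption; the edge list and attribute names are hypothetical, and a real lineage tool would harvest the edges from ETL jobs, views, and reports.

```python
from collections import defaultdict, deque

def build_graphs(edges):
    """Build forward and backward adjacency maps from (source, target) pairs."""
    forward, backward = defaultdict(set), defaultdict(set)
    for source, target in edges:
        forward[source].add(target)
        backward[target].add(source)
    return forward, backward

def reachable(start, graph):
    """Breadth-first search: every node reachable from start, excluding start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen

# Hypothetical attribute-level lineage edges.
edges = [
    ("crm.customer.ssn", "staging.customer.ssn"),
    ("staging.customer.ssn", "dw.dim_customer.ssn_hash"),
    ("dw.dim_customer.ssn_hash", "report.kyc.customer_key"),
    ("crm.customer.name", "dw.dim_customer.full_name"),
]
forward, backward = build_graphs(edges)

# Forward lineage: everything derived from the source SSN attribute.
print(sorted(reachable("crm.customer.ssn", forward)))
# Backward lineage: everything the report attribute depends on.
print(sorted(reachable("report.kyc.customer_key", backward)))
```

Because derived attributes such as a hashed key appear in the forward set, they can be tagged as PII-derived even when their names give no hint of it.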

Once the PII value chain is identified, access restrictions can be enforced on the identified PII data to satisfy GDPR rules.
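One simple way to check such restrictions is to compare existing access grants against the set of identified PII attributes and a list of roles cleared to see them. The sketch below assumes grants are available as (role, column) pairs; the role names, column names, and the function `find_unauthorized_grants` are hypothetical.

```python
def find_unauthorized_grants(grants, pii_columns, authorized_roles):
    """Return (role, column) grants that expose PII to a non-authorized role."""
    return sorted(
        (role, column)
        for role, column in grants
        if column in pii_columns and role not in authorized_roles
    )

# Hypothetical inputs used only for illustration.
pii_columns = {"crm.customer.ssn", "dw.dim_customer.ssn_hash"}
authorized_roles = {"dpo", "kyc_analyst"}
grants = [
    ("kyc_analyst", "crm.customer.ssn"),
    ("marketing", "dw.dim_customer.ssn_hash"),
    ("marketing", "dw.fact_sales.amount"),
]

for role, column in find_unauthorized_grants(grants, pii_columns, authorized_roles):
    print(f"Revoke or review: role '{role}' can read {column}")
```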

Once PII data is identified across the system, data masking can be enforced for data at rest and in motion.
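To make the idea of masking concrete, here is a minimal sketch that hides all but the last few characters of each PII field in a record. The field names and the choice to keep four trailing characters are assumptions for illustration; production masking is usually applied by the database or integration platform rather than by hand-written code.

```python
def mask_value(value, visible=4, fill="*"):
    """Mask a string, keeping only the last `visible` characters readable.

    Values no longer than `visible` are masked completely so that short
    identifiers are never shown in full.
    """
    if visible <= 0 or len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]

def mask_record(record, pii_fields):
    """Return a copy of the record with every PII field masked."""
    return {
        key: mask_value(str(val)) if key in pii_fields else val
        for key, val in record.items()
    }

# Hypothetical record and PII field list.
record = {"name": "A. Customer", "ssn": "123-45-6789", "order_total": 42.5}
print(mask_record(record, pii_fields={"ssn"}))
```

The same function can be applied to rows read from storage (data at rest) or to messages passing through a pipeline (data in motion), provided the list of PII fields comes from the metadata identification step.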

Challenges in identifying PII data without metadata management:

Multi-vendor, multi-instance BI environments are very common, and assets created over time often sit without proper governance. In such environments it is practically impossible to identify and understand sensitive data and its flow across the enterprise by hand. The effort costs a lot of time and money, and accuracy is a challenge as well.

Every change request (CR) or new project must remain compliant. Identifying PII usage manually is not practical. Moreover, because CI/CD changes systems continuously, the entire system may need to be rescanned over time to confirm where PII data is used.

While data monetization increases a company's revenue and market potential, the sensitivity of the data demands a clear boundary on what can be monetized.

Real time/periodic update of PII information and usage:

Accurate identification of PII data is very important under GDPR, as violations incur heavy penalties. Metadata management plays a critical role in GDPR compliance and PII identification. Stewards can verify that PII data and its usage are tagged properly, and regular reports on the presence and usage of sensitive data can be automated through metadata management.

In addition, real-time or periodic refresh of metadata supports both governance and agility. It keeps PII identification and usage information current, which helps ensure GDPR compliance.
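A periodic refresh can feed an automated report by comparing two metadata snapshots. The sketch below assumes each snapshot maps a PII-tagged attribute to the set of assets that consume it; the snapshot layout, the attribute names, and the function `diff_pii_snapshots` are hypothetical.

```python
def diff_pii_snapshots(previous, current):
    """Compare two snapshots mapping PII attribute -> set of consumers."""
    new_columns = sorted(set(current) - set(previous))
    dropped_columns = sorted(set(previous) - set(current))
    new_consumers = {}
    for column in sorted(set(current) & set(previous)):
        added = current[column] - previous[column]
        if added:
            new_consumers[column] = sorted(added)
    return {
        "new_columns": new_columns,          # PII that appeared since last scan
        "dropped_columns": dropped_columns,  # PII no longer present
        "new_consumers": new_consumers,      # new usage of known PII
    }

# Hypothetical snapshots from two consecutive metadata refreshes.
last_week = {
    "crm.customer.ssn": {"kyc_report"},
    "hr.employee.passport_no": {"hr_dashboard"},
}
this_week = {
    "crm.customer.ssn": {"kyc_report", "churn_model"},
    "web.signup.biometric_id": set(),
}

report = diff_pii_snapshots(last_week, this_week)
for section, items in report.items():
    print(section, items)
```

Entries under `new_columns` and `new_consumers` are the ones a steward would want to be alerted about, since they represent PII or PII usage that has not yet been reviewed.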

About The Author

Chitra Subramanian

Chitra is an engineering/technology graduate in computer science with more than 15 years of experience in the data and analytics space. An all-rounder across multiple analytics technology suites, she has a profound understanding of the data journey, from ingestion through reporting to cognitive computing/intelligence. A versatile architect and core SME in the IP/solutions area at Tech Mahindra, she is a prolific writer of technical blogs and thought-provoking technical articles. Her current role includes digital solution conception, design, build, and selling.