Help IFF form its submission on non-personal data!

aparatbar · July 13, 2020, 5:12am

Folks,

The Ministry of Electronics and IT is inviting suggestions on a draft report on non-personal data. The last date for submission is August 13, 2020.

Our previous views
We have previously raised concerns about non-personal data, especially the carve-out (exemption) that exists within a draft of the Data Protection Bill. Under this carve-out, data classified as “non-personal data” would not come within the governance framework of any future regulation under the Data Protection Bill.

Non-personal data is also a contentious issue and can pose problems for large data sets held by the government, such as the sale of databases relating to driver's licences and vehicle registration (link).

Let us know what you think!
But more importantly, we are interested in hearing from you as we draft our submissions.

What research papers or analysis would you like us to read and consider?
Do you have any views on the benefits and issues of “non-personal” data?
Any specific comments or suggestions you have on the draft report?

Important links
The link to the consultation is available here, and a PDF of the draft report is available here.

shubhamjain0594 · August 3, 2020, 12:58pm

Hi Apar,

In my understanding so far, identifying what is personal data is very closely related to how difficult it is to re-identify someone from that dataset. Here is some academic research that I feel is relevant to this discussion:

These are among the first large-scale studies of the uniqueness of certain behavioral datasets and the limits of anonymisation. Summary: 4 random spatiotemporal points are enough to uniquely identify most individuals.

This paper shows that 15 demographic attributes are enough to uniquely identify 99.98% of Americans in any dataset.

Estimating the success of re-identifications in incomplete datasets using generative models
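To make the idea of "uniqueness" concrete, here is a minimal sketch of how one might measure it on a small table. It is not the method used in the papers above (which rely on large real datasets and, in the last case, generative models); the records, attribute names, and the `unique_fraction` helper are all made up for illustration.

```python
from collections import Counter

def unique_fraction(records, attributes):
    """Fraction of records whose combination of `attributes` appears exactly once."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)

# Hypothetical toy records: no names, only "demographic" attributes.
records = [
    {"pin": "110001", "birth_year": 1990, "gender": "F"},
    {"pin": "110001", "birth_year": 1990, "gender": "M"},
    {"pin": "110001", "birth_year": 1985, "gender": "F"},
    {"pin": "110002", "birth_year": 1990, "gender": "F"},
    {"pin": "110002", "birth_year": 1990, "gender": "F"},
    {"pin": "110002", "birth_year": 1985, "gender": "M"},
]

# Each added attribute can only keep or increase the share of unique records.
for attrs in (["pin"], ["pin", "birth_year"], ["pin", "birth_year", "gender"]):
    print(attrs, round(unique_fraction(records, attrs), 2))
```

In this toy table no record is unique on PIN code alone, but four of the six become unique once birth year and gender are added, even though the table contains no names.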

Do let me know if this is useful and in the direction of what you are looking for.

tanyavarshney · August 7, 2020, 3:55pm

In my opinion, the term ‘non-personal data’ is rather ambiguous, especially in times of modern technology. Here’s why:

A. Personal data has been defined as information which is identifiable to a person. Thus, non-personal data would be information which is not identifiable. The ambiguity kicks in when we are ascertaining what degree of identifiability is needed to categorize information as personal or non-personal data.

B. In this article that I wrote (https://intellectechlaw.in/2020/06/26/blockchain-and-the-data-protection-dilemma-compatibility-of-gdpr-with-blockchain-system/), I discussed the challenges blockchain systems face in complying with the GDPR. One of the points discussed there is the scope of personal data. Quoting from the article:

"The basic premise for the applicability of GDPR rests on the data processed being “personal data”, that is, the data must be in relation to an identified or identifiable natural person. There are some concerns whether the GDPR would be applicable to blockchain servers where data is stored as encrypted or hashed data. The primary question would be whether the data stored would be identifiable to a natural person. GDPR recognizes the concept of ‘pseudonymisation’, that is, the processing of personal data in a way that it can no longer be identifiable to a specific data subject. However, it must be noted that pseudonymised data is not precluded from the data protection measures under the GDPR but is acknowledged as a recommendation to reduce the risks with respect to privacy of the data subjects. The EPRS also notes that even encrypted data would ‘likely’ qualify as personal data under GDPR as it is difficult to assess whether the encrypted data has been sufficiently anonymised. Additionally, Recital 26 to the GDPR also notes that pseudonymised personal data which could also be attributable to a natural person by use of any additional information would also have to comply with the GDPR obligations. Additional information could include internet protocol addresses, cookie identifiers, radio frequency identification tags, etc. as these may leave traces which may allow data to be attributable to a particular data subject.

Blockchain servers often use public keys which are essentially a string of letters and numbers that represent each user’s data – somewhat similar to an account number. There are also private keys, also letters and numbers, but somewhat similar to passwords. While the data stored in blockchain servers is encrypted or hashed, based on the technical design, such data could also be decrypted by the use of private keys. There is definitely some uncertainty regarding the degree of identifiability the data must have to come under the meaning of personal data. In this regard, the EPRS suggests that the appropriate test would be whether the controller or another person are able to identify the data subject in using all the ‘means reasonably likely to be used’. "
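As a rough illustration of why hashed data may still be identifiable through "means reasonably likely to be used", the sketch below hashes an identifier and then recovers it by trying every candidate value. It is not specific to any blockchain system: the `ID-` format, the size of the candidate space, and the helper names are assumptions chosen only for this example.

```python
import hashlib

def pseudonymise(identifier: str) -> str:
    """Replace an identifier with its unsalted SHA-256 hash."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()

def reidentify(target_hash, candidates):
    """Hash each candidate and return the one matching target_hash, if any."""
    for candidate in candidates:
        if pseudonymise(candidate) == target_hash:
            return candidate
    return None

# Hypothetical stored record: only the hash is kept, not the identifier.
stored = pseudonymise("ID-004217")

# Assumed attacker knowledge: identifiers look like "ID-" plus six digits,
# and only the first 10,000 have been issued.
candidates = (f"ID-{n:06d}" for n in range(10_000))
recovered = reidentify(stored, candidates)
print(recovered)
```

When the space of possible identifiers is small or predictable, as with phone numbers or ID numbers, this kind of enumeration is cheap, which is one reason hashing alone is usually treated as pseudonymisation rather than anonymisation.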

C. In the Indian context, there is a further lack of clarity. The EPRS has suggested testing whether the data subject can be identified using all means reasonably likely to be used. In effect, the question is whether the data has been permanently anonymised. Meanwhile, no such stance has been taken in India.

D. In machine learning systems, the data fed to AI algorithms is, again, encrypted data. A pertinent question then is: are these machine learning systems capable of ‘forgetting’ or erasing the data? Could such data be decrypted? Here is another relevant article:
(https://intellectechlaw.in/2020/07/28/is-india-equipped-to-defend-privacy-in-the-era-of-artificial-intelligence/)

GargiS · August 10, 2020, 5:00pm

I found some research relevant to non-personal data.

The General Data Protection Regulation in the Age of Surveillance Capitalism

We consider the legal status of these ownerless forms of data, arguing that data protection techniques such as anonymization and pseudonymization raise significant concerns over the ownership of behavioral data and its potential use in the large-scale modification of activities and choices made both on and offline.

They who must not be identified—distinguishing personal from non-personal data under the GDPR

Notwithstanding the pivotal importance of the distinction between personal and non-personal data, it can, in practice, be extremely burdensome to differentiate between both categories. This difficulty is anchored in both technical and legal factors. From a technical perspective, the increasing availability of data points as well as the continuing sophistication of data analysis algorithms and performant hardware makes it easier to link datasets and infer personal information from ostensibly non-personal data. From a legal perspective, it is at present not obvious what the correct legal test is that should be applied to categorize data under the GDPR.

GargiS · August 10, 2020, 5:01pm

This is the most radical piece of research, making the case that there is no separation between non-personal and personal data.

The law of everything. Broad concept of personal data and future of EU data protection law

a plausible argument that in the near future everything will be or will contain personal data, leading to the application of data protection to everything: technology is rapidly moving towards perfect identifiability of information; datafication and advances in data analytics make everything (contain) information; and in increasingly ‘smart’ environments any information is likely to relate to a person in purpose or effect. At present, the broad notion of personal data is not problematic and even welcome. This will change in future. When the hyperconnected online world of data-driven agency arrives, the intensive compliance regime of the General Data Protection Regulation (GDPR) will become ‘the law of everything’, well-meant but impossible to maintain. By then we should abandon the distinction between personal and non-personal data, embrace the principle that all data processing should trigger protection, and understand how this protection can be scalable.

aparatbar · August 11, 2020, 2:02am

Thank you so much, folks! These suggestions are incredibly helpful!

@GargiS @tanyavarshney @shubhamjain0594

Rohin_Garg · January 6, 2021, 8:28am

A revised draft report has recently been released based on the suggestions provided in the first round of consultations. We do plan to dig into it soon ourselves, but in the meantime we would once again be grateful for any research, articles, opinions, or comments on the report, as this would help us better frame our own analysis.

@GargiS @tanyavarshney @shubhamjain0594

Rohin_Garg · January 18, 2021, 12:55pm

We recently provided our comments on the second version of the draft report. In our comments, we highlighted four key issues:

Ambiguity surrounding non-personal data
Issues surrounding anonymisation
Over-extraction of data and a race to the bottom
Growing fatigue and lack of trust in consultation processes

Are there any other issues that we’ve missed out? Please do let us know. You can also share your comments directly here.