Advice on How To Handle Big Data | Information Commissioner’s Office | G+T

29/08/2014

This article was first published on The Privacy Advisor on 26 August 2014.

A key role for privacy regulators around the world is providing guidance on how to apply privacy principles when designing appropriate privacy settings and options into new business applications. Although national privacy laws differ greatly, regulatory guidance is frequently relevant and useful across multiple jurisdictions and different legislative schemes. For this reason, many privacy professionals maintain a stash of discussion papers and reports from privacy regulators around the globe. Privacy professionals advising on big data applications are spoilt for choice: a flood of reports and analyses spills out from regulators and policy-makers worldwide. The volume of available material always exceeds the time available to read it, so one must be selective.

One regulator that usually has interesting insights is the UK Information Commissioner’s Office (ICO). The ICO was an early voice of reason in the anonymisation debate, in particular with its November 2012 Anonymisation Code of Practice. In late July 2014 the ICO returned to the topic with a discussion paper, Big Data and Data Protection.

The paper is a thoughtful and balanced analysis of good privacy practice applied to business applications of big data derived from personal information. Much of its 50 pages is devoted to Euro-centric issues, including when an organisation is a “data processor” and when processing is “fair” under EU Directive 95/46/EC and its national implementation in the UK Data Protection Act 1998.

But this should not obscure the relevance of most of the paper’s discussion to other jurisdictions with privacy regulation.


As the ICO notes in the paper, many applications of big data do not require use of personal data. However, a number of important applications do, at least in some cases, including use of personally identifying data from monitoring devices on patients in clinical trials, mobile phone location data, data on purchases made with loyalty cards and biometric data from body-worn devices. There are also concerns about what is sometimes called “segment of one” marketing: tailoring offers of products or services to an individual based on characteristics such as age, preferences and lifestyle.

Many big data applications rely on anonymising personal information before analytics is conducted. But a growing body of academic literature demonstrates cases where anonymisation fails to prevent reidentification of individuals.

Examples that have recently attracted significant media interest include using the uniqueness of people’s movement patterns to reidentify them from anonymised, individual-level mobile phone data, and using the uniqueness of their gait to identify them from their Fitbit data streams.
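To make the uniqueness point concrete, the following is a minimal Python sketch, not drawn from the ICO paper, that measures how many records in a supposedly anonymised dataset are unique on a chosen combination of attributes (so-called quasi-identifiers). The records and field names (`home_cell`, `work_cell`, `age_band`) are hypothetical. A record that is unique on those attributes can be singled out by anyone who holds the same attributes from another source; the size of the smallest group of indistinguishable records is the "k" in k-anonymity.

```python
from collections import Counter

# Hypothetical "anonymised" records: names removed, but coarse location
# and demographic attributes retained.
records = [
    {"home_cell": "A1", "work_cell": "B7", "age_band": "30-39"},
    {"home_cell": "A1", "work_cell": "B7", "age_band": "30-39"},
    {"home_cell": "A1", "work_cell": "C2", "age_band": "40-49"},
    {"home_cell": "D4", "work_cell": "B7", "age_band": "30-39"},
]

# Hypothetical choice of quasi-identifiers for this illustration.
QUASI_IDENTIFIERS = ("home_cell", "work_cell", "age_band")


def equivalence_class_sizes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_fraction(rows, keys):
    """Fraction of rows whose quasi-identifier combination appears exactly once."""
    sizes = equivalence_class_sizes(rows, keys)
    unique_rows = sum(1 for row in rows if sizes[tuple(row[k] for k in keys)] == 1)
    return unique_rows / len(rows)


def smallest_class(rows, keys):
    """The 'k' in k-anonymity: size of the smallest group of indistinguishable rows."""
    return min(equivalence_class_sizes(rows, keys).values())


if __name__ == "__main__":
    print(unique_fraction(records, QUASI_IDENTIFIERS))  # 0.5
    print(smallest_class(records, QUASI_IDENTIFIERS))   # 1
```

A k of 1 means at least one person stands alone on those attributes even though no name appears. Adding richer attributes, such as many time-stamped locations, tends to make records more unique, not less.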

The ICO engages with the frequent assertion that privacy regulation can’t keep up with data analytics. As the commissioner puts it, “We do not accept the argument that data protection principles are not fit for purpose in the context of big data. Big data is not a game that is played by different rules. There is some flexibility inherent in the data protection principles. They should not be seen as a barrier to progress but as the framework to promote privacy rights and as a stimulus to developing innovative approaches to informing and engaging the public.”

The commissioner returns to the core thesis of his office's 2012 Anonymisation Code of Practice, namely, that although it may not be possible to establish with absolute certainty that an individual cannot be identified from a particular dataset in combination with other data that may exist elsewhere, “The issue is not about eliminating the risk of reidentification altogether, but whether it can be mitigated so it is no longer significant. Organisations should focus on mitigating the risks to the point where the chance of reidentification is extremely remote.”

In mitigating that risk, effective and verifiable anonymisation will often be the key feature. Anonymisation may be used when data is shared externally or within an organisation. “For example, an organisation may hold a dataset containing personal data in one data store and produce an anonymised version of it to be used for analytics in a separate area. Whether it remains personal data will depend on whether the anonymisation 'keys' and other relevant data that enable identification are retained by the organisation.”
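The arrangement the commissioner describes, an identified data store alongside a separate analytics copy, can be sketched as follows. This is a minimal illustration under stated assumptions, not a method set out in the ICO paper: direct identifiers are replaced by a keyed hash (HMAC-SHA256 from Python’s standard library), and the secret key stays with the identified store. The helper names `pseudonym` and `pseudonymise_for_analytics` and the field names are hypothetical. Whoever holds the key can recompute the tokens and link analytics rows back to individuals, which is why retention of such “keys” bears on whether the analytics copy remains personal data.

```python
import hashlib
import hmac
import secrets

# Hypothetical setup: the secret key lives only with the identified data store
# and is never copied into the analytics environment.
PSEUDONYMISATION_KEY = secrets.token_bytes(32)


def pseudonym(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256, hex-encoded)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise_for_analytics(rows, key, drop_fields=("name", "email")):
    """Build the analytics copy: tokenise the customer ID and drop direct identifiers."""
    out = []
    for row in rows:
        cleaned = {
            k: v for k, v in row.items()
            if k not in drop_fields and k != "customer_id"
        }
        cleaned["token"] = pseudonym(row["customer_id"], key)
        out.append(cleaned)
    return out


# Hypothetical identified store.
identified_store = [
    {"customer_id": "C-1001", "name": "Alice", "email": "a@example.com", "spend": 120},
    {"customer_id": "C-1002", "name": "Bob", "email": "b@example.com", "spend": 45},
]

analytics_copy = pseudonymise_for_analytics(identified_store, PSEUDONYMISATION_KEY)

# Anyone holding the key can re-link a token to a customer by recomputing it.
relinked = pseudonym("C-1001", PSEUDONYMISATION_KEY) == analytics_copy[0]["token"]
```

Keeping the key out of the analytics environment removes this easy route back to identity, but it does not by itself address the uniqueness risk described earlier, since the remaining attributes may still single people out.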

In this brave new world, organisational dynamics affecting data governance will increasingly determine whether big data is handled ethically and in compliance with the law. The commissioner refers to a Boston Consulting Group paper that argues that the potential gains from using big data are so great that the management of personal data, including issues of how data is collected, consent, purposes and security, is a C-suite issue, that is, an issue that should be addressed at chief officer level within an organisation: “This positioning of personal data issues means that there is a convergence between the data management agenda and the data protection and privacy agenda.”

So What Should Data Analytics Providers Do?

First, put in place an appropriate values-based framework and rigorous governance arrangements to effectively identify and appropriately quarantine uses of personal information from any anonymisation-based data analytics.

Second, ensure transparency of practices and fairness of disclosures by educating people as citizens and as consumers.

“This means explaining the benefits of the analytics, in terms of improved services, more relevant marketing or enhanced rewards, and looking to foster a value exchange, in which people are happy to provide data if they are informed and have trust in how it will be used,” the commissioner says.

Providers should not place undue reliance on formal or complex privacy statements or notices. Such notices may not be read or understood by consumers, so they do not foster trust, and they may also be legally ineffective because they are insufficiently transparent.

Third, consider using intermediaries, including third-party trust certification bodies or others that can assist in demonstrating that the anonymisation and other risk mitigation implemented by a corporation is embedded, systemic and demonstrably reliable.

Fourth, be particularly careful in any repurposing of personal information. If an organisation has collected personal data for one purpose and then decides to start analysing it for completely different purposes, or to make it available for others to do so, then it needs to make its users aware of this. As the commissioner notes, “This is particularly important if the organisation is planning to use the data for a purpose that is not apparent to the individual because it is not obviously connected with their use of a service.”

Fifth, don’t forgo minimisation of the use of personal information. Although a key feature of big data is using “all” the data, minimisation requires organisations to be clear from the outset about what they expect to learn or be able to do with personal information, and to satisfy themselves that the personal information is relevant to, and its use not excessive in relation to, that aim.
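The minimisation discipline described above, deciding up front what each purpose needs and refusing anything else, can be expressed as a simple purpose-to-fields allowlist. The Python below is a hypothetical sketch, not a mechanism described by the ICO: the purpose names, field names, `minimised_view` helper and `UndeclaredPurposeError` exception are invented for illustration. Requests for an undeclared purpose fail loudly, which also gives a natural checkpoint for the repurposing concern in the fourth point.

```python
# Hypothetical register of purposes declared at collection time, each mapped
# to the minimum set of fields judged relevant and not excessive for it.
DECLARED_PURPOSES = {
    "loyalty_rewards": {"member_id", "purchase_total", "purchase_date"},
    "store_planning": {"store_id", "purchase_date", "purchase_total"},
}


class UndeclaredPurposeError(Exception):
    """Raised when data is requested for a purpose that was not declared up front."""


def minimised_view(rows, purpose):
    """Return only the fields declared for `purpose`; refuse undeclared purposes."""
    if purpose not in DECLARED_PURPOSES:
        raise UndeclaredPurposeError(f"no declared purpose named {purpose!r}")
    allowed = DECLARED_PURPOSES[purpose]
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


# Hypothetical loyalty-card transaction.
transactions = [
    {
        "member_id": "M1",
        "store_id": "S9",
        "purchase_total": 30.0,
        "purchase_date": "2014-07-01",
        "home_address": "1 High St",
    },
]

# Store planning needs neither the member's identity nor their address.
planning_rows = minimised_view(transactions, "store_planning")
```

Keeping the register explicit means a new use of the data has to be added to it deliberately, rather than arising by default because “all” the data happens to be available.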

The commissioner’s paper is a good and timely read. In this reviewer’s experience, many early data analytics applications simply did not effectively quarantine uses of personal information from uses of anonymised data. This was partly because many applications were developed in jurisdictions or industry sectors that were not subject to significant privacy regulation and then applied in more regulated jurisdictions without appropriate localisation to accommodate economy-wide privacy rules.

Many alleged anonymisation practices fail through poor understanding of organisational dynamics and because legal form takes precedence over appropriate data governance. Some early applications of customer data analytics have been rudimentary in process design and frankly slipshod in project execution.

With better understanding of the issues, and with many corporations now appreciating the ethical imperative, the position is changing quickly. Good practitioners of big data analytics will increasingly be able to differentiate themselves from the substandard early movers.