To be identifiable or not to be identifiable – to what extent do our identities merit concealment through law in light of the capabilities of modern technologies?
The UK Information Commissioner’s Office (ICO) has recently published guidance on how to handle requests for information from which personal data must first be removed before it can be disclosed safely. In other words, the issue concerns information that requires further processing by a data controller to ensure that the recipient cannot subsequently identify individuals from it.
The guidance is particularly relevant to bodies responding to a freedom of information or environmental information request, authorities proactively publishing data as part of a publication scheme or otherwise making data available, and/or organisations releasing data in response to a subject access request under the Data Protection Act 1998 (DPA). This is because disclosing personal data in such situations (advertently or inadvertently; intention is not relevant) would breach data protection principles. To this end, the guidance gives some examples of the most common types of inappropriate disclosure of personal data that the ICO has seen, as well as other types of disclosure of which data controllers should be aware.
Under the aptly-named heading “hiding in sight”, the ICO draws attention to the fact that there are many ways in which personal data in a file may not be immediately visible on screen, and may therefore be disclosed without the error being rectified. In particular, the ICO refers to the many hidden dangers of modern software packages – such as Office suite software tools – with the following example: “A chart or summary table might not appear to contain any personal data on the surface, but it could in fact have a copy of the individual data points embedded within and allow this data to be made accessible with nothing more than a couple of clicks.” The ICO then points out specific mistakes that can be made in the redaction process.
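To see how easily such embedded data can surface, it helps to remember that modern Office files are simply ZIP archives containing many parts. The following minimal sketch (my own, in Python, using a hypothetical file name ‘report.docx’; neither the script nor the file comes from the ICO guidance) lists any embedded objects, such as the spreadsheet behind a pasted chart, lurking inside a document:

```python
import zipfile

# An Office file (.docx, .xlsx, .pptx) is a ZIP archive; objects embedded
# in it, such as the workbook behind a pasted chart, are stored as
# separate parts inside the archive. "report.docx" is a hypothetical name.
with zipfile.ZipFile("report.docx") as doc:
    for name in doc.namelist():
        # Embedded workbooks typically sit under word/embeddings/
        if "embeddings" in name or name.endswith((".xlsx", ".bin")):
            print("Embedded object found:", name)
```

Anyone receiving the file can open those parts with nothing more sophisticated than a standard archive tool, which is precisely the “couple of clicks” the ICO warns about.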
To understand the legal debate that underlies the ICO’s guidance and why it is important, we need to return to the definition of personal data under the DPA. The DPA defines ‘personal data’ as “data which relate to a living individual who can be identified from that data or from that data and other information which is in the possession of, or is likely to come into the possession of, the data controller…” (Section 1). The ICO has previously given its interpretation of this definition – see, for example, its personal information online code of practice, originally issued in 2010 to provide organisations with a practical approach to protecting individuals’ privacy online. In this document, the ICO opines that personal data is being processed where information is collected and analysed with the intention of distinguishing one individual from another and taking a particular action in respect of that individual. This can take place even if no obvious identifiers, such as names, email addresses or account numbers, are held.
A practical difficulty with this approach is that the data controller has to assess a wide range of ‘non-obvious identifiers’ to determine whether they are personal data, the processing of which would subject their data controller(s) to obligations under the DPA. These include online cookies and IP addresses, which are linked primarily to an internet-connected device, and only secondarily to a particular user (for example, through information indicating an individual’s online activity generated by analysing their use of that device). Furthermore, such a device may in fact be used by a group of users rather than a single user, making a correct identification linkage between the non-obvious identifier and any one person even less likely (i.e. there is a reduction in the certainty of ‘identifiability’ from the data, whether that legal term is construed narrowly or widely; see Article 2 of the Data Protection Directive, which defines personal data as “any information relating to an identified or identifiable natural person”, together with Recital 26).
[In its guidance, the ICO refers to such non-obvious identifiers as ‘metadata’ as follows: “So-called meta-data or ‘data about data’ is embedded within the file and can include information such as previous authors, changes made to previous versions, comments or annotations.” In other words, metadata – as referred to in this context – is the background information that details the history of an electronic document: its creation date, amendments, location, who has accessed it and who has printed it. These details can remain retrievable long after they appear to have been erased, even though a reasonable effort might be required to retrieve them.]
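By way of illustration (my example, not the ICO’s), the core document properties of an Office file, such as author, last editor, revision count and timestamps, can be read directly from the file without ever opening the authoring application. A minimal sketch, again assuming a hypothetical file ‘report.docx’:

```python
import zipfile

# Office Open XML files keep core document metadata in docProps/core.xml:
# creator, last modified by, revision number, creation/modification dates.
# "report.docx" is a hypothetical file name.
with zipfile.ZipFile("report.docx") as doc:
    print(doc.read("docProps/core.xml").decode("utf-8"))
```

Unless this part is deliberately stripped before release, it travels with every copy of the document.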
Consider, for example, a data controller that receives a subject access request from an individual (whose identity the organisation subsequently verifies as being who they say they are): there are practical difficulties in granting subjects access to information of this sort. The ICO previously expanded upon these difficulties in its 2010 code of practice, mentioned above (my highlights):
“In particular, there will be many cases where an organisation only holds non-obvious identifiers and either has no interest in, or no certainty of, the “real world” identity of an individual. While these identifiers may be personal data, there is a major privacy risk inherent in granting subject access to information that is only logged against a non-obvious identifier, because the information held is linked to the device used to go online, rather than directly to the person using it. Accordingly, the organisation holding the information may not be able to determine with any degree of certainty whether the information requested is exclusively about the person making the subject access request, or about a group of people using the same device to go online.”
The ICO goes on to recommend that, “Where a reliable link between the subject access applicant and the information held cannot be established, and where, therefore, there is an obvious privacy risk to third parties, the Commissioner would not necessarily seek to enforce the right of subject access unless there is a genuine risk to an individual’s privacy if he fails to do so.”
Returning to the current guidance, it is therefore useful to compare how the ICO’s recommendation is particularised with respect to certain “complex file types [that] in their raw form can contain an amount of meta-data that may not be appropriate for disclosure” mentioned therein. These include:
- Emails, which contain metadata about the sender and recipient(s), as well as a record of the route used for delivery; and,
- Photographic and video images, which can contain the GPS coordinates of where the image was taken, the time and date, and other data about the device (such as a smartphone) used to take the image. Indeed, the ICO highlights that the metadata that can be contained within image file formats deserves special mention due to its potential sensitivity in relation not just to the individuals captured in the image, but also to the individual who took it (a sketch of how easily such metadata can be read follows below).
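To make that last point concrete, the EXIF block of a photograph can be dumped in a few lines. This is a minimal sketch of my own, not something from the guidance; it assumes the third-party Pillow imaging library and a hypothetical file ‘photo.jpg’:

```python
from PIL import Image, ExifTags  # Pillow, a third-party library

# Dump the EXIF metadata embedded in a photograph.
img = Image.open("photo.jpg")
exif = img.getexif()
for tag_id, value in exif.items():
    print(ExifTags.TAGS.get(tag_id, tag_id), value)

# Tag 34853 points to the GPS sub-directory: where the image was taken.
gps = exif.get_ifd(34853)
print({ExifTags.GPSTAGS.get(k, k): v for k, v in gps.items()})
```

A recipient of an unredacted image can run exactly the same few lines, which is why the ICO flags the sensitivity for the person who took the picture as well as for those appearing in it.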
In relation to the latter category, the ICO states that when disclosing images of individuals, consideration should be given to whether the identifying features of any of the other individuals (i.e. third parties) need to be obscured. However, in referring to its earlier advice in its data protection code of practice for surveillance cameras and personal information, the ICO concludes that “In most cases the privacy intrusion to third party individuals will be minimal and obscuring images will not be required. However, consideration should be given to the nature and context of the footage”. In the guidance, the ICO adds to this statement by pointing out that “It can be more complex to obscure information from video in part due to the larger volume of data. CCTV footage stored in proprietary formats or low frame rates may also present difficulties.” [What are we to make of this? The ICO seems to be sitting on the fence here somewhat about the extent of the risk that may be present (including hidden risks), and risk mitigation measures required, where images are involved…].
The guidance also gives an overview of the ICO’s interpretation of how to satisfy the DPA’s legal requirements in removing personal data from information being disclosed. These include the following ‘solutions’:
- Export data to a simple file format such as CSV (comma-separated values) and inspect the file to highlight potentially unauthorised disclosures (a minimal sketch of such an inspection follows this list).
- Disclose a printed version of an email message or print-as-PDF version.
- Redact information within an individual image through the use of (what the ICO describes as) a ‘simple’ image editing tool included with most operating systems; a sketch of this follows the list. However, the ICO also reiterates that achieving the effective redaction of the personal data of third parties from video images is likely to require the use of a specialist software tool. [While the ICO confirms that this task can also be contracted out to another organisation, it includes a reminder that, if it is, the contractor becomes a ‘data processor’, which will require certain measures to be put in place, such as a written contract, in order to comply with Principle 7 of the DPA on managing the security of personal data.]
- Ensure that any redacted image is exported to a simple ‘un-layered’ format so that the redactions are permanent.
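On the first of these points, a simple automated sweep can complement manual inspection of an exported CSV. The following is a minimal sketch of my own (the file name and the two patterns are purely illustrative; a real check would need to cover far more identifier types):

```python
import csv
import re

# Scan an exported CSV for values that look like obvious personal data.
# "export.csv" and the two patterns below are illustrative only.
PATTERNS = {
    "email address": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "UK phone number": re.compile(r"\b0\d{9,10}\b"),
}

with open("export.csv", newline="") as f:
    for row_no, row in enumerate(csv.reader(f), start=1):
        for field in row:
            for label, pattern in PATTERNS.items():
                if pattern.search(field):
                    print(f"row {row_no}: possible {label}: {field!r}")
```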
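On image redaction and ‘un-layered’ export, the following sketch (again mine, with hypothetical file names and coordinates) shows the two steps together: paint opaque pixels over the sensitive region, then save to a flat raster format such as PNG, which carries no layers or edit history from which the original content could be recovered.

```python
from PIL import Image, ImageDraw  # Pillow, a third-party library

# Open the image, paint an opaque box over the sensitive region, and
# export to a flat ("un-layered") format so the redaction is permanent.
# File names and coordinates are hypothetical.
img = Image.open("photo.jpg").convert("RGB")
draw = ImageDraw.Draw(img)
draw.rectangle((100, 50, 220, 120), fill="black")  # e.g. a face or number plate
img.save("photo_redacted.png")  # flat raster output: no layers, no history
```

A useful side effect is that re-saving through an imaging library in this way normally drops the original EXIF block unless it is deliberately carried over, addressing the hidden-metadata risk at the same time.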
[Of course, it is worth noting that the use of metadata, or other flagging, can also help a data controller to determine which data can be released automatically in response to a request and which needs prior assessment by the data controller. This is a point that the ICO doesn’t pick up in this guidance.]
In summary, the guidance is pitched at a rather basic level, and in places readers may find that it serves to further muddy the already murky waters around when non-obvious identifiers (such as IP addresses) should be deemed personal data under the DPA, with their processing therefore subject to data protection principles.
On the other hand, the ICO’s preference for a risk-based approach, under which data controllers weigh the circumstances at hand when considering whether information contains personal data, is clear (and consistent with its remarks upon the proposed EU General Data Protection Regulation in this area – see my post here). See, for example, its statement in the guidance that “When considering a request to re-use information that contains personal data or redacted information public sector bodies should not automatically assume that the redactions made to the previously disclosed information are sufficient in the context of responding to an application for re-use”.
Also noteworthy is the fact that the ICO goes so far as to admit that the long tail of identifiability risk extends well beyond the traditional domain of (obvious or non-obvious) unique identifiers by which data subjects can be identified, in the following statement: “It is also worth considering whether information that you have not redacted may still result in someone being identified, clothing, skin or hair colour for example.” This is in keeping with a modern data analytic perspective on forensic science in a big data era. On this view, the more data points (observations) that can be generated in association with an individual, the more insightful the correlations that algorithms can draw from them are likely to be (and hence the more certainty or reliability that can be assigned to a postulated identification).
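A toy illustration of that compounding effect (entirely invented data, for illustration only): each extra observable attribute, none of them an ‘identifier’ on its own, shrinks the pool of people a piece of footage could match.

```python
# Entirely invented data: successive "non-identifying" attributes
# narrow the candidate pool until one person is uniquely identified.
people = [
    {"name": "A", "hair": "brown", "coat": "red",  "height": "tall"},
    {"name": "B", "hair": "brown", "coat": "blue", "height": "short"},
    {"name": "C", "hair": "grey",  "coat": "red",  "height": "tall"},
    {"name": "D", "hair": "brown", "coat": "red",  "height": "short"},
]

candidates = people
for key, value in [("hair", "brown"), ("coat", "red"), ("height", "tall")]:
    candidates = [p for p in candidates if p[key] == value]
    print(f"after matching {key}={value}: {len(candidates)} candidate(s)")
# The pool narrows 4 -> 3 -> 2 -> 1: person "A" is singled out.
```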
After all, it is no longer a question of whether we are identifiable in our modern lives, but rather when by law our identities merit concealment, and to what extent (recognising that concealment is an imperfect process), which in turn depends on a myriad of risk factors. This was the very topic I discussed at an interesting workshop involving the UK’s leading identity specialists in Bristol the week before last. For more information, see here.
Alison Knight