The First-Tier Tribunal and the anonymisation of clinical trial data: a reasoned expression of Englishness… which would have to be abandoned with the GDPR?

Queen Mary University of London v (1) The Information Commissioner and (2) Alem Matthees (EA/2015/0269), decided by the First-Tier Tribunal (Information Rights) (FTT(IR)) on 12 August 2016, is a fascinating decision. [Could it be a stylish expression of Englishness… or otherness?]

The facts concern a freedom of information request for clinical trial patient data that had been collated by researchers from Queen Mary University of London (QMUL) working on the PACE trial, which investigated treatments for chronic fatigue syndrome. QMUL rejected the request not once but twice: initially, and again following an internal review. That rejection was then appealed to the Information Commissioner (IC). The FTT(IR) decision, in turn, determines an appeal against the IC’s decision, as contained in a Decision Notice dated 27 October 2015. The IC had decided that QMUL had “incorrectly applied sections 22A, 40(2), 41 and 43(2) of FOIA [Freedom of Information Act] to the withheld information” and had ordered QMUL to “disclose to the complainant the information to which it has applied sections 22A, 40(2), 41 and 43(2) of FOI.” After a lengthy and divided opinion, the FTT(IR) ultimately agrees, by majority, with the IC.

To understand the issues at stake, we must look first to the nature of the data (its source and how it was collated). The IC briefly describes the PACE trial in the following terms:

“The PACE (Pacing, graded Activity and Cognitive behaviour therapy: a randomised Evaluation) trial was a clinical trial carried out by the University which commenced in 2002. It was a large scale trial to test and compare the effectiveness of four of the main treatments available for people suffering from chronic fatigue syndrome (“CFS”), also known as myalgic encephalomyelitis (“ME”). The trial required the collection of large amounts of medical baseline and treatment results over the period 2005-2010 from the 640 patients who participated in it. Results from the PACE trial have been published in The Lancet. The University’s website (http://www.pacetrial.org/) provides further information and details about the trial. The Commissioner notes that the PACE trial has resulted in some public debate, with some organisations and individuals being opposed to the treatment methods used.” (as per the IC Decision Notice, paras. 12 and 15)

Note that much of the data was derived from questionnaires completed by trial participants and did not contain personal identifiers (direct or indirect), such as location, gender or ethnicity. Participants were assured of strict confidentiality for the data collected from them. On their consent forms, however, they were also informed of, and agreed to, the possibility of such data being shared with non-QMUL (independent) scientists; the data would be shared on request under confidentiality agreements by way of normal research collaboration.

The issuer of the FOIA request, Mr Matthees, requested a “selection of baseline and 52-week follow up data on all 640 individual PACE Trial participants for which the data exists, in a spreadsheet or equivalent file with separate columns for each variable.” He then added “I am requesting only ‘anonymised’ data, I am not requesting any information which can identify individual participants (not even the participant ID numbers if those are deemed to be inappropriate to include, so long as each individual row only contains values from the same participant).” [Did Mr Matthees know what he really meant by anonymised data?]

The FTT(IR) was asked to resolve three issues:

“a) Should an exemption be applied retrospectively? [QMUL wanted to rely on section 22A of the FOIA, which provides an exemption in respect of information intended for future publication, as a justification for withholding the requested information, despite the fact that this provision had only come into effect on 1 October 2014]

b) Is the requested data personal information, and is there evidence that participants could be identified from the requested material?
c) Would disclosure cause sufficient prejudice to QMUL’s research programmes, reputation and funding streams to refuse disclosure?”

To all these questions, the answer is NO!

For the purpose of this post we will only discuss the second question. It is crucial if one is interested in delineating rigorously the contours of data protection law, and in particular the UK Data Protection Act 1998 (the DPA), which applies only to the processing of “personal data.”

Section 40 of the FOIA is thus at stake here. And for the sake of clarity, this is what it says:

“(2) Any information to which a request for information relates is also exempt information if—

(a) it constitutes personal data which do not fall within subsection (1), and

(b) either the first or the second condition below is satisfied.

(3)The first condition is—

(a) in a case where the information falls within any of paragraphs (a) to (d) of the definition of “data” in section 1(1) of the Data Protection Act 1998, that the disclosure of the information to a member of the public otherwise than under this Act would contravene—

(i) any of the data protection principles, or

(ii) section 10 of that Act (right to prevent processing likely to cause damage or distress), and

(b) in any other case, that the disclosure of the information to a member of the public otherwise than under this Act would contravene any of the data protection principles if the exemptions in section 33A(1) of the Data Protection Act 1998 (which relate to manual data held by public authorities) were disregarded.

(4)The second condition is that by virtue of any provision of Part IV of the Data Protection Act 1998 the information is exempt from section 7(1)(c) of that Act (data subject’s right of access to personal data).”

Although the IC and the FTT(IR) seem to be guided by good intentions, their motivation and the very way their reasoning is formulated in this case – in concluding that the data requested was indeed non-personal data – raise important questions. These questions matter all the more in the light of the debate triggered by the soon-to-be applicable General Data Protection Regulation (GDPR) [for an overview of the debate see our previous post].

The appellant, QMUL, made the point that “The data [to be released] is pseudonymised, not anonymised, and therefore is likely to constitute personal data. The data has “individual-level granularity” that gives rise to the “relatively high risk” of re-identification.” While it is debatable whether, in fact, the data under scrutiny should be considered pseudonymised [or even anonymised!] data under the GDPR (see Article 4(5) of the GDPR and its definition of pseudonymisation), the GDPR is obviously not yet applicable as it only applies from May 2018. By comparison, nonetheless, the ICO said in its 2012 Code of practice on anonymisation, at page 21, that anonymisation can be achieved through pseudonymisation (which it defines as “[t]he process of distinguishing individuals in a dataset by using a unique identifier which does not reveal their ‘real world’ identity”) if it is carried out effectively. Besides, the IC takes the view on the facts of this case that “anonymization is plainly capable of rendering those individuals non-identifiable for two reasons: (i) The pool of participants is large, and the incidence of chronic fatigue/ME in the general UK population is around 1 %, therefore the class of potential trial participants ‘vastly exceeds’ the number of actual participants, rendering identification realistically impossible; (ii) The information is not directly linked to the individuals, as it comprises wide-ranging scores derived from participants’ self-reporting.” Is the IC, and with it the FTT(IR) in endorsing its position, of the view that the question whether data has been anonymised should be answered by looking at the data and the data only? The answer should be negative, and clearly expressed.
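The arithmetic behind the IC’s first reason, and the reason it cannot be the whole answer, can be sketched in a few lines. This is a rough illustration only: the UK population figure is our own round-number assumption and does not come from the decision, and the function names are hypothetical. The sketch shows how large the class of potential participants is when one looks at the data alone, and how that class collapses once an intruder already knows that a given person took part in the trial.

```python
# Rough sketch of the "pool size" argument. Figures marked ASSUMPTION are ours,
# not taken from the Decision Notice or the FTT(IR) decision.

UK_POPULATION = 65_000_000  # ASSUMPTION: round figure for the UK population
PREVALENCE = 0.01           # "around 1 %" incidence, as quoted by the IC
PARTICIPANTS = 640          # number of PACE trial participants


def candidate_pool(population: int, prevalence: float) -> int:
    """Size of the class of potential trial participants."""
    return round(population * prevalence)


def effective_pool(knows_participation: bool) -> int:
    """Pool an intruder must search, with or without prior knowledge.

    If the intruder already knows the target took part in the trial,
    the pool is the trial itself, not everyone with the condition.
    """
    if knows_participation:
        return PARTICIPANTS
    return candidate_pool(UK_POPULATION, PREVALENCE)


pool = candidate_pool(UK_POPULATION, PREVALENCE)
share = PARTICIPANTS / pool  # fraction of the pool that actually took part

print(f"Potential participants: {pool}")
print(f"Share who took part: {share:.4%}")
print(f"Pool without prior knowledge: {effective_pool(False)}")
print(f"Pool with prior knowledge: {effective_pool(True)}")
```

On these assumptions the data-only view supports the IC: participants are a tiny share of the potential class. The picture changes as soon as other information available to the intruder is taken into account, which is why the data environment matters.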
Three forms of identifiability from data were put forward by QMUL: (a) participants could self-identify; (b) those with detailed prior knowledge of participants (such as friends, family or medical practitioners – see the next point) could identify them; or (c) motivated intruders (such as campaigners or journalists) could identify participants and link this data to other information previously released. The IC discarded the first two routes to identification outright. First, because: “for an individual to be identifiable …it must be reasonably likely that another person can identify them from that information and other information that may be available to them”. Second, because, “it is unlikely that close friends and family will be motivated intruders”. [Why do we get the feeling that the IC is building up the size of the legal arguments just to knock them straight down again, because they don’t like the sound of them in essence? Remember, the definition of “personal data” under the Data Protection Directive (Article 2) is “any information relating to an identified or identifiable natural person … an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity” (emphasis added). And where is the reference to section 1 of the DPA (with its definition of personal data)? And what about a statement on what it means for the risk of re-identification to be “greater than remote,” to quote the standard of assessment under the DPA referred to by the IC and the FTT(IR)? The FTT(IR) remains strangely silent on these points…]
Although the impartiality of the expert evidence of Professor Ross Anderson (Professor of Security Engineering at the University of Cambridge) is questioned by the FTT(IR), he does raise a very interesting point, which should not be dismissed too quickly. The motivated intruder test – described by the ICO in its Code of Practice at page 22 – is an early attempt to develop the foundations of a re-identification risk-based approach to anonymisation. Yet Professor Anderson comments that, “The ‘motivated intruder’ test is too weak to be applied to the present instance, as it must be assumed that the ‘attackers’ will have access to NHS systems or at very least care data information.” In other words, he is questioning the ICO’s guidance that, in assessing the level of re-identification risk that arises from an unknown party attempting to re-identify data subjects from data in combination with other information, it is always justified to start from the premise that “a motivated intruder is not assumed to have any prior knowledge, specialist knowledge or equipment or to resort to criminality to access data” (page 23 of the ICO Code of Practice). One reason that seems to justify the adoption of this assumption by the IC and the FTT(IR) in this case is that the “hypothesis that identification is possible through combining that information with NHS data (involving an NHS employee both having breached their professional, legal and ethical obligations and also having the skill and inclination to so do) is implausible” in particular because “the notion of such a profound ethical and legal breach by an NHS employee is without warrant.” Although such a statement does show that both the IC and the FTT(IR) have a relatively broad understanding of the data environment (which goes beyond the data itself), it is questionable whether the analysis of the legal components of the data environment should always stop there.
Contractual obligations between issuers and recipients of datasets should be taken seriously, in particular in cases in which transformed data (i.e. personal data transformed through the application of anonymisation techniques to it) retains individual-level data points from which the nameless individuals that they impart knowledge about may still be singled out. After all, QMUL was happy to share its data with other researchers for further analysis despite its argument that the data was personal data.
The IC seems to be starting from the assumption that it was for QMUL, the initial data controller, to demonstrate that the anonymisation practice followed was not satisfactory. Clearly, in refusing the request, QMUL had to evidence its argument that the data at stake was, in its opinion, personal data. But should this also have entailed demonstrating that the data had not been effectively anonymised so that the risk of re-identification in the future (by any third party whatsoever) was remote? Couldn’t we consider that the confidentiality agreements concluded with future, independent academic recipients of the data – who would go through an approval process before receiving the data – were crucial to mitigate the re-identification risk? NO, say the IC and the FTT(IR)! In fact, the opposite was argued by the IC: QMUL’s provision of the data to independent scientists by way of research collaboration, under the restrictive terms of signed confidentiality agreements, was held to amount to an acknowledgement that anonymisation was effective. Why is that? Could it be because the greatest deterrent against re-identification risk is the application of the DPA itself? Lastly, does it make sense to say that anonymisation aims to ensure confidentiality and … obligations of confidentiality vanish with anonymisation?
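To make concrete the earlier point about individual-level data points from which unnamed individuals may still be singled out, here is a minimal sketch. It is an illustration under our own assumptions and not the method used by QMUL, the IC or the FTT(IR): the rows are made-up toy values, the two columns are hypothetical score variables, and the function names are ours. The idea is simply to count how many rows share each combination of values; a combination held by a single row singles out one participant for anyone who already knows those values.

```python
from collections import Counter

# Toy rows of (hypothetical_score_a, hypothetical_score_b) per participant.
# These values are invented for illustration and are not PACE trial data.
rows = [
    (28, 40),
    (28, 40),
    (30, 35),
    (22, 55),
]


def class_sizes(records):
    """Count how many rows share each exact combination of values."""
    return Counter(tuple(record) for record in records)


def singled_out(records):
    """Return the rows whose combination of values appears exactly once."""
    sizes = class_sizes(records)
    return [tuple(record) for record in records if sizes[tuple(record)] == 1]


def smallest_class(records):
    """Size of the smallest group of rows sharing the same values."""
    return min(class_sizes(records).values())


unique = singled_out(rows)
print(f"Rows that are unique in the dataset: {len(unique)} of {len(rows)}")
print(f"Smallest group size: {smallest_class(rows)}")
```

A smallest group size of one does not by itself make the data personal data: whether a unique row can be tied to a real person depends on what other information the recipient holds, and on whether confidentiality agreements or the DPA itself deter the attempt.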