A major health data plan is on the verge of being called off, never to get another chance. It is meant to anonymise all patient records held by the National Health Service (NHS) in the UK, link them together into a single, giant database, and make them available under controlled conditions to health researchers and (controversially) to commercial companies too. Public outcry has led to the plan being delayed for six months.
In an article published in The Guardian last week, Ben Goldacre, a medical doctor and high-profile media commentator on science matters, rightly identifies the crux of the matter: in principle, the public accepts the release of data for scientific purposes, but resists commercial exploitation. And rightly so: medical knowledge results from the study of many cases, and the more cases are available, the more accurate the results; in the era of big data, it is also clear that aggregating and sharing a wealth of data such as that held by the NHS is a unique opportunity for medical science to discover ways of saving lives. On the other hand, use of the data for any other purpose looks much more opaque, and people understandably fear it might lead to discrimination and negative consequences for individuals, for example if disclosure of a person's health history results in higher insurance premiums or rejected job applications.
It is a recent, worldwide trend to rely more and more on administrative records, such as the health records of the NHS, for statistical and research purposes. It is not just health but also education (school records), pensions, taxation, prisons and so on. Digitization of public services and better record-keeping, together with improved computational and data storage capacity, make it possible to aggregate and link multiple sources: in the case of health, for example, these could be visits to the GP, hospital episodes, vaccinations, treatments received, blood test results and so on. This is a gold mine for research, as it makes it possible, for example, to correlate blood tests and treatments on a very large scale, so as to assess the possible side effects of a drug more accurately. These data are cheaper to assemble and maintain than old-fashioned surveys, and are widely thought to be more accurate: a patient who is asked to tell their medical history to an interviewer may forget, misrepresent or even deliberately conceal something, while official records would tell the full story. The trend toward greater use of administrative data in public services parallels the exponential growth of big-data record-keeping by private firms.
There is a lot of uncertainty, though, about the way administrative records are handled and about the protection of confidentiality. Individuals are still rarely asked for their informed consent to have their data (whether medical, educational or tax and pension related) re-used afterwards, contrary to what is routinely done for surveys, experiments and clinical trials. Though the practice of seeking informed consent is slowly spreading, it still does not cover older data. Re-use is also a complex matter. While the principle of re-use for non-commercial research only is widely shared, it is sometimes difficult to define what research is. Should we interpret it restrictively, to include only university-like institutions? What about charities, research services in government departments or other public agencies, trade unions, and international institutions like the OECD? Sometimes it may even seem illogical to leave out private companies altogether: you may think, for example, that use of data by pharmaceutical companies serves the public good, and should be authorised, if it helps to better assess the side effects of a drug.
Use of public-sector data for research purposes may seem to many a complex matter, of interest only to a handful of lawyers and academics; but in fact it affects us all, all the more so as use of administrative data grows in size and importance. We want to facilitate research that benefits us as patients, students, pensioners or taxpayers, and we do not want our personal information to be misused. We are all caught in this tension, and this is why we should all care.