It is often believed that using secondary data relieves the researcher of the burden of applying for ethical approval, and sometimes of thinking about ethics altogether. But the whole research process involves ethical considerations, whether or not any primary data collection takes place. This starts with the initial design of the study, which should serve the public good (or, at the very least, do no harm), and continues through to the communication of results, which should ensure transparency, openness and replicability. More specifically, what ethical issues arise at the data collection and analysis stages when secondary data are used?
Secondary data are usually defined as data that were collected as part of a different research project, for purposes other than those of the present study. They may be official statistics (census data, for example, but also, increasingly, administrative data), data gathered by commercial operators (such as time series of stock prices), or researchers’ data from past projects. They are most often quantitative, although secondary analysis of qualitative data is becoming increasingly common.
Weighing risks and benefits
The use of secondary data is, in itself, a highly ethical practice: it maximises the value of any (public) investment in data collection, it reduces the burden on respondents, and it makes study findings replicable, which supports greater transparency of research procedures and integrity of research work. But the value of secondary data is only fully realised if these benefits outweigh the risks, notably the re-identification of individuals and the disclosure of sensitive information.
For this to happen, use of secondary data must meet some key ethical conditions:
- Data must be de-identified before release to the researcher
- Consent of study subjects can be reasonably presumed
- Outcomes of the analysis must not allow re-identifying participants
- Use of the data must not result in any damage or distress
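One common safeguard for the third condition is a threshold rule applied to outputs before release: any table cell describing fewer than a minimum number of individuals is suppressed. The Python sketch below illustrates the idea under stated assumptions. The threshold of 10, the function name and the field names are hypothetical, since actual thresholds and output-checking rules are set by each data service. Suppressing small cells alone may also not be sufficient in practice, because other cells or published totals can allow a suppressed value to be recovered by subtraction.

```python
from collections import Counter

# Hypothetical minimum cell size: real thresholds are set by each data service.
MIN_CELL_COUNT = 10


def suppress_small_cells(records, keys, threshold=MIN_CELL_COUNT):
    """Cross-tabulate records by `keys`, suppressing cells below `threshold`.

    Returns a dict mapping each combination of key values to its count,
    or to None where the count is too small to publish safely.
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return {cell: (n if n >= threshold else None) for cell, n in counts.items()}


# Illustrative (made-up) microdata: 12 respondents in one cell, 3 in another.
records = (
    [{"region": "North", "age_band": "16-24"}] * 12
    + [{"region": "South", "age_band": "16-24"}] * 3
)
print(suppress_small_cells(records, ["region", "age_band"]))
# {('North', '16-24'): 12, ('South', '16-24'): None}
```

Here the cell with only three respondents is withheld, while the larger cell is published as usual.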
The value and expertise of data service organisations
Major public and not-for-profit data producers, such as national statistical institutes (Britain’s ONS, France’s INSEE, etc.) and large research-led data collection enterprises (such as the European Social Survey, ESS), are aware of these issues and have set up services and infrastructures that archive, manage and release data for secondary analysis in full accordance with the above principles. Examples are the UK Data Service, France’s Quetelet, and the members of the CESSDA consortium of data archives in other European countries, which are funded by the EU as well as by national research councils or ministries.
After all, then, it is relatively easy for a researcher to comply with the basic principles of ethical use of secondary data if the data are accessed through one of these infrastructure services: the burden is shifted, so to speak, from the researcher onto the data service organisation, and the researcher need only follow the guidance provided. Under these conditions, one might argue that the need for ethical approval can safely be waived (though there is no consensus on this point).
Issues are more likely to arise with data that were collected outside such frameworks, that is, without ethical approval, or with ethical approval that did not provide for later secondary analysis by other researchers. In such cases, and especially if the data are at micro level (records on individual units such as persons or households), researchers should be particularly careful in considering possible risks, and ethical approval may well be needed.
Highly aggregated data (such as macroeconomic data and financial time series) are by their very nature less likely to involve risks of re-identification of individuals or disclosure of sensitive information. However, intellectual property issues and potential conflicts of interest may arise, especially if the data come from private-sector sources; and there is a risk of misinterpretation if the data were not adequately documented by the original collector.
Primary data collections that open the way to secondary analyses
These considerations matter not only for researchers who carry out secondary research themselves, but also for those who collect primary data and aim to archive them and make them available for future re-use. Archiving can, and should, be done by all – not just the largest data collection consortia. In such cases, it is essential to seek advice from a professional data service (such as a CESSDA member organisation) from the early stages of the research, in order to plan the whole data lifecycle in a way that is ethical, legal and value-maximising.