It is often believed that the use of secondary data relieves the researcher from the burden of applying for ethical approval, and sometimes from thinking about ethics altogether. But the whole research process involves ethical considerations, whether or not any primary data collection is involved. This starts with the initial design of the study, which should aim at the public good (and at the very least should do no harm), and continues through to the communication of results, which should ensure transparency, openness and replicability. More specifically, what ethical issues do the data collection and analysis stages involve when secondary data are used?
Secondary data are usually defined as data that were collected as part of a different research project, for purposes other than those of the present study. They may be official statistical data (census data, for example, but also, increasingly, administrative data), data gathered by commercial operators (time series of stock prices, for example), or researchers’ data from past projects. They are more often quantitative, although secondary analysis of qualitative data is becoming increasingly common.
Weighing risks and benefits
The use of secondary data is, in itself, a highly ethical practice: it maximizes the value of any (public) investment in data collection, it reduces the burden on respondents, and it enables replication of study findings and, therefore, greater transparency of research procedures and integrity of research work. But the value of secondary data is only fully realized if these benefits outweigh the risks, notably in terms of re-identification of individuals and disclosure of sensitive information.
For this to happen, use of secondary data must meet some key ethical conditions:
- Data must be de-identified before release to the researcher
- It must be reasonable to presume the consent of study subjects
- Outcomes of the analysis must not allow re-identifying participants
- Use of the data must not result in any damage or distress
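
As an illustration of the third condition, one common way to check outputs for re-identification risk is to look for small cells: combinations of indirectly identifying attributes shared by only a few records. The sketch below is a minimal example under stated assumptions, not a prescribed procedure. The function name `small_cells`, the field names, the sample records and the threshold `k` are all hypothetical, and the appropriate threshold depends on the rules of the data provider.

```python
from collections import Counter


def small_cells(records, quasi_identifiers, k=5):
    """Return combinations of quasi-identifier values shared by fewer than k records.

    `records` is a list of dicts; `quasi_identifiers` names the fields that,
    taken together, could help single out an individual. The default k is an
    illustrative assumption, not an official threshold.
    """
    counts = Counter(
        tuple(rec[field] for field in quasi_identifiers) for rec in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical de-identified records, for illustration only.
records = [
    {"age_band": "30-39", "region": "North", "income": 41000},
    {"age_band": "30-39", "region": "North", "income": 38000},
    {"age_band": "30-39", "region": "North", "income": 45000},
    {"age_band": "70-79", "region": "South", "income": 52000},
]

# Any combination listed here is rare enough to deserve a closer look
# (for example, suppression or coarser categories) before publishing.
risky = small_cells(records, ["age_band", "region"], k=3)
print(risky)
```

In this toy dataset, the single respondent aged 70-79 in the South is flagged, because publishing an average income for that group would disclose one person's value.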