Some time ago, I wrote a post on ethical issues in research with secondary data – a somewhat grey area, where students and scholars often feel guidance is insufficient. Even more complex is research with internet data – neither primary nor secondary strictly speaking, but “big” data. A recent case fuelled an international debate on how researchers should deal with data that are, apparently, accessible to all on the web: a Danish graduate student published a large dataset of users of the online dating site OkCupid (he apparently did so without any institutional backing, and Aarhus University, where he studies, is now on the case). Michael Zimmer, a specialist in information studies and in the policy and ethics of online research, aptly summarizes the issues in a recent Wired article:
- Don’t say that “the data are already public”. The fact that OkCupid users knowingly share some personal information does not mean they consent to it being used for purposes other than interactions with other users on that site. By scraping data, one may be able to put together the whole history of users’ presence on that platform, revealing more of their life or personality than they themselves are aware of. More dangerously, data extracted in this way might in some cases be matched with other information, thereby potentially becoming much more disclosive than the persons concerned ever intended or agreed to. And the disclosure may be aggravated by releasing the data outside the platform.
- As Michael Zimmer says clearly, “public” does not equal “consent”. Some web services include provisions for reuse of data (notably for research purposes) in their terms of use, and in such cases, consent may safely be presumed. This is not always the case, though, and researchers should be very careful not to presume consent when there are no grounds for it.
- The key principle of research ethics should always be “Do no Harm”. One may think that OkCupid users were unwise to disclose personal information in an insufficiently protected way: they didn’t expect their personal information to be so easily exported outside the site. But precisely for this reason, researchers should not take advantage of this vulnerability to intervene in ways (like releasing the data to everybody) that may cause users more harm than their mere presence on OkCupid.
To Zimmer’s comments, I would add a point on institutional responsibilities. Aarhus University was quick to state that they do not endorse the student, that he is not an employee of the University, and that he acted on his own. Fair enough. And perhaps the student’s act was not even research after all, but just an attempt to gain fame in this digital world where attention is a scarce commodity. Yet as a former programme leader for a doctoral study programme, I feel this case urges us all (not just at Aarhus but at all universities) to rethink our duties. We do teach our students ethics and check their research proposals for ethical issues; and yet, we should perhaps do more.
On the one hand, we often confine ourselves to teaching about old-style primary and secondary data collection, while we should raise more awareness of ethical issues with new (big) data. On the other hand, we should insist more on the generality of ethical principles. Ethical criteria and values learned for the specific purposes of research, like the do-no-harm principle, do not cease to apply when we do other things. What students are taught as part of a degree, especially a doctoral degree, should be part of their life more broadly, whether they are acting in a professional (research) or a personal capacity. In this sense, this case is a call for more attention to ethics in all (research) degrees.
Sure, this is an extreme case, but it clearly reveals the broader perils of not placing ethics at the centre of today’s data buzz – data science, data research, and so on. By the same token, it is a direct call for more ethics education in doctoral degrees. For all its potential, big data does not necessarily improve our understanding of the world around us: left uncontrolled, it can just produce bad science.