Research ethics in secondary data: what issues?

It is often believed that use of secondary data relieves the researcher from the burden of applying for ethical approval – and sometimes, from thinking about ethics altogether. But the whole process of research involves ethical considerations, whether or not any primary data collection is involved. This starts from the initial design of the study, which should aim at the public good (and at the very least should do no harm) and continues until communication of results, which should ensure transparency, publicness and replicability. More specifically, what ethical issues will the data collection and analysis stages involve, when secondary data are used?

Secondary data are usually defined as those that were collected as part of a different research project, with purposes other than those of the present study. They may be official statistical data (census for example, but also, increasingly, administrative data), data gathered by commercial operators (time series of stock prices for example), and researchers’ data from past projects. They are more often quantitative, although secondary analysis of qualitative data is becoming more and more common.

Weighing risks and benefits

Use of secondary data is, in itself, a highly ethical practice: it maximizes the value of any (public) investment in data collection, it reduces the burden on respondents, and it ensures replicability of study findings and therefore greater transparency of research procedures and integrity of research work. But the value of secondary data is only fully realized if these benefits outweigh the risks, notably in terms of re-identification of individuals and disclosure of sensitive information.

For this to happen, use of secondary data must meet some key ethical conditions:

  • Data must be de-identified before release to the researcher
  • Consent of study subjects can be reasonably presumed
  • Outcomes of the analysis must not allow re-identifying participants
  • Use of the data must not result in any damage or distress

Big data and history

A paper archive – more and more often replaced by digitised versions today.

Yesterday at the Bibliothèque nationale de France, I took part in a panel discussion on longue durée in history, organised by the Revue Annales – Histoire et Sciences Sociales. Of course I am not a historian, and I wouldn’t be able to tell whether one interpretation of longue durée is better than another. But historians are now raising questions that are common to the social sciences and humanities more generally: how to benefit from big data and how to re-think the political engagement of the researcher. So I was there to talk about big data and how they change not just research practices and methods, but also researchers’ position relative to power, politics, and industry. These questions cross disciplinary boundaries, and all may benefit from dialogue.

Collection of older sources is now often online and enables application of new methods.

What ignited the historians’ debate was an attempt by two leading scholars, David Armitage and Jo Guldi, to restore history’s place as a critical social science, based on (among other things) increased availability of large amounts of historical data and the digital tools necessary to analyze them. Before their article in Annales, they published a full book in open access, the History Manifesto, where they develop their argument in more detail. Their writing is deliberately provocative, and indeed triggered strong (and sometimes very negative) reactions. Yet the sheer fact that so many people took the trouble to reply proves that they struck a chord.

What do they say about big data? They highlight the opportunity to access large and rich archives and to expand research beyond any previous limitations. Their enthusiasm may seem excessive, but it is entirely understandable insofar as their goal is to shake up their colleagues. My approach was to take their suggestion seriously and ask: what opportunities and challenges do data bring about? How would they affect research, especially for historians?

“Data for Humanity”: a simple message, but so necessary

The recent VW emissions scandal says it all: even a large company can’t get away with behaviours that disrespect key societal values. Protection of the environment is among these values today, so much so that not only do public authorities step in to defend it, but even markets punish the transgressors.

Data protection is not (yet) such a value. Admittedly, some associations, individuals, and government officials fight for it, but the larger public is still unsure. It’s not that people don’t care, but that uncertainty as to what data are actually collected, for what usages, and by whom, is overwhelming; and it becomes difficult to identify the best course of action.

In this context, a new initiative is most welcome: an open letter on “Data for Humanity”, initiated by two scholars of the University of Frankfurt, pleads for a more responsible use of data. The message is simple: Do no harm. And if you can, on top of it, do something good. It’s so simple, and so necessary.

Sure, the world won’t change after this letter, but it will be a first step. Even the promotion of environmental protection started with simple, basic declarations, 30-40 years ago; and it was by insisting and persevering that it finally reached everybody’s conscience.

New publications on big data and official statistics

National Statistical Institutes (NSIs) have long been the recognised repositories of all socio-economic information, mandated by governments to collect and analyse data on their behalf. The development of big data is shaking this world. New actors are coming in and commercially-oriented, privately-produced information challenges the monopoly of NSIs. At the same time, NSIs themselves can tap into digital technologies and produce “big” data. More generally, these new sources offer a range of opportunities, challenges and risks to the work of NSIs.

The Statistical Journal of the IAOS, the flagship journal of the International Association for Official Statistics, has published a special section on big data – of particular interest to the extent that it is free of charge!

Fride Eeg-Henriksen and Peter Hackl introduce this special section by defining big data and emphasising its interest for official statistics. But it is crucial, albeit admittedly not easy, to separate the hype around big data from its actual importance.

The other papers are concrete examples of how big data may be integrated into official statistics:

The data of my friend are my data

The rise of digital data, particularly data from the internet, is to be understood in a social relational perspective. Online interactions – from email exchanges to use of VOIP services and participation in social media such as Facebook, Twitter and LinkedIn – make people’s social connections explicit and visible. The “social network”, once a metaphor used only in a small sub-field within sociology, is now familiar to everybody as the archetype of computer-mediated social interaction. Digital devices systematically record network structures, so that social ties become an essential part of every individual profile, and users are more and more aware of them.

One consequence of this is the booming popularity of network analysis concepts, which support the algorithms that handle digital data: for example, centrality measures are at the heart of search engine functionalities, and transitivity measures underpin “friend-of-a-friend” algorithms in social media. In passing, social network analysis itself, which had originally been developed for small-sized, non-digital datasets (like surveys about friendship in schools), has undergone a major upgrade to account for social data from the web.
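To make these two notions a little more concrete, here is a minimal sketch in plain Python. The friendship ties and the function names are invented for illustration, degree centrality stands in for the far more elaborate centrality measures that search engines actually use, and nothing here claims to reproduce how any real platform computes its suggestions.

```python
from itertools import combinations

# Hypothetical, undirected friendship ties (illustrative names only).
ties = [("A", "B"), ("B", "D"), ("B", "E"), ("D", "E"), ("E", "F")]


def build_graph(edges):
    """Turn a list of undirected ties into an adjacency dictionary."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly tied to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def friends_of_friends(graph, node):
    """People two steps away: candidates for a 'people you may know' list."""
    direct = graph[node]
    candidates = set()
    for friend in direct:
        candidates |= graph[friend]
    return candidates - direct - {node}


def transitivity(graph):
    """Fraction of connected triples (two ties sharing a node) that are closed."""
    closed = 0
    triples = 0
    for nbrs in graph.values():
        for u, v in combinations(sorted(nbrs), 2):
            triples += 1
            if v in graph[u]:
                closed += 1
    return closed / triples if triples else 0.0


graph = build_graph(ties)
centrality = degree_centrality(graph)
suggestions_for_a = friends_of_friends(graph, "A")
closure = transitivity(graph)
```

In this toy network, B and E come out as the best-connected users, and A, who only knows B, would be offered B's other friends as suggestions.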

More importantly, the relational nature of digital data and the underlying possibilities to use social network analysis open up new avenues for data collection. If user B publishes a post on, say, their Facebook wall, comments and “likes” received from their friends A, D and E will be connected to the profile of B, accessible and visible from it; in other words, it is possible to retrieve information on A, D or E through the profile of just B. In general social networks, a friend of my friend is my friend; in digital networks, the data of my friends are my data.
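The example of user B can be sketched as a small data structure. This is only an illustration under stated assumptions: the `Post` and `Interaction` classes and the `visible_through_profile` function are hypothetical and do not correspond to any platform's actual API or data model.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    author: str      # who reacted to the post
    kind: str        # "comment" or "like"
    text: str = ""   # empty for likes


@dataclass
class Post:
    owner: str
    text: str
    interactions: list = field(default_factory=list)


def visible_through_profile(posts, owner):
    """Collect what one owner's posts reveal about other users."""
    exposed = {}
    for post in posts:
        if post.owner != owner:
            continue
        for item in post.interactions:
            if item.author == owner:
                continue  # the owner's own replies are not a leak about others
            exposed.setdefault(item.author, []).append((item.kind, item.text))
    return exposed


# Hypothetical wall of user B, with reactions from friends A, D and E.
wall = [
    Post("B", "Holiday photos", [
        Interaction("A", "like"),
        Interaction("D", "comment", "Great shot!"),
        Interaction("E", "like"),
    ]),
]

leaked = visible_through_profile(wall, "B")
```

Looking only at B's wall, `leaked` already holds an entry for each of A, D and E, even though none of their own profiles was ever accessed.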

Philosophy of data science

The “Impact of Social Science” blog of the London School of Economics has, in the past few weeks, published a series on “Philosophy of data science”. Each installment is an interview conducted by sociologist Mark Carrigan with a key contributor to the social science reflection on data.

The power of survey data: Eurostat Users’ Conference

In the age of big data, social surveys haven’t lost their appeal and interest. Surveys are the instrument through which governments, for a long time, have gathered information on their population and economy to inform their choices. Interestingly, surveys conducted by, or for, governments are the best in terms of quality and coverage: because significant resources are invested in their design and realization, and especially because participation can be made compulsory by law (they are “official”), their sampling strategies are excellent and their response rates are extremely high. (Indeed, official government surveys are practically the only case in which the “random sampling” principles taught in theoretical statistics courses are actually applied.) In short, these are the best “small data” available – and their qualities make them superior to many a (usually messy) big data collection. It is for this reason that surveys from official statistics have always been in high demand by social researchers.

Data and social networks: empowerment and new uncertainties (in Italian)

I gave a presentation on the topic of “Data and social networks: empowerment and new uncertainties” at the Better Decisions Forum on Big Data and Open Data that took place in Rome on 12 November 2014. The event brought together six speakers from different backgrounds on a variety of topics related to data, and participants were businesspeople, public administration managers, journalists, data and computer scientists.

Here is a video of my talk:

Unfortunately, as you will have noticed, the slides are not always very clearly visible, so it’s better to download them from their original source:

My interview before my talk:

See? I am trying to stick to my 1st-January commitment of blogging more this year…

“Pro” ana? Sociability and support in eating disorder online communities

This article was first published on Discover Society, November 2014.

Last June, a group of Italian MPs proposed jail terms and fines for authors of so-called “pro-ana” (anorexia) and “pro-mia” (bulimia) websites. These are self-styled online communities on eating disorders which are viewed as promoting extreme dieting and unhealthy eating practices. France and the United Kingdom had preceded Italy, with attempts to pass restrictive legislation as far back as 2008-9, and many internet service providers have also endeavoured to ban this content.

But the potential spread of health-hazardous behaviours is probably only one side of the coin, and these websites might also channel health-enhancing assistance, advice, and support (Yeshua-Katz & Martins 2013). In fact, a closer look reveals that website users carefully manage their online socialisation to address their health challenges. Online social spaces enable discussion around the illness and constitute a complement, albeit an admittedly imperfect one, to formal healthcare services. There is no rejection of standard health norms in the name of some extreme ideal of thinness, but rather a need – or perhaps, a cry – for extra support.

A social science approach brings out these results. The effect of web interactions on health depends not only on website contents, but also on how people actually use them, share them, and access resources through them. The social, rather than just clinical, dimension of eating disorders, recognized long before the advent of the web (Bell 1985, Orbach 1978), becomes ever more relevant in the current context and calls for a more comprehensive view of the “ana” and “mia” social universe.

(Credit: Roberto Clemente)
