Small Data to study the Web: The ANAMIA project
We have just published the results of our research project ANAMIA, which studied the personal networks and online interactions of persons with eating disorders (“ana” and “mia” in web jargon). The report has just come out.
The ana-mia webosphere had long remained opaque, with little data available for a science-based understanding of it. As a result, misconceptions proliferated and policy-makers hesitated, threatening censorship without devising ways to reach out to and support a population in distress. Our study has been the first to overcome these limitations and reveal the social environment, actual eating practices and digital usages of persons with eating disorders on the English- and French-language web.
Our challenge, when we started the project, was how to go about it. We chose to stay away from big data, which we could have obtained, for example, by mining subjects’ networks of interactions from Facebook, their blogs or other web services. One reason for this is ethical: while data mining can offer a wealth of information for research, it bypasses informed consent. Put differently, researchers would use information provided by web service users without those users being aware of it. Although researchers would take precautionary measures, in particular by anonymising the data, we felt we were not (or not yet) in a position to choose such an approach. As the first to reach out to this community, which has often feared unsolicited intrusion and suffered from social stigma and even censorship, we felt we first needed some form of human contact with them; gaining their trust was our priority.
The other reason is scientific and substantive. We did not just want to learn about the online networks of ana-mia subjects, but also about their relationships more generally: at school, at work, in the family. Otherwise, we would not have been able to tell what effect Internet participation has on their overall sociability, or how the Internet compares to other contexts in terms of access to health-related information and support. We had no way to automatically map all these other contexts, individual by individual; instead, we had to ask respondents to tell us who is around them.

Figure: The home page of the questionnaire.
So, we opted for a very classical solution: questionnaires and interviews, the traditional toolbox of the social scientist. But we were dealing with a web phenomenon, and we needed to adapt our approach to that. We therefore revisited the classical questionnaire to make it better suited to the task: it was distributed as an online survey that included an original, specifically designed software application (ANAMIA_EGOCENTER) with which respondents could draw their personal networks online.

Figure: Interface of the ANAMIA_EGOCENTER web application allowing respondents to draw their “egocentric network” of acquaintances. The white dot at the centre of the target is the respondent (ego), the blue dots are their acquaintances (alters). The buttons to the right of the screen allow the respondent to draw links between alters and to group them. A tutorial on how to use the tool is available here; the software code, available under GNU General Public Licence (GPLv2), is here.
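To make the idea of an egocentric network more concrete, here is a minimal Python sketch of the kind of data such a drawing tool captures: one ego, a set of alters, links between alters, and groups of alters. This is only an illustration under our own assumptions. The class `EgoNetwork`, its methods and the example names are hypothetical and are not taken from the ANAMIA_EGOCENTER code.

```python
from dataclasses import dataclass, field


@dataclass
class EgoNetwork:
    """Hypothetical container for one respondent's egocentric network."""
    ego: str
    alters: set = field(default_factory=set)
    # Undirected alter-alter links, each stored as a frozenset of two names.
    ties: set = field(default_factory=set)
    # Named groups of alters, e.g. "family" or "school".
    groups: dict = field(default_factory=dict)

    def add_alter(self, name, group=None):
        """Register an acquaintance, optionally placing them in a group."""
        self.alters.add(name)
        if group is not None:
            self.groups.setdefault(group, set()).add(name)

    def add_tie(self, a, b):
        """Record that two alters know each other."""
        if a == b or a not in self.alters or b not in self.alters:
            raise ValueError("a tie must join two distinct, known alters")
        self.ties.add(frozenset((a, b)))

    def density(self):
        """Share of possible alter-alter links that are actually present."""
        n = len(self.alters)
        possible = n * (n - 1) // 2
        return len(self.ties) / possible if possible else 0.0


# Illustrative data only: three alters, one link between two of them.
net = EgoNetwork(ego="respondent")
net.add_alter("Anna", group="family")
net.add_alter("Ben", group="family")
net.add_alter("Chloe", group="school")
net.add_tie("Anna", "Ben")
print(round(net.density(), 3))
```

Density is just one simple summary that can be computed from such a structure; it is shown here as an example, not as a measure the project necessarily reported.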
We collected about 300 questionnaires in this way, from English- and French-speaking respondents. These are definitely small data, but the information they provide is rich and fits our needs well. We also invited a subset of these respondents (about 10%) to an in-depth interview, to better understand who the persons in their networks are: when they first met them, what they do together, whether they talk to them about their eating disorder. Again, interviews are very classical, but we adapted them by conducting them via VoIP (Skype and similar tools). This was not just because it was handier or cheaper to do so, but also because it was a less intrusive way of getting in touch with a sensitive population who might have been uncomfortable if we had visited them at home. And since these respondents are, by definition, highly computer-literate, this was not a problem either.
We developed other software tools within this project: an agent-based model (ANAMIA_F, programmed in NetLogo) to simulate the social process through which individual views and social influence shape the evolution of shared health orientations in an online forum; and a set of data visualisation tools (ANAMIA_PERSONAL, ANAMIA_CORPUS) to synthesise the results of our personal network data collection and make them comparable to one another.
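For readers unfamiliar with agent-based models of social influence, the following Python sketch shows the general idea in its simplest form. It is not ANAMIA_F and does not reproduce its rules: it is a generic bounded-confidence toy model in which two randomly chosen forum members move their views closer together when those views are already close enough. The function `simulate_forum` and all of its parameters (`influence`, `tolerance`, and so on) are assumptions made for illustration.

```python
import random


def simulate_forum(n_agents=50, steps=2000, influence=0.3, tolerance=0.4, seed=0):
    """Toy bounded-confidence model of views in a forum (illustrative only).

    Each agent holds a view between 0 and 1. At every step two distinct
    agents meet; if their views differ by at most `tolerance`, each moves
    a fraction `influence` of the gap towards the other.
    """
    rng = random.Random(seed)
    views = [rng.random() for _ in range(n_agents)]
    for _ in range(steps):
        i, j = rng.sample(range(n_agents), 2)
        gap = views[j] - views[i]
        if abs(gap) <= tolerance:
            views[i] += influence * gap
            views[j] -= influence * gap
    return views


initial = simulate_forum(steps=0)   # views before any interaction
final = simulate_forum()            # same seed, so same starting views
print("spread before:", round(max(initial) - min(initial), 3))
print("spread after: ", round(max(final) - min(final), 3))
```

Because each update is symmetric, the average view stays the same while the spread of views can only shrink or stay equal. Whether everyone converges or separate clusters persist depends on `tolerance`, which is the kind of question such simulations are typically used to explore.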
The take-away message is that traditional small-data methods can still be useful to understand social realities today, including web phenomena; and they can be upgraded and improved with information technologies, so that they are better fit for their purposes. Big data bring a new and untapped promise with them, but we still have much to learn from small data.