New publications on big data and official statistics

National Statistical Institutes (NSIs) have long been the recognised repositories of all socio-economic information, mandated by governments to collect and analyse data on their behalf. The development of big data is shaking this world. New actors are coming in and commercially-oriented, privately-produced information challenges the monopoly of NSIs. At the same time, NSIs themselves can tap into digital technologies and produce “big” data. More generally, these new sources offer a range of opportunities, challenges and risks to the work of NSIs.

The Statistical Journal of the IAOS, the flagship journal of the International Association for Official Statistics, has published a special section on big data, of particular interest because it is free of charge!

Fride Eeg-Henriksen and Peter Hackl introduce this special section by defining big data and emphasising its interest for official statistics. But it is crucial, albeit admittedly not easy, to separate the hype around big data from its actual importance.

The other papers are concrete examples of how big data may be integrated into official statistics:

Continue reading “New publications on big data and official statistics”

The power of survey data: Eurostat Users’ Conference

In the age of big data, social surveys haven’t lost their appeal and interest. Surveys are the instrument through which governments have long gathered information on their population and economy to inform their choices. Interestingly, surveys conducted by, or for, governments are the best in terms of quality and coverage: because significant resources are invested in their design and realization, and especially because participation can be made compulsory by law (they are “official”), their sampling strategies are excellent and their response rates are extremely high. (Indeed, official government surveys are practically the only case in which the “random sampling” principles taught in theoretical statistics courses are actually applied.) In short, these are the best “small data” available, and their qualities make them superior to many a (usually messy) big data collection. It is for this reason that surveys from official statistics have always been in high demand among social researchers.

Continue reading “The power of survey data: Eurostat Users’ Conference”

Data and social networks: empowerment and new uncertainties (in Italian)

I gave a presentation on the topic of “Data and social networks: empowerment and new uncertainties” at the Better Decisions Forum on Big Data and Open Data that took place in Rome on 12 November 2014. The event brought together six speakers from different backgrounds on a variety of topics related to data, and participants were businesspeople, public administration managers, journalists, data and computer scientists.

Here is a video of my talk:

Unfortunately, as you will have noticed, the slides are not always clearly visible, so it’s better to download them from their original source:


My interview before my talk:

See? I am trying to stick to my 1st-January commitment of blogging more this year…

Sociology in 2014: rediscovering methods

I came back last week from the annual conference of the British Sociological Association (BSA) in Leeds, and as usual, I am trying to put down my impressions while they’re still fresh in my mind. I wasn’t very quick, though, and the BSA’s members’ newsletter has already come out with comments and short reports about the plenaries, the prizes awarded, and the conference overall. While the conference is described as having been “very vibrant and sociable”, with “exciting conversations” during the breaks and a “diverse mixture of topics” that “reflected the breadth of interests”, my own feelings, I confess, are a bit more mixed.

In 2012, the BSA conference was followed by a lively debate sparked by an article by Aditya Chakrabortty in the Guardian, in which he complained about the discipline’s lack of engagement with the financial crisis. He pointed to the BSA press releases featuring research on “older bodybuilders”, and to the time devoted to the “holistic massage industry” at the conference, as evidence of what he saw as a retreat from public space. The BSA took the criticism very seriously and, apart from responding to the Guardian, put in place a massive effort to encourage public engagement. The 2013 conference was entitled “Engaging Sociology” and many sessions were dedicated to showing that the profession means it. Confrontation and comparison with economics were open and clear. A major project on social class was presented with all honours. The Sociology journal released a call for papers for a special issue on “Sociology and the Global Economic Crisis”.

This year, the “Changing Society” title aimed to stress continuity with last year’s efforts; yet it seems to me that we are back to business as usual. I had the impression that many paper presentations were on topics similar to the bodybuilders and massage that Chakrabortty talked about. That’s why, as I said, my feelings are mixed.

Continue reading “Sociology in 2014: rediscovering methods”

Qualitative networks

Social Network Analysis (SNA) is booming, and many think it’s because of internet networks and big data. Yet social networks themselves are not new: people have always formed ties to one another, and online platforms such as Facebook, Twitter and LinkedIn only offer channels for networked interactions to occur. Counts and fancy visualisations of myriad likes and shares do not tell the whole story either: networks are primarily about exploring how ties connect us as individuals and as organisations or groups, and how our social relationships affect our lives and behaviours.

In this sense, smaller studies can still have much to teach us. These include not only quantitative, but also qualitative approaches. “Social” networks involve a world of meanings, feelings, relationships, attractions, dependencies, which have traditionally been at the heart of qualitative research and are amenable to a mixed-methods approach.

With this in mind, together with the Social Network Analysis Group of the British Sociological Association (BSA-SNAG), I am organising a small one-day conference on “Mixed Methods Approaches to Social Network Analysis”, exploring how the combination of SNA and qualitative methods can enrich and deepen our understanding of network content in conjunction with network structure. The event will take place on 12 May 2014 at Middlesex University, London, and the programme is available here; to register online (deadline 30 April!) click here.

Small data and big models: Sunbelt 2014

Uh, it’s been a while… I should have written more regularly! All the more so as many things have happened this month, not least the publication of our book on the End-of-Privacy hypothesis. Well, I promise, I’ll catch up!

Meanwhile, a short update from St Pete Beach, FL, where the XXXIV Sunbelt conference is just about to end. This is the annual conference of the International Network for Social Network Analysis. In the last few years, I have noticed some tension between the old school (let’s call it that, no offense!), people using data from classical sources such as surveys and fieldwork, and the big data people, usually from computer science departments and rather disconnected from the core of top social network analysts, who are mostly from the social sciences. This year, though, the tension was much less apparent, or at least I did not find it so overwhelming. There weren’t many sessions on big data this time, but there was a lot of progress from the old school, which is in fact renewing its range of methods and tools very fast. No more tiny descriptives of small datasets, as was the case in the early days of social network analysis, but ever more powerful tools allowing statistical inference (very difficult with network data; I’ll come back to that in some future post), hypothesis testing, and very advanced forms of regression and survival analysis. In this sense, a highly interesting conference indeed. We can now do theory-building and modeling of networks at a level never experienced before, and we don’t even need big data to do so.

The keynote speech by Jeff Johnson, interestingly, was focused on the contrast between big and small data. Johnson has strong ethnographic experience with small data, including in very exotic settings such as scientific research labs at the South Pole and fisheries in Alaska. He combined social network analysis techniques, sometimes using highly sophisticated mathematical tools, with fieldwork observation to gain insight into, among other things, the emergence of informal roles in communities. His key question here was, can we bring ethnographic knowing to big data? And how can we do so?

My own presentation (apart from a one-day workshop I offered on the first day, where I taught the basics of social network analysis) took place this afternoon. I realize, and I am pleased to report, that it was in line with the small-data-but-sophisticated-modeling mood of the conference. It is derived from our research project Anamia, and uses data from an online survey of persons with eating disorders to understand how the body image disturbances that affect them are related to the structure of their social networks. The data were small, because they were collected as part of a questionnaire; but the survey technique used was advanced, and the modeling strategy is quite complex. For those who are interested in the results, our slides are here:

Training in European data: EU-SILC

Official statistical surveys are still the best sources of data in terms of quality. In practice, they are the only ones that apply random sampling, and the legal obligation to respond makes the actual sample very close to the targeted one. No other approach to data collection can hope to do as well.
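To see why a response rate close to 100% matters so much, here is a minimal sketch in Python. It is not taken from any of the surveys discussed here: it simply applies the textbook identity that the mean over the whole target sample is the response-rate-weighted average of the respondents' and the non-respondents' means, so the respondent mean is off by (1 - response rate) × (respondent mean - non-respondent mean). The function name and all figures are hypothetical.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Gap between the respondent mean and the mean of the full target sample.

    Follows from: full_mean = r * mean_respondents + (1 - r) * mean_nonrespondents,
    hence mean_respondents - full_mean = (1 - r) * (mean_respondents - mean_nonrespondents).
    """
    if not 0.0 < response_rate <= 1.0:
        raise ValueError("response_rate must be in (0, 1]")
    return (1.0 - response_rate) * (mean_respondents - mean_nonrespondents)


# Hypothetical figures: respondents average 100 on some indicator,
# non-respondents average 80. Compare a near-compulsory survey
# with a voluntary one.
for rate in (0.95, 0.50):
    print(f"response rate {rate:.0%}: bias = {nonresponse_bias(rate, 100.0, 80.0):.2f}")
```

Under these made-up numbers, the same difference between respondents and non-respondents distorts the estimate ten times more at a 50% response rate than at 95%, which is the sense in which compulsory participation keeps the actual sample close to the targeted one.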

The European Union Statistics on Income and Living Conditions (EU-SILC) is an instrument that aims to collect timely and comparable cross-sectional and longitudinal multidimensional microdata on income, poverty, social exclusion and living conditions. It started in 2003 with a small group of participant countries, and was enlarged in 2004. It is one of the richest sources of information on the daily life conditions of Europeans.

EU-SILC data are available for research use, but many barriers exist and these data are actually underutilized. On the one hand, the fact that access is legally authorised does not make it practically straightforward – the application process can be lengthy and costly. On the other hand, the very handling of data requires some specific knowledge and skills.

The Data without Boundaries European initiative, aimed at moving forward research access to official data, organises a training programme on EU-SILC with a specific focus on the longitudinal component. Local organization lies with Réseau Quetelet, and the training course is hosted by GENES (Groupe des Écoles Nationales d’Économie et Statistique), both in Paris, France.

Continue reading “Training in European data: EU-SILC”

Small Data to study the Web: The ANAMIA project

We have just published the results of our research project ANAMIA, studying the personal networks and online interactions of persons with eating disorders (“ana” and “mia” in web jargon). The report has just come out:

Documents

Report: Young internet users and eating disorder websites: beyond the notion of “pro-ana” (pdf, 92 pp, in French)

Infographic: results and recommendations of the ANAMIA project (pdf, in French)

Summary (in English!)

The ana-mia webosphere had long remained opaque, with little data available for a science-based understanding of it. As a result, misconceptions proliferated and policy-makers hesitated, threatening censorship but without devising solutions to reach out to and support a population in distress. Our study has been the first to overcome these limitations and reveal the social environment, actual eating practices and digital usages of persons with eating disorders on the English- and French-language web.


Visualization of the personal networks of four individuals with, respectively, EDNOS (Eating Disorders Not Otherwise Specified; top, left), anorexia nervosa (top, right), bulimia nervosa (bottom, left), binge eating (bottom, right). Hollow circles represent their face-to-face acquaintances, filled circles their online ones. Colours indicate relational proximity to the subject (green: intimate, blue: very close, yellow: close, red: somewhat close). Source: ANAMIA project report.

Continue reading “Small Data to study the Web: The ANAMIA project”

Three tools to visualize personal network data – continued

Yesterday, Antonio Casilli and I gave our promised talk on network data visualization. It was an opportunity to discuss the extension of the tools we developed within a given research project to other network studies, and to reflect on the contribution as well as the limitations of data visualizations. Here are our slides:

Three tools to visualize personal networks

Data visualization techniques are enjoying ever greater popularity, notably thanks to the recent boom of Big Data and our increased capacity to handle large datasets. Network data visualization techniques are no exception. In fact, appealing diagrams of social connections (sociograms) have been at the heart of the field of social network analysis since the 1930s, and have contributed a lot to its success. Today, all this is evolving at an unprecedented pace.

In line with these tendencies, the research team of the project ANAMIA (a study of the networks and online sociability of persons with eating disorders, funded by the French ANR), of which I was one of the investigators, has developed new software tools for the visualization of personal network data, with different solutions for the three stages of data collection, analysis, and dissemination of results.

Specifically:

– ANAMIA EGOCENTER is a graphical version of a name generator, to be embedded in a computer-based survey to collect personal network data. It has turned out to be a user-friendly, highly effective interface for interacting and engaging with survey respondents;

Continue reading “Three tools to visualize personal networks”