Special RFS issue on Big Data

Revue Française de Sociologie invites article proposals for a special issue on “Big Data, Societies and Social Sciences”, edited by Gilles Bastin (PACTE, Sciences Po Grenoble) and myself.

The focus is on two inextricably interwoven questions: how do big data transform society? How do big data affect social science practices?

Substantive as well as epistemological / methodological contributions are welcome. We are particularly interested in proposals that examine the social effects and/or the scientific implications of big data based on first-hand experience in the field.

The deadline for submission of extended abstracts is 28 February 2017; for full contributions, it is 15 September 2017. Revue Française de Sociologie accepts articles in French or English.

Further details and guidelines for submission are in the call for papers.

New: Paris Seminar on the Analysis of Social Processes and Structures (SPS)

Together with colleagues Gianluca Manzo, Etienne Ollion, Ivan Ermakoff, and Ivaylo Petev, I am organizing a new inter-institutional seminar series in sociology.

This new Social Processes and Structures (SPS) Seminar aims to take stock of the debates within the international scientific community that have repercussions for the practice of contemporary sociology, and that renew the ways in which we construct research designs, i.e., the ways in which we connect theoretical claims, data collection and methods to assess the link between data and theory. Several observations motivate this endeavor. Increasing interactions between social sciences and disciplines such as computer science, physics and biology outline new conceptual and methodological perspectives on social realities. The availability of massive data sets raises the question of the tools required to describe, visualize and model these data sets. Simulation techniques, experimental methods and counterfactual analyses modify our conceptions of causality. Crossing sociology’s disciplinary frontiers, network analysis expands its range of scales. In addition, the development of mixed methods redraws the distinction between qualitative and quantitative approaches. In light of these challenges, the SPS seminar discusses studies that, no matter their subject and disciplinary background, provide the opportunity to deepen our understanding of the relations between theory, data and methods in social sciences.

The inaugural session took place on 20 November 2016; the “regular” series starts this Friday, 27 January, and will continue until June, with one meeting per month.

All sessions take place at Maison de la Recherche, 28 rue Serpente, 75006 Paris, room D040, 5pm-7pm. All interested students and scholars are welcome, and there is no need to register in advance.

Big data, big money: how companies thrive on informational resources

Information oils the economy – as we have known since the path-breaking research of George Akerlof, Michael Spence and Joseph Stiglitz in the 1970s – and information can be extracted from data. Today, increased availability of “big” data creates the opportunity to access ever more information – for the good of the economy, then.

But in practice, how do companies extract value from this increasingly available information? In a nutshell, there are three ways in which they can do so: matching, targeted advertising, and market segmentation.

Matching is the key business idea of many recently created companies and start-ups, and consists in helping potential parties to a transaction find each other: driver and passenger (Uber), host and guest (Airbnb), buyer and seller (eBay), and so on. Matching is done by processing users’ data with suitable algorithms, and the more detailed the data, the more satisfactory the matching. The firms’ business model is usually based on taking a fee for each successful transaction (each realized match).

Targeted advertising is the practice of selecting, for each user, only the ads that best correspond to their tastes or practices. Advertising diapers to the general population will be largely ineffective as many people do not have young children; but targeting only those with young children is likely to produce better results. Here, the function of data is to help decide what to advertise to whom; useful data are people’s socio-demographic situation (age, marriage, children…), their current or past practices (if you bought diapers last week, you might do that again next week), and any declared tastes (for example in a post on Facebook or Twitter). How this produces a gain is obvious: if targeted adverts are more effective, sales will go up.

Data, health online communities and the collaborative economy: my tour of Québec

This November gave me the opportunity to give talks and participate in scientific events throughout Québec.

I started in Montréal, with a seminar at ComSanté, the health communication research centre of Université du Québec à Montréal (UQAM), where I presented my recently published book on websites on eating disorders. While most media attention focused on controversial “pro-anorexia” contents, presented as an undesirable effect of online free speech, I made the point that this part of the webosphere is rather to be seen as a symptom of the effects of current transformations of healthcare systems under austerity policies. Cuts in public health spending encourage patients to be active, informed and equipped, but the resulting social pressure creates paradoxical behaviors and risk-taking.

Also in Montréal, I was invited to a discussion with economic journalist Diane Bérard on the growth and crisis of the collaborative economy. About 50 people attended the event, co-organised by co-working space L’Esplanade, OuiShare Montréal and the journal Les Affaires. Diane summarized the essentials of the event in a blog post just the day after, and noted six main points:

  • The Uber case dominates discussions and divides the audience – though the collaborative economy is not (just) Uber.
  • The discussion gets easily polarized – a result of the tension between commercial and non-commercial goals of the collaborative economy.
  • We still know little of the business models of these platforms and the external factors that facilitate or hinder their success.
  • Sharing is in fact a niche market – now probably declining after the first enthusiasms.
  • The key issue for the future is work – its transformations, and how it is re-organizing itself.
  • Collaborative principles advance even outside the world of digital platforms, and sometimes permeate more traditional sectors. The near future of collaboration lies in sharing cities.

Twitter networks at the OuiShare Fest Barcelona 2016

Twitter conversations are one way through which participants in an event engage with the programme, comment on and discuss the talks they attend, and prolong question-and-answer sessions. Twitter feeds have become part of the official communication strategy of major events and serve documentation and information purposes, both for attendees and for outsiders. While tweeting is becoming more and more a prerogative of “official” accounts in charge of event communication, it is also a potential tool in the hands of each participant, allowing anyone to join the conversation, at least in principle. Earlier, I discussed how the Twitter discussion networks formed at the OuiShare Fest 2016, a major gathering of the collaborative economy community that took place last May in Paris, were one opportunity to see such mechanisms in place.

Here is a similar analysis, performed after the OuiShare Fest Barcelona – the Spanish-language version of the event, which I had the chance to attend last week. This event is smaller than its Paris counterpart but nonetheless impressive: I mined 3497 tweets with the official hashtag of the event, #OSfestBCN, mostly written during the two days of the event (my count stopped the day after). Do Twitter #OSfestBCN conversations describe the community?

First, when did people tweet? As often happens, there are more tweets on the first than the second day of the event, and there are more tweets during the first hours of each day, though the difference between morning and afternoon is not dramatic; tweeting declines only at night, when the fest’s activities are suspended. Online activity is not independent of what happens on the ground – quite on the contrary, it follows the timings of physical activity.
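As a rough illustration of how such a timing breakdown can be obtained, here is a minimal Python sketch that counts tweets per day and per hour. The timestamps are invented placeholders, not the real #OSfestBCN data, and the function names are my own; the original analysis was done with other tools.

```python
from collections import Counter
from datetime import datetime

# Placeholder timestamps, invented for illustration (not the real #OSfestBCN data).
timestamps = [
    "2000-01-01T09:15:00",
    "2000-01-01T09:40:00",
    "2000-01-01T15:05:00",
    "2000-01-01T23:30:00",
    "2000-01-02T10:20:00",
]

def tweets_per_day(timestamps):
    """Count tweets for each calendar day (keys are ISO date strings)."""
    return Counter(datetime.fromisoformat(ts).date().isoformat() for ts in timestamps)

def tweets_per_hour(timestamps):
    """Count tweets for each (day, hour) slot."""
    parsed = (datetime.fromisoformat(ts) for ts in timestamps)
    return Counter((d.date().isoformat(), d.hour) for d in parsed)

per_day = tweets_per_day(timestamps)
per_hour = tweets_per_hour(timestamps)
print(per_day)
print(per_hour)
```

Plotting the per-hour counts as a simple bar chart is usually enough to see whether online activity follows the rhythm of the event.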


Who tweeted most? Obviously the official @OuiShare_es account, which published 630 tweets – nine times as many as the second in the ranking. Those who follow immediately are all individuals, with between 50 and 70 tweets each.
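A ranking of this kind boils down to counting tweets per author. The sketch below shows one way to do it in Python; the account names are hypothetical and the list stands in for the authors of the collected tweets.

```python
from collections import Counter

# Hypothetical author of each collected tweet (one entry per tweet).
authors = ["official", "alice", "official", "bob", "official", "alice"]

def top_tweeters(authors, n=3):
    """Return the n most active accounts with their tweet counts."""
    return Counter(authors).most_common(n)

ranking = top_tweeters(authors)
print(ranking)
```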

Who tweeted with whom? What interests me most are conversations – who interacts with whom. The most explicit way of seeing this with Twitter data is to look at replies: who replied to whom. This corresponds to a small social network of 134 tweeters (the coloured points in the next figure). Ties among them are represented as lines in the figure, and the size of points depends on the number of their incoming ties, that is, the number of replies received. Beyond the official @OuiShare_es account, several tweeters receive a lot of replies: they are mostly speakers, track leaders, or otherwise important actors in the community.
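To make the construction concrete, here is a minimal Python sketch of a reply network: a directed tie goes from the person who replies to the person replied to, and each node is sized by the replies it receives. The account names and data are invented, and counting every reply (rather than distinct repliers) is my assumption about how node size is defined.

```python
from collections import Counter

# Hypothetical (replier, replied_to) pairs, one per reply tweet.
replies = [
    ("alice", "speaker_1"),
    ("bob", "speaker_1"),
    ("carol", "alice"),
    ("speaker_1", "alice"),
    ("bob", "speaker_1"),
]

def reply_network(replies):
    """Build nodes, distinct directed ties, and replies received per node."""
    edges = set(replies)                         # distinct replier -> replied_to ties
    nodes = {user for edge in edges for user in edge}
    received = Counter(target for _, target in replies)  # drives node size
    return nodes, edges, received

nodes, edges, received = reply_network(replies)
print(len(nodes), len(edges), received.most_common())
```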


Now, who tweeted about whom? This is also an important aspect of Twitter conversations. We can capture it with the social network of mentions, associating each tweeter with those they mentioned, and counting the number of times they did so. This will be a larger network (with 2553 mentions) compared to the net of replies, as mentions can be of many types and also include retweets.
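One simple way to build such a network is to extract every @handle from the text of each tweet and count how often each author mentions each account. The Python sketch below does this on invented tweets; the regular expression is a simplification of Twitter's handle rules, and retweets are captured because their text starts with "RT @handle".

```python
import re
from collections import Counter

# Simplified pattern for handles: "@" followed by letters, digits or underscores.
MENTION = re.compile(r"@(\w+)")

# Hypothetical (author, text) pairs standing in for the collected tweets.
tweets = [
    ("alice", "Great talk by @speaker_1 at #OSfestBCN"),
    ("bob", "RT @alice: Great talk by @speaker_1 at #OSfestBCN"),
    ("alice", "Thanks @speaker_1 and @bob!"),
]

def mention_edges(tweets):
    """Count how many times each author mentions each account."""
    weights = Counter()
    for author, text in tweets:
        for mentioned in MENTION.findall(text):
            weights[(author, mentioned)] += 1
    return weights

weights = mention_edges(tweets)
print(weights)
```

The resulting counts play the role of tie weights: the higher the count, the stronger the tie from the author to the account mentioned.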

The figure below represents the network of mentions. As before, the colored points are tweeters (the larger they are, the more often they have been mentioned by others), while lines between them are mentions (the thicker they are, the higher the number of times a user has mentioned another). Colors represent clusters detected with a measure called “modularity”, which identifies groups of nodes whose internal connections are stronger than their connections with nodes in other clusters; so for example, a purple node is more likely to have mentioned other purple nodes than blue nodes.

Modularity is computed based only on counts of ties, without considering the nature of the conversations (what the mention is about) or other qualities of nodes (gender, nationality, language of tweeters, etc.). And yet, it clearly identifies specific sub-communities. The very numerous, central purple nodes are the OuiShare community: connectors, activists, and others close to the organization, especially within Spain. The green nodes at the bottom-left are the Catalan community, including representatives of local authorities, notably the Barcelona municipality. The blue nodes at the bottom are different actors and groups from other parts of Spain. The few black nodes on the left are the international OuiShare community, and the sparse orange ones at the top are other international actors.
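To give an idea of what the measure captures, here is a minimal Python sketch that computes the modularity score of a given partition, using the standard formula Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the total tie weight, L_c the weight of ties inside cluster c, and d_c the total degree of its nodes. Ties are treated as undirected, which is a simplifying assumption, and the toy network and color labels are invented. The sketch only scores a partition; tools such as Gephi additionally search for a partition with a high score.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of a partition.

    edges: dict mapping (u, v) to a tie weight, treated as undirected.
    community: dict mapping each node to its cluster label.
    """
    total = sum(edges.values())      # m: total tie weight
    internal = defaultdict(float)    # L_c: weight of ties inside each cluster
    degree = defaultdict(float)      # d_c: summed degree of each cluster's nodes
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree)

# Toy network: two triangles joined by a single tie (c, d).
edges = {
    ("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
    ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
    ("c", "d"): 1,
}
two_clusters = {"a": "purple", "b": "purple", "c": "purple",
                "d": "green", "e": "green", "f": "green"}
one_cluster = {node: "purple" for node in two_clusters}

q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
print(q_two, q_one)
```

Splitting the toy network into its two triangles scores higher than lumping all nodes together, which is exactly the sense in which clusters have stronger internal than external connections.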


This analysis is part of a larger research project, “Sharing Networks”, led by Antonio A. Casilli and myself, and dedicated to the study of the emergence of communities of values and interest at the OuiShare Fest 2016. Twitter networks will be combined with other data on networking – including informal networking which we are capturing through a (perhaps old-fashioned, but still useful!) survey.

The analyses and visualizations above were done with the R package twitteR as well as Gephi.

Online health communities: data for doctors, patients and families

Online health communities have been demonstrated to be an important part of the self-empowering experience of today’s patients. While most attention so far has been devoted to self-styled health communities, where patients autonomously share expertise and experience, today policymakers and healthcare providers are harnessing the power of this very idea and are offering similar solutions themselves.

Earlier this week at the OuiShare Fest Barcelona – a major get-together of the Spanish-speaking collaborative economy community in Europe – a few of these initiatives were presented.

Social Diabetes is a small company founded by and for patients that offers a mobile app for online, real-time health monitoring services. Diabetes sufferers can use it to optimally adjust their insulin dosage based on their carb count and blood sugar levels; in some cases, they can also track their exercise and patterns of behavior to receive alerts whenever relevant. Patients can share this information with their doctors, also through the app, and can discuss it with other patients. This is an example of a user-based innovation where autonomous patients take the initiative, aiming to take control of their health and life. Still, physicians have been allowed in: the platform has a medical advisory board, and individual doctors can register as users to follow their patients.

Are we all data laborers?

Today I gave a talk at AUTONOMY, a major festival of urban mobility in Paris, where new technologies are at center stage, from driverless cars to electric scooters, bike-sharing solutions, and connected infrastructure for the smart city. I had been asked to talk about labor in digital platforms, such as those offering mobility services.

Digital platforms are often thought of in terms of automation, but it is clear that there is labor too: we all have in mind the example of the couriers and drivers of the “on-demand” economy. But there’s more: I’ll show how platforms involve the labor of everyone, including passengers and users of all types. By labor, I mean here human activity that produces data and information – the key source of value for platforms. It is often an implicit, invisible activity of which we may not even be aware – as we tend to focus more on consumption aspects as we talk routinely about “car pooling” or “car sharing”, rather than looking at the underlying productive effort. This is what scholars call “digital labor”.

Four eco-systems

Specialist Antonio Casilli distinguishes four forms of digital labor in platforms, and I am now going to briefly outline them.

