Big data and the hypothesis of the end of privacy

In the late 2000s, voices suggesting that our societies might be nearing the ‘end of privacy’ grew increasingly loud. Our cultural, political and regulatory environment was on the verge of a major transformation, or so the narrative went. Businesses rejoiced since, notoriously, less privacy and more information oil the economy.

In a video interview with the Italian media outlet Idee Sottosopra, I review the courses of action taken by various stakeholders, in particular Internet companies, and examine their conflicts and controversies. I show how the very concept of privacy, inherited from a long legal and judicial tradition, should be revised and redefined to appropriately describe today’s online interactions.

Overall, there is no deterministic and inevitable tendency to exclude privacy from our societies, but rather a tension between social forces for and against privacy, which has accompanied the advent of the digital economy and especially of social media. The positions of stakeholders, especially users, are often ambiguous, and social media companies have attempted to leverage this ambiguity to their own advantage.

Yet civil society reactions have grown stronger and stronger, and after the initial David-vs-Goliath attempts of individuals and small associations, more and more authoritative institutions have taken the defence of privacy seriously. We are no longer left to costly and barely visible individual choices, and especially after the entry into force of the GDPR in Europe, we now have an unprecedented opportunity to act at a more systemic level.

Big Data. L’ipotesi della fine della Privacy | Società Digitale | Idee Sottosopra

New ANR Project HUSH: Human supply chain behind smart technologies

Together with sociologist Antonio A. Casilli and economist Ulrich Laitenberger, I have recently received ANR (French National Research Agency) funding for a new study of human inputs, mostly platform-mediated work, in the production of artificial intelligence solutions. In our project, called HUSH (Human supply chain behind smart technologies), we aim to shed light on the whole ecosystem linking platforms, workers and the clients who demand data-related and algorithmic services.

For this project, we are now looking for a

PhD researcher in digital economics

The position provides the opportunity to focus strongly on research, in a very active environment. The team has collaborations with different online platforms and has collected data sets from the web, which can be used by the applicant for their thesis. The focus of the current position is to work on the economic aspects of platform-mediated work, using quantitative analyses. Two other PhD students (in sociology) have already been recruited for this project and work on related topics.

The starting date is January 2020 (a later starting date is also possible). As per national regulations, the stipend will be about 1,600 euros per month, with the possibility of obtaining a supplement for extra activities such as teaching. Social security and professional training are provided. Additional funding is available to present your research at international conferences and workshops. The position will be based at the new campus of Telecom Paris in Palaiseau, in the immediate vicinity of École Polytechnique and ENSAE.

Your profile

Applicants should have successfully completed a Master’s degree in economics, socio/economic data science or related disciplines, or expect completion at the beginning of the year 2020. They should have a strong interest in digital platforms, from the perspective of industrial organization or labor economics, and have an empirical focus (econometrics, data science). They should aim at developing programming skills and have an interest in the evaluation of internet data. Fluency in English is required; knowledge of French is advantageous, but not essential.

Telecom Paris and IP Paris

Telecom Paris is part of the newly founded Institut Polytechnique de Paris (IP Paris), together with École Polytechnique, ENSTA, ENSAE and Télécom SudParis. The department of social sciences and economics (SES) at Telecom Paris studies the impact of digitization on economic activity and society. For more information, please see

How to apply

Please submit a cover letter, a curriculum vitae, a transcript of records (listing all subjects taken and their grades), and contact details of one to two referees by November 15, 2019 to Ulrich Laitenberger ( ).

Update: applications open until December 15, 2019.

The big data moment of the social sciences: what access to web and social media data?

Round table, Sciences Po Paris, 6 December 2018, 6:00 pm


For social science research to take full advantage of large digital databases, one obstacle remains to be removed: access to these data is limited, unevenly distributed, and surrounded by legal and ethical uncertainty. We propose to discuss this on the occasion of the publication of the special issue of Revue Française de Sociologie on “Big data, societies and social sciences” (no. 59/3). This round table brings together researchers with other public and

With:

  • Garance Lefèvre, Policy senior associate, Uber
  • Roxane Silberman, Scientific advisor, Centre d’Accès Sécurisé aux Données (CASD)
  • Sophie Vulliet-Tavernier, Director of relations with the public and research, Commission Nationale de l’Informatique et des Libertés (CNIL)
  • The authors of the special issue.

Moderators: Gilles Bastin (Univ. Grenoble Alpes) and Paola Tubaro (CNRS), editors of the special issue.

Admission is free, subject to available seating: to register, click here.

Access: Sciences Po, Goguel room. Enter at 27 rue Saint-Guillaume, 75007 Paris (cross the garden and take the lift to the top floor). The round table is organized by Revue Française de Sociologie in collaboration with Presses de Sciences Po. It will be followed by a reception.


More than complex: large and rich network structures

I am co-organizing this Satellite of the NETSCI2018 Conference in Paris, on 12 June 2018. We are now accepting submissions of proposals for presentations.

Information on the Satellite

In traditional research paradigms, sociology handles small but rich networks, where the richness of network attributes derives from the specific design of the data collection process. In the sociological approach, differences among nodes and edges are key to describing network properties and the ensuing dynamical social processes. Instead, the complex systems tradition deals with large but poor networks. Assuming statistical equivalence of graph entities, a mean field treatment serves to describe the aggregate properties of the network. Today’s network datasets contain an unprecedented quantity of relational information at, and between, all possible levels: individuals, social groups, political structures, economic actors, etc. We finally deal with large and rich network structures that expose the implicit limitations of the two above-mentioned approaches: the traditional methods from social science cannot be upscaled because of their algorithmic complexity, and those from complex systems lose track of the complex nature of the actors, their relationships and their processes. This workshop aims to develop an interdisciplinary reflection on how methods from social science could be upscaled to large network structures and on how methods from complex systems could be downscaled to deal with small heterogeneous structures.

We are proud that five prominent international scholars are our invited speakers: Camille Roth, Sciences Po Paris; Matthieu Latapy, LIP6-UPMC Paris; Alessandro Lomi, ETH Zurich; Fariba Karimi, GESIS Cologne; Noshir Contractor, Northwestern University.


We invite abstracts of published or unpublished work for contributed talks to take place at the satellite symposium. We expect a broad range of topics to be covered, across theory, methodology, and application to empirical data, relating to an interdisciplinary reflection on how methods from social science could be upscaled to large network structures and on how methods from complex systems could be downscaled to deal with small heterogeneous structures.

Submission can be made through our website.

Submissions are required to be at most 650 words long and should include the following information: title of the talk, author(s), affiliation(s), email address(es), name of the presenter, abstract. Papers or submissions longer than 1 page will not be accepted.

Important dates

Abstract submission deadline is March 25, 2018. Notification of acceptance will be no later than April 23, 2018.

All participants and accepted speakers will have to register through the NETSCI2018 website.

Open Data: What’s new in 2017?

I am now in Montréal, where last Friday I took part in a panel on Open Data at the “Science & You” international conference. It was interesting for me to reflect on how the picture has changed since my previous panel on the same topic, in Kiev in 2012. Back then, we were busy trying to convince public administrations that opening data was good for transparency and could help improve services to communities. Since then, many attempts have been made in numerous countries, with local authorities often pioneering the process and central governments following only later (one example cited in my panel was Québec City). What is made open is typically information from public registers (first names of newborns, records of road accidents) and, increasingly, from technological devices and sensors (bus traffic information).

There are some conditions to be met for a dataset to be called “open”:

  • Technically, it needs to be “raw”, detailed, digital and reusable. The French Interior Ministry released results of the first round of the recent presidential elections within a few days, at polling station level. This is sufficiently detailed (with over 69,000 polling stations throughout the country), raw (allowing aggregations, comparisons etc.), and digital/reusable (so much so that the newspaper Le Monde could develop a user-friendly application to let readers easily check results in their neighborhoods). Some would also insist that “open” data should be released in non-proprietary formats (better .csv than .xls, for example).
  • Legally, the data must come with a license that allows reuse by third parties (typically within the Creative Commons family). Ideally, no type of reuse should be ruled out (including, somewhat controversially, commercial / for-profit reuse).
  • Economically, the data should be available to all for free (or at least with minimal charges if data preparation requires extra work or expenses).
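To make the “raw, detailed, digital and reusable” condition concrete, here is a minimal Python sketch of the kind of aggregation that polling-station-level data in a non-proprietary format allows. The column names, municipalities, candidates and vote counts are all invented for illustration; they are assumptions and not the actual layout or content of the Interior Ministry's files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in CSV form. Every name and figure below is made up
# for illustration and does not reproduce real election results.
SAMPLE_CSV = """municipality,polling_station,candidate,votes
Ville A,0001,Candidate X,120
Ville A,0001,Candidate Y,95
Ville A,0002,Candidate X,80
Ville A,0002,Candidate Y,130
Ville B,0001,Candidate X,60
Ville B,0001,Candidate Y,40
"""


def aggregate_by_municipality(csv_text):
    """Sum polling-station votes for each (municipality, candidate) pair."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[(row["municipality"], row["candidate"])] += int(row["votes"])
    return dict(totals)


totals = aggregate_by_municipality(SAMPLE_CSV)
for (municipality, candidate), votes in sorted(totals.items()):
    print(f"{municipality}: {candidate} = {votes}")
```

Because the data are released at the most detailed level, a reuser can choose the level of aggregation (station, municipality, region), which would not be possible if only pre-computed totals were published.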

While in the past few years a lot of thought has been devoted to the “ideal” conditions for data opening and to how this would positively affect public service, the data landscape has now changed significantly.

Continue reading “Open Data: What’s new in 2017?”

Science XXL: digital data and social science

Last week I attended (unfortunately only part of) an interesting workshop on the effects of today’s abundance and diversity of digital data on social science practices, aptly called “Science XXL”. A variety of topics were discussed and different research experiences were shared, but I’ll just summarize here a few lessons learned that I find interesting.

  • Digital data are archive data. Data retrieved automatically from the digital traces of individual actions, such as those mined from the APIs of platforms like Twitter, are unlike survey data in that they were not originally recorded for research purposes. The researcher must select relevant records on the basis of some understanding of the conditions under which these data were produced. Perhaps ironically, digital data share this characteristic with data from historical or literary archives.
  • Digital data are not necessarily “big”, in the sense that their volume is often small (at least in social science research so far!), even though they may share other characteristics of big data such as velocity (being generated on the fly as people use digital platforms) or variety (having little or no structure).
  • Digital data can help fill gaps in survey data, for example when survey sampling is not statistically representative: detail and volume can provide extra information that supports general conclusions.
  • Unclean data, outliers and aberrant observations may be very informative, revealing details that would escape attention if researchers focused only on the average or center of the distribution (the normal distribution cherished in classical statistical approaches). Special cases are no longer a prerogative of qualitative research.
  • Data analysis is a key ingredient of “computational social science”, a field that is growing in importance after an initial phase in which it was largely confined to agent-based simulation and complexity theory.

Big data, big money: how companies thrive on informational resources

Information oils the economy, as we have known since the path-breaking research of George Akerlof, Michael Spence and Joseph Stiglitz in the 1970s, and information can be extracted from data. Today, increased availability of “big” data creates the opportunity to access ever more information, and thus, the reasoning goes, benefits the economy.

But in practice, how do companies extract value from this increasingly available information? In a nutshell, there are three ways in which they can do so: matching, targeted advertising, and market segmentation.

Matching is the key business idea of many recently created companies and start-ups, and consists in helping potential parties to a transaction find each other: driver and passenger (Uber), host and guest (Airbnb), buyer and seller (eBay), and so on. Matching is done by processing users’ data with suitable algorithms, and the more detailed the data, the more satisfactory the matching. The firms’ business model is usually based on taking a fee for each successful transaction (each realized match).

Targeted advertising is the practice of selecting, for each user, only the ads that best correspond to their tastes or practices. Advertising diapers to the general population will be largely ineffective as many people do not have young children; but targeting only those with young children is likely to produce better results. Here, the function of data is to help decide what to advertise to whom; useful data are people’s socio-demographic situation (age, marital status, children…), their current or past practices (if you bought diapers last week, you might do that again next week), and any declared tastes (for example in a post on Facebook or Twitter). How this produces a gain is obvious: if targeted adverts are more effective, sales will go up.
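The diaper example can be sketched in a few lines of Python. This is only an illustration of the three kinds of data mentioned above (socio-demographic situation, past practices, declared tastes); the `User` fields, the rule in `is_diaper_target` and the sample users are all hypothetical, and real ad-targeting systems are of course far more elaborate.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user profile; the field names are assumptions."""
    name: str
    has_young_children: bool = False                         # socio-demographic situation
    recent_purchases: set = field(default_factory=set)       # current or past practices
    declared_interests: set = field(default_factory=set)     # declared tastes


def is_diaper_target(user: User) -> bool:
    """Illustrative rule: any one of the three signals is enough to show the ad."""
    return (
        user.has_young_children
        or "diapers" in user.recent_purchases
        or "parenting" in user.declared_interests
    )


# Invented sample users.
users = [
    User("Ana", has_young_children=True),
    User("Ben", recent_purchases={"diapers"}),
    User("Cleo", declared_interests={"cycling"}),
    User("Dev"),
]

targeted = [u.name for u in users if is_diaper_target(u)]
print(targeted)
```

Only the users for whom at least one signal is present receive the ad; the others are spared an advert that would most likely be wasted on them.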

Continue reading “Big data, big money: how companies thrive on informational resources”

Special RFS issue on Big Data

Revue Française de Sociologie invites article proposals for a special issue on “Big Data, Societies and Social Sciences”, edited by Gilles Bastin (PACTE, Sciences Po Grenoble) and myself.

The focus is on two inextricably interwoven questions: how do big data transform society? How do big data affect social science practices?

Substantive as well as epistemological / methodological contributions are welcome. We are particularly interested in proposals that examine the social effects and/or the scientific implications of big data based on first-hand experience in the field.

The deadline for submission of extended abstracts is 28 February 2017; for full contributions, it is 15 September 2017. Revue Française de Sociologie accepts articles in French or English.

Further details and guidelines for submission are in the call for papers.