Posts Tagged ‘ Big data ’

Are we all data laborers?

I gave a talk today at AUTONOMY, a major festival of urban mobility in Paris, where new technologies are at center stage, from driverless cars to electric scooters, bike-sharing solutions, and connected infrastructure for the smart city. I had been asked to talk about labor in digital platforms, such as those offering mobility services.

Digital platforms are often thought of in terms of automation, but it is clear that there is labor too: we all have in mind the example of the couriers and drivers of the “on-demand” economy. But there’s more: I’ll show how platforms involve the labor of everyone, including passengers and users of all types. By labor, I mean here human activity that produces data and information – the key source of value for platforms. It is often an implicit, invisible activity of which we may not even be aware – as we tend to focus more on consumption aspects when we talk routinely about “car pooling” or “car sharing”, rather than looking at the underlying productive effort. This is what scholars call “digital labor”.

Four eco-systems

Specialist Antonio Casilli distinguishes four forms of digital labor in platforms, and I am now going to briefly outline them.

Continue reading

Data and theory: substitutes or complements? Lessons from history of economics

Today, my chapter on “Formalization and mathematical modelling” is published in a new series of three reference books on History of Economic Analysis (edited by G. Faccarello and H. Kurz, Edward Elgar). The chapter draws heavily on key ideas I developed as part of my thesis on the origins of mathematical economics. But this was a long time ago and reading it again today, I see it in a different light. I notice in particular that economics developed its distinctive mathematical flavour, which makes it neatly stand out relative to the other social sciences, at times in which social research was data-poor – and it did so not despite data paucity, but precisely because of it. William S. Jevons, a 19th-century forefather of the discipline who was clearly aware of the relevance of maths, wrote in 1871:

“The data are almost wholly deficient for the complete solution of any one problem”


“we have mathematical theory without the data requisite for precise calculation”

Continue reading

Ethical issues in research with online data

Some time ago, I wrote a post on ethical issues in research with secondary data – a somewhat grey area, where students and scholars often feel guidance is insufficient. Even more complex is research with internet data – neither primary nor secondary strictly speaking, but “big” data. A recent case fuelled an international debate on how researchers should deal with data that are, apparently, accessible to all on the web: a Danish graduate student published a large dataset of users of the online dating site OkCupid (he apparently did so without any institutional backing, and Aarhus University, where he studies, is now on the case). Michael Zimmer, a specialist in information studies and the policy and ethics of online research, neatly summarizes the issues in a recent Wired article:

  • Don’t say that “the data are already public”. The fact that OkCupid users knowingly share some personal information does not mean they consent to it being used for purposes other than interactions with other users on that site. By scraping data, one may be able to put together the whole history of users’ presence on that platform, revealing more of their life or personality than they themselves are aware of. More dangerously, data extracted in this way might in some cases be matched with other information, thereby potentially becoming much more disclosive than what the persons concerned ever intended or agreed. And the disclosure may be aggravated by releasing the data outside the platform.

Continue reading

First steps toward “Data Inclusion”

The concept of “data inclusion” is new and still slowly finding its way into our linguistic habits, but it is gaining ground in the minds of those who care for disadvantaged, low-income, or otherwise underserved segments of society. A recent report of the US Federal Trade Commission (FTC) does precisely this. Looking at the commercial use of big data analytics, it considers cases in which big data analytics lead companies to make choices that are detrimental to the most vulnerable segments of society, for example by excluding them from credit or from employment opportunities. Instead, it asks how big data may be used in inclusive ways.

A first set of recommendations they make is for companies to be well aware of the regulations: on financial and credit reporting, equal opportunities, consumer protection. The second set of recommendations, though specifically aimed at research done in (or for) companies, is of relevance for public research as well, and consists in asking key questions about the quality of data and models, and about the reliability and validity of results:

  • How representative is your data set? In popular discourse, big data carry a promise of exhaustivity, which however is rarely fulfilled in practice (see this great FT article by Tim Harford). In fact, big data sets are not necessarily statistically representative of the population they refer to, and information may be disproportionately missing about specific, possibly disadvantaged, populations.
  • Does your data model account for biases? Selection effects, which occur whenever some members of the population are less likely to be included in the sample than others, must be controlled for in order for results to be generalizable.
  • How accurate are your predictions based on big data? The issue is that most research with big data is predictive without being able to uncover the social or economic mechanisms underlying observed correlations, so that interpretation of results is potentially misleading. The report does not say, though, that recent developments in machine learning that support causal reasoning may alleviate this problem in the not-so-distant future.
  • Does your reliance on big data raise ethical or fairness concerns? In all honesty, this is not specifically a question for research on big data, but for research in general. If a company’s analysis of employees’ behavior leads to solutions that involve forms of, say, racial or gender-based discrimination, then that analysis shouldn’t be used – whether it’s done with “big” or “small” data.
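
To make the first two questions more concrete, here is a minimal sketch of how a selection effect can distort a simple average, and how reweighting by known population shares (post-stratification) can correct it. Everything in it is an assumption made up for illustration: the group labels, the outcome values, the population shares and the function names do not come from the FTC report.

```python
# Hypothetical "found" sample of (group, outcome) pairs: the "connected"
# group is heavily over-represented relative to the population.
sample = [("connected", 10.0)] * 8 + [("offline", 4.0)] * 2

# Assumed true population shares (e.g. known from a census).
population_share = {"connected": 0.5, "offline": 0.5}


def naive_mean(rows):
    """Plain average of the outcomes, ignoring who is in the sample."""
    return sum(y for _, y in rows) / len(rows)


def poststratified_mean(rows, shares):
    """Average of group means, weighted by each group's population share."""
    totals, counts = {}, {}
    for group, y in rows:
        totals[group] = totals.get(group, 0.0) + y
        counts[group] = counts.get(group, 0) + 1
    return sum(shares[g] * totals[g] / counts[g] for g in shares)


naive = naive_mean(sample)
adjusted = poststratified_mean(sample, population_share)
print(f"naive: {naive:.1f}, post-stratified: {adjusted:.1f}")
```

In this toy case the naive average overstates the outcome because one group dominates the sample. The correction only works if the population shares are known and every group appears at least once in the sample, which is exactly what may fail for disadvantaged populations.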

It is important that major regulators like the FTC are taking notice. Big data open the way to major improvements in our life conditions, but not because data-driven analysis will take the lead over current best practices in research. Regulations, awareness of statistical issues and potential pitfalls, and ethics are ever more necessary for big data to fulfill their potential.

New year, new job, new life…

Yes, I must admit it: I didn’t keep my new-year-2015 promise of posting more often on my blog… and the annual report I received yesterday from WordPress, showing a couple of peaks of activity and frightening silence the rest of the year, isn’t something I would be proud to share… but I have a justification! Seriously, it’s not just an excuse – it’s that I’ve been busy trying to change life… and yes, I managed. On Monday 4 January, I’ll start an exciting new position as senior research scientist at the National Center of Scientific Research (CNRS, or in French, Centre national de la recherche scientifique) in Paris. CNRS can be loosely compared to what is, in other countries, a National Research Council, but there’s more to it than international comparisons might vaguely suggest: this is probably the single most desired job in French academia, with a mission “to contribute to the development of knowledge… in all fields that contribute to the advancement of society”. In plain words, that’s basically pure research with almost no teaching apart from some PhD supervision… a dream that would hardly be possible in the UK, where I was before.

I’ll be at the Lab for Computer Science (LRI, Laboratoire de Recherche en Informatique, UMR8623) on the Saclay campus, and I’ll work with the A&O (Learning and Optimization) research team. The interesting thing is that mine is an interdisciplinary position, designed to facilitate dialogue and collaboration between the social sciences and computer science around big data and their use for the advancement of knowledge, policy, and more generally society. I have been especially selected by the sociology section of CNRS to work in a computer science research centre. There, I am asked to develop my personal, long-term research project on the “sharing economy” of digital platforms and how they create value from the social ties in which economic action is embedded: this will require blending my research on data, social networks and the digital economy with machine learning and optimization approaches (more on this later … yes on this blog! promise!).

What else will I do this year at LRI? I am on the organising committee of the Second European Social Networks Conference, which will take place in Paris next June, I am finishing a book on so-called “pro-anorexia” websites as the conclusion of my past project ANAMIA, and I am on the Editorial Board of Revue Française de Sociologie.

I won’t entirely forget England though… I’ll keep my doctoral students at Greenwich and continue my engagement at UCL’s Institute of Education as external examiner. Come on, you can’t just disappear after six years! Indeed, I’ll always remember those six years as most productive and fulfilling ones. And however happy I am now to join CNRS, I’ll never forget the expressions of love, sympathy and friendliness I received from colleagues and students when I left Greenwich in December. The cards, the presents, the parties… all beyond any expectations I might have had before! Thank you Greenwich. And well, yes, a big thank you to all those who made it possible – both those in London who made me have a great time far from home for so long, and those in Paris who helped me come back, not without effort, and have welcomed me now.

A great new year is about to start, and I promise I’ll document it more… 😉

International Program in Survey and Data Science

A new, master’s level programme of study in Survey and Data Science is to be offered jointly by the University of Mannheim, the University of Maryland, the University of Michigan, and Westat. Applications for the first delivery are accepted until 3 January, for a start in Spring 2016. Prospective students are professionals with a first degree, at least one year of work experience, and some background in statistics or applied mathematics. All courses are delivered in English, fully online, to small classes (it’s not a MOOC!). Tuition is free, thanks to support from German public funds, at least for the first few cohorts.

What is most interesting about this master is its twofold core, involving both more classical survey methodology and today’s trendy data science. Fundamental changes in the nature of data, their availability, the way in which they are collected, integrated, and disseminated, have found many professionals unprepared. These changes are partly due to “big” data from the internet and digital devices becoming increasingly predominant relative to “small” data from surveys. Big data offer the benefit of fast, low-cost access to an unprecedented wealth of informational resources, but also bring challenges as these are “found” rather than “designed” data: less structured, less representative, less well documented (if at all…). In part, these changes are also due to the world of surveys changing internally, with new technical challenges (regarding for example data preservation, in a world of pre-programmed digital obsolescence), legislative issues (such as those triggered by greater awareness of privacy protection), increased demand by multiple users, and a growing need to merge surveys and data from other (such as business and administrative) sources. It is therefore necessary, as the promoters of this new study programme rightly recognize, to prepare students for the challenges of working both with designed data from surveys and with big data.

It will be interesting to see how data science, statistics, and social science / survey methodology feed into each other and support each other (or fail to do so…). There is still work to be done to develop techniques for analyzing data that allow us to gain insights more thoroughly, not just more quickly, and help us develop solid theories, rather than just uncovering new relationships that might eventually turn out to be spurious.

Read more

Databeers now in London

In the midst of the chaos and sadness of the past week, a more leisurely note: the first of a new “Databeers” series of events in London yesterday evening, following a format that has been experiencing a huge success in Spain, Italy and other countries. The event is very informal, and getting to know other data enthusiasts is the main goal. There are a few flash talks with free beers and networking time.


The next Data Beers London event is on 25 February 2016.


Big data and history


A paper archive – more and more often replaced by digitised versions today.

Yesterday at Bibliothèque Nationale de France, I took part in a panel discussion on longue durée in history, organised by the Revue Annales – Histoire et Sciences Sociales. Of course I am not a historian, and I wouldn’t be able to tell whether one interpretation of longue durée is better than another. But historians are now raising questions that are common to the social sciences and humanities more generally: how to benefit from big data and how to re-think the political engagement of the researcher. So I was there to talk about big data and how they change not just research practices and methods, but also researchers’ position relative to power, politics, and industry. These questions cross disciplinary boundaries, and all may benefit from dialogue.


Collection of older sources is now often online and enables application of new methods.

What ignited the historians’ debate was an attempt by two leading scholars, David Armitage and Jo Guldi, to restore history’s place as a critical social science, based on (among other things) increased availability of large amounts of historical data and the digital tools necessary to analyze them. Before their article in Annales, they published a full book in open access, the History Manifesto, where they develop their argument in more detail. Their writing is deliberately provocative, and indeed triggered strong (and sometimes very negative) reactions. Yet the sheer fact that so many people took the trouble to reply proves that they struck a chord.

What do they say about big data? They highlight the opportunity to access large and rich archives and to expand research beyond any previous limitations. Their enthusiasm may seem excessive but it is entirely understandable insofar as their goal is to shake up their colleagues. My approach was to take their suggestion seriously and ask: what opportunities and challenges do data bring about? How would they affect research, especially for historians?

Continue reading

“Data for Humanity”: a simple message, but so necessary

The recent VW emissions scandal says it all: even a large company can’t get away with behaviours that disrespect key societal values. Protection of the environment is among these values today, so much so that not only public authorities step in to defend it, but even markets punish the transgressors.

Data protection is not (yet) such a value. Admittedly, some associations, individuals, and government officials fight for it, but the larger public is still unsure. It’s not that people don’t care, but that uncertainty as to what data are actually collected, for what usages, and by whom, is overwhelming; and it becomes difficult to identify the best course of action.

In this context, a new initiative is most welcome: an open letter on “Data for Humanity“, initiated by two scholars of the University of Frankfurt, pleads for a more responsible use of data. The message is simple: Do no harm. And if you can, on top of it, do something good. It’s so simple, and so necessary.

Sure, the world won’t change after this letter, but it will be a first step. Even the promotion of environmental protection started with simple, basic declarations, 30-40 years ago; and it was by insisting and persevering that it finally entered everybody’s consciousness.

New publications on big data and official statistics

National Statistical Institutes (NSIs) have long been the recognised repositories of all socio-economic information, mandated by governments to collect and analyse data on their behalf. The development of big data is shaking this world. New actors are coming in and commercially-oriented, privately-produced information challenges the monopoly of NSIs. At the same time, NSIs themselves can tap into digital technologies and produce “big” data. More generally, these new sources offer a range of opportunities, challenges and risks to the work of NSIs.

The Statistical Journal of the IAOS, the flagship journal of the International Association for Official Statistics, has published a special section on big data – of particular interest to the extent that it is free of charge!

Fride Eeg-Henriksen and Peter Hackl introduce this special section by defining big data and emphasising its interest for official statistics. But it is crucial, albeit admittedly not easy, to separate the hype around big data from its actual importance.

The other papers are concrete examples of how big data may be integrated into official statistics:

Continue reading