Posts Tagged ‘ Big data ’

Open Data: What’s new in 2017?

I am now in Montréal, where I participated, last Friday, in a panel on Open Data at the “Science & You” international conference. It was interesting for me to reflect on how the picture has changed since my previous panel on the same topic – in Kiev in 2012. Back then, we were busy trying to convince public administrations that opening data was good for transparency and could help improve services to communities. Since then, many attempts have been made in numerous countries – local authorities often pioneering the process, followed only later by central governments (one example cited in my panel was Québec City). What is made open is typically information from public registers (first names of newborns, records of road accidents) and, increasingly, from technological devices and sensors (bus traffic information).

There are some conditions to be met for a dataset to be called “open”:

  • Technically, it needs to be “raw”, detailed, digital and reusable. The French Interior Ministry released results of the first round of the recent presidential elections within a few days, at polling station level. This is sufficiently detailed (with over 69,000 polling stations throughout the country), raw (allowing aggregations, comparisons etc.), and digital/reusable (so much so that the newspaper Le Monde could develop a user-friendly application to let readers easily check results in their neighborhoods). Some would also insist that “open” data should be released in non-proprietary formats (better .csv than .xls, for example).
  • Legally, the data must come with a license that allows re-use by third parties (typically within the Creative Commons family). Ideally, no type of reuse should be ruled out (including, somewhat controversially, commercial / for-profit reuse).
  • Economically, the data should be available to all for free (or at least with minimal charges if data preparation requires extra work or expenses).
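To make the “raw and reusable” condition concrete, here is a minimal sketch of the kind of aggregation that station-level data allow. The column names, place names and vote counts are all invented for illustration; they are not the Interior Ministry's actual file layout or results.

```python
import csv
import io
from collections import defaultdict

# Hypothetical polling-station-level records (invented figures and column names).
RAW_CSV = """commune,station,candidate,votes
Ville-A,0001,Candidate X,120
Ville-A,0001,Candidate Y,95
Ville-A,0002,Candidate X,80
Ville-A,0002,Candidate Y,110
Ville-B,0001,Candidate X,60
Ville-B,0001,Candidate Y,40
"""


def aggregate_by_commune(csv_text):
    """Sum station-level votes into per-commune totals for each candidate."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["commune"]][row["candidate"]] += int(row["votes"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {commune: dict(by_candidate) for commune, by_candidate in totals.items()}


totals = aggregate_by_commune(RAW_CSV)
for commune, by_candidate in totals.items():
    print(commune, by_candidate)
```

Because the records are detailed and in a plain, non-proprietary format, anyone can regroup them at another level (neighborhood, city, region) with a few lines of code, which is not possible when only pre-aggregated totals are published.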

While a lot of thought has been devoted in the past few years to the “ideal” conditions for data opening and how this would positively affect public service, the data landscape has now changed significantly.

Science XXL: digital data and social science

Last week I attended (unfortunately only part of) an interesting workshop on the effects of today’s abundance and diversity of digital data on social science practices, aptly called “Science XXL”. A variety of topics were discussed and different research experiences were shared, but I’ll just summarize here a few lessons learned that I find interesting.

  • Digital data are archive data. Data retrieved automatically from the digital traces of individual actions, like those mined from the APIs of platforms such as Twitter, are unlike survey data in that they were not originally recorded for research purposes. The researcher must select relevant records on the basis of some understanding of the conditions under which these data were produced. Perhaps ironically, digital data share this characteristic with data from historical or literary archives.
  • Digital data are not necessarily “big”, in the sense that their volume is often small (at least in social science research so far!), even though they may share other characteristics of big data such as velocity (being generated on the fly as people use digital platforms) or variety (being loosely structured or unstructured).
  • Digital data can help fill gaps in survey data, for example when survey sampling is not statistically representative: detail and volume can provide extra information that supports general conclusions.
  • Non-clean data, outliers and aberrant observations may be very informative, revealing details that would escape attention if researchers focused only on the average or center of the distribution (the normal distribution cherished in classical statistical approaches). Special cases are no longer a prerogative of qualitative research.
  • Data analysis is a key ingredient of “computational social science”, a field that is growing in importance after an initial phase in which it was largely confined to agent-based simulation and complexity theory.

Big data, big money: how companies thrive on informational resources

Information oils the economy – as we have known since the path-breaking research of George Akerlof, Michael Spence and Joseph Stiglitz in the 1970s – and information can be extracted from data. Today, increased availability of “big” data creates the opportunity to access ever more information – for the good of the economy, then.

But in practice, how do companies extract value from this increasingly available information? In a nutshell, there are three ways in which they can do so: matching, targeted advertising, and market segmentation.

Matching is the key business idea of many recently created companies and start-ups, and consists in helping potential parties to a transaction find each other: driver and passenger (Uber), host and guest (Airbnb), buyer and seller (eBay), and so on. Matching is done by processing users’ data with suitable algorithms, and the more detailed the data, the more satisfactory the match. These firms’ business model is usually based on taking a fee for each successful transaction (each realized match).
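As a toy illustration of what “processing users’ data with suitable algorithms” can mean, the sketch below pairs each passenger with the nearest free driver. It is a deliberately simple greedy rule with made-up names and coordinates, and it is not a description of how Uber or any other platform actually works.

```python
from math import hypot


def match_nearest(passengers, drivers):
    """Greedily assign each passenger the closest driver who is still free.

    Both arguments map a (hypothetical) user name to an (x, y) position.
    """
    free = dict(drivers)  # copy, so the caller's dict is not modified
    matches = {}
    for name, (px, py) in passengers.items():
        if not free:
            break  # more passengers than drivers: the rest stay unmatched
        best = min(free, key=lambda d: hypot(free[d][0] - px, free[d][1] - py))
        matches[name] = best
        del free[best]
    return matches


# Invented positions, for illustration only.
passengers = {"p1": (0.0, 0.0), "p2": (5.0, 5.0)}
drivers = {"d1": (4.0, 4.0), "d2": (1.0, 0.0), "d3": (9.0, 9.0)}
matches = match_nearest(passengers, drivers)
print(matches)
```

The only data used here are positions; with richer data (ratings, preferences, past behavior) the matching rule can take more criteria into account, which is the sense in which more detailed data allow more satisfactory matches.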

Targeted advertising is the practice of selecting, for each user, only the ads that best match their tastes or practices. Advertising diapers to the general population will be largely ineffective as many people do not have young children; but targeting only those with young children is likely to produce better results. Here, the function of data is to help decide what to advertise to whom; useful data are people’s socio-demographic situation (age, marital status, children…), their current or past practices (if you bought diapers last week, you might do that again next week), and any declared tastes (for example in a post on Facebook or Twitter). How this produces a gain is obvious: if targeted adverts are more effective, sales will go up.

Special RFS issue on Big Data

Revue Française de Sociologie invites article proposals for a special issue on “Big Data, Societies and Social Sciences”, edited by Gilles Bastin (PACTE, Sciences Po Grenoble) and myself.

The focus is on two inextricably interwoven questions: how do big data transform society? How do big data affect social science practices?

Substantive as well as epistemological / methodological contributions are welcome. We are particularly interested in proposals that examine the social effects and/or the scientific implications of big data based on first-hand experience in the field.

The deadline for submission of extended abstracts is 28 February 2017; for full contributions, it is 15 September 2017. Revue Française de Sociologie accepts articles in French or English.

Further details and guidelines for submission are in the call for papers.

Data, health online communities and the collaborative economy: my tour of Québec

This November gave me the opportunity to give talks and participate in scientific events throughout Québec.

I started in Montréal, with a seminar at ComSanté, the health communication research centre of Université du Québec à Montréal (UQAM), where I presented my recently published book on websites on eating disorders. While most media attention focused on controversial “pro-anorexia” contents, presented as an undesirable effect of online free speech, I made the point that this part of the webosphere is rather to be seen as a symptom of the effects of current transformations of healthcare systems under austerity policies. Cuts in public health spending encourage patients to be active, informed and equipped, but the resulting social pressure creates paradoxical behaviors and risk-taking.

Also in Montréal, I was invited to a discussion with economic journalist Diane Bérard on the growth and crisis of the collaborative economy. About 50 people attended the event, co-organised by co-working space L’Esplanade, OuiShare Montréal and the journal Les Affaires. Diane summarized the essentials of the event in a blog post just the day after, and noted six main points:

  • The Uber case dominates discussions and divides the audience – though the collaborative economy is not (just) Uber.
  • The discussion gets easily polarized – a result of the tension between commercial and non-commercial goals of the collaborative economy.
  • We still know little of the business models of these platforms and the external factors that facilitate or hinder their success.
  • Sharing is in fact a niche market – now probably declining after the initial enthusiasm.
  • The key issue for the future is work – its transformations, and how it is re-organizing itself.
  • Collaborative principles advance even outside the world of digital platforms, and sometimes permeate more traditional sectors. The near future of collaboration is sharing cities.

Are we all data laborers?

Today I gave a talk at AUTONOMY, a major festival of urban mobility in Paris, where new technologies are at center stage, from driverless cars to electric scooters, bike-sharing solutions, and connected infrastructure for the smart city. I had been asked to talk about labor in digital platforms, such as those offering mobility services.

Digital platforms are often thought of in terms of automation, but it is clear that there is labor too: we all have in mind the example of the couriers and drivers of the “on-demand” economy. But there’s more: I’ll show how platforms involve the labor of everyone, including passengers and users of all types. By labor, I mean here human activity that produces data and information – the key source of value for platforms. It is often an implicit, invisible activity of which we may not even be aware – as we tend to focus more on consumption aspects when we talk routinely about “car pooling” or “car sharing”, rather than looking at the underlying productive effort. This is what scholars call “digital labor”.

Four eco-systems

Specialist Antonio Casilli distinguishes four forms of digital labor in platforms, and I am now going to briefly outline them.

Data and theory: substitutes or complements? Lessons from history of economics

Today, my chapter on “Formalization and mathematical modelling” is published in a new series of three reference books on History of Economic Analysis (edited by G. Faccarello and H. Kurz, Edward Elgar). The chapter draws heavily on key ideas I developed as part of my thesis on the origins of mathematical economics. But this was a long time ago and reading it again today, I see it in a different light. I notice in particular that economics developed its distinctive mathematical flavour, which makes it neatly stand out relative to the other social sciences, at times in which social research was data-poor – and it did so not despite data paucity, but precisely because of it. William S. Jevons, a 19th-century forefather of the discipline who was clearly aware of the relevance of maths, wrote in 1871:

“The data are almost wholly deficient for the complete solution of any one problem”

yet:

“we have mathematical theory without the data requisite for precise calculation”
