Artificial Intelligence and Globalization: Data Labor and Linguistic Specificities (AIGLe)

We organized the one-day conference AIGLe on 27 October 2022 to present the outcomes of interdisciplinary research conducted by our DiPLab teams in French-speaking African countries (ANR HUSH Project) and Spanish-speaking countries in Latin America (CNRS-MSH TRIA Project). Both initiatives study the human labor necessary to generate and annotate the data needed to produce artificial intelligence, to check outputs, and to intervene in real time when algorithms fail. Researchers from economics, sociology, computer science, and linguistics shared exciting new results and discussed them with the audience.

AIGLe is part of the project HUSh (The HUman Supply cHain behind smart technologies, 2020-2024), funded by ANR, and the research project TRIA (The Work of Artificial Intelligence, 2020-2022), co-financed by the CNRS and the MSH Paris-Saclay. This event, under the aegis of the Institut Mines-Télécom, was organized by the DiPLab team with the support of ANR, MSH Paris-Saclay and the Ministry of Economy and Finance.

PROGRAM
9:00 – 9:15 Welcome session

9:15 – 10:40 – Session 1 – Maxime Cornet & Clément Le Ludec (IP Paris, ANR HUSH Project): Unraveling the AI Production Process: How French Startups Externalise Data Work to Madagascar. Discussant: Mohammad Amir Anwar (U. of Edinburgh)

10:45 – 11:00 Coffee Break

11:00 – 12:30 – Session 2 – Chiara Belletti and Ulrich Laitenberger (IP Paris, ANR HUSH Project): Worker Engagement and AI Work on Online Labor Markets. Discussant: Simone Vannuccini (U. of Sussex)

12:30 – 13:30 Lunch Break

13:30 – 15:00 Session 3 – Juana-Luisa Torre-Cierpe (IP Paris, TRIA Project) & Paola Tubaro (CNRS, TRIA Project): Uninvited Protagonists: Venezuelan Platform Workers in the Global Digital Economy. Discussant: Maria de los Milagros Miceli (Weizenbaum Institut)

15:15 – 15:30 Coffee Break

15:30 – 17:00 Session 4 – Ioana Vasilescu (CNRS, LISN, TRIA Project), Yaru Wu (U. of Caen, TRIA Project) & Lori Lamel (LISN CNRS): Socioeconomic profiles embedded in speech: modeling linguistic variation in micro-workers interviews. Discussant: Chloé Clavel (Télécom Paris, IP Paris)

Learners in the loop: hidden human skills in machine intelligence

I am glad to announce the publication of a new article in a special issue of the journal Sociologia del lavoro, dedicated to digital labour.

Today’s artificial intelligence, largely based on data-intensive machine learning algorithms, relies heavily on the digital labour of invisibilized and precarized humans-in-the-loop who perform multiple functions of data preparation, verification of results, and even impersonation when algorithms fail. This form of work contributes to the erosion of the salary institution in multiple ways. One is commodification of labour, with very little shielding from market fluctuations via regulative institutions, exclusion from organizational resources through outsourcing, and transfer of social reproduction costs to local communities to reduce work-related risks. Another is heteromation, the extraction of economic value from low-cost labour in computer-mediated networks, as a new logic of capital accumulation. Heteromation occurs as platforms’ technical infrastructures handle worker management problems as if they were computational problems, thereby concealing the employment nature of the relationship, and ultimately disguising human presence.

My just-published paper highlights a third channel through which the salary institution is threatened, namely misrecognition of micro-workers’ skills, competencies and learning. Broadly speaking, salary can be seen as the framework within which the employment relationship is negotiated and resources are allocated, balancing the claims of workers and employers. In general, the most basic claims revolve around skill, and in today’s ‘society of performance’ where value is increasingly extracted from intangible resources and competencies, unskilled workers are substitutable and therefore highly vulnerable. In human-in-the-loop data annotation, tight breakdown of tasks, algorithmic control, and arm’s-length transactions obfuscate the competence of workers and discursively undermine their deservingness, shifting power away from them and voiding the equilibrating role of the salary institution.

Following Honneth, I define misrecognition as the attitudes and practices that result in people not receiving due acknowledgement for their value and contribution to society, in this case in terms of their education, skills, and skill development. Platform organization construes work as having little value, and creates disincentives for micro-workers to engage in more complex tasks, weakening their status and their capacity to be perceived as competent. Misrecognition is endemic in these settings and undermines workers’ potential for self-realization, negotiation and professional development.

My argument is based on original empirical data from a mixed-method survey of human-in-the-loop workers in two previously under-researched settings, namely Spain and Spanish-speaking Latin America.

An openly accessible version of the paper is available from the HAL repository.

Big data and the hypothesis of the end of privacy

In the late 2000s, voices suggesting that our societies might be nearing the ‘end of privacy’ became increasingly deafening. Our cultural, political and regulatory environment was on the verge of major transformation – so went the narrative. Businesses rejoiced since, notoriously, less privacy and more information oil the economy.

In a video interview with the Italian media outlet Idee Sottosopra, I review the courses of action taken by various stakeholders, in particular Internet companies, and examine their conflicts and controversies. I show how the very concept of privacy, inherited from a long legal and judicial tradition, should be revised and redefined to appropriately describe today’s online interactions.

Overall, there is no deterministic and inevitable tendency to exclude privacy from our societies, but rather a tension between social forces for and against privacy, which has accompanied the advent of the digital economy and especially social media. The positions of stakeholders, especially users, are often ambiguous, and social media companies have attempted to leverage this ambiguity to their own advantage.

Yet civil society reactions have grown stronger and stronger, and after the initial David-vs-Goliath attempts of individuals and small associations, more and more authoritative institutions have taken the defence of privacy seriously. We are no longer left to costly and barely visible individual choices, and especially after the entry into force of the GDPR in Europe, we now have an unprecedented opportunity to act at a more systemic level.

Big Data. L’ipotesi della fine della Privacy | Società Digitale | Idee Sottosopra

New ANR Project HUSH: Human supply chain behind smart technologies

Together with sociologist Antonio A. Casilli and economist Ulrich Laitenberger, I have recently received ANR (French National Research Agency) funding for a new study of human inputs, mostly platform-mediated work, in the production of artificial intelligence solutions. In our project, called HUSH (Human supply chain behind smart technologies), we aim to shed light on the whole ecosystem linking platforms, workers and their clients demanding data-related and algorithmic services.

For this project, we are now looking for a

PhD researcher in digital economics

The position provides the opportunity to focus strongly on research, in a very active environment. The team has collaborations with different online platforms and has collected data sets from the web, which can be used by the applicant for their thesis. The focus of the current position is to work on the economic aspects of platform-mediated work, using quantitative analyses. Two other PhD students (in sociology) have already been recruited for this project and work on related topics.

The starting date is January 2020 (a later starting date is also possible). As per national regulations, the stipend will be about 1,600 euros per month, with the possibility of obtaining a complement for extra activities such as teaching. Social security and professional training are provided. Additional funding is available to present your research at international conferences and workshops. The position will be based at the new campus of Telecom Paris in Palaiseau, in the direct neighborhood of École Polytechnique and ENSAE.

Your profile

Applicants should have successfully completed a Master’s degree in economics, socio/economic data science or related disciplines, or expect completion at the beginning of the year 2020. They should have a strong interest in digital platforms, from the perspective of industrial organization or labor economics, and have an empirical focus (econometrics, data science). They should aim at developing programming skills and have an interest in the evaluation of internet data. Fluency in English is required; knowledge of French is advantageous, but not essential.

Telecom Paris and IP Paris

Telecom Paris is part of the newly founded Institut Polytechnique de Paris (IP Paris), together with École Polytechnique, ENSTA, ENSAE and Telecom SudParis. The department of social sciences and economics (SES) at Telecom Paris studies the impact of digitization on economic activity and society. For more information, please see https://www.telecom-paris.fr/fr/lecole/departements-enseignement-recherche/sciences-economiques-sociales/structure/economie-gestion

How to apply

Please submit a cover letter, a curriculum vitae, a transcript of records (listing all subjects taken and their grades), and contact details of one to two referees by November 15, 2019 to Ulrich Laitenberger (laitenberger@enst.fr).

Update: applications open until December 15, 2019.

The big data moment of the social sciences: what access to web and social media data?

Round table, Sciences Po Paris, 6 December 2018, 6:00 pm


For social science research to take full advantage of large digital databases, one obstacle remains to be lifted: access to these data is limited, unequally distributed, and surrounded by legal and ethical uncertainty. We propose to discuss this on the occasion of the publication of the special issue of the Revue Française de Sociologie on “Big data, sociétés et sciences sociales” (n. 59/3). This round table brings together researchers and other public and private stakeholders.

With:

  • Garance Lefèvre, Senior Policy Associate, Uber
  • Roxane Silberman, Scientific Advisor, Centre d’Accès Sécurisé aux Données (CASD)
  • Sophie Vulliet-Tavernier, Director of Relations with the Public and Research, Commission Nationale de l’Informatique et des Libertés (CNIL)
  • The authors of the special issue.

Moderators: Gilles Bastin (Univ. Grenoble Alpes) and Paola Tubaro (CNRS), editors of the special issue.

Free admission, subject to availability: to register, click here.

Access: Sciences Po, Goguel room. Enter at 27 rue Saint-Guillaume, 75007 Paris (cross the garden and take the lift to the top floor). The round table is organized by the Revue Française de Sociologie in collaboration with the Presses de Sciences Po. It will be followed by a reception.

 

More than complex: large and rich network structures

I am co-organizing this Satellite of the NETSCI2018 Conference in Paris, on 12 June 2018. We are now accepting submissions of proposals for presentations.

Information on the Satellite

In traditional research paradigms, sociology handles small but rich networks, where the richness of network attributes derives from the specific design of the data collection process. In the sociological approach, differences among nodes and edges are key to describing network properties and the ensuing dynamical social processes. The complex systems tradition, instead, deals with large but poor networks. Assuming statistical equivalence of graph entities, a mean-field treatment serves to describe the aggregate properties of the network. Today’s network datasets contain an unprecedented quantity of relational information at, and between, all possible levels: individuals, social groups, political structures, economic actors, etc. We finally deal with large and rich network structures that expose the implicit limitations of the two above-mentioned approaches: the traditional methods from social science cannot be upscaled because of their algorithmic complexity, and those from complex systems lose track of the complex nature of the actors, their relationships and their processes. This workshop aims to develop an interdisciplinary reflection on how methods from social science could be upscaled to large network structures and on how methods from complex systems could be downscaled to deal with small heterogeneous structures.
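As a rough illustration of what is lost when nodes are treated as statistically equivalent, the sketch below builds two toy networks with the same number of nodes and edges, hence the same mean degree, but very different structures. The two graphs (a path and a star) and the function name `degrees` are illustrative choices of ours, not taken from any of the talks.

```python
from statistics import mean, pvariance

def degrees(n, edges):
    """Return the degree of each of the n nodes of an undirected graph."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

n = 10
# Path: nodes chained one after the other (homogeneous structure).
path_edges = [(i, i + 1) for i in range(n - 1)]
# Star: node 0 is linked to every other node (one hub, many leaves).
star_edges = [(0, i) for i in range(1, n)]

path_deg = degrees(n, path_edges)
star_deg = degrees(n, star_edges)

# An aggregate summary such as the mean degree cannot tell the two apart...
print("mean degree:", mean(path_deg), mean(star_deg))
# ...whereas node-level differences (variance, maximum) clearly can.
print("degree variance:", pvariance(path_deg), pvariance(star_deg))
print("max degree:", max(path_deg), max(star_deg))
```

Both graphs have nine edges, so any description based on the average alone sees them as identical, while the hub of the star only shows up once individual nodes are distinguished.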

We are proud that five prominent international scholars are our invited speakers: Camille Roth, Sciences Po Paris; Matthieu Latapy, LIP6, UPMC Paris; Alessandro Lomi, ETH Zurich; Fariba Karimi, GESIS Cologne; Noshir Contractor, Northwestern University.

Contributions

We invite abstracts of published or unpublished work for contributed talks to take place at the satellite symposium. We expect a broad range of topics to be covered, across theory, methodology, and application to empirical data, relating to an interdisciplinary reflection on how methods from social science could be upscaled to large network structures and on how methods from complex systems could be downscaled to deal with small heterogeneous structures.

Submissions can be made through our website.

Submissions are required to be at most 650 words long and should include the following information: title of the talk, author(s), affiliation(s), email address(es), name of the presenter, abstract. Papers or submissions longer than 1 page will not be accepted.

Important dates

The abstract submission deadline is March 25, 2018. Notification of acceptance will be sent no later than April 23, 2018.

All participants and accepted speakers will have to register through the NETSCI2018 website.

Open Data: What’s new in 2017?

I am now in Montréal, where last Friday I participated in a panel on Open Data at the “Science & You” international conference. It was interesting for me to reflect on how the picture has changed since my previous panel on the same topic – in Kiev in 2012. Back then, we were busy trying to convince public administrations that data opening was good for transparency and could help improve services to communities. Since then, many attempts have been made in numerous countries – local authorities often pioneering the process, followed only later by central governments (one example cited in my panel was Québec City). What is made open is typically information from public registers (first names of newborns, records of road accidents) and, increasingly, from technological devices and sensors (bus traffic information).

There are some conditions to be met for a dataset to be called “open”:

  • Technically, it needs to be “raw”, detailed, digital and reusable. The French Interior Ministry released results of the first round of the recent presidential elections within a few days, at polling station level. This is sufficiently detailed (with over 69,000 polling stations throughout the country), raw (allowing aggregations, comparisons etc.), and digital/reusable (so much so that the newspaper Le Monde could develop a user-friendly application to let readers easily check results in their neighborhoods). Some would also insist that “open” data should be released in non-proprietary formats (better .csv than .xls, for example).
  • Legally, the data must come with a license that allows re-use by third parties (typically within the Creative Commons family). Ideally, no type of reuse should be ruled out (including, somewhat controversially, commercial / for-profit reuse).
  • Economically, the data should be available to all for free (or at least with minimal charges if data preparation requires extra work or expenses).
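To make the “raw, detailed, digital and reusable” condition more tangible, here is a minimal sketch of the kind of aggregation that station-level data in a plain .csv format allows, using only Python’s standard library. The column names (`commune`, `station`, `candidate`, `votes`) and all figures are invented for illustration and do not reflect the actual layout or content of the Interior Ministry’s files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the spirit of polling-station-level results.
# Column names and figures are made up for illustration only.
SAMPLE_CSV = """commune,station,candidate,votes
Palaiseau,0001,A,120
Palaiseau,0001,B,95
Palaiseau,0002,A,80
Palaiseau,0002,B,110
Orsay,0001,A,60
Orsay,0001,B,75
"""

def aggregate_by_commune(csv_text):
    """Sum votes per (commune, candidate) from station-level rows."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[(row["commune"], row["candidate"])] += int(row["votes"])
    return dict(totals)

def shares_by_commune(totals):
    """Turn absolute vote totals into each candidate's share per commune."""
    commune_totals = defaultdict(int)
    for (commune, _candidate), votes in totals.items():
        commune_totals[commune] += votes
    return {key: votes / commune_totals[key[0]] for key, votes in totals.items()}

totals = aggregate_by_commune(SAMPLE_CSV)
shares = shares_by_commune(totals)
for (commune, candidate), votes in sorted(totals.items()):
    share = shares[(commune, candidate)]
    print(f"{commune} {candidate}: {votes} votes ({share:.1%})")
```

The same few lines would work on any level of aggregation present in the file, which is precisely what pre-aggregated or proprietary-format releases make harder.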

While a lot of thought has been devoted in the past few years to the “ideal” conditions for data opening and how this would positively affect public service, the data landscape has now significantly changed.


Science XXL: digital data and social science

Last week I attended (unfortunately only part of) an interesting workshop on the effects of today’s abundance and diversity of digital data on social science practices, aptly called “Science XXL”. A variety of topics were discussed and different research experiences were shared, but I’ll just summarize here a few lessons learned that I find interesting.

  • Digital data are archive data. Data retrieved automatically from the digital traces of individual actions, such as those mined from the APIs of platforms such as Twitter, are unlike survey data in that they were not originally recorded for research purposes. The researcher must select relevant records on the basis of some understanding of the conditions under which these data were produced. Perhaps ironically, digital data share this characteristic with data from historical or literary archives.
  • Digital data are not necessarily “big”, in the sense that their volume is often small (at least in social science research so far!), even though they may share other characteristics of big data such as velocity (being generated on the fly as people use digital platforms) or variety (being little structured or unstructured).
  • Digital data can help fill gaps in survey data, for example when survey sampling is not statistically representative: detail and volume can provide extra information that supports general conclusions.
  • Non-clean data, outliers and aberrant observations may be very informative, revealing details that would escape attention if researchers focused only on the average or center of the distribution (the normal law cherished in classical statistical approaches). Special cases are no longer a prerogative of qualitative research.
  • Data analysis is a key ingredient of “computational social science”, a field that is growing in importance after an initial phase in which it was largely confined to agent-based simulation and complexity theory.
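The point about outliers can be made concrete with a small sketch. The numbers below are invented (say, tasks completed per user on a hypothetical platform), and the rule used to flag outliers, a modified z-score based on the median absolute deviation with a threshold of 3.5, is one common convention that we assume here, not something presented at the workshop.

```python
import statistics

# Invented data: tasks completed per user on a hypothetical platform.
tasks = [12, 15, 9, 14, 11, 13, 10, 480]

def mad_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold.

    The modified z-score relies on the median and the median absolute
    deviation (MAD), which are themselves robust to extreme values.
    """
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread around the median: nothing can be flagged
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

mean = statistics.mean(tasks)
median = statistics.median(tasks)
outliers = mad_outliers(tasks)

print(f"mean = {mean}, median = {median}, flagged = {outliers}")
```

A single very active user pulls the mean far away from the typical value, while the median barely moves; looking at the flagged case itself, rather than discarding it, is where the extra information lies.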