Another article has just been published! It is the result of a DiPLab group collaboration (with A.A. Casilli, M. Fernández Massi, J. Longo, J. Torres Cierpe and M. Viana Braz) and uses data from multiple countries. It is entitled ‘The digital labour of artificial intelligence in Latin America: a comparison of Argentina, Brazil, and Venezuela’ and is part of a special issue of Globalizations on ‘The Political Economy of AI in Latin America’. This article lifts the veil on the precarious and low-paid data workers who, from Latin America, engage in AI preparation, verification, and impersonation, often for foreign technology producers. Focusing on three countries (Argentina, Brazil, and Venezuela), we use original mixed-method data to compare and contrast these cases, revealing common patterns and exposing the specificities that distinguish the region.
The analysis unveils the central place of Latin America in the provision of data work. To bring costs down, AI production thrives on countries’ economic hardship and inequalities. In Venezuela and to a lesser extent Argentina, acute economic crisis fuels competition and favours the emergence of ‘elite’ (young and STEM-educated) data workers, while in more stable but very unequal Brazil, this activity is left to relatively underprivileged segments of the workforce. AI data work also redefines these inequalities insofar as, in all three countries, it blends with the historically prevalent informal economy, with workers frequently shifting between the two. There are spillovers into other sectors, with variations depending on country and context, which tie informality to inequality.
Our study has policy implications at global and local levels. Globally, it calls for more attention to the conditions of AI production, especially workers’ rights and pay. Locally, it advocates solutions for the recognition of skills and experience of data workers, in ways that may support their further professional development and trajectories, possibly also facilitating some initial forms of worker organization.
The version of record is here, while an open-access preprint is available here.
I am thrilled to announce that an important article has just seen the light. Entitled ‘Where does AI come from? A global case study across Europe, Africa, and Latin America’, it is part of a special issue of New Political Economy on ‘Power relations in the digital economy’. It is the result of joint work that I have done with members of the DiPLab team (A.A. Casilli, M. Cornet, C. Le Ludec and J. Torres Cierpe) on the organisational and geographical forces underpinning the supply chains of artificial intelligence (AI). Where and how do AI producers recruit workers to perform data annotation and other essential, albeit lower-level, supporting tasks to feed machine-learning algorithms? The literature reports a variety of organisational forms, but the reasons for these differences, and the ways data work dovetails with local economies, have long remained under-researched. This article fills that gap, clarifying the structure and organisation of these supply chains and highlighting their impacts on labour conditions and remunerations.
Framing AI as an instance of the outsourcing and offshoring trends already observed in other globalised industries, we conduct a global case study of the digitally enabled organisation of data work in France, Madagascar, and Venezuela. We show that the AI supply chains procure data work via a mix of arm’s length contracts through marketplace-like platforms, and of embedded firm-like structures that offer greater stability but less flexibility, with multiple intermediate arrangements that give different roles to platforms. Each solution suits specific types and purposes of data work in AI preparation, verification, and impersonation. While all forms reproduce well-known patterns of exclusion that harm externalised workers especially in the Global South, disadvantage manifests unevenly depending on the structure of the supply chains, with repercussions on remunerations, job security, and working conditions.
Marketplace- and firm-like platforms in the supply chains for data work in Europe, Africa, and Latin America. Dark grey countries: main case studies; light grey countries: comparison cases. Organisational modes range from almost totally marketplace oriented (darker rectangle, Venezuela) to almost entirely firm oriented (lighter rectangle, Madagascar). AI preparation (darker circle) is ubiquitous, but AI verification (darker triangle) and AI impersonation (darker star) tend to happen in ‘deep labour’ and firm-like organisations where embeddedness is higher.
We conclude that responses based only on worker reclassification, as attempted in some countries especially in the Global North, are insufficient. Rather, we advocate a policy mix at both national and supra-national levels, also including appropriate regulation of technology and innovation, and promotion of suitable strategies for economic development.
The version of record is here, while an open-access preprint is available here.
My great regret is that I always have very little time to write posts, and the emptiness of this blog does not reflect the numerous, great and stimulating scientific events and opportunities that I have enjoyed throughout 2024. As a last-minute remedy (with a promise to do better next year…hopefully), I try to summarize the landmarks here, month by month.
In January, I launched the Voices from Online Labour (VOLI) project, which I coordinate with a grant of about €570,000 from the French National Agency for Research. This four-year initiative brings together expertise from sociology, linguistics, and AI technology across multiple institutions, including four French research centres, a speech technology company, and three international partners.
In February, with the DiPLab team, I spent two exciting days at the European Parliament in Brussels, engaging in profound discussions with and about platform workers as part of the 4th edition of the Transnational Forum on Alternatives to Uberization. I chaired a panel with data workers and content moderators from Europe and beyond, aiming to raise awareness about the difficult working conditions of those who fuel artificial intelligence and ensure safe participation in social media.
In March, three publications saw the light. One is a solo-authored chapter, in French, on ‘Algorithmes, inégalités, et les humains dans la boucle’ (Algorithms, inequalities, and the humans in the loop) in a collective book entitled ‘Ce qui échappe à l’intelligence artificielle’ (What AI cannot do). The other two are journal articles that may seem a little less close to my ‘usual’ topics, but they are important because they constitute experiments in research-informed teaching. One is a study of the 15-minute city concept applied to Paris, carried out in collaboration with a colleague, S. Berkemer of Ecole Polytechnique, and a team of brilliant ENSAE students. The other is an analysis of the penetration of AI into a specific field of research, neuroscience, showing that for all its alleged potential, AI has created a confined subfield but has not entirely disrupted the discipline. The study belongs to a larger project on AI in science and was part of the PhD research of S. Fontaine (who has now got his degree!), co-authored with his co-supervisors F. Gargiulo and M. Dubois.
In April, I co-published the final report from the study realized for the European Parliament, ‘Who Trains the Data for European Artificial Intelligence?‘. Despite massive offshoring of data tasks to lower-income countries in the Global South, we find that there are still data workers in Europe. They often live in countries where standard labour markets are weaker, like Portugal, Italy and Spain; in more dynamic countries like Germany and France, they are often immigrants. They do data work because they lack sufficiently good alternative opportunities, although most of them are young and highly educated.
I then attended two very relevant events. On 30 April-1 May, I was at a Workshop on Driving Adoption of Worker-Centric Data Enrichment Guidelines and Principles, organised by Partnership on AI (PAI) and Fairwork in New York City to bring together representatives of AI companies, data vendors and platforms, and researchers. The goal was to discuss options to improve working conditions on the side of employers and intermediaries. On 28 May, I was in Cairo, Egypt, to attend the very first conference of the Middle East and Africa chapter of INDL (International Network on Digital Labour), the research network I co-founded. It was a fantastic opportunity to start opening the network to countries that were less present before, and whose voices we would like to hear more of.
August is a quieter month (but I greatly enjoyed a session at the Paralympics in Paris!), so I’ll jump to September. Lots of activities: a trip to Cambridge, UK, and a workshop on disinformation at the Minderoo Centre for Technology and Democracy; a workshop on Invisible Labour at Copenhagen Business School in Denmark; and a one-day conference on gender in the platform economy in Paris. Another publication came out: a journal article, in Spanish, on Argentinean platform data workers.
At the end of October, and until mid-November, I travelled to Chile for the seventh conference of the International Network on Digital Labour (INDL-7), which I co-organised. It was an immensely rewarding experience, and I took the opportunity to strengthen my linkages and collaborations with colleagues there. It was a very intense, and super-exciting, time: after INDL-7 (28-30 October), I spent a week in Buenos Aires, Argentina, where I co-presented work in progress at the XV Jornadas de Estudios Sociales de la Economía, UNSAM. I then returned to Chile, where I gave a keynote at the XI COES International Conference in Viña del Mar on 8 November, and another at the ENEFA conference in Valdivia on 14 November. I also gave a talk as part of the ChiSocNet seminar series in Santiago on 11 November.
Within the Horizon-Europe project AI4TRUST, we published a first report presenting the state of the art in the socio-contextual basis for disinformation, relying on a broad review of extant literature, of which the following is a synthesis.
What is disinformation?
Recent literature distinguishes three forms:
‘misinformation’ (inaccurate information unwittingly produced or reproduced);
‘disinformation’ (erroneous, fabricated, or misleading information that is intentionally shared and may cause individual or social harm);
‘malinformation’ (accurate information deliberately misused with malicious or harmful intent).
Two consequences derive from this insight. First, the expression ‘fake news’ is unhelpful: problematic contents are not just news, and are not always false. Second, research efforts limited to identifying incorrect information alone, without capturing intent, may miss some of the key social processes surrounding the emergence and spread of problematic contents.
How does mis/dis/malinformation spread?
Recent literature often describes the diffusion of mis/dis/malinformation in terms of ‘cascades’, that is, the iterative propagation of content from one actor to others in a tree-like fashion, sometimes with consideration of temporality and geographical reach. There is evidence that network structures may facilitate or hinder propagation, regardless of the characteristics of individuals: relationships and interactions therefore constitute an essential object of study to understand how problematic contents spread. By contrast, the actual offline impact of online disinformation (for example, the extent to which online campaigns may have inflected electoral outcomes) is disputed. Likewise, evidence on the capacity of mis/dis/malinformation to spread across countries is mixed. A promising way forward relies on hybrid approaches mixing network and content analysis (‘socio-semantic networks’).
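The cascade idea can be made concrete with a toy simulation. Below is a minimal sketch (purely illustrative: the network, node names, and sharing probability are all invented for the example, not taken from the report) of an independent-cascade process, in which each newly reached actor gets one chance to pass the content on to each contact:

```python
import random

# Toy directed "follower" network: who can pass content to whom (illustrative)
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": [],
}

def cascade(graph, seed, p=0.5, rng=None):
    """Independent-cascade model: each newly informed node gets one
    chance to pass the content to each neighbour with probability p."""
    rng = rng or random.Random(0)
    informed = {seed}
    frontier = [seed]
    while frontier:
        nxt = []
        for node in frontier:
            for neigh in graph[node]:
                if neigh not in informed and rng.random() < p:
                    informed.add(neigh)
                    nxt.append(neigh)
        frontier = nxt  # the spread proceeds iteratively, tree-like
    return informed

informed = cascade(graph, "A", p=0.9)
print(informed)
```

Varying the sharing probability or the network structure shows how relationships and interactions, not just individual traits, shape how far a content travels.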
What incentivizes mis/dis/malinformation?
Mis/dis/malinformation campaigns are not always driven solely by political tensions and may also be the product of economic interest. There may be incentives to produce or share problematic information, insofar as the business model of the internet confers value upon contents that attract attention, regardless of their veracity or quality. A growing shadow market of paid ‘like’, ‘share’ and ‘follow’ actions inflates the rankings and reputation scores of web pages and social media profiles, and may ultimately mislead search engines. Thus, online metrics derived from users’ ratings should be interpreted with caution. Research should also be mindful that high-profile disinformation campaigns are only the tip of the iceberg, low-stakes cases being far more frequent and more difficult to detect.
Who spreads mis/dis/malinformation?
Spreaders of mis/dis/malinformation may be bots or human users, the former being increasingly controlled by social media companies. Not all humans are equally likely to play this role, though, and the literature highlights ‘super-spreaders’, particularly successful at sharing popular albeit implausible contents, and clusters of spreaders – both detectable in data with social network analysis techniques.
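As a toy illustration of how super-spreaders can surface in data (the share events below are invented for the example), one can start by counting share events per user in a diffusion dataset; in practice, social network analysis would also consider network position, not just volume:

```python
from collections import Counter

# Hypothetical share events: (spreader, item) pairs; data are illustrative
shares = [
    ("u1", "claim_x"), ("u1", "claim_y"), ("u1", "claim_z"),
    ("u2", "claim_x"),
    ("u3", "claim_y"), ("u3", "claim_z"),
]

# A crude super-spreader signal: who accounts for most share events
share_counts = Counter(user for user, _ in shares)
top_spreaders = share_counts.most_common(2)
print(top_spreaders)
```

A fuller analysis would combine such counts with structural measures (e.g. centrality in the sharing network) to distinguish prolific individuals from well-positioned clusters of spreaders.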
How is mis/dis/malinformation adopted?
Adoption of mis/dis/malinformation should not be taken for granted and depends on cognitive and psychological factors at individual and group levels, as well as on network structures. Actors use ‘appropriateness judgments’ to give meaning to information and elaborate it interactively with their networks. Judgments depend on people’s identification with reference groups, recognition of authorities, and alignment with priority norms. Adoption can thus be hypothesised to increase when judgments are similar and signalled as such in communication networks. Future research could target such signals to help users in their contextualization and interpretation of the phenomena described.
Multiple examples of research in social network analysis can help develop a model of the emergence and development of appropriateness judgements. Homophily and social influence theories help conceptualise the role of inter-individual similarities, the dynamics of diffusion in networks sheds light on temporal patterns, and analyses of heterogeneous networks illuminate our understanding of interactions. Overall, social network analysis combined with content analysis can help research identify indicators of coordinated malicious behaviour, either structural or dynamic.
I had the privilege and pleasure to visit Madagascar over the last two weeks. I was invited by the Institut Français, where I participated in a very interesting panel on “How can Madagascar help us rethink artificial intelligence more ethically?”, with Antonio A. Casilli, Jeremy Ranjatoelina and Manovosoa Rakotovao. I also conducted exploratory fieldwork by visiting a sample of technology companies, as well as journalists and associations interested in the topic.
A former French colony, Madagascar participates in the global trend toward outsourcing / offshoring which has shaped the world economy in the past two decades. The country harnesses its cultural and linguistic heritage (about one quarter of the population still speak French, often as a second language) to develop services for clients mostly based in France. In particular, it is a net exporter of computing services – still a small-sized sector, but with growing economic value.
Last year, a team of colleagues conducted extensive research with Madagascan companies that provide micro-work and data annotation services for French producers of artificial intelligence (and of other digital services). Some interesting results of their research are available here. This time, we are taking a broader look at the sector to include a wider variety of computing services, also trying to trace higher-value-added activities (like computer programming, website design, and even AI development).
It is too early to present any results, but the big question so far is the sustainability of this model and the extent to which it can push Madagascar higher up in the global technology value chain. Annotation and other lower-level services create much-needed jobs in a sluggish economy with widespread poverty and a lot of informality; however, these jobs attract low recognition and comparatively low pay, and have failed so far to offer bridges toward more stable or rewarding career paths. More qualified computing jobs are better paid and protected, but turnover is high and (national and international) competition is tough.
At policy level, more attention should be brought to the quality of these jobs and their longer-term stability, while client tech companies in France and other Global North countries should take more responsibility over working conditions throughout their international supply chains.
Most of my current research aims to unpack artificial intelligence (AI) from the viewpoint of its commercial production, looking in particular at the human resources needed to prepare the data it needs – whence my studies on the data work and annotation market. However, for once, I am focusing on AI as a set of scientific theories and tools, regardless of their market positioning; indeed, I have joined a team of science-of-science specialists to study the disciplinary origins and subsequent spread of AI over time.
In a newly published, open-access article, we unveil the disciplinary composition of AI and the links between its various sub-fields. We question a common distinction between ‘native’ and ‘applicative’ disciplines, whereby only the former (typically confined to statistics, mathematics, and computer science) produce foundational algorithms and theorems for AI. In fact, we find that the origins of the field are rather multi-disciplinary, benefiting from insights from cognitive science, psychology, and philosophy, among others. These intersecting contributions were most evident in the historical practices commonly known as ‘symbolic systems’. Later, different scientific fields became, in turn, the central originating domains and applicators of AI knowledge: for example operations research, which was for a long time one of the core actors of AI applications related to expert systems.
While the notion of statistics, mathematics and computer science as native disciplines has become more relevant in recent times, the spread of AI throughout the scientific ecosystem is uneven. In particular, only a small number of AI tools, such as dimensionality reduction techniques, are widely adopted (for example, variants of these techniques have been in use in sociology for decades). And while the transfer of AI largely depends on multi-disciplinary interactions, very few of them exist: we observe very limited collaboration between researchers in disciplines that create AI and researchers in disciplines that only (or mainly) apply it. A small core of multi-disciplinary champions who interact with both sides, together with a few multi-disciplinary journals, sustains the whole system.
Inter- and multi-disciplinary interactions are essential for AI to thrive and to adequately support scientific research in all fields, but disciplinary boundaries are notoriously hard to break. Strategies to better reward inter-disciplinary training, publications, and careers are thus essential. Of course, the potential for AI to significantly advance knowledge is still (largely) to be proven, and there have been disappointing experiences, such as the comparatively limited effectiveness of these tools in research on Covid-19. In any case, the status quo is not ideal, and important steps forward are now needed.
We establish these results by analyzing a large corpus of scientific papers published between 1970 and 2017, extracted from Microsoft Academic Graph through the AI keywords used by the authors, and explored with different relational structures among the scientometric data (keyword co-occurrence network, authors’ collaboration network).
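As a hedged sketch of the first of these relational structures, a keyword co-occurrence network can be built by linking keywords that appear on the same paper and weighting each link by the number of shared papers (the paper keyword lists below are invented for the example, not drawn from our corpus):

```python
from itertools import combinations
from collections import Counter

# Illustrative author-keyword lists for three hypothetical papers
papers = [
    ["neural network", "computer vision", "deep learning"],
    ["neural network", "deep learning"],
    ["operations research", "expert system"],
]

# Weighted co-occurrence edges: keyword pairs appearing in the same paper
edges = Counter()
for kws in papers:
    # sort so each undirected pair gets a single canonical key
    for a, b in combinations(sorted(set(kws)), 2):
        edges[(a, b)] += 1

print(edges[("deep learning", "neural network")])  # 2: co-occurs in two papers
```

The same logic, applied to author lists instead of keywords, yields the collaboration network; at scale, the resulting weighted graphs can then be clustered to map sub-fields.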
Full citation: Floriana Gargiulo, Sylvain Fontaine, Michel Dubois, Paola Tubaro. A meso-scale cartography of the AI ecosystem. Quantitative Science Studies, 2023; doi: https://doi.org/10.1162/qss_a_00267
AI is not just a Silicon Valley dream. It relies, among other things, on inputs from human workers who generate and annotate data for machine learning. They record their voice to augment speech datasets, transcribe receipts to provide examples to OCR software, tag objects in photographs to train computer vision algorithms, and so on. They also check algorithmic outputs, for example by noting whether the results of a search engine meet users’ queries. Occasionally, they take the place of failing automation, for example when content moderation software is not subtle enough to distinguish whether some image or video is appropriate. AI producers outsource these so-called “micro-tasks” via international digital labor platforms, which often recruit workers in Global-South countries, where labor costs are lower. Pay is by piecework, without any long-term commitment and without any social-security scheme or labor protection.
In a just-published report co-authored with Matheus Viana Braz and Antonio A. Casilli, as part of the research program DiPlab, we lifted the curtain on micro-workers in Brazil, a country with a huge, growing, and yet largely unexplored reservoir of AI workers.
We found among other things that:
Three out of five Brazilian data workers are women, while in most other previously surveyed countries, women are a minority (one in three or less in ILO data).
9 reais (1.73 euros) per hour is the average amount earned on platforms.
There are at least 54 micro-working platforms operating in Brazil.
One third of Brazilian micro-workers have no other source of income, and depend on micro-working platforms for subsistence.
Two out of five Brazilian data workers are (apart from this activity) unemployed, without professional activity, or in informality. In Brazil, platform microwork arises out of widespread unemployment and informalization of work.
Three out of five data workers have completed undergraduate education, although they mostly do repetitive and unchallenging online data tasks, suggesting some form of skill mismatch.
The worst microtasks involve moderation of violent and pornographic contents on social media, as well as data-generation tasks that workers may find uncomfortable or weird, such as taking pictures of dog poop in domestic environments to produce training data for “vacuuming robots”.
Workers’ main grievances are linked to uncertainty, lack of transparency, job insecurity, fatigue and lack of social interaction on platforms.
As part of a large, interdisciplinary European research project, we are seeking a motivated, open-minded student to join CNRS (specifically, the Centre for Research in Economics and Statistics, CREST) in Palaiseau, France, for three years.
The thesis aims to model the production and dissemination of ‘fake news’ in situations of uncertainty and socio-economic inequality. A rich sociological literature suggests that actors contextualise the messages they receive and emit as questions or answers, interpret them according to their recipients and senders, and assess their social acceptability within their own networks of relationships, taking into account their relative position. Building on this research, the goal is to identify the social processes underpinning misinformation-generating digital communications: collective identity, inequalities of status or authority, hierarchy of shared norms. This will make it possible to interpret the online social interactions through which actors collectively judge the (appropriate or inappropriate) quality of a message or information and then decide whether to relay or share it – and with whom. In particular, the thesis work will contribute to: 1/ drawing up a state of the art, mainly within sociology but open to the neighbouring disciplines which have also addressed these questions; 2/ illustrating and testing these theories through an empirical analysis of a digital database, mainly with quantitative methods, which may be enriched through a small complementary qualitative fieldwork; 3/ contributing to the preparation of guidelines that help information professionals and policy-makers detect the sources and modalities of emergence and propagation of misinformation.
The thesis will be done within the framework of the interdisciplinary project “AI-based-technologies for trustworthy solutions against disinformation” (AI4TRUST), funded by the European Union over the period 2023-2026, involving 17 partners (research institutions and media professionals) in 10 countries, and coordinated by Fondazione Bruno Kessler (Italy).
The AI4TRUST project aims to build a hybrid system, with advanced artificial intelligence solutions capable of cooperating with humans in the fight against disinformation. The new algorithms that will be developed in this framework, constantly checked and improved by human fact-checkers, will monitor multiple online social platforms in nearly real time, analysing text, audio, and visual contents in several languages. The resulting quantitative indicators, including infodemic risk, will be inspected under the lens of social and computational social sciences, to build the trustworthy elements required by media professionals.
The successful candidate will have the opportunity to join a group of highly motivated scientists and practitioners from across the continent; to participate in collaborations with other teams working on the project in an interdisciplinary framework; to attend regular meetings with the project’s Principal Investigator, the scientists and experts involved, and public decision-makers; and to present and publish research results in international conferences and journals.
The ideal candidate has a good background in quantitative sociology or in a STEM discipline (e.g., mathematics, statistics, computer science) with a strong interest in societal issues and challenges. A very good knowledge of English, an interdisciplinary approach and the ability to work in teams are essential.
Candidates should apply on the CNRS portal, where they will also find more details.
We organized the one-day conference AIGLe on 27 October 2022 to present the outcomes of interdisciplinary research conducted by our DiPLab teams in French-speaking African countries (ANR HuSh Project) and Spanish-speaking countries in Latin America (CNRS-MSH TrIA Project). Both initiatives study the human labor necessary to generate and annotate the data needed to produce artificial intelligence, to check outputs, and to intervene in real time when algorithms fail. Researchers from economics, sociology, computer science, and linguistics shared exciting new results and discussed them with the audience.
AIGLe is part of the project HUSh (The HUman Supply cHain behind smart technologies, 2020-2024), funded by ANR, and the research project TRIA (The Work of Artificial Intelligence, 2020-2022), co-financed by the CNRS and the MSH Paris Saclay. This event, under the aegis of the Institut Mines-Télécom, was organized by the DiPLab team with support of ANR, MSH Paris-Saclay and the Ministry of Economy and Finance.
PROGRAM
9:00 – 9:15 Welcome session
9:15 – 10:40 – Session 1 – Maxime Cornet & Clément Le Ludec (IP Paris, ANR HUSH Project): Unraveling the AI Production Process: How French Startups Externalise Data Work to Madagascar. Discussant: Mohammad Amir Anwar (U. of Edinburgh)
10:45 – 11:00 Coffee Break
11:00 – 12:30 – Session 2 – Chiara Belletti and Ulrich Laitenberger (IP Paris, ANR HUSH Project): Worker Engagement and AI Work on Online Labor Markets. Discussant: Simone Vannuccini (U. of Sussex)
12:30 – 13:30 Lunch Break
13:30 – 15:00 Session 3 – Juana-Luisa Torre-Cierpe (IP Paris, TRIA Project) & Paola Tubaro (CNRS, TRIA Project): Uninvited Protagonists: Venezuelan Platform Workers in the Global Digital Economy. Discussant: Maria de los Milagros Miceli (Weizenbaum Institut)
15:15 – 15:30 Coffee Break
15:30 – 17:00 Session 4 – Ioana Vasilescu (CNRS, LISN, TRIA Project), Yaru Wu (U. of Caen, TRIA Project) & Lori Lamel (LISN CNRS): Socioeconomic profiles embedded in speech: modeling linguistic variation in micro-workers’ interviews. Discussant: Chloé Clavel (Télécom Paris, IP Paris)
Today’s artificial intelligence, largely based on data-intensive machine learning algorithms, relies heavily on the digital labour of invisibilized and precarized humans-in-the-loop who perform multiple functions of data preparation, verification of results, and even impersonation when algorithms fail. This form of work contributes to the erosion of the salary institution in multiple ways. One is commodification of labour, with very little shielding from market fluctuations via regulative institutions, exclusion from organizational resources through outsourcing, and transfer of social reproduction costs to local communities to reduce work-related risks. Another is heteromation, the extraction of economic value from low-cost labour in computer-mediated networks, as a new logic of capital accumulation. Heteromation occurs as platforms’ technical infrastructures handle worker management problems as if they were computational problems, thereby concealing the employment nature of the relationship, and ultimately disguising human presence.

My just-published paper highlights a third channel through which the salary institution is threatened, namely misrecognition of micro-workers’ skills, competencies and learning. Broadly speaking, salary can be seen as the framework within which the employment relationship is negotiated and resources are allocated, balancing the claims of workers and employers. In general, the most basic claims revolve around skill, and in today’s ‘society of performance’ where value is increasingly extracted from intangible resources and competencies, unskilled workers are substitutable and therefore highly vulnerable. In human-in-the-loop data annotation, tight breakdown of tasks, algorithmic control, and arm’s-length transactions obfuscate the competence of workers and discursively undermine their deservingness, shifting power away from them and voiding the equilibrating role of the salary institution.
Following Honneth, I define misrecognition as the attitudes and practices that result in people not receiving due acknowledgement for their value and contribution to society, in this case in terms of their education, skills, and skill development. Platform organization construes work as having little value, and creates disincentives for micro-workers to engage in more complex tasks, weakening their status and their capacity to be perceived as competent. Misrecognition is endemic in these settings and undermines workers’ potential for self-realization, negotiation and professional development.
My argument is based on original empirical data from a mixed-method survey of human-in-the-loop workers in two previously under-researched settings, namely Spain and Spanish-speaking Latin America.