The socio-contextual basis for disinformation

Within the Horizon Europe project AI4TRUST, we published a first report presenting the state of the art on the socio-contextual basis for disinformation, based on a broad review of the extant literature. What follows is a synthesis.

What is disinformation?

Recent literature distinguishes three forms:

  • ‘misinformation’ (inaccurate information unwittingly produced or reproduced)
  • ‘disinformation’ (erroneous, fabricated, or misleading information that is intentionally shared and may cause individual or social harm)
  • ‘malinformation’ (accurate information deliberately misused with malicious or harmful intent).

Two consequences derive from this insight. First, the expression ‘fake news’ is unhelpful: problematic contents are not just news, and are not always false. Second, research efforts limited to identifying incorrect information alone, without capturing intent, may miss some of the key social processes surrounding the emergence and spread of problematic contents.

How does mis/dis/malinformation spread?

Recent literature often describes the process of diffusion of mis/dis/malinformation in terms of ‘cascades’, that is, the iterative propagation of content from one actor to others in a tree-like fashion, sometimes with consideration of temporality and geographical reach. There is evidence that network structures may facilitate or hinder propagation, regardless of the characteristics of individuals: relationships and interactions therefore constitute an essential object of study to understand how problematic contents spread. By contrast, the actual offline impact of online disinformation (for example, the extent to which online campaigns may have inflected electoral outcomes) is disputed. Likewise, evidence on the capacity of mis/dis/malinformation to spread across countries is mixed. A promising perspective to move forward relies on hybrid approaches mixing network and content analysis (‘socio-semantic networks’).
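To give a concrete sense of the cascade idea (this is a hypothetical sketch, not a method from the report), the snippet below simulates a simple independent-cascade diffusion on a toy directed network; the networkx graph, the share probability and the seed account are all illustrative assumptions.

    # Minimal, hypothetical sketch: simulating tree-like cascade diffusion
    # on a toy sharing network. All parameters here are assumptions.
    import random
    import networkx as nx

    def independent_cascade(graph, seeds, share_prob=0.15, rng=None):
        """Return the set of nodes reached by a cascade started from `seeds`.

        Each newly activated node gets one chance to pass the content to each
        of its out-neighbours, with probability `share_prob`.
        """
        rng = rng or random.Random(42)
        activated = set(seeds)
        frontier = list(seeds)
        while frontier:
            next_frontier = []
            for node in frontier:
                for neighbour in graph.successors(node):
                    if neighbour not in activated and rng.random() < share_prob:
                        activated.add(neighbour)
                        next_frontier.append(neighbour)
            frontier = next_frontier
        return activated

    # Toy directed network: an edge u -> v means content posted by u can reach v.
    g = nx.gnp_random_graph(200, 0.03, seed=1, directed=True)
    reached = independent_cascade(g, seeds=[0])
    print(f"Cascade reached {len(reached)} of {g.number_of_nodes()} nodes")

In empirical work, the toy graph would be replaced by an observed re-sharing network, and the timing of activations could be recorded to study temporality and geographical reach.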

What incentivizes mis/dis/malinformation?

Mis/dis/malinformation campaigns are not always driven solely by political tensions and may also be the product of economic interests. There may be incentives to produce or share problematic information, insofar as the business model of the internet confers value upon contents that attract attention, regardless of their veracity or quality. A growing shadow market of paid ‘likes’, ‘shares’ and ‘follows’ inflates the rankings and reputation scores of web pages and social media profiles, and may ultimately mislead search engines. Thus, online metrics derived from users’ ratings should be interpreted with caution. Research should also be mindful that high-profile disinformation campaigns are only the tip of the iceberg: low-stakes cases are far more frequent and more difficult to detect.

Who spreads mis/dis/malinformation?

Spreaders of mis/dis/malinformation may be bots or human users, the former being increasingly kept in check by social media companies. Not all humans are equally likely to play this role, though, and the literature highlights ‘super-spreaders’, who are particularly successful at sharing popular albeit implausible contents, as well as clusters of spreaders – both detectable in data with social network analysis techniques.
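As a hypothetical illustration of how such spreaders can be surfaced in data (the reshare log, account names and threshold below are invented), one simple proxy is to rank accounts by how often their posts are re-shared:

    # Illustrative sketch with invented data: accounts whose posts are re-shared
    # far more often than the median are candidate 'super-spreaders'.
    from collections import Counter

    # Hypothetical reshare log: (original_author, resharer) pairs.
    reshares = [
        ("alice", "bob"), ("alice", "carol"), ("alice", "dan"),
        ("eve", "bob"), ("alice", "eve"), ("frank", "dan"),
    ]

    reshare_counts = Counter(author for author, _ in reshares)

    counts = sorted(reshare_counts.values())
    median = counts[len(counts) // 2]
    super_spreaders = [a for a, c in reshare_counts.items() if c >= 3 * median]
    print("Candidate super-spreaders:", super_spreaders)

Richer approaches would rely on full network measures (out-degree, core position, community membership) rather than raw counts, and would combine them with content features.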

How is mis/dis/malinformation adopted?

Adoption of mis/dis/malinformation should not be taken for granted: it depends on cognitive and psychological factors at individual and group levels, as well as on network structures. Actors use ‘appropriateness judgments’ to give meaning to information and elaborate it interactively with their networks. Judgments depend on people’s identification with reference groups, recognition of authorities, and alignment with priority norms. Adoption can thus be hypothesised to increase when judgments are similar and signalled as such in communication networks. Future research could target such signals to help users contextualize and interpret the phenomena described.

Multiple strands of research in social network analysis can help develop a model of the emergence and development of appropriateness judgments. Homophily and social influence theories help conceptualise the role of inter-individual similarities, the dynamics of diffusion in networks shed light on temporal patterns, and analyses of heterogeneous networks illuminate our understanding of interactions. Overall, social network analysis combined with content analysis can help research identify structural or dynamic indicators of coordinated malicious behaviour, as sketched below.
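As a purely hypothetical illustration of this socio-semantic idea (the accounts, texts, similarity measures and thresholds are all assumptions, not methods from the report), one can flag pairs of accounts that are both structurally and semantically similar for manual review:

    # Hypothetical sketch: combine a structural signal (two accounts sharing the
    # same sources) with a content signal (near-identical post texts) to flag
    # account pairs worth reviewing for possible coordination.
    from difflib import SequenceMatcher
    from itertools import combinations

    posts = {  # account -> (sources shared, example post text); invented data
        "acct_a": ({"siteX", "siteY"}, "Shocking truth they hide from you!!"),
        "acct_b": ({"siteX", "siteY"}, "shocking truth they hide from you !"),
        "acct_c": ({"siteZ"}, "Lovely weather in Lisbon today."),
    }

    def jaccard(s1, s2):
        return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0

    flagged = []
    for a, b in combinations(posts, 2):
        src_a, txt_a = posts[a]
        src_b, txt_b = posts[b]
        structural = jaccard(src_a, src_b)  # overlap in shared sources
        semantic = SequenceMatcher(None, txt_a.lower(), txt_b.lower()).ratio()
        if structural > 0.5 and semantic > 0.8:  # both signals high
            flagged.append((a, b, round(structural, 2), round(semantic, 2)))

    print("Pairs to review for possible coordination:", flagged)

In real research, the toy thresholds would be replaced by statistical baselines and the content measure by proper text representations; the point is simply that neither signal alone suffices.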

Micro-work and the outsourcing industry in Madagascar

I had the privilege and pleasure to visit Madagascar in the last two weeks. I was invited by the Institut Français, where I took part in a very interesting panel on “How can Madagascar help us rethink artificial intelligence more ethically?”, with Antonio A. Casilli, Jeremy Ranjatoelina and Manovosoa Rakotovao. I also conducted exploratory fieldwork by visiting a sample of technology companies, as well as journalists and associations interested in the topic.

A former French colony, Madagascar is part of the global trend toward outsourcing and offshoring that has shaped the world economy in the past two decades. The country harnesses its cultural and linguistic heritage (about one quarter of the population still speaks French, often as a second language) to develop services for clients mostly based in France. In particular, it is a net exporter of computing services – still a small sector, but one with growing economic value.

Last year, a team of colleagues conducted extensive research with Madagascan companies that provide micro-work and data annotation services for French producers of artificial intelligence (and of other digital services). Some interesting results of their research are available here. This time, we are taking a broader look at the sector to include a wider variety of computing services, and also trying to trace higher-value-added activities (like computer programming, website design, and even AI development).

It is too early to present any results, but the big question so far is the sustainability of this model and the extent to which it can push Madagascar higher up in the global technology value chain. Annotation and other lower-level services create much-needed jobs in a sluggish economy with widespread poverty and a lot of informality; however, these jobs attract low recognition and comparatively low pay, and have failed so far to offer bridges toward more stable or rewarding career paths. More qualified computing jobs are better paid and protected, but turnover is high and (national and international) competition is tough.

At the policy level, more attention should be paid to the quality of these jobs and their longer-term stability, while client tech companies in France and other Global North countries should take more responsibility for working conditions throughout their international supply chains.

Brazil in the global AI supply chains: the role of micro-workers

AI is not just a Silicon Valley dream. It relies, among other things, on inputs from human workers who generate and annotate data for machine learning. They record their voice to augment speech datasets, transcribe receipts to provide examples to OCR software, tag objects in photographs to train computer vision algorithms, and so on. They also check algorithmic outputs, for example by noting whether the results of a search engine meet users’ queries. Occasionally, they take the place of failing automation, for example when content moderation software is not subtle enough to distinguish whether some image or video is appropriate. AI producers outsource these so-called “micro-tasks” via international digital labor platforms, which often recruit workers in Global-South countries, where labor costs are lower. Pay is by piecework, with no long-term commitment and without any social-security scheme or labor protection.

In a just-published report co-authored with Matheus Viana Braz and Antonio A. Casilli, as part of the research program DiPLab, we lifted the curtain on micro-workers in Brazil, a country with a huge, growing, and yet largely unexplored reservoir of AI workers.

We found among other things that:

  • Three out of five Brazilian data workers are women, while in most other previously-surveyed countries, women are a minority (one in three or less in ILO data).
  • 9 reais (1.73 euros) per hour is the average amount earned on platforms.
  • There are at least 54 micro-working platforms operating in Brazil.
  • One third of Brazilian micro-workers have no other source of income, and depend on micro-working platforms for subsistence.
  • Two out of five Brazilian data workers are (apart from this activity) unemployed, without professional activity, or in informality. In Brazil, platform micro-work arises out of widespread unemployment and informalization of work.
  • Three out of five data workers have completed undergraduate education, although they mostly do repetitive and unchallenging online data tasks, suggesting some form of skill mismatch.
  • The worst micro-tasks involve moderation of violent and pornographic contents on social media, as well as data-generation tasks that workers may find uncomfortable or weird, such as taking pictures of dog poop in domestic environments to produce training data for “vacuuming robots”.
  • Workers’ main grievances are linked to uncertainty, lack of transparency, job insecurity, fatigue and lack of social interaction on platforms.

To read the report in English, click here.

To read the report in Portuguese, click here.

Artificial Intelligence and Globalization: Data Labor and Linguistic Specificities (AIGLe)

We organized the one-day conference AIGLe on 27 October 2022 to present the outcomes of interdisciplinary research conducted by our DiPLab teams in French-speaking African countries (ANR HuSh Project) and Spanish-speaking countries in Latin America (CNRS-MSH TrIA Project). Both initiatives study the human labor necessary to generate and annotate the data needed to produce artificial intelligence, to check outputs, and to intervene in real time when algorithms fail. Researchers from economics, sociology, computer science, and linguistics shared exciting new results and discussed them with the audience.

AIGLe is part of the project HUSh (The HUman Supply cHain behind smart technologies, 2020-2024), funded by the ANR, and of the research project TRIA (The Work of Artificial Intelligence, 2020-2022), co-financed by the CNRS and the MSH Paris-Saclay. This event, under the aegis of the Institut Mines-Télécom, was organized by the DiPLab team with the support of the ANR, MSH Paris-Saclay and the Ministry of Economy and Finance.

PROGRAM
9:00 – 9:15 Welcome session

9:15 – 10:40 – Session 1 – Maxime Cornet & Clément Le Ludec (IP Paris, ANR HUSH Project): Unraveling the AI Production Process: How French Startups Externalise Data Work to Madagascar. Discussant: Mohammad Amir Anwar (U. of Edinburgh)

10:45 – 11:00 Coffee Break

11:00 – 12:30 – Session 2 – Chiara Belletti and Ulrich Laitenberger (IP Paris, ANR HUSH Project): Worker Engagement and AI Work on Online Labor Markets. Discussant: Simone Vannuccini (U. of Sussex)

12:30 – 13:30 Lunch Break

13:30 – 15:00 – Session 3 – Juana-Luisa Torre-Cierpe (IP Paris, TRIA Project) & Paola Tubaro (CNRS, TRIA Project): Uninvited Protagonists: Venezuelan Platform Workers in the Global Digital Economy. Discussant: Maria de los Milagros Miceli (Weizenbaum Institut)

15:15 – 15:30 Coffee Break

15:30 – 17:00 – Session 4 – Ioana Vasilescu (CNRS, LISN, TRIA Project), Yaru Wu (U. of Caen, TRIA Project) & Lori Lamel (LISN CNRS): Socioeconomic Profiles Embedded in Speech: Modeling Linguistic Variation in Micro-Workers’ Interviews. Discussant: Chloé Clavel (Télécom Paris, IP Paris)

Learners in the loop: hidden human skills in machine intelligence

I am glad to announce the publication of a new article in a special issue of the journal Sociologia del lavoro, dedicated to digital labour.

Today’s artificial intelligence, largely based on data-intensive machine learning algorithms, relies heavily on the digital labour of invisibilized and precarized humans-in-the-loop who perform multiple functions of data preparation, verification of results, and even impersonation when algorithms fail. This form of work contributes to the erosion of the salary institution in multiple ways. One is the commodification of labour, with very little shielding from market fluctuations via regulative institutions, exclusion from organizational resources through outsourcing, and the transfer of social reproduction costs to local communities to reduce work-related risks. Another is heteromation, the extraction of economic value from low-cost labour in computer-mediated networks, as a new logic of capital accumulation. Heteromation occurs as platforms’ technical infrastructures handle worker-management problems as if they were computational problems, thereby concealing the employment nature of the relationship and ultimately disguising human presence.

My just-published paper highlights a third channel through which the salary institution is threatened, namely the misrecognition of micro-workers’ skills, competencies and learning. Broadly speaking, salary can be seen as the framework within which the employment relationship is negotiated and resources are allocated, balancing the claims of workers and employers. The most basic claims revolve around skill, and in today’s ‘society of performance’, where value is increasingly extracted from intangible resources and competencies, unskilled workers are substitutable and therefore highly vulnerable. In human-in-the-loop data annotation, the tight breakdown of tasks, algorithmic control, and arm’s-length transactions obfuscate the competence of workers and discursively undermine their deservingness, shifting power away from them and voiding the equilibrating role of the salary institution.

Following Honneth, I define misrecognition as the attitudes and practices that result in people not receiving due acknowledgement for their value and contribution to society, in this case in terms of their education, skills, and skill development. Platform organization construes work as having little value, and creates disincentives for micro-workers to engage in more complex tasks, weakening their status and their capacity to be perceived as competent. Misrecognition is endemic in these settings and undermines workers’ potential for self-realization, negotiation and professional development.

My argument is based on original empirical data from a mixed-method survey of human-in-the-loop workers in two previously under-researched settings, namely Spain and Spanish-speaking Latin America.

An openly accessible version of the paper is available from the HAL repository.

Human listeners and virtual assistants: privacy and labor arbitrage in the production of smart technologies

I’m glad to announce the publication of new research, as a chapter in the fabulous Digital Work in the Planetary Market, a volume edited by Mark Graham and Fabian Ferrari and published in open access by MIT Press.

The chapter, co-authored with Antonio A. Casilli, starts by recalling how, in spring 2019, public outcry followed media revelations that major producers of voice assistants recruit human operators to transcribe and label users’ conversations. These high-profile cases uncovered the paradoxically labor-intensive nature of automation, the ultimate cause of the much-criticized privacy violations.

The development of smart solutions requires large amounts of human work. Sub-contracted on demand through digital platforms and usually paid by piecework, myriad online “micro-workers” annotate, tag, and sort the data used to prepare and calibrate algorithms. These humans are also needed to check outputs – such as automated transcriptions of users’ conversations with their virtual assistant – and to make corrections if needed, sometimes in real time. The data that they process include personal information, of which voice is an example.

We show that the platform system exposes both consumers and micro-workers to high risks. Because producers of smart devices conceal the role of humans behind automation, users underestimate the degree to which their privacy is challenged. As a result, they might unwittingly let their virtual assistant capture children’s voices, friends’ names and addresses, or details of their intimate life. Conversely, the micro-workers who hear or transcribe this information face the moral challenge of taking on the role of intruders, and bear the burden of maintaining confidentiality. Through outsourcing, platforms often leave them without sufficient safeguards and guidelines, and may even shift onto them the responsibility to protect the personal data they happen to handle.

Besides, micro-workers themselves release their personal data to platforms. Their tasks include, for example, recording utterances for virtual assistants, which need large sets of, say, ways to ask about the weather in order to “learn” to recognize such requests. Workers’ voices, identities and profiles are personal data that clients and platforms collect, store and re-use. With many actors in the loop, privacy safeguards are looser and transparency is harder to ensure. Lack of visibility, not to mention of collective organization, prevents workers from taking action.

Note: Description of one labor-intensive data supply chain. A producer of smart speakers located in the US outsources AI verification to a Chinese platform (1) that relies on a Japanese online service (2) and a Spanish sub-contractor (3) to recruit workers in France (4). Workers are supervised by an Italian company (5), and sign up to a microtask platform managed by the lead firm in the US (6). Source: Authors’ elaboration.

These issues become more severe when micro-tasks are subcontracted to countries where labor costs are low. Globalization enables international platforms to allocate tasks for European and North American clients to workers in Southeast Asia, Africa, and Latin America. This global labor arbitrage goes hand in hand with a global privacy one, as data are channeled to countries where privacy and data protection laws provide uneven levels of protection. Thus, we conclude that any solution must be dual – protecting workers to protect users.

The chapter is available in open access here.

Unboxing AI conference

I’m excited to be part of the organizing team for an upcoming conference entitled “Unboxing AI”, which aims to open – at least to an extent – the black box. What are the material conditions of AI production? Who are the multitudes of precarious workers who contribute to it in the shadows, by generating data and checking algorithmic outputs? What are the geographical areas and the social scope of the work that produces today’s intelligent technologies? These are some of the questions we aim to explore.

The first two days of the conference (November 5 and 6, 3 pm – 7 pm CET) will bring together highly regarded international specialists from a wide variety of disciplines (sociology, law, economics, but also the arts and humanities…). On the third day (November 7, also 3 pm – 7 pm CET), there will be a doctoral colloquium with a selection of very promising work by young researchers.

The conference was initially planned to take place in Milan in March 2020, but had to be postponed due to the Covid-19 pandemic. As the health situation is still critical, we have opted for an online-only version. On the plus side, this format is inexpensive – no need to travel to attend – and allows us to welcome a more geographically diverse range of participants. Indeed, the afternoon-only schedule is meant to enable colleagues from North and South America to attend.

Participation is free of charge but prior registration is required. You will find the programme as well as registration forms here (please note that there is a separate form for each of the three dates of the conference).

The conference is organized as part of the initiatives of our ‘International Network on Digital Labor’ and is co-sponsored by ISRF (Independent Social Research Foundation), the Nexa Center for Internet and Society, and Fondazione Feltrinelli.

Crowdworking Symposium 2020

With Antonio A. Casilli, I will be presenting a paper tomorrow at the Crowdworking Symposium organized by the University of Paderborn, Germany. Unfortunately, we will participate only online because of the health situation.

Our mini-paper (3 pages), entitled ‘Portraits of micro-workers: The real people behind AI in France’, is available here.