The socio-contextual basis for disinformation

Within the Horizon-Europe project AI4TRUST, we published a first report presenting the state of the art in the socio-contextual basis for disinformation, relying on a broad review of the extant literature, of which the following is a synthesis.

What is disinformation?

Recent literature distinguishes three forms:

  • ‘misinformation’ (inaccurate information unwittingly produced or reproduced)
  • ‘disinformation’ (erroneous, fabricated, or misleading information that is intentionally shared and may cause individual or social harm)
  • ‘malinformation’ (accurate information deliberately misused with malicious or harmful intent).

Two consequences derive from this insight. First, the expression ‘fake news’ is unhelpful: problematic contents are not just news, and are not always false. Second, research efforts limited to identifying incorrect information alone, without capturing intent, may miss some of the key social processes surrounding the emergence and spread of problematic contents.

How does mis/dis/malinformation spread?

Recent literature often describes the diffusion of mis/dis/malinformation in terms of ‘cascades’, that is, the iterative propagation of content from one actor to others in a tree-like fashion, sometimes taking temporality and geographical reach into account. There is evidence that network structures may facilitate or hinder propagation regardless of the characteristics of individuals: relationships and interactions therefore constitute an essential object of study for understanding how problematic contents spread. By contrast, the actual offline impact of online disinformation (for example, the extent to which online campaigns may have inflected electoral outcomes) is disputed. Likewise, evidence on the capacity of mis/dis/malinformation to spread across countries is mixed. A promising way forward relies on hybrid approaches mixing network and content analysis (‘socio-semantic networks’).
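The tree-like cascade process described above can be made concrete with a toy independent-cascade simulation. This is a minimal sketch, not a model from the literature reviewed: the network, function names and transmission probability are all hypothetical.

```python
import random

def simulate_cascade(adjacency, seed_nodes, p=0.3, rng=None):
    """Simulate a simple independent-cascade diffusion.

    Each newly 'activated' node gets one chance to pass the content
    to each of its neighbours with probability p, producing the
    tree-like propagation typical of sharing cascades.
    """
    rng = rng or random.Random(42)
    activated = set(seed_nodes)
    frontier = list(seed_nodes)
    tree_edges = []  # (sharer, receiver) pairs: the cascade tree
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbour in adjacency.get(node, []):
                if neighbour not in activated and rng.random() < p:
                    activated.add(neighbour)
                    tree_edges.append((node, neighbour))
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return activated, tree_edges

# Toy follower network: an edge u -> v means v sees u's posts.
network = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": []}
reached, edges = simulate_cascade(network, ["a"], p=1.0)
```

With p = 1.0 every exposed account re-shares, so the cascade reaches the whole connected component; lower values of p yield the partial, branching cascades observed in empirical data.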

What incentivizes mis/dis/malinformation?

Mis/dis/malinformation campaigns are not always driven solely by political tensions: they may also be the product of economic interest. There may be incentives to produce or share problematic information, insofar as the business model of the internet confers value upon contents that attract attention, regardless of their veracity or quality. A growing shadow market of paid ‘likes’, ‘shares’ and ‘follows’ inflates the rankings and reputation scores of web pages and social media profiles, and may ultimately mislead search engines. Online metrics derived from users’ ratings should therefore be interpreted with caution. Research should also be mindful that high-profile disinformation campaigns are only the tip of the iceberg: low-stakes cases are far more frequent and more difficult to detect.

Who spreads mis/dis/malinformation?

Spreaders of mis/dis/malinformation may be bots or human users, the former being increasingly kept in check by social media companies. Not all humans are equally likely to play this role, though: the literature highlights ‘super-spreaders’, particularly successful at sharing popular albeit implausible contents, as well as clusters of spreaders – both detectable in data with social network analysis techniques.
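As a simple illustration of how super-spreaders can surface in share data, the sketch below ranks accounts by how often their posts are re-shared – a crude proxy for out-degree in the diffusion network. The function name and data are made up, and real detection pipelines are of course far richer than this.

```python
from collections import Counter

def top_spreaders(share_events, k=3):
    """Rank accounts by how many times their posts were re-shared.

    share_events: iterable of (original_poster, resharer) pairs.
    Accounts whose content is re-shared far more often than the
    rest are candidate 'super-spreaders'.
    """
    counts = Counter(poster for poster, _ in share_events)
    return counts.most_common(k)

# Toy share log: ("u1", "u2") means u2 re-shared a post by u1.
events = [("u1", "u2"), ("u1", "u3"), ("u1", "u4"), ("u2", "u5"), ("u3", "u1")]
leaders = top_spreaders(events, k=1)
```

Here `u1` is re-shared three times and stands out from the rest; in real data, clusters of spreaders would additionally be detected from the structure of who re-shares whom.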

How is mis/dis/malinformation adopted?

Adoption of mis/dis/malinformation should not be taken for granted: it depends on cognitive and psychological factors at individual and group levels, as well as on network structures. Actors use ‘appropriateness judgments’ to give meaning to information and elaborate it interactively with their networks. Judgments depend on people’s identification with reference groups, recognition of authorities, and alignment with priority norms. Adoption can thus be hypothesised to increase when judgments are similar and signalled as such in communication networks. Future research could target such signals to help users contextualize and interpret the phenomena described.

Multiple strands of research in social network analysis can help develop a model of the emergence and development of appropriateness judgments. Homophily and social influence theories help conceptualise the role of inter-individual similarities, the dynamics of diffusion in networks shed light on temporal patterns, and analyses of heterogeneous networks illuminate our understanding of interactions. Overall, social network analysis combined with content analysis can help research identify structural or dynamic indicators of coordinated malicious behaviour.

Research ethics in the age of digital platforms

I am thrilled to announce the (open access) publication of ‘Research ethics in the age of digital platforms’ in Science and Engineering Ethics, co-authored with José Luis Molina, Antonio A. Casilli & Antonio Santos Ortega.

We examine the implications of the use of digital micro-working platforms for scientific research. Although these platforms offer ways to make a living or to earn extra income, micro-workers lack fundamental labour rights and ‘decent’ working conditions, especially in the Global South. We argue that scientific research currently fails to treat micro-workers in the same way as in-person human participants, producing de facto a double morality: one applied to people with rights acknowledged by states and international bodies (e.g. Helsinki Declaration), the other to ‘guest workers of digital autocracies’ who have almost no rights at all.

How much does a face cost?

Three to five dollars: that’s the answer. As simple as that. I am talking about the behind-the-curtain market for personal data that sustains machine learning technologies, specifically for the development of face recognition algorithms. To train their models, tech companies routinely buy selfies as well as pictures or videos of ID documents from little-paid micro-workers, mostly from lower-income countries such as Venezuela and the Philippines.

Josephine Lulamae of Algorithm Watch interviewed me for a comprehensive report on the matter. She shows how, in this globalized market, the rights of workers are hardly respected – both in terms of labour rights and of data protection provisions.

I saw many such cases in my research of the last two years, as I interviewed people in Venezuela who do micro-tasks on international digital platforms for a living. Their country is affected by a terrible economic and political crisis, with skyrocketing inflation, scarcity of even basic goods, and high emigration. Under these conditions, international platforms – that pay little, but in hard currency – have seen a massive inflow of Venezuelans since about 2017-18.

Some of the people I interviewed simply could not afford to refuse a task paying five dollars – at a time when Venezuela’s monthly minimum wage was plummeting to as little as three dollars. They do tasks that workers in richer countries such as Germany and the USA refuse to do, according to Lulamae’s report. Still, even the Venezuelans did not always feel comfortable doing tasks that involved providing personal data such as photos of themselves. One man told me that before, in better conditions, he would not have done such a task. Another interviewee told me of discussions in an online forum about someone who had agreed to upload some selfies, later found his face in an advertisement on some website, and had to fight hard to get it removed. I had no means to fact-check whether this story was true, but the very fact that it circulated among workers is a clear sign that they worry about these matters.

On these globally operating platforms, personal data protection does not work very well. This does not mean that clients openly violate the law: for example, workers told me they had to sign consent forms, as prescribed by the European General Data Protection Regulation (GDPR). However, people who live outside of Europe are less familiar with this legislation (and sometimes with data protection principles more generally), and some of my interviewees did not fully understand the consent forms. More importantly, they have few means to contact clients, who typically avoid revealing their full identity on micro-working platforms – and they can therefore hardly exercise their rights under the GDPR (the rights to access, rectification, erasure, etc.).

The rights granted by the GDPR are comprehensive, but they do not include property rights. The European legislator did not create a framework in which personal data can be sold and bought, opting instead to guarantee inalienable rights to each and every citizen. However, this market exists and is flourishing, to the extent that it is serving the development of state-of-the-art technologies. Its existence is problematic, like the ‘repugnant’ markets for, say, human organs or babies for adoption, where moral arguments effectively counter economic interest. It is a market that thrives on global inequalities, and it is a reminder of the high price to pay for today’s technical progress.

See the full report here.

A series of talks in Chile on artificial intelligence, work and social networks

I am thrilled and delighted to begin a series of talks in Chile, mainly in Santiago and Talca, with Antonio A. Casilli this January. Many thanks to the Embassy of France in Chile, the Instituto Francés de Chile, and Fundación Teatro a Mil for this wonderful opportunity. Thanks also to Juana Torres Cierpe and Francisca Ortiz Ruiz for their help in reaching out to colleagues, friends and students in Chile.

We will start with a talk entitled “Digital platforms, online work and automation after the health crisis”, to take place on Monday 16 January at 12:00 at the CUT headquarters (1 oriente # 809, Talca). In this talk we will present our research on the heavily precarized micro-work performed on digital platforms. Many thanks to professor Claudia Jordana Contreras and the School of Sociology of the Universidad Católica del Maule for organizing this event.

On Tuesday 17 January 2023 at 11:00, I will speak on “Artificial intelligence, labour transformations and inequalities: women’s work on digital ‘micro-task’ platforms” at the Institute of Sociology of the Universidad Católica, with the Quantitative and Computational Social Science Research Group. Thanks to Mauricio Bucca, who organized this event. We will be at the Pontificia Universidad Católica de Chile, Campus San Joaquín.

On Tuesday 17 in the afternoon (at 17:00), I will speak on “Ethics of artificial intelligence and other challenges for social network research” as part of the Summer School of the Centro de Investigación en Complejidad Social, Universidad del Desarrollo. Thanks to Jorge Fábrega Lacoa and his colleagues for the organization.

Also on Tuesday 17, at 10:00, Antonio Casilli will give a talk at the Congreso Futuro event: “Global work and artificial intelligence. The ‘human ingredients’ of automation” (Teatro Oriente, Pedro de Valdivia 099, Providencia).

On Friday 20 January 2023, at 10:00, Antonio and I will speak together on “The work behind artificial intelligence and automation in Latin America” at an international workshop organized by the Universidad de Chile – with Pablo Pérez (thanks for the organization!) and Francisca Gutiérrez, room 129, FACSO, Av. Ignacio Carrera Pinto 1045, Ñuñoa.

Then comes an event organized by the Institut Français, the “Night of Ideas” (“La noche de las ideas”):

Friday 20 January 2023, 20:00, Centro Cultural La Moneda, Noche de las Ideas, Santiago – Paola Tubaro, “Automation: the end of the human?” (with Denis Parra and Javier Ibacache, Plaza de la Ciudadanía 26, Santiago).

Saturday 21 January 2023, 16:00, Centro Cultural La Moneda, Noche de las Ideas, Santiago – Antonio Casilli, “What does artificial intelligence hide?” (with José Ulloa, Constanza Michelson and Paula Escobar, Plaza de la Ciudadanía 26, Santiago).

On Wednesday 26 January 2023, at 18:30 in Santiago, Antonio Casilli’s book “Esperando a los robots. Investigación sobre el trabajo del clic” (LOM, 2021) will be presented, with Paulo Slachevsky (Librería del Ulises Lastarria, José Victorino Lastarria 70, local 2, Paseo Barrio Lastarria).

All events are free of charge. For the Noche de las Ideas and the Congreso Futuro, prior online registration is required.

Learners in the loop: hidden human skills in machine intelligence

I am glad to announce the publication of a new article in a special issue of the journal Sociologia del lavoro, dedicated to digital labour.

Today’s artificial intelligence, largely based on data-intensive machine learning algorithms, relies heavily on the digital labour of invisibilized and precarized humans-in-the-loop who perform multiple functions of data preparation, verification of results, and even impersonation when algorithms fail.

This form of work contributes to the erosion of the salary institution in multiple ways. One is the commodification of labour, with very little shielding from market fluctuations via regulative institutions, exclusion from organizational resources through outsourcing, and transfer of social reproduction costs to local communities to reduce work-related risks. Another is heteromation, the extraction of economic value from low-cost labour in computer-mediated networks, as a new logic of capital accumulation. Heteromation occurs as platforms’ technical infrastructures handle worker management problems as if they were computational problems, thereby concealing the employment nature of the relationship and ultimately disguising human presence.

My just-published paper highlights a third channel through which the salary institution is threatened, namely the misrecognition of micro-workers’ skills, competencies and learning. Broadly speaking, salary can be seen as the framework within which the employment relationship is negotiated and resources are allocated, balancing the claims of workers and employers. The most basic claims revolve around skill, and in today’s ‘society of performance’, where value is increasingly extracted from intangible resources and competencies, unskilled workers are substitutable and therefore highly vulnerable. In human-in-the-loop data annotation, the tight breakdown of tasks, algorithmic control, and arm’s-length transactions obfuscate the competence of workers and discursively undermine their deservingness, shifting power away from them and voiding the equilibrating role of the salary institution.

Following Honneth, I define misrecognition as the attitudes and practices that result in people not receiving due acknowledgement for their value and contribution to society, in this case in terms of their education, skills, and skill development. Platform organization construes work as having little value, and creates disincentives for micro-workers to engage in more complex tasks, weakening their status and their capacity to be perceived as competent. Misrecognition is endemic in these settings and undermines workers’ potential for self-realization, negotiation and professional development.

My argument is based on original empirical data from a mixed-method survey of human-in-the-loop workers in two previously under-researched settings, namely Spain and Spanish-speaking Latin America.

An openly accessible version of the paper is available from the HAL repository.

Human listeners and virtual assistants: privacy and labor arbitrage in the production of smart technologies

I’m glad to announce the publication of new research, as a chapter in the fabulous Digital Work in the Planetary Market, a volume edited by Mark Graham and Fabian Ferrari and published in open access by MIT Press.

The chapter, co-authored with Antonio A. Casilli, starts by recalling how, in spring 2019, public outcry followed media revelations that major producers of voice assistants recruit human operators to transcribe and label users’ conversations. These high-profile cases uncovered the paradoxically labor-intensive nature of automation, the ultimate cause of the highly criticized privacy violations.

The development of smart solutions requires large amounts of human work. Sub-contracted on demand through digital platforms and usually paid by piecework, myriad online “micro-workers” annotate, tag, and sort the data used to prepare and calibrate algorithms. These humans are also needed to check outputs – such as automated transcriptions of users’ conversations with their virtual assistant – and to make corrections if needed, sometimes in real time. The data that they process include personal information, of which voice is an example.

We show that the platform system exposes both consumers and micro-workers to high risks. Because producers of smart devices conceal the role of humans behind automation, users underestimate the degree to which their privacy is challenged. As a result, they might unwittingly let their virtual assistant capture children’s voices, friends’ names and addresses, or details of their intimate life. Conversely, the micro-workers who hear or transcribe this information face the moral challenge of taking the role of intruders, and bear the burden of maintaining confidentiality. Through outsourcing, platforms often leave them without sufficient safeguards and guidelines, and may even shift onto them the responsibility to protect the personal data they happen to handle.

Besides, micro-workers themselves release their personal data to platforms. The tasks they do include, for example, recording utterances for the needs of virtual assistants that need large sets of, say, ways to ask about the weather to “learn” to recognize such requests. Workers’ voices, identities and profiles are personal data that clients and platforms collect, store and re-use. With many actors in the loop, privacy safeguards are looser and transparency is harder to ensure. Lack of visibility, not to mention of collective organization, prevents workers from taking action.

Note: Description of one labor-intensive data supply chain. A producer of smart speakers located in the US outsources AI verification to a Chinese platform (1) that relies on a Japanese online service (2) and a Spanish sub-contractor (3) to recruit workers in France (4). Workers are supervised by an Italian company (5), and sign up to a microtask platform managed by the lead firm in the US (6). Source: Authors’ elaboration.

These issues become more severe when micro-tasks are subcontracted to countries where labor costs are low. Globalization enables international platforms to allocate tasks for European and North American clients to workers in Southeast Asia, Africa, and Latin America. This global labor arbitrage goes hand in hand with a global privacy one, as data are channeled to countries where privacy and data protection laws provide uneven levels of protection. Thus, we conclude that any solution must be dual – protecting workers to protect users.

The chapter is available in open access here.

Hidden inequalities: the gendered labour of women on micro-tasking platforms

Around the world, myriad workers perform data tasks on online labour platforms to fuel the digital economy. Mostly short, repetitive and little paid, these so-called ‘micro-tasks’ include for example labelling objects in images, classifying tweets, recording utterances, and transcribing audio files – notably to satisfy the data appetite of today’s fast-growing artificial intelligence industry. While casualization of labour and low pay have attracted sharp criticisms against these platforms, they appear gender-blind and accessible even to people with basic skills. Women with care or household duties may particularly benefit from the time flexibility and the possibility to work from home that platforms offer. So, are these new labour arrangements gender equalizers after all?

In a new paper co-authored with Marion Coville, Clément Le Ludec and Antonio A. Casilli, we demonstrate that this new form of online labour fails to fill gender gaps, and may even exacerbate them. We proceed in three steps. First, we show that legacy inequalities in the professional and domestic spheres turn platform-mediated micro-tasking into a ‘third shift’ that adds to already heavy schedules. Both working fathers and working mothers experience it, but the structure of the other two shifts affects their experience. Looking at their time use, it turns out that men dedicate long and uninterrupted slots of time to each activity: their main job, their share of household duties, leisure, and micro-work. They tend to do all their micro-tasks in a row, usually at night after work or in the morning before starting. By contrast, women have more fragmented schedules and micro-work during short breaks, here and there, eating into their leisure time. This is one reason why they earn less on platforms: with only short slots of time available, they cannot search for better-paid tasks and must content themselves with whatever is available at that moment.

Time use of typical female (left) and male (right), micro-workers, both of whom have a main job in addition to platform micro-tasks, and dependent children.

Second, we submit that the human capital of male and female data workers differs, with women less likely to have received training in science and technology fields.

Highest educational qualification (left) and discipline of specialization (right) of men and women micro-workers. Data collected in France, 2018 (n = 908).

Third, their social capital differs: using a position generator instrument to capture workers’ access to the informational and support resources that may come from contacts with people in different occupations, we show that women have fewer ties to digital-related professionals who could provide them with knowledge and advice to successfully navigate the platform world.
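To make the idea concrete, here is a simplified sketch of how a gender gap in occupational access could be computed from position-generator responses. This is a hypothetical stand-in measure with made-up data, not the assortativity index used in the paper.

```python
def access_gap(responses):
    """Illustrative per-occupation gender gap in network access.

    responses: list of (gender, set_of_occupations_known) tuples, as
    produced by a position-generator questionnaire ("do you know
    someone who works as a ...?").
    Returns {occupation: p_women - p_men}: the difference between the
    share of women and the share of men reporting a contact in each
    occupation. Negative values indicate occupations to which women
    have less network access than men.
    """
    by_gender = {"F": [], "M": []}
    for gender, known in responses:
        by_gender[gender].append(known)
    occupations = set().union(*by_gender["F"], *by_gender["M"])
    gaps = {}
    for occ in occupations:
        p_f = sum(occ in k for k in by_gender["F"]) / len(by_gender["F"])
        p_m = sum(occ in k for k in by_gender["M"]) / len(by_gender["M"])
        gaps[occ] = p_f - p_m
    return gaps

# Toy survey: two female and two male respondents.
sample = [("F", {"teacher"}), ("F", {"teacher", "developer"}),
          ("M", {"developer"}), ("M", {"developer", "teacher"})]
gaps = access_gap(sample)
```

In this toy sample, women are less connected to developers than men, the kind of pattern the paper documents for digital and computing occupations.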

Gender assortativity index for each occupation in the 48-item position generator that measures respondents’ social capital. Each panel represents respondents’ choices, ordered from lowest (negative) to highest (positive) degree of similarity. Top panel: female respondents, bottom panel: male respondents. The bars corresponding to digital and computing occupations are hatched.

Taken together, these factors leave women with fewer career prospects within a tech-driven workforce, and reproduce the relegation of women to lower-level computing work observed in the history of twentieth-century technology.

The full paper is available in open access here.

It is part of a full special issue of Internet Policy Review on ‘The gender of the platform economy‘, guest-edited by M. Fuster Morell, R. Espelt and D. Megias.

Unboxing AI conference

I’m excited to be part of the organizing team for an upcoming conference entitled “Unboxing AI”, which aims to open – at least to an extent – the black box. What are the material conditions of AI production? Who are the multitudes of precarious workers who contribute to it in the shadows, by generating data and checking algorithmic outputs? What are the geographical areas and the social scope of the work that produces today’s intelligent technologies? These are some of the questions we aim to explore.

The first two days of the conference (November 5 and 6, 3pm – 7pm CET) will bring together highly regarded international specialists from a wide variety of disciplines (sociology, law, economics, but also the arts and humanities…). On the third day (November 7, also 3 pm – 7 pm CET), there will be a doctoral colloquium with a selection of very promising work by young researchers.

The conference was initially planned to take place in Milan in March 2020, and had to be postponed due to the Covid-19 pandemic. As the health situation is still critical, we have opted for an online-only version. At least, this format is cheap – no need to travel to attend – and we can welcome a more geographically diverse range of participants. Indeed the afternoon-only schedule is meant to enable colleagues from North and South America to attend.

Participation is free of charge but prior registration is required. You will find the programme as well as registration forms here (please note that there is a separate form for each of the three dates of the conference).

The conference is organized as part of the initiatives of our ‘International Network on Digital Labor‘ and is co-sponsored by ISRF (Independent Social Research Foundation), the Nexa Center for Internet and Society, and Fondazione Feltrinelli.

First seminar of the year!

Next Thursday, 17 September, I have been invited to give a talk as part of the cycle of seminars organized by the quantitative sociology research group at CREST-ENSAE, Palaiseau (Paris area). Although the health situation is still bleak, I am glad to return to almost-normal functioning by giving an in-person talk. Hopefully there won’t be any new lockdown before that.

I will present an in-progress paper provisionally entitled:

«Disembedded or deeply embedded? A multi-level network analysis of the online platform economy»

The two types of platform labour analyzed in the paper.

In this paper, I extend the economic-sociological concept of embeddedness to encompass not only social networks of, for example, friendship or kinship ties, but also economic networks of ownership and control relationships. Applying these ideas to the case of digital platform labour pinpoints two possible scenarios. When platforms take the role of market intermediaries, economic ties are thin and workers are left to their own devices, in a form of ‘disembeddedness’. When platforms partake in intricate inter-firm outsourcing structures, economic ties envelop workers in a ‘deep embeddedness’ which involves both stronger constraints and higher rewards. With this added dimension, the notion of embeddedness becomes a compelling tool to describe the social structures that frame economic action, including the power imbalances that characterize digital labour in the global economy.