Human listeners and virtual assistants: privacy and labor arbitrage in the production of smart technologies

I’m glad to announce the publication of new research, as a chapter in the fabulous Digital Work in the Planetary Market, a volume edited by Mark Graham and Fabian Ferrari and published in open access by MIT Press.

The chapter, co-authored with Antonio A. Casilli, starts by recalling how, in spring 2019, public outcry followed media revelations that major producers of voice assistants recruit human operators to transcribe and label users’ conversations. These high-profile cases uncovered the paradoxically labor-intensive nature of automation, the ultimate cause of the highly criticized privacy violations.

The development of smart solutions requires large amounts of human work. Sub-contracted on demand through digital platforms and usually paid by piecework, myriad online “micro-workers” annotate, tag, and sort the data used to prepare and calibrate algorithms. These humans are also needed to check outputs – such as automated transcriptions of users’ conversations with their virtual assistant – and to make corrections if needed, sometimes in real time. The data that they process include personal information, of which voice is an example.

We show that the platform system exposes both consumers and micro-workers to high risks. Because producers of smart devices conceal the role of humans behind automation, users underestimate the degree to which their privacy is challenged. As a result, they might unwittingly let their virtual assistant capture children’s voices, friends’ names and addresses, or details of their intimate life. Conversely, the micro-workers who hear or transcribe this information face the moral challenge of taking the role of intruders, and bear the burden of maintaining confidentiality. Through outsourcing, platforms often leave them without sufficient safeguards and guidelines, and may even shift onto them the responsibility to protect the personal data they happen to handle.

Besides, micro-workers themselves release their personal data to platforms. The tasks they do include, for example, recording utterances for virtual assistants, which need large sets of, say, ways to ask about the weather in order to “learn” to recognize such requests. Workers’ voices, identities and profiles are personal data that clients and platforms collect, store and re-use. With many actors in the loop, privacy safeguards are looser and transparency is harder to ensure. Lack of visibility, not to mention of collective organization, prevents workers from taking action.

Note: Description of one labor-intensive data supply chain. A producer of smart speakers located in the US outsources AI verification to a Chinese platform (1) that relies on a Japanese online service (2) and a Spanish sub-contractor (3) to recruit workers in France (4). Workers are supervised by an Italian company (5), and sign up to a microtask platform managed by the lead firm in the US (6). Source: Authors’ elaboration.

These issues become more severe when micro-tasks are subcontracted to countries where labor costs are low. Globalization enables international platforms to allocate tasks for European and North American clients to workers in Southeast Asia, Africa, and Latin America. This global labor arbitrage goes hand in hand with a global privacy one, as data are channeled to countries where privacy and data protection laws provide uneven levels of protection. Thus, we conclude that any solution must be dual – protecting workers to protect users.

The chapter is available in open access here.

Hidden inequalities: the gendered labour of women on micro-tasking platforms

Around the world, myriad workers perform data tasks on online labour platforms to fuel the digital economy. Mostly short, repetitive and poorly paid, these so-called ‘micro-tasks’ include, for example, labelling objects in images, classifying tweets, recording utterances, and transcribing audio files – notably to satisfy the data appetite of today’s fast-growing artificial intelligence industry. While casualization of labour and low pay have attracted sharp criticism of these platforms, they appear gender-blind and accessible even to people with basic skills. Women with care or household duties may particularly benefit from the time flexibility and the possibility to work from home that platforms offer. So, are these new labour arrangements gender equalizers after all?

In a new paper co-authored with Marion Coville, Clément Le Ludec and Antonio A. Casilli, we demonstrate that this new form of online labour fails to close gender gaps, and may even exacerbate them. We proceed in three steps. First, we show that legacy inequalities in the professional and domestic spheres turn platform-mediated micro-tasking into a ‘third shift’ that adds to already heavy schedules. Both working fathers and working mothers experience it, but the structure of the other two shifts affects their experience. Looking at their time use, it turns out that men dedicate long and uninterrupted slots of time to each activity: their main job, their share of household duties, leisure and micro-work. They tend to do all micro-tasks in a row, usually at night after work or in the morning before starting. By contrast, women have more fragmented schedules, and do micro-work during short breaks, here and there, eating into their leisure time. This is one reason why they earn less on platforms: they have only short slots of time available, so they cannot search for better-paid tasks, and content themselves with whatever is available at that moment.

Time use of typical female (left) and male (right) micro-workers, both of whom have a main job in addition to platform micro-tasks, and dependent children.

Second, we submit that the human capital of male and female data workers differs, with women less likely to have received training in science and technology fields.

Highest educational qualification (left) and discipline of specialization (right) of men and women micro-workers. Data collected in France, 2018 (n = 908).

Third, their social capital differs: using a position generator instrument to capture workers’ access to the informational and support resources that may come from contacts with people in different occupations, we show that women have fewer ties to digital-related professionals who could provide them with knowledge and advice to successfully navigate the platform world.

Gender assortativity index for each occupation in the 48-item position generator that measures respondents’ social capital. Each panel represents respondents’ choices, ordered from lowest (negative) to highest (positive) degree of similarity. Top panel: female respondents, bottom panel: male respondents. The bars corresponding to digital and computing occupations are hatched.

Taken together, these factors leave women with fewer career prospects within a tech-driven workforce, and reproduce relegation of women to lower-level computing work as observed in the history of twentieth-century technology. 

The full paper is available in open access here.

It is part of a full special issue of Internet Policy Review on ‘The gender of the platform economy’, guest-edited by M. Fuster Morell, R. Espelt and D. Megias.

The visualization of personal networks

I am pleased to co-organize with Vincent Lorant of UCLouvain a special session on “The visualization of personal networks” at the forthcoming INSNA Sunbelt conference (12-16 July 2022, Cairns, Australia, and online).

Personal network data collection methods make it possible to describe the composition and the structure of an individual’s (hereafter ego) social network. These methods have been implemented in different domains such as migration, drug use, mental health, aging, education, and social welfare. In recent years, these data have also been used to provide respondents with visualizations of their personal network, using different algorithms and customizing results through computer-assisted data collection. Visualization gives valuable feedback to the respondent, improves data validity and may trigger positive behavioural changes, notably in vulnerable individuals or groups. Yet, visualization is not a free lunch. Recent research has highlighted the ethical dilemmas of providing such feedback to individuals: ego’s social life is being exposed, the researcher may be exposed as well, and such feedback may imply some contractual exchanges or therapeutic implications that require attention.

This session aims to describe the stakes of different visualization approaches to personal networks with different populations. We welcome qualitative and quantitative papers addressing issues related to the implementation of visualization or reports of personal networks in terms of techniques, levels of respondent’s satisfaction with visualization, conditions under which visualization is recommended or discouraged, and effects of the personal network visualization for the respondent.

More information on the conference and the submission process is available here.

Networks in the digital organization

This week, I was pleased and honoured to give a keynote speech at the wonderful EUSN2021 (European Social Networks 2021) conference. The event was originally planned in beautiful Naples, but was unfortunately moved online because of pandemic-induced uncertainties.

In my talk, I endeavoured to reconcile the tradition of research on social and organizational network analysis – in which I have been trained, and which constitutes the specialism of most participants in EUSN conferences – with the nascent literature on digital platform labour. Indeed, organizational network studies have shaped my (and many other colleagues’) understanding of how social ties and structures drive collective action and shape its outcomes. However, contemporary computing technologies breed novel sociabilities and organizational modes that disrupt established practices and knowledge. In particular, the emergence of digital platforms as market intermediaries constitutes a puzzle for network researchers. These emerging organizational structures loosen individual-organization links, fragment production processes, individualize sub-contracting, extend competition beyond the local level, and threaten jobs with AI-fuelled automation. My question then is: in these environments where isolation dominates and collaboration fades, how do social networks operate, if at all? And how can we, as researchers, apprehend them?

In my talk, I discussed how digital platforms, and the transformations of work processes they trigger, challenge some of the key tenets of organizational network analysis. Yet there is still much to learn from this tradition, and the limited overlaps with the nascent literature on platforms reveal facets that neither of them, alone, could capture. This analysis also confirms that overall, technology-enabled platform intermediation restrains sociability and limits interactions, but specific cases where networking has been possible highlight the fundamental advantages it brings to workers.

On this basis, I outlined directions for future research and policy action.

Many thanks to the organizers who still did a wonderful job despite the online-only mode, and to all attendees for inspiring questions and feedback.

Counting online workers

I have just discovered this very interesting new paper by Otto Kässi, Vili Lehdonvirta and Fabian Stephany. Their data-driven count of online workers reminds me of this research, published last year, which I did with Clément Le Ludec and Antonio A. Casilli.

There are differences of course: theirs is a large multi-country study while we focused on one national setting (France). Also: Kässi et al. consider online labour in general, while we looked specifically at micro-work.

Nevertheless, there are striking similarities. Both studies included larger as well as smaller and more peripheral platforms, often left aside in previous research. Both started from the numbers of registered users declared by the platforms in scope, although this is likely an upper bound. Indeed registering may not mean using, and for example researchers (like ourselves) and journalists would register only to observe, especially when registration is open and easy.

Also, both studies used web traffic analysis data but for different purposes. We used them as an estimate of minimally active users – those who connect at least monthly, as per the definition given by the providers of these data. For the platforms we observed, these numbers tend to be lower than registrations.

Instead, Kässi et al. have used these data to assess registration numbers for the platforms that do not report them. My first reaction would be to think their estimates are likely a lower bound. But presumably their use of a mix of sources, and the seriousness and caution with which they have conducted their estimate, provide enough correction.

Finally, both studies attempted to correct estimates downward by taking into account multi-homing – the tendency of users to rely on multiple platforms. The coefficient of Kässi et al. is 1.83, ours was 1.27. The gap is due to the fact that we focused only on micro-work: if we had counted participation across all types of online labour platforms, our coefficient would be just below 2 – not far from theirs! Kässi et al. also correct for the possibility of multiple workers using a single account, which we did not observe in our French sample. One might imagine other corrections depending on observed usages. For example my ongoing Latin American study of micro-workers suggests that there are unofficial sales and purchases of highly rated platform accounts, more likely to access better-paying tasks – again, something we did not observe in France. Kässi et al. rightly note that all these corrections come from ad hoc surveys and should be interpreted with caution.
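
To make the logic of this downward correction concrete, here is a minimal sketch. It rests on a simplifying assumption that may not match the exact procedure of either study: the coefficient is read as the average number of platforms on which one worker holds an account, so that dividing the total number of accounts by it gives an estimate of distinct workers. The function name and the total of one million accounts are hypothetical, chosen only for illustration; the two coefficients are the ones quoted above.

```python
def distinct_workers(total_accounts: float, multihoming_coefficient: float) -> float:
    """Estimate distinct workers from accounts summed across platforms.

    Assumption (illustrative only): the coefficient is the average number
    of platforms on which a single worker holds an account.
    """
    if multihoming_coefficient < 1:
        # A worker counted at all holds at least one account.
        raise ValueError("multi-homing coefficient must be >= 1")
    return total_accounts / multihoming_coefficient


# Hypothetical total of accounts, not a figure from either study.
accounts = 1_000_000

for label, coefficient in [("Kässi et al.", 1.83), ("French micro-work study", 1.27)]:
    estimate = distinct_workers(accounts, coefficient)
    print(f"{label}: coefficient {coefficient} -> about {estimate:,.0f} distinct workers")
```

Under this reading, a higher coefficient means a stronger downward correction: the same number of accounts maps onto fewer distinct people.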

Overall, I would say that both studies point to the need to put in place new and creative methods to account for these new forms of labour that traditional statistical studies fail to capture well. The price to pay, as both studies stress, is a high degree of uncertainty. I also dare suggest that both are mixed-method studies: while the design is essentially quantitative, input from smaller and even qualitative research is crucial – for example to get insight into multi-homing and multi-working.

Before concluding, let us recall the key results. Kässi et al. reckon that there are 163 million freelancer profiles registered on online labour platforms globally, of whom approximately 19 million have worked at least once, and 5 million work more intensely. We estimated that approximately 260,000 French residents are registered with micro-work platforms, of whom some 50,000 are ‘regular’ workers who do micro-tasks at least monthly, and a more restrictive measure of ‘very active’ workers would decrease this figure to 15,000.

Are these numbers large or small? Curiously, our French study attracted both criticisms: some worried that we might be overstating the importance of micro-work, others wondered why we bothered with such a tiny part of national GDP. It is not easy to answer this question, as the answer depends on the perspective taken and the goals – the same numbers would mean different things to policymakers and researchers, for example. Nevertheless, I think the point that matters to all is that this population exists and needs attention – despite its limited visibility and the fuzzy boundaries that make it so difficult to assess its size.

Big data and the hypothesis of the end of privacy

In the late 2000s, voices suggesting that our societies might be nearing the ‘end of privacy’ became increasingly deafening. Our cultural, political and regulatory environment was on the verge of major transformation – so went the narrative. Businesses rejoiced since, as is well known, less privacy and more information oil the economy.

In a video interview with Italian media Idee Sottosopra, I review the courses of action taken by various stakeholders, in particular Internet companies, and examine their conflicts and controversies. I show how the very concept of privacy, inherited from a long legal and judicial tradition, should be revised and redefined to appropriately describe today’s online interactions.

Overall, there is no deterministic and inevitable tendency to exclude privacy from our societies, but rather a tension between social forces for and against privacy, which has accompanied the advent of the digital economy and especially social media. The positions of stakeholders, especially users, are often ambiguous, and social media companies attempted to leverage this ambiguity to their own advantage.

Yet civil society reactions have grown stronger and stronger, and after initial David-vs-Goliath attempts by individuals and small associations, more and more authoritative institutions have taken the defence of privacy seriously. We are no longer left to costly and barely visible individual choices, and especially after the entry into force of the GDPR in Europe, we now have an unprecedented opportunity to act at a more systemic level.

Big Data. L’ipotesi della fine della Privacy | Società Digitale | Idee Sottosopra

Embeddedness in digital platform labour

Starting from Granovetter’s seminal 1985 article, the concept of embeddedness has given new life to economic sociology. With it, it has finally been possible to operationalize the idea that factors other than individual, under-socialized choices drive the economy. In addition to people’s own interests and motivations, the social environments of which they are part contribute to shaping their action. With this idea, economic sociology could claim legitimacy as a valid approach to study the market and the firm – beyond the exclusive pretensions of much economics.

The idea of embeddedness and its operationalization were not without their critics, though. After all, one may say that economic sociology has performed better in its analysis of the firm than of the market. The very meaning of the embeddedness concept has been stretched a lot over time – also getting back, on occasion, to the quite different nuances that Polanyi attached to it back in the 1940s.

In a just-published article, I go back to this concept and test it against digital platforms – recently emerged economic coordination devices that, in the view of many, defy the traditional firm/market boundaries. This helps uncover a new idea, which extends the economic-sociological concept of embeddedness to encompass not only social networks of, for example, friendship or kinship ties, but also economic networks of ownership and control relationships.

Applying these ideas to the case of digital platform labour pinpoints two possible scenarios. When platforms take the role of market intermediaries, economic ties are thin and workers are left to their own devices, in a form of ‘disembeddedness’. In this sense, I confirm the results obtained by a group of Oxford scholars in a similar setting. But when platforms partake in intricate inter-firm outsourcing structures, economic ties envelop workers in a ‘deep embeddedness’ which involves both stronger constraints and higher rewards.

I show that with this added dimension, the notion of embeddedness becomes a compelling tool to describe the social structures that frame economic action, including the power imbalances that characterize digital labour in the global economy. Granovetter’s original idea can still provide a lot of insight to help us understand the transformations of today’s economy.

The article is available here.

A preliminary version of this article was presented in a seminar in September 2020.

Internship offer (3 months, master’s level, spring 2021)

The research project TRIA (from its French title “Le TRavail de l’IA: éthique et gouvernance de l’automation”) is a study of the production systems of artificial intelligence. We investigate “micro-work” platforms, which allocate small standardized tasks to crowds of providers, and use the outputs of their work to prepare and annotate data for machine learning algorithms. We study the ramifications of this phenomenon in Spanish-speaking countries, which have remained under-researched so far despite their strong participation. With data from an empirical survey already started in 2020, and to be analyzed through mixed methods (including advanced NLP techniques), we will address important issues related to digital platform governance, online work ethics, and consequences (e.g. in terms of bias) of the use of these humans in the production of artificial intelligence.
Funded by the French National Center of Scientific Research (CNRS), the TRIA project brings together research teams in the Paris and Rennes regions in France, as well as partners in Spain (Barcelona and Valencia) and Canada (Toronto).

We are currently looking for a student intern to help us set up a survey targeting micro-workers in Spain and Spanish-speaking Latin American countries.
He/she will help us to:

  • update an inventory of micro-work platforms operating in Spanish-speaking countries, a first version of which was created in 2020;
  • launch a replication of the online questionnaire, already fielded on Microworkers.com, on another micro-work platform;
  • liaise and ensure communication between the project teams.

The applicant should:

  • be enrolled in the first or second year of a master’s degree in social science (like sociology, political science, management or economics);
  • have skills in the design and/or execution of questionnaire surveys;
  • have some prior knowledge of, or at least interest in, the transformations of work and/or the societal effects of digital technology;
  • be able to work independently, with advanced relational skills;
  • have a fairly good command of French or English, and at least a basic knowledge of Spanish.

More information is available in the enclosed job description.

Unboxing AI conference

I’m excited to be part of the organizing team for an upcoming conference entitled “Unboxing AI”, which aims to open – at least to an extent – the black box. What are the material conditions of AI production? Who are the multitudes of precarious workers who contribute to it in the shadows, by generating data and checking algorithmic outputs? What are the geographical areas and the social scope of the work that produces today’s intelligent technologies? These are some of the questions we aim to explore.

The first two days of the conference (November 5 and 6, 3pm – 7pm CET) will bring together highly regarded international specialists from a wide variety of disciplines (sociology, law, economics, but also the arts and humanities…). On the third day (November 7, also 3 pm – 7 pm CET), there will be a doctoral colloquium with a selection of very promising work by young researchers.

The conference was initially planned to take place in Milan in March 2020, and had to be postponed due to the Covid-19 pandemic. As the health situation is still critical, we have opted for an online-only version. At least, this format is cheap – no need to travel to attend – and we can welcome a more geographically diverse range of participants. Indeed the afternoon-only schedule is meant to enable colleagues from North and South America to attend.

Participation is free of charge but prior registration is required. You will find the programme as well as registration forms here (please note that there is a separate form for each of the three dates of the conference).

The conference is organized as part of the initiatives of our ‘International Network on Digital Labor’ and is co-sponsored by ISRF (Independent Social Research Foundation), the Nexa Center for Internet and Society, and Fondazione Feltrinelli.

Covid-19 and transfer of risk on digital platform workers

At an internal meeting of the IDHES lab in Gif-sur-Yvette, and then at an event at the University of Bologna, I have had the pleasure of presenting recent research on how the current health crisis reveals a new dimension of digital platforms – their tendency, wherever possible, to shift risk from clients to workers within their ecosystems. The paper, co-authored with Antonio A. Casilli, is now under submission to a journal.

Here is an abstract:

As the recessionary effects of the 2020 Covid-19 pandemic become
manifest, the paper discusses their effects on digital platforms and the
workers in their eco-systems. Against the possibility that platform
labor may be a buffer against crisis-induced layoffs, our analysis of
the risks associated with it suggests that it may eventually increase
precarity, without necessarily mitigating health risks for workers. Our
argument is based on a comparison of the three main categories of
platform labor – “on-demand labor” (gigs such as delivery and
transportation), “online labor” (tasks performed by freelancers and
micro-workers) and “social media labor” (like content generation
and moderation) – in terms of the health and economic risks involved in
coronavirus times. We show that platform managers have deployed varying
strategies to transfer risk from themselves and their clients onto
workers, exploiting and deepening the existing power imbalance between
them. Success in achieving this has enabled them to secure their bottom
line even at the expense of working conditions. The Covid-19 pandemic
has brought to light how digital platforms apply a management style that
revolves around transferring the burden of risk to their own workforce.