Data & methods – Data Big and Small

Doctoral thesis: Dynamics of collective elaboration of (in)appropriate information in social networks

We’re hiring!

As part of a large, interdisciplinary European research project, we are seeking a motivated, open-minded student to join CNRS (specifically, the Centre for Research in Economics and Statistics, CREST) in Palaiseau, France, for three years.

The thesis aims to model the production and dissemination of ‘fake news’ in situations of uncertainty and socio-economic inequality. A rich sociological literature suggests that actors contextualise messages received and emitted as questions or answers, interpret them according to their recipients and senders, and assess their social acceptability within their own networks of relationships, taking into account their relative position. Building on this research, the goal is to identify the social processes underpinning misinformation-generating digital communications: collective identity, inequalities of status or authority, hierarchy of shared norms. This will make it possible to interpret the online social interactions through which actors collectively judge the (appropriate or inappropriate) quality of a message or piece of information and then decide whether to relay or share it – and with whom. In particular, the thesis work will contribute to: 1/ drawing up a state of the art, mainly within sociology but open to the neighbouring disciplines that have also addressed these questions; 2/ illustrating and testing these theories through an empirical analysis of a digital database, mainly with quantitative methods, possibly enriched by a small complementary qualitative field study; 3/ preparing guidelines that help information professionals and policy-makers detect the sources of misinformation and the ways in which it emerges and propagates.

The thesis will be done within the framework of the interdisciplinary project “AI-based-technologies for trustworthy solutions against disinformation” (AI4TRUST), funded by the European Union over the period 2023-2026, involving 17 partners (research institutions and media professionals) in 10 countries, and coordinated by Fondazione Bruno Kessler (Italy).

The AI4TRUST project aims to build a hybrid system, with advanced artificial intelligence solutions capable of cooperating with humans in the fight against disinformation. The new algorithms that will be developed in this framework, constantly checked and improved by human fact-checkers, will monitor multiple online social platforms in nearly real time, analysing text, audio, and visual contents in several languages. The resulting quantitative indicators, including infodemic risk, will be inspected under the lens of social and computational social sciences, to build the trustworthy elements required by media professionals.

CNRS contributes to the study of the sociological dimension of these issues, and participates in the project through its laboratories Centre Marc Bloch (CMB, Berlin), Centre de Sociologie des Organisations (CSO, Paris) and Centre de Recherche en Economie et Statistique (CREST, Palaiseau). In practice, the thesis will be carried out at CREST, and co-directed by representatives of the three laboratories involved in AI4TRUST: myself, Emmanuel Lazega (CSO) and Camille Roth (CMB).

The successful candidate will have the opportunity to join a group of highly motivated scientists and practitioners from across the continent; to participate in collaborations with other teams working on the project in an interdisciplinary framework; to attend regular meetings with the project’s Principal Investigator, the scientists and experts involved, and public decision-makers; and to present and publish research results at international conferences and in journals.

The ideal candidate has a good background in quantitative sociology or in a STEM discipline (e.g., mathematics, statistics, computer science) with a strong interest in societal issues and challenges. A very good knowledge of English, an interdisciplinary approach and the ability to work in teams are essential.

Candidates should apply on the CNRS portal, where they will also find more details.

The visualization of personal networks

I am pleased to co-organize with Vincent Lorant of UCLouvain a special session on “The visualization of personal networks” at the forthcoming INSNA Sunbelt conference (12-16 July 2022, Cairns, Australia, and online).

Personal network data collection methods make it possible to describe the composition and the structure of an individual’s (hereafter ego) social network. These methods have been implemented in different domains such as migration, drug use, mental health, aging, education, and social welfare. In recent years, these data have also been used to provide respondents with visualizations of their personal network, using different algorithms and customizing results through computer-assisted data collection. Visualization gives valuable feedback to the respondent, improves data validity and may trigger positive behavioural changes, notably in vulnerable individuals or groups. Yet, visualization is not a free lunch. Recent research has highlighted the ethical dilemmas of providing such feedback to individuals: ego’s social life is being exposed, the researcher may be exposed as well, and such feedback may imply some contractual exchanges or therapeutic implications that require attention.

This session aims to describe the stakes of different visualization approaches to personal networks with different populations. We welcome qualitative and quantitative papers addressing issues related to the implementation of visualization or reports of personal networks in terms of techniques, levels of respondents’ satisfaction with visualization, conditions under which visualization is recommended or discouraged, and effects of personal network visualization on the respondent.

More information on the conference and the submission process is available here.

Counting online workers

I have just discovered this very interesting new paper by Otto Kässi, Vili Lehdonvirta and Fabian Stephany. Their data-driven count of online workers is reminiscent of this research published last year, which I did with Clément Le Ludec and Antonio A. Casilli.

There are differences of course: theirs is a large multi-country study while we focused on one national setting (France). Also: Kässi et al. consider online labour in general, while we looked specifically at micro-work.

Nevertheless, there are striking similarities. Both studies included larger as well as smaller and more peripheral platforms, often left aside in previous research. Both started from the numbers of registered users declared by the platforms in scope, although this is likely an upper bound. Indeed, registering does not necessarily mean using: researchers (like ourselves) and journalists, for example, may register only to observe, especially when registration is open and easy.

Also, both studies used web traffic analysis data but for different purposes. We used them as an estimate of minimally active users – those who connect at least monthly, as per the definition given by the providers of these data. For the platforms we observed, these numbers tend to be lower than registrations.

Instead, Kässi et al. have used these data to assess registration numbers for the platforms that do not report them. My first reaction would be to think their estimates are likely a lower bound. But presumably their use of a mix of sources, and the seriousness and caution with which they have conducted their estimate, provide enough correction.

Finally, both studies attempted to correct estimates downward by taking into account multi-homing – the tendency of users to rely on multiple platforms. The coefficient of Kässi et al. is 1.83, ours was 1.27. The gap is due to the fact that we focused only on micro-work: if we had counted participation across all types of online labour platforms, our coefficient would be just below 2 – not far from theirs! Kässi et al. also correct for the possibility of multiple workers using a single account, which we did not observe in our French sample. One might imagine other corrections depending on observed usages. For example my ongoing Latin American study of micro-workers suggests that there are unofficial sales and purchases of highly rated platform accounts, more likely to access better-paying tasks – again, something we did not observe in France. Kässi et al. rightly note that all these corrections come from ad hoc surveys and should be interpreted with caution.
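
To make the downward correction concrete, here is a minimal sketch. It assumes that the multi-homing coefficient can be read as the average number of platforms on which each worker holds an account, so that dividing the total number of accounts by it gives an estimate of distinct workers. This reading, the function name and the account total are illustrative assumptions of mine, not the exact procedure of either study.

```python
def estimate_distinct_workers(total_accounts: float, multihoming_coefficient: float) -> float:
    """Correct a count of platform accounts downward for multi-homing.

    Assumption: the coefficient is the average number of platforms per worker,
    so it cannot be smaller than 1 (every worker uses at least one platform).
    """
    if multihoming_coefficient < 1:
        raise ValueError("a multi-homing coefficient below 1 is not meaningful")
    return total_accounts / multihoming_coefficient


if __name__ == "__main__":
    # Hypothetical total of accounts summed across platforms (not a figure from either paper).
    hypothetical_accounts = 100_000

    # The two coefficients quoted in the text above.
    for label, coefficient in [("micro-work only", 1.27), ("all online labour", 1.83)]:
        workers = estimate_distinct_workers(hypothetical_accounts, coefficient)
        print(f"{label}: coefficient {coefficient} -> about {workers:,.0f} distinct workers")
```

The larger the coefficient, the stronger the correction: the same number of accounts maps to fewer distinct people when workers spread themselves over more platforms.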

Overall, I would say that both studies point to the need to put in place new and creative methods to account for these new forms of labour that traditional statistical studies fail to capture well. The price to pay, as both studies stress, is a high degree of uncertainty. I also dare suggest that both are mixed-method studies: while the design is essentially quantitative, input from smaller and even qualitative research is crucial – for example to get insight into multi-homing and multi-working.

Before concluding, let us recall the key results. Kässi et al. reckon that there are 163 million freelancer profiles registered on online labour platforms globally, of whom approximately 19 million have worked at least once, and 5 million work more intensely. We estimated that approximately 260,000 French residents are registered with micro-work platforms, of whom some 50,000 are ‘regular’ workers who do micro-tasks at least monthly, and a more restrictive measure of ‘very active’ workers would decrease this figure to 15,000.

Are these numbers large or small? Curiously, our French study attracted both criticisms: some worried that we might be overstating the importance of micro-work, others wondered why we bothered with such a tiny part of national GDP. It is not easy to answer this question, as the answer depends on the perspective taken and the goals – the same numbers would mean different things to policymakers and researchers, for example. Nevertheless, I think the point that matters to all is that this population exists and needs attention – despite its limited visibility and the fuzzy boundaries that make it so difficult to assess its size.

How many ‘micro-workers’?

Finally published! Counting ‘micro-workers’: Societal and methodological challenges around new forms of labour is a paper that I co-authored with Clément Le Ludec and Antonio A. Casilli, and that has just been published in a special issue of the journal Work Organisation, Labour & Globalisation.

What is it about? ‘Micro-work’ consists of fragmented data tasks that myriad providers execute on online platforms. While crucial to the development of data-based technologies, this little visible and geographically spread activity is particularly difficult to measure. To fill this gap, we combined qualitative and quantitative methods (online surveys, in-depth interviews, capture-recapture techniques, and web traffic analytics) to count micro-workers in a single country, France. On the basis of this analysis, we estimate that approximately 260,000 people are registered with micro-work platforms. Of these some 50,000 are ‘regular’ workers who do micro-tasks at least monthly, and we speculate that using a more restrictive measure of ‘very active’ workers decreases this figure to 15,000. This analysis is important to better understand platform labour and the labour in the digital economy that lies behind artificial intelligence.
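
For readers unfamiliar with capture-recapture, here is a minimal sketch of the general idea, using the classic two-sample Lincoln-Petersen estimator and Chapman's bias-corrected variant. This is a textbook illustration only: the summary above does not say which estimator the paper uses, and the sample sizes below are hypothetical, not data from our study.

```python
def lincoln_petersen(n_first: int, n_second: int, n_both: int) -> float:
    """Two-sample population estimate: N = n_first * n_second / n_both.

    n_first  -- individuals observed in the first sample
    n_second -- individuals observed in the second sample
    n_both   -- individuals observed in both samples (the "recaptures")
    """
    if n_both <= 0:
        raise ValueError("at least one individual must appear in both samples")
    return n_first * n_second / n_both


def chapman(n_first: int, n_second: int, n_both: int) -> float:
    """Chapman's variant, less biased in small samples and defined even when n_both is 0."""
    return (n_first + 1) * (n_second + 1) / (n_both + 1) - 1


if __name__ == "__main__":
    # Hypothetical example: 200 workers seen in one source, 150 in another, 30 in both.
    print(lincoln_petersen(200, 150, 30))
    print(round(chapman(200, 150, 30), 1))
```

The intuition is that the smaller the overlap between two independent samples, the larger the hidden population must be.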

The big data moment of the social sciences: what access to web and social media data?

Round table, Sciences Po Paris, 6 December 2018, 6:00 pm

For social science research to take full advantage of large digital databases, one obstacle remains to be lifted: access to these data is limited, unequally distributed, and surrounded by legal and ethical uncertainty. We propose to discuss this on the occasion of the publication of the special issue of Revue Française de Sociologie on “Big data, societies and social sciences” (no. 59/3). This round table brings together researchers and other public and private stakeholders.

With:

Garance Lefèvre, Policy senior associate, Uber
Roxane Silberman, Scientific advisor, Centre d’Accès Sécurisé aux Données (CASD)
Sophie Vulliet-Tavernier, Director of relations with the public and research, Commission Nationale de l’Informatique et des Libertés (CNIL)
The authors of the special issue.

Moderators: Gilles Bastin (Univ. Grenoble Alpes) and Paola Tubaro (CNRS), coordinators of the special issue.

Free admission, subject to availability: to register, click here.

Access: Sciences Po, salle Goguel. Entrance at 27 rue Saint-Guillaume, 75007 Paris (cross the garden and take the lift to the top floor). The round table is organised by Revue Française de Sociologie in collaboration with Presses de Sciences Po. It will be followed by a reception.

Big data, societies and social sciences

Just published: Big data, societies and social sciences, a special issue of Revue Française de Sociologie, guest-edited by Gilles Bastin and myself.

Read a pre-print of our Introduction here.

English versions will be available soon.

Call for papers: “Recent Ethical Challenges in Social Network Analysis”

Submissions are now invited for a special section of the journal Social Networks on “Recent Ethical Challenges in Social Network Analysis” (guest-edited by myself with Antonio A. Casilli, Alessio D’Angelo, and Louise Ryan).

Research on social networks raises formidable ethical issues that often fall outside existing regulations and guidelines. State-of-the-art tools to collect, handle, and store personal data expose both researchers and participants to new risks. Political, military and corporate interests interfere with scientific priorities and practices, while legal and social ramifications of studies of personal ties and human networks come to the surface.

The proposed special section aims to critically engage with ethics in research related to social networks, specifically addressing the challenges that recent technological, scientific, legal and political transformations trigger.

Following a successful workshop on this topic that was held in December 2017 in Paris, we welcome submissions that critically engage with ethics in research related to social networks, possibly based on reflective accounts of first-hand experiences or case studies, taken as concrete illustrations of the general principles at stake, the attitudes and behaviors of stakeholders, or the legal and institutional constraints. We are particularly interested in novel, original answers to some unprecedented ethical challenges, or the need to reinterpret norms in ambiguous situations.

The full Call for Papers is available here.

Programme now online, “More than complex: large and rich network structures”

As part of the upcoming NetSci2018 conference in Paris, I am co-organizing a satellite event that aims to foster interdisciplinary reflection on how methods from social science can be upscaled to large network structures and on how methods from complex systems can be downscaled to deal with small heterogeneous structures.

The idea is to reconcile two traditions of research that have remained separate so far. Sociology typically handles small but rich networks where a wealth of network attributes results from the complexity of the data collection design. Differences across nodes and edges make it possible to capture the social processes underlying network structures and their dynamics. By contrast, the complex systems tradition handles large but poorly specified networks. Assuming statistical equivalence of graph entities, a mean-field treatment suffices to describe the aggregate properties of the network. Today’s network datasets contain an unprecedented quantity of relational information within and between all possible levels: individuals, social groups, organizations, and macro entities. Such large and rich network structures expose the implicit limitations of the two above-mentioned approaches: classical sociological methods cannot be upscaled because of their heavy algorithms, and those from complex systems lose track of the multi-faceted nature of social actors, their relationships and their processes.

Our satellite event aims to move forward, inviting an interdisciplinary reflection and exploring ways in which these limitations can be overcome.

The program of the satellite is now online.

I am most pleased to co-organize this satellite event with Floriana Gargiulo, Sylvie Huet, and Emmanuel Lazega.

We are honored to count, among our invited speakers, Camille Roth, Matthieu Latapy, Fariba Karimi, and Noshir Contractor.