I am now in Montréal, where I participated, last Friday, in a panel on Open Data at the “Science & You” international conference. It was interesting for me to reflect on how the picture has changed since my previous panel on the same topic – in Kiev in 2012. Back then, we were busy trying to convince public administrations that opening data was good for transparency and could help improve services to communities. Since then, many attempts have been made in numerous countries – local authorities often pioneering the process, followed only later by central governments (one example cited in my panel was Québec City). What is made open is typically information from public registers (first names of newborns, records of road accidents) and, increasingly, from technological devices and sensors (bus traffic information).
There are some conditions to be met for a dataset to be called “open”:
Technically, it needs to be “raw”, detailed, digital and reusable. The French Interior Ministry released results of the first round of the recent presidential elections within a few days, at polling station level. This is sufficiently detailed (with over 69,000 polling stations throughout the country), raw (allowing aggregations, comparisons etc.), and digital/reusable (so much so that the newspaper Le Monde could develop a user-friendly application to let readers easily check results in their neighborhoods). Some would also insist that “open” data should be released in non-proprietary formats (better .csv than .xls, for example).
Legally, the data must come with a license that allows re-use by third parties (typically within the Creative Commons family). Ideally, no type of reuse should be ruled out (including, somewhat controversially, commercial / for-profit reuse).
Economically, the data should be available to all for free (or at least with minimal charges if data preparation requires extra work or expenses).
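To make the “raw and reusable” condition concrete, here is a minimal sketch of the kind of aggregation that station-level data allows. The column names, communes and vote counts are invented for illustration and do not reproduce the Interior Ministry's actual file.

```python
import csv
import io
from collections import defaultdict

# Hypothetical extract: column names and figures are invented for illustration.
RAW_CSV = """commune,polling_station,candidate,votes
Lyon,0001,A,120
Lyon,0001,B,95
Lyon,0002,A,80
Lyon,0002,B,130
Nantes,0001,A,60
Nantes,0001,B,40
"""


def aggregate_by_commune(csv_text):
    """Sum station-level votes into commune-level totals per candidate."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["commune"]][row["candidate"]] += int(row["votes"])
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {commune: dict(by_candidate) for commune, by_candidate in totals.items()}


commune_totals = aggregate_by_commune(RAW_CSV)
print(commune_totals)
```

Because the file is plain .csv, the same few lines work with any tool that reads text, which is part of the argument for non-proprietary formats.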
While a lot of thought has been devoted in the past few years to the “ideal” conditions for data opening and how this would positively affect public service, the data landscape has now changed significantly.
Information oils the economy – as we have known since the path-breaking research of George Akerlof, Michael Spence and Joseph Stiglitz in the 1970s – and information can be extracted from data. Today, increased availability of “big” data creates the opportunity to access ever more information – for the good of the economy, then.
But in practice, how do companies extract value from this increasingly available information? In a nutshell, there are three ways in which they can do so: matching, targeted advertising, and market segmentation.
Matching is the key business idea of many recently created companies and start-ups, and consists of helping potential parties to a transaction find each other: driver and passenger (Uber), host and guest (Airbnb), buyer and seller (eBay), and so on. Matching is done by processing users’ data with suitable algorithms, and the more detailed the data, the more satisfactory the matching. Firms’ business model is usually based on taking a fee for each successful transaction (each realized match).
Targeted advertising is the practice of selecting, for each user, only the ads that best correspond to their tastes or practices. Advertising diapers to the general population will be largely ineffective as many people do not have young children; but targeting only those with young children is likely to produce better results. Here, the function of data is to help decide what to advertise to whom; useful data are people’s socio-demographic situation (age, marital status, children…), their current or past practices (if you bought diapers last week, you might do that again next week), and any declared tastes (for example in a post on Facebook or Twitter). How this produces a gain is obvious: if targeted adverts are more effective, sales will go up.
Today I gave a talk at AUTONOMY, a major festival of urban mobility in Paris, where new technologies are at center stage, from driverless cars to electric scooters, bike-sharing solutions, and connected infrastructure for the smart city. I had been asked to talk about labor in digital platforms, such as those offering mobility services.
Digital platforms are often thought of in terms of automation, but it is clear that there is labor too: we all have in mind the example of the couriers and drivers of the “on-demand” economy. But there’s more: I’ll show how platforms involve the labor of everyone, including passengers and users of all types. By labor, I mean here human activity that produces data and information – the key source of value for platforms. It is often an implicit, invisible activity of which we may not even be aware, because we tend to focus on consumption when we talk routinely about “car pooling” or “car sharing”, rather than looking at the underlying productive effort. This is what scholars call “digital labor”.
Twitter conversations are one way through which participants in an event engage with the programme, comment on and discuss the talks they attend, and prolong question-and-answer sessions. Twitter feeds have become part of the official communication strategy of major events and serve documentation and information purposes, both for attendees and for outsiders. While tweeting is becoming more and more a prerogative of “official” accounts in charge of event communication, it is also a potential tool in the hands of each participant, allowing anyone to join the conversation, at least in principle. Participants may become aware of each other, perhaps using the opportunity of the event to meet face-to-face, start relationships and even collaborations. A Nesta study insisted on the potential for using social media data to attain a quantitative understanding of events and their impacts on participants’ networks.
The OuiShare Fest 2016, a major gathering of the collaborative economy community that took place last week in Paris, was one opportunity to see such mechanisms in place. Tweeting was easy – with an official hashtag, #OSFEST16, although related hashtags were also widely used. I mined a total of 12440 tweets over the four days of the event. Do Twitter conversations related to the Fest bring to light the emergence of a community? While it’s too early for any deep analysis, some descriptive results can already be shown.
First, when did people tweet? Mostly at the beginning of each day’s programme (9am on the first two days, 2pm on the third day). Tweeting was more intense on the first day and declined over time (Figure 1). The comparatively low participation on the fourth day is due to the fact that the format was different – an open day in French (rather than an international conference in English), where local people were free to come and go. Online activity is not independent of what happens on the ground – quite the contrary, it follows the timings of physical activity.
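A timing profile of this kind comes down to counting tweets per day and hour. The following is a minimal sketch in Python rather than the R code actually used; the timestamps are invented stand-ins for the collected tweets.

```python
from collections import Counter
from datetime import datetime

# Hypothetical timestamps standing in for the collected tweets.
timestamps = [
    "2016-05-18 09:05:12",
    "2016-05-18 09:40:03",
    "2016-05-18 14:15:47",
    "2016-05-19 09:02:30",
    "2016-05-20 14:10:00",
]


def tweets_per_hour(stamps):
    """Count tweets in each (day, hour) slot."""
    counts = Counter()
    for stamp in stamps:
        t = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        counts[(t.date().isoformat(), t.hour)] += 1
    return counts


hourly = tweets_per_hour(timestamps)
busiest_slot, busiest_count = hourly.most_common(1)[0]
print(busiest_slot, busiest_count)
```

Plotting the counts in chronological order would give a curve comparable in spirit to Figure 1.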
Who tweeted most? Our dataset has a predictable outlier, the official @OuiShareFest Twitter account, which published 727 tweets – twice as many as the second in the ranking. But let’s look at the people who had no obligation to tweet, and still did so: who among them contributed most to documenting the Fest? Figure 2 shows the presence of some other institutional accounts among the top 10, but the most active include a few individual participants. Ironically, one of them was not even physically present at the Fest, and followed the live video streaming from home. In this sense, Twitter served as an interface between event participants and interested people who couldn’t make it to Paris.
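Ranking tweeters while setting the official account aside can be sketched as follows. The author names other than the official account are hypothetical, and the function is an illustration rather than the code behind Figure 2.

```python
from collections import Counter

OFFICIAL = "OuiShareFest"

# Hypothetical list of tweet authors, one entry per tweet.
authors = [
    "OuiShareFest", "alice", "OuiShareFest", "bob", "alice",
    "OuiShareFest", "carol", "alice", "bob",
]


def top_tweeters(author_list, exclude=(), n=10):
    """Return the n most active authors, skipping any account in `exclude`."""
    counts = Counter(a for a in author_list if a not in exclude)
    return counts.most_common(n)


ranking = top_tweeters(authors, exclude={OFFICIAL}, n=2)
print(ranking)
```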
What was the proportion of tweets, replies and retweets? Original tweets are interesting for their unique content (what are people talking about?), while replies and retweets are interesting because they reveal social interactions – dialogue, endorsement or criticism between users. Figure 3 shows that the number of replies is small compared to tweets and retweets.
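Splitting the corpus into original tweets, replies and retweets only needs two pieces of metadata per tweet. In this sketch the field names `is_retweet` and `reply_to` are assumptions chosen for illustration, and the four tweets are invented.

```python
from collections import Counter

# Hypothetical records; the field names are assumptions, not the real export format.
tweets = [
    {"author": "alice", "text": "Great opening talk", "reply_to": None, "is_retweet": False},
    {"author": "bob", "text": "RT @alice: Great opening talk", "reply_to": None, "is_retweet": True},
    {"author": "carol", "text": "@alice agreed!", "reply_to": "alice", "is_retweet": False},
    {"author": "dave", "text": "RT @alice: Great opening talk", "reply_to": None, "is_retweet": True},
]


def tweet_kind(tweet):
    """Classify a tweet as 'retweet', 'reply' or 'original'."""
    if tweet["is_retweet"]:
        return "retweet"
    if tweet["reply_to"]:
        return "reply"
    return "original"


kind_counts = Counter(tweet_kind(t) for t in tweets)
shares = {kind: count / len(tweets) for kind, count in kind_counts.items()}
print(kind_counts, shares)
```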
Let’s now look more closely at the replies. By recording who replied to whom, we can build a social network of conversations between a group of tweeters. It’s a relatively small network of 311 tweeters (the coloured points in Figure 4), with 321 ties among them (the lines in Figure 4). The size of points depends on the number of their incoming ties, that is, the number of replies received: even if the points haven’t been labelled, I am sure you can tell immediately which one represents the official @OuiShareFest account… the usual suspect! But let’s look at the network structure more closely. Some ties are self-loops, that is, people replying to themselves. (Let’s be clear, it’s not a sign of social isolation, but simply a consequence of the 140-character limit imposed on Twitter: self-replies are meant to deliver longer messages.) A lot of other participants are involved in just simple dyads or small chains (A replies to B who replies to C, but then C does not reply to A), unconnected to the rest. There is a larger cluster formed around the most replied-to users: here, some closure becomes apparent (A replies to B who replies to C who replies to A) and enables this sub-network to grow.
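The structural features described here (incoming ties, self-loops, isolated dyads, a larger cluster) can be sketched with plain Python data structures. This is an illustration on invented reply pairs, not the igraph code behind Figure 4; counting every reply, self-replies included, as an incoming tie is my assumption.

```python
from collections import Counter, defaultdict

# Hypothetical (replier, replied_to) pairs.
replies = [
    ("alice", "OuiShareFest"),
    ("bob", "OuiShareFest"),
    ("OuiShareFest", "alice"),
    ("alice", "bob"),      # alice -> bob -> OuiShareFest -> alice closes a cycle
    ("carol", "carol"),    # self-loop: a tweeter continuing their own message
    ("dave", "erin"),      # isolated dyad
]

ties = set(replies)                                   # distinct directed ties
nodes = {user for pair in ties for user in pair}
received = Counter(target for _, target in replies)   # replies received per user
self_loops = [pair for pair in ties if pair[0] == pair[1]]


def weak_components(node_set, tie_set):
    """Group nodes into components, ignoring the direction of ties."""
    neighbours = defaultdict(set)
    for source, target in tie_set:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen, components = set(), []
    for start in sorted(node_set):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


components = weak_components(nodes, ties)
print(len(nodes), len(ties), received.most_common(1), self_loops, components)
```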
Now, my own experience of tweeting at the Fest suggested that tweets were multilingual. Apart from the fourth day, there seemed to be a large number of French-speaking participants. A quick-and-dirty (for now) language detection exercise revealed that roughly 60% of tweets were in English and 25% in French, the rest being split between other languages, especially German, Spanish, and Catalan. So, did people reply to each other based on the language of their tweets? It turns out that quite a few tweeters were involved in conversations in multiple languages. Figure 5 is a variant of Figure 4, colouring nodes and ties differently depending on language. A nice mix: interestingly, the central cluster is not monolingual and is in fact held together by a few multilingual tweeters, albeit small ones.
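One possible quick-and-dirty approach to language detection is to count common function words per language; the text above does not say which method was actually used, so the sketch below is only an assumption. The stopword lists are deliberately tiny and the tweets are invented.

```python
from collections import Counter, defaultdict

# Tiny illustrative stopword lists; a real exercise would need far larger ones.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "for"},
    "fr": {"le", "la", "les", "et", "est", "des", "pour"},
    "es": {"el", "los", "y", "es", "del", "para", "que"},
}


def guess_language(text):
    """Return the language whose stopwords appear most often, or 'unknown'."""
    words = text.lower().split()
    scores = {lang: sum(w in vocab for w in words) for lang, vocab in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


# Hypothetical (author, text) pairs.
tweets = [
    ("alice", "The future of work is in the commons"),
    ("alice", "Le futur du travail est pour les communs"),
    ("bob", "Sharing is the key to trust"),
    ("carol", "El futuro del trabajo es para todos"),
    ("carol", "Los comunes y el futuro"),
]

language_counts = Counter(guess_language(text) for _, text in tweets)

languages_by_user = defaultdict(set)
for author, text in tweets:
    languages_by_user[author].add(guess_language(text))

# Tweeters who posted in more than one language.
multilingual = sorted(u for u, langs in languages_by_user.items() if len(langs) > 1)
print(language_counts, multilingual)
```

Tagging each tweeter with the set of languages they used is what makes it possible to colour nodes and ties by language, as in Figure 5.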
Let’s turn now to mentions: who are the most mentioned tweeters? Again, I’ll take out of the analysis @OuiShareFest, hugely ahead of anyone else with 832 mentions received. Below, Figure 6 ranks the most mentioned: mostly companies (partners or sponsors of the event such as MAIF), speakers (such as Nathan Schneider, Nilofer Merchant), and key OuiShare personalities (such as Antonin Léonard). Mentions follow the programme of the event, and the most mentioned are the people and organizations that play a role in shaping it.
Mentions are also a basis to build another social network – of who mentions whom in a tweet. This will be a larger network compared to the net of replies, as mentions can be of many types and also include retweets (which, as we saw above, are very numerous here). There are 17248 mentions (some of which are repeated more than once) in the network. They involve 796 users who mention others and are mentioned in turn; 550 users who are mentioned, but do not themselves mention anyone; and 1680 users who mention others, but are not themselves mentioned.
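The three groups of users can be obtained with simple set operations once mentions are extracted from tweet texts. The sketch below assumes that a mention is any `@handle` in the text (which also captures the `RT @user` prefix of retweets); the tweets and handles other than the official account are invented.

```python
import re

# Hypothetical (author, text) pairs.
tweets = [
    ("alice", "Looking forward to @OuiShareFest with @bob"),
    ("bob", "RT @alice: see you there"),
    ("carol", "Thanks @sponsor for the coffee"),
    ("dave", "Live stream from @OuiShareFest is great"),
]

MENTION = re.compile(r"@(\w+)")

# One (mentioner, mentioned) pair per @handle found in each tweet.
mention_pairs = [
    (author, handle)
    for author, text in tweets
    for handle in MENTION.findall(text)
]

mentioners = {source for source, _ in mention_pairs}
mentioned = {target for _, target in mention_pairs}

both = mentioners & mentioned             # mention others and are mentioned
only_mentioned = mentioned - mentioners   # mentioned, but mention no one
only_mentioning = mentioners - mentioned  # mention others, never mentioned
print(len(mention_pairs), both, only_mentioned, only_mentioning)
```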
A large network such as this is more difficult to visualize meaningfully, and I had to introduce some simplifications to do so. I have included only pairs in which one had mentioned the other at least twice: this makes a network of 778 nodes with 2222 ties. The color of nodes depends on their modularity class (a group of nodes that are more connected with one another than with any other nodes in the network) and their size depends on the number of mentions received. You will clearly recognize, at the center of the network, the official @OuiShareFest account, which structures the bulk of the conversations. But even intuitively, other actors seem central as well, and their role deserves to be examined more thoroughly (in some future, less preliminary analysis).
This analysis is part of a larger research project, “Sharing Networks”, led by Antonio A. Casilli and myself, and dedicated to the study of the emergence of communities of values and interest at the OuiShare Fest 2016. Twitter networks will be combined with other data on networking – including informal networking which we are capturing through a (perhaps old-fashioned, but still useful!) survey.
The analyses and visualizations above were done with the packages twitteR and igraph in R; Figure 7 was produced with Gephi.
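The simplification step, keeping only pairs where one user mentioned the other at least twice, can be sketched as a count-and-filter over directed pairs. The pairs below are invented, and the sketch leaves out the modularity classes, which were computed with dedicated network software.

```python
from collections import Counter

# Hypothetical (mentioner, mentioned) pairs, with repetitions.
mention_pairs = [
    ("alice", "OuiShareFest"), ("alice", "OuiShareFest"),
    ("bob", "OuiShareFest"),
    ("bob", "alice"), ("bob", "alice"), ("bob", "alice"),
    ("carol", "dave"),
]


def filter_repeated(pairs, min_count=2):
    """Keep directed pairs observed at least `min_count` times, with their weight."""
    weights = Counter(pairs)
    return {pair: n for pair, n in weights.items() if n >= min_count}


strong_ties = filter_repeated(mention_pairs)
kept_nodes = {user for pair in strong_ties for user in pair}

# Node size in the figure: number of mentions received along the kept ties.
mentions_received = Counter()
for (_, target), weight in strong_ties.items():
    mentions_received[target] += weight
print(strong_ties, kept_nodes, mentions_received)
```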