Posts Tagged ‘Twitter data’

Science XXL: digital data and social science

Last week I attended (unfortunately only in part) an interesting workshop on the effects of today’s abundance and diversity of digital data on social science practices, aptly called “Science XXL”. A variety of topics were discussed and different research experiences were shared, but I’ll just summarize here a few lessons learned that I find interesting.

  • Digital data are archive data. Data retrieved automatically from the digital traces of individual actions, such as those mined from the APIs of platforms like Twitter, are unlike survey data in that they were not originally recorded for research purposes. The researcher must select relevant records on the basis of some understanding of the conditions under which these data were produced. Perhaps ironically, digital data share this characteristic with data from historical or literary archives.
  • Digital data are not necessarily “big”, in the sense that their volume is often small (at least in social science research so far!), even though they may share other characteristics of big data such as velocity (being generated on the fly as people use digital platforms) or variety (having little or no structure).
  • Digital data can help fill gaps in survey data, for example when survey sampling is not statistically representative: detail and volume can provide extra information that supports general conclusions.
  • Non-clean data, outliers and aberrant observations may be very informative, revealing details that would escape attention if researchers focused only on the average or center of the distribution (the normal distribution cherished in classical statistical approaches). Special cases are no longer a prerogative of qualitative research.
  • Data analysis is a key ingredient of “computational social science”, a field that is growing in importance after an initial phase in which it was largely confined to agent-based simulation and complexity theory.

Twitter networks at the OuiShare Fest Barcelona 2016

Twitter conversations are one way in which participants in an event engage with the programme, comment on and discuss the talks they attend, and prolong question-and-answer sessions. Twitter feeds have become part of the official communication strategy of major events and serve documentation and information purposes, both for attendees and for outsiders. While tweeting is becoming more and more a prerogative of “official” accounts in charge of event communication, it is also a potential tool in the hands of each participant, allowing anyone to join the conversation, at least in principle. I discussed earlier how the Twitter discussion networks formed at the OuiShare Fest 2016, a major gathering of the collaborative economy community that took place last May in Paris, were one opportunity to see such mechanisms in place.

Here is a similar analysis, performed after the OuiShareFest Barcelona – the Spanish-language version of the event, which I had the chance to attend last week. This event is smaller than its Paris counterpart but nonetheless impressive: I mined 3497 tweets with the official hashtag of the event, #OSfestBCN, mostly written during the two days of the event (my count stopped the day after). Do Twitter #OSfestBCN conversations describe the community?

First, when did people tweet? As often happens, there are more tweets on the first day of the event than on the second, and there are more tweets during the first hours of each day, though the difference between morning and afternoon is not dramatic; tweeting declines only at night, when the fest’s activities are suspended. Online activity is not independent of what happens on the ground – quite the contrary, it follows the timings of physical activity.

[Figure: #OSfestBCN tweets over time, days 1 and 2]
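
The counting behind a tweets-over-time chart like this one can be sketched in a few lines. This is a minimal illustration in Python, not the R workflow used for the figure; the timestamps are invented and the function names `tweets_per_hour` and `tweets_per_day` are my own placeholders.

```python
from collections import Counter
from datetime import datetime

# Invented timestamps for illustration; not the real #OSfestBCN data.
timestamps = [
    "2016-01-01T09:05:00",
    "2016-01-01T09:40:00",
    "2016-01-01T15:10:00",
    "2016-01-02T09:20:00",
    "2016-01-02T23:55:00",
]


def tweets_per_hour(stamps):
    """Count tweets in each (date, hour) bin."""
    counts = Counter()
    for stamp in stamps:
        t = datetime.fromisoformat(stamp)
        counts[(t.date().isoformat(), t.hour)] += 1
    return counts


def tweets_per_day(stamps):
    """Count tweets on each calendar date."""
    return Counter(datetime.fromisoformat(s).date().isoformat() for s in stamps)


hourly = tweets_per_hour(timestamps)
daily = tweets_per_day(timestamps)

# Print the hourly bins in chronological order.
for (day, hour), n in sorted(hourly.items()):
    print(f"{day} {hour:02d}:00  {n}")
```

Comparing `daily` across dates gives the first-day versus second-day contrast, while `hourly` shows the morning peaks and the night-time decline.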

Who tweeted most? Obviously the official @OuiShare_es account, which published 630 tweets – nine times as many as the second in the ranking. Those who follow immediately are all individuals, with between 50 and 70 tweets each.

Who tweeted with whom? What interests me most are conversations – who interacts with whom. The most explicit way of seeing this with Twitter data is to look at replies: who replied to whom. This corresponds to a small social network of 134 tweeters (the coloured points in the next figure). Ties among them are represented as lines in the figure, and the size of points depends on the number of their incoming ties, that is, the number of replies received. Beyond the official @OuiShare_es account, several tweeters receive a lot of replies: they are mostly speakers, track leaders, or otherwise important actors in the community.

[Figure: network of replies]
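
To make the construction concrete, here is a hedged Python sketch (again, not the R code behind the figure) of how a reply network and its point sizes could be derived from a list of who replied to whom. The account names and the helper `reply_network` are hypothetical.

```python
from collections import Counter

# Invented (replier, replied_to) pairs; the account names are placeholders.
replies = [
    ("ana", "official"),
    ("ben", "official"),
    ("carla", "ana"),
    ("official", "ana"),
    ("ben", "ana"),
]


def reply_network(pairs):
    """Return the set of directed ties and the number of replies each user received."""
    ties = set(pairs)
    received = Counter(target for _, target in pairs)
    return ties, received


ties, received = reply_network(replies)

# Every user who appears at either end of a tie is a node of the network.
nodes = {user for tie in ties for user in tie}

# Replies received would drive the size of each point in a drawing.
for user in sorted(nodes):
    print(user, received[user])
```

Users who only send replies, like `carla` here, still appear as nodes but get the smallest points.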

Now, who tweeted about whom? This is also an important aspect of Twitter conversations. We can capture it with the social network of mentions, associating each tweeter with those they mentioned, and counting the number of times they did so. This will be a larger network (with 2553 mentions) compared to the net of replies, as mentions can be of many types and also include retweets.
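
As a rough illustration of this counting step, the Python sketch below extracts @-mentions from tweet text with a deliberately simple regular expression and tallies how often each author mentions each account. The tweets are invented, the pattern is an assumption (it would also match things like e-mail addresses), and this is not the procedure used for the analysis in this post.

```python
import re
from collections import Counter

# Simplistic pattern for @-mentions; an assumption for this sketch only.
MENTION = re.compile(r"@(\w+)")

# Invented tweets as (author, text) pairs.
tweets = [
    ("ana", "Great talk by @ben at #OSfestBCN"),
    ("ana", "RT @ben: sharing is caring"),
    ("carla", "Thanks @ana and @ben!"),
    ("ben", "no mentions here"),
]


def mention_weights(items):
    """Count how many times each author mentions each account."""
    weights = Counter()
    for author, text in items:
        for name in MENTION.findall(text):
            weights[(author, name.lower())] += 1
    return weights


weights = mention_weights(tweets)
for (source, target), n in sorted(weights.items()):
    print(f"{source} -> {target}: {n}")
```

Note that the retweet in the second toy tweet counts as a mention of `ben`, which is why a mentions network is larger than a replies network.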

The figure below represents the network of mentions. As before, the colored points are tweeters (the larger they are, the more often they have been mentioned by others), while lines between them are mentions (the thicker they are, the more times one user has mentioned the other). Colors represent “modularity” classes: clusters of nodes whose internal connections are stronger than the connections they have with nodes in other clusters. So, for example, a purple node is more likely to have mentioned other purple nodes than blue nodes.

Modularity is computed based only on counts of ties, without considering the nature of the conversations (what the mention is about) or other qualities of nodes (gender, nationality, language of tweeters, etc.). And yet, it clearly identifies specific sub-communities. The very numerous, central purple nodes are the OuiShare community: connectors, activists, and others close to the organization, especially within Spain. The green nodes at the bottom-left are the Catalan community, including representatives of local authorities, notably the Barcelona municipality. The blue nodes at the bottom are different actors and groups from other parts of Spain. The few black nodes on the left are the international OuiShare community, and the sparse orange ones at the top are other international actors.

[Figure: network of mentions, coloured by modularity class]
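
For readers curious about what the modularity score measures, here is a small Python sketch under simplifying assumptions: ties are treated as undirected, and the partition is given rather than searched for (Gephi finds the classes itself with a community-detection algorithm). The toy graph and the function name `modularity` are mine, not part of the analysis above.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, weighted graph.

    edges: iterable of (u, v, weight); community: dict mapping node -> label.
    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2,
    where m is the total edge weight, L_c the weight of ties inside c,
    and d_c the summed weighted degree of the nodes in c.
    """
    m = 0.0
    inside = defaultdict(float)
    degree = defaultdict(float)
    for u, v, w in edges:
        m += w
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single tie (all weights 1).
edges = [
    ("a", "b", 1), ("b", "c", 1), ("a", "c", 1),
    ("d", "e", 1), ("e", "f", 1), ("d", "f", 1),
    ("c", "d", 1),
]

two_groups = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(round(q_split, 4), round(q_single, 4))
```

Splitting the toy graph into its two triangles scores higher than lumping everything together, which is the sense in which the coloured clusters have stronger internal than external connections.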

This analysis is part of a larger research project, “Sharing Networks”, led by Antonio A. Casilli and myself, and dedicated to the study of the emergence of communities of values and interest at the OuiShare Fest 2016. Twitter networks will be combined with other data on networking – including informal networking which we are capturing through a (perhaps old-fashioned, but still useful!) survey.

The analyses and visualizations above were done with the twitteR package in R as well as Gephi.

Twitter networks at the OuiShare Fest 2016

Twitter conversations are one way in which participants in an event engage with the programme, comment on and discuss the talks they attend, and prolong question-and-answer sessions. Twitter feeds have become part of the official communication strategy of major events and serve documentation and information purposes, both for attendees and for outsiders. While tweeting is becoming more and more a prerogative of “official” accounts in charge of event communication, it is also a potential tool in the hands of each participant, allowing anyone to join the conversation, at least in principle. Participants may become aware of each other, perhaps using the opportunity of the event to meet face-to-face, start relationships and even collaborations. A Nesta study stressed the potential for using social media data to attain a quantitative understanding of events and their impacts on participants’ networks.

The OuiShare Fest 2016, a major gathering of the collaborative economy community that took place last week in Paris, was one opportunity to see such mechanisms in place. Tweeting was easy – with an official hashtag, #OSFEST16, although related hashtags were also widely used. I mined a total of 12440 tweets over the four days of the event. Do Twitter conversations related to the Fest bring to light the emergence of a community? While it’s too early for any deep analysis, some descriptive results can already be shown.

First, when did people tweet? Mostly at the beginning of each day’s programme (9am on the first two days, 2pm on the third day). Tweeting was more intense on the first day and declined over time (Figure 1). The comparatively low participation on the fourth day is due to the fact that the format was different – an open day in French (rather than an international conference in English), whereby local people were free to come and go. Online activity is not independent of what happens on the ground – quite the contrary, it follows the timings of physical activity.

Figure 1: Tweets over time.

Who tweeted most? Our dataset has a predictable outlier, the official @OuiShareFest Twitter account, who published 727 tweets – twice as many as the second in the ranking. But let’s look at the people who had no obligation to tweet, and still did so: who among them contributed most to documenting the Fest? Figure 2 shows the presence of some other institutional accounts among the top 10, but the most active include a few individual participants. Ironically, one of them was not even physically present at the Fest, and followed the live video streaming from home. In this sense, Twitter served as an interface between event participants and interested people who couldn’t make it to Paris.

Figure 2: Ten most active tweeters (excluding @OuiShareFest).
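
A ranking of this kind boils down to counting tweets per author after setting aside the official account. The sketch below is a hypothetical Python illustration with invented account names; `top_tweeters` is a placeholder of mine, not a function from the R packages used here.

```python
from collections import Counter

# Invented list with one entry per tweet, giving its author.
authors = ["official"] * 6 + ["ana"] * 3 + ["ben"] * 2 + ["carla"]


def top_tweeters(names, exclude=(), n=10):
    """Rank authors by number of tweets, leaving out the excluded accounts."""
    counts = Counter(name for name in names if name not in exclude)
    return counts.most_common(n)


ranking = top_tweeters(authors, exclude={"official"}, n=2)
print(ranking)
```

Passing the outlier in `exclude` rather than deleting it from the data keeps the full dataset available for the other analyses.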

What was the proportion of tweets, replies and retweets? Original tweets are interesting for their unique content (what are people talking about?), while replies and retweets are interesting because they reveal social interactions – dialogue, endorsement or criticism between users. Figure 3 shows that the number of replies is small compared to tweets and retweets.

Figure 3: Tweets, replies and retweets
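
The three-way split can be sketched as a simple classification rule. In this Python illustration the field names `is_retweet` and `reply_to` are assumptions about how each tweet record is stored, and the records themselves are invented.

```python
from collections import Counter

# Invented records; the field names are hypothetical.
tweets = [
    {"text": "Opening keynote!", "is_retweet": False, "reply_to": None},
    {"text": "RT @ana: Opening keynote!", "is_retweet": True, "reply_to": None},
    {"text": "@ana agreed", "is_retweet": False, "reply_to": "ana"},
    {"text": "RT @ben: lunch time", "is_retweet": True, "reply_to": None},
]


def tweet_type(tweet):
    """Classify a record as 'retweet', 'reply' or original 'tweet'."""
    if tweet["is_retweet"]:
        return "retweet"
    if tweet["reply_to"] is not None:
        return "reply"
    return "tweet"


type_counts = Counter(tweet_type(t) for t in tweets)
print(dict(type_counts))
```

Checking the retweet flag first is a design choice of this sketch: it means a retweeted reply is counted once, as a retweet.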

Let’s now look closer at the replies. By taking who replied to whom, we can build a social network of conversations between a group of tweeters. It’s a relatively small network of 311 tweeters (the coloured points in Figure 4), with 321 ties among them (the lines in Figure 4). The size of points depends on the number of their incoming ties, that is, the number of replies received: even if the points haven’t been labelled, I am sure you can tell immediately which one represents the official @OuiShareFest account… the usual suspect! But let’s look at the network structure more closely. Some ties are self-loops, that is, people replying to themselves. (Let’s be clear, it’s not a sign of social isolation, but simply a consequence of the 140-character limit imposed by Twitter: self-replies are meant to deliver longer messages). A lot of other participants are involved in just simple dyads or small chains (A replies to B who replies to C, but then C does not reply to A), unconnected to the rest. There is a larger cluster formed around the most replied-to users: here, some closure becomes apparent (A replies to B who replies to C who replies to A) and enables this sub-network to grow.

Figure 4: the network of replies.
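
The structures described here (self-loops, isolated dyads and chains, closed triads) can be detected mechanically. The following Python sketch does so on an invented toy network; it is only meant to make the definitions concrete and is not the igraph code used for the figure.

```python
from collections import defaultdict

# Invented directed reply ties (replier, replied_to).
edges = [
    ("a", "a"),                          # self-reply
    ("b", "c"),                          # isolated dyad
    ("d", "e"), ("e", "f"),              # open chain: f never replies to d
    ("g", "h"), ("h", "i"), ("i", "g"),  # closed triad
]


def self_loops(ties):
    """Ties in which a user replies to themselves."""
    return [(u, v) for u, v in ties if u == v]


def closed_triads(ties):
    """Sets of three users where A replies to B, B to C, and C to A."""
    succ = defaultdict(set)
    for u, v in ties:
        if u != v:
            succ[u].add(v)
    found = set()
    for u, targets in list(succ.items()):
        for v in targets:
            for w in succ.get(v, ()):
                if w != u and u in succ.get(w, ()):
                    found.add(frozenset((u, v, w)))
    return found


def component_sizes(ties):
    """Sizes of the connected pieces of the network, ignoring tie direction."""
    neighbours = defaultdict(set)
    for u, v in ties:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, sizes = set(), []
    for start in list(neighbours):
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        sizes.append(size)
    return sorted(sizes)


print(self_loops(edges))
print(closed_triads(edges))
print(component_sizes(edges))
```

In the toy data the chain d, e, f and the triad g, h, i have the same size, but only the triad shows closure.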

Now, my own experience of tweeting at the Fest suggested that tweets were multilingual. Apart from the fourth day, there seemed to be a large number of French-speaking participants. A quick-and-dirty (for now) language detection exercise revealed that roughly 60% of tweets were in English, 25% in French, the rest being split between other languages, especially German, Spanish, and Catalan. So, did people reply to each other based on the language of their tweets? It turns out that quite a few tweeters were involved in conversations in multiple languages. Figure 5 is a variant of Figure 4, colouring nodes and ties differently depending on language. A nice mix: interestingly, the central cluster is not monolingual and is in fact kept together by a few multilingual tweeters, even though their nodes are small.

Figure 5: the network of replies, by language.
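
Once each tweet carries a language label, the shares and the multilingual tweeters follow from simple counting. The Python sketch below assumes the labels already exist (the detection step itself is not shown) and uses invented data and placeholder function names.

```python
from collections import Counter, defaultdict

# Invented (author, language) labels, one per tweet.
labelled = [
    ("ana", "en"), ("ana", "fr"), ("ben", "en"),
    ("carla", "fr"), ("ben", "en"),
]


def language_shares(rows):
    """Fraction of tweets written in each language."""
    counts = Counter(lang for _, lang in rows)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def multilingual_authors(rows):
    """Authors who tweeted in more than one language."""
    langs = defaultdict(set)
    for author, lang in rows:
        langs[author].add(lang)
    return {author for author, used in langs.items() if len(used) > 1}


shares = language_shares(labelled)
bridges = multilingual_authors(labelled)
print(shares)
print(bridges)
```

The authors returned by `multilingual_authors` are the candidates for the bridging role that holds a mixed-language cluster together.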

Let’s turn now to mentions: who are the most mentioned tweeters? Again, I’ll take out of the analysis @OuiShareFest, hugely ahead of anyone else with 832 mentions received. Below, Figure 6 ranks the most mentioned: mostly companies (partners or sponsors of the event such as MAIF), speakers (such as Nathan Schneider, Nilofer Merchant), and key OuiShare personalities (such as Antonin Léonard). Mentions follow the programme of the event, and the most mentioned are people and organizations that play a role in shaping it.

Figure 6: Most mentioned tweeters.

Mentions are also a basis to build another social network – of who mentions whom in a tweet. This will be a larger network compared to the net of replies, as mentions can be of many types and also include retweets (which, as we saw above, are very numerous here). There are 17248 mentions (some of which occur more than once) in the network. They involve 796 users who mention others and are mentioned in turn; 550 users who are mentioned, but do not themselves mention anyone; and 1680 users who mention others, but are not themselves mentioned.
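
The three groups of users are just set operations on who sends and who receives mentions. Here is a minimal Python sketch with invented pairs; `mention_roles` is a name I made up for the illustration.

```python
# Invented (mentioner, mentioned) pairs; repeats are allowed.
mentions = [
    ("ana", "ben"),
    ("ben", "ana"),
    ("carla", "ana"),
    ("ana", "dora"),
    ("ana", "ben"),
]


def mention_roles(pairs):
    """Split users by whether they mention others, are mentioned, or both."""
    senders = {source for source, _ in pairs}
    receivers = {target for _, target in pairs}
    return {
        "both": senders & receivers,
        "only_mentioned": receivers - senders,
        "only_mentioning": senders - receivers,
    }


roles = mention_roles(mentions)
for role, users in roles.items():
    print(role, sorted(users))
```

Since the three sets never overlap, their sizes add up to the total number of users; with the counts reported above, that is 796 + 550 + 1680 = 3026.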

A large network such as this is more difficult to visualize meaningfully, and I had to introduce some simplifications to do so. I have included only pairs in which one had mentioned the other at least twice: this makes a network of 778 nodes with 2222 ties. The color of nodes depends on their modularity class (a group of nodes that are more connected with one another than with any other nodes in the network) and their size depends on the number of mentions received. You will clearly recognize at the center of the network the official @OuiShareFest account, which structures the bulk of the conversations. But even intuitively, other actors seem central as well, and their role deserves to be examined more thoroughly (in some future, less preliminary analysis).

Figure 7: Network of Twitter mentions
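
The simplification used for this figure, keeping only pairs with at least two mentions, amounts to a threshold on tie counts. The Python sketch below shows the idea on invented data; it reads "at least twice" as a count per directed pair, which is my assumption, and `frequent_ties` is a hypothetical helper.

```python
from collections import Counter

# Invented (mentioner, mentioned) pairs; repeated pairs are repeated mentions.
mentions = [
    ("ana", "ben"), ("ana", "ben"),
    ("ben", "carla"),
    ("carla", "ana"), ("carla", "ana"), ("carla", "ana"),
    ("dora", "ana"),
]


def frequent_ties(pairs, minimum=2):
    """Keep directed pairs in which the first user mentioned the second at least `minimum` times."""
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n >= minimum}


kept = frequent_ties(mentions)

# Only users involved in at least one surviving tie remain in the network.
kept_nodes = {user for pair in kept for user in pair}
print(kept)
print(sorted(kept_nodes))
```

Users whose only ties fall below the threshold, like `dora` here, drop out of the picture entirely, which is how the filter shrinks both the tie count and the node count.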

This analysis is part of a larger research project, “Sharing Networks”, led by Antonio A. Casilli and myself, and dedicated to the study of the emergence of communities of values and interest at the OuiShare Fest 2016. Twitter networks will be combined with other data on networking – including informal networking which we are capturing through a (perhaps old-fashioned, but still useful!) survey.

The analyses and visualizations above were done with the packages twitteR and igraph in R; Figure 7 was produced with Gephi.