The “Impact of Social Science” blog of the London School of Economics has, in the past few weeks, published a series on “Philosophy of data science”. Each installment is an interview conducted by sociologist Mark Carrigan with a key contributor to the social science reflection on data.
I gave a presentation on the topic of “Data and social networks: empowerment and new uncertainties” at the Better Decisions Forum on Big Data and Open Data that took place in Rome on 12 November 2014. The event brought together six speakers from different backgrounds on a variety of topics related to data, and participants were businesspeople, public administration managers, journalists, data and computer scientists.
Here is a video of my talk:
Unfortunately, as you will have noticed, the slides are not always clearly visible, so it is better to download them from their original source:
My interview before my talk:
See? I am trying to stick to my 1st-January commitment of blogging more this year…
How many people do you know? How many friends do you have? You may have tried to count your contacts on Facebook or other social networking websites. You may even have felt a bit weird realizing that your “real” friends, those you can rely on, are just a handful. Unexpected as it might seem, business professionals have this question in mind too: they want to get a sense of the potentially usable social capital of their associates and employees.
Social research has investigated this matter intensely and can offer insight. There are, in fact, two aspects to be considered: the size of personal networks and the effects of online communication on socialisation.
The size of personal networks
Let us start with the size of personal networks. A milestone in this debate is the so-called “Dunbar’s number”, based on a 1992 study by Oxford anthropologist Robin Dunbar. The idea is that human cognitive capacities, as measured by the size of the neocortex, lead to a network size of around 148 (with some range of variation). The original study compared the size of the neocortex in various groups of primates and humans and referred to cohesive communities. The resulting limit indicates the number of people with whom one can maintain “stable” social relationships, i.e., know who each contact is, and how they are related to one another.
Other parts of the brain may be involved too, neuroscientists suggest: Lisa Barrett and her co-authors (2010) found a correlation between amygdala volume and social network size in humans. (I understand that the amygdala is the part of the brain that regulates emotional responses and aggression, while the neocortex to which Dunbar referred is the part of the brain that presides over higher mental functions; see this blog post for further information.)
From a social network analysis perspective, it is also important to define which social network we are measuring. Peter Marsden (1987) distinguished “core” networks from whole personal networks, pointing out that even when people have many friends, there are only a handful with whom they “can discuss important matters”. In this sense, core networks may not include more than five or six people. So if you thought you had very few friends, you shouldn’t feel weird after all… apparently the Portuguese have a saying, “You have five friends, and the rest is landscape.”
On the other hand, your full network, which also includes mere acquaintances and weaker ties, may be much larger than Dunbar’s number: counts of full networks taken by Peter Killworth, H. Russell Bernard, Chris McCarty and co-authors in the 1990s and 2000s went up to about 1500 for the average American. From these, they extracted more meaningful measures of the networks that are really relevant for people’s daily lives and came up with other numbers: they found a mean personal network size of 290 (about twice the Dunbar number!); more recently, Matthew Salganik and his co-authors (2010) have come up with an even larger size of 610 (about twice Killworth’s number…).
Overall, an issue that emerges from many of these discussions is that cognitive capacities (however defined) matter primarily because they are associated with a basic limitation of all living beings: time is finite. Therefore, increasing the size of one’s personal network implies that less time is available for each contact: the size of the overall network increases, but the size of the core network doesn’t. Weak ties may gain at the expense of strong ties.
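The time-budget argument can be made concrete with a little arithmetic. Here is a minimal sketch, assuming a purely hypothetical fixed budget of 20 hours of social time per week; only the network sizes (148, 290, 610) come from the studies mentioned above, and the function name is mine:

```python
# Illustrative arithmetic only: the weekly budget is a made-up assumption,
# not an estimate taken from any of the studies cited in the text.
WEEKLY_SOCIAL_HOURS = 20.0  # hypothetical fixed time budget


def average_minutes_per_contact(network_size, budget_hours=WEEKLY_SOCIAL_HOURS):
    """Average weekly minutes available per contact if time is split evenly."""
    if network_size <= 0:
        raise ValueError("network_size must be positive")
    return budget_hours * 60 / network_size


# Network sizes quoted in the text: Dunbar, Killworth et al., Salganik et al.
for size in (148, 290, 610):
    minutes = average_minutes_per_contact(size)
    print(f"network of {size}: about {minutes:.1f} minutes per contact per week")
```

Of course nobody splits time evenly across contacts; the point is only that, with a fixed budget, the average necessarily falls as the network grows, so any time given to new weak ties has to come from somewhere.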
Data visualisation is still relatively uncommon in the social sciences, and is not normally expected to be part of the standard work of a scholar (contrary, some would say, to what happens in the sciences, where visualisation is sometimes necessary to figure out the properties of objects whose existence is proven, but which cannot be seen). Yet data visualisation has an extraordinary history of accomplishments even in the social realm, as cleverly documented in a forthcoming article by James Moody and Kieran Healy; and classics such as Pierre Bourdieu valued it and attempted to use it in at least some of their work, as Baptiste Coulmont interestingly reported in a blog post.
Yet the digital age offers new opportunities for data visualisation that are largely unexploited in the social sciences. It becomes a tool not only for the researcher (to explore data prior to conducting statistical analyses, or to present results once the work is done) but also for the general user, the study subject, the beneficiary of any policy under discussion, and the general public. As theorists in the arts and digital humanities (but not much in the social sciences, I am afraid) have noticed, the Internet and all digital infrastructures are today becoming interfaces to databases, and users of all types are immersed in a world of data in a way that was unknown before. This means that data visualisations can have new and more transformative uses, empowering study subjects and people in general by offering them intuitive and aesthetically appealing tools to better navigate this digital world. But it also involves new dangers, as to who sets the agenda and what aspects or characteristics of the data are being stressed; data are not just objective, ‘raw’ materials but mediated ones, and the choice of how to make them perceptible to the senses is not neutral.
At the annual conference of the British Sociological Association today in Leeds, in the Methodological Innovations Stream, I am presenting data visualisation work I have done with colleagues Antonio A. Casilli, Lise Mounier and Fred Pailler, as well as data visualiser Quentin Bréant, as part of the research project ANAMIA. We developed three tools: one for data collection, one for data exploration and preliminary analysis, and one as a basis for heuristics and presentation of results. The first was for our study subjects, the second for us researchers and our colleagues, the third for us and the larger public. My slides are available:
A now classical result of the sociology of social networks is the distinction between formal social structures defined by kinship, inherited hierarchy or companies’ organisational charts, and informal structures arising from nets of friendship, trust, solidarity, similarities and dissimilarities. As far back as 1954, John A. Barnes (who, incidentally, is credited with coining the term ‘social networks’) demonstrated, in a renowned study of a small community of fishers in a Norwegian parish, that exogenously defined positions such as those arising from political administration, economic activity or family are insufficient to explain the social structure of the community, which largely depends on less codified relationships of friendship and acquaintance. In organisational studies, it appeared that the formal chart of a company and the actual networks of advice, trust or communication among its members may differ widely, and surveys aimed at eliciting network ties (with ‘name generators’ for example) became a privileged means to bring to light the ‘company behind the chart’ (Krackhardt & Hanson 1993) and to make ‘invisible work visible’ (Cross, Borgatti & Parker 2002). Social network scholars advised managers on how, by using employee questionnaires, they could generate network maps and get to the root of many organisational problems. Another major finding was about the emergence of informal roles – the leader, the deviant, the broker – and their important contribution to driving the behaviours and outcomes of human groups, beyond all prescribed, formal authorities (Johnson, Boster & Palinkas 2003).
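To give a feel for what turning questionnaires into a network map involves, here is a minimal sketch. The names, the reporting lines and the answers are all invented toy data, the question wording is only an example of a name generator, and the two helper functions are mine, not anything taken from the studies cited above:

```python
from collections import Counter

# Hypothetical toy data, invented for illustration only.
# Formal chart: each employee -> the manager they report to (None = top).
formal_chart = {"Ana": "Dev", "Ben": "Dev", "Cleo": "Dev", "Dev": None}

# Name-generator answers to a question such as "Whom do you go to for advice?"
advice_answers = {
    "Ana": ["Cleo"],
    "Ben": ["Cleo", "Ana"],
    "Cleo": ["Ana"],
    "Dev": ["Cleo"],
}


def advice_ties(answers):
    """Turn questionnaire answers into directed ties (asker, adviser)."""
    return [(asker, adviser)
            for asker, named in answers.items()
            for adviser in named
            if adviser != asker]  # ignore self-nominations


def times_named(ties):
    """Count how often each person is named as an adviser (in-degree)."""
    return Counter(adviser for _, adviser in ties)


ties = advice_ties(advice_answers)
in_degree = times_named(ties)
formal_reports = Counter(boss for boss in formal_chart.values() if boss is not None)

# Compare position in the formal chart with position in the advice network.
for person in formal_chart:
    print(f"{person}: {formal_reports[person]} formal reports, "
          f"named for advice {in_degree[person]} times")
```

In this made-up example the manager at the top of the chart is never named as a source of advice, while a subordinate is named by almost everyone: exactly the kind of gap between chart and network that these surveys were designed to reveal.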
The research and consultancy activity that built on these ideas had a strong impact on organisational culture worldwide, especially as companies tended to flatten and rely on teams and cross-divisional, project-based work, so that managers’ authority mattered less and understanding these informal networks became a potential key to success. Many would admit today that the organisational chart is the fantasy of the employer, not an actionable tool, and even less a reliable reflection of reality. But then, what are the advice, trust, and communication networks mapped by the researcher – shouldn’t we say they are the fantasy of the sociologist? These networks are built from questionnaires and therefore rely on the subjective responses of participants; and it is well known in survey design research that question wording orients responses, that different cultures and groups tend to interpret questions differently, and that people may give biased answers due to forgetting, deliberate concealment of sensitive information, ambiguity of definitions, and diversity in perceptions. The survey is traditionally the primary tool of investigation of the social networks scholar, but it brings with it its own limitations and distortions.
One may think that the formal organisational chart and the informal advice (or trust or communication) network are just two different ways of construing social structure and objectivating it. They are informed by different political and epistemological orientations: those of (old-style) employers for the former, those of social researchers (and perhaps enlightened employers) for the latter. The resulting formal-informal dichotomy would then be the result of a cleavage between two competing approaches to the management of organisations (and more generally of human groups or communities), one more hierarchical and functional, the other flatter and more collaborative.
Science, like the rest of human life, is subject to fashions. Data visualisation is the latest trend: policy-makers and the public are all under its spell, and researchers magically suspend their disbelief: give me a fancy image, and I won’t look too closely at your p-values. So I was intrigued to discover, at a talk a few days ago by Paul Jackson of the Office for National Statistics, that there are precedents, and that they have a long history behind them.
The story is that of John Snow, an epidemiologist who was persuaded, against the received wisdom of the mid-nineteenth century, that cholera does not propagate through air but through contaminated water or food. But how to convince others? When cholera struck London in 1854, Snow began plotting the location of deaths on a map of Soho: he represented each death through a line parallel to the building front in which the person died.
Snow soon realised that there was a concentration of “death lines” around Broad Street — more specifically, around a water pump at the corner between Broad and Cambridge St.
He managed to convince the authorities to remove the handle of the pump, so that people could no longer use it: within a few days, the number of deaths in the area plummeted. Snow had proven his point and saved lives, using no medical trials and no sophisticated chemistry, just some basic count statistics and a clever dataviz.
A major health data plan is on the verge of being called off, and may never get another chance. It is supposed to anonymise all the patient records in the National Health Service (NHS) in the UK, linking them together into one single, giant database, and making them available under controlled use conditions to health researchers and (controversially) to commercial companies too. Public outcry has led to the plan being delayed for six months.
In an article published in The Guardian last week, Ben Goldacre, a medical doctor and high-profile media commentator on science matters, rightly identifies what the point is: in principle, the public accepts release of data for scientific purposes, but resists commercial exploitation. And rightly so: medical knowledge results from the study of several cases, and the higher the availability of cases, the more accurate the results; in the era of big data, it is also clear that aggregation and sharing of a wealth of data such as those held by the NHS is a unique opportunity for medical science to discover ways of saving lives. On the other hand, use of data for any other purposes looks much more opaque, and people understandably feel it might lead to discrimination and potentially negative individual consequences, for example if disclosure of the health history of a person results in higher insurance premiums, or rejection of job applications.
Network data are among those that are changing fastest these days. When I say I study social networks, people almost automatically think of Facebook or Twitter, without necessarily realizing that networks have been around for, well, the whole history of humanity, long before the internet. Networks are just systems of social relationships, and as such, they can exist in any social context: the family, school, workplace, village, church, leisure club, and so forth. Social scientists started mapping and analysing networks as early as the 1930s. But people didn’t think of their social relationships as “networks” and didn’t always see themselves as “networkers” even if they did invest a lot in their relationships, were aware of them, and cared about them. The term, and the systemic configuration, were just not familiar. There was something inherently informal and implicit about social ties.
What has changed with Facebook and its homologues is that the network metaphor has become explicit. People are now accustomed to talking about “networks”, and think in systemic terms, seeing their own relationships as part of a more global structure. Network ties have become formal: you have to make a clear choice and take action when you add a “friend” on Facebook, or “follow” someone on Twitter; you will have a list of your friends/followers/followees (whatever the specific terminology is) and can monitor changes in this list. You know who the friends of your friends are, and can keep track of how many people viewed your profile, included you in their “lists”, or mentioned you in their Tweets. Now, everyone knows what networks are, so if you are a social network researcher and conduct a survey like in the old days, you need not fear that your respondents may misunderstand. In fact, you may not even need to do a survey at all: the formal nature of online ties, digitally recorded and stored, makes it possible to retrieve your network information automatically. You can just mine network tie data from Facebook, Twitter, or whatever service your target populations happen to be using.
The growth of “big data” changes the very essence of modern markets in an important sense. Big data are nothing but the digital traces of a growing number of people’s daily transactions, activities and movements, which are automatically recorded by digital devices and end up in huge amounts in the hands of companies and governments. Payments by debit and credit cards record timing, place, amount, and identity of payer and payee; supermarket loyalty cards report purchases by type, quantity, price, date; frequent traveler programs and public transport cards log users’ locations and movements; and CCTV cameras in retail centers, buses and urban streets capture details from clothing and gestures to facial expressions.
This means that all our market transactions – purchases and sales – are identifiable, and our card providers know a great deal about our economic actions. Our consumption habits (and income and tastes) may seem more opaque to scrutiny but, at least to some extent, can be inferred from our locations, our movements, and the details of our expenses. If I buy some beer, maybe my supermarket cannot tell much about my drinking; but if I never buy any alcohol, it will have strong reasons to conclude that I am unlikely to get drunk. As data crunching techniques progress (admittedly, they are still in their infancy now), my supermarket will get better and better at gauging my habits, practices and preferences.