On Friday last week, the British Sociological Association (BSA) held an event on “The Challenge of Big Data” at the British Library. It was interesting, stimulating and relevant – I was particularly impressed by the involvement of participants and the very intense live-tweeting, never so lively at a BSA event! And people were particularly friendly and talkative both on their keyboards and at the coffee tables… so in honour of all this, I am choosing the hashtag of the day, #bigdataBL, as the title here.
- The designation of “big data” is from industry, not (social) science, said a speaker at the very beginning. And it is known to be fuzzy. Yet it becomes a relevant object of scientific inquiry in that it is bound to affect society, democracy, the economy and, well, social science.
- Big-data practices change people’s perception of data production and use. Ordinary people are now increasingly aware that a growing range of their actions and activities are being digitally recorded and stored. Data are now a recognized social object.
- Big data needs to be understood in the context of new forms of value production.
- So, social scientists need to take note (and this was the intended motivation of the whole event). The complication is that Big Data matter for social science in two different ways. First, they are an object of study in themselves – what are their implications for, say, inequalities, democratic participation or the distribution of wealth? Second, they offer new methods to be exploited to gain insight into a wide range of (traditional and new) social phenomena, such as consumer behaviours (think of Tesco supermarket sales data).
- Put differently, if you want to understand the world as it is now, you need to understand how information is created, used and stored – that’s what the Big Data business is all about, both for social scientists and for industry actors.
- Among the substantive issues, there is the question of power and empowerment. Censuses were distributed to the whole population regardless of wealth, behaviours and practices. But now, censuses are gradually being reconsidered or phased out in many countries, and replaced with smaller surveys combined with administrative and – at times – big data sources. Won’t this leave out the poorer, those who do not have credit cards or Internet access?
- Who owns the data? Who gets to use it? How do we access “our” personal data? What about privacy protection? These are other substantive issues that were discussed at the event.
- As a tool and method for scientific research, big data have the advantage of being cheap – free software and cloud storage do the trick. But big data require specific expertise. This is a challenge for social science, and one that does not just reflect the old qualitative-quantitative divide: big data analytical tools and methods differ from classical statistics.
- Why do many people capitalize “Big Data”? There’s no particular reason except that it looks more impressive (not necessarily more serious, I must add). Perhaps it’s a way for big data analytical service providers to command a higher price on the market. But the capitalization impresses some social scientists too – can we cope with these data, with our theories and approaches (and our lack of statistical training, some would add)?
- Some scientists – economists in particular, who do have strong, albeit classical, quantitative skills – resist big data for a different reason. Economists care about causality, while big data serve best for prediction. It’s like the crucifix and Dracula, said an attendee.
- The other major methodological issue is that traditional (survey) data, and even most governmental open data, are curated, while big data are not – one attendee defined them as “feral”. The risk of misinterpretation and misuse is high.
- In practice, social scientists who are interested in big data need to redefine disciplinary boundaries and the academia/rest-of-the-world frontier. They need to team up with mathematicians and computer scientists to access the skills needed to analyse big data. And they need to talk with the private sector and say that we can do socially valuable things with their data. Tesco sales figures are not just a tool for better managing stores, but may serve more general goals: better understanding time use, social capital, income inequalities and people’s health awareness, and designing suitable social policies.
More issues were discussed, but this was, in my view, the essence. A visualisation of the Twitter discussions that accompanied the event is available here.