Posts Tagged ‘ Big data ’

The data of my friend are my data

The rise of digital data, particularly data from the internet, is to be understood in a social relational perspective. Online interactions – from email exchanges to use of VOIP services and participation in social media such as Facebook, Twitter and LinkedIn – make people’s social connections explicit and visible. The “social network”, once a metaphor used only in a small sub-field within sociology, is now familiar to everybody as the archetype of computer-mediated social interaction. Digital devices systematically record network structures, so that social ties become an essential part of every individual profile, and users are more and more aware of them.

One consequence of this is the booming popularity of network analysis concepts, which support the algorithms that handle digital data: for example, centrality measures are at the heart of search engine functionalities, and transitivity measures underpin “friend-of-a-friend” algorithms in social media. In passing, social network analysis itself, which was originally developed for small-sized, non-digital datasets (like surveys about friendship in schools), has undergone a major upgrade to account for social data from the web.
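To make these two notions concrete, here is a minimal sketch on a made-up friendship network. The names, the ties and the helper functions are all illustrative assumptions, not the algorithms that any search engine or social media platform actually runs. Degree centrality is only the simplest of many centrality measures, and transitivity is computed as the share of "two-step" paths that are closed into triangles.

```python
from itertools import combinations

# Hypothetical friendship ties (undirected); names are illustrative only.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("D", "E")]


def build_graph(edge_list):
    """Return an adjacency map {node: set of neighbours}."""
    graph = {}
    for u, v in edge_list:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree_centrality(graph):
    """Share of the other nodes that each node is directly tied to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def transitivity(graph):
    """Fraction of connected triples (paths of length two) that are closed."""
    closed = 0
    triples = 0
    for nbrs in graph.values():
        # Every pair of neighbours of a node forms a triple centred on it.
        for u, v in combinations(sorted(nbrs), 2):
            triples += 1
            if v in graph[u]:
                closed += 1
    return closed / triples if triples else 0.0


def friends_of_friends(graph, node):
    """People two steps away: candidates for a 'friend-of-a-friend' suggestion."""
    direct = graph[node]
    return {fof for friend in direct for fof in graph[friend]} - direct - {node}


graph = build_graph(edges)
print(degree_centrality(graph))
print(transitivity(graph))
print(friends_of_friends(graph, "E"))
```

In this toy network C has the most direct ties, and E, who only knows D, would be pointed towards C as a possible new contact.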

More importantly, the relational nature of digital data, and the underlying possibility of using social network analysis, open up new avenues for data collection. If user B publishes a post on, say, their Facebook wall, comments and “likes” received from their friends A, D and E will be connected to the profile of B, accessible and visible from it; in other words, it is possible to retrieve information on A, D or E through the profile of just B. In general social networks, a friend of my friend is my friend; in digital networks, the data of my friends are my data.
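The point can be sketched in a few lines of code. The wall content, the field names and the `data_on_friends` helper below are hypothetical and do not correspond to the data format or API of any real platform; the sketch only shows that reading one profile is enough to collect traces left by several other people.

```python
# Hypothetical public activity on B's wall; all names and fields are illustrative.
wall_of_b = [
    {"author": "B", "type": "post", "text": "Off to the seaside this weekend"},
    {"author": "A", "type": "comment", "text": "Enjoy!"},
    {"author": "D", "type": "like"},
    {"author": "E", "type": "comment", "text": "See you there"},
    {"author": "A", "type": "like"},
]


def data_on_friends(wall, owner):
    """Group the traces that other people left on the owner's wall, by author."""
    traces = {}
    for item in wall:
        if item["author"] != owner:
            traces.setdefault(item["author"], []).append(item["type"])
    return traces


# Only B's profile is read, yet information on A, D and E comes back.
print(data_on_friends(wall_of_b, "B"))
```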


Philosophy of data science

The “Impact of Social Science” blog of the London School of Economics has, in the past few weeks, published a series on “Philosophy of data science”. Each installment is an interview conducted by sociologist Mark Carrigan with a key contributor to the social science reflection on data.


Data and social networks: empowerment and new uncertainties (in Italian)

I gave a presentation on the topic of “Data and social networks: empowerment and new uncertainties” at the Better Decisions Forum on Big Data and Open Data that took place in Rome on 12 November 2014. The event brought together six speakers from different backgrounds on a variety of topics related to data; participants included businesspeople, public administration managers, journalists, and data and computer scientists.

Here is a video of my talk:

 

 

Unfortunately, as you will have noticed, the slides are not always clearly visible, so it’s better to download them from their original source:

slide-1-638

 

My interview before my talk:

 

 

See? I am trying to stick to my 1st-January commitment of blogging more this year…

Sociology in 2014: rediscovering methods

I came back last week from the annual conference of the British Sociological Association (BSA) in Leeds and, as usual, I am trying to put down my impressions while they’re still fresh in my mind. I wasn’t very quick, though, and the BSA’s members newsletter has already come out with comments and short reports about the plenaries, the prizes awarded, and the conference overall. The conference is described there as having been “very vibrant and sociable”, with “exciting conversations” during the breaks and a “diverse mixture of topics” that “reflected the breadth of interests”. My own feelings, I confess, are a bit more mixed.

In 2012, the BSA conference was followed by a lively debate sparked by an article by Aditya Chakrabortty in the Guardian, in which he complained about the discipline’s lack of engagement with the financial crisis. He pointed to the BSA press releases featuring research on “older bodybuilders”, and to time devoted to the “holistic massage industry” at the conference, as evidence of what he saw as a retreat from public space. The BSA took the criticism very seriously and, apart from responding to the Guardian, put in place a massive effort to encourage public engagement. The 2013 conference was entitled “Engaging Sociology” and many sessions were dedicated to showing that the profession means it. Confrontation and comparison with economics was open and clear. A major project on social class was presented with all honours. The Sociology journal released a call for papers for a special issue on “Sociology and the Global Economic Crisis”.

This year, the “Changing Society” title aimed to stress continuity with last year’s efforts; yet it seems to me that we are back to business as usual. I had the impression that many paper presentations were on topics similar to the bodybuilders and massage that Chakrabortty talked about. That’s why, as I said, my feelings are mixed.


#bigdataBL

On Friday last week, the British Sociological Association (BSA) held an event on “The Challenge of Big Data” at the British Library. It was interesting, stimulating and relevant – I was particularly impressed by the involvement of participants and the very intense live-tweeting, never so lively at a BSA event! And people were particularly friendly and talkative both on their keyboards and at the coffee tables… so in honour of all this, I am choosing the hashtag of the day, #bigdataBL, as the title here.

(Visualisation: http://www.digitalcoeliac.com/)

Some highlights:

  • The designation of “big data” is from industry, not (social) science, said a speaker at the very beginning. And it is known to be fuzzy. Yet it becomes a relevant object of scientific inquiry in that it is bound to affect society, democracy, the economy and, well, social science.
  • Big-data practices change people’s perception of data production and use. Ordinary people are now increasingly aware that a growing range of their actions and activities are being digitally recorded and stored. Data are now a recognized social object.
  • Big data needs to be understood in the context of new forms of value production.
  • So, social scientists need to take note (and this was the intended motivation of the whole event). The complication is that Big Data matter for social science in two different ways. First, they are an object of study in themselves: what are their implications for, say, inequalities, democratic participation, or the distribution of wealth? Second, they offer new methods to be exploited to gain insight into a wide range of (traditional and new) social phenomena, such as consumer behaviours (think of Tesco supermarket sales data).
  • Put differently, if you want to understand the world as it is now, you need to understand how information is created, used and stored – that’s what the Big Data business is all about, both for social scientists and for industry actors.


Big Data and social research

Data are not a new ingredient of socio-economic research. Surveys have long served the social sciences; some of them, like the European Social Survey, are (relatively) large-scale initiatives, with multiple waves of observation in several countries; others are much smaller. Some of the data collected were quantitative, others qualitative, or mixed-methods. Data from official and governmental statistics (censuses, surveys, registers) have also been used a lot in social research, owing to their large coverage and good quality. These data are ever more in demand today.

Now, big data are shaking this world. The digital traces of our activities can be retrieved, saved, coded and processed much faster, much more easily and in much larger amounts than surveys and questionnaires. Big data are primarily a business phenomenon, and the hype is about the potential gains they offer to companies (and allegedly to society as a whole). But, as researcher Emma Uprichard says very rightly in a recent post, big data are essentially social data. They are about people, what they do, how they interact together, how they form part of groups and social circles. A social scientist, she says, must necessarily feel concerned.

It is good, for example, that the British Sociological Association is organizing a one-day event on The Challenge of Big Data. It is right that members must engage with it. This challenge goes beyond the traditional qualitative/quantitative divide and the underrepresentation of the latter in British sociology. Big data, and the techniques to handle them, are not statistics, and professional statisticians have trouble with them too. (The figure below is just anecdotal, but clearly suggests how a simple search on the Internet identifies Statistics and Big Data as unconnected sets of actors and ties). The challenge has more to do with the a-theoretical stance that big data seem to involve.

[Figure: TouchGraph visualisation of search results for Statistics and Big Data]


The fuzziness of Big Data

Fascinating as they may be, Big Data are not without problems. Size does not eliminate the problem of quality: because of the very way they are collected, Big Data are unstructured and unsystematized, the sampling criteria are fuzzy, and the classical statistical analyses do not apply very well. The more you zoom in (the more detail you have), the more noise you find, so that you need to aggregate data (that is, to reduce a “big” micro-level dataset to a “smaller” macro one) to detect any meaningful tendency. Analyzing Big Data as they are, without any caution, increases the likelihood of finding spurious correlations – a statistician’s nightmare! In short, processing Big Data is problematic: although we do have sufficient computational capacity today, we still need to refine appropriate analytical techniques to produce reliable results.
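The risk of spurious correlations can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variables are pure random noise, the sample size of 30 and the seed are arbitrary choices, and the function name is made up for the example. Since the variables are independent by construction, any strong correlation found among them is spurious; the more variables you compare, the stronger the strongest chance correlation tends to be.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def strongest_chance_correlation(n_vars, n_obs=30, seed=0):
    """Largest absolute correlation among n_vars independent noise variables."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    variables = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return max(abs(pearson(x, y)) for x, y in combinations(variables, 2))


# With the same seed, the first 5 variables are identical in both runs,
# so the second run can only find an equal or stronger chance correlation.
print(strongest_chance_correlation(5))
print(strongest_chance_correlation(100))
```

With 100 variables there are 4,950 pairs to compare, against only 10 pairs with 5 variables, which is why unguarded searching through very wide datasets turns up "patterns" so easily.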

In a sense, the enthusiasm for Big Data is diametrically opposed to another highly fashionable trend in socioeconomic research: that of using randomized controlled trials (RCTs), as in medicine, or at least quasi-experiments (often called “natural experiments”), which make it possible to collect data under controlled conditions and to detect causal relationships much more clearly and precisely than in traditional, non-experimental social research. These data have a lot more structure and scientific rigor than old-fashioned surveys – just the opposite of Big Data!

This is just anecdotal evidence, but do a quick Google search for images on RCTs vs. Big Data. Here are the first two examples I came across: on the left are RCTs (from a dentistry course), on the right are Big Data (from a business consultancy website). The former conveys order, structure and control, the latter a sense of being somewhat lost, or of not knowing where all this is heading… Look for other images: I’m sure the great majority won’t be that different from these two.

[Image: RCTs (left) vs. Big Data (right)]


Big Data redefine what “markets” are

The growth of “big data” changes the very essence of modern markets in an important sense. Big data are nothing but the digital traces of a growing number of people’s daily transactions, activities and movements, which are automatically recorded by digital devices and end up in huge amounts in the hands of companies and governments. Payments by debit and credit cards record timing, place, amount, and identity of payer and payee; supermarket loyalty cards report purchases by type, quantity, price, date; frequent traveler programs and public transport cards log users’ locations and movements; and CCTV cameras in retail centers, buses and urban streets capture details from clothing and gestures to facial expressions.

This means that all our market transactions – purchases and sales – are identifiable, and our card providers know a great deal about our economic actions. Our consumption habits (and income and tastes) may seem more opaque to scrutiny but, at least to some extent, can be inferred from our locations, movements, and details of expenses. If I buy some beer, maybe my supermarket cannot tell much about my drinking; but if I never buy any alcohol, it will have strong reasons to conclude that I am unlikely to get drunk. As data crunching techniques progress (admittedly, they are still in their infancy now), my supermarket will get better and better at gauging my habits, practices and preferences.


Big data: Quantity or quality?

The very designation of “Big” Data suggests that size of datasets is the dividing line, distinguishing them from “Small” Data (the surveys and questionnaires traditionally used in social science and statistics). But is that all – or are there other, and perhaps more profound, differences?

Let’s start from a well-accepted, size-based definition. In its influential 2011 report, McKinsey Global Institute depicts Big Data as:

“datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze”.

Similarly, O’Reilly Media (2012) defines it as:

“data that exceeds the processing capacity of conventional database systems”.

The literature goes on to discuss how to quantify this size, typically measured in terms of bytes. McKinsey estimates that:

“big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes)”

This is not set in stone, though, depending on both technological advances over time and specific industry characteristics.
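To give a sense of these orders of magnitude, here is a quick conversion sketch. It assumes decimal (SI) units, in which one petabyte is exactly 1,000 terabytes; binary units (tebibytes and pebibytes) would use a factor of 1,024 instead. The constant and function names are just illustrative.

```python
# Decimal (SI) byte units; an assumption, since binary units differ slightly.
TERABYTE = 10 ** 12
PETABYTE = 10 ** 15


def terabytes_to_petabytes(terabytes):
    """Convert a size in terabytes to petabytes."""
    return terabytes * TERABYTE / PETABYTE


print(PETABYTE // TERABYTE)          # terabytes per petabyte
print(terabytes_to_petabytes(3000))  # "thousands of terabytes" in petabytes
```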


Hallo world – a new blog is now live!

Hallo Data-analyst, Data-user, Data-producer or Data-curious — whatever your role, if you have the slightest interest in data, you’re welcome to this blog!

This is the first post and, as is customary, it needs to say what the whole blog is about. Well, data. Of course! But it aims to do so in an innovative, and hopefully useful, way.
