Data visualization techniques are enjoying ever greater popularity, notably thank to the recent boom of Big Data and our increased capacity to handle large datasets. Network data visualization techniques are no exception. in fact, appealing diagrams of social connections (sociograms) have been at the heart of the field of social network analysis since the 1930s, and have contributed a lot to its success. Today, all this is evolving at unprecedented pace.
In line with these tendencies, the research team of the project ANAMIA (a study of the networks and online sociability of persons with eating disorders, funded by the French ANR) of which I was one of the investigators, have developed new software tools for the visualization of personal network data, with different solutions for the three stages of data collection, analysis, and dissemination of results.
– ANAMIA EGOCENTER is a graphical version of a name generator, to be embedded in a computer-based survey to collect personal network data. It has turned out to be a user-friendly, highly effective interface for interacting and engaging with survey respondents;
Data are not a new ingredient of socio-economic research. Surveys have served the social sciences for long; some of them like the European Social Survey, are (relatively) large-scale initiatives, with multiple waves of observation in several countries; others are much smaller. Some of the data collected were quantitative, other qualitative, or mixed-methods. Data from official and governmental statistics (censuses, surveys, registers) have also been used a lot in social research, owing to their large coverage and good quality. These data are ever more in demand today.
Now, big data are shaking this world. The digital traces of our activities can be retrieved, saved, coded and processed much faster, much more easily and in much larger amounts than surveys and questionnaires. Big data are primarily a business phenomenon, and the hype is about the potential gains they offer to companies (and allegedly to society as a whole). But, as researcher Emma Uprichard says very rightly in a recent post, big data are essentially social data. They are about people, what they do, how they interact together, how they form part of groups and social circles. A social scientist, she says, must necessarily feel concerned.
It is good, for example, that the British Sociological Association is organizing a one-day event on The Challenge of Big Data. It is a good point that members must engage with it. This challenge goes beyond the traditional qualitative/quantitative divide and the underrepresentation of the latter in British sociology. Big data, and the techniques to handle them, are not statistics, and professional statisticians have trouble with it too. (The figure below is just anecdotal, but clearly suggests how a simple search on the Internet identifies Statistics and Big Data as unconnected sets of actors and ties). The challenge has more to do with the a-theoretical stance that big data seem to involve.
Fascinating as they may be, Big Data are not without posing problems. Size does not eliminate the problem of quality: because of the very way they are collected, Big Data are unstructured and unsystematized, the sampling criteria are fuzzy, and the classical statistical analyses do not apply very well. The more you zoom in (the more detail you have), the more noise you find, so that you need to aggregate data (that is, to reduce a “big” micro-level dataset to a “smaller” macro one) to detect any meaningful tendency. Analyzing Big Data as they are, without any caution, increases the likelihood of finding spurious correlations – a statistician’s nightmare! In short, processing Big Data is problematic: Although we do have sufficient computational capacity today, we still need to refine appropriate analytical techniques to produce reliable results.
In a sense, the enthusiasm for Big Data is diametrically opposed to another highly fashionable trend in socioeconomic research: that of using randomized controlled trials (RCTs), as in medicine, or at least quasi-experiments (often called “natural experiments”), which enable collecting data under controlled conditions and facilitate detection of causal relationships much more clearly and precisely than in traditional, non-experimental social research. These data have a lot more structure and scientific rigor than old-fashioned surveys – just the opposite of Big Data!
This is just anecdotal evidence, but do a quick Google search for images on RCTs vs. Big Data. Here are the first two examples I came across: on the left are RCTs (from a dentistry course), on the right are Big Data (from a business consultancy website). The former conveys order, structure and control, the latter a sense of being somewhat lost, or of not knowing where all this is heading… Look for other images, I’m sure the great majority won’t be that different from these two.