We have just published the results of our research project ANAMIA, studying the personal networks and online interactions of persons with eating disorders (“ana” and “mia” in web jargon). The report has just come out:
Report: Young internet users and eating disorder websites: beyond the notion of “pro-ana” (pdf, 92 pp, in French)
Infographic: results and recommendations of the ANAMIA project (pdf, in French)
Summary (in English!)
The ana-mia webosphere had remained opaque for long, with little data available for a science-based understanding of it. As a result, misconceptions proliferated and policy-makers hesitated — threatening censorship but without devising solutions to reach out and support a population in distress. Our study has been the first to overcome these limitations and reveal the social environment, actual eating practices and digital usages of persons with eating disorders in the English and French web.
Visualization of the personal networks of four individuals with, respectively, EDNOS (Eating Disorders Not Otherwise Specified, top panel, left), anorexia nervosa (top, right), bulimia nervosa (bottom, left), binge eating (bottom right). Hollow circles represent their face-to-face acquaintances, filled circles their online ones. Colours indicate relational proximity to the subject (green: intimate, blue: very close, yellow: close, red: somewhat close). Source: ANAMIA project report.
Continue reading “Small Data to study the Web: The ANAMIA project”
On Friday last week, the British Sociological Association (BSA) held an event on “The Challenge of Big Data” at the British Library. It was interesting, stimulating and relevant – I was particularly impressed by the involvement of participants and the very intense live-tweeting, never so lively at a BSA event! And people were particularly friendly and talkative both on their keyboards and at the coffee tables… so in honour of all this, I am choosing the hashtag of the day #bigdataBL as title here.
- The designation of “big data” is from industry, not (social) science, said a speaker at the very beginning. And it is known to be fuzzy. Yet it becomes a relevant object of scientific inquiry in that it is bound to affect society, democracy, the economy and, well, social science.
- Big-data practices change people’s perception of data production and use. Ordinary people are now increasingly aware that a growing range of their actions and activities are being digitally recorded and stored. Data are now a recognized social object.
- Big data needs to be understood in the context of new forms of value production.
- So, social scientists need to take note (and this was the intended motivation of the whole event). The complication is that Big Data matter for social science in two different ways. First, they are an object of study in themselves – what are their implications for, say, inequalities, democratic participation, the distribution of wealth. Second, they offer new methods to be exploited to gain insight into a wide range of (traditional and new) social phenomena, such as consumer behaviours (think of Tesco supermarket sales data).
- Put differently, if you want to understand the world as it is now, you need to understand how information is created, used and stored – that’s what the Big Data business is all about, both for social scientists and for industry actors.
Continue reading “#bigdataBL”
Yesterday, Antonio Casilli and I gave our promised talk on network data visualization. It was an opportunity to discuss the extension of the tools we developed within a given research project to other network studies, and to reflect on the contribution as well as the limitations of data visualizations. Here are our slides:
Data visualization techniques are enjoying ever greater popularity, notably thank to the recent boom of Big Data and our increased capacity to handle large datasets. Network data visualization techniques are no exception. in fact, appealing diagrams of social connections (sociograms) have been at the heart of the field of social network analysis since the 1930s, and have contributed a lot to its success. Today, all this is evolving at unprecedented pace.
In line with these tendencies, the research team of the project ANAMIA (a study of the networks and online sociability of persons with eating disorders, funded by the French ANR) of which I was one of the investigators, have developed new software tools for the visualization of personal network data, with different solutions for the three stages of data collection, analysis, and dissemination of results.
– ANAMIA EGOCENTER is a graphical version of a name generator, to be embedded in a computer-based survey to collect personal network data. It has turned out to be a user-friendly, highly effective interface for interacting and engaging with survey respondents;
Continue reading “Three tools to visualize personal networks”
Data are not a new ingredient of socio-economic research. Surveys have served the social sciences for long; some of them like the European Social Survey, are (relatively) large-scale initiatives, with multiple waves of observation in several countries; others are much smaller. Some of the data collected were quantitative, other qualitative, or mixed-methods. Data from official and governmental statistics (censuses, surveys, registers) have also been used a lot in social research, owing to their large coverage and good quality. These data are ever more in demand today.
Now, big data are shaking this world. The digital traces of our activities can be retrieved, saved, coded and processed much faster, much more easily and in much larger amounts than surveys and questionnaires. Big data are primarily a business phenomenon, and the hype is about the potential gains they offer to companies (and allegedly to society as a whole). But, as researcher Emma Uprichard says very rightly in a recent post, big data are essentially social data. They are about people, what they do, how they interact together, how they form part of groups and social circles. A social scientist, she says, must necessarily feel concerned.
It is good, for example, that the British Sociological Association is organizing a one-day event on The Challenge of Big Data. It is a good point that members must engage with it. This challenge goes beyond the traditional qualitative/quantitative divide and the underrepresentation of the latter in British sociology. Big data, and the techniques to handle them, are not statistics, and professional statisticians have trouble with it too. (The figure below is just anecdotal, but clearly suggests how a simple search on the Internet identifies Statistics and Big Data as unconnected sets of actors and ties). The challenge has more to do with the a-theoretical stance that big data seem to involve.
Continue reading “Big Data and social research”
Fascinating as they may be, Big Data are not without posing problems. Size does not eliminate the problem of quality: because of the very way they are collected, Big Data are unstructured and unsystematized, the sampling criteria are fuzzy, and the classical statistical analyses do not apply very well. The more you zoom in (the more detail you have), the more noise you find, so that you need to aggregate data (that is, to reduce a “big” micro-level dataset to a “smaller” macro one) to detect any meaningful tendency. Analyzing Big Data as they are, without any caution, increases the likelihood of finding spurious correlations – a statistician’s nightmare! In short, processing Big Data is problematic: Although we do have sufficient computational capacity today, we still need to refine appropriate analytical techniques to produce reliable results.
In a sense, the enthusiasm for Big Data is diametrically opposed to another highly fashionable trend in socioeconomic research: that of using randomized controlled trials (RCTs), as in medicine, or at least quasi-experiments (often called “natural experiments”), which enable collecting data under controlled conditions and facilitate detection of causal relationships much more clearly and precisely than in traditional, non-experimental social research. These data have a lot more structure and scientific rigor than old-fashioned surveys – just the opposite of Big Data!
This is just anecdotal evidence, but do a quick Google search for images on RCTs vs. Big Data. Here are the first two examples I came across: on the left are RCTs (from a dentistry course), on the right are Big Data (from a business consultancy website). The former conveys order, structure and control, the latter a sense of being somewhat lost, or of not knowing where all this is heading… Look for other images, I’m sure the great majority won’t be that different from these two.
Continue reading “The fuzziness of Big Data”
If you are a researcher in economics, demography, sociology, geography or political science, you may have experienced the frustration of discovering a relevant data resource and being denied access to it — typically on the ground that data release would violate the confidentiality of data subjects. Or you may have heard of fantastic analyses — with all the fancy new statistical and econometrics tools and software that are increasingly in fashion today — done with large amounts of very detailed microdata, but you have no clue how to do anything like that yourself. Maybe you have tried to look at the website of some public administration that likely holds the data you want – like labor market or business data — but could not figure out how to ask for these data in the first place. And f you ever tried to access data from two or more different countries, you probably found the task of even finding out how to apply in different systems daunting.
Now, there is a great opportunity for you to get closer to your goal. The European project “Data without Boundaries” (DwB) offers social scientists from across Europe funding, information and support to access household surveys and business data from public-sector records in special Research Data Centers in France, Germany, the Netherlands and UK. These are microdata at individual level, highly detailed; they cannot be publicly released, but access can be legally given for scientific and statistical research purposes.
Both confirmed researchers and PhD students are welcome to apply, and should do so in a country different from the one where they reside. There is a preference for comparative, cross-country projects. The deadline is 15th October 2013.
For more information, see the call for proposals on the DwB website.
This is part of a broader policy effort to improve researchers’ access to data in Europe, to enhance capacity to produce science-based understanding of society across the continent.
All the hype today is about Data and Big Data, but this notion may seem a bit elusive. My students sometimes struggle understanding the difference between “data” and “literature”, perhaps because of the unfortunate habit to call library portals “databases”. Even colleagues are sometimes uncomfortable with the notion of data (whether “big” or “small”) and the breadth it is now taking. So, a definition can be helpful.
Data are pieces of unprocessed information – more precisely raw indicators, or basic markers, from which information is to be extracted. Untreated, they hardly reveal anything; subject to proper analysis, they can disclose the inner working of some relevant aspects of reality.
The “typical” example of socioeconomic data is the observations/variables matrix, where each row represents an observation – an individual in a population – and each column represents a variable – a particular indicator about that individual, for example age, gender, or geographical location. (In truth data types are more varied and may also include unstructured text, images, audio and video; But for the sake of simplicity, let’s stick to the Matrix here.)
Continue reading “What is data?”
The “open data” movement is radically transforming policy-making. In the name of transparency and openness the UK, US and other governments are releasing large amounts of records. It is a way to hold the government to account: in UK for example, all lobbying efforts in the form of meetings with senior officers are now publicly released. Data also enable the public to make more informed decisions: for example, using apps from public transport services to plan their journeys, or tracking indicators of, say, crime or air pollution levels in their area to decide where to buy property. Data are provided as a free resource for all, and businesses may use them for profit.
The open data movement is not limited to the censuses and surveys produced by National Statistical Institutes (NSIs), the public-sector bodies traditionally in charge of collecting, storing and analyzing data for policy purposes. It extends to other administrations such as the Department for Work and Pensions or the Department for Education in the UK, which also gather and process data, though usually through a different process, not using questionnaires but rather registers.
Continue reading “Data in the public sector: Open data and research data”
The growth of “big data” changes the very essence of modern markets in an important sense. Big data are nothing but the digital traces of a growing number of people’s daily transactions, activities and movements, which are automatically recorded by digital devices and end up in huge amounts in the hands of companies and governments. Payments by debit and credit cards record timing, place, amount, and identity of payer and payee; supermarket loyalty cards report purchases by type, quantity, price, date; frequent traveler programs and public transport cards log users’ locations and movements; and CCTV cameras in retail centers, buses and urban streets capture details from clothing and gestures to facial expressions.
This means that all our market transactions – purchases and sales – are identifiable, and our card providers know a great deal about our economic actions. Our consumption habits (and income and tastes) may seem more opaque to scrutiny but at least to some extent, can be inferred from our locations, movements, and detail of expenses. If I buy some beer, maybe my supermarket cannot tell much about my drinking; but if I never buy any alcohol, it will have strong reasons to conclude that I am unlikely to get drunk. As data crunching techniques progress (admittedly, they are still in their infancy now), my supermarket will get better and better at gauging my habits, practices and preferences.
Continue reading “Big Data redefine what “markets” are”