All the hype today is about Data and Big Data, but this notion may seem a bit elusive. My students sometimes struggle understanding the difference between “data” and “literature”, perhaps because of the unfortunate habit to call library portals “databases”. Even colleagues are sometimes uncomfortable with the notion of data (whether “big” or “small”) and the breadth it is now taking. So, a definition can be helpful.
Data are pieces of unprocessed information – more precisely raw indicators, or basic markers, from which information is to be extracted. Untreated, they hardly reveal anything; subject to proper analysis, they can disclose the inner working of some relevant aspects of reality.
The “typical” example of socioeconomic data is the observations/variables matrix, where each row represents an observation – an individual in a population – and each column represents a variable – a particular indicator about that individual, for example age, gender, or geographical location. (In truth data types are more varied and may also include unstructured text, images, audio and video; But for the sake of simplicity, let’s stick to the Matrix here.)
Continue reading “What is data?”
The “open data” movement is radically transforming policy-making. In the name of transparency and openness the UK, US and other governments are releasing large amounts of records. It is a way to hold the government to account: in UK for example, all lobbying efforts in the form of meetings with senior officers are now publicly released. Data also enable the public to make more informed decisions: for example, using apps from public transport services to plan their journeys, or tracking indicators of, say, crime or air pollution levels in their area to decide where to buy property. Data are provided as a free resource for all, and businesses may use them for profit.
The open data movement is not limited to the censuses and surveys produced by National Statistical Institutes (NSIs), the public-sector bodies traditionally in charge of collecting, storing and analyzing data for policy purposes. It extends to other administrations such as the Department for Work and Pensions or the Department for Education in the UK, which also gather and process data, though usually through a different process, not using questionnaires but rather registers.
Continue reading “Data in the public sector: Open data and research data”
The growth of “big data” changes the very essence of modern markets in an important sense. Big data are nothing but the digital traces of a growing number of people’s daily transactions, activities and movements, which are automatically recorded by digital devices and end up in huge amounts in the hands of companies and governments. Payments by debit and credit cards record timing, place, amount, and identity of payer and payee; supermarket loyalty cards report purchases by type, quantity, price, date; frequent traveler programs and public transport cards log users’ locations and movements; and CCTV cameras in retail centers, buses and urban streets capture details from clothing and gestures to facial expressions.
This means that all our market transactions – purchases and sales – are identifiable, and our card providers know a great deal about our economic actions. Our consumption habits (and income and tastes) may seem more opaque to scrutiny but at least to some extent, can be inferred from our locations, movements, and detail of expenses. If I buy some beer, maybe my supermarket cannot tell much about my drinking; but if I never buy any alcohol, it will have strong reasons to conclude that I am unlikely to get drunk. As data crunching techniques progress (admittedly, they are still in their infancy now), my supermarket will get better and better at gauging my habits, practices and preferences.
Continue reading “Big Data redefine what “markets” are”
The very designation of “Big” Data suggests that size of datasets is the dividing line, distinguishing them from “Small” Data (the surveys and questionnaires traditionally used in social science and statistics). But is that all – or are there other, and perhaps more profound, differences?
Let’s start from a well-accepted, size-based definition. In its influential 2011 report, McKinsey Global Institute depicts Big Data as:
“datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze”.
Similarly, O’Reilly Media (2012) defines it as:
“data that exceeds the processing capacity of conventional database systems”.
The literature goes on discussing how to quantify this size, typically measured in terms of bytes. McKinsey estimates that:
“big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes)”
This is not set in stone, though, depending on both technological advances over time and specific industry characteristics.
Continue reading “Big data: Quantity or quality?”
Hallo Data-analyst, Data-user, Data-producer or Data-curious — whatever your role, if you have the slightest interest in data, you’re welcome to this blog!
This is the first post and as is customary, it needs to tell what the whole blog is about. Well, data. Of course! But it aims to do so in an innovative, and hopefully useful, way.
Continue reading “Hallo world – a new blog is now live!”