National Statistical Institutes (NSIs) have long been the recognised custodians of socio-economic information, mandated by governments to collect and analyse data on their behalf. The rise of big data is shaking this world: new actors are entering the field, and commercially oriented, privately produced information challenges the NSIs’ monopoly. At the same time, NSIs themselves can tap into digital technologies and produce “big” data. More generally, these new sources offer a range of opportunities, challenges and risks for the work of NSIs.
The Statistical Journal of the IAOS, the flagship journal of the International Association for Official Statistics, has published a special section on big data – and, conveniently, it is available free of charge!
Fride Eeg-Henriksen and Peter Hackl introduce this special section by defining big data and emphasising its relevance to official statistics. But it is crucial, albeit admittedly not easy, to separate the hype around big data from its actual importance.
The other papers are concrete examples of how big data may be integrated into official statistics:
Steven Vale’s “International Collaboration to Understand the Relevance of Big Data for Official Statistics” describes how international collaboration activities can help the official statistical community to better understand the phenomenon.
“Web scraping techniques to collect data on consumer electronics and airfares for an Italian HICP compilation”, by Federico Polidoro and his colleagues at ISTAT, reports the results of a project aimed at modernising data collection through web scraping techniques. The topics discussed are quality (in terms of efficiency and reduction of error) and some preliminary observations on the usability of big data.
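The paper is about ISTAT’s production pipeline rather than code, but a minimal Python sketch may help make the technique concrete. Everything here is hypothetical: the URL, the CSS selectors and the price format are invented placeholders, not ISTAT’s actual targets or tools.

```python
# Minimal web-scraping sketch: collect product names and prices from a
# hypothetical retailer listing page. The URL and selectors are invented
# for illustration; any real collection should respect the site's terms
# of use and robots.txt.
import requests
from bs4 import BeautifulSoup

URL = "https://example-retailer.test/consumer-electronics"  # placeholder


def scrape_prices(url: str) -> list[dict]:
    """Return a list of {name, price} records scraped from one page."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")

    records = []
    for item in soup.select("div.product"):          # hypothetical selector
        name = item.select_one("h2.title")           # hypothetical selector
        price = item.select_one("span.price")        # hypothetical selector
        if name and price:
            records.append({
                "name": name.get_text(strip=True),
                # strip the currency symbol and normalise the decimal comma
                "price": float(price.get_text(strip=True)
                               .replace("€", "").replace(",", ".")),
            })
    return records


if __name__ == "__main__":
    for record in scrape_prices(URL):
        print(record)
```

In practice, a price-index pipeline would run something like this on a schedule and feed the records into quality checks – which is exactly where the paper’s discussion of efficiency and error reduction comes in.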
“The production of salary profiles of ICT professionals: Moving from structured database to big data analytics” by Ramachandran Ramasamy from the National ICT Association of Malaysia reports on the production of salary profiles based on data from a private-sector online job registration system. The discussion covers data dissemination issues and the use of big data analytics.
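Purely as an illustration, and not taken from the paper: the kind of “salary profile” such a registration system could yield might look like the following toy aggregation, with invented field names and numbers.

```python
# Toy sketch of producing salary profiles from job-registration records.
# All field names and values are invented for illustration.
import pandas as pd

records = pd.DataFrame({
    "job_category": ["developer", "developer", "analyst", "analyst", "analyst"],
    "years_experience": [2, 8, 1, 4, 10],
    "monthly_salary": [4500, 9200, 3800, 5600, 11000],
})

# Bucket experience into bands, then summarise salaries per category and band.
records["experience_band"] = pd.cut(
    records["years_experience"],
    bins=[0, 3, 7, 100],
    labels=["junior", "mid", "senior"],
)
profile = (records
           .groupby(["job_category", "experience_band"], observed=True)
           ["monthly_salary"]
           .agg(["count", "median", "mean"]))
print(profile)
```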
Barteld Braaksma and Kees Zeelenberg from Statistics Netherlands write on the “representativity” of big data sources in their “Remake/Remodel – should big data change the modelling paradigm in official statistics?”. Interestingly, they take the question in their title head-on, asking whether the traditional modelling paradigm of official statistics needs to change to accommodate such sources.
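Their argument is about inference, not code, but the representativity concern is easy to make concrete with a toy example (all numbers invented): a source that over-represents one group yields a biased naive mean, and a simple model-based reweighting to known population shares corrects it.

```python
# Toy illustration of the representativity problem with big data sources.
# The "big" source over-samples the young group, so its naive mean is
# biased; post-stratifying to known population shares fixes this.
import numpy as np

rng = np.random.default_rng(0)

# Population: 50% young, 50% old; the source captures mostly young users.
pop_share = {"young": 0.5, "old": 0.5}
sample = {
    "young": rng.normal(100, 10, size=9000),  # over-represented group
    "old":   rng.normal(140, 10, size=1000),  # under-represented group
}

naive_mean = np.mean(np.concatenate(list(sample.values())))
weighted_mean = sum(pop_share[g] * s.mean() for g, s in sample.items())

print(f"naive source mean:    {naive_mean:6.1f}")    # pulled toward 100
print(f"post-stratified mean: {weighted_mean:6.1f}") # close to the true 120
```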
Finally, “Measuring output quality for multisource statistics in official statistics: Some directions”, by Mihaela Agafiţei and her co-authors from Eurostat, analyses the appropriateness of standard quality measures when using multiple sources of data and suggests directions for further research. The focus is on administrative data rather than big data strictly speaking – the two are not equivalent, though some consider (not uncontroversially) the former a possible sub-category of the latter.
The conclusion of the editors is that “the feasibility and the potentials of using big data in official statistics have to be assessed from case to case. […] issues like the representativity and the quality of the resulting statistics, or the confidentiality and the risk of disclosure of personal data need to be assessed individually for each case.”
I take this to mean that, well, we don’t know: there is undoubtedly some potential, but the lines of development and the approaches to common problems are still to be clarified. More research is surely needed.