Overview of Economics Datasets and Databases

By Bas Machielsen

December 6, 2022

Introduction

In this post, I’ll give an overview and short description of various commonly-used and mostly publicly available data sources in Economic History. I am planning to update this repository as soon as I find new data sources, so if you find something is missing or if you have any suggestions, please contact me or write a comment below! Thanks to Giacomo Domini and Ruben Peeters for suggesting several of these sources. Here’s the current overview:

  1. World Values Survey
  2. Demographics and Health Survey
  3. Global Climate Database
  4. Global Agro-Ecological Zones
  5. GISS Surface Temperature Data
  6. National Centers for Environmental Information
  7. Clio Infra
  8. Scholarly Measures of Politics
  9. Correlates of War
  10. Eurostat
  11. Pew Polls
  12. Political Protests
  13. IISG Datasets
  14. D-Place
  15. Labor Conflicts
  16. World Bank, IMF, PWT
  17. UCDP Conflict Data
  18. Regional GDP
  19. EH.net Repository
  20. Macrohistory Database
  21. OECD Database
  22. World Economic History Datasets
  23. International Conflict Research Databases
  24. Soviet Historical Census
  25. Global Preferences Survey
  26. Global Gallup Datasets
  27. Comparative Political Dataset
  28. World Inequality Database
  29. COMTRADE
  30. Observatory of Economic Complexity
  31. Atlas of Economic Complexity
  32. Ricardo Database
  33. Federico-Tena World Trade Historical Database
  34. GRIP Global Roads Database
  35. Statistical Agencies, Parliaments
  36. Historical GIS Databases
  37. Municipal Data
  38. R Spatial Data Packages
  39. Other Data Packages
  40. Regional GDP and HDI
  41. Linguistic Data
  42. Mining Production
  43. Gridded Data Collection GRID PRIO
  44. Gridded Population
  45. Road Networks
  46. Biodiversity
  47. EEA Geospatial Data Catalogue
  48. Cartes Historiques

World Values Survey

Available at https://www.worldvaluessurvey.org. The data contains geographically coded outcomes for individual surveys on a broad range of questions related to values. The latest wave is available here and can be downloaded in various formats.

Demographics and Health Survey

Available at https://dhsprogram.com/data. This is also geographically coded data and contains information on characteristics of the household population and their dwelling conditions, on eligible respondents, on indicators of women’s status and situation, and on fertility, fertility preferences, determinants of fertility, family planning, childhood mortality, maternal and child health and nutrition, and diseases. The survey is conducted in a large number of African, Central Asian, Asian and Latin American countries, as well as some European and Oceanian ones. You need to make a (free) account to access the data.

Global Climate Database

A website containing global climate data at several spatial resolutions, covering 1970 to 2000. Available at http://www.worldclim.org/.

Global Agro-Ecological Zones

The Food and Agriculture Organization of the United Nations (FAO) and the International Institute for Applied Systems Analysis (IIASA) have cooperated over several decades to develop and implement the Agro-Ecological Zones (AEZ) modelling framework and databases. AEZ relies on well-established land evaluation principles to assess natural resources for finding suitable agricultural land utilization options. It identifies resource limitations and opportunities based on plant eco-physiological characteristics, climatic and edaphic requirements of crops and it uses these to evaluate suitability and production potentials for individual crop types under specific input and management conditions. Available here

GISS Surface Temperature Data

The GISS Surface Temperature Analysis version 4 (GISTEMP v4) is an estimate of global surface temperature change. Graphs and tables are updated around the middle of every month using current data files from NOAA GHCN v4 (meteorological stations) and ERSST v5 (ocean areas). Available here.

National Centers for Environmental Information

Another website containing historical (from 1763) to present climate information. Available at https://www.ncei.noaa.gov/access/search/index. Updates are applied daily. An example dataset can be found here.

Clio Infra

A repository of various economic history datasets assembled by various researchers in the discipline. It is available at http://clio-infra.eu. I also created an R package to access these data, available here.

Scholarly Measures of Politics

This is not a database, but rather an R package that aggregates some widely used datasets. It is available at https://github.com/xmarquez/democracyData. You can use it to access some widely used datasets, including Polity5, Freedom House, Geddes, Wright, and Frantz’ autocratic regimes dataset, the Lexical Index of Electoral Democracy, the DD/ACLP/PACL/CGV dataset, the main indexes of the V-Dem dataset, and many others. The datasets can be looked at by running democracyData::democracy_info inside R.

Correlates of War

Various datasets related to war, alliances and other geopolitical measures can be downloaded at https://correlatesofwar.org/data-sets/. Here is also a short tutorial on how to harmonize country codes. This dataset is also available in the peacesciencer package in R. This, in turn, is available here.
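As a rough illustration of what harmonizing country codes looks like in practice, here is a minimal sketch using the countrycode package (listed under Other Data Packages below). The toy panel is made up for illustration only; the sketch just attaches Correlates of War numeric codes and ISO3 codes to a set of English country names.

```r
library(countrycode)

# Toy country-year panel keyed by English country names (illustrative data only)
panel <- data.frame(
  country = c("United States", "Germany", "Netherlands"),
  year    = c(1950, 1950, 1950)
)

# Attach Correlates of War numeric codes and ISO 3166 alpha-3 codes
panel$cown  <- countrycode(panel$country, origin = "country.name", destination = "cown")
panel$iso3c <- countrycode(panel$country, origin = "country.name", destination = "iso3c")

print(panel)
```

With both codes attached, the panel can be merged with Correlates of War tables on the numeric code and with most other sources on the ISO3 code.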

Eurostat

European statistical agency: contains a lot of demographic, economic and other indicators at different levels (country, region, municipality). Available at https://ec.europa.eu/eurostat/web/main/data/database. There is also an unofficial R package which you can use to access most of the data. A tutorial is available here.

Pew Polls

Pew Research Center makes most of its survey data available for free online at https://www.pewresearch.org/tools-and-resources/. From there, one can download datasets by research area. This includes datasets from the annual Global Attitudes Survey, a poll that asks adults in many countries about issues ranging from politics to economic conditions. Here is a blog providing a short introduction to the data.

Political Protests

ACLED provides data on political protests. You need to make a (free) account to access the data. This can be done by clicking the Register link at the bottom of the page (somewhat hidden).

In addition, there are datasets on a per-country basis available here. You also need an account for that.

IISG Datasets

The Dutch International Institute of Social History has a data repository containing a wide array of (arguably) very specific datasets, coming from a large number of countries. One potentially interesting dataset here is the RISTAT (РИСТАТ) project, containing a large collection of historical statistics at the Guberniya (province) level in the Russian empire and beyond. These data are available here.

D-Place

D-PLACE is an open-source repository containing a large amount of data on cultures and environments. The famous Murdock Ethnographic Atlas is also available through this repository. In its own words, it “..is an attempt to bring together the dispersed corpus of information describing human cultural diversity. It aims to make it easy for individuals to contrast their own cultural practices with those of other societies, and to consider the factors that may underlie cultural similarities and differences.” It can be accessed here and the datasets can be downloaded here. There is also an R package that serves as an interface to this repository, downloadable from https://github.com/matthewgthomas/dplacer.

Labor Conflicts

The International Institute of Social History (IISH) has made datasets regarding labor conflicts publicly available. This features datasets for several countries all over the world throughout history. It can be accessed here.

World Bank, IMF, PWT

The World Bank, IMF and Penn World Tables also contain many variables on the country and sometimes sub-country levels. I assume the reader to be familiar with these data sources, so it suffices to mention the URLs and potentially useful R packages here:

UCDP Conflict Data

The UCDP is a repository containing many datasets on conflicts, foreign policy, external support, candidate events, dyadic datasets, battle-related deaths and other conflict indicators. In their own words, “the Uppsala Conflict Data Program (UCDP) is the world’s main provider of data on organized violence and the oldest ongoing data collection project for civil war, with a history of almost 40 years. Its definition of armed conflict has become the global standard of how conflicts are systematically defined and studied. UCDP produces high-quality data, which are systematically collected, have global coverage, are comparable across cases and countries, and have long time series which are updated annually.” Some databases are geo-referenced in a very granular way. They can be accessed here.

Regional GDP

The Roses-Wolf database contains estimates of regional GDP (NUTS-2 level) for select European countries from about 1900-2015.

EH.net Repository

The EH.net repository contains various databases, mostly focused on the United States and finances. There are also a few datasets focusing on the UK.

Macrohistory Database

A database (accessible here) with rates of return per asset class per country, and balance sheets for the macroeconomy per country per year.

NBER Macrohistory Database

The NBER Macrohistory Database contains an extensive data set that covers all aspects of the pre-WWI and interwar economies, including production, construction, employment, money, prices, asset market transactions, foreign trade, and government activity. Many series are highly disaggregated, and many exist at the monthly or quarterly frequency.

OECD Database

The OECD has detailed data on various country-level indicators of OECD countries. For example, there are decomposed measures of social expenditures, globalisation, finance, the environment, ICT, labor, etc. An R package is available here. Here is a short readme. In my experience, the package is still a little buggy, which may change in the future. Up until then, it might be best to just go to the web interface.

World Economic History Datasets

Several miscellaneous datasets are available on the world economic history website. Most of the datasets concern China and Indonesia.

International Conflict Research Databases

The ICR Databases, set up by researchers from ETH Zurich, comprise many extensive datasets concerning ethnic and linguistic groups all over the world. Here I list a couple of these and give a short description:

  • Greg (Geo-Referencing of Ethnic Groups): https://icr.ethz.ch/data/greg/. Contains information based on the Soviet Atlas Narodov Mira, and supplemented by ETH Zurich researchers.
  • Geo-Epr: https://icr.ethz.ch/data/epr/geoepr/. Similar to Greg, but slightly more condensed. Contains only groups that are relevant for ethnic power relations.
  • C-shapes: https://icr.ethz.ch/data/cshapes/. Maps the borders and capitals as they shift over more than a century, from 1886 to 2019. Also directly accessible via R: install.packages("cshapes", dependencies = TRUE).
  • Side: https://icr.ethz.ch/data/side/. The Spatially Interpolated Data on Ethnicity (SIDE) dataset is a collection of 253 near-continuous maps of local ethno-linguistic, religious, and ethno-religious settlement patterns in 47 low- and middle-income countries. These data are a generalization of ethnicity-related information in the geo-coded Demographic and Health Surveys (DHS). There is also an R package, see the web page for details.
  • GROWup (Geographical Research On War, Unified Platform): https://growup.ethz.ch/. Data that can be linked with the geographical data above. These data contain several sets of variables about Conflict Data (UCDP ACD, ACD2EPR), Group Hierarchy Data, Settlement Area Data (GeoEPR variables), Raster Aggregated Data (GRUMPv1 Population, DMSP Stable Nightlights, G-ECON GCP, GTOPO30 Elevation), Transnational Ethnic Kin (TEK) Data, and Ethnic Dimensions Data.
  • Ethnic Power Relations: https://icr.ethz.ch/data/epr/. A family of datasets all related to conflict, civil wars and ethnic and ethnolinguistic cleavages.
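
To give an idea of how the C-shapes data can be used from R, here is a minimal sketch. It assumes version 2.x of the cshapes package, in which cshp() takes a date and returns the borders valid on that day as an sf object; the chosen date is arbitrary.

```r
library(cshapes)

# Borders as they stood on 1 January 1950, using the Gleditsch and Ward state list
borders_1950 <- cshp(date = as.Date("1950-01-01"), useGW = TRUE)

# One row per state; the polygons are stored in the sf geometry column
nrow(borders_1950)
names(borders_1950)
```

Calling cshp() for several dates and comparing the results is a simple way to track border changes over time.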

Soviet Historical Census

A collection of historical Soviet censuses (including other republics than the RSFSR), decomposed in a number of ways, including ethnicity, on a region-level. Available here (Russian).

Global Preferences Survey

A survey containing individual and country-level data on economic, risk and social preferences based on the 2018 QJE Article “Global evidence on economic preferences”. Available here. A short registration is required before the data can be accessed.

Global Gallup Datasets

Gallup has a couple of global datasets, sometimes geographically coded, on a number of topics, including risk, finance, food security, and urbanisation. Available here.

Comparative Political Dataset

The “Comparative Political Data Set” (CPDS) is a collection of political and institutional country-level data provided by Klaus Armingeon, Sarah Engler and Lucas Leemann at the University of Zurich (Switzerland) and the Leuphana Universität (Germany). It contains data on the nature of governments and elections, and when which government was in power. Available here.

World Inequality Database

Data on income and wealth inequality throughout the ages for various countries. Available here.

COMTRADE

The United Nations Comtrade database aggregates detailed global annual and monthly trade statistics by product and trading partner for use by governments, academia, research institutes, and enterprises. Data compiled by the United Nations Statistics Division covers approximately 200 countries and represents more than 99% of the world’s merchandise trade. Available using various API tools as well as here.

Observatory of Economic Complexity

Based on COMTRADE data, available here. Unfortunately no longer free.

Atlas of Economic Complexity

The Atlas of Economic Complexity is an award-winning data visualization tool that allows people to explore global trade flows across markets, track these dynamics over time and discover new growth opportunities for every country. Available here.

Ricardo Database

The RICardo project is a comprehensive trade database that collects total and bilateral trade statistics of all countries in the world from the early nineteenth century (first data go back to 1787) until 1938. Available here.

Federico-Tena World Trade Historical Database

A database which contains annual series of trade by polity from 1800 to 1938, which sum to series for continents and the world. Available here.

GRIP Global Roads Database

A database with vector and raster datasets covering roads worldwide, accessible here.

Statistical Agencies, Parliaments

This entry should serve as a repository for databases detailing countries, the data portal of their statistical agency (and package), and the data portal of their parliament, and other open data portals.

Country | Description | Link
Netherlands | CBS Statistics Data Portal | R Package, Python package, Documentation
Kazakhstan | Open Data Portal | Link
Spain | Spanish Electoral Archive | Link

Historical GIS Databases

This entry should serve as a repository for historical GIS databases. To be updated more extensively.

Country | Description | Link
Netherlands | Netherlands Historical Municipalities (1812-2020) | Link
France | France Administrative Divisions, Other Divisions (1870-1940) | Link
Russia | Russian Empire (1897), Guberniya and District Boundaries | Link
Germany | Various Territorial Entities, Various Levels | Link
China | Various snapshots (1820-1990) of historical China | Link
Austro-Hungarian Empire | Including some German principalities, from abt. 1850 to WWI | Link

There are also several other sources on the Historical GIS Network Website, but most of the links are defunct.

Municipal Data

https://www.insee.fr/fr/statistiques?debut=0&categorie=3

Country | Description | Link
France | Municipal (Commune) Statistics | Link
France | INSEE (Communal/Regional) Statistics | Link
Germany | Statistikportal Deutschland | Link
Germany | Regionaldatenbank Deutschland | Link
Germany | Forschungsdatenzentrum | Link
Netherlands | Waar Staat Je Gemeente | Link
Netherlands | Volkstellingen (Censuses) | Link
Netherlands | Historical Database of Dutch Municipalities | Link
Spain | Instituto Nacional de Estadistica | Link
Spain | Historical Municipalities Database | Paper and E-mail (Chapter 4 in the Paper)
Austria | Ein Blick auf die Gemeinde | Link
Belgium | Statbel | Link

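Netherlands: You can also access miscellaneous datasets from CBS Open Data in the following way:

library(cbsodataR)
library(dplyr)  # provides filter()

# Download the table of contents of all CBS Open Data tables
toc <- cbsodataR::cbs_get_toc()
# Keep the tables whose title mentions 'gemeente' (municipality); the match is case-sensitive
gem_dat <- toc |> filter(stringr::str_detect(Title, 'gemeente'))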

R Spatial Data Packages

  • Based on Moraga (2022):
    • The giscoR package contains open data at the country (or lower) level for European Union countries
    • The geodata package contains climate data (temperature, precipitation, wind speed) across time: in particular through the worldclim_country() function
    • The package chirps contains daily high-resolution precipitation, as well as daily maximum and minimum temperatures from the Climate Hazards Group database.
    • The elevatr package can download elevation data from Amazon Web Services (AWS) Terrain Tiles and OpenTopography Global Digital Elevation Models API. Through the get_elev_raster() function, it can be used to download elevation at the locations specified in argument locations and with a zoom specified in argument z
    • The osmdata package (Padgham et al. 2023) allows you to download data from OpenStreetMap (OSM), an open world geographic database updated and maintained by a community of volunteers. It can retrieve OSM data including rivers, roads, shops, railway stations, and other buildings.
    • The rWBclimate data available here provides access to three different classes of climate data at two different spatial scales.
    • The spocc package is an interface to many species occurrence data sources including Global Biodiversity Information Facility (GBIF)
    • The wopr package provides access to the WorldPop Open Population Repository and provides estimates of population sizes for specific geographic areas
    • The rdhs package gives the users the ability to access and analyze the (geocoded) Demographic and Health Survey (DHS) data.
    • The openair package contains air quality data and other atmospheric composition data.
    • The malariaAtlas package can be used to download global malaria data hosted by the Malaria Atlas Project.
    • The GAEZr package (installable here) facilitates downloading and processing of Global Agro-Ecological Zones data in R
    • The rnaturalearth package makes mapping easy by providing map data from Natural Earth Data
    • The rWBclimate package (here) is an R interface for the World Bank climate data used in the World Bank climate knowledge portal.
    • The climate package automates downloading of in-situ meteorological and hydrological data from publicly available repositories

Other Data Packages

  • ipumsr - A package port to the IPUMS database: IPUMS is the world’s largest publicly available population database, providing census and survey data from around the world integrated across time and space. An overview is available here
  • eurostat R package: A port to Eurostat. This is a database about European countries on aggregate (country) and more disaggregated levels
  • idbr: International Data Base package: provides access to the US Census Bureau’s International Data Base (IDB) API, and returns queries as R data frames. The IDB includes historical demographic data, current population estimates, and demographic projections to 2100 for countries of population 5,000 or greater that are recognized by the US Department of State
  • pewdata: A package helping to download Pew polls data sets.
  • glottospace: A package allowing for geospatial analysis of linguistic data. Contains a lot of linguistic data. See here for documentation and examples.
  • lingtypology: The lingtypology package connects R with the Glottolog database (v. 4.8) and provides an additional functionality for linguistic typology. The Glottolog database contains a catalogue of the world’s languages. More documentation here
  • WDI: The WDI package allows users to search and download data from over 40 datasets hosted by the World Bank, including the World Development Indicators (‘WDI’), International Debt Statistics, Doing Business, Human Capital Index, and Sub-national Poverty indicators. Available here
  • countrycode: countrycode standardizes country names, converts them into ~40 different coding schemes, and assigns region descriptors. Available here
  • dplacer: An R interface to the D-Place database, with cultural and linguistic data including the Murdock ethnographic atlas. Available here
  • owidR: This package acts as an interface to Our World in Data datasets, allowing for an easy way to search through data used in over 3,000 charts and load them into the R environment. Available here

Regional GDP and HDI

Kummu et al. (Nature, 2018) present gap-filled multiannual datasets in gridded form for Gross Domestic Product (GDP) and Human Development Index (HDI). To provide a consistent product over time and space, the sub-national data were only used indirectly, scaling the reported national value and thus, remaining representative of the official statistics. This resulted in annual gridded datasets for GDP per capita (PPP), total GDP (PPP), and HDI, for the whole world at 5 arc-min resolution for the 25-year period of 1990–2015. Additionally, total GDP (PPP) is provided with 30 arc-sec resolution for three time steps (1990, 2000, 2015). Available here.
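
To get a feeling for what these resolutions mean on the ground, here is a small back-of-the-envelope calculation. It assumes roughly 111.32 km per degree at the equator (cells get narrower towards the poles), so the numbers are approximations and not properties of the dataset itself.

```r
# Approximate length of one degree at the equator, in km (assumed constant)
km_per_degree <- 111.32

# Side length of a grid cell at the equator, given its size in arc-seconds
cell_size_km <- function(arc_seconds) {
  (arc_seconds / 3600) * km_per_degree
}

# Number of columns and rows in a global grid of that cell size
n_cells <- function(arc_seconds) {
  per_degree <- 3600 / arc_seconds
  c(cols = 360 * per_degree, rows = 180 * per_degree)
}

res_5min  <- cell_size_km(5 * 60)  # 5 arc-min cells, about 9.3 km wide
res_30sec <- cell_size_km(30)      # 30 arc-sec cells, about 0.93 km wide

n_cells(5 * 60)  # 4320 columns by 2160 rows
n_cells(30)      # 43200 columns by 21600 rows
```

The 30 arc-sec product therefore has 100 times as many cells as the 5 arc-min one, which is worth keeping in mind before loading it into memory.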

Also, DOSE v2 is a substantially extended version of DOSE, the MCC-PIK Database Of Sub-national Economic Output. It contains harmonised data on reported economic output for:

  • 1,661 sub-national regions
  • across 83 countries
  • from 1953 to 2020
  • with sectoral detail for the agricultural, manufacturing and services sectors.

Available here.

Linguistic Data

Mining Production

While the extraction of natural resources has been well documented and analysed at the national level, production trends at the level of individual mines are more difficult to uncover, mainly due to poor availability of mining data with sub-national detail. In this paper, we contribute to filling this gap by presenting an open database on global coal and metal mine production on the level of individual mines. It is based on manually gathered information from more than 1900 freely available reports of mining companies, where every data point is linked to its source document, ensuring full transparency. The database covers 1171 individual mines and reports mine-level production for 80 different materials in the period 2000–2021. Available here.

Gridded Data Collection GRID PRIO

A whole host of gridded datasets, including nightlights, is available here. The data come in .csv format. These need a grid reference file, which is also available somewhere on the website. Also available through: devtools::install_github("prio-data/priogrid"). The actual shapefile which allows you to combine these data is on this page.
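
Combining a downloaded .csv with the grid reference boils down to a merge on the cell identifier. Below is a minimal sketch with toy data frames standing in for the real files. I assume the cell identifier is called gid, and the other column names (nlights, xcoord, ycoord) are hypothetical placeholders, so check them against the files you actually download.

```r
# Toy stand-in for a downloaded .csv: one row per grid cell and year
grid_data <- data.frame(
  gid     = c(1, 1, 2, 2, 3),
  year    = c(2000, 2001, 2000, 2001, 2000),
  nlights = c(0.02, 0.03, 0.10, 0.12, 0.00)  # hypothetical variable name
)

# Toy stand-in for the grid reference: one row per cell with its centroid
grid_ref <- data.frame(
  gid    = c(1, 2, 3, 4),
  xcoord = c(-179.75, -179.25, -178.75, -178.25),
  ycoord = c(-89.75, -89.75, -89.75, -89.75)
)

# Keep every cell of the reference grid, also those without data (filled with NA)
merged <- merge(grid_ref, grid_data, by = "gid", all.x = TRUE)
print(merged)
```

With the real shapefile read in through sf::st_read(), the same merge() call works on the resulting sf object, since it is also a data frame.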

Gridded Population

Road Networks

An overview of road networks per country:

Country | Description | Link
France | ROUTE 500 | Link
France | Medieval Road Network | Link
Netherlands | Nationaal Wegen Bestand | Link

Biodiversity

EEA Geospatial Data Catalogue

Cartes Historiques

  • A website detailing an overview and redirections to repositories with various historical shapefiles