Empirical Regularities

1. Empirical Regularities#

import os
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
import pandas as pd
import numpy as np
from matplotlib.lines import Line2D
import seaborn as sns

Why the study of economic growth and its attendant economic development issues?

Before we begin with our study proper, let’s gaze at some data.

Like stargazers thousands of years before us, economists seek to discover how much observed, or more precisely, measured “reality” will motivate us with questions.

These questions further drive us to develop useful frameworks for causal reasoning.

1.1. The Maddison Dataset#

Let us begin by taking a look at the Maddison Project Database 2020 dataset. Some tasks are inspired by Omer Ozak’s notes.

STEP 1. Check or create folders for storing data and graphs.

data_dir = './data/'

if not os.path.exists(data_dir):
    os.mkdir(data_dir)

STEP 2. Get the data.

# Database vintage
db_year = "2020"
db_region_year = "2018"
stata_dformat = ".dta"

# Filenames
mad_stata = "Maddison" + db_year + stata_dformat
madregion_stata = "Maddison" + db_region_year + "_region" + stata_dformat

try:
    # if previously downloaded to directory ``data_dir``
    df_mad = pd.read_stata(data_dir + mad_stata)
    df_madregion = pd.read_stata(data_dir + madregion_stata)
except:
    # otherwise download from Groningen Growth and Development Centre ...
    mad_url = "https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/"
    df_mad = pd.read_stata(mad_url + "mpd2018.dta")
    df_madregion = pd.read_stata(mad_url + "mpd2018_region_data.dta")
    # then write/save to STATA file with new name
    df_mad.to_stata(data_dir + mad_stata, 
                    write_index=False, version=117)
    df_madregion.to_stata(data_dir + madregion_stata, 
                          write_index=False, version=117)

STEP 3. Quick inspection of the dataframes.

df_mad

	countrycode	country	year	gdppc	pop
0	AFG	Afghanistan	1820	NaN	3280.00000
1	AFG	Afghanistan	1870	NaN	4207.00000
2	AFG	Afghanistan	1913	NaN	5730.00000
3	AFG	Afghanistan	1950	1156.0000	8150.00000
4	AFG	Afghanistan	1951	1170.0000	8284.00000
...	...	...	...	...	...
21677	ZWE	Zimbabwe	2014	1594.0000	13313.99205
21678	ZWE	Zimbabwe	2015	1560.0000	13479.13812
21679	ZWE	Zimbabwe	2016	1534.0000	13664.79457
21680	ZWE	Zimbabwe	2017	1582.3662	13870.26413
21681	ZWE	Zimbabwe	2018	1611.4052	14096.61179

21682 rows × 5 columns

df_madregion

	region	region_name	year	cgdppc	rgdpnapc	pop
0	af	Africa	1870.0	NaN	NaN	NaN
1	af	Africa	1871.0	NaN	NaN	NaN
2	af	Africa	1872.0	NaN	NaN	NaN
3	af	Africa	1873.0	NaN	NaN	NaN
4	af	Africa	1874.0	NaN	NaN	NaN
...	...	...	...	...	...	...
1034	wd	World	2012.0	13821.0	13818.0	6992923.0
1035	wd	World	2013.0	14038.0	14090.0	7072213.0
1036	wd	World	2014.0	14261.0	14376.0	7152269.0
1037	wd	World	2015.0	14500.0	14616.0	7231375.0
1038	wd	World	2016.0	14574.0	14692.0	7311687.0

1039 rows × 6 columns

Note that the year series in df_madregion contains floating-point numbers or floats.

df_madregion["year"].dtype

dtype('float32')

That is not so nice. (Why?) Let’s change that into integers.

df_madregion["year"] = df_madregion["year"].astype(int)
df_madregion["year"].dtype

dtype('int64')

df_madregion

	region	region_name	year	cgdppc	rgdpnapc	pop
0	af	Africa	1870	NaN	NaN	NaN
1	af	Africa	1871	NaN	NaN	NaN
2	af	Africa	1872	NaN	NaN	NaN
3	af	Africa	1873	NaN	NaN	NaN
4	af	Africa	1874	NaN	NaN	NaN
...	...	...	...	...	...	...
1034	wd	World	2012	13821.0	13818.0	6992923.0
1035	wd	World	2013	14038.0	14090.0	7072213.0
1036	wd	World	2014	14261.0	14376.0	7152269.0
1037	wd	World	2015	14500.0	14616.0	7231375.0
1038	wd	World	2016	14574.0	14692.0	7311687.0

1039 rows × 6 columns

Let us re-index or project the original dataframes df and df_madregion by or onto the year series. This will turn out to be useful for plotting things later!

See this useful pandas lecture here for more information.

df = df_mad.set_index(["year"])
dfr = df_madregion.set_index(["year"])

Let’s have a look at a snippet of the dataframe. Here’s a first few rows:

df.head()

	countrycode	country	gdppc	pop
year
1820	AFG	Afghanistan	NaN	3280.0
1870	AFG	Afghanistan	NaN	4207.0
1913	AFG	Afghanistan	NaN	5730.0
1950	AFG	Afghanistan	1156.0	8150.0
1951	AFG	Afghanistan	1170.0	8284.0

Here’s the last few rows:

dfr.tail()

	region	region_name	cgdppc	rgdpnapc	pop
year
2012	wd	World	13821.0	13818.0	6992923.0
2013	wd	World	14038.0	14090.0	7072213.0
2014	wd	World	14261.0	14376.0	7152269.0
2015	wd	World	14500.0	14616.0	7231375.0
2016	wd	World	14574.0	14692.0	7311687.0

1.1.1. Postcards for our journey ahead#

We will also select a few countries for comparison.

# Specify your selected countries as list
country_list = ["United States", 
                "United Kingdom", 
                "Australia", 
                "China", 
                "Spain", 
                "Nigeria", 
                "Botswana",
                "Singapore",
                "Argentina",
                "Brazil",
                "India"]

We’ll reshape the dataframe df to be able to plot the time series of log real GDP by each country.

# Reshape df 
df_pivot = df.reset_index().pivot(index="year", 
                                  columns="country", 
                                  values="gdppc")
df_sel = df_pivot[country_list]
df_sel = df_sel.dropna().apply(np.log)

# Auto marker styles, for plotting
marker_style = list(Line2D.markers.keys())

Exercise 1.1

Why did we transform the real GDP data series above into their natural logarithms?

# Now plot it
title = "Log real GDP per capita"
df_sel.plot(
            ylabel=title,
            style=marker_style,
            linestyle="--",
            markersize=4.5,
           ).legend(
                    loc='best', 
                    bbox_to_anchor=(0.9, 0.5, 0.5, 0.5),
                   );

../../_images/49514ff9b48f511f109cf187ba195871f0d6e1376455396e69ebe9ca1c7eecfa.png

Exercise 1.2

Comment on the evolution of (log) per-capita GDP of the countries in the figure above.

Use the United States as a reference country. Comment on the trajectories of Singapore, Botswana, Brazil and Spain.

What economic questions can we potentially be asking?

We do the same too for the by-region sorted dataframe dfr:

dfr["lrgdpnapc"] = dfr["rgdpnapc"].apply(np.log)
dfr

	region	region_name	cgdppc	rgdpnapc	pop	lrgdpnapc
year
1870	af	Africa	NaN	NaN	NaN	NaN
1871	af	Africa	NaN	NaN	NaN	NaN
1872	af	Africa	NaN	NaN	NaN	NaN
1873	af	Africa	NaN	NaN	NaN	NaN
1874	af	Africa	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...
2012	wd	World	13821.0	13818.0	6992923.0	9.533728
2013	wd	World	14038.0	14090.0	7072213.0	9.553221
2014	wd	World	14261.0	14376.0	7152269.0	9.573316
2015	wd	World	14500.0	14616.0	7231375.0	9.589872
2016	wd	World	14574.0	14692.0	7311687.0	9.595058

1039 rows × 6 columns

dfr.groupby("region_name")["rgdpnapc"].plot(style=marker_style, 
                                            linestyle="--", 
                                            legend=True,
                                            );

../../_images/6348174622c9d8d52e819a05f6a0a0c02cbb95ab1a9d7e18bc9ac944129d0bee.png

1.1.2. Cross-country inequality over time#

Let’s take snapshots of the distribution of income per capita across countries.

We’ll pick a few same years for the snapshots to see how the distributions evolve of the decades.

# Sample snapshot years
select_years = [1950,1980,2008, 2018]

# Apply the sample years, transpose the dataframe
# just for ease of reading. We'll assign this to a 
# new dataframe, df_dens
df_dens = df_pivot.loc[select_years].T

Instead of plotting the empirical distributions as histograms, we’ll fit smooth kernel density estimators of these.

First, let’s take a look at the distributions over the level of real GDP per capita:

# Kernel density estimates
df_dens.plot.kde( style=marker_style, 
                  markersize=3.5,
                  linestyle="-",
                  title="Cross-country density estimates by GDP per capita",
                );

../../_images/29f86245f74e6f679fc00d0a21fdcf72dce7a4a66111bb2f6eda422aff5c6c06.png

Second, we will look at the densities of the natural log transform of the same data:

df_pivot_log = df_pivot.apply(np.log)
df_pivot_log_dens = df_pivot_log.loc[select_years].T

figure_title="Cross-country density estimates by log GDP per capita"
df_pivot_log_dens.plot.kde( style=marker_style,
                            markersize=3.5,
                            linestyle="-",
                            title=figure_title,
                          );

../../_images/2212f5b512a1a16c1356b1478c9d10bbc29839bde5341033e5cbbbf572cc3b66.png

Exercise 1.3

Comment on the figures above.

What can you conclude about world income per capita and inequality as time progressed?

What sorts of questions can we ask?

Hint: Read Chapter 1.1 to 1.4 of Acemoglu [1]!

1.2. Conditional Convergence#

Let’s work with a concrete example:

We want to import some data about countries and their long-run macroeconomic outcomes.

Original data source: Penn World Tables data provided through the Groningen Growth and Development Centre.

This example was adapted from Jon Conning’s material.

Let’s download the data …

data_dir = './data/'

if not os.path.exists(data_dir):
    os.mkdir(data_dir)

filename = "country.dta"
URL = "https://github.com/jhconning/Dev-II/blob/master/notebooks/data/"

try:
    # if previously downloaded to directory ``data_dir``
    df = pd.read_stata(data_dir + filename)
except:
    # otherwise download from J. Conning's repo ...
    # Escape to server shell (!) and use WGET to download to current directory
    !wget -L "https://github.com/jhconning/Dev-II/blob/master/notebooks/data/country.dta?raw=true" -O country.dta
    # import data as a Pandas dataframe:
    df = pd.read_stata("country.dta")
       
    # then write/save to directory ``data_dir``
    df.to_stata(data_dir + filename, 
                    write_index=False, version=117)

# Display dataframe content
df

	isocode	country	cont	ggdp	gpop	open60	sav60	lxrd60	lxrdav	savav	openav	lgdp60	lpop60
0	GHA	Ghana	Africa	2.990764	2.577389	67.866135	60.527485	0.852252	0.600432	13.394326	42.227081	-0.634748	1.939933
1	MAR	Morocco	Africa	2.453598	2.214207	46.277290	9.332626	0.503584	0.918777	11.590545	47.401264	0.515019	2.519584
2	COM	Comoros	Africa	-0.014582	2.878329	57.538712	6.700409	1.902229	1.482772	11.446098	57.030586	0.555129	-1.698821
3	MLI	Mali	Africa	0.755422	2.165016	42.134323	4.308089	1.916554	1.203254	8.405947	48.412022	0.020564	1.500997
4	GAB	Gabon	Africa	1.268704	2.521714	72.452179	20.357958	1.315058	0.710294	7.738871	94.919189	2.073140	-0.807430
...	...	...	...	...	...	...	...	...	...	...	...	...	...
92	CHL	Chile	S. America	1.848451	1.730081	29.795073	29.349722	0.650374	0.655199	18.732784	43.862640	1.846538	2.026219
93	DOM	Dominican Republic	S. America	2.663763	2.374352	68.969223	5.131092	0.606269	0.755264	10.170477	76.201607	0.998264	1.172943
94	BRB	Barbados	S. America	1.812256	0.411359	98.748535	5.806423	1.417267	0.932263	4.997591	114.205284	2.162862	-1.459558
95	URY	Uruguay	S. America	1.352161	0.688982	17.299664	14.389812	0.324334	0.583185	13.030991	33.351215	1.965475	0.928602
96	PAN	Panama	S. America	2.719175	2.262045	132.864792	14.117195	0.334513	0.488325	18.435352	155.472107	1.157346	0.137681

97 rows × 13 columns

The series of interest here are cross-country observations of

lgdp60 (the log of per-capita real GDP in the year 1960), and,
ggdp (the average growth rate between 1960 and 1990).

Let’s plot their relationship as scatterplots, clustered by:

all countries, and,
all countries excluding African continent countries.

1.2.1. All Countries#

g = sns.jointplot(x="lgdp60", y="ggdp", 
                  data=df, 
                  kind="reg",
                  color ="orange")

../../_images/b056765f508f588973c5cec224ea504d00c4072e198b6c63637eb20c9368c595.png

1.2.2. All but African countries#

g = sns.jointplot(x="lgdp60", y="ggdp", 
                  data=df[df.cont !="Africa"], 
                  kind="reg",
                  color ="green")

../../_images/269f77e3db42547ddd612896927d2bb8e1a21e0b836e62afc5a7861168bf8f84.png

1.2.3. Exercise#

Exercise 1.4

Comment on the two scatterplot above.

What does the last figure suggest about the possibility of some poorer countries catching up to the richer ones?

Exercise 1.5

What is conditional convergence in cross-country incomes per capita?

How do economists (Robert Barro) take the casual, scatterplot motivations above to find evidence of conditional convergence?

Provide a detailed discussion of the empirical method and model and the evidence discovered.

1.3. Further reading#

Quah [21] uses data from a large number of countries to analyze the dynamics of cross-country income inequality and economic growth. He was the first to document the “emerging twin-peaks” phenomenon in the cross-sectional distribution of income across countries. See also the survey article by Jones [12].

Acemoglu, Johnson, and Robinson [2] suggest that the world income distribution underwent a ‘reversal of fortune’ from 1500 to the present. In their study, they claim that formerly rich countries in the current developing world became poor while some poor countries had grown rich. They hypothesize that this reversal was driven by changes in institutions arising from European colonialism. (The authors use data on urbanization patterns and population density, arguing that these are good proxies for economic prosperity.) Acemoglu, Naidu, Restrepo, and Robinson [3] make a further causal claim that democracy is an important institution that has a positive effect on GDP per capita.

Empirical Regularities

Contents

1. Empirical Regularities#

1.1. The Maddison Dataset#

1.1.1. Postcards for our journey ahead#

1.1.2. Cross-country inequality over time#

1.2. Conditional Convergence#

1.2.1. All Countries#

1.2.2. All but African countries#

1.2.3. Exercise#

1.3. Further reading#