Data Science Corner (Beta)

We are starting a data science corner on TCdata360. Send us an email at contact email to tell us what you would like to see here. Meanwhile, here are a few starters.

Interactive exploratory tool featuring TCdata360 data

Explore TCdata360 data in a different way. We put together the Open TCdata360 Topology tool, a cloud visualization that groups countries according to the main TCdata360 indicators across time. This is an innovative way to look at big datasets. It is written in R and reads directly from the TCdata360 API. This blog describes a few analytical possibilities of the tool. Open TCdata360 Topology.

Data Resources

APIs

Methodologies

Suggested Peers algorithm

Suggested Peers measures how similar countries are by computing the distance between them in an embedded country space produced by the t-SNE algorithm.

For each country, values are found for the following indicators:

  • Export Basket Composition
      • Export Product Share - WITS
      • Import Product Share - WITS
      • ICT goods exports (% of total goods exports) - WDI
      • Agricultural raw materials exports (% of merchandise exports) - WDI
  • Human Capital
      • Adult literacy rate, population 15+ years, both sexes (%) - WDI
      • Current education expenditure, total (% of total expenditure in public institutions) - WDI
      • Labor force, total - WDI
      • Unemployment, total (% of total labor force) - WDI
  • Physical Capital
      • Gross fixed capital formation (current US$) - WDI
      • Gross capital formation (current US$) - WDI
  • GDP per capita
      • GDP per capita (US$) - GCI
      • GDP per capita (constant 2005 US$) - WDI
      • GDP per capita, PPP (current international $) - WDI
  • Population
      • Population - IMF WEO
      • Population, total - WDI

Indicators that are reported by product are split into separate indicators, one for each product.

From these values, a data matrix A is constructed (where A_{ij} is the value of the jth indicator for the ith country). Not all indicators have values for all countries, so each missing value is filled in with the mean of the values that are present for that indicator.
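
As a rough illustration of this step, the sketch below builds A from a dictionary of observed values and fills each gap with the mean of its column. It uses NumPy. The function names (build_matrix, impute_column_means) and the (country, indicator) dictionary layout are assumptions made for the example, not part of the TCdata360 code.

```python
import numpy as np


def build_matrix(values, countries, indicators):
    """Return A where A[i, j] is indicator j for country i (NaN if missing).

    `values` is a hypothetical dict keyed by (country, indicator).
    """
    A = np.full((len(countries), len(indicators)), np.nan)
    for i, country in enumerate(countries):
        for j, indicator in enumerate(indicators):
            value = values.get((country, indicator))
            if value is not None:
                A[i, j] = value
    return A


def impute_column_means(A):
    """Return a copy of A with each NaN replaced by its column mean.

    The mean is taken over the values present in that column only.
    A column with no values at all stays NaN.
    """
    filled = np.array(A, dtype=float)  # np.array copies, so A is untouched
    column_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = column_means[cols]
    return filled
```

For example, a column holding 1, 3 and one missing value would have the gap filled with 2.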

t-SNE is then run on A with a Euclidean metric (and perplexity of 40, early exaggeration of 4 and learning rate of 1000) to create a 2D embedding space.
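
The embedding step might look like the following sketch. It assumes scikit-learn's TSNE class, which the page does not name; the parameter values are the ones quoted above. The function name embed_countries and the random_state argument are additions for the example.

```python
import numpy as np
from sklearn.manifold import TSNE


def embed_countries(A, random_state=0):
    """Project the country-by-indicator matrix A into 2D with t-SNE.

    Assumes A has no missing values and more than 40 rows, since
    scikit-learn requires perplexity to be smaller than the row count.
    """
    tsne = TSNE(
        n_components=2,          # 2D embedding space
        metric="euclidean",
        perplexity=40,
        early_exaggeration=4,
        learning_rate=1000,
        random_state=random_state,  # assumption: fixed seed for repeatability
    )
    return tsne.fit_transform(np.asarray(A, dtype=float))
```

t-SNE is stochastic, so without a fixed seed the coordinates (and therefore the distances used later) can differ from run to run.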

A k-d tree is then created from the embedded countries, allowing us to efficiently perform (Euclidean) nearest neighbors search. For a given country, the similar countries are then defined to be:

  • The 4 nearest neighbours
  • Up to 16 of the next nearest neighbours, keeping only those closer than a specified distance threshold (currently 1000)
This gives between 4 and 20 similar countries.
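
The selection rule above can be sketched as follows, assuming SciPy's cKDTree for the k-d tree (the page does not say which implementation is used). The function name similar_countries and its argument names are hypothetical; the defaults mirror the numbers quoted above.

```python
import numpy as np
from scipy.spatial import cKDTree


def similar_countries(embedding, index, n_always=4, n_extra=16, threshold=1000.0):
    """Return row indices of the countries similar to row `index`.

    The `n_always` nearest neighbours are always returned; up to
    `n_extra` further neighbours are added if closer than `threshold`.
    """
    embedding = np.asarray(embedding, dtype=float)
    tree = cKDTree(embedding)

    # Ask for one extra neighbour because the query point matches itself.
    k = min(n_always + n_extra + 1, len(embedding))
    dist, idx = tree.query(embedding[index], k=k)
    dist, idx = np.atleast_1d(dist), np.atleast_1d(idx)

    # Drop the query country, then cap at the maximum number of candidates.
    keep = idx != index
    dist = dist[keep][: n_always + n_extra]
    idx = idx[keep][: n_always + n_extra]

    nearest = idx[:n_always]
    extra = idx[n_always:][dist[n_always:] < threshold]
    return [int(i) for i in np.concatenate([nearest, extra])]
```

In practice the tree would be built once and reused for every country; it is rebuilt inside the function here only to keep the example short.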

Measuring Export Competitiveness Algorithm

Measuring Export Competitiveness Country Comparator uses a methodology developed at the World Bank. Read more.

Available Indicators By Country