Data Science Corner (Beta)
Interactive exploratory tool featuring TCdata360 data
Explore TCdata360 data in a different way. We put together the Open TCdata360 Topology tool, a cloud visualization that groups countries according to the main TCdata360 indicators across time, offering a different way to look at large datasets. It is written in R and reads directly from the TCdata360 API. This blog describes a few analytical possibilities of the tool.
Open TCdata360 Topology
Packages and Libraries
Suggested Peers algorithm
Suggested Peers measures how similar two countries are by the distance between them in a low-dimensional country space, which is built by embedding the countries with the t-SNE algorithm.
For each country, values are found for the following indicators:
Export Basket Composition
- Export Product Share - WITS
- Import Product Share - WITS
- ICT goods exports (% of total goods exports) - WDI
- Agricultural raw materials exports (% of merchandise exports) - WDI
Human Capital
- Adult literacy rate, population 15+ years, both sexes (%) - WDI
- Current education expenditure, total (% of total expenditure in public institutions) - WDI
- Labor force, total - WDI
- Unemployment, total (% of total labor force) - WDI
Physical Capital
- Gross fixed capital formation (current US$) - WDI
- Gross capital formation (current US$) - WDI
GDP per capita
- GDP per capita (US$) - GCI
- GDP per capita (constant 2005 US$) - WDI
- GDP per capita, PPP (current international $) - WDI
Population
- Population - IMF WEO
- Population, total - WDI
Indicators that are reported by product are split up into one 'indicator' for each product.
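As a rough illustration of that split, the sketch below turns long-format records into one column per indicator-product pair. The record layout, the `split_by_product` helper, the country codes, the product names and the values are all hypothetical and chosen only for the example; the tool's actual data structures are not described here.

```python
from collections import defaultdict

def split_by_product(records):
    """Build one column per (indicator, product) pair.

    records: iterable of (country, indicator, product, value) tuples, where
    product is None for indicators that have no product breakdown.
    Returns {country: {column_name: value}}.
    """
    rows = defaultdict(dict)
    for country, indicator, product, value in records:
        # A by-product indicator becomes one separate column per product.
        column = indicator if product is None else f"{indicator} | {product}"
        rows[country][column] = value
    return dict(rows)

# Made-up values, for illustration only.
example_records = [
    ("AAA", "Export Product Share", "Fuels", 10.0),
    ("AAA", "Export Product Share", "Textiles", 25.0),
    ("AAA", "Labor force, total", None, 1200.0),
    ("BBB", "Export Product Share", "Fuels", 40.0),
]
example_table = split_by_product(example_records)
```

Countries that lack a value for some column (here, "BBB" has no textiles share) simply have no entry for it, which is where the missing-value handling comes in.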
From these values, a data matrix X is constructed, where X_ij is the value of the j-th indicator for the i-th country. Not all indicators have values for all countries, so each missing value is replaced by the mean of the values present for that indicator.
t-SNE is then run on X with a Euclidean metric (perplexity of 40, early exaggeration of 4 and learning rate of 1000) to create a 2D embedding space.
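The imputation and embedding steps could be sketched as follows. This assumes a country-by-indicator matrix with NaN marking missing values, and uses scikit-learn's `TSNE` as a stand-in; the text does not say which t-SNE implementation the tool uses. The function names and the fixed random seed are illustrative assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

def impute_column_means(X):
    """Replace each NaN with the mean of the present values in its column.

    Rows are countries and columns are indicators (an assumed layout).
    A column with no values at all would stay NaN and should be dropped first.
    """
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def embed_countries(X, random_state=0):
    """Embed countries in 2D with the settings quoted in the text."""
    tsne = TSNE(
        n_components=2,
        metric="euclidean",
        perplexity=40,
        early_exaggeration=4,
        learning_rate=1000,
        random_state=random_state,  # assumption: fixed only for repeatability
    )
    return tsne.fit_transform(impute_column_means(X))
```

Note that scikit-learn requires the perplexity to be smaller than the number of rows, so this sketch needs more than 40 countries. t-SNE is also stochastic, so the embedding can differ between runs unless the seed is fixed.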
A k-d tree is then created from the embedded countries, allowing us to efficiently perform (Euclidean) nearest-neighbour searches. For a given country, the similar countries are then defined to be:
- The 4 nearest neighbours
- Up to 16 of the next nearest neighbours, keeping only those closer than a specified distance threshold (currently 1000)
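A minimal sketch of this lookup, assuming the 2D embedding is available as a NumPy array with one row per country and using SciPy's `KDTree` (the library actually used is not stated). The `suggested_peers` function and its parameter names are hypothetical.

```python
import numpy as np
from scipy.spatial import KDTree

def suggested_peers(tree, embedding, index, n_core=4, n_extra=16, threshold=1000.0):
    """Return (core, extra) peer indices for the country at row `index`.

    core:  the n_core nearest neighbours, kept regardless of distance.
    extra: up to n_extra next nearest neighbours closer than `threshold`.
    """
    # Ask for one extra point because the query country is its own nearest neighbour.
    k = min(len(embedding), 1 + n_core + n_extra)
    dist, idx = tree.query(embedding[index], k=k)
    dist, idx = np.atleast_1d(dist), np.atleast_1d(idx)

    # Drop the query country itself; results stay sorted by distance.
    keep = idx != index
    dist, idx = dist[keep], idx[keep]

    core = idx[:n_core]
    rest_idx = idx[n_core:n_core + n_extra]
    rest_dist = dist[n_core:n_core + n_extra]
    extra = rest_idx[rest_dist < threshold]
    return [int(i) for i in core], [int(i) for i in extra]

# Toy embedding: six nearby points and one far outlier.
toy_embedding = np.array(
    [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0], [2000, 0]], dtype=float
)
toy_tree = KDTree(toy_embedding)  # build once, reuse for every country
core, extra = suggested_peers(toy_tree, toy_embedding, 0)
```

In the toy data the outlier at distance 2000 falls outside the threshold, so it is excluded from the extra peers even though fewer than 16 candidates remain.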