---
title: "`wbids` Data Model"
format: html
vignette: >
  %\VignetteIndexEntry{`wbids` Data Model}
  %\VignetteEngine{quarto::html}
  %\VignetteEncoding{UTF-8}
---
## Overview
The diagram below shows the data model of the `wbids` package. The design primarily targets R users, particularly those who rely on the popular data manipulation packages `dplyr` and `data.table`. For these users, consistent and descriptive primary key column names across tables (e.g., `geography_id`, `series_id`) simplify writing joins, reduce the risk of column name conflicts, and avoid ambiguity. We therefore deliberately deviate from the common data modeling practice of omitting the entity prefix from the columns of an entity table (e.g., `geography_` in the geographies table).
```{mermaid}
%%| fig-width: 12
%%| fig-height: 8
erDiagram
GEOGRAPHIES ||--o{ DEBT_STATISTICS : has
GEOGRAPHIES {
string geography_id PK
string geography_name
string geography_iso2code
string geography_type
string capital_city
string region_id
string region_iso2code
string region_name
string admin_region_id
string admin_region_iso2code
string admin_region_name
string lending_type_id
string lending_type_iso2code
string lending_type_name
}
SERIES ||--o{ DEBT_STATISTICS : has
SERIES ||--o{ SERIES_TOPICS : has
SERIES {
string series_id PK
string series_name
int source_id
string source_note
string source_organization
}
COUNTERPARTS ||--o{ DEBT_STATISTICS : has
COUNTERPARTS {
string counterpart_id PK
string counterpart_name
string counterpart_iso2code
string counterpart_iso3code
string counterpart_type
}
SERIES_TOPICS {
string series_id FK
int topic_id
string topic_name
}
DEBT_STATISTICS {
string series_id FK
string geography_id FK
string counterpart_id FK
int year
float value
}
```
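
To illustrate why shared key names help, here is a minimal sketch of joining the fact table to two entity tables with `dplyr`. The tables are built by hand with made-up rows and placeholder values (they are not real IDS data), and only the column names follow the diagram above.

```r
library(dplyr)

# Toy tables that mimic the column names in the diagram; rows and
# values are invented for illustration and are not real IDS data.
geographies <- tibble(
  geography_id   = "ZMB",
  geography_name = "Zambia"
)
counterparts <- tibble(
  counterpart_id   = "730",
  counterpart_name = "China"
)
debt_statistics <- tibble(
  series_id      = "DT.DOD.DPPG.CD",
  geography_id   = "ZMB",
  counterpart_id = "730",
  year           = c(2019L, 2020L),
  value          = c(100, 200)  # placeholder values
)

# Because each key column has the same name in every table, a join
# needs only that name and yields no ambiguous `id` or `name` columns.
debt_labeled <- debt_statistics |>
  left_join(geographies, by = "geography_id") |>
  left_join(counterparts, by = "counterpart_id")
```

With generic column names such as `id` and `name`, the same joins would need explicit key mappings and would produce suffixed columns like `name.x` and `name.y`.
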
## Table Details
### Geographies
| Column name | Description | Example value |
|-----------------|-------------------------------------|-----------------|
| geography_id | ISO 3166-1 alpha-3 code of the geography | ZMB |
| geography_name | Standardized name of the geography | Zambia |
| geography_iso2code | ISO 3166-1 alpha-2 code of the geography | ZM |
| geography_type | Type of geography (e.g., country, region) | Country |
| capital_city | Capital city of the geography | Lusaka |
| region_id | Unique identifier for the region | SSF |
| region_iso2code | ISO 3166-1 alpha-2 code of the region | ZG |
| region_name | Name of the region | Sub-Saharan Africa |
| admin_region_id | Unique identifier for the administrative region | SSA |
| admin_region_iso2code | Two-letter code of the administrative region | ZF |
| admin_region_name | Name of the administrative region | Sub-Saharan Africa (excluding high income) |
| lending_type_id | Unique identifier for the lending type | IDX |
| lending_type_iso2code | ISO code of the lending type | XI |
| lending_type_name | Name of the lending type | IDA |
### Counterparts
| Column name | Description | Example value |
|-----------------|-------------------------------------|-----------------|
| counterpart_id | Unique identifier for the counterpart | 730 |
| counterpart_name | Standardized name of the counterpart | China |
| counterpart_iso2code | ISO 3166-1 alpha-2 code of the counterpart | CN |
| counterpart_iso3code | ISO 3166-1 alpha-3 code of the counterpart | CHN |
| counterpart_type | Type of counterpart (e.g., institution, country, region) | Country |
### Series
| Column name | Description | Example value |
|-----------------|--------------------|----------------------------------|
| series_id | Unique identifier for the data series | DT.DOD.DPPG.CD |
| series_name | Name of the series | External debt stocks, public and publicly guaranteed (PPG) (DOD, current US\$) |
| source_id | Unique identifier for the data source | 2 |
| source_note | Note about the data source | Public and publicly guaranteed debt comprises long-term external obligations of public debtors, including the national government, Public Corporations, State Owned Enterprises, Development Banks and Other Mixed Enterprises, political subdivisions (or an agency of either), autonomous public bodies, and external obligations of private debtors that are guaranteed for repayment by a public entity. Data are in current U.S. dollars. |
| source_organization | Organization responsible for the data series | World Bank, International Debt Statistics. |
### Debt Statistics
| Column name | Description | Example value |
|----------------|--------------------------------|----------------|
| series_id | Identifier for the series | DT.DOD.DPPG.CD |
| geography_id | Identifier for the geography | ZMB |
| counterpart_id | Identifier for the counterpart | 061 |
| year | Year of the data point | 2020 |
| value | Value of the data point | 4298957000 |
## Assignment of Geography and Counterpart Types
The original World Bank IDS data includes a ‘country’ field, containing both countries and regions, and a ‘counterpart-area’ field, which may include countries, regions, and institutions. In our data model, these fields are renamed to ‘geography’ and ‘counterpart’ to clarify the types of entities in each column.
We also introduce corresponding type columns that specify whether a geography is a country (e.g., “Aruba”) or a region (e.g., “Africa Eastern and Southern”), and whether a counterpart is a country, region, or a special category (e.g., “Global IFIs”, “Global MDBs”). Each counterpart is represented in the geography table if it is a country or region, ensuring consistency across both tables.
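
The consistency rule described above can be checked with an anti-join. The sketch below uses hand-built toy tables, not real IDS data. It assumes that `counterpart_iso3code` lines up with `geography_id`, which seems plausible because both hold ISO alpha-3 codes but is not confirmed here. The counterpart id `x01` and the type label `Other` are hypothetical.

```r
library(dplyr)

# Toy rows for illustration only; not real IDS data.
geographies <- tibble(
  geography_id   = c("ABW", "AFE", "CHN"),
  geography_name = c("Aruba", "Africa Eastern and Southern", "China"),
  geography_type = c("Country", "Region", "Country")
)
counterparts <- tibble(
  counterpart_id       = c("730", "x01"),      # "x01" is a hypothetical id
  counterpart_name     = c("China", "Global IFIs"),
  counterpart_iso3code = c("CHN", NA),
  counterpart_type     = c("Country", "Other")  # "Other" is an assumed label
)

# Counterparts typed as country or region that have no matching row in
# the geographies table (assumed key: iso3code matches geography_id).
unmatched <- counterparts |>
  filter(counterpart_type %in% c("Country", "Region")) |>
  anti_join(geographies, by = c("counterpart_iso3code" = "geography_id"))
```

An empty `unmatched` table means every country or region counterpart in the toy data also appears among the geographies.
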
## Harmonization of Geography and Counterpart Names
In some cases, the IDS data provides different names for geographies that appear both in the 'counterpart-area' and the 'country' data. We use the geography names whenever they are available and drop counterpart names with different wording. For instance, if the original data features "Cote D`Ivoire, Republic Of" in the counterpart table, but the country name is "Cote d'Ivoire", then we overwrite the former with the latter.
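
The overwrite rule can be sketched with a join followed by `coalesce()`. This is an illustration under assumptions, not the package's actual implementation: the tables are toy data, the counterpart ids are hypothetical, and matching on `counterpart_iso3code` and `geography_id` is an assumed choice of key.

```r
library(dplyr)

# Toy rows for illustration only; ids "c1" and "c2" are hypothetical.
geographies <- tibble(
  geography_id   = "CIV",
  geography_name = "Cote d'Ivoire"
)
counterparts <- tibble(
  counterpart_id       = c("c1", "c2"),
  counterpart_name     = c("Cote D`Ivoire, Republic Of", "Global IFIs"),
  counterpart_iso3code = c("CIV", NA)
)

# Prefer the geography name when a match exists; otherwise keep the
# original counterpart name.
harmonized <- counterparts |>
  left_join(geographies, by = c("counterpart_iso3code" = "geography_id")) |>
  mutate(counterpart_name = coalesce(geography_name, counterpart_name)) |>
  select(-geography_name)
```

In this toy example the first counterpart takes the geography name, while "Global IFIs" has no matching geography and keeps its original name.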