Using sparseDFM - Nowcasting UK Trade in Goods (Exports)

Load the sparseDFM package and exports dataframe into R. We also require the gridExtra package for this vignette.

library(sparseDFM)
library(gridExtra)
data <- exports 

This vignette provides a tutorial on how to apply the sparseDFM package to a large-scale data set for the purpose of nowcasting UK trade in goods. The data contains 445 columns, comprising 9 target series (UK exports of the 9 main commodities worldwide) and 436 monthly indicator series, and 226 rows representing monthly values from January 2004 to October 2022. For a small-scale example see the vignette inflation-example.

Introduction

Nowcasting[1] is a method used in econometrics to estimate the current state of the economy from the most recent data available. It is an important tool because it allows policy makers and businesses to make more informed decisions in real time, rather than relying on information that publication delays have made outdated and possibly inaccurate. Trade in Goods (imports and exports) is currently published with a 2-month lag by the UK’s Office for National Statistics (ONS), which is a long time to wait for current assessments of trade, especially during times of economic uncertainty or instability. Nowcasting UK trade information has become particularly important in recent years due to two key events: the Brexit referendum, held in 2016, and the coronavirus pandemic, which reached UK shores in early 2020. While the causes of these shocks are drastically different, both have imposed restrictions on trade in goods.

We consider the task of understanding and nowcasting the movements of 9 monthly target series, representing 9 of the main commodities the UK exports worldwide. These series occupy the first 9 columns of the dataframe.

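These target series are released with a 2-month publication delay, so the last two rows of the dataframe are missing for these variables. To estimate the targets in these months, we use a large collection of 436 monthly indicator series drawn from the Index of Production (IoP, 89 series), the Consumer Price Index (CPI, 166 series), the Producer Price Index (PPI, 153 series), exchange rates (12 series), business and consumer confidence indices (BCI and CCI, 2 series) and Google Trends (14 series).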

This vignette uses the sparseDFM() function to fit a regular DFM and a Sparse DFM to the entire dataset of January 2004 to October 2022 with the goal of estimating the missing target series data in September and October of 2022. We explore the plot() and predict() capabilities of the package and assess the benefit of a sparse DFM in terms of interpreting factor structure and accuracy of predictions.

Exploring the Data

Before fitting any models, it is worthwhile to perform some exploratory data analysis to assess stationarity and missing data.

# Dimension of the data: n = 226, p = 445.
dim(data)
#> [1] 226 445

# Plot the 9 target series using ts.plot with a legend on the right 
def.par <- par(no.readonly = TRUE) # initial graphic parameters 
goods <- data[,1:9]
layout(matrix(c(1,2),nrow=1), widths=c(4,3)) 
par(mar=c(5,4,4,0)) 
ts.plot(goods, gpars= list(col=10:1,lty=1:10))
par(mar=c(5,0,4,2)) 
plot(c(0,1),type="n", axes=FALSE, xlab="", ylab="")
legend("center", legend = colnames(goods), col = 10:1, lty = 1:10, cex = 0.7)

par(def.par) # reset graphic parameters to initial

This plot shows the monthly dynamics of UK exports worldwide for 9 categories of goods. Exports of machinery and transport are the largest. We also see two main drops, during the 2009 and 2020 recessions, and an upward trend over the past year or so.

The only missing data are at the end of the sample, in September and October 2022, and reflect the publication delays of the variables. We can see this ragged edge[2] structure by zooming in on the last 12 months:

# last 12 months 
data_last12 = tail(data, 12)

# Missing data plot. Too many variable names so use.names is set to FALSE for clearer output.
missing_data_plot(data_last12, use.names = FALSE)

We see the 2-month delay for the targets and IoP, the 1-month delay for CPI, PPI, exchange rates, BCI and CCI, and no delay for Google Trends. We hope to exploit this available data when predicting September and October 2022.
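The delays read off the missing data plot can also be tabulated numerically. The following is a minimal base-R sketch, not part of sparseDFM: the helper trailing_na() and the toy panel are hypothetical names and values introduced here for illustration only.

```r
# Hypothetical helper: count the run of consecutive NAs at the end of each column.
trailing_na <- function(x) {
  apply(as.matrix(x), 2, function(col) {
    obs <- which(!is.na(col))
    if (length(obs) == 0) length(col) else length(col) - max(obs)
  })
}

# Toy panel (made-up values): 6 months, 3 series with delays of 2, 1 and 0 months.
toy <- cbind(target = c(1, 2, 3, 4, NA, NA),
             cpi    = c(5, 6, 7, 8, 9, NA),
             trends = c(3, 1, 4, 1, 5, 9))

delays <- trailing_na(toy)
print(delays)
```

Applied to the vignette's data, table(trailing_na(data)) would tabulate how many series share each publication delay.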

Fitting the Models

We first make the data stationary by taking first differences:

# first-differences correspond to stationary_transform set to 2 for each series
new_data = transformData(data, stationary_transform = rep(2,ncol(data)))
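To illustrate what first-differencing does to a single series, here is a small base-R sketch on simulated data. It is not the internals of transformData(): the NA padding of the first observation is a choice made here for illustration, and the package may handle the first row differently.

```r
set.seed(1)
shocks <- rnorm(200)
level  <- cumsum(shocks)   # random walk: non-stationary in levels

# First difference x_t - x_{t-1}. diff() drops the first observation,
# so we pad with NA to keep the original length (an assumption of this sketch).
d_level <- c(NA, diff(level))

# The ragged edge survives differencing: a missing final level
# produces a missing final difference.
level_ragged   <- c(level[1:199], NA)
d_level_ragged <- c(NA, diff(level_ragged))
tail(d_level_ragged, 3)
```

Differencing the random walk recovers the underlying shocks, which is why the transformed series can be treated as stationary.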

We now tune for the number of factors to use:

tuneFactors(new_data)
#> Data contains missing values: imputing data with fillNA()

#> [1] "The chosen number of factors using criteria type  2  is  7"

According to the Bai and Ng (2002)[3] information criterion, the best number of factors to use is 7. However, the scree plot suggests that beyond 4 factors, additional factors explain little further variance in the data. For this reason, we choose to use 4 factors when modelling.
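The two views compared above, an information criterion and a scree plot, can be sketched in base R on simulated data. This is an illustration under stated assumptions rather than a reproduction of tuneFactors(): we assume "criteria type 2" refers to the second Bai and Ng criterion, IC2(k) = log V(k) + k ((n + p) / (n p)) log(min(n, p)), and all object names and the simulated panel below are hypothetical.

```r
set.seed(123)
n <- 200; p <- 50; r_true <- 3                 # simulated sizes, not the exports data
F0 <- matrix(rnorm(n * r_true), n, r_true)     # latent factors
L0 <- matrix(rnorm(p * r_true), p, r_true)     # loadings
X  <- F0 %*% t(L0) + matrix(rnorm(n * p), n, p)
X  <- scale(X, center = TRUE, scale = FALSE)   # centre each column

# Squared singular values measure the variance captured by each principal component.
ev <- svd(X)$d^2
var_explained <- ev / sum(ev)                  # the quantity a scree plot displays

kmax <- 8
# V(k): average squared residual after removing the first k components.
V   <- sapply(1:kmax, function(k) sum(ev[-(1:k)]) / (n * p))
ic2 <- log(V) + (1:kmax) * ((n + p) / (n * p)) * log(min(n, p))
k_hat <- which.min(ic2)

round(var_explained[1:kmax], 3)
k_hat
```

The criterion trades off the fall in residual variance against a penalty that grows with k, whereas the scree plot is read by eye, so the two need not agree on real data.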

We now fit a regular DFM and a Sparse DFM to the data with 4 factors:

# Regular DFM fit - takes around 18 seconds 
fit.dfm <- sparseDFM(new_data, r = 4, alg = 'EM')

# Sparse DFM fit - takes around 2 mins to tune 
# set q = 9 as the first 9 variables (targets) should not be regularised
# L1 penalty grid set to logspace(0.4,1,15) after exploration
fit.sdfm <- sparseDFM(new_data, r = 4, q = 9, alg = 'EM-sparse', alphas = logspace(0.4,1,15))

We can explore the convergence and tuning of each algorithm like so:

# Number of iterations the DFM took to converge
fit.dfm$em$num_iter
#> [1] 14

# Number of iterations the Sparse DFM took to converge at each L1 norm penalty 
fit.sdfm$em$num_iter
#>  [1] 17  6 14  2  2  2  3  3  3  3 18  3 12  5  5

# Optimal L1 norm penalty chosen
fit.sdfm$em$alpha_opt
#> [1] 4.54091

# Plot of BIC values for each L1 norm penalty 
plot(fit.sdfm, type = 'lasso.bic')
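The penalty grid and the reported optimum can be cross-checked in base R. This sketch assumes that logspace(a, b, n) returns n points from 10^a to 10^b, equally spaced on the log10 scale; that assumption is consistent with the alpha_opt value printed above but is not taken from the package source.

```r
# Assumed equivalent of logspace(0.4, 1, 15): 15 points from 10^0.4 to 10^1.
alphas <- 10^seq(0.4, 1, length.out = 15)
round(range(alphas), 3)

# Value printed by fit.sdfm$em$alpha_opt in the vignette output.
alpha_opt <- 4.54091

# Locate the grid point closest to the reported optimum.
idx <- which.min(abs(alphas - alpha_opt))
idx
abs(alphas[idx] - alpha_opt)
```

Under this assumption the 15 entries of fit.sdfm$em$num_iter line up one-to-one with the grid points, and the chosen penalty sits in the interior of the grid rather than at a boundary.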

Estimated Factor Structure

We first explore the estimated factors and loadings for the regular DFM. We can colour the indicator series by source using the type = "loading.grouplineplot" setting in plot(). We set the trade in goods (TiG) targets to black, IoP to blue, CPI to red, PPI to pink, exchange rates (Exch) to green, BCI & CCI (Conf) to navy and Google Trends (GT) to brown. This makes it easier to see which indicators load onto specific factors.

## Plot the estimated factors for the DFM
plot(fit.dfm, type = 'factor')


## Plot the estimated loadings for each of the 4 factors in a grid 

# Specify the name of the group each indicator belongs to
groups = c(rep('TiG',9), rep('IoP',89), rep('CPI',166), rep('PPI',153),
           rep('Exch',12), rep('Conf',2), rep('GT',14))

# Specify the colours for each of the groups 
group_cols = c('black','blue','red','pink','green','navy','brown')

# Plot the group lineplot in a 2 x 2 grid 
p1 = plot(fit.dfm, type = 'loading.grouplineplot', loading.factor = 1, group.names = groups, group.cols = group_cols)
p2 = plot(fit.dfm, type = 'loading.grouplineplot', loading.factor = 2, group.names = groups, group.cols = group_cols)
p3 = plot(fit.dfm, type = 'loading.grouplineplot', loading.factor = 3, group.names = groups, group.cols = group_cols)
p4 = plot(fit.dfm, type = 'loading.grouplineplot', loading.factor = 4, group.names = groups, group.cols = group_cols)

grid.arrange(p1, p2, p3, p4, nrow = 2)
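Because the group labels are built by hand, it is worth checking them before plotting. The sketch below rebuilds the same vector from a named vector of group sizes and verifies its length; it assumes (this is not confirmed here) that group.names expects one label per column of the data, in column order. The name group_sizes is introduced for illustration.

```r
# Group sizes as used in the vignette, in column order.
group_sizes <- c(TiG = 9, IoP = 89, CPI = 166, PPI = 153,
                 Exch = 12, Conf = 2, GT = 14)

# Same result as the chained rep() calls above.
groups <- rep(names(group_sizes), times = group_sizes)

# One label per column of the 445-column data set.
stopifnot(length(groups) == 445)

# Tabulate in the original group order rather than alphabetically.
table(factor(groups, levels = names(group_sizes)))
```

Keeping the sizes in one named vector makes a miscounted group show up as a failed length check rather than as mislabelled lines in the plot.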