% \VignetteIndexEntry{Vectorised objective functions}
% \VignetteKeyword{optimize}
\documentclass[a4paper,11pt]{article}
\usepackage[left = 2.5cm, top = 2cm, bottom = 3cm, right = 3.5cm]{geometry}
\usepackage[noae]{Sweave}
\usepackage{mathptmx}
\usepackage{amsmath,amstext}
\usepackage{hyperref}
\usepackage{natbib}
\SweaveOpts{keep.source = TRUE, eps = TRUE, pdf = FALSE}
\usepackage{color}
\definecolor{grau2}{rgb}{.2,.2,.2}
\definecolor{grau7}{rgb}{.7,.7,.7}
% define *Sweave* layout
\DefineVerbatimEnvironment{Sinput}{Verbatim}{}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{frame=single,xleftmargin=0em,%
formatcom=\color{grau2},rulecolor=\color{grau7}}
\DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em}
\fvset{listparameters={\setlength{\topsep}{0pt}}}
\renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}}
<<echo = false>>=
options(continue = " ", digits = 5)
@
\begin{document}

{\raggedright{\LARGE Vectorised objective functions}}\medskip

\noindent Enrico Schumann\\
\noindent \texttt{es@enricoschumann.net}\\
\bigskip

\section{Introduction}
\noindent Heuristics often manipulate and evolve solutions through
functions: new solutions are created as functions of existing
solutions; solutions are evaluated through the objective function;
whether new solutions are accepted is a function of (typically) the
quality of the new solutions; and so on. This gives us much
flexibility in how solutions are represented; in essence, any data
structure (eg, a graph) could be handled directly, provided we define
appropriate functions to work with it.

Yet a number of (quite successful) heuristics, such as Differential
Evolution (DE) or Particle Swarm (PS), prescribe precisely how
solutions are represented and manipulated. In fact, these specific
prescriptions essentially define those heuristics. For DE and PS, for
instance, a solution is a numeric vector; new solutions are created
as (noisy) linear combinations of existing solutions. While this
reduces the algorithms' flexibility, it allows for a simpler (and
more efficient) generic implementation.

Let us be more concrete here. Since both DE and PS represent
solutions as numeric vectors, a natural way to store the solutions is
a matrix~$P$. In this matrix, each column is one solution; each row
represents a specific decision variable. When we compute the
objective function values for these solutions, a straightforward
strategy is to loop over the columns of~$P$ and call the objective
function for each solution. In this case, the objective function
should take as its argument a single numeric vector (and possibly
other data passed through \texttt{\ldots}); it should return a single
number.

In some cases, however, it may be preferable to write the objective
function such that it expects the whole population as an argument and
returns a vector of objective function values. To accommodate this
behaviour, the functions \texttt{DEopt}, \texttt{GAopt} and
\texttt{PSopt} have settings \texttt{algo\$loopFun}, in which
`\texttt{Fun}' can be `\texttt{OF}' for the objective function, but
also, for instance, `\texttt{Repair}'. These settings default to
\texttt{TRUE}, so the functions will loop over the solutions. When
such a loop-setting is \texttt{FALSE}, the respective function
receives the whole population as an argument. In the next section, we
give three examples in which this `evaluation in one step' can be
advantageous.
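To make the two conventions concrete, here is a minimal sketch with a
toy sum-of-squares objective. (The names \texttt{OF\_loop} and
\texttt{OF\_vec} are only for illustration, not part of the package;
the chunk is not evaluated.)
<<eval = FALSE>>=
## convention 1 (default, algo$loopOF is TRUE):
## the function receives a single solution, a numeric vector,
## and returns a single number
OF_loop <- function(x)
    sum(x * x)

## convention 2 (algo$loopOF is FALSE):
## the function receives the whole population, a matrix whose
## columns are solutions, and returns one number per column
OF_vec <- function(x)
    colSums(x * x)
@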
The functions \texttt{DEopt}, \texttt{GAopt} and \texttt{PSopt} allow
the objective function (and also repair and penalty functions) to be
implemented in either way. For more details and examples, see
\citet{Gilli2011b}.

\section{Examples for vectorised computations}
We attach the package.
<<>>=
require("NMOF")
@

\subsection{A test function}
As an example, we use the Rosenbrock function, given by
\begin{align*}
  \sum_{i=1}^{n-1}\left(100 (x_{i+1}-x_i^2)^2 + (1-x_i)^2\right)\,.
\end{align*}
This test function is available in the package as the function
\texttt{tfRosenbrock} (see \texttt{?testFunctions}). The Rosenbrock
function has a minimum of zero when all elements of $x$ are one. (In
higher dimensions, the function has further local minima.)
<<>>=
tfRosenbrock
@
So we define the objective function \texttt{OF} and test it with the
known solution.
<<>>=
OF <- tfRosenbrock     ## see ?testFunctions
size <- 5L             ## set dimension
x <- rep.int(1, size)  ## the known solution ...
OF(x)                  ## ... should give zero
@
We set the parameters for \texttt{DEopt}. Note that in this example
we are only concerned with the speed of the computation, so the
actual settings do not matter so much.
<<>>=
algo <- list(printBar = FALSE,
             nP = 50L, nG = 500L,
             F = 0.6, CR = 0.9,
             min = rep(-100, size),
             max = rep( 100, size))
@
Suppose we have several solutions, put into a matrix such that every
column is one solution. Then we could rewrite the function like so:
<<>>=
## a vectorised OF: works only with *matrix* x
OF2 <- function(x) {
    n <- dim(x)[1L]
    xi <- x[1L:(n - 1L), ]
    colSums(100 * (x[2L:n, ] - xi * xi)^2 + (1 - xi)^2)
}
@
We can test it by creating a number of random solutions.
<<>>=
x <- matrix(rnorm(size * algo$nP), size, algo$nP)
c(OF(x[ ,1L]), OF(x[ ,2L]), OF(x[ ,3L]))
OF2(x)[1L:3L]  ## should give the same result
all.equal(OF2(x)[1L:3L],
          c(OF(x[ ,1L]), OF(x[ ,2L]), OF(x[ ,3L])))
@
As pointed out above, \texttt{DEopt} can either loop over the
solutions, or it can evaluate the whole population in one step. The
first behaviour is triggered when \texttt{algo\$loopOF} is set to
\texttt{TRUE}, which is the default setting. When we want to use
\texttt{OF2}, we need to set \texttt{algo\$loopOF} to \texttt{FALSE}.
<<>>=
set.seed(1223445)
(t1 <- system.time(sol <- DEopt(OF = OF, algo = algo)))

algo$loopOF <- FALSE
set.seed(1223445)
(t2 <- system.time(sol2 <- DEopt(OF = OF2, algo = algo)))
@
We can compare the solutions and compute the speedup.
<<>>=
sol$OFvalue        ## both should be zero (with luck)
sol2$OFvalue
t1[[3L]]/t2[[3L]]  ## speedup
@
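One more remark before we move on: if rewriting an objective function
in vectorised form is inconvenient, a scalar function can also be
wrapped with \texttt{apply} so that it accepts the whole population.
But since the loop merely moves into \texttt{apply}, we should not
expect a speedup from such a wrapper. (A small sketch; the name
\texttt{OF.wrapped} is only for illustration.)
<<>>=
OF.wrapped <- function(x)  ## x is the whole population
    apply(x, 2L, OF)       ## one value per column
all.equal(OF.wrapped(x), OF2(x))
@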
\subsection{Portfolio optimisation}
A portfolio can be described by a weight vector $w$. Given a
variance--covariance matrix $\Sigma$, we can calculate the variance
of such a portfolio like so:
\begin{align*}
  w' \Sigma w\,.
\end{align*}
Suppose now that we have a number of solutions, and we collect them
in a matrix $W$, such that every column is one solution $w$. One
approach would now be to loop over the columns, and for every column
compute the variance. But we can use a one-line computation as well:
the variances of the solutions are given by
\begin{align*}
  \mathsf{diag}(W'\Sigma W)\,.
\end{align*}
This is concise, but we are unnecessarily computing the off-diagonal
elements of the resulting matrix. One solution, then, is to recognise
that $\mathsf{diag}(W'\Sigma W)$ is equivalent to
\begin{align*}
  \iota'\underbrace{\overbrace{\ \Sigma W \ }^{\stackrel{\text{\tiny matrix}}{\text{\tiny multiplication}}} \odot\, W}_%
  {\stackrel{\text{\tiny elementwise}}{\text{\tiny multiplication}}}\,,
\end{align*}
which is concise and more efficient. (Here, $\iota$ is a vector of
ones.) The following example illustrates this. We start by setting up
a variance--covariance matrix \texttt{Sigma} and a population
\texttt{W}. (We would not need to include the budget constraint here,
since we are only interested in the computing time.)
<<>>=
na <- 100L              ## number of assets
np <- 100L              ## size of population
trials <- seq_len(100L) ## for speed test

## a covariance matrix
Sigma <- array(0.7, dim = c(na, na))
diag(Sigma) <- 1

## set up population
W <- array(runif(na * np), dim = c(na, np))

## budget constraint
scaleFun <- function(x) x/sum(x)
W <- apply(W, 2L, scaleFun)
@
Now we can test the three variants described above.
<<>>=
## variant 1
t1 <- system.time({
    for (i in trials) {
        res1 <- numeric(np)
        for (j in seq_len(np)) {
            w <- W[ ,j]
            res1[j] <- w %*% Sigma %*% w
        }
    }
})

## variant 2
t2 <- system.time({
    for (i in trials)
        res2 <- diag(t(W) %*% Sigma %*% W)
})

## variant 3
t3 <- system.time({
    for (i in trials)
        res3 <- colSums(Sigma %*% W * W)
})
@
All three computations should give the same result.
<<>>=
all.equal(res1, res2)
all.equal(res2, res3)
@
But the first variant requires more code than the others, and it is
slower.
<<>>=
## time required ...
## ... variant 1
t1
## ... variant 2
t2
## ... variant 3
t3
@

\subsection{Residuals in a linear model}
We wish to compute the residuals~$r$ of a linear model,
$y=X\theta+r$. Suppose we have a population $\Theta$ of solution
vectors; each column of $\Theta$ is one particular solution
$\theta_i$. Now, as before, we could compute
\begin{align*}
  r_i = y-X\theta_i
\end{align*}
for every $i \in \{1, \ldots, \text{population size}\}$.
Alternatively, we may replace the loop over those solutions with the
computation
\begin{align*}
  R = y\iota'-X\Theta\,,
\end{align*}
in which $R$ is the matrix whose columns are the residual vectors
$r_i$. Again, an example. As before, we set up random data and a
random population of solutions.
<<>>=
n <- 100L   ## number of observations
p <- 5L     ## number of regressors
np <- 100L  ## population size
trials <- seq_len(1000L)

## random data
X <- array(rnorm(n * p), dim = c(n, p))
y <- rnorm(n)

## random population
Theta <- array(rnorm(p * np), dim = c(p, np))

## empty residuals matrix
R1 <- array(NA, dim = c(n, np))
@
Now we can compare both variants.
<<>>=
system.time({
    for (i in trials)
        for (j in seq_len(np))
            R1[ ,j] <- y - X %*% Theta[ ,j]
})
system.time({
    for (i in trials)
        R2 <- y - X %*% Theta
})
@
Note that we have not explicitly computed $y \iota'$ but have used
\textsf{R}'s recycling rule: \texttt{y} is recycled along the columns
of \texttt{X \%*\% Theta}. We check whether we actually obtain the
same result.
<<>>=
all.equal(R1, R2)  ## ... should be TRUE
@
See Chapter~14 in \citet{Gilli2011b}.

\bibliographystyle{plainnat}
\bibliography{NMOF}
\end{document}