if(FALSE){
install.packages(c("data.table", "R.utils", "ggplot2", "torch"))
torch::torch_tensor(pi)
rmarkdown::render("2023-04-19-deep-learning.Rmd")
}
library(data.table)
library(ggplot2)
One of R's biggest strengths is the variety of packages that are available for different kinds of data analysis. In this tutorial we use the following packages:
- data.table for reading data from disk, and converting it into a format useful for visualization.
- ggplot2 for visualizing the data and results.
- torch for machine learning using linear models and neural networks.
In R there are two methods for accessing objects which are exported from packages:
- pkg::fun means to get fun, which is exported from pkg; for example data.table::fread is a function for reading text data files into R. This is useful for teaching/understanding because it is explicit (easy to see which package each object comes from), so it should be preferred for most use cases.
- library(package) attaches all of the functions exported by package. For example, after doing library(data.table) you can just write fread (without data.table::). This is useful for convenience (no need to use the double colon syntax), but it hides which package each object comes from (potentially confusing), so it should be used sparingly.
Download the zip code image data set.
url.vec <- c(
zip.gz="https://web.stanford.edu/~hastie/ElemStatLearn/datasets/zip.train.gz")
for(f in names(url.vec)){
u <- url.vec[[f]]
if(!file.exists(f)){
download.file(u, f)
}
}
Read data from gzipped plain text (one row per image, columns separated by spaces) into R as a data table.
zip.dt <- fread("zip.gz")
Convert data table to two multi-dimensional arrays (one for inputs, one for outputs).
zip.feature.dt <- zip.dt[,-1]
n.features <- ncol(zip.feature.dt)
(zip.pixels <- sqrt(n.features))
## [1] 16
n.images <- nrow(zip.dt)
zip.X.array <- array(
unlist(zip.feature.dt),
c(n.images, 1, zip.pixels, zip.pixels))
str(zip.X.array)
## num [1:7291, 1, 1:16, 1:16] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
zip.y.array <- array(zip.dt$V1, n.images)
str(zip.y.array)
## num [1:7291(1d)] 6 5 4 7 3 6 3 1 0 1 ...
table(zip.y.array)
## zip.y.array
## 0 1 2 3 4 5 6 7 8 9
## 1194 1005 731 658 652 556 664 645 542 644
Visualize one image.
image(zip.X.array[1,,,])
Below we convert several digits to long/tall form for display. Note the data.table syntax below: by=observation acts like a for loop over observations, and the data tables returned for each value of observation are combined into a single result table, zip.some, with all of the images.
(zip.some <- data.table(observation=1:12)[, {
data.table(
label=zip.y.array[observation],
col=rep(1:zip.pixels, zip.pixels),
row=rep(1:zip.pixels, each=zip.pixels),
intensity=as.numeric(zip.X.array[observation,,,]))
}, by=observation])
## observation label col row intensity
## 1: 1 6 1 1 -1
## 2: 1 6 2 1 -1
## 3: 1 6 3 1 -1
## 4: 1 6 4 1 -1
## 5: 1 6 5 1 -1
## ---
## 3068: 12 0 12 16 -1
## 3069: 12 0 13 16 -1
## 3070: 12 0 14 16 -1
## 3071: 12 0 15 16 -1
## 3072: 12 0 16 16 -1
Display images using a panel/facet for each observation.
breaks <- c(1, zip.pixels)
ggplot()+
geom_tile(aes(
x=col, y=row, fill=intensity),
data=zip.some)+
facet_wrap(observation + label ~ ., labeller=label_both)+
scale_x_continuous(breaks=breaks)+
scale_y_reverse(breaks=breaks)+
coord_equal()+
theme_bw()+
theme(panel.spacing=grid::unit(0, "lines"))+
scale_fill_gradient(low="black", high="white")
Torch is a machine learning library which is popular for deep learning, because it provides automatic differentiation (also known as auto-grad or automatic gradients), which makes it easy to learn parameters in deep neural networks.
The popular Python module torch uses a C++ library, libtorch, under the hood, which the R package torch also uses, so it has most of the same functionality/concepts, and similar naming conventions.
The main data structure in torch is a tensor, which is analogous to the R multi-dimensional array. Below we convert R arrays to torch tensors.
(zip.y.tensor <- torch::torch_tensor(zip.y.array+1L, torch::torch_long()))
## torch_tensor
## 7
## 6
## 5
## 8
## 4
## 7
## 4
## 2
## 1
## 2
## 8
## 1
## 2
## 2
## 8
## 8
## 5
## 9
## 1
## 2
## 5
## 9
## 8
## 5
## 9
## 8
## 4
## 8
## 5
## 2
## ... [the output was truncated (use n=-1 to disable)]
## [ CPULongType{7291} ]
str(zip.y.tensor)
## Long [1:7291]
typeof(zip.y.array)
## [1] "double"
(zip.X.tensor <- torch::torch_tensor(zip.X.array))
## torch_tensor
## (1,1,.,.) =
## Columns 1 to 9 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -0.7970 0.2780
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -0.2570 0.9090 1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -0.9380 0.1000 0.9500 1.0000 0.8770
## -1.0000 -1.0000 -1.0000 -0.6830 0.5400 1.0000 1.0000 0.3000 -0.8240
## -1.0000 -0.9920 -0.4100 0.8250 1.0000 0.9220 -0.1620 -0.9610 -1.0000
## -0.6310 0.2970 1.0000 1.0000 0.7780 -0.4390 -1.0000 -1.0000 -0.9050
## 0.8620 1.0000 0.9860 0.5620 -0.7150 -1.0000 -1.0000 -1.0000 0.1450
## -0.1670 0.3070 -0.5650 -1.0000 -1.0000 -1.0000 -1.0000 -0.5500 0.9770
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -0.9870 0.4850 1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -0.7140 0.9960 1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -0.8320 0.8670 1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 0.0920 0.9900
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -0.7450
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
##
## Columns 10 to 16 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -0.9500 -0.6300 -0.6770 -0.9030 -1.0000 -1.0000 -1.0000
## 0.8470 1.0000 1.0000 0.7920 -0.4520 -1.0000 -1.0000
## 1.0000 1.0000 1.0000 1.0000 0.8280 -0.4830 -1.0000
## 0.3270 0.0680 0.7530 1.0000 1.0000 0.8130 -0.9740
## -1.0000 -0.9250 0.3410 1.0000 1.0000 1.0000 -0.4290
## -1.0000 0.1130 1.0000 1.0000 1.0000 1.0000 0.3040
## 0.3550 0.9600 0.7070 0.5360 1.0000 1.0000 0.8230
## 1.0000 0.3080 -0.9420 0.1840 1.0000 1.0000 1.0000
## 0.6550 -0.8840 -1.0000 0.8120 1.0000 1.0000 0.4820
## -0.1090 -1.0000 -1.0000 0.8370 1.0000 1.0000 -0.4740
## -0.1850 -0.0750 0.5450 0.9780 1.0000 0.2190 -0.9910
## ... [the output was truncated (use n=-1 to disable)]
## [ CPUFloatType{7291,1,16,16} ]
str(zip.X.tensor)
## Float [1:7291, 1:1, 1:16, 1:16]
typeof(zip.X.array)
## [1] "double"
An important difference between R and torch is that R is very permissive about types (double/numeric can be used for almost anything), whereas torch is very strict (single precision float is typically used for inputs, and long integer for class outputs).
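To see this strictness concretely, we can inspect tensor dtypes directly (a minimal sketch, not part of the original analysis; the comments state the dtypes we expect from torch's defaults).
torch::torch_tensor(c(1.5, 2.5))$dtype # R double becomes 32-bit float by default.
torch::torch_tensor(c(1.5, 2.5), torch::torch_double())$dtype # 64-bit float only on request.
torch::torch_tensor(c(1L, 2L), torch::torch_long())$dtype # 64-bit integer (long), used for class labels.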
Before explaining deep learning (neural networks), we first explain linear models. To implement a linear model in torch, we first need to flatten each 2d image back to a 1d vector of pixel intensity values,
flatten <- torch::nn_flatten()
(zip.X.flat <- flatten(zip.X.tensor))
## torch_tensor
## Columns 1 to 10-1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -0.9600
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -0.9220
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -0.9960 -0.5530 -0.9350 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -0.6010 -0.8260 -0.9020
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -0.6250 0.6200 -0.4540
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000 -1.0000
## ... [the output was truncated (use n=-1 to disable)]
## [ CPUFloatType{7291,256} ]
str(zip.X.flat)
## Float [1:7291, 1:256]
A linear model for classification is defined by two kinds of learned parameters. First, a matrix of real-valued weights, with one row for each output class, and one column for each input feature. Second, an intercept/bias, which is a vector of real numbers, one for each output class.
(n.classes <- length(unique(zip.y.array)))
## [1] 10
torch::torch_manual_seed(1)#for reproducibility.
linear <- torch::nn_linear(n.features, n.classes)
str(linear$parameters)
## List of 2
## $ weight:Float [1:10, 1:256]
## $ bias :Float [1:10]
Calling the linear object with the flattened data as input yields a matrix of predicted scores as output (one row per image, one column per class; larger scores mean the class is more likely to be predicted).
(zip.linear.pred <- linear(zip.X.flat))
## torch_tensor
## Columns 1 to 6 8.3722e-02 4.0556e-01 -2.1138e-01 -1.4984e-01 -8.6050e-01 8.7388e-02
## 1.5125e-01 3.1454e-01 7.8810e-01 -7.8996e-01 2.6592e-01 1.1697e-01
## -5.5851e-01 -9.3562e-02 -4.3682e-01 4.9550e-01 2.0649e-01 -2.1504e-01
## -1.0399e+00 8.6033e-01 -6.8142e-02 2.4952e-01 6.5793e-02 1.0810e-01
## 8.1950e-02 1.9625e-01 6.1302e-01 -3.7451e-02 1.5670e-01 -1.7778e-01
## -8.1263e-01 1.4397e-01 8.8294e-02 -1.7617e-01 3.0649e-01 5.3261e-01
## -4.3530e-01 2.6380e-03 1.5146e-01 9.2552e-02 -3.7762e-01 2.6511e-01
## -3.2516e-01 5.2262e-01 1.0229e-02 8.4166e-02 -5.1177e-01 -4.0212e-01
## 2.8215e-01 9.8304e-02 4.7157e-01 1.8188e-01 -5.1238e-01 2.2311e-02
## -1.8938e-01 6.6965e-01 -7.9038e-02 1.8598e-01 -5.6698e-01 -4.3820e-01
## 1.6045e-01 1.5566e-01 9.5257e-02 6.5550e-01 5.5383e-01 9.3469e-01
## -4.3449e-01 1.6752e-01 2.1992e-01 -1.6706e-01 -8.0140e-02 1.5735e-01
## -2.1069e-01 5.1881e-01 9.3807e-02 2.1360e-01 -4.7565e-01 -6.1004e-01
## -2.2263e-01 5.3057e-01 7.2251e-02 2.2420e-01 -5.2688e-01 -5.7601e-01
## -8.1430e-01 6.1648e-01 -7.8843e-02 3.6643e-01 1.0322e-01 5.3075e-01
## -7.0662e-01 5.7915e-02 8.3268e-02 -1.1987e-01 4.4936e-01 -9.4456e-03
## -1.3251e-01 2.2643e-01 -2.5267e-01 6.8703e-01 -1.1746e-01 -4.9890e-01
## 1.5346e-01 6.5914e-01 2.3883e-01 1.1236e+00 -1.2175e-01 2.3762e-01
## 2.5917e-01 2.5903e-01 -4.0113e-01 4.0930e-01 7.9426e-02 2.2673e-01
## -2.2484e-01 4.4847e-01 6.4907e-01 4.0093e-01 -1.6742e-01 1.1783e-01
## -1.5034e-01 1.6825e-01 3.4139e-01 2.9184e-01 -1.1814e+00 -1.5261e-01
## 1.4874e-01 7.8191e-01 3.5824e-01 7.3465e-01 -4.4917e-01 -5.9663e-02
## -1.4576e+00 1.0044e+00 1.5806e-01 1.5137e-01 1.3848e-01 3.1238e-01
## -1.1245e+00 -1.1530e-01 -5.5778e-01 8.9810e-03 -2.4352e-01 -6.6138e-01
## 5.3762e-01 -4.9214e-01 5.9736e-01 -1.1622e-01 9.6620e-01 -9.5962e-02
## -1.1260e+00 3.1140e-01 -2.6743e-01 1.9571e-03 5.1711e-01 -2.4674e-02
## -1.7920e-01 7.9927e-01 6.7342e-01 4.3888e-02 -5.6315e-01 2.6405e-01
## -9.5916e-01 4.4528e-01 -4.3726e-01 1.2315e-01 3.3845e-01 3.6729e-01
## -4.9499e-01 2.8039e-01 -3.3418e-01 8.1388e-02 6.0696e-01 -3.5467e-01
## -2.8936e-01 7.5328e-01 7.9180e-02 3.3542e-01 -6.8469e-01 -4.2743e-01
## ... [the output was truncated (use n=-1 to disable)]
## [ CPUFloatType{7291,10} ][ grad_fn = <AddmmBackward0> ]
str(zip.linear.pred)
## Float [1:7291, 1:10]
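As a sanity check, the same scores can be computed directly from the weight matrix and bias, since a linear layer computes the scores as the flattened inputs times the transposed weight matrix, plus the bias (a minimal sketch of that check, not code from the original tutorial).
manual.pred <- zip.X.flat$matmul(linear$weight$t()) + linear$bias # X W^T + b.
(manual.pred - zip.linear.pred)$abs()$max() # maximum absolute difference, expected to be zero.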
Rather than using flatten and linear operations in separate lines of code, below we combine them into a sequential object,
(sequential <- torch::nn_sequential(flatten, linear))
## An `nn_module` containing 2,570 parameters.
##
## ── Modules ─────────────────────────────────────────────────────────────────────
## • 0: <nn_flatten> #0 parameters
## • 1: <nn_linear> #2,570 parameters
zip.sequential.pred <- sequential(zip.X.tensor)
str(zip.sequential.pred)
## Float [1:7291, 1:10]
What are the current predicted classes? We have not yet done any learning, and the initialization is random for the weight/bias parameters, so the initial predictions are very bad, as we see below.
pred.class.vec <- apply(zip.sequential.pred, 1, which.max)
zip.y.int <- as.integer(zip.y.tensor)
table(prediction=pred.class.vec, label=zip.y.int)
## label
## prediction 1 2 3 4 5 6 7 8 9 10
## 1 203 2 32 7 3 18 7 0 7 0
## 2 30 149 63 62 23 72 47 40 21 61
## 3 106 39 165 146 41 113 185 10 100 12
## 4 25 9 43 70 77 48 4 34 20 45
## 5 4 0 60 7 11 8 0 3 36 5
## 6 141 0 34 6 12 79 59 2 28 15
## 7 130 0 7 6 0 2 14 0 1 0
## 8 449 0 123 283 347 99 98 258 247 239
## 9 44 0 53 34 20 38 19 0 20 0
## 10 62 806 151 37 118 79 231 298 62 267
Above is the confusion matrix (predicted classes on rows, true labels on columns, entries are counts of images), and below is the error rate, which is the percentage of images that fall off the diagonal of the confusion matrix (predicted class different from true label).
is.error <- pred.class.vec != zip.y.int
(percent.error <- 100*mean(is.error))
## [1] 83.04759
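The same number can be computed from the confusion matrix itself, since the correct predictions are exactly the diagonal entries (a small check; conf.mat below just re-creates the table shown above).
conf.mat <- table(prediction=pred.class.vec, label=zip.y.int)
100*(1-sum(diag(conf.mat))/sum(conf.mat)) # same as percent.error above.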
Above is the linear model prediction error at initialization; below is the error of the baseline featureless model (which always predicts the most frequent class label),
(zip.y.tab <- table(zip.y.int))
## zip.y.int
## 1 2 3 4 5 6 7 8 9 10
## 1194 1005 731 658 652 556 664 645 542 644
(most.frequent.count <- zip.y.tab[which.max(zip.y.tab)])
## 1
## 1194
(most.frequent.label <- as.integer(names(most.frequent.count)))
## [1] 1
100*mean(zip.y.int!=most.frequent.label)
## [1] 83.62365
To improve predictions, we need to use a gradient descent learning algorithm. First we compute a differentiable loss (cross entropy loss in this case, since this is multi-class classification), then we compute gradients using the backward method, then we update parameters by taking steps in the negative gradient direction. Below we define the loss and optimizer,
loss.fun <- torch::nn_cross_entropy_loss()
step.size <- 0.1
optimizer <- torch::optim_sgd(sequential$parameters, lr=step.size)
Below we compute predictions and loss,
zip.sequential.pred <- sequential(zip.X.tensor)
(loss.tensor <- loss.fun(zip.sequential.pred, zip.y.tensor))
## torch_tensor
## 2.18719
## [ CPUFloatType{} ][ grad_fn = <NllLossBackward0> ]
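To make the cross entropy loss concrete, it can be re-computed by hand as the mean negative log of the softmax probability assigned to each true class (a minimal sketch of that check; the code below is an aside, not from the original tutorial).
log.prob.mat <- as.array(torch::nnf_log_softmax(zip.sequential.pred, dim=2)$detach()) # log class probabilities.
mean(-log.prob.mat[cbind(1:n.images, zip.y.int)]) # should match loss.tensor above.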
Below we set gradients to zero, then call backward to compute and store gradients of the loss with respect to model parameters.
optimizer$zero_grad()
linear$bias$grad
## torch_tensor
## [ Tensor (undefined) ]
loss.tensor$backward()
linear$bias$grad
## torch_tensor
## 0.01 *
## -8.9679
## -2.0456
## 1.0085
## 1.4969
## -1.2661
## 1.5369
## -1.9076
## 4.6543
## 1.2375
## 4.2530
## [ CPUFloatType{10} ]
Below we use the gradients to update the model parameters.
linear$bias
## torch_tensor
## 0.01 *
## 4.7054
## -3.1756
## -0.6997
## -3.6115
## -4.5345
## -4.8868
## -4.4242
## -3.7722
## 2.0563
## -0.8581
## [ CPUFloatType{10} ][ requires_grad = TRUE ]
optimizer$step()
## NULL
linear$bias
## torch_tensor
## 0.01 *
## 5.6022
## -2.9711
## -0.8005
## -3.7611
## -4.4079
## -5.0405
## -4.2334
## -4.2376
## 1.9325
## -1.2834
## [ CPUFloatType{10} ][ requires_grad = TRUE ]
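The update follows the plain SGD rule, new value = old value - step.size * gradient (no momentum was configured above). Since the gradients are still stored after the step, we can recover the pre-step bias as a quick check (a small sketch, not part of the original tutorial).
linear$bias + step.size*linear$bias$grad # should reproduce the bias printed before optimizer$step().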
And that’s the basic idea of gradient descent learning! Just keep updating the model parameters until you get good predictions!
Keep in mind our goal is generalization, meaning we want to get good predictions on new data/images that the algorithm was not trained on. We therefore need to use cross-validation for two purposes: to split the whole data set into train and test sets (for estimating how well the learned model generalizes), and to split the train set into subtrain and validation sets (for learning the best number of epochs).
First we split the whole data set into train and test sets, using cross-validation.
cv_list <- function(index.vec, keep, hold.out, n.folds=3){
uniq.folds <- 1:n.folds
fold.vec <- sample(rep(uniq.folds, l=length(index.vec)))
out.list <- list()
for(fold in uniq.folds){
is.held.out <- fold.vec==fold
fold.list <- list()
fold.list[[hold.out]] <- index.vec[is.held.out]
fold.list[[keep]] <- index.vec[!is.held.out]
out.list[[paste(hold.out,"fold",fold)]] <- fold.list
}
out.list
}
set.seed(1)
train.test.cv.list <- cv_list(1:n.images, "train", "test")
str(train.test.cv.list)
## List of 3
## $ test fold 1:List of 2
## ..$ test : int [1:2431] 4 6 7 11 15 18 21 22 27 28 ...
## ..$ train: int [1:4860] 1 2 3 5 8 9 10 12 13 14 ...
## $ test fold 2:List of 2
## ..$ test : int [1:2430] 2 3 14 17 19 24 25 31 32 34 ...
## ..$ train: int [1:4861] 1 4 5 6 7 8 9 10 11 12 ...
## $ test fold 3:List of 2
## ..$ test : int [1:2430] 1 5 8 9 10 12 13 16 20 23 ...
## ..$ train: int [1:4861] 2 3 4 6 7 11 14 15 17 18 ...
Above we used K-fold cross-validation to create several train/test splits, represented in R as a list of lists of indices. Below we split the train set into subtrain and validation sets.
test.fold <- 1
train.test.index.list <- train.test.cv.list[[test.fold]]
sub.val.cv.list <- cv_list(
train.test.index.list$train, "subtrain", "validation")
str(sub.val.cv.list)
## List of 3
## $ validation fold 1:List of 2
## ..$ validation: int [1:1620] 8 9 13 16 17 19 35 41 42 45 ...
## ..$ subtrain : int [1:3240] 1 2 3 5 10 12 14 20 23 24 ...
## $ validation fold 2:List of 2
## ..$ validation: int [1:1620] 5 12 14 20 24 26 31 32 36 47 ...
## ..$ subtrain : int [1:3240] 1 2 3 8 9 10 13 16 17 19 ...
## $ validation fold 3:List of 2
## ..$ validation: int [1:1620] 1 2 3 10 23 25 34 37 38 39 ...
## ..$ subtrain : int [1:3240] 5 8 9 12 13 14 16 17 19 20 ...
Below we fix one subtrain/validation split,
validation.fold <- 1
sub.val.index.list <- sub.val.cv.list[[validation.fold]]
We compute the average loss and gradient over the entire subtrain set below,
get_pred_loss <- function(model, index.vec){
X.subset <- zip.X.tensor[index.vec,,,,drop=FALSE]
y.subset <- zip.y.tensor[index.vec]
pred.subset <- model(X.subset)
list(pred=pred.subset, loss=loss.fun(pred.subset, y.subset))
}
optimizer$zero_grad()
(subtrain.loss <- get_pred_loss(sequential, sub.val.index.list$subtrain)$loss)
## torch_tensor
## 1.81546
## [ CPUFloatType{} ][ grad_fn = <NllLossBackward0> ]
subtrain.loss$backward()
optimizer$step()
## NULL
Above is the mean loss over the subtrain set, which can be used for gradients (deterministic optimization), or for monitoring convergence (subtrain loss should decrease with each step/epoch, if step size is chosen appropriately). For monitoring convergence and avoiding overfitting, we would like to compute the loss and error rate, as in the function below.
loss_one_set <- function(set.indices, model){
torch::with_no_grad({#for efficiency, gradients not necessary.
set.label.vec <- zip.y.tensor[set.indices]
set.pred.loss <- get_pred_loss(model, set.indices)
set.pred.class <- apply(set.pred.loss$pred, 1, function(prob.vec){
if(any(is.na(prob.vec)))NA_integer_ else which.max(prob.vec)
})
error.rate <- if(any(is.na(set.pred.class))){
NA_real_
}else{
mean(as.numeric(set.label.vec != set.pred.class))
}
rbind(
data.table(variable="loss",value=as.numeric(set.pred.loss$loss)),
data.table(variable="error.percent",value=100*error.rate))
})
}
loss_one_set(sub.val.index.list$subtrain, sequential)
## variable value
## 1: loss 1.575817
## 2: error.percent 40.771605
To compute loss for each set, we use the function below,
loss_each_set <- function(list.of.indices, model){
data.table(set=names(list.of.indices))[, {
loss_one_set(list.of.indices[[set]], model)
}, by=set]
}
loss_each_set(sub.val.index.list, sequential)
## set variable value
## 1: validation loss 1.561952
## 2: validation error.percent 39.629630
## 3: subtrain loss 1.575817
## 4: subtrain error.percent 40.771605
In previous sections we computed the loss and gradients over all of the observations/images in the subtrain set (deterministic optimization). For learning it is more common to compute gradients using the loss on a random sample (batch) of observations/images (stochastic optimization), as below,
get_batch_list <- function(index.vec, batch_size=200){
n_data <- length(index.vec)
n_batches <- ceiling(n_data / batch_size)
batch.vec <- sample(rep(1:n_batches,each=batch_size)[1:n_data])
split(index.vec, batch.vec)
}
big.batches <- get_batch_list(sub.val.index.list$subtrain, 1000)
str(big.batches)
## List of 4
## $ 1: int [1:1000] 2 10 12 23 25 34 46 47 60 74 ...
## $ 2: int [1:1000] 14 26 32 36 37 55 64 69 70 82 ...
## $ 3: int [1:1000] 1 5 20 24 38 39 49 56 57 62 ...
## $ 4: int [1:240] 3 31 80 86 160 201 228 265 317 361 ...
After creating the batches above, we use a for loop below to compute the loss and gradient for each batch.
for(batch.number in seq_along(big.batches)){
batch.indices <- big.batches[[batch.number]]
batch.loss <- get_pred_loss(sequential, batch.indices)$loss
print(batch.loss)
optimizer$zero_grad()
batch.loss$backward()
optimizer$step()
}
## torch_tensor
## 1.56357
## [ CPUFloatType{} ][ grad_fn = <NllLossBackward0> ]
## torch_tensor
## 1.4151
## [ CPUFloatType{} ][ grad_fn = <NllLossBackward0> ]
## torch_tensor
## 1.26867
## [ CPUFloatType{} ][ grad_fn = <NllLossBackward0> ]
## torch_tensor
## 1.23161
## [ CPUFloatType{} ][ grad_fn = <NllLossBackward0> ]
We use the function below to implement the same logic as above (take a step for each subtrain batch). It modifies the parameters of the input model in-place.
take_steps <- function(subtrain.indices, model){
optimizer <- torch::optim_sgd(model$parameters, lr=step.size)
batch.list <- get_batch_list(subtrain.indices)
for(batch.number in seq_along(batch.list)){
batch.indices <- batch.list[[batch.number]]
batch.loss <- get_pred_loss(model, batch.indices)$loss
optimizer$zero_grad()
batch.loss$backward()
optimizer$step()
}
}
Below we run a gradient descent learning loop over epochs, while monitoring the validation loss, to learn how many epochs result in the best predictions (smallest loss/error).
max_epochs <- 100
torch::torch_manual_seed(1)#control weight/bias initialization.
set.seed(1)#control random batching.
linear.model <- torch::nn_sequential(
torch::nn_flatten(),
torch::nn_linear(n.features, n.classes))
loss.dt.list <- list()
for(epoch in seq(0, max_epochs)){
epoch.loss.dt <- loss_each_set(sub.val.index.list, linear.model)
loss.dt.list[[paste(epoch)]] <- data.table(epoch, epoch.loss.dt)
take_steps(sub.val.index.list$subtrain, linear.model)
}
(loss.dt <- rbindlist(loss.dt.list))
## epoch set variable value
## 1: 0 validation loss 2.1905527
## 2: 0 validation error.percent 83.7654321
## 3: 0 subtrain loss 2.1951709
## 4: 0 subtrain error.percent 82.9320988
## 5: 1 validation loss 0.6515086
## ---
## 400: 99 subtrain error.percent 2.3148148
## 401: 100 validation loss 0.2156566
## 402: 100 validation error.percent 6.1728395
## 403: 100 subtrain loss 0.1040122
## 404: 100 subtrain error.percent 2.2839506
The code above first initializes a new linear model, then for each epoch, it computes the loss on each set, and takes steps using gradients computed on batches from the subtrain set. The loss/error values at each epoch are combined into the data table displayed above, and plotted below.
plot_loss <- function(loss.dt){
(min.dt <- loss.dt[, .SD[which.min(value)], by=.(set, variable)])
gg <- ggplot()+
ggtitle(paste("Linear model, validation fold", validation.fold))+
facet_grid(variable ~ ., scales="free")+
theme_bw()+
theme(panel.spacing=grid::unit(0, "lines"))+
geom_line(aes(
epoch, value, color=set),
data=loss.dt)+
geom_point(aes(
epoch, value, shape=point),
data=data.table(point="min", min.dt))+
scale_y_continuous("loss/error value (lower for better predictions)")+
scale_x_continuous(
"epoch (gradient descent passes through subtrain data)",
limits=c(0, max_epochs*1.2),
breaks=seq(0, max_epochs, by=10))
directlabels::direct.label(gg, "right.polygons")
}
plot_loss(loss.dt)
Since the loss/error values are so large during the first few epochs, they inflate the maximum of the y axis and hide the details at larger epoch numbers, so we exclude the first ten epochs from the plot below.
plot_loss(loss.dt[epoch>=10])
It is clear from the plot above that the subtrain loss keeps decreasing as the number of epochs increases, whereas the validation loss reaches a minimum and then starts to increase (overfitting), so we should select the number of epochs that minimizes the validation loss.
Instead of trusting a single random subtrain/validation split to determine the best number of epochs (sensitive to random split), we typically use K-fold cross-validation, and average the loss/error curves over several validation sets (less sensitive, more stable).
gradient_descent <- function(index.list, model, n_epochs, gradient.set){
loss.dt.list <- list()
for(epoch in seq(1, n_epochs)){
take_steps(index.list[[gradient.set]], model)
epoch.loss.dt <- loss_each_set(index.list, model)
loss.dt.list[[paste(epoch)]] <- data.table(epoch, epoch.loss.dt)
}
rbindlist(loss.dt.list)
}
new_linear_model <- function(){
torch::nn_sequential(
torch::nn_flatten(),
torch::nn_linear(n.features, n.classes))
}
torch::torch_manual_seed(1)#control weight/bias initialization.
set.seed(1)#control random batching.
fold.loss.dt.list <- list()
for(validation.fold in seq_along(sub.val.cv.list)){
fold.model <- new_linear_model()
one.index.list <- sub.val.cv.list[[validation.fold]]
fold.result <- gradient_descent(
one.index.list, fold.model, max_epochs, "subtrain")
fold.loss.dt.list[[validation.fold]] <- data.table(
validation.fold, fold.result)
}
(fold.loss.dt <- rbindlist(fold.loss.dt.list))
## validation.fold epoch set variable value
## 1: 1 1 validation loss 0.65150863
## 2: 1 1 validation error.percent 14.19753086
## 3: 1 1 subtrain loss 0.65193254
## 4: 1 1 subtrain error.percent 13.79629630
## 5: 1 2 validation loss 0.46156967
## ---
## 1196: 3 99 subtrain error.percent 2.25308642
## 1197: 3 100 validation loss 0.22860475
## 1198: 3 100 validation error.percent 6.23456790
## 1199: 3 100 subtrain loss 0.09782747
## 1200: 3 100 subtrain error.percent 2.31481481
The result above is a table of loss/error for each epoch and validation fold. It can be visualized as three separate curves, as below.
gg <- ggplot()+
ggtitle("Linear model")+
facet_grid(variable ~ validation.fold, scales="free", labeller=label_both)+
theme_bw()+
theme(panel.spacing=grid::unit(0, "lines"))+
geom_line(aes(
epoch, value, color=set),
data=fold.loss.dt[epoch>=10])+
scale_y_continuous("loss/error value (lower for better predictions)")+
scale_x_continuous(
"epoch (gradient descent passes through subtrain data)",
limits=c(0, max_epochs*1.5),
breaks=seq(0, max_epochs, by=10))
directlabels::direct.label(gg, "right.polygons")
Typically to select the best number of epochs, we first compute the mean validation loss curves (over the three folds), as below.
(mean.curves.dt <- fold.loss.dt[, .(
mean=mean(value)
), by=.(variable, set, epoch)])
## variable set epoch mean
## 1: loss validation 1 0.6871229
## 2: error.percent validation 1 14.6913580
## 3: loss subtrain 1 0.6708252
## 4: error.percent subtrain 1 14.4238683
## 5: loss validation 2 0.4831940
## ---
## 396: error.percent subtrain 99 2.2016461
## 397: loss validation 100 0.2152812
## 398: error.percent validation 100 5.9053498
## 399: loss subtrain 100 0.1007051
## 400: error.percent subtrain 100 2.2222222
Then we take the min of the validation loss curve, as below.
(selected.dt <- mean.curves.dt[set=="validation"&variable=="loss"][which.min(mean)])
## variable set epoch mean
## 1: loss validation 61 0.2129522
gg <- ggplot()+
ggtitle("Linear model, mean curves over subtrain/validation splits")+
facet_grid(variable ~ ., scales="free")+
theme_bw()+
theme(panel.spacing=grid::unit(0, "lines"))+
geom_line(aes(
epoch, mean, color=set),
data=mean.curves.dt[epoch>=10])+
geom_point(aes(
epoch, mean),
data=selected.dt)+
scale_y_continuous("loss/error value (lower for better predictions)")+
scale_x_continuous(
"epoch (gradient descent passes through subtrain data)",
limits=c(0, max_epochs*1.5),
breaks=seq(0, max_epochs, by=10))
directlabels::direct.label(gg, "right.polygons")
Finally, after having learned the best number of epochs, we re-initialize the model, and run gradient descent on the train set (not subtrain), using that number of epochs.
final.model <- new_linear_model()
final.loss <- gradient_descent(
train.test.index.list, final.model, selected.dt$epoch, "train")
final.loss[epoch==max(epoch)]
## epoch set variable value
## 1: 61 test loss 0.1851047
## 2: 61 test error.percent 5.0185109
## 3: 61 train loss 0.1253111
## 4: 61 train error.percent 3.0041152
Everything we have done above for the linear model generalizes easily to neural networks; the only change is in the definition of the model.
For example, a fully connected (dense weight matrices) neural network with several hidden layers, and the same number of hidden units per layer, can be defined as below.
n.hidden.units <- 100
new_fully_connected_units <- function(units.per.layer){
seq.args <- list(torch::nn_flatten())
for(output.i in seq(2, length(units.per.layer))){
input.i <- output.i-1
seq.args[[length(seq.args)+1]] <- torch::nn_linear(
units.per.layer[[input.i]],
units.per.layer[[output.i]])
if(output.i<length(units.per.layer)){
seq.args[[length(seq.args)+1]] <- torch::nn_relu()
}
}
do.call(torch::nn_sequential, seq.args)
}
new_fully_connected <- function(n.hidden.layers){
units.per.layer <- c(
n.features,
rep(n.hidden.units,n.hidden.layers),
n.classes)
new_fully_connected_units(units.per.layer)
}
new_fully_connected(0)# linear model.
## An `nn_module` containing 2,570 parameters.
##
## ── Modules ─────────────────────────────────────────────────────────────────────
## • 0: <nn_flatten> #0 parameters
## • 1: <nn_linear> #2,570 parameters
new_fully_connected(1)# neural network with one hidden layer (not deep).
## An `nn_module` containing 26,710 parameters.
##
## ── Modules ─────────────────────────────────────────────────────────────────────
## • 0: <nn_flatten> #0 parameters
## • 1: <nn_linear> #25,700 parameters
## • 2: <nn_relu> #0 parameters
## • 3: <nn_linear> #1,010 parameters
new_fully_connected(2)# "deep" neural network.
## An `nn_module` containing 36,810 parameters.
##
## ── Modules ─────────────────────────────────────────────────────────────────────
## • 0: <nn_flatten> #0 parameters
## • 1: <nn_linear> #25,700 parameters
## • 2: <nn_relu> #0 parameters
## • 3: <nn_linear> #10,100 parameters
## • 4: <nn_relu> #0 parameters
## • 5: <nn_linear> #1,010 parameters
new_fully_connected(3)# "deeper" neural network.
## An `nn_module` containing 46,910 parameters.
##
## ── Modules ─────────────────────────────────────────────────────────────────────
## • 0: <nn_flatten> #0 parameters
## • 1: <nn_linear> #25,700 parameters
## • 2: <nn_relu> #0 parameters
## • 3: <nn_linear> #10,100 parameters
## • 4: <nn_relu> #0 parameters
## • 5: <nn_linear> #10,100 parameters
## • 6: <nn_relu> #0 parameters
## • 7: <nn_linear> #1,010 parameters
## "deep" with about same # of params as conv net.
(dense.network <- new_fully_connected(8))
## An `nn_module` containing 97,410 parameters.
##
## ── Modules ─────────────────────────────────────────────────────────────────────
## • 0: <nn_flatten> #0 parameters
## • 1: <nn_linear> #25,700 parameters
## • 2: <nn_relu> #0 parameters
## • 3: <nn_linear> #10,100 parameters
## • 4: <nn_relu> #0 parameters
## • 5: <nn_linear> #10,100 parameters
## • 6: <nn_relu> #0 parameters
## • 7: <nn_linear> #10,100 parameters
## • 8: <nn_relu> #0 parameters
## • 9: <nn_linear> #10,100 parameters
## • 10: <nn_relu> #0 parameters
## • 11: <nn_linear> #10,100 parameters
## • 12: <nn_relu> #0 parameters
## • 13: <nn_linear> #10,100 parameters
## • 14: <nn_relu> #0 parameters
## • 15: <nn_linear> #10,100 parameters
## • 16: <nn_relu> #0 parameters
## • 17: <nn_linear> #1,010 parameters
dense.network(zip.X.tensor[1:2,])
## torch_tensor
## 0.01 *
## 0.7152 -8.4251 -5.3038 -2.3340 2.4992 -9.4213 -8.9800 0.9778 5.0086 7.9127
## 0.7090 -8.4154 -5.3267 -2.3038 2.5110 -9.4004 -9.0140 1.0051 4.9921 7.9313
## [ CPUFloatType{2,10} ][ grad_fn = <AddmmBackward0> ]
linear.model(zip.X.tensor[1:2,])
## torch_tensor
## Columns 1 to 8 5.0040 -1.3596 1.3680 -7.8165 2.2423 5.0716 13.7993 -7.6465
## 1.5468 -2.3510 2.5304 2.3051 -3.8605 11.6086 -3.1735 -3.2827
##
## Columns 9 to 10 1.7977 -12.3845
## 1.6323 -6.1035
## [ CPUFloatType{2,10} ][ grad_fn = <AddmmBackward0> ]
Note that in the code above, a network with more than one hidden layer is considered a deep neural network (deep learning).
In the code below, we define another kind of deep neural network, with a sparse/convolutional layer:
new_convolutional <- function(){
seq2flat <- torch::nn_sequential(
torch::nn_conv2d(in_channels = 1, out_channels = 20, kernel_size = 3),
torch::nn_relu(),
torch::nn_max_pool2d(kernel_size = 2),
torch::nn_flatten())
one.flat <- seq2flat(zip.X.tensor[1,,,,drop=FALSE])
n.flat <- length(one.flat)
torch::nn_sequential(
seq2flat,
torch::nn_linear(n.flat, n.hidden.units),
torch::nn_relu(),
torch::nn_linear(n.hidden.units, n.classes))
}
(convolutional.network <- new_convolutional())
## An `nn_module` containing 99,310 parameters.
##
## ── Modules ─────────────────────────────────────────────────────────────────────
## • 0: <nn_sequential> #200 parameters
## • 1: <nn_linear> #98,100 parameters
## • 2: <nn_relu> #0 parameters
## • 3: <nn_linear> #1,010 parameters
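As a check on the parameter counts printed above, we can do the shape arithmetic by hand (a small sketch, assuming the torch defaults of stride 1 for the convolution and stride equal to kernel size for the pooling).
20*(3*3*1)+20 # conv layer: 3x3 kernels, 1 input and 20 output channels, plus biases = 200 parameters.
(n.flat.check <- 20*7*7) # 16x16 -> 14x14 after conv, -> 7x7 after pooling, times 20 channels = 980 features.
n.flat.check*n.hidden.units+n.hidden.units # first linear layer = 98,100 parameters.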
We combine all of the models into the list below,
new_model_list <- list(
linear=new_linear_model,
convolutional=new_convolutional,
one_dense_hidden_layer=function()new_fully_connected(1),
eight_dense_hidden_layers=function()new_fully_connected(8))
The code below runs the full model training (including learning number of epochs), for each model and test fold.
loss_each_split <- function(new_model, train.indices){
cv.list <- cv_list(train.indices, "subtrain", "validation")
fold.loss.dt.list <- list()
for(validation.fold in seq_along(cv.list)){
fold.model <- new_model()
one.index.list <- cv.list[[validation.fold]] # use the splits computed above for this train set.
fold.result <- gradient_descent(
one.index.list, fold.model, max_epochs, "subtrain")
fold.loss.dt.list[[validation.fold]] <- data.table(
validation.fold, fold.result)
}
rbindlist(fold.loss.dt.list)
}
train_model <- function(new_model, tt.index.list){
fold.loss.dt <- loss_each_split(new_model, tt.index.list$train)
mean.curves.dt <- fold.loss.dt[, .(
mean=mean(value)
), by=.(variable, set, epoch)]
selected.dt <- mean.curves.dt[
set=="validation"&variable=="loss"][which.min(mean)]
final.model <- new_model()
final.loss <- gradient_descent(
tt.index.list, final.model, selected.dt$epoch, "train")
list(
model=final.model,
final.loss=final.loss,
fold.loss=fold.loss.dt,
mean.curves=mean.curves.dt)
}
test.loss.list <- list()
for(test.fold in seq_along(train.test.cv.list)){
tt.index.list <- train.test.cv.list[[test.fold]]
train.label.tab <- table(zip.y.int[tt.index.list$train])
featureless.pred <- names(train.label.tab)[which.max(train.label.tab)]
test.label <- zip.y.int[tt.index.list$test]
is.test.error <- test.label != featureless.pred
test.loss.list[[paste(test.fold,"featureless")]] <- data.table(
test.fold, model.name="featureless", epoch=NA_integer_, set="test",
variable="error.percent",
value=100*mean(is.test.error))
for(model.name in names(new_model_list)){
out.name <- paste(test.fold,model.name)
print(out.name)
new_model <- new_model_list[[model.name]]
result <- train_model(new_model, tt.index.list)
test.loss.list[[out.name]] <- data.table(
test.fold, model.name,
result$final.loss[epoch==max(epoch)&set=="test"])
}
}
## [1] "1 linear"
## [1] "1 convolutional"
## [1] "1 one_dense_hidden_layer"
## [1] "1 eight_dense_hidden_layers"
## [1] "2 linear"
## [1] "2 convolutional"
## [1] "2 one_dense_hidden_layer"
## [1] "2 eight_dense_hidden_layers"
## [1] "3 linear"
## [1] "3 convolutional"
## [1] "3 one_dense_hidden_layer"
## [1] "3 eight_dense_hidden_layers"
(test.loss <- rbindlist(test.loss.list))
## test.fold model.name epoch set variable value
## 1: 1 featureless NA test error.percent 83.66927190
## 2: 1 linear 70 test loss 0.18385553
## 3: 1 linear 70 test error.percent 4.97737557
## 4: 1 convolutional 62 test loss 0.06202247
## 5: 1 convolutional 62 test error.percent 2.01563143
## 6: 1 one_dense_hidden_layer 65 test loss 0.13344371
## 7: 1 one_dense_hidden_layer 65 test error.percent 3.78445084
## 8: 1 eight_dense_hidden_layers 99 test loss 0.93828464
## 9: 1 eight_dense_hidden_layers 99 test error.percent 30.93377211
## 10: 2 featureless NA test error.percent 83.82716049
## 11: 2 linear 62 test loss 0.20835865
## 12: 2 linear 62 test error.percent 5.34979424
## 13: 2 convolutional 64 test loss 0.08277620
## 14: 2 convolutional 64 test error.percent 2.42798354
## 15: 2 one_dense_hidden_layer 71 test loss 0.16659287
## 16: 2 one_dense_hidden_layer 71 test error.percent 4.36213992
## 17: 2 eight_dense_hidden_layers 97 test loss 0.55776656
## 18: 2 eight_dense_hidden_layers 97 test error.percent 14.19753086
## 19: 3 featureless NA test error.percent 83.37448560
## 20: 3 linear 81 test loss 0.20346425
## 21: 3 linear 81 test error.percent 5.26748971
## 22: 3 convolutional 46 test loss 0.09574776
## 23: 3 convolutional 46 test error.percent 2.42798354
## 24: 3 one_dense_hidden_layer 82 test loss 0.16080725
## 25: 3 one_dense_hidden_layer 82 test error.percent 3.95061728
## 26: 3 eight_dense_hidden_layers 93 test loss 0.36736831
## 27: 3 eight_dense_hidden_layers 93 test error.percent 5.72016461
## test.fold model.name epoch set variable value
The for loops above are embarrassingly parallel (each iteration can be done independently, in any order), so great speedups could be obtained if you run them in parallel, for example on a computer cluster such as NAU Monsoon; see my tutorial here.
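For example, the loop over test folds could be run in parallel on a multi-core machine (a minimal sketch using parallel::mclapply; the one_test_fold wrapper below is hypothetical, simply re-packaging the loop body above, and fork-based parallelism may need adjustment on some systems).
one_test_fold <- function(test.fold){
  ## train every model on one train/test split, return test loss/error rows.
  tt.index.list <- train.test.cv.list[[test.fold]]
  fold.rows <- list()
  for(model.name in names(new_model_list)){
    result <- train_model(new_model_list[[model.name]], tt.index.list)
    fold.rows[[model.name]] <- data.table(
      test.fold, model.name,
      result$final.loss[epoch==max(epoch)&set=="test"])
  }
  rbindlist(fold.rows)
}
test.loss.parallel <- rbindlist(parallel::mclapply(
  seq_along(train.test.cv.list), one_test_fold, mc.cores=3)) # mc.cores is illustrative.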
ggplot()+
geom_point(aes(
value, model.name),
shape=1,
data=test.loss)+
facet_grid(. ~ variable, labeller=label_both, scales="free")
Above we plot all of the models; below we exclude the outliers (featureless and eight_dense_hidden_layers) to better see the differences between the other models.
test.some <- test.loss[
!model.name %in% c("featureless","eight_dense_hidden_layers")]
ggplot()+
geom_point(aes(
value, model.name),
shape=1,
data=test.some)+
facet_grid(. ~ variable, labeller=label_both, scales="free")
sessionInfo()
## R Under development (unstable) (2023-03-29 r84123)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Red Hat Enterprise Linux 8.7 (Ootpa)
##
## Matrix products: default
## BLAS: /projects/genomic-ml/lib64/R/lib/libRblas.so
## LAPACK: /projects/genomic-ml/lib64/R/lib/libRlapack.so; LAPACK version 3.11.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/Phoenix
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics utils datasets grDevices methods base
##
## other attached packages:
## [1] ggplot2_3.4.2 data.table_1.14.8
##
## loaded via a namespace (and not attached):
## [1] bit_4.0.5 gtable_0.3.3 jsonlite_1.8.4
## [4] dplyr_1.1.1 compiler_4.4.0 highr_0.10
## [7] tidyselect_1.2.0 Rcpp_1.0.10 callr_3.7.3
## [10] jquerylib_0.1.4 directlabels_2021.2.24 scales_1.2.1
## [13] fastmap_1.1.1 R6_2.5.1 labeling_0.4.2
## [16] generics_0.1.3 knitr_1.42 tibble_3.2.1
## [19] munsell_0.5.0 bslib_0.4.2 pillar_1.9.0
## [22] R.utils_2.12.2 rlang_1.1.0 utf8_1.2.3
## [25] cachem_1.0.7 xfun_0.38 quadprog_1.5-8
## [28] sass_0.4.5 bit64_4.0.5 cli_3.6.1
## [31] withr_2.5.0 magrittr_2.0.3 ps_1.7.3
## [34] processx_3.8.0 digest_0.6.31 grid_4.4.0
## [37] torch_0.10.0 lifecycle_1.0.3 coro_1.0.3
## [40] R.methodsS3_1.8.2 R.oo_1.25.0 vctrs_0.6.1
## [43] evaluate_0.20 glue_1.6.2 farver_2.1.1
## [46] fansi_1.0.4 colorspace_2.1-1 rmarkdown_2.21
## [49] tools_4.4.0 pkgconfig_2.0.3 htmltools_0.5.5