Title: | Building and Training Neural Networks |
---|---|
Description: | The 'cito' package provides a user-friendly interface for training and interpreting deep neural networks (DNN). 'cito' simplifies the fitting of DNNs by supporting the familiar formula syntax, hyperparameter tuning under cross-validation, and helps to detect and handle convergence problems. DNNs can be trained on CPU, GPU and MacOS GPUs. In addition, 'cito' has many downstream functionalities such as various explainable AI (xAI) metrics (e.g. variable importance, partial dependence plots, accumulated local effect plots, and effect estimates) to interpret trained DNNs. 'cito' optionally provides confidence intervals (and p-values) for all xAI metrics and predictions. At the same time, 'cito' is computationally efficient because it is based on the deep learning framework 'torch'. The 'torch' package is native to R, so no Python installation or other API is required for this package. |
Authors: | Christian Amesöder [aut], Maximilian Pichler [aut, cre], Florian Hartig [ctb], Armin Schenk [ctb] |
Maintainer: | Maximilian Pichler <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.1.1 |
Built: | 2024-11-09 05:55:20 UTC |
Source: | https://github.com/citoverse/cito |
Performs an ALE for one or more features.
ALE( model, variable = NULL, data = NULL, K = 10, ALE_type = c("equidistant", "quantile"), plot = TRUE, parallel = FALSE, ... ) ## S3 method for class 'citodnn' ALE( model, variable = NULL, data = NULL, K = 10, ALE_type = c("equidistant", "quantile"), plot = TRUE, parallel = FALSE, ... ) ## S3 method for class 'citodnnBootstrap' ALE( model, variable = NULL, data = NULL, K = 10, ALE_type = c("equidistant", "quantile"), plot = TRUE, parallel = FALSE, ... )
model |
a model created by dnn |
variable |
variable as string for which the ALE should be calculated |
data |
data on which the ALE is performed; if NULL, the training data will be used. |
K |
number of neighborhoods the original feature space is divided into |
ALE_type |
method by which the feature space is divided into neighborhoods ("equidistant" or "quantile"). |
plot |
plot ALE or not |
parallel |
parallelize over bootstrap models or not |
... |
arguments passed to |
A list of plots made with 'ggplot2' consisting of an individual plot for each defined variable.
Accumulated Local Effect plots (ALE) quantify how the predictions change when the features change. They are similar to partial dependency plots but are more robust to feature collinearity.
If the defined variable is a numeric feature, the ALE is performed. Here, the non-centered effect for feature j with K equidistant neighborhoods is defined as:

$$\hat{\tilde{f}}_{j,ALE}(x) = \sum_{k=1}^{k_j(x)} \frac{1}{n_j(k)} \sum_{i:\, x_{j}^{(i)} \in N_j(k)} \left[ \hat{f}\left(z_{k,j}, x^{(i)}_{\setminus j}\right) - \hat{f}\left(z_{k-1,j}, x^{(i)}_{\setminus j}\right) \right]$$

where $N_j(k)$ is the k-th neighborhood and $n_j(k)$ is the number of observations in the k-th neighborhood. The last part of the equation, $\hat{f}(z_{k,j}, x^{(i)}_{\setminus j}) - \hat{f}(z_{k-1,j}, x^{(i)}_{\setminus j})$, represents the difference in the model prediction when the value of feature j is exchanged with the upper ($z_{k,j}$) and lower ($z_{k-1,j}$) border of the current neighborhood.
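To make the formula concrete, here is a minimal R sketch of the uncentered ALE for one numeric feature. The helper uncentered_ale() is a hypothetical illustration and not part of 'cito'; ALE() computes this (plus centering) for you.

# Minimal sketch of the uncentered ALE for a single numeric feature.
# `uncentered_ale()` is a hypothetical helper, not a 'cito' function; it only
# assumes that `model` has a predict() method and `data` is a data.frame.
uncentered_ale <- function(model, data, feature, K = 10) {
  x <- data[[feature]]
  borders <- seq(min(x), max(x), length.out = K + 1)  # neighborhood borders z_0, ..., z_K
  effects <- numeric(K)
  for (k in seq_len(K)) {
    in_k <- x > borders[k] & x <= borders[k + 1]
    if (k == 1) in_k <- in_k | x == borders[1]        # include the left border in the first bin
    if (!any(in_k)) next
    upper <- lower <- data[in_k, , drop = FALSE]
    upper[[feature]] <- borders[k + 1]
    lower[[feature]] <- borders[k]
    # average prediction difference within the k-th neighborhood
    effects[k] <- mean(predict(model, upper) - predict(model, lower))
  }
  cumsum(effects)  # accumulate the local effects
}
# e.g. uncentered_ale(nn.fit, datasets::iris, "Petal.Length") for the model fitted in the example below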
if(torch::torch_is_installed()){ library(cito) # Build and train Network nn.fit<- dnn(Sepal.Length~., data = datasets::iris) ALE(nn.fit, variable = "Petal.Length") }
After training a model with cito, this function helps to analyze the training process and decide on the best performing model. It creates a 'plotly' figure which allows you to zoom in and out on the training graph.
analyze_training(object)
object |
a model created by dnn |
The baseline loss is the most important reference. If the model was not able to achieve a better (lower) loss than the baseline (which is the loss of an intercept-only model), the model probably did not converge. Possible reasons include an improper learning rate, too few epochs, or too much regularization. See ?dnn or vignette("B-Training_neural_networks") for more help.
a 'plotly' figure
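Besides the interactive plot, the losses can be inspected directly. A minimal sketch, assuming the fitted object stores the per-epoch losses in $losses and the baseline loss in $base_loss (as documented for cnn() objects below):

if(torch::torch_is_installed()){
  library(cito)
  nn.fit <- dnn(Sepal.Length~., data = datasets::iris)
  # assumed fields (see the cnn() value section below for the analogous citocnn fields)
  nn.fit$base_loss      # baseline (intercept-only) loss
  tail(nn.fit$losses)   # training/validation loss of the last epochs
  # if the final training loss is not clearly below the baseline loss,
  # the model probably did not converge (see above)
}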
if(torch::torch_is_installed()){ library(cito) set.seed(222) validation_set<- sample(c(1:nrow(datasets::iris)),25) # Build and train Network nn.fit<- dnn(Sepal.Length~., data = datasets::iris[-validation_set,],validation = 0.1) # show zoomable plot of training and validation losses analyze_training(nn.fit) # Use model on validation set predictions <- predict(nn.fit, iris[validation_set,]) # Scatterplot plot(iris[validation_set,]$Sepal.Length,predictions) }
This function creates an avgPool layer object of class citolayer for use in constructing a Convolutional Neural Network (CNN) architecture. The resulting layer object can be passed to the create_architecture function to define the structure of the network.
avgPool(kernel_size = NULL, stride = NULL, padding = NULL)
kernel_size |
(integer or tuple) The size of the kernel in this layer. Use a tuple if the kernel size differs across dimensions. |
stride |
(integer or tuple) The stride of the kernel in this layer. If NULL, the stride is set to the kernel size. |
padding |
(integer or tuple) The amount of zero-padding added to the input on both sides. Use a tuple if the padding differs across dimensions. |
This function creates an avgPool
layer object, which represents an average pooling layer in a CNN architecture. Parameters not specified (and thus set to NULL
) will be filled with default values provided to the create_architecture
function.
An S3 object of class "avgPool" "citolayer"
, representing an average pooling layer in the CNN architecture.
Armin Schenk
if(torch::torch_is_installed()){ library(cito) # An average pooling layer where all available parameters are assigned # No value will be overwritten by 'create_architecture()' layer1 <- avgPool(3, 1, 0) # An average pooling layer where only the kernel size is assigned # stride and padding are filled with the defaults # passed to the 'create_architecture()' function layer2 <- avgPool(kernel_size=4) }
The 'cito' package provides a user-friendly interface for training and interpreting deep neural networks (DNN). 'cito' simplifies the fitting of DNNs by supporting the familiar formula syntax, hyperparameter tuning under cross-validation, and helps to detect and handle convergence problems. DNNs can be trained on CPU, GPU and MacOS GPUs. In addition, 'cito' has many downstream functionalities such as various explainable AI (xAI) metrics (e.g. variable importance, partial dependence plots, accumulated local effect plots, and effect estimates) to interpret trained DNNs. 'cito' optionally provides confidence intervals (and p-values) for all xAI metrics and predictions. At the same time, 'cito' is computationally efficient because it is based on the deep learning framework 'torch'. The 'torch' package is native to R, so no Python installation or other API is required for this package.
Cito is built around its main function dnn
, which creates and trains a deep neural network. Various tools for analyzing the trained neural network are available.
In order to install 'cito', please follow these steps:
install.packages("cito")
library(torch)
install_torch(reinstall = TRUE)
library(cito)
dnn
: train deep neural network
analyze_training
: check for convergence by comparing training loss with baseline loss
continue_training
: continues training of an existing cito dnn model for additional epochs
summary.citodnn
: extract xAI metrics/effects to understand how predictions are made
PDP
: plot the partial dependency plot for a specific feature
ALE
: plot the accumulated local effect plot for a specific feature
Check out the vignettes for more details on training neural networks and on what a typical workflow with 'cito' looks like.
Maintainer: Maximilian Pichler [email protected] (ORCID)
Authors:
Christian Amesöder [email protected]
Other contributors:
Florian Hartig [email protected] (ORCID) [contributor]
Armin Schenk [email protected] [contributor]
if(torch::torch_is_installed()){ library(cito) # Example workflow in cito ## Build and train Network ### softmax is used for multi-class responses (e.g., Species) nn.fit<- dnn(Species~., data = datasets::iris, loss = "softmax") ## The training loss is below the baseline loss but at the end of the ## training the loss was still decreasing, so continue training for another 50 ## epochs nn.fit <- continue_training(nn.fit, epochs = 50L) # Structure of Neural Network print(nn.fit) # Plot Neural Network plot(nn.fit) ## 4 Input nodes (first layer) because of 4 features ## 3 Output nodes (last layer) because of 3 response species (one node for each ## level in the response variable). ## The layers between the input and output layer are called hidden layers (two ## of them) ## We now want to understand how the predictions are made, what are the ## important features? The summary function automatically calculates feature ## importance (the interpretation is similar to an anova) and calculates ## average conditional effects that are similar to linear effects: summary(nn.fit) ## To visualize the effect (response-feature effect), we can use the ALE and ## PDP functions # Partial dependencies PDP(nn.fit, variable = "Petal.Length") # Accumulated local effect plots ALE(nn.fit, variable = "Petal.Length") # Per se, it is difficult to get confidence intervals for our xAI metrics (or # for the predictions). But we can use bootstrapping to obtain uncertainties # for all cito outputs: ## Re-fit the neural network with bootstrapping nn.fit<- dnn(Species~., data = datasets::iris, loss = "softmax", epochs = 150L, verbose = FALSE, bootstrap = 20L) ## convergence can be tested via the analyze_training function analyze_training(nn.fit) ## Summary for xAI metrics (can take some time): summary(nn.fit) ## Now with standard errors and p-values ## Note: Take the p-values with a grain of salt! We do not know yet if they are ## correct (e.g.
if you use regularization, they are likely conservative == too ## large) ## Predictions with bootstrapping: dim(predict(nn.fit)) ## predictions are by default averaged (over the bootstrap samples) ## Multinomial and conditional logit regression m = dnn(Species~., data = iris, loss = "clogit", lr = 0.01) m = dnn(Species~., data = iris, loss = "multinomial", lr = 0.01) Y = t(stats::rmultinom(100, 10, prob = c(0.2, 0.2, 0.5))) m = dnn(cbind(X1, X2, X3)~., data = data.frame(Y, A = as.factor(runif(100))), loss = "multinomial", lr = 0.01) ## conditional logit for size > 1 is not supported yet # Hyperparameter tuning (experimental feature) hidden_values = matrix(c(5, 2, 4, 2, 10,2, 15,2), 4, 2, byrow = TRUE) ## Potential architectures we want to test, first column == number of nodes print(hidden_values) nn.fit = dnn(Species~., data = iris, epochs = 30L, loss = "softmax", hidden = tune(values = hidden_values), lr = tune(0.00001, 0.1) # tune lr between range 0.00001 and 0.1 ) ## Tuning results: print(nn.fit$tuning) # test = Inf means that tuning was cancelled after only one fit (within the CV) # Advanced: Custom loss functions and additional parameters ## Normal Likelihood with sd parameter: custom_loss = function(pred, true) { logLik = torch::distr_normal(pred, scale = torch::nnf_relu(scale)+ 0.001)$log_prob(true) return(-logLik$mean()) } nn.fit<- dnn(Sepal.Length~., data = datasets::iris, loss = custom_loss, verbose = FALSE, custom_parameters = list(scale = 1.0) ) nn.fit$parameter$scale ## Multivariate normal likelihood with parametrized covariance matrix ## Sigma = L*L^t + D ## Helper function to build covariance matrix create_cov = function(LU, Diag) { return(torch::torch_matmul(LU, LU$t()) + torch::torch_diag(Diag$exp()+0.01)) } custom_loss_MVN = function(true, pred) { Sigma = create_cov(SigmaPar, SigmaDiag) logLik = torch::distr_multivariate_normal(pred, covariance_matrix = Sigma)$ log_prob(true) return(-logLik$mean()) } nn.fit<- dnn(cbind(Sepal.Length, Sepal.Width, Petal.Length)~., data = datasets::iris, lr = 0.01, verbose = FALSE, loss = custom_loss_MVN, custom_parameters = list(SigmaDiag = rep(0, 3), SigmaPar = matrix(rnorm(6, sd = 0.001), 3, 2)) ) as.matrix(create_cov(nn.fit$loss$parameter$SigmaPar, nn.fit$loss$parameter$SigmaDiag)) }
This function trains a Convolutional Neural Network (CNN) on the provided input data X
and the target data Y
using the specified architecture, loss function, and optimizer.
cnn( X, Y = NULL, architecture, loss = c("mse", "mae", "softmax", "cross-entropy", "gaussian", "binomial", "poisson", "mvp", "nbinom", "multinomial", "clogit"), optimizer = c("sgd", "adam", "adadelta", "adagrad", "rmsprop", "rprop"), lr = 0.01, alpha = 0.5, lambda = 0, validation = 0, batchsize = 32L, burnin = 30, shuffle = TRUE, epochs = 100, early_stopping = NULL, lr_scheduler = NULL, custom_parameters = NULL, device = c("cpu", "cuda", "mps"), plot = TRUE, verbose = TRUE )
X |
An array of input data with a minimum of 3 and a maximum of 5 dimensions. The first dimension represents the samples, the second dimension represents the channels, and the third to fifth dimensions represent the input dimensions. |
Y |
The target data. It can be a factor, numeric vector, or a numeric or logical matrix. |
architecture |
An object of class 'citoarchitecture'. See create_architecture for more information. |
loss |
The loss function to be used. Options include "mse", "mae", "softmax", "cross-entropy", "gaussian", "binomial", "poisson", "nbinom", "mvp", "multinomial", and "clogit". You can also specify your own loss function. See Details for more information. Default is "mse". |
optimizer |
The optimizer to be used. Options include "sgd", "adam", "adadelta", "adagrad", "rmsprop", and "rprop". See config_optimizer for further adjustments. Default is "sgd". |
lr |
Learning rate for the optimizer. Default is 0.01. |
alpha |
Alpha value for L1/L2 regularization. Default is 0.5. |
lambda |
Lambda value for L1/L2 regularization. Default is 0.0. |
validation |
Proportion of the data to be used for validation. Default is 0.0. |
batchsize |
Batch size for training. Default is 32. |
burnin |
Number of epochs after which the training stops if the loss is still above the base loss. Default is 30. |
shuffle |
Whether to shuffle the data before each epoch. Default is TRUE. |
epochs |
Number of epochs to train the model. Default is 100. |
early_stopping |
Number of epochs with no improvement after which training will be stopped. Default is NULL. |
lr_scheduler |
Learning rate scheduler. See config_lr_scheduler for more information. Default is NULL. |
custom_parameters |
Parameters for the custom loss function. See the vignette for an example. Default is NULL. |
device |
Device to be used for training. Options are "cpu", "cuda", and "mps". Default is "cpu". |
plot |
Whether to plot the training progress. Default is TRUE. |
verbose |
Whether to print detailed training progress. Default is TRUE. |
An S3 object of class "citocnn"
is returned. It is a list containing everything there is to know about the model and its training process.
The list consists of the following attributes:
net |
An object of class "nn_sequential" "nn_module", originates from the torch package and represents the core object of this workflow. |
call |
The original function call. |
loss |
A list which contains relevant information for the target variable and the used loss function. |
data |
Contains the data used for the training of the model. |
base_loss |
The loss of the intercept-only model. |
weights |
List of parameters (weights and biases) of the models from the best and the last training epoch. |
buffers |
List of buffers (e.g. running mean and variance of batch normalization layers) of the models from the best and the last training epoch. |
use_model_epoch |
Integer, defines whether the model from the best (= 1) or the last (= 2) training epoch should be used for prediction. |
loaded_model_epoch |
Integer, shows whether the parameters and buffers of the model from the best (= 1) or the last (= 2) training epoch are currently loaded in net. |
model_properties |
A list of properties that define the architecture of the model. |
training_properties |
A list of all the training parameters used the last time the model was trained. |
losses |
A data.frame containing training and validation losses of each epoch. |
Convolutional Neural Networks (CNNs) are a specialized type of neural network designed for processing structured data, such as images. The key components of a CNN are convolutional layers, pooling layers and fully-connected (linear) layers:
Convolutional layers are the core building blocks of CNNs. They consist of filters (also called kernels), which are small, learnable matrices. These filters slide over the input data to perform element-wise multiplication, producing feature maps that capture local patterns and features. Multiple filters are used to detect different features in parallel. They help the network learn hierarchical representations of the input data by capturing low-level features (edges, textures) and gradually combining them (in subsequent convolutional layers) to form higher-level features.
Pooling layers reduce the size of the feature maps created by convolutional layers, while retaining important information. A common type is max pooling, which keeps the highest value in a region, simplifying the data while preserving essential features.
Fully-connected (linear) layers connect every neuron in one layer to every neuron in the next layer. These layers are found at the end of the network and are responsible for combining high-level features to make final predictions.
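As a rough illustration of how these building blocks map onto 'cito', the following sketch stacks them with the layer helpers documented on this page (conv(), maxPool(), linear()); the layer sizes are arbitrary choices, not recommendations:

if(torch::torch_is_installed()){
  library(cito)
  # convolutional layers extract local features, pooling layers shrink the
  # feature maps, and the linear layer combines the features for the prediction
  architecture <- create_architecture(
    conv(n_kernels = 16, kernel_size = 3, activation = "relu"),
    maxPool(kernel_size = 2),
    conv(n_kernels = 32, kernel_size = 3, activation = "relu"),
    maxPool(kernel_size = 2),
    linear(n_neurons = 64)
  )
}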
We support loss functions and likelihoods for different tasks:
Name | Explanation | Example / Task |
---|---|---|
mse | mean squared error | Regression, predicting continuous values |
mae | mean absolute error | Regression, predicting continuous values |
softmax | categorical cross entropy | Multi-class, species classification |
cross-entropy | categorical cross entropy | Multi-class, species classification |
gaussian | Normal likelihood | Regression, residual error is also estimated (similar to stats::lm() ) |
binomial | Binomial likelihood | Classification/Logistic regression, mortality |
poisson | Poisson likelihood | Regression, count data, e.g. species abundances |
nbinom | Negative binomial likelihood | Regression, count data with dispersion parameter |
mvp | multivariate probit model | joint species distribution model, multi species (presence absence) |
multinomial | Multinomial likelihood | step selection in animal movement models |
clogit | conditional binomial | step selection in animal movement models |
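The loss is passed as a string to dnn() or cnn(); a minimal sketch with two common cases:

if(torch::torch_is_installed()){
  library(cito)
  # continuous response -> Normal likelihood (residual error is also estimated)
  fit_reg <- dnn(Sepal.Length~., data = datasets::iris, loss = "gaussian")
  # multi-class response -> softmax (categorical cross entropy)
  fit_cls <- dnn(Species~., data = datasets::iris, loss = "softmax")
}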
Ensuring convergence can be tricky when training neural networks. Their training is sensitive to a combination of the learning rate (how much the weights are updated in each optimization step), the batch size (a random subset of the data is used in each optimization step), and the number of epochs (number of optimization steps). Typically, the learning rate should be decreased with the size of the neural networks (amount of learnable parameters). We provide a baseline loss (intercept only model) that can give hints about an appropriate learning rate:
If the training loss of the model doesn't fall below the baseline loss, the learning rate is either too high or too low. If this happens, try higher and lower learning rates.
A common strategy is to try (manually) a few different learning rates to see if the learning rate is on the right scale.
See the troubleshooting vignette (vignette("B-Training_neural_networks")
) for more help on training and debugging neural networks.
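A minimal sketch of such a manual screen over learning rates (the candidate values are illustrative):

if(torch::torch_is_installed()){
  library(cito)
  # fit the same model with learning rates on different orders of magnitude
  fits <- lapply(c(0.1, 0.01, 0.001), function(lr) {
    dnn(Sepal.Length~., data = datasets::iris, lr = lr,
        epochs = 100, verbose = FALSE, plot = FALSE)
  })
  # inspect each fit with analyze_training() and keep the learning rate whose
  # training loss falls clearly below the baseline loss
}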
As with the learning rate, there is no definitive guide to choosing the right architecture for the right task. However, there are some general rules/recommendations: In general, wider and deeper neural networks can improve generalization - but this is a double-edged sword because it also increases the risk of overfitting. So, if you increase the width and depth of the network, you should also add regularization (e.g., by increasing the lambda parameter, which corresponds to the regularization strength). Furthermore, in Pichler & Hartig, 2023, we investigated the effects of the hyperparameters on the prediction performance as a function of the data size. For example, we found that the selu
activation function outperforms relu
for small data sizes (<100 observations).
We recommend starting with moderate sizes (like the defaults), and if the model doesn't generalize/converge, try larger networks along with a regularization that helps minimize the risk of overfitting (see vignette("B-Training_neural_networks")
).
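A sketch of this recommendation, assuming the default network does not generalize or converge and you want a wider/deeper network counterbalanced by regularization:

if(torch::torch_is_installed()){
  library(cito)
  nn.fit <- dnn(Sepal.Length~., data = datasets::iris,
                hidden = c(100L, 100L, 100L),  # wider and deeper than the default
                activation = "selu",
                lambda = 0.001, alpha = 0.2)   # elastic-net regularization
}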
Overfitting means that the model fits the training data well, but generalizes poorly to new observations. We can use the validation argument to detect overfitting. If the validation loss starts to increase again at a certain point, it often means that the model is starting to overfit the training data:
Solutions:
Re-train with epochs = point where model started to overfit
Early stopping: stop training when the model starts to overfit; this can be specified using the early_stopping=… argument (see the sketch after this list)
Use regularization (dropout or elastic-net, see next section)
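A minimal sketch of the early-stopping option combined with a validation split:

if(torch::torch_is_installed()){
  library(cito)
  # hold out 20% of the data and stop once the (validation) loss has
  # increased for 10 epochs in a row
  nn.fit <- dnn(Sepal.Length~., data = datasets::iris,
                validation = 0.2, early_stopping = 10, epochs = 200)
}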
Elastic Net regularization combines the strengths of L1 (Lasso) and L2 (Ridge) regularization. It introduces a penalty term that encourages sparse weight values while maintaining overall weight shrinkage. By controlling the sparsity of the learned model, Elastic Net regularization helps avoid overfitting while allowing for meaningful feature selection. We advise using elastic net (e.g. lambda = 0.001 and alpha = 0.2).
Dropout regularization helps prevent overfitting by randomly disabling a portion of neurons during training. This technique encourages the network to learn more robust and generalized representations, as it prevents individual neurons from relying too heavily on specific input patterns. Dropout has been widely adopted as a simple yet effective regularization method in deep learning.
By utilizing these regularization methods in your neural network training with the cito package, you can improve generalization performance and enhance the network's ability to handle unseen data. These techniques act as valuable tools in mitigating overfitting and promoting more robust and reliable model performance.
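A sketch combining both regularization options with the values suggested above (the dropout rate is an arbitrary illustration):

if(torch::torch_is_installed()){
  library(cito)
  nn.fit <- dnn(Sepal.Length~., data = datasets::iris,
                lambda = 0.001, alpha = 0.2,  # elastic-net penalty on the weights
                dropout = 0.2)                # dropout in the hidden layers
}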
When training a network, you have the flexibility to customize the optimizer settings and learning rate scheduler to optimize the learning process. In the cito package, you can initialize these configurations using the config_lr_scheduler
and config_optimizer
functions.
config_lr_scheduler
allows you to define a specific learning rate scheduler that controls how the learning rate changes over time during training. This is beneficial in scenarios where you want to adaptively adjust the learning rate to improve convergence or avoid getting stuck in local optima.
Similarly, the config_optimizer
function enables you to specify the optimizer for your network. Different optimizers, such as stochastic gradient descent (SGD), Adam, or RMSprop, offer various strategies for updating the network's weights and biases during training. Choosing the right optimizer can significantly impact the training process and the final performance of your neural network.
If you have an NVIDIA CUDA-enabled device and have installed the CUDA toolkit version 11.3 and cuDNN 8.4, you can take advantage of GPU acceleration for training your neural networks. It is crucial to have these specific versions installed, as other versions may not be compatible. For detailed installation instructions and more information on utilizing GPUs for training, please refer to the mlverse: 'torch' documentation.
Note: GPU training is optional, and the package can still be used for training on CPU even without CUDA and cuDNN installations.
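A minimal sketch that falls back to the CPU when no CUDA device is available (the same pattern is used in the cnn() examples below):

if(torch::torch_is_installed()){
  library(cito)
  device <- ifelse(torch::cuda_is_available(), "cuda", "cpu")
  nn.fit <- dnn(Sepal.Length~., data = datasets::iris, device = device)
}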
Armin Schenk, Maximilian Pichler
predict.citocnn
, print.citocnn
, plot.citocnn
, summary.citocnn
, coef.citocnn
, continue_training
, analyze_training
if(torch::torch_is_installed()){ library(cito) # Example workflow in cito device <- ifelse(torch::cuda_is_available(), "cuda", "cpu") ## Data ### We generate our own data: ### 320 images (3x50x50) of either rectangles or ellipsoids shapes <- cito:::simulate_shapes(n=320, size=50, channels=3) X <- shapes$data Y <- shapes$labels ## Architecture ### Declare the architecture of the CNN ### Note that the output layer is added automatically by cnn() architecture <- create_architecture(conv(5), maxPool(), conv(5), maxPool(), linear(10)) ## Build and train network ### softmax is used for classification cnn.fit <- cnn(X, Y, architecture, loss = "softmax", epochs = 50, validation = 0.1, lr = 0.05, device=device) ## The training loss is below the baseline loss but at the end of the ## training the loss was still decreasing, so continue training for another 50 ## epochs cnn.fit <- continue_training(cnn.fit, epochs = 50) # Structure of Neural Network print(cnn.fit) # Plot Neural Network plot(cnn.fit) ## Convergence can be tested via the analyze_training function analyze_training(cnn.fit) ## Transfer learning ### With the transfer() function we can use predefined architectures with pretrained weights transfer_architecture <- create_architecture(transfer("resnet18")) resnet <- cnn(X, Y, transfer_architecture, loss = "softmax", epochs = 10, validation = 0.1, lr = 0.05, device=device) print(resnet) plot(resnet) }
This function returns the list of parameters (weights and biases) and buffers (e.g. running mean and variance of batch normalization layers) currently in use by the neural network model created using the cnn
function.
## S3 method for class 'citocnn' coef(object, ...)
object |
A model created by cnn. |
... |
Additional arguments (currently not used). |
A list with two components:
parameters
: A list of the model's weights and biases for the currently used model epoch.
buffers
: A list of buffers (e.g., running statistics) for the currently used model epoch.
if(torch::torch_is_installed()){ library(cito) device <- ifelse(torch::cuda_is_available(), "cuda", "cpu") set.seed(222) ## Data shapes <- cito:::simulate_shapes(320, 28) X <- shapes$data Y <- shapes$labels ## Architecture architecture <- create_architecture(conv(5), maxPool(), conv(5), maxPool(), linear(10)) ## Build and train network cnn.fit <- cnn(X, Y, architecture, loss = "softmax", epochs = 50, validation = 0.1, lr = 0.05, device=device) # Weights of neural network coef(cnn.fit) }
Returns the list of parameters the neural network model currently has in use.
## S3 method for class 'citodnn' coef(object, ...) ## S3 method for class 'citodnnBootstrap' coef(object, ...)
object |
a model created by dnn |
... |
nothing implemented yet |
list of weights of neural network
if(torch::torch_is_installed()){ library(cito) set.seed(222) validation_set<- sample(c(1:nrow(datasets::iris)),25) # Build and train Network nn.fit<- dnn(Sepal.Length~., data = datasets::iris[-validation_set,]) # Structure of Neural Network print(nn.fit) # analyze weights of Neural Network coef(nn.fit) }
Average conditional effects calculate the local derivatives for each observation for each feature. They are similar to marginal effects. And the average of these conditional effects is an approximation of linear effects (see Pichler and Hartig, 2023 for more details). You can use this function to either calculate main effects (on the diagonal, take a look at the example) or interaction effects (off-diagonals) between features.
To obtain uncertainties for these effects, enable the bootstrapping option in the dnn(..)
function (see example).
conditionalEffects( object, interactions = FALSE, epsilon = 0.1, device = c("cpu", "cuda", "mps"), indices = NULL, data = NULL, type = "response", ... ) ## S3 method for class 'citodnn' conditionalEffects( object, interactions = FALSE, epsilon = 0.1, device = c("cpu", "cuda", "mps"), indices = NULL, data = NULL, type = "response", ... ) ## S3 method for class 'citodnnBootstrap' conditionalEffects( object, interactions = FALSE, epsilon = 0.1, device = c("cpu", "cuda", "mps"), indices = NULL, data = NULL, type = "response", ... )
object |
object of class citodnn or citodnnBootstrap created by dnn |
interactions |
calculate interactions or not (computationally expensive) |
epsilon |
difference used to calculate derivatives |
device |
which device ("cpu", "cuda", or "mps") |
indices |
indices of the variables for which the ACE should be calculated |
data |
data which is used to calculate the ACE |
type |
scale on which the ACE are calculated ("response" or "link") |
... |
additional arguments that are passed to the predict function |
an S3 object of class "conditionalEffects"
is returned.
The list consists of the following attributes:
result |
3-dimensional array with the raw results |
mean |
Matrix, average conditional effects |
abs |
Matrix, summed absolute conditional effects |
sd |
Matrix, standard deviation of the conditional effects |
Maximilian Pichler
Scholbeck, C. A., Casalicchio, G., Molnar, C., Bischl, B., & Heumann, C. (2022). Marginal effects for non-linear prediction functions. arXiv preprint arXiv:2201.08837.
Pichler, M., & Hartig, F. (2023). Can predictive models be used for causal inference?. arXiv preprint arXiv:2306.10551.
if(torch::torch_is_installed()){ library(cito) # Build and train Network nn.fit = dnn(Sepal.Length~., data = datasets::iris) # Calculate average conditional effects ACE = conditionalEffects(nn.fit) ## Main effects (categorical features are not supported) ACE ## With interaction effects: ACE = conditionalEffects(nn.fit, interactions = TRUE) ## The off diagonal elements are the interaction effects ACE[[1]]$mean ## ACE is a list, elements correspond to the number of response classes ## Sepal.length == 1 Response so we have only one ## list element in the ACE object # Re-train NN with bootstrapping to obtain standard errors nn.fit = dnn(Sepal.Length~., data = datasets::iris, bootstrap = 30L) ## The summary method calculates also the conditional effects, and if ## bootstrapping was used, it will also report standard errors and p-values: summary(nn.fit) }
Helps create custom learning rate schedulers for dnn
.
config_lr_scheduler( type = c("lambda", "multiplicative", "reduce_on_plateau", "one_cycle", "step"), verbose = FALSE, ... )
type |
String defining which type of scheduler should be used. See Details. |
verbose |
If TRUE, additional information about scheduler will be printed to console. |
... |
additional arguments to be passed to scheduler. See Details. |
Different learning rate schedulers need different arguments; the corresponding functions show which arguments can be set:
lambda: lr_lambda
multiplicative: lr_multiplicative
reduce_on_plateau: lr_reduce_on_plateau
one_cycle: lr_one_cycle
step: lr_step
object of class cito_lr_scheduler to give to dnn
if(torch::torch_is_installed()){ library(cito) # create learning rate scheduler object scheduler <- config_lr_scheduler(type = "step", step_size = 30, gamma = 0.15, verbose = TRUE) # Build and train Network nn.fit<- dnn(Sepal.Length~., data = datasets::iris, lr_scheduler = scheduler) }
Helps you create a custom optimizer for dnn. It is recommended to set the learning rate in dnn.
config_optimizer( type = c("adam", "adadelta", "adagrad", "rmsprop", "rprop", "sgd"), verbose = FALSE, ... )
type |
character string defining which optimizer should be used. See Details. |
verbose |
If TRUE, additional information about the optimizer will be printed to the console |
... |
additional arguments to be passed to optimizer. See Details. |
Different optimizers need different arguments. For more information, see the corresponding functions:
adam: optim_adam
adadelta: optim_adadelta
adagrad: optim_adagrad
rmsprop: optim_rmsprop
rprop: optim_rprop
sgd: optim_sgd
object of class cito_optim to give to dnn
if(torch::torch_is_installed()){ library(cito) # create optimizer object opt <- config_optimizer(type = "adagrad", lr_decay = 1e-04, weight_decay = 0.1, verbose = TRUE) # Build and train Network nn.fit<- dnn(Sepal.Length~., data = datasets::iris, optimizer = opt) }
Config hyperparameter tuning
config_tuning( CV = 5, steps = 10, parallel = FALSE, NGPU = 1, cancel = TRUE, bootstrap_final = NULL, bootstrap_parallel = FALSE, return_models = FALSE )
CV |
numeric, number of folds for the k-fold cross-validation |
steps |
numeric, number of random tuning steps |
parallel |
numeric, number of parallel cores (tuning steps are parallelized) |
NGPU |
numeric, set if more than one GPU is available, tuning will be parallelized over CPU cores and GPUs, only works for NCPU > 1 |
cancel |
logical, cancel the CV/tuning for a specific hyperparameter set if the model cannot reduce the loss below the baseline loss after burnin or returns a NA loss |
bootstrap_final |
bootstrap the final model; if all models should be bootstrapped, it must be set globally via the bootstrap argument in the dnn function |
bootstrap_parallel |
should the bootstrapping be parallelized or not |
return_models |
return individual models |
Note that hyperparameter tuning can be expensive. We have implemented an option to parallelize hyperparameter tuning, including parallelization over one or more GPUs (the hyperparameter evaluation is parallelized, not the CV). This can be especially useful for small models. For example, if you have 4 GPUs, 20 CPU cores, and 20 steps (random samples from the random search), you could run dnn(..., device = "cuda", lr = tune(), batchsize = tune(), tuning = config_tuning(parallel = 20, NGPU = 4)), which will distribute 20 model fits across 4 GPUs, so that each GPU will process 5 models (in parallel).
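Written out as a code sketch (requires a machine with 4 CUDA devices; the tuned hyperparameters are illustrative):

if(torch::torch_is_installed()){
  library(cito)
  # 20 random-search steps, distributed over 20 CPU workers and 4 GPUs,
  # so each GPU processes 5 model fits in parallel
  nn.fit <- dnn(Species~., data = datasets::iris, loss = "softmax",
                device = "cuda",
                lr = tune(), batchsize = tune(),
                tuning = config_tuning(steps = 20, parallel = 20, NGPU = 4))
}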
Continues training of a model created by dnn or cnn for additional epochs. If the training/validation loss is still decreasing at the end of the training, it is often a sign that the NN has not yet converged. You can use this function to continue training instead of re-training the entire model.
continue_training(model, ...) ## S3 method for class 'citodnn' continue_training( model, epochs = 32, data = NULL, device = NULL, verbose = TRUE, changed_params = NULL, init_optimizer = TRUE, ... ) ## S3 method for class 'citodnnBootstrap' continue_training( model, epochs = 32, data = NULL, device = NULL, verbose = TRUE, changed_params = NULL, parallel = FALSE, init_optimizer = TRUE, ... ) ## S3 method for class 'citocnn' continue_training( model, epochs = 32, X = NULL, Y = NULL, device = NULL, verbose = TRUE, changed_params = NULL, init_optimizer = TRUE, ... )
model |
a model of class citodnn, citodnnBootstrap or citocnn created by dnn or cnn |
... |
class-specific arguments |
epochs |
additional epochs the training should continue for |
data |
matrix or data.frame. If not provided data from original training will be used |
device |
can be used to overwrite device used in previous training |
verbose |
print training and validation loss of epochs |
changed_params |
list of arguments to change compared to the original training setup, see dnn for possible arguments |
init_optimizer |
re-initialize optimizer or not |
parallel |
train bootstrapped model in parallel |
X |
array. If not provided X from original training will be used |
Y |
vector, factor, numerical matrix or logical matrix. If not provided Y from original training will be used |
a model of class citodnn, citodnnBootstrap or citocnn created by dnn
or cnn
if(torch::torch_is_installed()){ library(cito) set.seed(222) validation_set<- sample(c(1:nrow(datasets::iris)),25) # Build and train Network nn.fit<- dnn(Sepal.Length~., data = datasets::iris[-validation_set,], epochs = 32) # continue training for another 32 epochs nn.fit<- continue_training(nn.fit,epochs = 32) # Use model on validation set predictions <- predict(nn.fit, iris[validation_set,]) }
This function creates a conv
layer object of class citolayer
for use in constructing a Convolutional Neural Network (CNN) architecture. The resulting layer object can be passed to the create_architecture
function to define the structure of the network.
conv( n_kernels = NULL, kernel_size = NULL, stride = NULL, padding = NULL, dilation = NULL, bias = NULL, activation = NULL, normalization = NULL, dropout = NULL )
n_kernels |
(integer) The number of kernels (or filters) in this layer. |
kernel_size |
(integer or tuple) The size of the kernels in this layer. Use a tuple if the kernel size is different in each dimension. |
stride |
(integer or tuple) The stride of the kernels in this layer. If |
padding |
(integer or tuple) The amount of zero-padding added to the input on both sides. Use a tuple if the padding is different in each dimension. |
dilation |
(integer or tuple) The dilation of the kernels in this layer. Use a tuple if the dilation is different in each dimension. |
bias |
(boolean) If TRUE, a learnable bias is added to the kernels of this layer. |
activation |
(character) The activation function applied after this layer. Supported activation functions include "relu", "leaky_relu", "tanh", "elu", "rrelu", "prelu", "softplus", "celu", "selu", "gelu", "relu6", "sigmoid", "softsign", "hardtanh", "tanhshrink", "softshrink", "hardshrink", "log_sigmoid". |
normalization |
(boolean) If TRUE, batch normalization is applied after this layer. |
dropout |
(numeric) The dropout rate for this layer. Set to 0 to disable dropout. |
This function creates a conv
layer object, which is used to define a convolutional layer in a CNN architecture. Parameters that are not specified (and thus set to NULL
) will be filled with default values provided to the create_architecture
function.
An S3 object of class "conv" "citolayer"
, representing a convolutional layer in the CNN architecture.
Armin Schenk
if(torch::torch_is_installed()){ library(cito) # A convolutional layer where all available parameters are assigned # No value will be overwritten by 'create_architecture()' layer1 <- conv(10, 3, 1, 0, 1, TRUE, "relu", FALSE, 0.5) # A convolutional layer where only the activation function is assigned # n_kernels, kernel_size, stride, padding, dilation, bias, # normalization and dropout are filled with the defaults # passed to the 'create_architecture()' function layer2 <- conv(activation="selu") }
This function constructs a citoarchitecture
object that defines the architecture of a Convolutional Neural Network (CNN). The citoarchitecture
object can be used by the cnn
function to specify the structure of the network, including layer types, parameters, and default values.
create_architecture( ..., default_n_neurons = 10, default_n_kernels = 10, default_kernel_size = list(conv = 3, maxPool = 2, avgPool = 2), default_stride = list(conv = 1, maxPool = NULL, avgPool = NULL), default_padding = list(conv = 0, maxPool = 0, avgPool = 0), default_dilation = list(conv = 1, maxPool = 1), default_bias = list(conv = TRUE, linear = TRUE), default_activation = list(conv = "relu", linear = "relu"), default_normalization = list(conv = FALSE, linear = FALSE), default_dropout = list(conv = 0, linear = 0) )
... |
Objects of class citolayer created by linear, conv, maxPool, avgPool or transfer, in the order in which they should appear in the network. |
default_n_neurons |
(integer) Default number of neurons in a linear layer. Default is 10. |
default_n_kernels |
(integer) Default number of kernels in a convolutional layer. Default is 10. |
default_kernel_size |
(integer or tuple) Default size of kernels in convolutional and pooling layers. Can be a single integer or a tuple if sizes differ across dimensions. Default is list(conv = 3, maxPool = 2, avgPool = 2). |
default_stride |
(integer or tuple) Default stride of kernels in convolutional and pooling layers. Can be a single integer, a tuple if strides differ across dimensions, or NULL, in which case the stride is set to the kernel size. Default is list(conv = 1, maxPool = NULL, avgPool = NULL). |
default_padding |
(integer or tuple) Default zero-padding added to both sides of the input. Can be a single integer or a tuple if padding differs across dimensions. Default is list(conv = 0, maxPool = 0, avgPool = 0). |
default_dilation |
(integer or tuple) Default dilation of kernels in convolutional and max pooling layers. Can be a single integer or a tuple if dilation differs across dimensions. Default is list(conv = 1, maxPool = 1). |
default_bias |
(boolean) Default value indicating if a learnable bias should be added to neurons of linear layers and kernels of convolutional layers. Default is list(conv = TRUE, linear = TRUE). |
default_activation |
(character) Default activation function used after linear and convolutional layers. Supported activation functions include "relu", "leaky_relu", "tanh", "elu", "rrelu", "prelu", "softplus", "celu", "selu", "gelu", "relu6", "sigmoid", "softsign", "hardtanh", "tanhshrink", "softshrink", "hardshrink", "log_sigmoid". Default is list(conv = "relu", linear = "relu"). |
default_normalization |
(boolean) Default value indicating if batch normalization should be applied after linear and convolutional layers. Default is list(conv = FALSE, linear = FALSE). |
default_dropout |
(numeric) Default dropout rate for linear and convolutional layers. Set to 0 for no dropout. Default is list(conv = 0, linear = 0). |
This function creates a citoarchitecture
object that outlines the CNN's architecture based on the provided layers and default parameters. The final architecture consists of layers in the order they are provided. Any unspecified parameters in the citolayer
objects are filled with the provided default values for their respective layer types. Defaults can be specified for each layer type individually or for all layers at once.
An S3 object of class "citoarchitecture"
that encapsulates the architecture of the CNN.
Armin Schenk
cnn
, linear
, conv
, maxPool
, avgPool
, transfer
, print.citoarchitecture
, plot.citoarchitecture
if(torch::torch_is_installed()){ library(cito) # Convolutional layers with different n_kernels and kernel_sizes c1 <- conv(n_kernels = 8, kernel_size = 5) c2 <- conv(n_kernels = 16, kernel_size = 3) # Linear layer l <- linear(n_neurons = 100) # MaxPooling layer mP <- maxPool(kernel_size = 2) # Create the architecture by using the created layers # Change the defaults with which the not assigned layer parameters will be filled e.g. # change default dropout to different values for linear and convolutional layer # only change the default normalization for linear layers # change default activation of both linear and convolutional layers to 'selu' architecture <- create_architecture(c1, c1, mP, c2, c2, mP, l, default_dropout = list(linear=0.6, conv=0.4), default_normalization = list(linear=TRUE), default_activation = "selu") # See how the finished CNN would look like for specific input and output shapes print(architecture, c(3,128,128), 10) # To use predefined architectures use the transfer() layer alexnet <- transfer("alexnet") # No other linear layers are used after the transfer layer: # The cnn() function will only replace the last linear layer of the architecture # to match the output dimensions of the data architecture <- create_architecture(alexnet) print(architecture, c(3,128,128), 10) # Some linear layers are used after the transfer layer: # The cnn() function will replace the whole "classifier" part of the architecture # with the specified linear layers + an output layer that matches the output dimensions architecture <- create_architecture(alexnet, linear(300), linear(100)) print(architecture, c(3,128,128), 10) }
Fits a custom deep neural network using the multilayer perceptron (MLP) architecture. dnn()
supports the formula syntax and allows the neural network to be customized in great detail.
dnn( formula = NULL, data = NULL, hidden = c(50L, 50L), activation = "selu", bias = TRUE, dropout = 0, loss = c("mse", "mae", "softmax", "cross-entropy", "gaussian", "binomial", "poisson", "mvp", "nbinom", "multinomial", "clogit"), validation = 0, lambda = 0, alpha = 0.5, optimizer = c("sgd", "adam", "adadelta", "adagrad", "rmsprop", "rprop"), lr = 0.01, batchsize = NULL, burnin = Inf, baseloss = NULL, shuffle = TRUE, epochs = 100, bootstrap = NULL, bootstrap_parallel = FALSE, plot = TRUE, verbose = TRUE, lr_scheduler = NULL, custom_parameters = NULL, device = c("cpu", "cuda", "mps"), early_stopping = FALSE, tuning = config_tuning(), hooks = NULL, X = NULL, Y = NULL )
formula |
an object of class " |
data |
matrix or data.frame with features/predictors and response variable |
hidden |
hidden units in each layer; the length of hidden corresponds to the number of layers |
activation |
activation functions, can be of length one, or a vector of different activation functions for each layer |
bias |
whether to use biases in the layers; can be of length one, or a vector (number of hidden layers + 1 (last layer)) of logicals for each layer. |
dropout |
dropout rate, probability of a node getting left out during training (see |
loss |
loss function the network is optimized for. Can also be a distribution from the stats package or a custom function, see details |
validation |
percentage of data set that should be taken as validation set (chosen randomly) |
lambda |
strength of regularization: lambda penalty, |
alpha |
add L1/L2 regularization to training |
optimizer |
which optimizer is used for training the network; for more adjustments to the optimizer see |
lr |
learning rate given to optimizer |
batchsize |
number of samples that are used to calculate one learning rate step, default is 10% of the training data |
burnin |
training is aborted if the training loss is not below the baseline loss after burnin epochs |
baseloss |
baseline loss; if NULL, the baseline loss corresponds to an intercept-only model |
shuffle |
if TRUE, data in each batch gets reshuffled every epoch |
epochs |
epochs the training goes on for |
bootstrap |
bootstrap neural network or not, numeric corresponds to number of bootstrap samples |
bootstrap_parallel |
parallelize (CPU) bootstrapping |
plot |
plot training loss |
verbose |
print training and validation loss of epochs |
lr_scheduler |
learning rate scheduler created with |
custom_parameters |
List of parameters/variables to be optimized. Can be used in a custom loss function. See Vignette for example. |
device |
device on which the network should be trained. "mps" corresponds to Apple M1/M2 GPU devices. |
early_stopping |
if set to an integer, training will stop if the loss has increased for that number of epochs in a row; the validation loss is used if available. |
tuning |
tuning options created with |
X |
Feature matrix or data.frame, alternative data interface |
Y |
Response vector, factor, matrix or data.frame, alternative data interface |
an S3 object of class "citodnn"
is returned. It is a list containing everything there is to know about the model and its training process.
The list consists of the following attributes:
net |
An object of class "nn_sequential" "nn_module", originates from the torch package and represents the core object of this workflow. |
call |
The original function call |
loss |
A list which contains relevant information for the target variable and the used loss function |
data |
Contains data used for training the model |
weights |
List of weights for each training epoch |
use_model_epoch |
Integer, which defines which model from which training epoch should be used for prediction. 1 = best model, 2 = last model |
loaded_model_epoch |
Integer, shows which model from which epoch is currently loaded into model$net. |
model_properties |
A list of properties of the neural network, contains number of input nodes, number of output nodes, size of hidden layers, activation functions, whether bias is included and if dropout layers are included. |
training_properties |
A list of all training parameters used the last time the model was trained: learning rate, information about the learning rate scheduler, information about the optimizer, number of epochs, whether early stopping was used, whether the plot was active, lambda and alpha for L1/L2 regularization, batchsize, shuffle, whether the data set was split into validation and training sets, which formula was used for training, and at which epoch training stopped. |
losses |
A data.frame containing training and validation losses of each epoch |
Supported activation functions: "relu", "leaky_relu", "tanh", "elu", "rrelu", "prelu", "softplus", "celu", "selu", "gelu", "relu6", "sigmoid", "softsign", "hardtanh", "tanhshrink", "softshrink", "hardshrink", "log_sigmoid"
We support loss functions and likelihoods for different tasks:
Name | Explanation | Example / Task |
mse | mean squared error | Regression, predicting continuous values |
mae | mean absolute error | Regression, predicting continuous values |
softmax | categorical cross entropy | Multi-class, species classification |
cross-entropy | categorical cross entropy | Multi-class, species classification |
gaussian | Normal likelihood | Regression, residual error is also estimated (similar to stats::lm() ) |
binomial | Binomial likelihood | Classification/Logistic regression, mortality |
poisson | Poisson likelihood | Regression, count data, e.g. species abundances |
nbinom | Negative binomial likelihood | Regression, count data with dispersion parameter |
mvp | multivariate probit model | joint species distribution model, multi species (presence absence) |
multinomial | Multinomial likelihood | step selection in animal movement models |
clogit | conditional binomial | step selection in animal movement models |
Ensuring convergence can be tricky when training neural networks. Their training is sensitive to a combination of the learning rate (how much the weights are updated in each optimization step), the batch size (a random subset of the data is used in each optimization step), and the number of epochs (number of optimization steps). Typically, the learning rate should be decreased as the size of the neural network increases (depth of the network and width of the hidden layers). We provide a baseline loss (intercept-only model) that can give hints about an appropriate learning rate:
If the training loss of the model doesn't fall below the baseline loss, the learning rate is either too high or too low. If this happens, try higher and lower learning rates.
A common strategy is to try (manually) a few different learning rates to see if the learning rate is on the right scale.
See the troubleshooting vignette (vignette("B-Training_neural_networks")
) for more help on training and debugging neural networks.
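To illustrate this strategy, here is a minimal sketch (not part of the package examples; the candidate learning rates, epochs, and the iris data are illustrative only) that fits the same model with a few learning rates and compares the final recorded losses:

if(torch::torch_is_installed()){
  library(cito)
  # Candidate learning rates on a log scale
  lrs <- c(0.1, 0.01, 0.001)
  fits <- lapply(lrs, function(lr) {
    dnn(Sepal.Length~., data = datasets::iris, lr = lr,
        epochs = 50L, verbose = FALSE, plot = FALSE)
  })
  # Inspect the last recorded training/validation losses of each run
  lapply(fits, function(m) tail(m$losses, 1))
}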
As with the learning rate, there is no definitive guide to choosing the right architecture for the right task. However, there are some general rules/recommendations: In general, wider and deeper neural networks can improve generalization - but this is a double-edged sword because it also increases the risk of overfitting. So, if you increase the width and depth of the network, you should also add regularization (e.g., by increasing the lambda parameter, which corresponds to the regularization strength). Furthermore, in Pichler & Hartig, 2023, we investigated the effects of the hyperparameters on the prediction performance as a function of the data size. For example, we found that the selu
activation function outperforms relu
for small data sizes (<100 observations).
We recommend starting with moderate sizes (like the defaults), and if the model doesn't generalize/converge, try larger networks along with a regularization that helps minimize the risk of overfitting (see vignette("B-Training_neural_networks")
).
Overfitting means that the model fits the training data well but generalizes poorly to new observations. We can use the validation argument to detect overfitting. If the validation loss starts to increase again at a certain point, it often means that the model is starting to overfit the training data:
Solutions:
Re-train with epochs = point where model started to overfit
Early stopping: stop training when the model starts to overfit; this can be specified using the early_stopping=… argument (see the sketch after this list)
Use regularization (dropout or elastic-net, see next section)
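A minimal sketch (reusing the iris data from the examples on this page; the split size, patience, and number of epochs are illustrative only) that combines a validation split with early stopping:

if(torch::torch_is_installed()){
  library(cito)
  # 20% of the data is held out as validation set; training stops if the
  # (validation) loss increases for 10 epochs in a row
  nn.fit <- dnn(Sepal.Length~., data = datasets::iris,
                validation = 0.2,
                early_stopping = 10L,
                epochs = 200L, verbose = FALSE, plot = FALSE)
}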
Elastic Net regularization combines the strengths of L1 (Lasso) and L2 (Ridge) regularization. It introduces a penalty term that encourages sparse weight values while maintaining overall weight shrinkage. By controlling the sparsity of the learned model, Elastic Net regularization helps avoid overfitting while allowing for meaningful feature selection. We advise using elastic net (e.g. lambda = 0.001 and alpha = 0.2).
Dropout regularization helps prevent overfitting by randomly disabling a portion of neurons during training. This technique encourages the network to learn more robust and generalized representations, as it prevents individual neurons from relying too heavily on specific input patterns. Dropout has been widely adopted as a simple yet effective regularization method in deep learning.
By utilizing these regularization methods in your neural network training with the cito package, you can improve generalization performance and enhance the network's ability to handle unseen data. These techniques act as valuable tools in mitigating overfitting and promoting more robust and reliable model performance.
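As a sketch of both regularization options (lambda and alpha follow the recommendation above; the dropout rate is chosen for illustration only), elastic net and dropout can be combined in a single call:

if(torch::torch_is_installed()){
  library(cito)
  # Elastic-net penalty (lambda = strength, alpha = L1/L2 mixing) plus dropout
  nn.fit <- dnn(Sepal.Length~., data = datasets::iris,
                lambda = 0.001, alpha = 0.2,
                dropout = 0.2,
                epochs = 100L, verbose = FALSE, plot = FALSE)
}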
We can use bootstrapping to generate uncertainties for all outputs. Bootstrapping can be enabled by setting bootstrap = ...
to the number of bootstrap samples to be used. Note, however, that the computational cost can be excessive.
In some cases it may be worthwhile to parallelize bootstrapping, for example if you have a GPU and the neural network is small. Parallelization for bootstrapping can be enabled by setting the bootstrap_parallel = ...
argument to the desired number of calls to run in parallel.
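A short sketch (the number of bootstrap samples and parallel workers are illustrative only):

if(torch::torch_is_installed()){
  library(cito)
  # 20 bootstrap samples, fitted on 5 CPU workers in parallel
  nn.fit <- dnn(Sepal.Length~., data = datasets::iris,
                bootstrap = 20L, bootstrap_parallel = 5L,
                epochs = 100L, verbose = FALSE, plot = FALSE)
  # Bootstrapped predictions are averaged by default
  dim(predict(nn.fit))
}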
When training a network, you have the flexibility to customize the optimizer settings and learning rate scheduler to optimize the learning process. In the cito package, you can initialize these configurations using the config_lr_scheduler
and config_optimizer
functions.
config_lr_scheduler
allows you to define a specific learning rate scheduler that controls how the learning rate changes over time during training. This is beneficial in scenarios where you want to adaptively adjust the learning rate to improve convergence or avoid getting stuck in local optima.
Similarly, the config_optimizer
function enables you to specify the optimizer for your network. Different optimizers, such as stochastic gradient descent (SGD), Adam, or RMSprop, offer various strategies for updating the network's weights and biases during training. Choosing the right optimizer can significantly impact the training process and the final performance of your neural network.
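A sketch of how these helpers are typically combined with dnn(). The exact arguments accepted by config_optimizer() and config_lr_scheduler() should be checked in their help pages; here a "step" scheduler with step_size and gamma is assumed to be supported and passed through to 'torch':

if(torch::torch_is_installed()){
  library(cito)
  nn.fit <- dnn(Sepal.Length~., data = datasets::iris,
                optimizer = config_optimizer("adam"),
                lr = 0.01,
                # halve-style decay: multiply the learning rate by gamma every step_size epochs
                lr_scheduler = config_lr_scheduler("step", step_size = 30, gamma = 0.1),
                epochs = 100L, verbose = FALSE, plot = FALSE)
}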
We have implemented experimental support for hyperparameter tuning. We can mark hyperparameters that should be tuned by cito by setting their values to tune()
, for example dnn(..., lr = tune()). tune()
is a function that creates a range of random values for the given hyperparameter. You can change the maximum and minimum range of the potential hyperparameters or pass custom values to the tune(values = c(....))
function. The following table lists the hyperparameters that can currently be tuned:
Hyperparameter | Example | Details |
hidden | dnn(…, hidden=tune(10, 20, fixed='depth')) |
Depth and width can both be tuned or only one of them; if both should be tuned, vectors for the lower and upper boundaries must be provided (first entry = number of nodes) |
bias | dnn(…, bias=tune()) |
Should the bias be turned on or off for all hidden layers |
lambda | dnn(…, lambda = tune(0.0001, 0.1)) |
lambda will be tuned within the range (0.0001, 0.1) |
alpha | dnn(…, alpha = tune(0.2, 0.4)) |
alpha will be tuned within the range (0.2, 0.4) |
activation | dnn(…, activation = tune()) |
activation functions of the hidden layers will be tuned |
dropout | dnn(…, dropout = tune()) |
Dropout rate will be tuned (globally for all layers) |
lr | dnn(…, lr = tune()) |
Learning rate will be tuned |
batchsize | dnn(…, batchsize = tune()) |
batch size will be tuned |
epochs | dnn(…, epochs = tune()) |
Number of epochs will be tuned |
The hyperparameters are tuned by random search (i.e., random values for the hyperparameters within a specified range) and by cross-validation. The exact tuning regime can be specified with config_tuning.
Note that hyperparameter tuning can be expensive. We have implemented an option to parallelize hyperparameter tuning, including parallelization over one or more GPUs (the hyperparameter evaluation is parallelized, not the CV). This can be especially useful for small models. For example, if you have 4 GPUs, 20 CPU cores, and 20 steps (random samples from the random search), you could run dnn(..., device="cuda", lr = tune(), batchsize = tune(), tuning = config_tuning(parallel = 20, NGPU = 4))
, which will distribute 20 model fits across 4 GPUs, so that each GPU will process 5 models (in parallel).
As this is an experimental feature, we welcome feature requests and bug reports on our github site.
For the custom values, all hyperparameters except for the hidden layers require a vector of values. Hidden layers expect a two-column matrix where the first column is the number of hidden nodes and the second column corresponds to the number of hidden layers.
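For example (a sketch reusing the iris setup from the examples below; the candidate values are illustrative only), custom candidate learning rates and architectures could be supplied as follows:

if(torch::torch_is_installed()){
  library(cito)
  # Candidate architectures: first column = number of nodes, second column = number of layers
  hidden_values <- matrix(c(10, 2,
                            30, 2,
                            50, 3), ncol = 2, byrow = TRUE)
  nn.fit <- dnn(Species~., data = datasets::iris, loss = "softmax",
                epochs = 30L, verbose = FALSE, plot = FALSE,
                hidden = tune(values = hidden_values),
                lr = tune(values = c(0.001, 0.01, 0.1)))
  # Tuning results
  print(nn.fit$tuning)
}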
In Multilayer Perceptron (MLP) networks, each neuron is connected to every neuron in the previous layer and every neuron in the subsequent layer. The value of each neuron is computed using a weighted sum of the outputs from the previous layer, followed by the application of an activation function. Specifically, the value of a neuron is calculated as the weighted sum of the outputs of the neurons in the previous layer, combined with a bias term. This sum is then passed through an activation function, which introduces non-linearity into the network. The calculated value of each neuron becomes the input for the neurons in the next layer, and the process continues until the output layer is reached. The choice of activation function and the specific weight values determine the network's ability to learn and approximate complex relationships between inputs and outputs.
Therefore, the value of each neuron can be calculated as $a_i = f\left(\sum_j w_{ij}\, a_j + b_i\right)$, where $w_{ij}$ is the weight and $a_j$ is the value passed from neuron $j$ of the previous layer to the current neuron $i$, $b_i$ is the bias term, and $f(\cdot)$ is the activation function, e.g. $\mathrm{relu}(x) = \max(0, x)$.
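As a plain R illustration of this formula (not package code; the weights, bias, and inputs are made up), the value of a single neuron with a relu activation would be computed as:

# Toy example: one neuron receiving three inputs from the previous layer
x <- c(0.5, -1.2, 2.0)   # outputs of the previous layer
w <- c(0.3,  0.8, -0.5)  # weights of the connections to this neuron
b <- 0.1                 # bias term
relu <- function(z) max(0, z)
neuron_value <- relu(sum(w * x) + b)
neuron_value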
If you have an NVIDIA CUDA-enabled device and have installed the CUDA toolkit version 11.3 and cuDNN 8.4, you can take advantage of GPU acceleration for training your neural networks. It is crucial to have these specific versions installed, as other versions may not be compatible. For detailed installation instructions and more information on utilizing GPUs for training, please refer to the mlverse: 'torch' documentation.
Note: GPU training is optional, and the package can still be used for training on CPU even without CUDA and cuDNN installations.
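A short sketch of switching training to the GPU, falling back to the CPU if CUDA is not available (same pattern as the cnn() examples on this page):

if(torch::torch_is_installed()){
  library(cito)
  device <- ifelse(torch::cuda_is_available(), "cuda", "cpu")
  nn.fit <- dnn(Species~., data = datasets::iris, loss = "softmax",
                device = device, epochs = 100L, verbose = FALSE, plot = FALSE)
}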
Christian Amesoeder, Maximilian Pichler
predict.citodnn
, plot.citodnn
, coef.citodnn
,print.citodnn
, summary.citodnn
, continue_training
, analyze_training
, PDP
, ALE
if(torch::torch_is_installed()){ library(cito) # Example workflow in cito ## Build and train Network ### softmax is used for multi-class responses (e.g., Species) nn.fit<- dnn(Species~., data = datasets::iris, loss = "softmax") ## The training loss is below the baseline loss but at the end of the ## training the loss was still decreasing, so continue training for another 50 ## epochs nn.fit <- continue_training(nn.fit, epochs = 50L) # Sturcture of Neural Network print(nn.fit) # Plot Neural Network plot(nn.fit) ## 4 Input nodes (first layer) because of 4 features ## 3 Output nodes (last layer) because of 3 response species (one node for each ## level in the response variable). ## The layers between the input and output layer are called hidden layers (two ## of them) ## We now want to understand how the predictions are made, what are the ## important features? The summary function automatically calculates feature ## importance (the interpretation is similar to an anova) and calculates ## average conditional effects that are similar to linear effects: summary(nn.fit) ## To visualize the effect (response-feature effect), we can use the ALE and ## PDP functions # Partial dependencies PDP(nn.fit, variable = "Petal.Length") # Accumulated local effect plots ALE(nn.fit, variable = "Petal.Length") # Per se, it is difficult to get confidence intervals for our xAI metrics (or # for the predictions). But we can use bootstrapping to obtain uncertainties # for all cito outputs: ## Re-fit the neural network with bootstrapping nn.fit<- dnn(Species~., data = datasets::iris, loss = "softmax", epochs = 150L, verbose = FALSE, bootstrap = 20L) ## convergence can be tested via the analyze_training function analyze_training(nn.fit) ## Summary for xAI metrics (can take some time): summary(nn.fit) ## Now with standard errors and p-values ## Note: Take the p-values with a grain of salt! We do not know yet if they are ## correct (e.g. 
if you use regularization, they are likely conservative == too ## large) ## Predictions with bootstrapping: dim(predict(nn.fit)) ## predictions are by default averaged (over the bootstrap samples) ## Multinomial and conditional logit regression m = dnn(Species~., data = iris, loss = "clogit", lr = 0.01) m = dnn(Species~., data = iris, loss = "multinomial", lr = 0.01) Y = t(stats::rmultinom(100, 10, prob = c(0.2, 0.2, 0.5))) m = dnn(cbind(X1, X2, X3)~., data = data.frame(Y, A = as.factor(runif(100))), loss = "multinomial", lr = 0.01) ## conditional logit for size > 1 is not supported yet # Hyperparameter tuning (experimental feature) hidden_values = matrix(c(5, 2, 4, 2, 10,2, 15,2), 4, 2, byrow = TRUE) ## Potential architectures we want to test, first column == number of nodes print(hidden_values) nn.fit = dnn(Species~., data = iris, epochs = 30L, loss = "softmax", hidden = tune(values = hidden_values), lr = tune(0.00001, 0.1) # tune lr between range 0.00001 and 0.1 ) ## Tuning results: print(nn.fit$tuning) # test = Inf means that tuning was cancelled after only one fit (within the CV) # Advanced: Custom loss functions and additional parameters ## Normal Likelihood with sd parameter: custom_loss = function(pred, true) { logLik = torch::distr_normal(pred, scale = torch::nnf_relu(scale)+ 0.001)$log_prob(true) return(-logLik$mean()) } nn.fit<- dnn(Sepal.Length~., data = datasets::iris, loss = custom_loss, verbose = FALSE, custom_parameters = list(scale = 1.0) ) nn.fit$parameter$scale ## Multivariate normal likelihood with parametrized covariance matrix ## Sigma = L*L^t + D ## Helper function to build covariance matrix create_cov = function(LU, Diag) { return(torch::torch_matmul(LU, LU$t()) + torch::torch_diag(Diag$exp()+0.01)) } custom_loss_MVN = function(true, pred) { Sigma = create_cov(SigmaPar, SigmaDiag) logLik = torch::distr_multivariate_normal(pred, covariance_matrix = Sigma)$ log_prob(true) return(-logLik$mean()) } nn.fit<- dnn(cbind(Sepal.Length, Sepal.Width, Petal.Length)~., data = datasets::iris, lr = 0.01, verbose = FALSE, loss = custom_loss_MVN, custom_parameters = list(SigmaDiag = rep(0, 3), SigmaPar = matrix(rnorm(6, sd = 0.001), 3, 2)) ) as.matrix(create_cov(nn.fit$loss$parameter$SigmaPar, nn.fit$loss$parameter$SigmaDiag)) }
Embeddings can be used for categorical variables; they are a more efficient alternative to one-hot encoding.
e(dim = 1L, weights = NULL, train = TRUE, lambda = 0, alpha = 1)
dim |
integer, embedding dimension |
weights |
matrix, to use custom embedding matrices |
train |
logical, should the embeddings be trained or not |
lambda |
regularization strength on the embeddings |
alpha |
mix between L1 and L2 regularization |
list of specials – taken from enum.R
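A hypothetical sketch of how the embedding special might be used inside a dnn() formula; the exact syntax of e() within a formula should be checked against the package vignette, and the choice of dim = 2 is arbitrary:

if(torch::torch_is_installed()){
  library(cito)
  # Embed the categorical predictor Species in a 2-dimensional space
  nn.fit <- dnn(Sepal.Length ~ e(Species, dim = 2) + Sepal.Width + Petal.Length,
                data = datasets::iris, epochs = 100L, verbose = FALSE, plot = FALSE)
}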
findReTrmClasses()
This function creates a linear
layer object of class citolayer
for use in constructing a Convolutional Neural Network (CNN) architecture. The resulting layer object can be passed to the create_architecture
function to define the structure of the network.
linear( n_neurons = NULL, bias = NULL, activation = NULL, normalization = NULL, dropout = NULL )
n_neurons |
(integer) The number of hidden neurons in this layer. |
bias |
(boolean) If |
activation |
(character) The activation function applied after this layer. Supported activation functions include "relu", "leaky_relu", "tanh", "elu", "rrelu", "prelu", "softplus", "celu", "selu", "gelu", "relu6", "sigmoid", "softsign", "hardtanh", "tanhshrink", "softshrink", "hardshrink", "log_sigmoid". |
normalization |
(boolean) If |
dropout |
(numeric) The dropout rate for this layer. Set to 0 to disable dropout. |
This function creates a linear
layer object, which is used to define a linear layer in a CNN architecture. Parameters not specified (and thus set to NULL
) will be filled with default values provided to the create_architecture
function.
An S3 object of class "linear" "citolayer"
, representing a linear layer in the CNN architecture.
Armin Schenk
if(torch::torch_is_installed()){ library(cito) # A linear layer where all available parameters are assigned # No value will be overwritten by 'create_architecture()' layer1 <- linear(100, TRUE, "relu", FALSE, 0.5) # A linear layer where only the activation function is assigned # n_neurons, bias, normalization and dropout are filled with the defaults # passed to the 'create_architecture()' function layer2 <- linear(activation="selu") }
This function creates a maxPool
layer object of class citolayer
for use in constructing a Convolutional Neural Network (CNN) architecture. The resulting layer object can be passed to the create_architecture
function to define the structure of the network.
maxPool(kernel_size = NULL, stride = NULL, padding = NULL, dilation = NULL)
kernel_size |
(integer or tuple) The size of the kernel in this layer. Use a tuple if the kernel size varies across dimensions. |
stride |
(integer or tuple) The stride of the kernel in this layer. If |
padding |
(integer or tuple) The amount of zero-padding added to the input on both sides. Use a tuple if the padding differs across dimensions. |
dilation |
(integer or tuple) The dilation of the kernel in this layer. Use a tuple if the dilation varies across dimensions. |
This function creates a maxPool
layer object, which represents a maximum pooling layer in a CNN architecture. Parameters not specified (and thus set to NULL
) will be filled with default values provided to the create_architecture
function.
An S3 object of class "maxPool" "citolayer"
, representing a maximum pooling layer in the CNN architecture.
Armin Schenk
if(torch::torch_is_installed()){ library(cito) # A maximum pooling layer where all available parameters are assigned # No value will be overwritten by 'create_architecture()' layer1 <- maxPool(3, 1, 0, 1) # A maximum pooling layer where only the kernel size is assigned # stride, padding and dilation are filled with the defaults # passed to the 'create_architecture()' function layer2 <- maxPool(kernel_size=4) }
This function trains a Multi-Modal Neural Network (MMN) model on the provided data.
mmn( formula, dataList = NULL, fusion_hidden = c(50L, 50L), fusion_activation = c("relu", "leaky_relu", "tanh", "elu", "rrelu", "prelu", "softplus", "celu", "selu", "gelu", "relu6", "sigmoid", "softsign", "hardtanh", "tanhshrink", "softshrink", "hardshrink", "log_sigmoid"), fusion_bias = TRUE, fusion_dropout = 0, loss = c("mse", "mae", "softmax", "cross-entropy", "gaussian", "binomial", "poisson"), optimizer = c("sgd", "adam", "adadelta", "adagrad", "rmsprop", "rprop"), lr = 0.01, alpha = 0.5, lambda = 0, validation = 0, batchsize = 32L, burnin = 10, shuffle = TRUE, epochs = 100, early_stopping = NULL, lr_scheduler = NULL, custom_parameters = NULL, device = c("cpu", "cuda", "mps"), plot = TRUE, verbose = TRUE )
formula |
A formula object specifying the model structure. See examples for more information |
dataList |
A list containing the data for training the model. The list should contain all variables used in the formula. |
fusion_hidden |
A numeric vector specifying the number of units in each hidden layer of the fusion network. |
fusion_activation |
A character vector specifying the activation function for each hidden layer of the fusion network. Available options are: "relu", "leaky_relu", "tanh", "elu", "rrelu", "prelu", "softplus", "celu", "selu", "gelu", "relu6", "sigmoid", "softsign", "hardtanh", "tanhshrink", "softshrink", "hardshrink", "log_sigmoid". |
fusion_bias |
A logical value or vector (length(fusion_hidden) + 1) indicating whether to include bias terms in the layers of the fusion network. |
fusion_dropout |
The dropout rate for the fusion network, a numeric value or vector (length(fusion_hidden)) between 0 and 1. |
loss |
The loss function to be optimized during training. Available options are: "mse", "mae", "softmax", "cross-entropy", "gaussian", "binomial", "poisson". |
optimizer |
The optimization algorithm to be used during training. Available options are: "sgd", "adam", "adadelta", "adagrad", "rmsprop", "rprop". |
lr |
The learning rate for the optimizer. |
alpha |
The alpha parameter for elastic net regularization. Should be a value between 0 and 1. |
lambda |
The lambda parameter for elastic net regularization. Should be a positive value. |
validation |
The proportion of the training data to use for validation. Should be a value between 0 and 1. |
batchsize |
The batch size used during training. |
burnin |
training is aborted if the training loss is not below the baseline loss after burnin epochs |
shuffle |
A logical indicating whether to shuffle the training data in each epoch. |
epochs |
The number of epochs to train the model. |
early_stopping |
If provided, the training will stop if the validation loss does not improve for the specified number of epochs. If set to NULL, early stopping is disabled. |
lr_scheduler |
Learning rate scheduler created with |
custom_parameters |
A list of parameters used by custom loss functions. See vignette for examples. |
device |
The device on which to perform computations. Available options are: "cpu", "cuda", "mps". |
plot |
A logical indicating whether to plot training and validation loss curves. |
verbose |
A logical indicating whether to display verbose output during training. |
An object of class "citommn" containing the trained MMN model and other information.
predict.citommn
, print.citommn
, summary.citommn
, continue_training
, analyze_training
Multinomial log likelihood
multinomial_log_prob(probs, value)
probs |
probabilities |
value |
observed values |
Multinomial log likelihood
Calculates the Partial Dependency Plot for one feature, either numeric or categorical. Returns it as a plot.
PDP( model, variable = NULL, data = NULL, ice = FALSE, resolution.ice = 20, plot = TRUE, parallel = FALSE, ... ) ## S3 method for class 'citodnn' PDP( model, variable = NULL, data = NULL, ice = FALSE, resolution.ice = 20, plot = TRUE, parallel = FALSE, ... ) ## S3 method for class 'citodnnBootstrap' PDP( model, variable = NULL, data = NULL, ice = FALSE, resolution.ice = 20, plot = TRUE, parallel = FALSE, ... )
model |
a model created by |
variable |
variable as string for which the PDP should be done. If none is supplied it is done for all variables. |
data |
new data on which the PDP should be performed. If NULL, the PDP is performed on the training data. |
ice |
Individual Conditional Expectation (ICE) curves will be shown if TRUE |
resolution.ice |
resolution in which ice will be computed |
plot |
plot PDP or not |
parallel |
parallelize over bootstrap models or not |
... |
arguments passed to |
A list of plots made with 'ggplot2' consisting of an individual plot for each defined variable.
Performs a Partial Dependency Plot (PDP) estimation to analyze the relationship between a selected feature and the target variable.
The PDP function estimates the partial function $\hat{f}_S(x_S) = E_{x_C}\left[\hat{f}(x_S, x_C)\right]$
with the Monte Carlo estimation $\hat{f}_S(x_S) = \frac{1}{n}\sum_{i=1}^{n} \hat{f}\left(x_S, x_C^{(i)}\right)$.
It calculates the average prediction of the target variable for different values of the selected feature while keeping the other features constant.
For categorical features, all data instances are used, and each instance is set to one level of the categorical feature. The average prediction per category is then calculated and visualized in a bar plot.
If the ice
parameter is set to TRUE
, the Individual Conditional Expectation (ICE) curves are also shown. These curves illustrate how each individual data sample reacts to changes in the feature value. Please note that this option is not available for categorical features. Unlike PDP, the ICE curves are computed using a value grid instead of utilizing every value of every data entry.
Note: The PDP analysis provides valuable insights into the relationship between a specific feature and the target variable, helping to understand the feature's impact on the model's predictions.
If ice = TRUE, the ICE curves are shown in addition to the PDP, and the original PDP curve is colored yellow.
if(torch::torch_is_installed()){ library(cito) # Build and train Network nn.fit<- dnn(Sepal.Length~., data = datasets::iris) PDP(nn.fit, variable = "Petal.Length") }
This method provides a visual representation of the network architecture defined by an object of class citoarchitecture
, including information about each layer's configuration. It helps in understanding the structure of the architecture defined by create_architecture
.
## S3 method for class 'citoarchitecture' plot(x, input_shape, output_shape = NULL, ...)
x |
An object of class |
input_shape |
A numeric vector specifying the dimensions of a single sample (e.g., |
output_shape |
An integer specifying the number of nodes in the output layer. If |
... |
Additional arguments (currently not used). |
The original citoarchitecture
object, returned invisibly.
if(torch::torch_is_installed()){ library(cito) c1 <- conv(n_kernels = 8, kernel_size = 5) c2 <- conv(n_kernels = 16, kernel_size = 3) l <- linear(n_neurons = 100) mP <- maxPool(kernel_size = 2) architecture <- create_architecture(c1, c1, mP, c2, c2, mP, l, default_dropout = list(linear=0.6, conv=0.4), default_normalization = list(linear=TRUE), default_activation = "selu") # See how the finished CNN would look like for specific input and output shapes plot(architecture, c(3,128,128), 10) }
This function plots the architecture of a Convolutional Neural Network (CNN) model created using the cnn
function.
## S3 method for class 'citocnn' plot(x, ...)
x |
A model created by |
... |
Additional arguments (currently not used). |
The original model object x
, returned invisibly.
if(torch::torch_is_installed()){ library(cito) set.seed(222) device <- ifelse(torch::cuda_is_available(), "cuda", "cpu") ## Data shapes <- cito:::simulate_shapes(320, 28) X <- shapes$data Y <- shapes$labels ## Architecture architecture <- create_architecture(conv(5), maxPool(), conv(5), maxPool(), linear(10)) ## Build and train network cnn.fit <- cnn(X, Y, architecture, loss = "softmax", epochs = 50, validation = 0.1, lr = 0.05, device=device) ## Structure of Neural Network plot(cnn.fit) }
Creates graph plot which gives an overview of the network architecture.
## S3 method for class 'citodnn' plot(x, node_size = 1, scale_edges = FALSE, ...) ## S3 method for class 'citodnnBootstrap' plot(x, node_size = 1, scale_edges = FALSE, which_model = 1, ...)
x |
a model created by |
node_size |
size of node in plot |
scale_edges |
edge weight gets scaled according to other weights (layer specific) |
... |
no further functionality implemented yet |
which_model |
which model from the ensemble should be plotted |
A plot made with 'ggraph' + 'igraph' that represents the neural network
if(torch::torch_is_installed()){ library(cito) set.seed(222) validation_set<- sample(c(1:nrow(datasets::iris)),25) # Build and train Network nn.fit<- dnn(Sepal.Length~., data = datasets::iris[-validation_set,]) plot(nn.fit) }
This function generates predictions from a Convolutional Neural Network (CNN) model that was created using the cnn
function.
## S3 method for class 'citocnn' predict( object, newdata = NULL, type = c("link", "response", "class"), device = NULL, batchsize = NULL, ... )
object |
a model created by |
newdata |
A multidimensional array representing the new data for which predictions are to be made. The dimensions of |
type |
A character string specifying the type of prediction to be made. Options are:
|
device |
Device to be used for making predictions. Options are "cpu", "cuda", and "mps". Default is "cpu". |
batchsize |
An integer specifying the number of samples to be processed at the same time. If |
... |
Additional arguments (currently not used). |
A matrix of predictions. If type
is "class"
, a factor of predicted class labels is returned.
if(torch::torch_is_installed()){ library(cito) set.seed(222) device <- ifelse(torch::cuda_is_available(), "cuda", "cpu") ## Data shapes <- cito:::simulate_shapes(320, 28) X <- shapes$data Y <- shapes$labels ## Architecture architecture <- create_architecture(conv(5), maxPool(), conv(5), maxPool(), linear(10)) ## Build and train network cnn.fit <- cnn(X, Y, architecture, loss = "softmax", epochs = 50, validation = 0.1, lr = 0.05, device=device) ## Get predictions of the validation set valid <- cnn.fit$data$validation predictions <- predict(cnn.fit, newdata = X[valid,,,,drop=FALSE], type="class") ## Classification accuracy accuracy <- sum(predictions == Y[valid])/length(valid) }
Predict from a fitted dnn model
## S3 method for class 'citodnn' predict( object, newdata = NULL, type = c("link", "response", "class"), device = c("cpu", "cuda", "mps"), batchsize = NULL, ... ) ## S3 method for class 'citodnnBootstrap' predict( object, newdata = NULL, type = c("link", "response", "class"), device = c("cpu", "cuda", "mps"), batchsize = NULL, reduce = c("mean", "median", "none"), ... )
object |
a model created by |
newdata |
new data for predictions |
type |
type of predictions. The default is on the scale of the linear predictor, "response" is on the scale of the response, and "class" means that class predictions are returned (if it is a classification task) |
device |
device on which network should be trained on. |
batchsize |
number of samples that are predicted at the same time |
... |
additional arguments |
reduce |
predictions from bootstrapped model are by default reduced (mean, optional median or none) |
prediction matrix
if(torch::torch_is_installed()){ library(cito) set.seed(222) validation_set<- sample(c(1:nrow(datasets::iris)),25) # Build and train Network nn.fit<- dnn(Sepal.Length~., data = datasets::iris[-validation_set,]) # Use model on validation set predictions <- predict(nn.fit, iris[validation_set,]) # Scatterplot plot(iris[validation_set,]$Sepal.Length,predictions) # MAE mean(abs(predictions-iris[validation_set,]$Sepal.Length)) }
Predict from a fitted mmn model
## S3 method for class 'citommn' predict( object, newdata = NULL, type = c("link", "response", "class"), device = c("cpu", "cuda", "mps"), ... )
object |
a model created by |
newdata |
new data for predictions |
type |
which value should be calculated, either raw response, output of link function or predicted class (in case of classification) |
device |
device on which network should be trained on. |
... |
additional arguments |
prediction matrix
This method provides a visual representation of the network architecture defined by an object of class citoarchitecture
, including information about each layer's configuration. It helps in understanding the structure of the architecture defined by create_architecture
.
## S3 method for class 'citoarchitecture' print(x, input_shape, output_shape = NULL, ...)
x |
An object of class |
input_shape |
A numeric vector specifying the dimensions of a single sample (e.g., |
output_shape |
An integer specifying the number of nodes in the output layer. If |
... |
Additional arguments (currently not used). |
The original citoarchitecture
object, returned invisibly.
if(torch::torch_is_installed()){ library(cito) c1 <- conv(n_kernels = 8, kernel_size = 5) c2 <- conv(n_kernels = 16, kernel_size = 3) l <- linear(n_neurons = 100) mP <- maxPool(kernel_size = 2) architecture <- create_architecture(c1, c1, mP, c2, c2, mP, l, default_dropout = list(linear=0.6, conv=0.4), default_normalization = list(linear=TRUE), default_activation = "selu") # See how the finished CNN would look like for specific input and output shapes print(architecture, c(3,128,128), 10) }
This function prints the architecture of a Convolutional Neural Network (CNN) model created using the cnn
function.
## S3 method for class 'citocnn' print(x, ...)
x |
A model created by |
... |
Additional arguments (currently not used). |
The original model object x
, returned invisibly.
if(torch::torch_is_installed()){ library(cito) set.seed(222) device <- ifelse(torch::cuda_is_available(), "cuda", "cpu") ## Data shapes <- cito:::simulate_shapes(320, 28) X <- shapes$data Y <- shapes$labels ## Architecture architecture <- create_architecture(conv(5), maxPool(), conv(5), maxPool(), linear(10)) ## Build and train network cnn.fit <- cnn(X, Y, architecture, loss = "softmax", epochs = 50, validation = 0.1, lr = 0.05, device=device) # Structure of Neural Network print(cnn.fit) }
Print class citodnn
## S3 method for class 'citodnn' print(x, ...) ## S3 method for class 'citodnnBootstrap' print(x, ...)
x |
a model created by |
... |
additional arguments |
original object x gets returned
if(torch::torch_is_installed()){ library(cito) set.seed(222) validation_set<- sample(c(1:nrow(datasets::iris)),25) # Build and train Network nn.fit<- dnn(Sepal.Length~., data = datasets::iris[-validation_set,]) # Structure of Neural Network print(nn.fit) }
Print class citommn
## S3 method for class 'citommn' print(x, ...)
x |
a model created by |
... |
additional arguments |
original object x
Print average conditional effects
## S3 method for class 'conditionalEffects' print(x, ...) ## S3 method for class 'conditionalEffectsBootstrap' print(x, ...)
x |
print ACE calculated by |
... |
optional arguments for compatibility with the generic function, no function implemented |
Matrix with average conditional effects
Print method for class summary.citodnn
## S3 method for class 'summary.citodnn' print(x, ...) ## S3 method for class 'summary.citodnnBootstrap' print(x, ...)
x |
a summary object created by |
... |
additional arguments |
List with Matrices for importance, average CE, absolute sum of CE, and standard deviation of the CE
Returns residuals of training set.
## S3 method for class 'citodnn' residuals(object, ...)
object |
a model created by |
... |
no additional arguments implemented |
residuals of training set
generates images of rectangles and ellipsoids
simulate_shapes(n, size, channels = 1)
n |
number of images |
size |
size of the (quadratic) images |
channels |
number of channels the generated data has (in each channel a new rectangle/ellipsoid is created) |
This function generates simple data to demonstrate the usage of cnn(). The generated images are of centered rectangles and ellipsoids with random widths and heights.
array of dimension (n, 1, size, size)
Armin Schenk
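A short sketch following the access pattern used in the cnn() examples on this page (which read the result via $data and $labels); the number and size of images are illustrative only:

if(torch::torch_is_installed()){
  library(cito)
  shapes <- cito:::simulate_shapes(100, 28)
  X <- shapes$data    # image array
  Y <- shapes$labels  # labels of the generated images
  dim(X)
}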
This function provides a summary of a Convolutional Neural Network (CNN) model created using the cnn
function. It currently replicates the output of the print.citocnn
method.
## S3 method for class 'citocnn' summary(object, ...)
object |
A model created by |
... |
Additional arguments (currently not used). |
The original model object object
, returned invisibly.
Performs a Feature Importance calculation based on Permutations
## S3 method for class 'citodnn' summary(object, n_permute = NULL, device = NULL, type = "response", ...) ## S3 method for class 'citodnnBootstrap' summary( object, n_permute = NULL, device = NULL, adjust_se = FALSE, type = "response", ... )
object |
a model of class citodnn created by |
n_permute |
number of permutations performed. Default is |
device |
for calculating variable importance and conditional effects |
type |
on what scale should the average conditional effects be calculated ("response" or "link") |
... |
additional arguments |
adjust_se |
adjust standard errors for importance (standard errors are multiplied with 1/sqrt(3) ) |
Performs the feature importance calculation as suggested by Fisher, Rudin, and Dominici (2018), and the mean and standard deviation of the average conditional Effects as suggested by Pichler & Hartig (2023).
Feature importances are similar in their interpretation to an ANOVA. Main and interaction effects are absorbed into the features. Also, feature importances are affected by collinearity between features, i.e. if two features are collinear, the importances might be overestimated.
Average conditional effects (ACE) are similar to marginal effects and approximate linear effects, i.e. their interpretation is similar to effects in a linear regression model.
The standard deviation of the ACE informs about the non-linearity of the feature effects. Higher values correlate with stronger non-linearities.
For each feature, $n$ permutations are performed and the original and permuted predictive mean squared errors ($MSE_{orig}$ and $MSE_{perm}$) are evaluated; the importance is then the ratio of permuted to original error, $MSE_{perm} / MSE_{orig}$, i.e. it is based on the mean squared error.
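For example (a sketch; the model and the choice of 100 permutations are illustrative only):

if(torch::torch_is_installed()){
  library(cito)
  nn.fit <- dnn(Sepal.Length~., data = datasets::iris,
                verbose = FALSE, plot = FALSE)
  # Feature importance and average conditional effects with 100 permutations
  summary(nn.fit, n_permute = 100L)
}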
summary.citodnn returns an object of class "summary.citodnn", a list with components
Currently the same as the print.citommn method.
## S3 method for class 'citommn' summary(object, ...)
object |
a model created by |
... |
additional arguments |
original object
combine a list of formula terms as a sum
sumTerms(termList)
termList |
a list of formula terms |
This function creates a transfer
layer object of class citolayer
for use in constructing a Convolutional Neural Network (CNN) architecture. The resulting layer object allows the use of pretrained models available in the 'torchvision' package within cito.
transfer( name = c("alexnet", "inception_v3", "mobilenet_v2", "resnet101", "resnet152", "resnet18", "resnet34", "resnet50", "resnext101_32x8d", "resnext50_32x4d", "vgg11", "vgg11_bn", "vgg13", "vgg13_bn", "vgg16", "vgg16_bn", "vgg19", "vgg19_bn", "wide_resnet101_2", "wide_resnet50_2"), pretrained = TRUE, freeze = TRUE )
name |
(character) The name of the pretrained model. Available options include: "alexnet", "inception_v3", "mobilenet_v2", "resnet101", "resnet152", "resnet18", "resnet34", "resnet50", "resnext101_32x8d", "resnext50_32x4d", "vgg11", "vgg11_bn", "vgg13", "vgg13_bn", "vgg16", "vgg16_bn", "vgg19", "vgg19_bn", "wide_resnet101_2", "wide_resnet50_2". |
pretrained |
(boolean) If |
freeze |
(boolean) If |
This function creates a transfer
layer object, which represents a pretrained model of the torchvision
package with the linear "classifier" part removed. This allows the pretrained features of the model to be utilized while enabling customization of the classifier. When using this function with create_architecture
, only linear layers can be added after the transfer
layer. These linear layers define the "classifier" part of the network. If no linear layers are provided following the transfer
layer, the default classifier will consist of a single output layer.
Additionally, the pretrained
argument specifies whether to use the pretrained weights or initialize the model with random weights. If freeze
is set to TRUE
, only the weights of the final linear layers (the "classifier") are updated during training, while the rest of the pretrained model remains unchanged. Note that freeze
has no effect unless pretrained
is set to TRUE
.
An S3 object of class "transfer" "citolayer"
, representing a pretrained model of the torchvision
package in the CNN architecture.
Armin Schenk
if(torch::torch_is_installed()){ library(cito) # Creates a "transfer" "citolayer" object that later tells the cnn() function that # the alexnet architecture and its pretrained weights should be used, but none # of the weights are frozen alexnet <- transfer(name="alexnet", pretrained=TRUE, freeze=FALSE) # Creates a "transfer" "citolayer" object that later tells the cnn() function that # the resnet18 architecture and its pretrained weights should be used. # Also all weights except from the linear layer at the end are frozen (and # therefore not changed during training) resnet18 <- transfer(name="resnet18", pretrained=TRUE, freeze=TRUE) }
Control hyperparameter tuning
tune( lower = NULL, upper = NULL, fixed = NULL, additional = NULL, values = NULL )
lower |
numeric, numeric vector, character, lower boundaries of tuning space |
upper |
numeric, numeric vector, character, upper boundaries of tuning space |
fixed |
character, used for multi-dimensional hyperparameters such as hidden, which dimensions should be fixed |
additional |
numeric, additional control parameter which sets the value of the fixed argument |
values |
custom values from which hyperparameters are sampled, must be a matrix for hidden layers (first column == nodes, second column == number of layers) |