---
title: Training & evaluation with the built-in methods
date-created: 2019/03/01
last-modified: 2023/06/25
description: Complete guide to training & evaluation with `fit()` and `evaluate()`.
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Training & evaluation with the built-in methods}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Setup

``` r
library(keras3)
```

## Introduction

This guide covers training, evaluation, and prediction (inference) of models when using built-in APIs for training & validation (such as `fit()`, `evaluate()` and `predict()`).

If you are interested in leveraging `fit()` while specifying your own training step function, see the [Customizing what happens in `fit()` guide](custom_train_step_in_tensorflow.html).

If you are interested in writing your own training & evaluation loops from scratch, see the guide [Writing a training loop from scratch](writing_a_custom_training_loop_in_tensorflow.html).

In general, whether you are using built-in loops or writing your own, model training & evaluation works in exactly the same way across every kind of Keras model -- Sequential models, models built with the Functional API, and models written from scratch via model subclassing.

## API overview: a first end-to-end example

When passing data to the built-in training loops of a model, you should either use:

- Arrays (if your data is small and fits in memory)
- `tf_dataset` objects
- PyTorch `DataLoader` instances

In the next few paragraphs, we'll use the MNIST dataset as R arrays, in order to demonstrate how to use optimizers, losses, and metrics. Afterwards, we'll take a close look at each of the other options.
Let's consider the following model (here, we build it with the Functional API, but it could be a Sequential model or a subclassed model as well):

``` r
inputs <- keras_input(shape = 784, name = "digits")
outputs <- inputs |>
  layer_dense(units = 64, activation = "relu", name = "dense_1") |>
  layer_dense(units = 64, activation = "relu", name = "dense_2") |>
  layer_dense(units = 10, activation = "softmax", name = "predictions")
model <- keras_model(inputs = inputs, outputs = outputs)
summary(model)
```

```
## Model: "functional"
## ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
## ┃ Layer (type)                    ┃ Output Shape           ┃       Param # ┃
## ┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
## │ digits (InputLayer)             │ (None, 784)            │             0 │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ dense_1 (Dense)                 │ (None, 64)             │        50,240 │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ dense_2 (Dense)                 │ (None, 64)             │         4,160 │
## ├─────────────────────────────────┼────────────────────────┼───────────────┤
## │ predictions (Dense)             │ (None, 10)             │           650 │
## └─────────────────────────────────┴────────────────────────┴───────────────┘
##  Total params: 55,050 (215.04 KB)
##  Trainable params: 55,050 (215.04 KB)
##  Non-trainable params: 0 (0.00 B)
```

Here's what the typical end-to-end workflow looks like, consisting of:

- Training
- Validation on a holdout set generated from the original training data
- Evaluation on the test data

We'll use MNIST data for this example.
``` r
c(c(x_train, y_train), c(x_test, y_test)) %<-% dataset_mnist()

# Preprocess the data (these are R arrays)
x_train <- array_reshape(x_train, c(60000, 784)) / 255
x_test <- array_reshape(x_test, c(10000, 784)) / 255

# Reserve 10,000 samples for validation
x_val <- x_train[1:10000, ]
y_val <- y_train[1:10000]
x_train <- x_train[-c(1:10000), ]
y_train <- y_train[-c(1:10000)]
```

We specify the training configuration (optimizer, loss, metrics):

``` r
model |> compile(
  # Optimizer
  optimizer = optimizer_rmsprop(),
  # Loss function to minimize
  loss = loss_sparse_categorical_crossentropy(),
  # List of metrics to monitor
  metrics = list(metric_sparse_categorical_accuracy())
)
```

We call `fit()`, which will train the model by slicing the data into "batches" of size `batch_size`, and repeatedly iterating over the entire dataset for a given number of `epochs`.

``` r
history <- model |> fit(
  x_train, y_train,
  batch_size = 64,
  epochs = 2,
  # We pass some validation data for
  # monitoring validation loss and metrics
  # at the end of each epoch
  validation_data = list(x_val, y_val)
)
```

```
## Epoch 1/2
## 782/782 - 3s - 3ms/step - loss: 0.3410 - sparse_categorical_accuracy: 0.9034 - val_loss: 0.1855 - val_sparse_categorical_accuracy: 0.9460
## Epoch 2/2
## 782/782 - 2s - 2ms/step - loss: 0.1589 - sparse_categorical_accuracy: 0.9538 - val_loss: 0.1323 - val_sparse_categorical_accuracy: 0.9619
```

The returned `history` object holds a record of the loss values and metric values during training:

``` r
history
```

```
##
## Final epoch (plot to see history):
## loss: 0.1589
## sparse_categorical_accuracy: 0.9538
## val_loss: 0.1323
## val_sparse_categorical_accuracy: 0.9619
```

We evaluate the model on the test data via `evaluate()`:

``` r
# Evaluate the model on the test data using `evaluate`
results <- model |> evaluate(x_test, y_test, batch_size = 128)
```

```
## 79/79 - 0s - 4ms/step - loss: 0.1267 - sparse_categorical_accuracy: 0.9620
```

``` r
str(results)
```

```
## List of 2
##  $ loss                       : num 0.127
##  $ sparse_categorical_accuracy: num 0.962
```

``` r
# Generate predictions (probabilities -- the output of the last layer)
# on new data using `predict`
predictions <- model |> predict(x_test[1:2, ])
```

```
## 1/1 - 0s - 140ms/step
```

``` r
dim(predictions)
```

```
## [1] 2 10
```

Now, let's review each piece of this workflow in detail.

## The `compile()` method: specifying a loss, metrics, and an optimizer

To train a model with `fit()`, you need to specify a loss function, an optimizer, and optionally, some metrics to monitor.

You pass these to the model as arguments to the `compile()` method:

``` r
model |> compile(
  optimizer = optimizer_rmsprop(learning_rate = 1e-3),
  loss = loss_sparse_categorical_crossentropy(),
  metrics = list(metric_sparse_categorical_accuracy())
)
```

The `metrics` argument should be a list -- your model can have any number of metrics.

If your model has multiple outputs, you can specify different losses and metrics for each output, and you can modulate the contribution of each output to the total loss of the model. You will find more details about this in the **Passing data to multi-input, multi-output models** section.

Note that if you're satisfied with the default settings, in many cases the optimizer, loss, and metrics can be specified via string identifiers as a shortcut:

``` r
model |> compile(
  optimizer = "rmsprop",
  loss = "sparse_categorical_crossentropy",
  metrics = c("sparse_categorical_accuracy")
)
```

For later reuse, let's put our model definition and compile step in functions; we will call them several times across different examples in this guide.
``` r
get_uncompiled_model <- function() {
  inputs <- keras_input(shape = 784, name = "digits")
  outputs <- inputs |>
    layer_dense(units = 64, activation = "relu", name = "dense_1") |>
    layer_dense(units = 64, activation = "relu", name = "dense_2") |>
    layer_dense(units = 10, activation = "softmax", name = "predictions")
  keras_model(inputs = inputs, outputs = outputs)
}

get_compiled_model <- function() {
  model <- get_uncompiled_model()
  model |> compile(
    optimizer = "rmsprop",
    loss = "sparse_categorical_crossentropy",
    metrics = c("sparse_categorical_accuracy")
  )
  model
}
```

### Many built-in optimizers, losses, and metrics are available

In general, you won't have to create your own losses, metrics, or optimizers from scratch, because what you need is likely to be already part of the Keras API:

Optimizers:

- [`optimizer_sgd()`] (with or without momentum)
- [`optimizer_rmsprop()`]
- [`optimizer_adam()`]
- etc.

Losses:

- [`loss_mean_squared_error()`]
- [`loss_kl_divergence()`]
- [`loss_cosine_similarity()`]
- etc.

Metrics:

- [`metric_auc()`]
- [`metric_precision()`]
- [`metric_recall()`]
- etc.

### Custom losses

If you need to create a custom loss, Keras provides two ways to do so. The first method involves creating a function that accepts inputs `y_true` and `y_pred`.
The following example shows a loss function that computes the mean squared error between the real data and the predictions:

``` r
custom_mean_squared_error <- function(y_true, y_pred) {
  op_mean(op_square(y_true - y_pred), axis = -1)
}

model <- get_uncompiled_model()
model |> compile(optimizer = "adam", loss = custom_mean_squared_error)

# We need to one-hot encode the labels to use MSE
y_train_one_hot <- op_one_hot(y_train, num_classes = 10)
model |> fit(x_train, y_train_one_hot, batch_size = 64, epochs = 2)
```

```
## Epoch 1/2
## 782/782 - 2s - 2ms/step - loss: 0.0161
## Epoch 2/2
## 782/782 - 1s - 681us/step - loss: 0.0078
```

If you need a loss function that takes in parameters besides `y_true` and `y_pred`, you can subclass the Keras base `Loss` class using [`Loss()`] and implement the following two methods:

- `initialize()`: accept parameters to pass during the call of your loss function
- `call(y_true, y_pred)`: use the targets (`y_true`) and the model predictions (`y_pred`) to compute the model's loss

Let's say you want to use mean squared error, but with an added term that will de-incentivize prediction values far from 0.5 (we assume that the categorical targets are one-hot encoded and take values between 0 and 1). This creates an incentive for the model not to be too confident, which may help reduce overfitting (we won't know if it works until we try!).
Here's how you would do it:

``` r
loss_custom_mse <- Loss(
  classname = "CustomMSE",
  initialize = function(regularization_factor = 0.1, name = "custom_mse") {
    super$initialize(name = name)
    self$regularization_factor <- regularization_factor
  },
  call = function(y_true, y_pred) {
    mse <- op_mean(op_square(y_true - y_pred), axis = -1)
    reg <- op_mean(op_square(0.5 - y_pred), axis = -1)
    mse + reg * self$regularization_factor
  }
)

model <- get_uncompiled_model()
model |> compile(optimizer = "adam", loss = loss_custom_mse())

y_train_one_hot <- op_one_hot(y_train, num_classes = 10)
model |> fit(x_train, y_train_one_hot, batch_size = 64, epochs = 1)
```

```
## 782/782 - 2s - 2ms/step - loss: 0.0390
```

### Custom metrics

If you need a metric that isn't part of the API, you can easily create custom metrics by subclassing the Keras base `Metric` class using [`Metric()`]. You will need to implement 4 methods:

- `initialize()`, in which you will create state variables for your metric.
- `update_state(y_true, y_pred, sample_weight = NULL)`, which uses the targets `y_true` and the model predictions `y_pred` to update the state variables.
- `result()`, which uses the state variables to compute the final results.
- `reset_state()`, which reinitializes the state of the metric.

State update and results computation are kept separate (in `update_state()` and `result()`, respectively) because in some cases, the results computation might be very expensive and would only be done periodically.

Here's a simple example showing how to implement a `CategoricalTruePositives` metric that counts how many samples were correctly classified as belonging to a given class:

``` r
metric_categorical_true_positives <- Metric(
  "CategoricalTruePositives",
  initialize = function(name = "categorical_true_positives", ...) {
    super$initialize(name = name, ...)
    self$true_positives <- self$add_variable(
      shape = shape(),
      name = "ctp",
      initializer = "zeros"
    )
  },
  update_state = function(y_true, y_pred, sample_weight = NULL) {
    y_pred <- op_argmax(y_pred, axis = 2) |> op_reshape(c(-1, 1))
    values <- op_cast(y_true, "int32") == op_cast(y_pred, "int32")
    values <- op_cast(values, "float32")
    if (!is.null(sample_weight)) {
      sample_weight <- op_cast(sample_weight, "float32")
      values <- op_multiply(values, sample_weight)
    }
    self$true_positives$assign_add(op_sum(values))
  },
  result = function() {
    self$true_positives$value
  },
  reset_state = function() {
    self$true_positives$assign(0.0)
  }
)

model <- get_uncompiled_model()
model |> compile(
  optimizer = optimizer_rmsprop(learning_rate = 1e-3),
  loss = loss_sparse_categorical_crossentropy(),
  metrics = c(metric_categorical_true_positives())
)

history <- model |> fit(x_train, y_train, batch_size = 64, epochs = 3)
```

```
## Epoch 1/3
## 782/782 - 2s - 2ms/step - categorical_true_positives: 360446.0000 - loss: 0.3444
## Epoch 2/3
## 782/782 - 1s - 736us/step - categorical_true_positives: 362538.0000 - loss: 0.1658
## Epoch 3/3
## 782/782 - 0s - 620us/step - categorical_true_positives: 363223.0000 - loss: 0.1205
```

### Handling losses and metrics that don't fit the standard signature

The overwhelming majority of losses and metrics can be computed from `y_true` and `y_pred`, where `y_pred` is an output of your model -- but not all of them. For instance, a regularization loss may only require the activation of a layer (there are no targets in this case), and this activation may not be a model output.

In such cases, you can call `self$add_loss(loss_value)` from inside the call method of a custom layer. Losses added in this way get added to the "main" loss during training (the one passed to `compile()`).
Here's a simple example that adds activity regularization (note that activity regularization is built into all Keras layers -- this layer is just for the sake of providing a concrete example):

``` r
layer_custom_activity_regularizer <- Layer(
  "ActivityRegularization",
  call = function(inputs) {
    self$add_loss(op_sum(inputs) * 0.1)
    inputs  # Pass-through layer.
  }
)

inputs <- keras_input(shape = 784, name = "digits")
outputs <- inputs |>
  layer_dense(units = 32, activation = "relu", name = "dense_1") |>
  layer_custom_activity_regularizer() |>
  layer_dense(units = 64, activation = "relu", name = "dense_2") |>
  layer_dense(units = 10, name = "predictions")

model <- keras_model(inputs = inputs, outputs = outputs)
model |> compile(
  optimizer = optimizer_rmsprop(learning_rate = 1e-3),
  loss = loss_sparse_categorical_crossentropy(from_logits = TRUE)
)

# The displayed loss will be much higher than before
# due to the regularization component.
model |> fit(x_train, y_train, batch_size = 64, epochs = 1)
```

```
## 782/782 - 1s - 2ms/step - loss: 2.3721
```

Note that when you pass losses via `add_loss()`, it becomes possible to call `compile()` without a loss function, since the model already has a loss to minimize.

Consider the following `LogisticEndpoint` layer: it takes as inputs targets & logits, and it tracks a crossentropy loss via `add_loss()`.

``` r
layer_logistic_endpoint <- Layer(
  "LogisticEndpoint",
  initialize = function(name = NULL) {
    super$initialize(name = name)
    self$loss_fn <- loss_binary_crossentropy(from_logits = TRUE)
  },
  call = function(targets, logits, sample_weights = NULL) {
    # Compute the training-time loss value and add it
    # to the layer using `self$add_loss()`.
    loss <- self$loss_fn(targets, logits, sample_weights)
    self$add_loss(loss)

    # Return the inference-time prediction tensor (for `predict()`).
    op_softmax(logits)
  }
)
```

You can use it in a model with two inputs (input data & targets), compiled without a `loss` argument, like this:

``` r
inputs <- keras_input(shape = 3, name = "inputs")
targets <- keras_input(shape = 10, name = "targets")
logits <- inputs |> layer_dense(10)
predictions <- layer_logistic_endpoint(name = "predictions")(targets, logits)

model <- keras_model(inputs = list(inputs, targets), outputs = predictions)
model |> compile(optimizer = "adam")  # No loss argument!

data <- list(
  inputs = random_normal(c(3, 3)),
  targets = random_normal(c(3, 10))
)
model |> fit(data, epochs = 1)
```

```
## 1/1 - 0s - 469ms/step - loss: 0.9638
```

For more information about training multi-input models, see the section **Passing data to multi-input, multi-output models**.

### Automatically setting apart a validation holdout set

In the first end-to-end example you saw, we used the `validation_data` argument to pass a list of arrays `list(x_val, y_val)` to the model for evaluating a validation loss and validation metrics at the end of each epoch.

Here's another option: the argument `validation_split` allows you to automatically reserve part of your training data for validation. The argument value represents the fraction of the data to be reserved for validation, so it should be set to a number higher than 0 and lower than 1. For instance, `validation_split = 0.2` means "use 20% of the data for validation", and `validation_split = 0.6` means "use 60% of the data for validation".

The way the validation is computed is by taking the last x% of the samples of the arrays received by the `fit()` call, before any shuffling.

Note that you can only use `validation_split` when training with array data.
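That "last x% of the samples" rule can be sketched in plain R. The helper below is hypothetical (not part of keras3); the actual slicing happens internally inside `fit()`:

``` r
# Sketch of how `fit()` carves out the validation holdout: the *last*
# fraction of the samples, taken before any shuffling.
split_train_val <- function(x, validation_split = 0.2) {
  n <- nrow(x)
  n_val <- floor(n * validation_split)
  list(
    train = x[seq_len(n - n_val), , drop = FALSE],
    val   = x[seq(n - n_val + 1, n), , drop = FALSE]
  )
}

x <- matrix(1:20, nrow = 10)  # 10 samples, 2 features
splits <- split_train_val(x, validation_split = 0.2)
nrow(splits$train)  # 8
nrow(splits$val)    # 2 (the last two samples)
```

Because the split is positional, make sure your array data isn't ordered by label before calling `fit()` with `validation_split`.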
``` r
model <- get_compiled_model()
model |> fit(x_train, y_train,
             batch_size = 64,
             validation_split = 0.2, epochs = 1)
```

```
## 625/625 - 2s - 3ms/step - loss: 0.3817 - sparse_categorical_accuracy: 0.8919 - val_loss: 0.1953 - val_sparse_categorical_accuracy: 0.9431
```

## Training & evaluation using TF `Dataset` objects

In the past few paragraphs, you've seen how to handle losses, metrics, and optimizers, and you've seen how to use the `validation_data` and `validation_split` arguments in `fit()`, when your data is passed as arrays.

Another option is to use an iterator-like object, such as a `tf.data.Dataset`, a PyTorch `DataLoader`, or an R generator function. Let's take a look at the former.

The `{tfdatasets}` R package contains a set of utilities for loading and preprocessing data in a way that's fast and scalable. For a complete guide about creating `Datasets`, see the [tf.data documentation](https://www.tensorflow.org/guide/data).

**You can use `tf.data` to train your Keras models regardless of the backend you're using -- whether it's JAX, PyTorch, or TensorFlow.** You can pass a `Dataset` instance directly to the methods `fit()`, `evaluate()`, and `predict()`:

``` r
library(tfdatasets, exclude = "shape")

model <- get_compiled_model()

# First, let's create a training Dataset instance.
# For the sake of our example, we'll use the same MNIST data as before.
train_dataset <- tensor_slices_dataset(list(x_train, y_train))

# Shuffle and slice the dataset.
train_dataset <- train_dataset |>
  dataset_shuffle(buffer_size = 1024) |>
  dataset_batch(64)

# Now we get a test dataset.
test_dataset <- tensor_slices_dataset(list(x_test, y_test)) |>
  dataset_batch(64)

# Since the dataset already takes care of batching,
# we don't pass a `batch_size` argument.
model |> fit(train_dataset, epochs = 3)
```

```
## Epoch 1/3
## 782/782 - 2s - 2ms/step - loss: 0.3365 - sparse_categorical_accuracy: 0.9041
## Epoch 2/3
## 782/782 - 1s - 682us/step - loss: 0.1605 - sparse_categorical_accuracy: 0.9524
## Epoch 3/3
## 782/782 - 1s - 655us/step - loss: 0.1185 - sparse_categorical_accuracy: 0.9647
```

``` r
# You can also evaluate or predict on a dataset.
result <- model |> evaluate(test_dataset)
```

```
## 157/157 - 1s - 4ms/step - loss: 0.1152 - sparse_categorical_accuracy: 0.9627
```

``` r
result
```

```
## $loss
## [1] 0.1151979
##
## $sparse_categorical_accuracy
## [1] 0.9627
```

Note that the `Dataset` is reset at the end of each epoch, so it can be reused for the next epoch.

If you want to run training only on a specific number of batches from this Dataset, you can pass the `steps_per_epoch` argument, which specifies how many training steps the model should run using this Dataset before moving on to the next epoch.

``` r
model <- get_compiled_model()

# Prepare the training dataset
train_dataset <- tensor_slices_dataset(list(x_train, y_train))
train_dataset <- train_dataset |>
  dataset_shuffle(buffer_size = 1024) |>
  dataset_batch(64)

# Only use 100 batches per epoch (that's 64 * 100 samples)
model |> fit(train_dataset, epochs = 3, steps_per_epoch = 100)
```

```
## Epoch 1/3
## 100/100 - 1s - 8ms/step - loss: 0.8017 - sparse_categorical_accuracy: 0.7806
## Epoch 2/3
## 100/100 - 0s - 667us/step - loss: 0.3661 - sparse_categorical_accuracy: 0.9006
## Epoch 3/3
## 100/100 - 0s - 641us/step - loss: 0.3009 - sparse_categorical_accuracy: 0.9106
```

You can also pass a `Dataset` instance as the `validation_data` argument in `fit()`:

``` r
model <- get_compiled_model()

# Prepare the training dataset
train_dataset <- tensor_slices_dataset(list(x_train, y_train))
train_dataset <- train_dataset |>
  dataset_shuffle(buffer_size = 1024) |>
  dataset_batch(64)

# Prepare the validation dataset
val_dataset <- tensor_slices_dataset(list(x_val,
y_val))
val_dataset <- val_dataset |> dataset_batch(64)

model |> fit(train_dataset, epochs = 1, validation_data = val_dataset)
```

```
## 782/782 - 2s - 2ms/step - loss: 0.3428 - sparse_categorical_accuracy: 0.9022 - val_loss: 0.2337 - val_sparse_categorical_accuracy: 0.9291
```

At the end of each epoch, the model will iterate over the validation dataset and compute the validation loss and validation metrics.

If you want to run validation only on a specific number of batches from this dataset, you can pass the `validation_steps` argument, which specifies how many validation steps the model should run with the validation dataset before interrupting validation and moving on to the next epoch:

``` r
model <- get_compiled_model()

# Prepare the training dataset
train_dataset <- tensor_slices_dataset(list(x_train, y_train))
train_dataset <- train_dataset |>
  dataset_shuffle(buffer_size = 1024) |>
  dataset_batch(64)

# Prepare the validation dataset
val_dataset <- tensor_slices_dataset(list(x_val, y_val))
val_dataset <- val_dataset |> dataset_batch(64)

model |> fit(
  train_dataset,
  epochs = 1,
  # Only run validation using the first 10 batches of the dataset
  # using the `validation_steps` argument
  validation_data = val_dataset,
  validation_steps = 10
)
```

```
## 782/782 - 2s - 3ms/step - loss: 0.3391 - sparse_categorical_accuracy: 0.9035 - val_loss: 0.1997 - val_sparse_categorical_accuracy: 0.9391
```

Note that the validation dataset will be reset after each use (so that you will always be evaluating on the same samples from epoch to epoch).

The argument `validation_split` (generating a holdout set from the training data) is not supported when training from `Dataset` objects, since this feature requires the ability to index the samples of the datasets, which is not possible in general with the `Dataset` API.

## Using sample weighting and class weighting

With the default settings, the weight of a sample is decided by its frequency in the dataset.
There are two methods to weight the data, independent of sample frequency:

* Class weights
* Sample weights

### Class weights

This is set by passing a named list to the `class_weight` argument of `fit()`. This list maps class indices to the weight that should be used for samples belonging to this class.

This can be used to balance classes without resampling, or to train a model that gives more importance to a particular class. For instance, if class "0" is half as represented as class "1" in your data, you could use `model |> fit(..., class_weight = c("0" = 1, "1" = 0.5))`.

Here's an R example where we use class weights to give more importance to the correct classification of class #5 (which is the digit "5" in the MNIST dataset).

``` r
class_weight <- c(
  "0" = 1.0,
  "1" = 1.0,
  "2" = 1.0,
  "3" = 1.0,
  "4" = 1.0,
  # Set weight 2 for class "5",
  # making this class 2x more important
  "5" = 2.0,
  "6" = 1.0,
  "7" = 1.0,
  "8" = 1.0,
  "9" = 1.0
)

model <- get_compiled_model()
model |> fit(x_train, y_train,
             class_weight = class_weight,
             batch_size = 64, epochs = 1)
```

```
## 782/782 - 1s - 2ms/step - loss: 0.3713 - sparse_categorical_accuracy: 0.9018
```

### Sample weights

For fine-grained control, or if you are not building a classifier, you can use `sample_weight`.

- When training from R arrays: pass the `sample_weight` argument to `fit()`.
- When training from `tf_dataset` or any other sort of iterator: yield `(input_batch, label_batch, sample_weight_batch)` tuples.

A "sample weights" array is an array of numbers that specify how much weight each sample in a batch should have in computing the total loss. It is commonly used in imbalanced classification problems (the idea being to give more weight to rarely-seen classes).

When the weights used are ones and zeros, the array can be used as a *mask* for the loss function (entirely discarding the contribution of certain samples to the total loss).
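To make the masking idea concrete, here is a small plain-R sketch of how 0/1 sample weights scale each sample's contribution to a weighted-average loss (the `per_sample_loss` values are made up for illustration; Keras applies the weighting internally):

``` r
# Hypothetical per-sample losses for a batch of 4 samples.
per_sample_loss <- c(0.5, 1.2, 0.3, 2.0)

# A 0/1 sample-weight vector acts as a mask: samples with weight 0
# are entirely discarded from the total loss.
sample_weight <- c(1, 0, 1, 0)

weighted_loss <- sum(per_sample_loss * sample_weight) / sum(sample_weight)
weighted_loss  # 0.4 -- only samples 1 and 3 contribute
```

Non-binary weights work the same way: a weight of 2 simply makes a sample count twice as much in the average.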
``` r
sample_weight <- rep(1.0, length(y_train))
sample_weight[y_train == 5] <- 2.0

model <- get_compiled_model()
model |> fit(
  x_train, y_train,
  sample_weight = sample_weight,
  batch_size = 64, epochs = 1
)
```

```
## 782/782 - 1s - 2ms/step - loss: 0.3740 - sparse_categorical_accuracy: 0.9015
```

Here's a matching `Dataset` example:

``` r
sample_weight <- rep(1.0, length(y_train))
sample_weight[y_train == 5] <- 2.0

# Create a Dataset that includes sample weights
# (3rd element in the return tuple).
train_dataset <- tensor_slices_dataset(list(
  x_train, y_train, sample_weight
))

# Shuffle and slice the dataset.
train_dataset <- train_dataset |>
  dataset_shuffle(buffer_size = 1024) |>
  dataset_batch(64)

model <- get_compiled_model()
model |> fit(train_dataset, epochs = 1)
```

```
## 782/782 - 1s - 2ms/step - loss: 0.3654 - sparse_categorical_accuracy: 0.9057
```

## Passing data to multi-input, multi-output models

In the previous examples, we were considering a model with a single input (a tensor of shape `(784)`) and a single output (a prediction tensor of shape `(10)`). But what about models that have multiple inputs or outputs?

Consider the following model, which has an image input of shape `(32, 32, 3)` (that's `(height, width, channels)`) and a time series input of shape `(NA, 10)` (that's `(timesteps, features)`). Our model will have two outputs computed from the combination of these inputs: a "score" (of shape `(1)`) and a probability distribution over five classes (of shape `(5)`).
``` r
image_input <- keras_input(c(32, 32, 3), name = "img_input")
timeseries_input <- keras_input(c(NA, 10), name = "ts_input")

x1 <- image_input |>
  layer_conv_2d(filters = 3, kernel_size = c(3, 3)) |>
  layer_global_max_pooling_2d()

x2 <- timeseries_input |>
  layer_conv_1d(filters = 3, kernel_size = 3) |>
  layer_global_max_pooling_1d()

x <- layer_concatenate(x1, x2)

score_output <- layer_dense(x, 1, name = "score_output")
class_output <- layer_dense(x, 5, name = "class_output")

model <- keras_model(
  inputs = list(image_input, timeseries_input),
  outputs = list(score_output, class_output)
)
```

Let's plot this model, so you can clearly see what we're doing here (note that the shapes shown in the plot are batch shapes, rather than per-sample shapes).

``` r
plot(model, show_shapes = TRUE)
```
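At compile time, a model like this can be given a different loss for each output by passing a named list keyed by output name, as mentioned in the `compile()` section earlier. The sketch below assumes the two-output model defined above; the specific loss choices and weights are illustrative, not prescribed:

``` r
# One loss per output, matched by output name; `loss_weights` modulates
# each output's contribution to the total loss (here the score output
# counts twice as much as the class output).
model |> compile(
  optimizer = optimizer_rmsprop(),
  loss = list(
    score_output = loss_mean_squared_error(),
    class_output = loss_categorical_crossentropy()
  ),
  loss_weights = list(score_output = 2.0, class_output = 1.0)
)
```

Targets would then be passed to `fit()` as a matching named list of arrays, one per output.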