--- title: "Insert, update, or remove rows in a database" date: "`r Sys.Date()`" author: James Wondrasek output: rmarkdown::html_vignette vignette: > %\VignetteEncoding{UTF-8} %\VignetteIndexEntry{How to: Insert, update or remove rows in a database} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ``````{r setup, include = FALSE} source("setup/setup.R") `````` This tutorial introduces the methods {dm} provides for modifying the data in the tables of a relational model. There are 6 methods: * [`dm_rows_insert()`](#insert) - adds new unique rows * [`dm_rows_append()`](#insert) - adds new rows unconditionally * [`dm_rows_update()`](#update) - changes values in rows * [`dm_rows_patch()`](#patch) - fills in missing values * [`dm_rows_upsert()`](#upsert) - adds new rows or changes values if pre-existing * [`dm_rows_delete()`](#delete) - deletes rows ## The dm_rows_* process All six methods take the same arguments and using them follows the same process: 1. Create a temporary *changeset dm* object that defines the intended changes on the RDBMS 1. If desired, simulate changes with `in_place = FALSE` to double-check 1. Apply changes with `in_place = TRUE`. To start, a `dm` object is created containing the tables and rows that you want to change. This changeset `dm` is then copied into the same source as the dm you want to modify. With the dm in the same RDBMS as the destination dm, you call the appropriate method, such as `dm_rows_insert()`, to make your planned changes, along with an argument of `in_place = FALSE` so you can confirm you achieve the changes that you want. This verification can be done visually, looking at row counts and the like, or using {dm}'s constraint checking method, `dm_examine_constraints()`. The biggest danger is damaging key relations between data spread across multiple tables by deleting or duplicating rows and their keys. `dm_examine_constraints()` will catch errors where primary keys are duplicated or foreign keys do not have a matching primary key (unless the foreign key value is `NA`). With the changes confirmed, you execute the method again, this time with the argument `in_place = TRUE` to make the changes permanent. Note that `in_place = FALSE` is the default: you must opt in to actually change data on the database. Each method has its own requirements in order to maintain database consistency. These involve constraints on primary key values that uniquely identify rows. | Method | Requirements | |--------|--------------| | `dm_rows_insert()` | Records with existing primary keys are silently ignored (via `dplyr::rows_insert(conflict = "ignore")`). | | `dm_rows_append()` | All records are inserted, the underlying database might check for uniqueness of primary keys (and fail the operation) if a constraint is set. | | `dm_rows_update()` | Primary keys must match for all records to be updated.| | `dm_rows_patch()` | Updates missing values in existing records. Primary keys must match for all records to be patched.| | `dm_rows_upsert()` | Updates existing records and adds new records, based on the primary key.| | `dm_rows_delete()` | Removes matching records based on the primary key. Primary keys must match for all records to be deleted.| To ensure the integrity of all relations during the process, all methods automatically determine the correct processing order for the tables involved. For operations that create records, parent tables (which hold primary keys) are processed before child tables (which hold foreign keys). For `dm_rows_delete()`, child tables are processed before their parent tables. Note that the user is still responsible for setting transactions to ensure integrity of operations across multiple tables. For more details on this see `vignette("howto-dm-theory")` and `vignette("howto-dm-db")`. ## Usage {#usage} To demonstrate the use of these table modifying methods, we will create a simple `dm` object with two tables linked by a foreign key. Note that the `child` table has a foreign key missing (`NA`). ``````{r} library(dm) parent <- tibble(value = c("A", "B", "C"), pk = 1:3) parent child <- tibble(value = c("a", "b", "c"), pk = 1:3, fk = c(1, 1, NA)) child demo_dm <- dm(parent = parent, child = child) %>% dm_add_pk(parent, pk) %>% dm_add_pk(child, pk) %>% dm_add_fk(child, fk, parent) demo_dm %>% dm_draw(view_type = "all") `````` {dm} doesn't check your key values when you create a dm, we add this check:[^null-fk] [^null-fk]: Be aware that when using `dm_examine_constraints()`, missing (denoted by `NULL` in SQL, while `NA` in R) foreign keys are allowed and will be counted as a match. In some cases this doesn't make sense and non-NULL columns should be enforced by the RDBMS. Currently, {dm} does not specify or check non-NULL constraints for columns. ``````{r } dm_examine_constraints(demo_dm) `````` Then we copy `demo_dm` into an SQLite database. Note: the default for the method used, `copy_dm_to()`, is to create temporary tables that will be automatically deleted when your session ends. As `demo_sql` will be the destination dm for the examples, the argument `temporary = FALSE` is used to make this distinction apparent. ``````{r } library(DBI) sqlite_db <- DBI::dbConnect(RSQLite::SQLite()) demo_sql <- copy_dm_to(sqlite_db, demo_dm, temporary = FALSE) demo_sql `````` {dm}'s table modification methods can be piped together to create a repeatable sequence of operations that returns a dm incorporating all the changes required. This is a common use case for {dm} -- manually building a sequence of operations using temporary results until it is complete and correct, and then committing the result. ## `dm_rows_insert()` {#insert} To demonstrate `dm_rows_insert()`, we create a dm with tables containing the rows to insert and copy it to `sqlite_db`, the same source as `demo_sql`. For all of the `dm_rows_...()` methods, the source and destination `dm` objects must be in the same RDBMS. You will get an error message if this is not the case. The code below adds `parent` and `child` table entries for the letter "D". First, the changeset dm is created and temporarily copied to the database: ``````{r } new_parent <- tibble(value = "D", pk = 4) new_parent new_child <- tibble(value = "d", pk = 4, fk = 4) new_child dm_insert_in <- dm(parent = new_parent, child = new_child) %>% copy_dm_to(sqlite_db, ., temporary = TRUE) `````` The changeset dm is then used as an argument to `dm_rows_insert()`. ``````{r } dm_insert_out <- demo_sql %>% dm_rows_insert(dm_insert_in) `````` This gives us a warning that changes will not persist (i.e., they are temporary). Inspecting the `child` table of the resulting `dm_insert_out` and `demo_sql`, we can see that's exactly what happened. {dm} returned to us a dm object with our inserted rows in place, but the underlying database has not changed. ``````{r } dm_insert_out$child demo_sql$child `````` We repeat the operation, this time with the argument `in_place = TRUE` and the changes now persist in `demo_sql`. ``````{r } dm_insert_out <- demo_sql %>% dm_rows_insert(dm_insert_in, in_place = TRUE) demo_sql$child `````` ## `dm_rows_update()` {#update} `dm_rows_update()` works the same as `dm_rows_insert()`. We create the dm object and copy it to the same source as the destination. Here we will change the foreign key for the row in `child` containing "b" to point to the correct row in `parent`. And we will persist the changes. ``````{r } updated_child <- tibble(value = "b", pk = 2, fk = 2) updated_child dm_update_in <- dm(child = updated_child) %>% copy_dm_to(sqlite_db, ., temporary = TRUE) dm_update_out <- demo_sql %>% dm_rows_update(dm_update_in, in_place = TRUE) demo_sql$child `````` ## `dm_rows_delete()` {#delete} `dm_rows_delete()` is not currently implemented to work with an RDBMS, so we will shift our demonstrations back to the local R environment. We've made changes to `demo_sql`, so we use `collect()` to copy the current tables out of SQLite. Note that persistence is not a concern for *local* `dm` objects. Every operation returns a new dm object containing the changes made. ``````{r } local_dm <- collect(demo_sql) local_dm$parent local_dm$child dm_deleted <- dm(parent = new_parent, child = new_child) %>% dm_rows_delete(local_dm, .) dm_deleted$child `````` ## `dm_rows_patch()` {#patch} `dm_rows_patch()` updates missing values in existing records. We use it here to fix the missing foreign key in the `child` table. ``````{r } patched_child <- tibble(value = "c", pk = 3, fk = 3) patched_child dm_patched <- dm(child = patched_child) %>% dm_rows_patch(dm_deleted, .) dm_patched$child `````` ## `dm_rows_upsert()` {#upsert} `dm_rows_upsert()` updates rows with supplied values if they exist or inserts the supplied values as new rows if they don't. In this example we add the letter "D" back to our dm, and update the foreign key for "b". ``````{r } upserted_parent <- tibble(value = "D", pk = 4) upserted_parent upserted_child <- tibble(value = c("b", "d"), pk = c(2, 4), fk = c(3, 4)) upserted_child dm_upserted <- dm(parent = upserted_parent, child = upserted_child) %>% dm_rows_upsert(dm_patched, .) dm_upserted$parent dm_upserted$child `````` When done, do not forget to disconnect: ``````{r disconnect} DBI::dbDisconnect(sqlite_db) `````` ## Conclusion {#conclusion} The `dm_rows_...()` methods give you row-level granularity over the modifications you need to make to your relational model. Using the common `in_place` argument, they all can construct and verify your modifications before committing them. There are a few limitations, as mentioned in the tutorial, but these will be addressed in future updates to {dm}. ## Further Reading If this tutorial answered some questions, but opened others, these resources might be of assistance. Is your data in an RDBMS? `vignette("howto-dm-db")` offers a detailed look at working with an existing relational data model. If your data is in data frames, then you may want to read `vignette("howto-dm-df")` next. If you would like to know more about relational data models in order to get the most out of dm, check out `vignette("howto-dm-theory")`. If you're familiar with relational data models but want to know how to work with them in dm, then any of `vignette("tech-dm-join")`, `vignette("tech-dm-filter")`, or `vignette("tech-dm-zoom")` is a good next step.