---
title: "mpathr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{mpathr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```{r setup}
library(mpathr)
```
The main goal of `mpathr` is to provide functions for importing data from the m-Path
platform, as well as functions for common manipulations of Experience Sampling
Methodology (ESM) data.
## Importing m-Path data
To show how to import data using `mpathr`, we provide example data within
the package:
```{r show m-Path example data}
mpath_example()
```
As shown above, the package comes with an example of the `basic.csv` that can be
exported from the m-Path platform.
To read this data into R, we can use the `read_mpath()` function. We will also
need a path to the meta data. The meta data is a file that contains information
about the data types of each column, as well as the possible responses for
categorical columns.
The main advantage of using `read_mpath()` over generic functions like
`read.csv()` is that it uses the meta data to interpret each column's data type
correctly. It also automatically converts columns that store multiple responses
into list columns. For a response with multiple options, such as `1,4,6`,
`read_mpath()` stores a list containing each number separately, which makes
further preprocessing of these responses easier.
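To picture what such a list column looks like, here is a minimal base R sketch. It is not how `read_mpath()` is implemented; the `raw` vector and the `toy` data frame are hypothetical and only illustrate the idea of splitting a value like `1,4,6` into its parts.

```{r sketch of a list column}
# Hypothetical raw responses, as they might appear in a single CSV column.
raw <- c("1,4,6", "2", "3,5")

# Split each string on commas and convert the pieces to integers.
parsed <- lapply(strsplit(raw, ",", fixed = TRUE), as.integer)

# Store the result as a list column in a toy data frame.
toy <- data.frame(id = 1:3)
toy$choice <- parsed

# With a list column, per-row operations become straightforward,
# for example counting how many options were selected in each row.
lengths(toy$choice)
```

Once the responses are stored this way, checks such as `4 %in% toy$choice[[1]]` work directly on the individual options instead of on a string.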
We can obtain the paths to the example basic data and meta data
using the `mpath_example()` function:
```{r use read_mpath}
# find paths to example basic and meta data:
basic_path <- mpath_example(file = "example_basic.csv")
meta_path <- mpath_example(file = "example_meta.csv")

# read the data
data <- read_mpath(
  file = basic_path,
  meta_data = meta_path
)

data
```
### Saving m-Path data
The resulting data frame will contain columns with lists,
which can be problematic when saving the data. To save the data, we suggest the
following two options:
If you want to save the data as a comma-separated values (CSV) file for use in another program,
use `write_mpath()`. This function collapses most list columns to a single string and converts
all character columns to JSON strings, essentially reversing the operations performed by
`read_mpath()`. Note that this does not guarantee that the file can be read back with
`read_mpath()`: if the data have been modified, they may no longer match the meta data.
```{r write data as csv, eval = FALSE}
write_mpath(
  x = data,
  file = "data.csv"
)
```
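The collapsing step can be pictured with a small base R sketch. This is only an illustration of the idea under simplified assumptions, not the actual implementation of `write_mpath()`; the `toy` data frame and the comma separator are hypothetical choices.

```{r sketch of collapsing a list column}
# A toy data frame with a hypothetical list column.
toy <- data.frame(id = 1:3)
toy$choice <- list(c(1L, 4L, 6L), 2L, c(3L, 5L))

# Collapse every list element into one comma-separated string.
collapsed <- vapply(toy$choice, paste, character(1), collapse = ",")
toy$choice <- collapsed

# The data frame now holds only atomic columns and can be written as CSV.
out_file <- tempfile(fileext = ".csv")
write.csv(toy, out_file, row.names = FALSE)
toy
```

After collapsing, each cell holds a plain string again, which is what most other programs expect when they read a CSV file.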
Otherwise, if the data will be used exclusively in R, we suggest saving it as an R object (.RData
or .RDS):
```{r write data as an R object, eval = FALSE}
# As an .RData file. When using `load()`, note that the data will be stored in the `data` object
# in the global environment.
save(
  data,
  file = 'data.RData'
)

# As an RDS file.
saveRDS(
  data,
  file = 'data.RDS'
)
```
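As a quick check that R's own formats keep list columns intact, the following sketch writes a toy data frame to a temporary RDS file and reads it back. The `toy` object and the temporary path are hypothetical and stand in for the real data.

```{r sketch of an RDS round trip}
# A toy data frame with a hypothetical list column.
toy <- data.frame(id = 1:2)
toy$choice <- list(c(1L, 4L), 6L)

# Write to a temporary file and read the object back in.
rds_file <- tempfile(fileext = ".RDS")
saveRDS(toy, file = rds_file)
restored <- readRDS(rds_file)

# The list column is preserved as a list.
is.list(restored$choice)
identical(toy, restored)
```

Unlike `load()`, `readRDS()` returns the object, so it can be assigned to any name.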
## Obtaining response rates
### response_rate function
Many common operations on Experience Sampling Methodology (ESM) data involve
participants' response rates. The `response_rate()` function
calculates the response rate per participant, either for the entire duration of the
study or for a specific time frame.
The function takes a `valid_col` argument, a logical column that
stores whether each beep was answered by the participant, and a
`participant_col` argument, which identifies each distinct participant.
We will show how to use this function with `example_data`, which contains data from the same
study as the `example_basic.csv` file, but after some cleaning.
```{r calculate response rate}
example_data
response_rates <- response_rate(
  data = example_data,
  valid_col = answered,
  participant_col = participant
)
response_rates
```
The function returns a data frame with:
* The `participant` column, as specified in `participant_col`.
* The `number_of_beeps` column, the number of beeps used to calculate the response rate.
* The `response_rate` column, which is the proportion of valid responses
  (specified in `valid_col`) per participant.
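To make the definition concrete, the same quantities can be computed by hand in base R. This is a sketch on hypothetical toy data, not the implementation of `response_rate()`; the column names simply mirror the ones described above.

```{r sketch of a response rate by hand}
# Hypothetical beeps: participant 1 answered 3 of 4, participant 2 answered 1 of 2.
toy <- data.frame(
  participant = c(1, 1, 1, 1, 2, 2),
  answered = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# The mean of a logical vector is the proportion of TRUE values.
rate <- tapply(toy$answered, toy$participant, mean)
n_beeps <- tapply(toy$answered, toy$participant, length)

rates <- data.frame(
  participant = names(rate),
  number_of_beeps = as.vector(n_beeps),
  response_rate = as.vector(rate)
)
rates
```

The response rate is therefore the number of answered beeps divided by the number of beeps sent to that participant.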
The output of this function can further be used to identify participants with
low response rates:
```{r show low response rates}
response_rates[response_rates$response_rate < 0.5,]
```
We may also want to see participants' response rates during
a specific period (for example, if we think a participant's compliance
dropped significantly after a certain date). In this case, we supply the
(otherwise optional) `time_col` argument, which should contain
times stored as `POSIXct` objects, and specify the period that we are
interested in (in the format `yyyy-mm-dd` or `yyyy/mm/dd`):
```{r calculate response rate after 15th of May 2024}
response_rates_after_15 <- response_rate(
  data = example_data,
  valid_col = answered,
  participant_col = participant,
  time_col = sent,
  period_start = '2024-05-15'
)
```
This returns each participant's response rate after the 15th of May 2024.
```{r show low response rates after 15th of May 2024}
response_rates_after_15
```
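Conceptually, restricting the calculation to a period amounts to filtering the beeps by their timestamp before computing the proportion. The base R sketch below illustrates this on hypothetical toy data. It assumes that the start date itself is included in the period and that times are in UTC; `response_rate()` may handle boundaries and time zones differently.

```{r sketch of a period filter}
# Hypothetical beeps with POSIXct timestamps.
toy <- data.frame(
  participant = c(1, 1, 1, 2, 2, 2),
  sent = as.POSIXct(
    rep(c("2024-05-14 09:00:00", "2024-05-15 09:00:00", "2024-05-16 09:00:00"), 2),
    tz = "UTC"
  ),
  answered = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE)
)

# Keep only beeps sent on or after the start of the period (assumed inclusive).
period_start <- as.POSIXct("2024-05-15", tz = "UTC")
in_period <- toy[toy$sent >= period_start, ]

# Proportion of answered beeps per participant within the period.
period_rate <- tapply(in_period$answered, in_period$participant, mean)
period_rate
```

Comparing these values with the rates for the whole study shows whether compliance changed after the chosen date.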
### plot_response_rate function
The `plot_response_rate()` function plots participants' response rates, which helps
identify patterns such as response rates dropping over time.
```{r plot response rate, fig.width=7, fig.height=5}
plot_response_rate(
  data = example_data,
  time_col = sent,
  participant_col = participant,
  valid_col = answered
)
```
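A plot of this kind rests on a response rate per participant and per time unit. The sketch below computes a daily rate on hypothetical toy data with base R. It is only an illustration of the underlying idea; `plot_response_rate()` may aggregate the data differently.

```{r sketch of a daily response rate}
# Hypothetical beeps: two per day for each of two participants.
times <- rep(
  c("2024-05-14 09:00:00", "2024-05-14 15:00:00",
    "2024-05-15 09:00:00", "2024-05-15 15:00:00"),
  2
)
toy <- data.frame(
  participant = rep(c(1, 2), each = 4),
  sent = as.POSIXct(times, tz = "UTC"),
  answered = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Derive the calendar day of each beep, then average per participant and day.
toy$day <- as.Date(toy$sent, tz = "UTC")
daily <- aggregate(answered ~ participant + day, data = toy, FUN = mean)
daily
```

Each row of `daily` corresponds to one point that could be drawn for a participant on a given day.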
Note that the resulting plot can be further customized using the `ggplot2`
package.
```{r customize plot response rate plot, fig.width=7, fig.height=5}
library(ggplot2)
plot_response_rate(
  data = example_data,
  time_col = sent,
  participant_col = participant,
  valid_col = answered
) +
  theme_minimal() +
  ggtitle('Response rate over time') +
  xlab('Day in study')
```