--- title: 'gtfs2gps: Converting GTFS data to GPS-like format' author: "Rafael H. M. Pereira, Pedro R. Andrade, Joao Bazzo" output: rmarkdown::html_vignette abstract: Package `gtfs2gps` has a set of functions to convert public transport GTFS data to GPS-like format using `data.table`. It also has some functions to convert both representations to simple feature format. urlcolor: blue vignette: | %\VignetteIndexEntry{gtfs2gps} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction Package `gtfs2gps` allows users to convert public transport GTFS data into a single `data.table` format with GPS-like records, which can then be used in various applications such as running transport simulations or scenario analyses. Before using the package, just install it from GitHub. ```{r, eval = FALSE} install.packages("gtfs2gps") ``` # Loading data After loading the package, GTFS data can be read into R by using `read_gtfs()`. This function gets a zipped GTFS file and returns a list of `data.table` objects. The returning list contains the data of each GTFS file indexed according to their file names without extension. ```{r} library("gtfs2gps") poa <- read_gtfs(system.file("extdata/poa.zip", package ="gtfs2gps")) names(poa) head(poa$trips) ``` Note that not all GTFS files are loaded into R. This function only loads the necessary data to spatially and temporally handle trips and stops, which are: "shapes.txt", "stop_times.txt", "stops.txt", "trips.txt", "agency.txt", "calendar.txt", "routes.txt", and "frequencies.txt", with this last four being optional. If a given GTFS zipped file does not contain all of these required files then `read_gtfs()` will stop with an error. In the code below we filter only the shape ids `c("T2-1", "A141-1") to allow faster execution the next scripts. ```{r} object.size(poa) |> format(units = "Kb") poa_small <- gtfstools::filter_by_shape_id(poa, c("T2-1", "A141-1")) object.size(poa_small) |> format(units = "Kb") ``` We can then easily convert the data to simple feature format and plot them. ```{r poa_small_shapes_sf, message = FALSE} poa_small_shapes_sf <- gtfs2gps::gtfs_shapes_as_sf(poa_small) poa_small_stops_sf <- gtfs2gps::gtfs_stops_as_sf(poa_small) plot(sf::st_geometry(poa_small_shapes_sf)) plot(sf::st_geometry(poa_small_stops_sf), pch = 20, col = "red", add = TRUE) box() ``` After subsetting the data, it is also possible to save it as a new GTFS file using `write_gtfs()`, as shown below. ```{r, message = FALSE} temp_gtfs <- tempfile(pattern = 'poa_small', fileext = '.zip') gtfs2gps::write_gtfs(poa_small, temp_gtfs) ``` # Converting to GPS-like format To convert GTFS to GPS-like format, use `gtfs2gps()`. This is the core function of the package. It takes a GTFS zipped file as an input and returns a `data.table` where each row represents a 'GPS-like' data point for every trip in the GTFS file. In summary, this function interpolates the space-time position of each vehicle in each trip considering the network distance and average speed between stops. The function samples the timestamp of each vehicle every $15m$ by default, but the user can set a different value in the `spatial_resolution` argument. See the example below. ```{r, message = FALSE} poa_gps <- gtfs2gps(temp_gtfs, spatial_resolution = 100) head(poa_gps) ``` The following figure maps the first 100 data points of the sample data we processed. They can be converted to `simple feature` points or linestring. ```{r, message = FALSE} poa_gps60 <- poa_gps[1:100, ] # points poa_gps60_sfpoints <- gps_as_sfpoints(poa_gps60) # linestring poa_gps60_sflinestring <- gps_as_sflinestring(poa_gps60) # plot plot(sf::st_geometry(poa_gps60_sfpoints), pch = 20) plot(sf::st_geometry(poa_gps60_sflinestring), col = "blue", add = TRUE) box() ``` The function `gtfs2gps()` automatically recognizes whether the GTFS data brings detailed `stop_times.txt` information or whether it is a `frequency.txt` GTFS file. A sample data of a GTFS with detailed `stop_times.txt` cab be found below: ```{r, message = FALSE} poa <- system.file("extdata/poa.zip", package ="gtfs2gps") poa_gps <- gtfs2gps(poa, spatial_resolution = 50) poa_gps_sflinestrig <- gps_as_sfpoints(poa_gps) plot(sf::st_geometry(poa_gps_sflinestrig[1:200,])) box() ``` # Methodological note For a given trip, the function `gtfs2gps` calculates the average speed between each pair of consecutive stops given by the ratio between cumulative network distance `S` and departure time `t` for a consecutive pair of valid stop_ids (`i`), ```{r equation, echo = FALSE, message = FALSE} knitr::include_graphics("https://github.com/ipeaGIT/gtfs2gps/blob/master/man/figures/equation1.png?raw=true") ``` Since the beginning of each trip usually starts before the first stop_id, the mean speed cannot be calculated as shown in the previous equation because information on `i` period does not exist. In this case, the function consider the mean speed for the whole trip. It also happens after the last valid stop_id (`N`) of the trips, where info on `i + 1` also does not exist. ```{r speed, echo = FALSE, message = FALSE} knitr::include_graphics("https://github.com/ipeaGIT/gtfs2gps/blob/master/man/figures/speed.PNG?raw=true") ``` # Final remarks If you have any suggestions or want to report an error, please visit the GitHub page of the package [here](https://github.com/ipeaGIT/gtfs2gps).