--- title: "Quick start guide" author: "Martin Westgate & Dax Kellie" date: '2024-11-19' output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Quick start guide} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- galah is an R interface to biodiversity data hosted by the Global Biodiversity Information Facility ([GBIF](https://www.gbif.org)) and its subsidiary node organisations. GBIF and its partner nodes collate and store observations of individual life forms using the ['Darwin Core'](https://dwc.tdwg.org) data standard. # Installation To install from CRAN: ``` r install.packages("galah") ``` Or install the development version from GitHub: ``` r install.packages("remotes") remotes::install_github("AtlasOfLivingAustralia/galah") ``` Load the package ``` r library(galah) ``` # Configuration By default, galah downloads information from the Atlas of Living Australia (ALA). To show the full list of organisations currently supported by galah, use `show_all(atlases)`. ``` r show_all(atlases) ``` ``` ## # A tibble: 10 × 4 ## region institution acronym url ## ## 1 Australia Atlas of Living Australia ALA https://www.ala.org.au ## 2 Austria Biodiversitäts-Atlas Österreich BAO https://biodiversityat… ## 3 Brazil Sistemas de Informações sobre a Biodiversidade Brasileira SiBBr https://sibbr.gov.br ## 4 France Portail français d'accès aux données d'observation sur les espèces OpenObs https://openobs.mnhn.fr ## 5 Global Global Biodiversity Information Facility GBIF https://gbif.org ## 6 Guatemala Sistema Nacional de Información sobre Diversidad Biológica de Guatemala SNIBgt https://snib.conap.gob… ## 7 Portugal GBIF Portugal GBIF.pt https://www.gbif.pt ## 8 Spain GBIF Spain GBIF.es https://gbif.es ## 9 Sweden Swedish Biodiversity Data Infrastructure SBDI https://biodiversityda… ## 10 United Kingdom National Biodiversity Network NBN https://nbn.org.uk ``` Use `galah_config()` to set the node organisation using its region, name, or acronym. Once set, `galah` will automatically populate the server configuration for your selected GBIF node. To download occurrence records from your chosen GBIF node, you will need to register an account with them (using their website), then provide your registration email to galah. To download from GBIF, you will need to provide the email, username, and password. ``` r galah_config(atlas = "GBIF", username = "user1", email = "email@email.com", password = "my_password") ``` You can find a full list of configuration options by running `?galah_config`. # Basic syntax The standard method to construct queries in `{galah}` is via piped functions. Pipes in `galah` start with the `galah_call()` function, and typically end with `collect()`, though `collapse()` and `compute()` are also supported. The development team use the base pipe by default (`|>`), but the `{magrittr}` pipe (`%>%`) should work too. ``` r galah_config(atlas = "ALA", verbose = FALSE) galah_call() |> count() |> collect() ``` ``` ## # A tibble: 1 × 1 ## count ## ## 1 146185520 ``` To pass more complex queries, you can use additional `{dplyr}` functions such as `filter()`, `select()`, and `group_by()`. ``` r galah_call() |> filter(year >= 2020) |> count() |> collect() ``` ``` ## # A tibble: 1 × 1 ## count ## ## 1 40200358 ``` Each GBIF node allows you to query using their own set of in-built fields. You can investigate which fields are available using `show_all()` and `search_all()`: ``` r search_all(fields, "australian states") ``` ``` ## # A tibble: 2 × 3 ## id description type ## ## 1 cl2013 ASGS Australian States and Territories fields ## 2 cl22 Australian States and Territories fields ``` # Taxonomic searches To narrow your search to a particular taxonomic group, use `identify()`. Note that this function only accepts scientific names and is not case sensitive. It's good practice to first use `search_taxa()` to check that the taxa you provide returns the correct taxonomic results. ``` r search_taxa("reptilia") # Check whether taxonomic info is correct ``` ``` ## # A tibble: 1 × 9 ## search_term scientific_name taxon_concept_id rank match_type kingdom phylum class issues ## ## 1 reptilia REPTILIA https://biodiversity.org.au/afd/taxa/682e1228… class exactMatch Animal… Chord… Rept… noIss… ``` ``` r galah_call() |> identify("reptilia") |> filter(year >= 2020) |> count() |> collect() ``` ``` ## # A tibble: 1 × 1 ## count ## ## 1 338434 ``` If you want to query something other than the number of records, modify the `type` argument in `galah_call()`. Here we'll query the number of species: ``` r galah_call(type = "species") |> identify("reptilia") |> filter(year >= 2020) |> count() |> collect() ``` ``` ## # A tibble: 1 × 1 ## count ## ## 1 883 ``` # Download To download records---rather than find how many records are available---simply remove the `count()` function from your pipe. ``` r result <- galah_call() |> identify("Litoria") |> filter(year >= 2020, cl22 == "Tasmania") |> select(basisOfRecord, group = "basic") |> collect() ``` ``` ## Retrying in 1 seconds. ``` ``` r result |> head() ``` ``` ## # A tibble: 6 × 9 ## recordID scientificName taxonConceptID decimalLatitude decimalLongitude eventDate occurrenceStatus ## ## 1 00052544-d943-42e9… Litoria ewing… https://biodi… -42.9 147. 2022-09-19 00:00:00 PRESENT ## 2 00168ca6-84d0-4af1… Litoria ranif… https://biodi… -41.2 146. 2023-12-21 10:20:19 PRESENT ## 3 001a43fe-8586-4064… Litoria ewing… https://biodi… -43.0 147. 2021-08-07 00:00:00 PRESENT ## 4 00250163-ec50-4eda… Litoria ranif… https://biodi… -41.2 147. 2023-08-23 11:49:28 PRESENT ## 5 003e0f63-9f95-4af9… Litoria ewing… https://biodi… -42.9 148. 2022-12-24 06:27:00 PRESENT ## 6 0070521f-bb45-46fb… Litoria ewing… https://biodi… -43.1 147. 2023-12-20 14:29:23 PRESENT ## # ℹ 2 more variables: dataResourceName , basisOfRecord ``` Check out our other vignettes for more detail on how to use these functions.