\documentclass{article} \usepackage{nameref} \usepackage{url} \usepackage{listings} \usepackage[utf8]{inputenc} \usepackage{longtable} \usepackage{afterpage} \usepackage{enumitem} \usepackage[normalem]{ulem} \lstset{language=R} \bibliographystyle{plain} % \VignetteIndexEntry{TR8: Extract traits data for plant species} %\VignetteDepends{TR8} \title{TR8: Extract traits data for plant species} \author{Gionata Bocci\\Pisa (ITALY)\\ {boccigionata@gmail.com}} \begin{document} \maketitle \SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,strip.white=true,keep.source=TRUE} \SweaveOpts{include=FALSE} \DefineVerbatimEnvironment{Sinput}{Verbatim} {xleftmargin=2em} \DefineVerbatimEnvironment{Soutput}{Verbatim}{xleftmargin=2em} \DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em} \fvset{listparameters={\setlength{\topsep}{0pt}}} \renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}} \section{Rationale} \label{sec:rationale} @ <>= options(width = 60) @ %def The \texttt{TR8} package has been built in order to provide the user with the possibility of easily retrieving traits data for plant species from the following publicly available databases: \begin{description} \item[Ecological Flora of the British Isles] \url{http://www.ecoflora.co.uk/} \cite{ecoflora} \item[LEDA traitbase] \url{http://www.leda-traitbase.org/LEDAportal/} \cite{leda} \item[Ellenberg values for Italian Flora] \cite{pignatti} \item[Flowering period for Italian Flora] \cite{pignatti} (data retrieved from \url{http://luirig.altervista.org/}) \item[Mycorrhizal intensity database] \cite{amf} \item[MycoFlor database]\cite{Hempel2013} \item[BROT]\cite{BROT_1}\cite{BROT_2} \end{description} Please note that not all the traits available on the listed databases are downloaded by the package: this may change in future versions of the package (ie. some functionalities may be added and more traits will be made available). As of September 2024 the following databases are no more available online (thus TR8 is not available to retrieve traits data): \begin{description} \item[Biolflor] \url{http://www.ufz.de/biolflor/index.jsp} \cite{biolflor} \item[Catminat database]\cite{catminat} \end{description} \section{Installation} \label{sec:installation} The \texttt{TR8} package is available on CRAN, thus it can be easily installed through: @ <>= install.packages("TR8",dependencies = TRUE) @ %def The option \texttt{dependencies = TRUE} takes care of installing those packages which are needed by \texttt{TR8} to work properly. Once the package is installed, you can load it with: @ <>= library(TR8) @ %def Please note that: \begin{description} \item[The user is asked to always cite the data sources: ] the development of traits databases is a long and costly process, thus all the users of the \texttt{TR8} package are asked (and reminded \textbf{every time} they load the package) to always cite the original sources of the data (see paragraph \ref{sec:citing}). \end{description} %% \subsection{A very important note for Windows users} %% If you want to be able to use data from the \texttt{Catminat} %% database you need to have \texttt{Perl} installed on your machine. %% If you do not have it, please go at %% \url{https://www.perl.org/get.html} and install either %% \texttt{Strawberry Perl} (the one I suggest) or \texttt{ActiveState Perl}. \subsection{Using the development version} The devel version of the package is hosted on github at \url{https://github.com/GioBo/TR8}: to use this version (instead of the stable one, released from CRAN), you'll need the \texttt{devtools} package (\url{https://github.com/hadley/devtools}): @ <>= ## install the package install.packages("devtools") ## load it library(devtools) ## activate dev_mode dev_mode(on=T) ## install TR8 install_github("GioBo/TR8",ref="master") ## load it library(TR8) ## you can now work with TR8 functions ## if you want to go back and use the CRAN version ## already installed, simply deactivate dev_mode dev_mode(on=F) @ %def \section{Simple usage} \label{sec:usage} Using the \texttt{TR8} package is fairly simple: users just need to call the \texttt{tr8} function passing, as arguments, a vector of plant species names (\textbf{without authors' names!}\footnote{This is needed since some traitbases do not include authors' names in the species' names.}) and a vector containing the codes corresponding to the traits which are to be downloaded: @ <>= ## a vector containing a list of plant species names my_species<-c("Apium graveolens","Holcus mollis","Lathyrus sylvestris") ## a vector of traits to_be_downloaded<-c("life_form_P","ell_L_it") ## now run tr8 and store the results in the my_traits object my_traits<-tr8(species_list = my_species,download_list = to_be_downloaded, allow_persistent=TRUE) @ %def The codes which are accepted by \textsc{tr8} are listed in the \texttt{available\_tr8} database: @ <>= library(TR8) @ %def @ <>= ## see the firs lines of available_tr8 database head(available_tr8) @ %def The database is composed of three columns: \begin{description} \item[short\_code] contains the codes that should be passed to the \texttt{download\_list} argument of the \texttt{tr8} function. \item[description] contains short description of each trait (please refer to the original sources for detailed descriptions). \item[db] refers to the databases from which are providing traits data \end{description} Suppose the user is interested in downloading the \textit{maximum height}, the \textit{leaf area} and the \textit{life form} (which are available through the \textit{Ecolfora} database) for \textit{Salix alba} and \textit{Populus nigra} and store the resulting data in the \texttt{my\_Data} object; the command should be: @ <>= my_species<-c("Salix alba","Populus nigra") my_traits<-c("h_max","le_area","li_form") my_Data<-tr8(species_list = my_species, download_list = my_traits, allow_persistent=TRUE) @ %def The \texttt{tr8} function will take care of downloading the data and store them in the \texttt{my\_Data} object; you can see the results by printing them: @ <>= ## see the downloaded data print(my_Data) @ %def Or you can convert them to a data frame using the \texttt{extract\_traits} function: @ <>= traits_dataframe<-extract_traits(my_Data) @ %def All the traits are now contained in a data frame with species as rows and columns as traits; where no trait data were available, you will see a \texttt{NA}. In order to make the dataframe more readable, traits' names (i.e. columns' names) are converted to shorter codes: to see a brief explanation of the codes used to identify the traits, use the \texttt{lookup} function: <>= lookup(my_Data) @ %def The object returned by the \texttt{lookup} function can also be stored in order to be available for further elaborations: @ <>= my_lookup<-lookup(my_Data) head(my_lookup) @ %def \subsection{Checking retrieved data} Several steps can go wrong during the data retrieval process (e.g. a database may contain two entries for the same species); \texttt{tr8()} will keep track of some of this problematic cases and the \texttt{issues()} function can be used to see whether any problem was faced during the process. @ <>= my_species<-c("Salix alba","Populus nigra") my_traits<-c("h_max","le_area","li_form") my_Data<-tr8(species_list = my_species, download_list = my_traits, allow_persistent=TRUE) issues(my_Data) @ %def \subsection{Interactive use of \texttt{tr8}} Up to now we've been using the \texttt{tr8} function in a non-interactive way. In order to help those user which are more familiar with a GUI approach, the function can also be run setting the \texttt{gui\_config} parameter to \texttt{TRUE} (without providing any trait to the \texttt{download\_list} parameter) and a multi-panel window will appear: the user is asked to choose those traits which are to be downloaded from the various databases \footnote{A note for Mac users: the GUI relies on the \texttt{Tcl/Tk} toolkit, thus if you want to run the GUI, please make sure that the \texttt{X11} package is installed - see "Tcl/Tk" issues at \url{http://cran.r-project.org/bin/macosx/RMacOSX-FAQ.html}}. For a detailed explanation of each level of a trait, please refer to the original websites (all the databases listed in the references provide the users with very precise and detailed descriptions). Typically users will have their vegetation data in the form of a \textit{sites}*\textit{species} dataframe (or matrix), thus they may want to extract traits data for the whole dataset (this time using the GUI to select traits), i.e. : @ <>= ## suppose veg_data is our dataframe with ## plant species as columns and sites as rows ## extract species names specie_names<-names(veg_data) ## use the tr8() function ## and tick those traits of interest in the pop-up window my_traits<-tr8(species_names,gui_config=TRUE, allow_persistent=TRUE) ## print the results print(my_traits) @ %def \section{Interpreting retrieved data} \label{sec:interpreting} Please note that for many traits there is more than one entry in the original databases: in those cases, in order to obtain a single value the following strategy was adopted: \begin{description} \item[Quantitative traits] the mean of all the values was calculated (e.g. when multiple values for "Seed weight mean" are available, the mean of these value is calculated) \item[Qualitative traits] all the values are taken into account and "joined" together in a single string (the values are separated by a score "$-$") \end{description} \textbf{Nota bene}: in some cases some traits are stored as \textit{string} in the original databases, even though they should be treated as numbers (e.g. the number \textit{five} is stored as a string - i.e. "5", not as the numeric value \texttt{5}): in those case \texttt{tr8} function is not able to interpret that entry as a numeric, thus, applying the above mentioned criteria to merge multiple traits, strange outputs may result from \texttt{tr8}, e.g. if a species has two entries for the trait \texttt{height} - day 3 and 3.5 meters - the merged value will not be the numeric mean (3.25) but the union of the two strings ("3-3.5"). \textbf{Note for BROT database}: BROT database contains several entries for the same trait for some species: this may be due to the fact that several sources of data were used and/or that data for different regions were reported. In the current version of \texttt{TR8} these multiple data are treated as follows: \begin{itemize} \item if there are both numeric (quantitative) and string (qualitative) entries, numeric ones are given the precedence and the mean of all numeric entries is used as the returned trait value \item if several qualitative entries are present (but no quantitative ones), only the most abundant level is reported (in case ties are present, eg. two levels are reported the same number of times, both values are reported, separated by a comma). \end{itemize} \section{Citing the package} Please use the following citation when using \texttt{TR8} package:\\ \vspace{3mm} Bocci, G. (2015). TR8: an R package for easily retrieving plant species traits. Methods in Ecology and Evolution, 6(3):347-350. \vspace{3mm} Or, if you use BibTeX: \begin{verbatim} @Article{, author = {Bocci Gionata}, title = {TR8: an R package for easily retrieving plant species traits}, journal = {Methods in Ecology and Evolution}, year = {2015}, volume = {6}, number = {3}, pages = {347--350}, url = {http://dx.doi.org/10.1111/2041-210X.12327}, } \end{verbatim} \section{Citing sources of information} \label{sec:citing} Users of the \texttt{TR8} package should always cite the sources of information which provided the traits data: the correct citations to be used for the retrieved data can be obtained through the \texttt{bib} method; just use: <>= bib(my_traits) @ %def % \section{Some notes on using \texttt{tr8}} % \subsubsection*{A NOTE OF CAUTION} % Searching the web is a time (and % Internet band) consuming activity, thus the higher the number of % your plant species and the traits to be retrieved, the longer it will take to \texttt{tr8()} to complete its job. Moreover, in order % not to overflow the remote databases with \texttt{http} requests, the \texttt{tr8} function will always pause between one search and the following one. % \subsubsection*{A (SECOND) NOTE OF CAUTION} % Some users adopt the following workflow for analysing their vegetation data: % \begin{enumerate} % \item insert vegetation data into a \textit{spreadsheet file} with species as % columns' and sites' as rows % \item export the spreadsheet file as a \texttt{.csv} file % \item import the \texttt{.csv} file into a \textbf{R} dataframe. % \end{enumerate} % When following these steps, a dot (".") will be inserted between % Genus and Species of each plant species name (i.e. column names in the % \texttt{R} dataframe will not be in the form \texttt{c("Abies alba", "Salix % alba")} but in the form \texttt{c("Abies.alba", "Salix.alba")}). % This may cause problems for further processing of plants' % species names, thus, in order to avoid this problem, please use the \texttt{check.names=F} % option in \texttt{read.csv}. E.g. suppose that % \texttt{my\_veg\_data.csv} is the \texttt{csv} file: in the % \textbf{R} console, one should use: % <>= % My_data<-read.csv("my_veg_data.csv", % header=T,row.names=1,check.names=F) % @ %def \section{Suggested workflow} We strongly suggest to always check plant species names with the \texttt{tnrs} function (from the \texttt{taxize} package) before using the \texttt{tr8} function; thus a typical workflow would be the following: \begin{enumerate} \item Check plant species names (e.g. with something like the following - please refere to the \texttt{taxize} package documentation\cite{taxize} for further details) <>= species_names<-names(veg_data) checked_names<-tnrs(species_names,source="iPlant_TNRS") print(checked_names) @ %def Check which species (rows) in the table have a "score" value lower than 1 and check their names; if needed, correct them before using the tr8() function. \item Run \texttt{tr8} (in this case using the GUI): <>= my_traits<-tr8(species_names,gui_config = TRUE, allow_persistent=TRUE) print(my_traits) @ %def Check whether \texttt{tr8()} had any problems in retrieving data: <>= my_traits<-tr8(species_names,gui_config = TRUE, allow_persistent=TRUE) issues(my_traits) @ %def \item You may want to have these traits available as a data frame: just use the \texttt{extract\_traits} function which uses the results of \texttt{tr8} (in this case it's the \texttt{my\_traits} objects) and returns a data frame. <>= traits_df<-extract_traits(my_traits) @ %def \item Observing a big data frame inside \texttt{R} could be difficult, thus users may want to save the \texttt{traits\_df} data frame as a \texttt{.csv} file: @ <>= save(traits_df,file="traits_df.csv") @ %def and then open that file with a spreadsheet software (e.g. LibreOffice). \end{enumerate} \subsection{Dealing with synonyms} The new versions of tr8 does not allow to check for plant species synonims any more (given some issues with remote databases needed for checking species names). % In some cases the same species may be present in different traitbases % under different synonyms. In order to make it easier for the user to deal with % this issue, \texttt{tr8} function has a parameter called % \texttt{synonyms}: when set to true it forces the \texttt{tr8} to call the % \texttt{tnrs} function from the \texttt{taxize} package and retrieve % possible synonims for the provided species names: in this case the % retrieved trait data will have two additional columns, one called % \texttt{synonyms} (for the synonyms found) and another named % \textit{original\_names} which contains the original names passed by % the user with the \texttt{species\_list} % parameter\footnote{\textbf{PLEASE NOTE} that when \texttt{synonyms} % is set to TRUE, the resulting dataframe will have numbers as % row.names: this is mandatory since in a few cases different % species names may share a common synonym and % \texttt{R} does not allow different % rows to share the same name.}. % @ % <>= % my_species<-c("Salix alba","Inula viscosa") % my_traits<-c("h_max","le_area","li_form") % my_Data<-tr8(species_list = my_species, download_list = my_traits, % synonyms=TRUE) % @ %def % In the \texttt{Catminat} traitbase there are some entries which are % in the form "Genus v. subspecies" (e.g. "Myrtus communis % v. communis" and "Myrtus communis v. leucocarpa".); in order to % allow users to search among those subspecies, the parameter % \textsc{catminat\_alternatives} has been introduced in the % \textsc{tr8} function: when set to \textsc{TRUE} the function will % search for entries which contain, in their names, the names % provided \textsc{species\_list}; e.g. if "Myrtus communis" is % included in the \textsc{species\_list}, tr8 will query the following % existing entries in Catminat: "Myrtus communis", "Myrtus communis % v. communis" and "Myrtus communis v. leucocarpa"). % @ % <>= % my_species<-c("Myrtus communis") % ## some traits from Catminat % my_traits<-c("inflorescence_fr","sex_reprod_fr","poll_vect_fr") % my_Data<-tr8(species_list = my_species, download_list = my_traits, % catminat_alternatives=TRUE) % @ %def \section{Further steps} The \texttt{TR8} package comes with another vignette, called \texttt{TR8\_workflow}, which shows a typical workflow describing all the steps needed for retrieving and analysing traits data with \texttt{tr8}, listing the most common problems that could be faced and the possible solutions to fix them.\\ The vignette can be opened from within R, using: @ <>= vignette("TR8_workflow") @ %def Another vignette (called \texttt{Expanding\_TR8}) shows to programmers how sources of data can be added to \texttt{TR8} (i.e. how functions for retrieving data should be written so that they can be easily integrated in \texttt{TR8}). \section{Local storage of remote data} \label{sec:leda} The following databases are stored as files (\texttt{.txt},\texttt{.csv} or \texttt{xlsx}) on the remote servers: \begin{itemize} \item LEDA \item Akhmetzhanova \item MycoFlor % \item Catminat \end{itemize} %% LEDA Traitbase datafiles and Akhmetzhanova database are text files (either \texttt{.txt} or \texttt{.csv} files) which are %% available for download at the LEDA (\url{http://www.leda-traitbase.org/LEDAportal/data_files.jsp}) and at the\textit{ Ecological Archives} websites . These files are (quite) big in size, thus downloading them every time the \texttt{tr8()} function is used is a time consuming activity\footnote{The text files are not distributed together with the \texttt{TR8} package - which would save time and memory when executing the \texttt{tr8()} function - in order to avoid possible licensing conflicts between the \texttt{TR8}' GPL license and these datasets.}. In order to make data retrieval more efficient, when \texttt{tr8} is run AND traits data from the above mentioned databases are requested for the first time, these files are downloaded and an R version (\texttt{.Rda}) copy is stored in a local directory and made available to future requests\footnote{By default these files will be installed in the directories which are commonly used for storing applications' data (which depends on the underlying operating systems; see \url{https://github.com/hadley/rappdirs} for details).}. \section{Update local lookup tables} For some traitbases tr8 uses lookup-tables which contain URLs of the species of interest (i.e. whenever the user search for traits data for some species, tr8 searches for the corresponding URLs in such tables and then retrieve the data from the corresponding webpages); some of the traitbases queried by tr8 are uploaded from time to time thus the function may be unable to retrieve traits data for the most recently uploaded species. The \textsc{tr8\_setup} function allows the user to download the most updated version of such tables; simply run If the user wants to have available the most updated data, @ <>= tr8_setup() @ %def \textbf{BEWARE}: this function takes a long time to run; it's very likely that the most recent version of \textsc{TR8} contains the already up-to-date version of the data, so in most cases will never need to use this function. \bibliography{tr8} \end{document}