
Rapid LC - fungi workflow
The following workflow demonstrates how to search and clean open data on fungal species to test whether they are likely to be evaluated as Least Concern (LC). With the selected LC species, you can then generate CSV files containing the minimum information required to support IUCN Red List Least Concern assessments.
The LCr and rCAT packages are not on CRAN,
but can be installed from their GitHub repositories:
# install LCr from GitHub
remotes::install_github("stevenpbachman/LCr")
# load relevant libraries
library(LCr)
library(dplyr)
library(rCAT)
Load in a list of names
Start by providing a dataframe with your taxon list.
# dataframe of species you want to run through LCr
lc_species <-
  data.frame(sp = c(
    "Morchella americana Clowez & Matherly",
    "Hericium americanum Ginns",
    "Lactarius badiosanguineus Kühner & Romagn"
  ))
Match names against GBIF to get identifiers
Run the get_name_keys function using your dataframe of
names as input. This will check the names against the GBIF names
backbone. The name_column is the column that contains the
taxa to be assessed. This can either contain a binomial or a binomial
and author combined. Including the author with the binomial will help
achieve a better name match.
By default the GBIF matching will select the match with highest
confidence, but you can also set this to fuzzy match using
match = "any", which may provide multiple matches.
# get the GBIF identification key
lc_keys <-
  LCr::get_name_keys(
    df = lc_species,
    name_column = "sp",
    match = "single",
    kingdom = "fungi"
  )
Let’s take a look at the output:
dplyr::glimpse(lc_keys)
Clean keys
LCr is designed to work at species level and where taxonomic status
is accepted. In this case, all names matched the GBIF backbone and were
accepted, but we can run clean_keys anyway.
# remove any problematic records
lc_keys_clean <- LCr::clean_keys(lc_keys)
Get GBIF occurrence data
Now with a clean set of names and identification keys we can query GBIF to obtain occurrence data.
Note that we will use the rgbif package to obtain the
occurrence data. You will need to set up your GBIF credentials to obtain
the downloads. After you have set up an account at GBIF you need to register your
credentials in the R environment - see this post
for an explanation. You may need to restart R after updating the R
environment.
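As a quick check before requesting a download, the sketch below reports which credentials are missing from the current R session. It assumes the three environment variables that rgbif reads for downloads (GBIF_USER, GBIF_PWD and GBIF_EMAIL); the helper name missing_gbif_credentials is hypothetical and is not part of LCr or rgbif.

```r
# Environment variables that rgbif looks up for download requests
gbif_vars <- c("GBIF_USER", "GBIF_PWD", "GBIF_EMAIL")

# Hypothetical helper: return the names of any credentials that are not set
missing_gbif_credentials <- function(vars = gbif_vars) {
  values <- unname(Sys.getenv(vars, unset = ""))
  vars[!nzchar(values)]
}

missing <- missing_gbif_credentials()
if (length(missing) > 0) {
  message("Not set: ", paste(missing, collapse = ", "),
          ". Add them to .Renviron and restart R.")
} else {
  message("All GBIF credentials found.")
}
```

If any variable is reported as missing, add it to your .Renviron file and restart R before running the download mode.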
The get_gbif_occs function has two modes - search or
download. Search mode uses rgbif::occ_search and provides a
fast, exploratory search. Use the download mode when you want to
generate a citable DOI for your assessment, accessed
via gbif_occs$citation.
# get the raw GBIF occs - with timer
start_time <- Sys.time()
gbif_occs <- LCr::get_gbif_occs(lc_keys_clean, mode = "search")
end_time <- Sys.time()
check <- end_time - start_time
print(check)
Data quality checks
We can now run some tests, mostly based on the geospatial
information, and flag records that are potentially problematic and
should be removed. As we don’t currently have a dataset for native
ranges for fungi we ignore the native_ranges and
buffer parameters of the check_occs function.
The output will be two objects, the flagged data and a summary of the
flagged records.
# run the occurrence quality checks
checked_occs <- LCr::check_occs(gbif_occs$points)
# flagged occurrences and summary table
flagged_occs <- checked_occs$checked_data
flagged_sum <- checked_occs$summary
Now we can clean the data using the clean_occs function
with the default settings, meaning all flagged occurrences are removed.
We then create objects containing valid data (cleaned) and problem data
(flagged as problematic). You can save these data for your records.
# Clean using all available flags
cleaned_result <- LCr::clean_occs(flagged_occs)
# Create objects for clean and problem records
valid_data <- cleaned_result$clean_occs
problem_data <- cleaned_result$problem_occs
# For reference, you may want to save these as CSV files
write.csv(valid_data, file = "valid_occs.csv", row.names = FALSE)
write.csv(problem_data, file = "problem_occs.csv", row.names = FALSE)
Run the Least Concern tests
Now we have a clean dataset that is ready for the Least Concern test. Depending on the data available, either 4 or 5 metrics are generated (the number of WGSRPD regions is calculated for plants only) and checked against the following default thresholds:
- Extent of occurrence (EOO) > 30,000 km²
- Area of occupancy (AOO) > 3,000 km²
- Number of cleaned records > 75
- Number of WGSRPD regions > 5
- Number of recent (< 30 years) occurrence records > 50
The core parameters (EOO and AOO) must both be above the thresholds and at least 2 of 3 remaining parameters (number of cleaned points, regions, recent records) must also be above the thresholds.
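The rule above can be sketched as a small function. This is an illustration of the logic as described here, not LCr's own implementation: the function name is_likely_lc and its argument names are hypothetical, and treating a missing regions count (as for fungi) as simply "not passed" is an assumption of this sketch.

```r
# Hypothetical helper illustrating the threshold rule; not part of LCr
is_likely_lc <- function(eoo, aoo, n_records, n_recent, n_regions = NA) {
  # Core parameters: both EOO and AOO must exceed their thresholds
  core_ok <- isTRUE(eoo > 30000) && isTRUE(aoo > 3000)
  # Supporting parameters: at least two must exceed their thresholds.
  # A missing value (NA) is dropped, so it never counts as a pass (assumption).
  supporting <- c(n_records > 75, n_regions > 5, n_recent > 50)
  n_passed <- sum(supporting, na.rm = TRUE)
  core_ok && n_passed >= 2
}

# A widespread, well-recorded species with no regions metric
is_likely_lc(eoo = 50000, aoo = 4000, n_records = 100, n_recent = 60)
```

Under this sketch, a species with no regions metric needs both the cleaned-record and recent-record counts to exceed their thresholds.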
# check EOO, AOO, number of records and number of recent records
lc_test <- LCr::make_metrics(valid_data, keys = lc_keys_clean)
Now we can see which species are estimated to be LC using the
leastconcern column and can filter on all those estimated
to be LC. However, it should be noted that the user has full control
over which species they want to designate as Least Concern. Other
predictive methods to estimate LC such as Bachman et al. (2024)
can be used instead of this simple test.
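The code in the next section refers to an object named lc_final, which holds the species you have chosen to treat as LC. A minimal sketch of that filtering step is shown here on a made-up stand-in table. It assumes that the leastconcern column is logical and that the table carries a GBIF_usageKey column; check both against your own lc_test output, and note that the keys and values below are invented for illustration.

```r
# Made-up stand-in for the make_metrics() output; values are illustrative only
lc_test_example <- data.frame(
  GBIF_usageKey = c(1001, 1002, 1003),
  leastconcern = c(TRUE, FALSE, TRUE)
)

# Keep only rows flagged as Least Concern; %in% TRUE also drops NA values
lc_final_example <-
  lc_test_example[lc_test_example$leastconcern %in% TRUE, ]

lc_final_example$GBIF_usageKey
```

Applied to the real output, the same subsetting on lc_test would give lc_final, which you can then edit by hand if you want to add or remove species.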
Generate the SIS CSV files
Now we have the final set of species and can proceed with generating the IUCN standard point file and the SIS Connect CSV files.
Note: Remember that the valid occurrence data was generated and cleaned based on the full species list, but we have now reduced that list to only the species we think are LC. When we generate the IUCN standard point file we need to make sure that we filter the occurrence data by the chosen LC species.
You will also need to define some parameters for the point file and CSV files:
# define parameters - the person who is assessing the species
first_name <- "Steven"
second_name <- "Bachman"
email <- "s.bachman@kew.org"
institution <- "Royal Botanic Gardens, Kew"
# filter the valid occurrence data on the LC species
valid_data_LC <- valid_data %>%
  filter(taxonKey %in% lc_final$GBIF_usageKey)
# generate the point file
sis_point_file <- LCr::make_sis_occs(
  valid_data_LC,
  first_name = first_name,
  second_name = second_name,
  institution = institution
)
# write the SIS point file to disk
write.csv(
  sis_point_file,
  file = "SIS_points.csv",
  row.names = FALSE,
  fileEncoding = "UTF-8",
  na = ""
)
The make_sis_occs function includes some validation
checks, so it may report issues with the file that will need to be resolved.
In that case, write the file to disk, fix the errors, import the file and rerun
the make_sis_occs function. Note that when writing the
final SIS point file we specify UTF-8 encoding and convert
NA values to blanks (na = "") in order to
ensure the point file is compliant with IUCN standards.
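To see what these two arguments do, the sketch below writes a small made-up table to a temporary file and reads it back. The column names and values are invented for illustration and do not follow the IUCN point file schema.

```r
# Made-up example table with a missing value
example_points <- data.frame(
  sci_name = c("Hericium americanum", "Morchella americana"),
  notes = c(NA, "checked"),
  stringsAsFactors = FALSE
)

# Write with UTF-8 encoding and blanks in place of NA
tmp <- tempfile(fileext = ".csv")
write.csv(example_points, file = tmp, row.names = FALSE,
          fileEncoding = "UTF-8", na = "")

# Inspect the raw text: the missing value appears as an empty field
raw_lines <- readLines(tmp, encoding = "UTF-8")
print(raw_lines)

# Re-import after manual fixes, turning blanks back into NA
reimported <- read.csv(tmp, fileEncoding = "UTF-8", na.strings = "",
                       stringsAsFactors = FALSE)
```

The same read.csv call, pointed at your edited file, is one way to import a corrected point file before rerunning make_sis_occs.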
We can now run the make_sis_csvs function to
automatically generate the CSV files.
One final step before doing so is to generate ‘native ranges’. We don’t have native ranges for fungi in the same way we have for plants, where species’ ranges are recorded using the World Geographic Scheme for Recording Plant Distributions (WGSRPD). However, we can generate the same WGSRPD regions based on where our cleaned occurrence records are found. We can then use this as the basis for making the list of IUCN land regions used in the countries.csv file.
# get WGSRPD regions based on points
native_ranges <- get_occs_range(sis_point_file)
# get SIS files
lc_sis_files <- LCr::make_sis_csvs(
  unique_id = lc_final$GBIF_usageKey,
  first_name = "Steven",
  second_name = "Bachman",
  email = "s.bachman@kew.org",
  institution = "Royal Botanic Gardens, Kew",
  family = lc_final$GBIF_family,
  genus = lc_final$GBIF_genus,
  species = lc_final$GBIF_species,
  gbif_ref = gbif_occs$citation,
  taxonomicAuthority = lc_final$GBIF_authorship,
  occs = sis_point_file,
  native_ranges = native_ranges,
  kingdom = "fungi"
)
And finally, we zip them up ready to be sent to SIS Connect:
# final step - make the zip file
make_zip(lc_sis_files)
Register for SIS Connect and then log in: https://connect.iucnredlist.org/
Upload the ZIP file and wait for the automated validation to complete. This will take a few minutes depending on the number of species in your file. You should receive an email from noreply@iucnredlist.org alerting you when the validation is complete.
Open the working set in SIS and check for any issues. You can manually fix issues by opening up the CSV files and editing them directly and saving them as a zip file again. When you are happy that the files are ready you can select the option to submit to the Red List unit. This should be followed up with an email to Craig Hilton-Taylor.
References
Bachman, S.P., Brown, M.J.M., Leão, T.C.C., Nic Lughadha, E. & Walker, B.E. (2024). Extinction risk predictions for the world’s flowering plants to support their conservation. New Phytologist. https://doi.org/10.1111/nph.19592