Rapid LC - plants workflow

The aim of LCr is to speed up the process of adding Least Concern (LC) species to the IUCN Red List. We’ll start with a list of plant species and end up with a zip file of CSV files that contain the minimum information required to support a Least Concern assessment. The zip file can then be uploaded into the IUCN Species Information System (SIS) via SIS Connect (requires registration), where the draft assessments can be edited, reviewed and hopefully published on the IUCN Red List in due course. LCr also generates spatial data to support the assessment.

Before you start, make sure you have the rWCVP package installed along with the associated data package rWCVPdata. For more information see the Getting Started guide.

Load LCr and other relevant libraries:

# install from github for first time use
remotes::install_github("stevenpbachman/LCr")
remotes::install_github('matildabrown/rWCVPdata')

# load libraries
library(LCr)
library(rCAT)
library(dplyr)
library(rWCVP)

Get name keys from a species list

The first step is to determine the list of LC species that you want to document. Bachman et al. (2024) predicted extinction risk for all species of flowering plants (Angiosperms). You can filter on species that are confidently predicted to be LC, or manually generate a data frame from a list as shown below.

Note that predictions can be wrong, so please verify that your selected species are genuine LC species. A useful first check is whether a species has previously been assessed. The ThreatSearch resource maintained by BGCI contains evidence-based plant conservation assessments compiled from digital resources, including national and regional Red Lists. You can also run the built-in LC test function make_metrics, described later in this workflow.

# dataframe of species you want to run through LCr
lc_species <-
  data.frame(sp = c(
    "Crabbea acaulis", "Crabbea nana", "Crabbea velutina"
  ))
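If the species list starts out as free text (for example, pasted from a spreadsheet), it may be worth tidying it before name matching. The following is a minimal base R sketch. The raw_names vector is made up for illustration, and the column name sp simply mirrors the example above.

```r
# Hypothetical raw input with stray whitespace, a blank entry and a duplicate
raw_names <- c(" Crabbea acaulis", "Crabbea nana ", "",
               "Crabbea  velutina", "Crabbea nana")

# Trim the ends and collapse repeated internal spaces
tidy_names <- gsub("\\s+", " ", trimws(raw_names))

# Drop blanks and duplicates, keeping the original order
tidy_names <- unique(tidy_names[nzchar(tidy_names)])

# One row per name, in the column that will be passed as name_column
lc_species_tidy <- data.frame(sp = tidy_names, stringsAsFactors = FALSE)
print(lc_species_tidy)
```

Tidying the list first should reduce the chance that trivial formatting differences lead to failed or duplicated name matches.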

Run the get_name_keys function using your dataframe of names as input. This will check the names against the GBIF names backbone as well as the World Checklist of Vascular Plants using the rWCVP package. The name_column is the column that contains the taxa to be assessed. This can either contain a binomial or a binomial and author combined. Including the author with the binomial will help achieve a better name match.

In this case we want to enforce a single matching name for every name in our list so we set the match parameter to ‘single’, but you can set this to ‘multiple’ if you wish to allow multiple matches.

# get the GBIF and WCVP identification keys
lc_keys <-
  get_name_keys(
    df = lc_species,
    name_column = "sp",
    match = "single",
    kingdom = "plantae"
  )

Let’s take a look at the output:

glimpse(lc_keys)

The name matching went well. I’ve checked the authors and I’m happy that the names match to the correct concept for these taxa. Note that it is worth spending time with the name matching step to ensure you are using a consistent concept for the taxa you are working with across GBIF and WCVP.

We now have some useful fields to help us find more data for these species. The GBIF_usageKey is an identifier for species according to the GBIF name backbone and the wcvp_ipni_id identifies the species according to the WCVP name backbone.

Clean keys

LCr is designed to work at species level and with taxonomically accepted names. The clean_keys function removes any problematic records, i.e. where a name is not at species level, or where either of the name sources (GBIF, WCVP) does not treat the name as accepted. In some cases you may want to override the results. For example, if GBIF treats a name as a synonym but WCVP treats it as accepted, you can set override_gbif_status = TRUE and clean_keys will treat the GBIF name as accepted at species level. Similarly, set override_wcvp_status = TRUE to do the same for WCVP. Note, however, that using a WCVP name that is not accepted will cause problems downstream in the workflow, e.g. native ranges and other information from WCVP are only available for accepted species.

# remove any problematic records
lc_keys_clean <- LCr::clean_keys(lc_keys)
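To make the filtering rule concrete, here is a rough base R sketch of the kind of logic described above. It is not the clean_keys implementation: the toy table, its column names (gbif_rank, gbif_status, wcvp_status), the status values and the keep_rows helper are all assumptions made for illustration.

```r
# Toy key table; column names and status values are illustrative assumptions
toy_keys <- data.frame(
  name        = c("Name A", "Name B", "Name C", "Name D"),
  gbif_rank   = c("SPECIES", "SPECIES", "GENUS", "SPECIES"),
  gbif_status = c("ACCEPTED", "SYNONYM", "ACCEPTED", "ACCEPTED"),
  wcvp_status = c("Accepted", "Accepted", "Unplaced", "Synonym"),
  stringsAsFactors = FALSE
)

# Keep rows at species rank that both sources treat as accepted.
# override_gbif relaxes the GBIF side only, so a GBIF synonym is kept
# as long as WCVP accepts the name.
keep_rows <- function(keys, override_gbif = FALSE) {
  gbif_ok <- keys$gbif_rank == "SPECIES" & keys$gbif_status == "ACCEPTED"
  if (override_gbif) gbif_ok <- rep(TRUE, nrow(keys))
  wcvp_ok <- keys$wcvp_status == "Accepted"
  keys[gbif_ok & wcvp_ok, , drop = FALSE]
}

print(keep_rows(toy_keys)$name)                        # strict filtering
print(keep_rows(toy_keys, override_gbif = TRUE)$name)  # GBIF status overridden
```

In this toy table, strict filtering keeps only "Name A", while overriding the GBIF status also keeps "Name B". "Name D" is dropped in both cases because the WCVP side does not accept it.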

Get GBIF occurrence data

Now with a clean set of names and identification keys we can query GBIF to obtain occurrence data.

Note that we will use the rgbif package to obtain the occurrence data. You will need to set up your GBIF credentials to obtain the downloads. After you have set up an account at GBIF you need to register your credentials in the R environment - see this post for an explanation. You may need to restart R after updating the R environment.
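As a quick sanity check before requesting a download, something like the following can confirm that the credentials are visible to R. It assumes the standard rgbif environment variable names (GBIF_USER, GBIF_PWD and GBIF_EMAIL), which are typically stored in the .Renviron file. The snippet only checks that they are set and does not contact GBIF.

```r
# Environment variables that rgbif looks for when requesting downloads
gbif_vars <- c("GBIF_USER", "GBIF_PWD", "GBIF_EMAIL")

# Sys.getenv() returns "" for anything that is not set
values <- Sys.getenv(gbif_vars)
missing_vars <- gbif_vars[!nzchar(values)]

if (length(missing_vars) > 0) {
  message("Not set: ", paste(missing_vars, collapse = ", "),
          ". Add them to your .Renviron file and restart R.")
} else {
  message("GBIF credentials found in the R environment.")
}
```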

The get_gbif_occs function has two modes: search or download. Search mode uses rgbif::occ_search and provides a fast, exploratory search. Use the download mode when you want to generate a citable DOI for your assessment, accessed via gbif_occs$citation.

# get the raw GBIF occs - with timer
start_time <- Sys.time()
gbif_occs <- LCr::get_gbif_occs(lc_keys_clean, mode = "search")
end_time <- Sys.time()
check <-  end_time - start_time
print(check)

Get native ranges

We now need to extract native range information for our species from the World Checklist of Vascular Plants dataset.

# get native ranges from WCVP- used for cleaning occs, and also for country list for SIS
native_ranges <- LCr::get_native_range(keys = lc_keys_clean)

Data quality checks

We can now run some tests, mostly based on the geospatial information, and flag records that are potentially problematic and should be removed. The output will be two objects, the flagged data and a summary of the flagged records.

# run the occurrence quality checks
# if you don't want to run native range cleaning check, just leave out native_ranges
# you can also adjust the buffer to account for the coarse WGSRPD polygons.
# the default is a 2 km buffer
checked_occs <- LCr::check_occs(gbif_occs$points, native_ranges, buffer = 2000)

# flagged occurrences and summary table
flagged_occs <- checked_occs$checked_data
flagged_sum <- checked_occs$summary
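The flag-then-summarise pattern can be illustrated with a small base R sketch. The checks below (missing, zero, out-of-range and duplicated coordinates) are generic examples chosen for illustration and are not necessarily the checks that check_occs runs. The table and column names are made up.

```r
# Toy occurrence table; the column names are illustrative only
toy_occs <- data.frame(
  id  = 1:5,
  lat = c(-1.5, 0, 95, -1.5, NA),
  lon = c(36.8, 0, 20, 36.8, 30)
)

# Out-of-range test, with missing coordinates left to their own flag
out_of_range <- abs(toy_occs$lat) > 90 | abs(toy_occs$lon) > 180
out_of_range[is.na(out_of_range)] <- FALSE

# Each flag is TRUE when the record looks problematic
flags <- data.frame(
  flag_missing   = is.na(toy_occs$lat) | is.na(toy_occs$lon),
  flag_zero      = toy_occs$lat %in% 0 & toy_occs$lon %in% 0,
  flag_range     = out_of_range,
  flag_duplicate = duplicated(toy_occs[, c("lat", "lon")])
)

# Flagged data: original records plus one column per check
checked <- cbind(toy_occs, flags, any_flag = rowSums(flags) > 0)

# Summary: number of records caught by each check
flag_summary <- colSums(flags)
print(checked)
print(flag_summary)
```

Keeping one column per check, rather than deleting records immediately, makes it possible to decide later which flags should actually lead to removal.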

Now we can clean the data using the clean_occs function with the default settings, meaning all flagged occurrences are removed. We then create objects containing valid data (cleaned) and problem data (flagged as problematic). You can save these data for your records.

# Clean using all available flags
cleaned_result <- LCr::clean_occs(flagged_occs)

# Create objects for clean and problem records
valid_data <- cleaned_result$clean_occs
problem_data <- cleaned_result$problem_occs

# For reference you may want to save these as CSV files
write.csv(valid_data, file = "valid_occs.csv", row.names = FALSE)
write.csv(problem_data, file = "problem_occs.csv", row.names = FALSE)

Run the Least Concern test

Now we have a clean dataset that is ready for the Least Concern test. Depending on the data available, either four or five metrics are generated (the number of WGSRPD regions is calculated for plants only) and checked against the following default thresholds:

Number of cleaned records (NOP) > 75
Extent of occurrence (EOOkm2) > 30000
Area of occupancy (AOOkm) > 3000
Number of WGSRPD regions (WGSRPD_count) > 5
Number of recent (<30 years) occurrence records (recent_records) > 50

The core parameters (EOO and AOO) must both be above the thresholds and at least 2 of 3 remaining parameters (number of cleaned points, regions, recent records) must also be above the thresholds.
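The decision rule can be written out as a short function. This is a sketch of the rule as stated above for the five-metric (plant) case, not the make_metrics source code. The passes_lc_rule helper and the metric values for the two hypothetical species are made up for illustration.

```r
# Default thresholds as listed above
lc_thresholds <- c(NOP = 75, EOOkm2 = 30000, AOOkm = 3000,
                   WGSRPD_count = 5, recent_records = 50)

# TRUE when both core metrics and at least 2 of the 3 supporting
# metrics exceed their thresholds
passes_lc_rule <- function(metrics, thresholds = lc_thresholds) {
  above <- metrics[names(thresholds)] > thresholds
  core <- all(above[c("EOOkm2", "AOOkm")])
  supporting <- sum(above[c("NOP", "WGSRPD_count", "recent_records")])
  core && supporting >= 2
}

# Made-up metric values for two hypothetical species
widespread <- c(NOP = 120, EOOkm2 = 450000, AOOkm = 3500,
                WGSRPD_count = 4, recent_records = 60)
restricted <- c(NOP = 200, EOOkm2 = 12000, AOOkm = 3200,
                WGSRPD_count = 7, recent_records = 90)

print(passes_lc_rule(widespread))
print(passes_lc_rule(restricted))
```

The first species passes because EOO and AOO are both above their thresholds and two of the three supporting metrics are too. The second fails on EOO alone, even though all of its supporting metrics pass.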

# check EOO, AOO, number of records and number of recent records
lc_test <- LCr::make_metrics(valid_data, keys = lc_keys_clean)

Now we can see which species are estimated to be LC using the leastconcern column and can filter on all those estimated to be LC. However, it should be noted that the user has full control over which species they want to designate as Least Concern. Other predictive methods to estimate LC such as Bachman et al. (2024) can be used instead of this simple test.

# filter on LC species
lc_final <- lc_test %>% dplyr::filter(leastconcern == "TRUE")

Generate the SIS CSV files

Now we have the final set of species and can proceed with generating the IUCN standard point file and the SIS Connect CSV files.

Note: Remember that the valid occurrence data was generated and cleaned based on the full species list, but we have now reduced that list to only the species we think are LC. When we generate the IUCN standard point file we need to make sure that we filter the occurrence data by the chosen LC species.

You will also need to define some parameters for the point file and CSV files:

# define parameters - the person who is assessing the species
first_name <- "Steven"
second_name <- "Bachman"
email <- "s.bachman@kew.org"
institution <- "Royal Botanic Gardens, Kew"

# filter the valid occurrence data on the LC species
valid_data_LC <- valid_data %>%
  filter(taxonKey %in% lc_final$GBIF_usageKey)

# generate the point file
sis_point_file <- LCr::make_sis_occs(valid_data_LC,
                                     first_name = first_name,
                                     second_name = second_name,
                                     institution = institution)

# write the SIS point file to disk
write.csv(sis_point_file, 
          file = "SIS_points.csv", 
          row.names = FALSE, 
          fileEncoding = "UTF-8",
          na = "")

The make_sis_occs function includes some validation checks, so it may report issues with the file that need to be resolved. If it does, write the file to disk, fix the errors, import the corrected file and rerun make_sis_occs. Note that when writing the final SIS point file we specify UTF-8 encoding and convert NA values to blanks (na = "") to ensure the point file is compliant with IUCN standards.
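The effect of these two write.csv arguments can be seen in a small, self-contained sketch. The toy table and its column names are illustrative and are not the full IUCN point file schema. The file is written to a temporary location so that nothing in the working directory is overwritten.

```r
# Toy point table with a missing value and a non-ASCII character
toy_points <- data.frame(
  sci_name   = c("Crabbea acaulis", "Crabbea nana"),
  event_year = c(1998, NA),
  compiler   = c("Le\u00e3o", "Le\u00e3o"),
  stringsAsFactors = FALSE
)

# Write with the same options used for the SIS point file
out_file <- tempfile(fileext = ".csv")
write.csv(toy_points,
          file = out_file,
          row.names = FALSE,
          fileEncoding = "UTF-8",
          na = "")

# Read the raw lines back to inspect what was written
written <- readLines(out_file, encoding = "UTF-8")
print(written)
```

The missing event_year is written as an empty field rather than the literal text NA, which is what na = "" is for.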

We can now run the make_sis_csvs function to automatically generate the CSV files.

# get SIS files
lc_sis_files <- LCr::make_sis_csvs(unique_id = lc_final$GBIF_usageKey,
                                   wcvp_ipni_id = lc_final$wcvp_ipni_id,
                                   first_name = first_name,
                                   second_name = second_name,
                                   email = email,
                                   institution = institution,
                                   family = lc_final$GBIF_family,
                                   genus = lc_final$GBIF_genus,
                                   species = lc_final$GBIF_species,
                                   gbif_ref = gbif_occs$citation,
                                   powo_ref = TRUE, 
                                   taxonomicAuthority = lc_final$GBIF_authorship,
                                   occs = sis_point_file,
                                   kingdom = "plantae",
                                   native_ranges = native_ranges
)

Finally, we zip them up, ready to be uploaded to SIS Connect:

# final step - make the zip file
make_zip(lc_sis_files)

Upload the ZIP file and wait for the automated validation to complete. This will take a few minutes depending on the number of species in your file. You should receive an email from noreply@iucnredlist.org alerting you when the validation is complete.

Open the working set in SIS and check for any issues. You can fix issues manually by editing the CSV files directly and zipping them up again. When you are happy that the files are ready, select the option to submit to the Red List unit. This should be followed up with an email to Craig Hilton-Taylor.

References

Bachman, S.P., Brown, M.J.M., Leão, T.C.C., Nic Lughadha, E. & Walker, B.E. (2024). Extinction risk predictions for the world’s flowering plants to support their conservation. New Phytologist. https://doi.org/10.1111/nph.19592