The package GNRS is designed to interact with the
Geographic Name Resolution Service API (GNRS; https://gnrs.biendata.org/) of the Botanical Information
and Ecology Network (BIEN; https://bien.nceas.ucsb.edu). The GNRS is a tool for
resolving, standardizing, and indexing political division names. The
GNRS resolves political division names against standard world political
units in the Geonames (https://www.geonames.org/) and Global Administrative
Areas (GADM; https://gadm.org/) databases. Names are resolved to
three levels: country, state/province and county/parish. The GNRS uses
both exact and fuzzy matching to match standard and alternative
political division names in a variety of languages, as well as
abbreviations and codes such as ISO and FIPS codes. Results returned by
the GNRS include the original names submitted, the standard names and
codes of the political units matched, unique identifiers from the
Geonames and GADM databases, and additional fields that describe how each
name was resolved. An overall match score from 0 to 1 describes how closely
the submitted names match the standard names, where 1 is a perfect
match.
The current, stable version of the GNRS package is available on CRAN, while the development version can be installed from GitHub using devtools.
In some cases, we may only want to standardize a single name. Say,
we’d like to check what the standardized name for the United States of
America is. Or perhaps we’d like to get the standardized name for the
Canadian province of Quebec. We can use the function GNRS_super_simple for this.
library(GNRS)
# Standardizing a single country
USA_standardized <- GNRS_super_simple(country = "United States of America")
# Take a look at the columns returned
colnames(USA_standardized)
## [1] "poldiv_full" "country_verbatim"
## [3] "state_province_verbatim" "state_province_verbatim_alt"
## [5] "county_parish_verbatim" "county_parish_verbatim_alt"
## [7] "country" "state_province"
## [9] "county_parish" "country_id"
## [11] "state_province_id" "county_parish_id"
## [13] "country_iso" "state_province_iso"
## [15] "county_parish_iso" "geonameid"
## [17] "gid_0" "gid_1"
## [19] "gid_2" "match_method_country"
## [21] "match_method_state_province" "match_method_county_parish"
## [23] "match_score_country" "match_score_state_province"
## [25] "match_score_county_parish" "threshold_fuzzy"
## [27] "overall_score" "poldiv_submitted"
## [29] "poldiv_matched" "match_status"
## [31] "user_id"
# The most useful columns in this case are country and overall_score
USA_standardized[c("country","overall_score","match_method_country")]
## country overall_score match_method_country
## 1 United States 1.00 exact alternate name
In this case, the standardized name is just “United States”. We have high confidence in this name because the submitted name, “United States of America”, matched exactly (overall_score = 1.00) to a known alternate name for this country. Note that even though we didn’t supply any state/province or county/parish names, there are still fields returned for these. This is because the output always contains the same set of columns, although some of them may be empty.
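The Quebec case mentioned above can be sketched in the same way. This is a minimal sketch that assumes GNRS_super_simple accepts a state_province argument named like the corresponding output column; check ?GNRS_super_simple before relying on it. The quebec_query list is a hypothetical helper used only for illustration, and the lookup itself runs only if the package is installed and the API can be reached.

```r
# Sketch only: the argument names are assumed to mirror the output columns.
quebec_query <- list(country = "Canada", state_province = "Quebec")

# Run the lookup only if GNRS is installed; the call needs internet access.
if (requireNamespace("GNRS", quietly = TRUE)) {
  quebec <- tryCatch(
    do.call(GNRS::GNRS_super_simple, quebec_query),
    error = function(e) NULL  # e.g. no network connection
  )
  # Show the standardized names only if a result came back
  if (is.data.frame(quebec)) {
    print(quebec[c("country", "state_province", "overall_score")])
  }
}
```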
#First, we'll load the test data that are included with this package, gnrs_testfile
gnrs_testfile <- gnrs_testfile
head(gnrs_testfile, n = 10)
## user_id country state_province
## 1 1 Russia Lipetsk
## 2 2 Mexico Sonora, Estado de
## 3 3 Guatemala Izabal
## 4 4 USA Arizona
## 5 5 U.S.A Arizona
## 6 6 USA Ilinois
## 7 7 Mexico Quintana Roo
## 8 8 Mexico Quintana Roo
## 9 9 Ukraine Kharkiv
## 10 10 Canada Province of Nova Scotia
## county_parish
## 1 Dobrovskiy rayon
## 2 Hua^sA(C)pac
## 3
## 4 Pima County
## 5 Pima
## 6
## 7 La^sA°zaro Ca^sA°rdenas
## 8 Municipio de La^sA°zaro Ca^sA°rdenas
## 9 Novovodolaz'kyi
## 10
As you can see, the sample data include spelling variants (USA vs U.S.A.) and non-standard characters that may cause problems. The GNRS will standardize these spelling variants and non-standard characters.
gnrs_results <- GNRS(gnrs_testfile)
#The standardized names are found in these columns:
head(gnrs_results[c("country","state_province","county_parish")], n = 10)
## country state_province county_parish
## 1 Russia Lipetskaya Oblast' Dobrovskiy Rayon
## 2 Mexico Sonora
## 3 Guatemala Izabal
## 4 United States Arizona Pima
## 5 United States Arizona Pima
## 6 United States Illinois
## 7 Mexico Quintana Roo
## 8 Mexico Quintana Roo
## 9 Ukraine Kharkivs'ka Oblast' Novovodolaz'kyi
## 10 Canada Nova Scotia
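After a batch run it can be useful to separate confident matches from names that deserve a manual check, using the overall_score column described earlier. The following is a minimal sketch on a small hand-built data frame: the rows, the scores and the 0.8 cutoff are all hypothetical and are not real GNRS output. It also assumes the score may arrive as text, so it is converted with as.numeric first.

```r
# Hypothetical rows, hand-built to mimic a few GNRS output columns.
# Scores are stored as text on the assumption that the API may return
# them that way; as.numeric() is harmless if they are already numeric.
results <- data.frame(
  country_verbatim = c("USA", "Mexco", "Candaa"),
  country          = c("United States", "Mexico", "Canada"),
  overall_score    = c("1.00", "0.83", "0.67"),
  stringsAsFactors = FALSE
)

min_score <- 0.8  # assumed cutoff, chosen only for illustration
score <- as.numeric(results$overall_score)

# Split the results into confident matches and names to check by hand
accepted     <- results[score >= min_score, ]
needs_review <- results[score <  min_score, ]
```

The same two subsetting lines should apply to a real result such as gnrs_results, provided the column names match those listed earlier.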
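The GNRS function accepts up to four input columns (user_id, country, state_province and county_parish), all of which are optional. If you ever forget the column names, you can use the function GNRS_template as a quick look-up, or as a template to populate with your own data: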
## user_id country state_province county_parish
## 1 NA NA NA NA
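As an illustration of that layout, an input table can also be assembled by hand. The sketch below uses only base R and the four column names shown in the template output. The example rows are made up, and leaving unknown fields as empty strings follows the bundled test file rather than any documented requirement.

```r
# Hand-built input with the four columns shown in the template output.
# The rows are illustrative; empty strings mark levels that are unknown.
my_names <- data.frame(
  user_id        = 1:3,
  country        = c("Canada", "USA", "Mexico"),
  state_province = c("Quebec", "Arizona", ""),
  county_parish  = c("", "Pima County", ""),
  stringsAsFactors = FALSE
)

# Inspect the table before submitting it
str(my_names)

# With the package loaded and internet access available, the table
# would then be submitted as in the batch example:
# my_results <- GNRS(my_names)
```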