GeoSpecies Knowledge Base

The GeoSpecies Knowledge Base was started to help tie together disparate data about species. The diagram below shows some of the kinds of data sets that researchers would like to link for analysis.

Species Attribute Diagram

The Name Problem

Typically, data like this are tied together using the binomial or scientific name for a species (e.g. the species Ochlerotatus triseriatus).

Unfortunately, under the current nomenclatural system this name conveys two different types of information. The first is that there is an entity that someone has determined represents a species. The second is that this species belongs in the genus Ochlerotatus.

This structure creates a myriad of problems, particularly when there is a taxonomic revision necessitating a change in the name.

In summary, the idea that this entity is a species and the separate idea of where this species belongs taxonomically are commingled.

Solutions

One of the goals of the GeoSpecies Knowledge Base is to create unique resolvable identifiers for species concepts that are stable despite changes in taxonomy. This disambiguates the two separate ideas that are present in a typical scientific name. The clearer meaning and stability of these identifiers allow data about those species to be linked together as part of the Linked Data network of distributed data.

This advantage is best illustrated by an example:

One biologist may refer to a specimen of the Eastern Tree Hole Mosquito as Ochlerotatus triseriatus. Another may refer to that same species as Aedes triseriatus. They both agree they are talking about the same species concept; however, they are assigning different taxonomic hypotheses to that species concept.

Species Attribute Diagram
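The mosquito example can be sketched in a few lines of Python. This is only an illustration, not part of the GeoSpecies software: the two records and their source names are hypothetical, and the extension-free identifier is assumed to follow the same pattern as the other species URIs on this site. It shows that grouping by scientific name splits the records, while grouping by the species concept identifier keeps them together.

```python
from collections import defaultdict

# Identifier for the species concept. Assumption: this is the extension-free
# form of the page URL cited for Ochlerotatus triseriatus in this document.
TRISERIATUS = "http://lod.geospecies.org/ses/iuCXz"

# Hypothetical records from two sources that place the species in different genera.
records = [
    {"source": "survey_a", "name": "Ochlerotatus triseriatus", "concept": TRISERIATUS},
    {"source": "survey_b", "name": "Aedes triseriatus", "concept": TRISERIATUS},
]


def group_sources(rows, key):
    """Group the record sources by the value each record holds under `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["source"])
    return dict(groups)


# Keyed by name, the two records land in separate groups.
by_name = group_sources(records, "name")

# Keyed by the concept identifier, they land in a single group.
by_concept = group_sources(records, "concept")
```

A taxonomic revision changes only the "name" field in such a record, so any link built on the "concept" field is unaffected.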

Advantages

Now species occurrence records, DNA, behavioral observations, scientific articles, blog posts, news stories etc. can be linked without worrying that simple changes in nomenclature will break the links between these information sources. It also enables the development of knowledge bases that store facts about species. The diagram below shows examples of the kinds of statements that can be made.

Species Statement Diagram

Implementation

For now, the data set concentrates on plants and animals found in the Upper Midwest, but many of the same species are found in Europe and other parts of the world. The site and data follow the Linked Data recommendations, so a given URI resolves to a human-readable web page for browsers and to RDF data for Semantic Web applications.

For instance, the Uniform Resource Identifier (URI) for the mosquito Aedes vexans is:

http://lod.geospecies.org/ses/4XSQO

A web browser that requests this URI is automatically directed to this page: http://lod.geospecies.org/ses/4XSQO.html

A Semantic Web crawler is directed to this page: http://lod.geospecies.org/ses/4XSQO.rdf
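As a rough sketch of how a client might ask for one representation or the other, the Python below builds a request with an explicit Accept header. It assumes that the server chooses the representation from that header, which is the usual Linked Data practice but is not stated on this page. The function names are illustrative only, and fetch() needs network access and a reachable server.

```python
import urllib.request

CONCEPT_URI = "http://lod.geospecies.org/ses/4XSQO"


def document_urls(concept_uri):
    """Return the HTML and RDF document URLs, following the pattern shown above."""
    base = concept_uri.rstrip("/")
    return {"html": base + ".html", "rdf": base + ".rdf"}


def build_request(concept_uri, want_rdf=True):
    """Build a request that asks for RDF/XML or HTML.

    Assumption: the server selects the representation from the Accept header.
    """
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return urllib.request.Request(concept_uri, headers={"Accept": accept})


def fetch(concept_uri, want_rdf=True, timeout=10):
    """Send the request and return (final URL after redirects, response body)."""
    request = build_request(concept_uri, want_rdf)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.geturl(), response.read()
```

If the server does not honor the Accept header, a client can request the .rdf document URL directly.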

Web crawlers can get the RDF data dump starting at: http://lod.geospecies.org/index.rdf [2010-07-15]

There is also a semantic sitemap at: http://lod.geospecies.org/sitemap.xml.gz [2010-07-15]

A crawled version of the RDF data is also available as a single compressed file at: http://lod.geospecies.org/geospecies.rdf.gz [2010-04-11] (2,201,532 triples)

The ontology documentation can be found at http://rdf.geospecies.org/ont_doc/index.html [2010-07-15]

Example SPARQL queries can be found at: http://about.geospecies.org/sparql.xhtml [2010-01-17]

The data set currently contains information and linked data for 19,230 species, 1,671 families, and 221 orders. We have approximately 6,500 species observations, but the bulk of them will not be released until they have been fully documented. The current data set includes 12 sample observation records with geo and geonames links. There is also a growing number of GeoSpecies annotated articles and presentations in the bibtex and bibio vocabularies. The knowledge base is currently linked to the DBpedia, Freebase, Bio2RDF, UniProt, and uBio data sources, and uses some of the UMBEL subject concepts. See the projects page for information on proper attribution.

Linked Open Data Cloud Connections

We have attempted to link to DBpedia, Bio2RDF, UniProt and Freebase when possible using skos:closeMatch. Of the 19,230 species, 11,799 are linked to DBpedia and Wikipedia, and 11,095 are linked to Bio2RDF and UniProt. Approximately 2,676 species are linked to the EUNIS database records. There are also foaf:topic links to 10,389 Wikispecies pages. Similar linkages are made at the other taxonomic levels of kingdom, phylum, class, order and family.
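The following Python sketch shows one way a consumer might pull skos:closeMatch links out of an RDF/XML document. The sample document is made up for illustration and is not actual GeoSpecies output: the link targets are hypothetical, and the real data may be serialized differently. The sketch handles only this simple RDF/XML shape; a full RDF library such as rdflib would be more robust.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical sample; the closeMatch targets are illustrative only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://lod.geospecies.org/ses/4XSQO">
    <skos:closeMatch rdf:resource="http://dbpedia.org/resource/Aedes_vexans"/>
    <skos:closeMatch rdf:resource="http://example.org/other/aedes-vexans"/>
  </rdf:Description>
</rdf:RDF>
"""


def close_matches(rdf_xml):
    """Map each subject URI to the list of its skos:closeMatch target URIs."""
    root = ET.fromstring(rdf_xml)
    links = {}
    for node in root.iter():
        subject = node.get(f"{{{RDF_NS}}}about")
        if subject is None:
            continue  # not a resource description
        targets = [
            match.get(f"{{{RDF_NS}}}resource")
            for match in node.findall(f"{{{SKOS_NS}}}closeMatch")
        ]
        targets = [t for t in targets if t]
        if targets:
            links[subject] = targets
    return links


matches = close_matches(SAMPLE)
```

Because skos:closeMatch asserts only that two concepts are close enough to be used interchangeably in some applications, a consumer would normally treat these links as candidates for joining data rather than as statements of identity.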

Examples

You can start browsing the data set at the home page: GeoSpecies Knowledge Base

If you like pretty pictures, visit Nymphalid butterflies: Nymphalidae

For those interested in mosquitoes, visit the mosquito pages: Culicidae

Citing the GeoSpecies Knowledge Base as a whole

GeoSpecies Knowledge Base. Available from http://lod.geospecies.org. Accessed 15 Jan 2009.

Citing a GeoSpecies Knowledge Base page

"Ochlerotatus triseriatus (Say, 1823)." GeoSpecies Knowledge Base, available from http://lod.geospecies.org/ses/iuCXz.html. Accessed 15 Jan 2009.
GeoSpecies Knowledge Base by Peter J. DeVries is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.
Permissions beyond the scope of this license may be available at http://about.geospecies.org/projects/.
This site is hosted at University of Wisconsin Department of Entomology.