Even with the help of web services, it is difficult for a human to effectively retrieve, judge, synthesize, and integrate vast amounts of widely available, often widely distributed digital resources. To assist or replace human effort, software must be able to correctly place a document or text string or datum in the appropriate context.
Figure 5 Semantic web layers. Web technologies build on each other to provide increasingly semantic representations. XML is the essential syntax for describing information, and Uniform Resource Identifiers (URIs) are the basic means of pointing to chunks of information, or entities, on the web. All of the layers above use XML and URIs. Resource description framework (RDF) adds the ability to create a triple that relates entities to each other. Ontologies define in a highly formalized way the kinds of relationships and the kinds of entities that are possible, and include instances of these, while rules give richer ways to specify the relationships and entities that must or must not exist. The logic layer is a more abstract representation that enables us to produce and discuss proofs: objects that represent a conclusion and the evidence and reasoning that justify it. Finally, the trust layer integrates information from the previous layers with an understanding of their security measures, to enable judgment on how trustworthy the new knowledge is. Adapted from Berners-Lee, http://www.w3.org/2002/Talks/04-sweb/slide12-0.html.
There is growing interest, therefore, in publishing machine-readable documents and databases in ways that include semantics. Ecologists are among those scientists collaborating with researchers in the fields of artificial intelligence and knowledge representation to apply semantic web concepts to ecology.
A common view of the semantic web is as a stack of technologies, each layer building upon the ones below (Figure 5). As one ascends this 'layer cake', the sophistication of the semantics involved increases, from the purely pragmatic use of XML to mark data as having meaning, to the top layer, where the technology takes advantage of meaning, rules, and logic to make judgments about the trustworthiness of data.
A semantic representation goes beyond controlled vocabularies (standardizing keywords for concepts such as 'detritus' or 'dissolved organic matter') to specify in a machine-readable format the relationships among concepts. The formal specification of the relationships among concepts is called an ontology. Inference over semantic web documents can be used to expand a search for 'butterfly habitat' to find data sets or documents that include species belonging to the taxonomic group Papilionidae and that also include information on named habitats, even if the word 'Papilionidae' or 'habitat' does not appear in the documents or their metadata. Using semantics, a computer can use automated reasoning to make decisions about how data sets using different data models or methods might be chosen, integrated, or transformed.
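The kind of search expansion described above can be illustrated with a minimal Python sketch. It is not how any particular semantic web reasoner works; the `BROADER` links, the data set names, and their annotations are all hypothetical, and the inference is limited to following "is a kind of / belongs to" links upward.

```python
# Hypothetical mini-ontology: each term points to one broader concept.
BROADER = {
    "Papilio machaon": "Papilionidae",
    "Papilionidae": "butterfly",
    "Danaus plexippus": "Nymphalidae",
    "Nymphalidae": "butterfly",
    "oak woodland": "habitat",
    "wet meadow": "habitat",
    "Quercus agrifolia": "tree",
}

# Hypothetical data sets, annotated only with specific terms.
DATASETS = {
    "survey_A": {"Papilio machaon", "wet meadow"},
    "survey_B": {"Danaus plexippus"},
    "survey_C": {"Quercus agrifolia", "oak woodland"},
}


def ancestors(term):
    """Return the term plus every broader concept reachable from it."""
    found = {term}
    while term in BROADER:
        term = BROADER[term]
        found.add(term)
    return found


def search(*concepts):
    """Return data sets whose annotations imply every requested concept."""
    hits = []
    for name, terms in DATASETS.items():
        implied = set()
        for t in terms:
            implied |= ancestors(t)
        if all(c in implied for c in concepts):
            hits.append(name)
    return sorted(hits)


# No data set is annotated with the words 'butterfly' or 'habitat'.
print(search("butterfly", "habitat"))
```

Here `survey_A` is found because its annotations imply both concepts, even though neither search word appears in its metadata.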
A variety of languages are used for ontologies, but most ecological ontologies use the W3C-recommended standard OWL, the Web Ontology Language. OWL is based on RDF (resource description framework), which models the world with <subject, predicate, object> assertions, known as triples. RDF is the first layer of semantics built on top of XML in the W3C's semantic web vision.
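As a rough illustration of the triple model, the sketch below stores a few assertions as plain Python tuples and looks them up by pattern. It does not use an RDF library or any real serialization; the `http://example.org/eco#` namespace and the terms `memberOf`, `occursIn`, and `isA` are made up for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


EX = "http://example.org/eco#"  # hypothetical namespace, for illustration only

# Each assertion relates one entity (identified by a URI) to another.
GRAPH = {
    Triple(EX + "PapilioMachaon", EX + "memberOf", EX + "Papilionidae"),
    Triple(EX + "PapilioMachaon", EX + "occursIn", EX + "WetMeadow"),
    Triple(EX + "WetMeadow", EX + "isA", EX + "Habitat"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    )


# Everything asserted about one subject:
for t in match(GRAPH, s=EX + "PapilioMachaon"):
    print(t.predicate, "->", t.obj)
```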
A semantic web search engine such as Swoogle acts like a traditional web search engine, except that it only crawls and indexes semantic web documents in languages such as OWL and RDF.
Several query languages have been developed for OWL and RDF. SPARQL (SPARQL Protocol and RDF Query Language) is the current W3C standard.
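The core idea behind such query languages, matching patterns with variables against a set of triples, can be sketched in a few lines of Python. This is a toy evaluator under simplifying assumptions, not an implementation of SPARQL: it handles only a conjunction of triple patterns, and the data, predicate names, and `?`-prefixed variable convention in plain strings are illustrative.

```python
# Hypothetical triples: data sets mention species; species belong to families.
TRIPLES = [
    ("survey_A", "mentions", "Papilio machaon"),
    ("survey_B", "mentions", "Danaus plexippus"),
    ("Papilio machaon", "memberOf", "Papilionidae"),
    ("Danaus plexippus", "memberOf", "Nymphalidae"),
]


def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def unify(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    new = dict(binding)
    for p, value in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against its earlier binding.
            if new.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return new


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns; return variable bindings."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            b2
            for b in solutions
            for t in triples
            if (b2 := unify(pattern, t, b)) is not None
        ]
    return solutions


# Roughly: SELECT ?d WHERE { ?d mentions ?sp . ?sp memberOf Papilionidae }
results = query(
    [("?d", "mentions", "?sp"), ("?sp", "memberOf", "Papilionidae")],
    TRIPLES,
)
print([b["?d"] for b in results])
```

The shared variable `?sp` is what joins the two patterns: only data sets that mention a species asserted to belong to Papilionidae are returned.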
An ontology-based approach is not intended to rely as much on metadata standards as more traditional approaches. While the use of standard ontologies would ensure greater interoperability, the ability to semantically relate terms in one ontology to those in another suggests that it should be easier to automate the integration of data represented by different but related ecological ontologies. In practice, the use of ontologies in ecology is still in its infancy and this power remains to be tested.
Examples of web-accessible ontologies relevant to ecology include those developed as part of the SEEK project, the ETHAN (Evolutionary Trees and Natural History) ontology, and the Spire ontologies, including the California Biodiversity Information Resources ontology. Darwin Core, a metadata standard for museum specimens, has also been represented as an ontology. Several other projects use ontologies internally.