Building Semantic Web Based Applications with Watson

Mathieu d'Aquin
Knowledge Media Institute, The Open University, Milton Keynes, UK
The Semantic Web is growing steadily: more and more RDF documents are published online, more data is made available, and more knowledge is accessible to machines. We present, demonstrate and explain how a new type of application can be built by exploiting this body of knowledge. More specifically, we demonstrate how such applications can be developed thanks to Watson, a Gateway to the Semantic Web, and to the API it provides. We present a complete but very simple example of the development of an application using this API. We also discuss how the availability of such an infrastructure not only supports the exploitation of the semantic information available online, but also facilitates and encourages contributions to it.
Note: Images in this page are taken from the Web, and links included help in understanding and making sense of the text, so we encourage the reader to be connected while reading this document.

Watson: a Gateway to the Semantic Web

Technologies like RDF and OWL are now well established and, more importantly, the amount of knowledge published on the Semantic Web is rapidly increasing. This leads to the need for new infrastructures and tools supporting software developers in building applications that dynamically find, select and exploit the relevant semantic information.
Watson is a Gateway to the Semantic Web: it collects, analyses, and gives access to semantic information available online. Roughly speaking, it can be seen as a search engine for the Semantic Web: it crawls the web to discover semantic web documents (i.e. RDF/OWL/DAML+OIL documents) and indexes these documents according to different dimensions (location, content, metadata, etc.).
However, in contrast with classical Web search engines (and with some other Semantic Web search engines), Watson focuses on providing, through its API, high level services to support the development of semantic applications that wish to dynamically exploit online semantic information.

The Watson API: Integrating Online Knowledge in Semantic Applications

Alongside an AJAX-based Web interface which makes it possible to query and explore the collected Semantic Web documents, as well as SPARQL endpoints to interrogate the content of these documents, Watson deploys a number of Web services and an associated Java API allowing applications to search for semantic documents and entities, explore their content, and query it.
There are a number of advantages in using this API compared to those provided by similar systems such as Swoogle and Sindice. First, it is an entirely free and open API: any document Watson has collected, and any piece of information it has extracted from these documents, is accessible without restriction. Second, a comprehensive set of functionalities is provided to exploit this information, so that any application can make use of any semantic data available online in a lightweight way, without even having to download the corresponding semantic documents. To achieve this, the development of the API has been guided by the requirements of a number of applications, such as PowerMagpie for Semantic Web browsing, PowerAqua for question answering, and Scarlet for discovering semantic relations between terms. The API is constantly evolving to include new features needed by the developers of these applications.
Here, rather than providing all the details of all the features of this API, we present a complete example of how an application can be developed using it, illustrating the functionalities it provides to simply integrate online knowledge into a Web interface.

What Can Be Done: A Complete Example

Imagine you want to build an application exploiting ontologies for query expansion in a classical Web search engine. For example, when given a keyword like developer, such a tool could find out that, in an ontology, there is a sub-class programmer of developer, and could therefore suggest this term as a way to refine the query sent to the Web search engine. This would require integrating one or several ontologies about the domain of the queries, together with an infrastructure to store, explore and query them. However, if the considered search engine is a general Web search engine, such as Google, the domain of the queries cannot be predicted a priori: the appropriate ontology can only be selected at run-time, depending on the query that is given. In addition, this application would require a heavy infrastructure to be able to handle large ontologies and to query them efficiently.
Now let's see how this application could be built using the Watson API and Google as a search engine. First, let's give it a name: gowgle, Web search using Google and Watson. The overall architecture of gowgle is made of a Javascript/HTML page for entering the query and displaying the results, which communicates using the principles of AJAX with a simple Java server. This server is in charge of the Web search and of generating suggestions of terms for expanding the query thanks to ontologies. The Web search component is a simple redirection to Google using the Google API/SOAP Web service.
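The server side of such an architecture can be sketched in a few lines of Java. The following is only an illustration under stated assumptions, not the actual gowgle code: the class name GowgleServerSketch, the /suggest path, the q parameter and the JSON layout are all hypothetical, and the function that produces suggestions is passed in as a parameter so that any source of related terms can be plugged in. It relies only on the HTTP server bundled with the JDK (Java 10 or later).

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;

/** Illustrative sketch only; names, path and JSON layout are assumptions. */
public class GowgleServerSketch {

    /** Extracts and decodes the (hypothetical) "q" parameter from a raw query string. */
    static String extractQuery(String rawQuery) {
        if (rawQuery == null) {
            return "";
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.startsWith("q=")) {
                return URLDecoder.decode(pair.substring(2), StandardCharsets.UTF_8);
            }
        }
        return "";
    }

    /** Minimal escaping of backslashes and double quotes; control characters are not handled. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Serialises the query and its suggested terms as a small JSON object. */
    static String toJson(String query, List<String> suggestions) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"query\":\"").append(escape(query)).append("\",\"suggestions\":[");
        for (int i = 0; i < suggestions.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(escape(suggestions.get(i))).append('"');
        }
        return sb.append("]}").toString();
    }

    /** Starts an HTTP server answering AJAX calls such as GET /suggest?q=developer. */
    static HttpServer start(int port, Function<String, List<String>> suggester) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/suggest", exchange -> {
            String query = extractQuery(exchange.getRequestURI().getRawQuery());
            byte[] body = toJson(query, suggester.apply(query)).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

A caller would start it with something like GowgleServerSketch.start(8080, q -> List.of("programmer")) and let the JavaScript page fetch /suggest?q=developer; in a real deployment the suggester would be backed by ontology lookups rather than a fixed list.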
Now the interesting part is to use ontologies to suggest terms related to the query. That is, if the query contains the word developer: (1) to find ontologies, anywhere online, that talk about the concept of developer, (2) to find in these ontologies which entities correspond to developer, and (3) to inspect the relations of these entities to find related terms (e.g., programmer as a sub-class of developer). These three steps are handled very easily using the Watson API. The functions provided by the API to search, using keywords, for semantic documents published online, and within these documents for entities that correspond to the given keywords, provide a mechanism to find any semantic description that has been published online about a particular term. Then, either by using functions to explore the discovered semantic documents (finding sub-classes, super-classes, instances, labels, etc.) or by using SPARQL queries, entities can be inspected to obtain the required semantic information (e.g., that developer has a sub-class called programmer).
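The three steps can be sketched as follows. This is a minimal illustration and does not use the real Watson API: the SemanticGateway interface and its method names are hypothetical stand-ins for the kind of keyword search and exploration functions described above, and the toy implementation holds a single made-up ontology (http://example.org/jobs.owl) in memory instead of calling a remote service.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; SemanticGateway is a hypothetical interface, not the Watson API. */
public class GowgleSketch {

    /** Hypothetical stand-in for a semantic search and exploration service. */
    interface SemanticGateway {
        /** Step 1: URIs of semantic documents matching the keyword. */
        List<String> findDocuments(String keyword);

        /** Step 2: URIs of entities matching the keyword inside one document. */
        List<String> findEntities(String documentUri, String keyword);

        /** Step 3: labels of the sub-classes of an entity in that document. */
        List<String> subClassLabels(String documentUri, String entityUri);
    }

    /** Chains the three steps and returns the suggested terms without duplicates. */
    static List<String> suggestTerms(SemanticGateway gateway, String keyword) {
        Set<String> suggestions = new LinkedHashSet<>();
        for (String doc : gateway.findDocuments(keyword)) {
            for (String entity : gateway.findEntities(doc, keyword)) {
                suggestions.addAll(gateway.subClassLabels(doc, entity));
            }
        }
        return new ArrayList<>(suggestions);
    }

    /** Toy in-memory gateway knowing one made-up ontology with Developer and Programmer. */
    static SemanticGateway toyGateway() {
        final String doc = "http://example.org/jobs.owl";
        final String developer = doc + "#Developer";
        return new SemanticGateway() {
            @Override
            public List<String> findDocuments(String keyword) {
                if (keyword.equalsIgnoreCase("developer")) {
                    return List.of(doc);
                }
                return List.of();
            }

            @Override
            public List<String> findEntities(String documentUri, String keyword) {
                if (documentUri.equals(doc) && keyword.equalsIgnoreCase("developer")) {
                    return List.of(developer);
                }
                return List.of();
            }

            @Override
            public List<String> subClassLabels(String documentUri, String entityUri) {
                if (entityUri.equals(developer)) {
                    return List.of("programmer");
                }
                return List.of();
            }
        };
    }

    public static void main(String[] args) {
        // Prints [programmer] for the toy ontology above.
        System.out.println(suggestTerms(toyGateway(), "developer"));
    }
}
```

Keeping the gateway behind an interface means the same suggestTerms loop could be pointed at a remote service or at SPARQL queries without changing the calling code; the toy implementation exists only so the sketch runs on its own.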
Gowgle has actually been developed in the way described here and can be tried from the deployed Web interface. The idea here is not to discuss the elegance of this particular application, its efficiency or its robustness (or even its name), but just to show how the features it implements are made possible, and even easy, thanks to Watson. Indeed, it took a developer familiar with the Watson API only two days to build it, and the Java code of the server amounts to only a couple of hundred lines. Note that, because of its relative simplicity, this application could easily be extended to extract other semantic relations between the keywords, and to be used with any other search engine that provides an API.

From Exploiting the Semantic Web to Contributing to the Semantic Web

The Semantic Web is about publishing semantic resources so that they integrate with others, contributing to, rather than merely adding to, the existing body of knowledge. Tools like Watson provide valuable support for creating and linking semantic content, by making it possible to find, select and exploit Semantic Web documents automatically.
Watson has also been integrated as a plugin for the NeOn toolkit, the ontology development environment built as part of the NeOn Project. This plugin allows users to automatically find and integrate existing entities from online ontologies when building a new ontology. In that way, it encourages the reuse of knowledge at a large scale but, more importantly, creates resources by linking them to other resources. Once the ontologies developed using the Watson plugin are in turn published on the Web, they are already part of a network of ontologies that interlink a number of resources. Being interlinked with the ontologies that have been used during its construction, a new ontology not only reuses the semantic information they contain, but also contributes to it by serving as an intermediary between them, providing a pillar in the construction of the knowledge network that is the Semantic Web.

Last modified: Fri Jan 18 19:45:29 GMT 2008