SSWS'2005 - Short trip report by Raphaël

2nd International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS'2005)

Sunday, 20 November 2005, New York City

2nd Scalable Semantic Web Knowledge Base Systems Workshop (SSWS'2005), co-located with the 6th International Conference on Web Information Systems Engineering (WISE 2005)
New York City, Intercontinental Barclay Hotel
Author: Raphael
CWI participants: Raphael
# participants: 20

Overall impression

A VERY interesting workshop, following a previous one held on Sanibel Island two years ago (co-located with ISWC'03). The organizers received many papers and the selection was quite strict for a workshop (50%). I tried to give a detailed summary of each talk. The talks were rather technical, since they mainly presented available systems for the Semantic Web, so my notes are technical too. The proceedings, which group the 3 workshops together, are available in my office.

Agenda

Session 1: Scalable Repository and Reasoning Services

9h00 - A Method for Performing an Exhaustive Evaluation of RDF(S) Importers

Raul Garcia-Castro, Asuncion Gomez-Perez

Problem: interoperability between ontology construction tools using an interchange language (OWL/RDFS). The translation is usually poorly done. They propose a complete benchmark suite in order to:

evaluate the different tools
provide recommendations to tool developers so that they can improve their import/export functionalities

A complete general methodology for establishing such a benchmark suite is proposed. The whole benchmark suite and the results obtained by various ontology development tools against it are presented. These tools are: Protégé, WebODE, Corese, KAON, and DOE (my own INA tool :-)

9h30 - Time – Space Trade-Offs in Scaling up RDF Schema Reasoning

Heiner Stuckenschmidt, Jeen Broekstra, presented by Holger Wache

Motivation: There is a natural trade-off between storing facts explicitly and inferring them from a set of base statements. Their idea is to compile the RDF databases to reduce the storage space and improve the overall performance. Sesame (the product of Aduna) begins by computing the closure of any RDF graph according to the RDF(S) semantics, in order to minimize the query processing time.

Analysis: They perform an empirical analysis to study which kinds of statements are actually added to the closure and which derivation rules are actually used. Generally, it is the subClassOf and type statements whose number increases drastically when the derivation rules of the RDF(S) semantics are applied.

Conclusion: The greatest potential for saving closure space lies in the type and subclass statements. The idea is thus to stop creating type statements at load time (they remove the corresponding rules from the inference engine), arguing that these rules can instead be applied at query time, as online reasoning.
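To make the trade-off concrete, here is a minimal Python sketch. It is a toy illustration, not Sesame's actual code: the function names and the tiny dataset are my own assumptions, and only two RDFS rules (subClassOf transitivity and type inheritance) are modelled. It compares storing the full closure with answering the same query on the base triples by inferring types on demand.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def full_closure(triples):
    """Forward-chain two RDFS rules (subClassOf transitivity, type inheritance) to a fixpoint."""
    closure = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        sub = [(s, o) for s, p, o in closure if p == SUBCLASS]
        for a, b in sub:
            # (a subClassOf b), (b subClassOf d) => (a subClassOf d)
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
            # (x type a), (a subClassOf b) => (x type b)
            for s, p, o in closure:
                if p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))
        if not new <= closure:
            closure |= new
            changed = True
    return closure

def superclasses(triples, cls):
    """All classes reachable from cls via subClassOf, including cls itself."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p == SUBCLASS and s == current and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen

def instances_at_query_time(triples, cls):
    """Answer 'instances of cls' on the base triples only, inferring types on demand."""
    return {s for s, p, o in triples
            if p == RDF_TYPE and cls in superclasses(triples, o)}

base = {
    ("Student", SUBCLASS, "Person"),
    ("Person", SUBCLASS, "Agent"),
    ("alice", RDF_TYPE, "Student"),
}
materialized = full_closure(base)
stored_answer = {s for s, p, o in materialized if p == RDF_TYPE and o == "Agent"}
online_answer = instances_at_query_time(base, "Agent")
print(len(base), len(materialized), stored_answer == online_answer)
```

Both strategies return the same instances; the first pays in storage (the closure is larger than the base set), the second pays at query time by walking the class hierarchy for each query.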

Evaluation: They have tested this restricted closure on a very large benchmark dataset. For the upload time, 15 minutes are saved for 1 million triples. Therefore, the current version of Sesame (v 2.0) no longer contains these rules.

Future work:

More evaluation with real data sets;
Performance evaluation with real queries: they have not evaluated how much time is lost at query time!!!
Proof of completeness.

10h00 - Coffee break

10h30 - OWLIM – A Pragmatic Semantic Repository for OWL

Atanas Kiryakov, Damyan Ognyanov, Dimitar Manov

They begin by claiming that they have the fastest OWL repository in the world! OWLIM is a semantic repository that aims to replace RDBMSs in a wide range of applications. It provides full support for RDF(S) and OWL DLP (in fact OWL Horst, which is more expressive; see the paper by ter Horst, H.J. at ISWC'05). It is nice that Horst from Philips, who really does good work, gave his name to this new OWL fragment :-). A nice diagram structuring the numerous OWL fragments (RDFS, OWL DLP, OWL Horst, OWLIM, OWL Lite, OWL DL, OWL Full) was presented!

OWLIM performs in-memory reasoning and query evaluation (forward-chaining reasoning). It offers very fast upload, retrieval and query evaluation for huge ontologies and KBs. OWLIM is available as a Storage and Inference Layer (SAIL) for the Sesame RDF database. The latest version, OWLIM v 2.8.1, will be available on the web next week. Many configuration options are available to switch on/off the reasoning support for RDF, RDFS, OWL, etc., as well as to set the memory to be used (a default configuration is of course provided, which is basically equivalent to the Sesame in-memory RDFS reasoning).

Evaluation: OWLIM can manage millions of statements. Upload speed (including inference + storage) is about 10000/100000 statements per second. The delete operation is still (relatively) slow (the main problem of Sesame too): the delete time grows linearly, by about 20 seconds for each additional million statements in the repository. Finally, they have tested OWLIM on the largest semantic web repository benchmark in the world, the Lehigh University Benchmark (LUBM). They can load the 7 million statements in 6 minutes. The only other system which can handle this KB took 12 hours. See Guo, Y., Pan, Z. and Heflin, J.: "An Evaluation of Knowledge Base Systems for large OWL Datasets", Journal of Web Semantics, 2005, which was also the best paper of ISWC 2004.
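As a quick sanity check of the LUBM figures, the reported load times can be turned into a throughput and a speed-up. This is only back-of-the-envelope arithmetic on the numbers quoted in the talk; the variable names are mine.

```python
# Figures as reported in the talk
lubm_statements = 7_000_000
owlim_load_seconds = 6 * 60          # 6 minutes
other_load_seconds = 12 * 60 * 60    # 12 hours

# Average load rate and relative speed-up implied by those figures
owlim_rate = lubm_statements / owlim_load_seconds
speedup = other_load_seconds / owlim_load_seconds

print(f"{owlim_rate:.0f} statements/s, {speedup:.0f}x faster")
```

This gives roughly 19,400 statements per second for the LUBM load, which fits the quoted upload speed if "10000/100000" is read as a range, and a factor of 120 over the other system.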

Download OWLIM at http://www.ontotext.com/owlim.

11h00 - Scaling the Kowari Metastore

David Wood

Kowari is an open source database for the storage of RDF/OWL, written in Java. The latest stable version is v1.1, with binaries available in January 2006 on http://kowari.org. RDFS is (almost) implemented using the Kowari Rules Engine. SKOS support is coming soon. The OWL subset was carefully chosen (Lite + full cardinality) so that it can be scalable and implemented using rules.

11h30 - Discussion

Naming: what should these systems be called?

Semantic Repository, but these systems generally do more than simple RDF storage (inference, for example).
Knowledge Base Systems, but the term suffers from a bad reputation.

12h00 - Lunch break

Session 2: Query Handling & Optimization Techniques

13h30 - SPARQL Query Processing with Conventional Relational Database Systems

Stephen Harris, Nigel Shadbolt

3store is another storage system for RDF triples, implemented in ANSI C on top of MySQL. The talk addresses how to tune traditional relational databases, using their custom parameters, for storing RDF data (triples).

The talk then showed how SPARQL queries can be transformed into traditional SQL queries. Not all SPARQL features are covered by their implementation yet.
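The general idea of such a translation can be sketched as follows. This is not 3store's actual schema or translator: I assume a single hypothetical table triples(s, p, o) holding plain strings, I only handle a basic graph pattern with at least one variable, and I use Python's built-in sqlite3 instead of MySQL so that the example is self-contained. Each triple pattern becomes one alias of the table, constants become WHERE filters, and a variable shared by two patterns becomes a join condition.

```python
import sqlite3

def bgp_to_sql(patterns):
    """Translate (s, p, o) triple patterns into one SQL query.

    Terms starting with '?' are variables; everything else is a constant.
    Returns the SQL string and the list of parameters for its placeholders.
    """
    select, where, params, first_seen = [], [], [], {}
    for i, pattern in enumerate(patterns):
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in first_seen:
                    # Variable seen before: join on its first occurrence
                    where.append(f"{ref} = {first_seen[term]}")
                else:
                    first_seen[term] = ref
                    select.append(f"{ref} AS {term[1:]}")
            else:
                # Constant: filter with a bound parameter
                where.append(f"{ref} = ?")
                params.append(term)
    tables = ", ".join(f"triples AS t{i}" for i in range(len(patterns)))
    sql = f"SELECT {', '.join(select)} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("alice", "knows", "bob"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
])

# Roughly: SELECT ?x ?n WHERE { alice knows ?x . ?x name ?n }
sql, params = bgp_to_sql([("alice", "knows", "?x"), ("?x", "name", "?n")])
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)
```

Features such as OPTIONAL or FILTER are deliberately left out of the sketch.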

14h00 - Scalable Instance Retrieval for the Semantic Web by Approximation

Holger Wache, Perry Groot, Heiner Stuckenschmidt

Problem: OWL DL is complex (NExpTime). How to deal with that? By approximation and modularization. There are different possibilities:

Language weakening;
Knowledge compilation;
Approximate deduction: simplify the query and give an approximate answer.

Focus on a use case: instance retrieval and boolean conjunctive queries. They use the approximate entailment of Cadoli and Schaerf.

14h30 - Reordering Query and Rule Patterns for Query Answering in a Rete-Based Inference Engine

Murat Osman Unalir, Tugba Ozacar, Ovunc

Rete-based inference engine: Rete is an optimized forward-chaining algorithm that remembers previously found results and does not compute them again. The talk discussed various heuristics that can be implemented to improve the reasoning process: it is, for example, possible to play with the order in which the rules are applied.
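To see why ordering matters, here is a small Python sketch. It uses a naive nested-loop join, not a Rete network, and the "fewest stand-alone matches first" heuristic is my own simple assumption, not necessarily one of the heuristics from the paper. It evaluates the same two patterns in both orders and counts the match attempts.

```python
def matches(pattern, triple, binding):
    """Try to extend binding so that pattern matches triple; return the new binding or None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it against its existing value
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def evaluate(patterns, triples):
    """Join the patterns left to right; return (solutions, number of match attempts)."""
    bindings, attempts = [{}], 0
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                attempts += 1
                extended = matches(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings, attempts

def reorder(patterns, triples):
    """Heuristic: evaluate the pattern with the fewest stand-alone matches first."""
    def cost(pattern):
        return sum(matches(pattern, t, {}) is not None for t in triples)
    return sorted(patterns, key=cost)

triples = [(f"p{i}", "type", "Person") for i in range(50)]
triples.append(("p7", "worksAt", "CWI"))
query = [("?x", "type", "Person"), ("?x", "worksAt", "CWI")]

slow_solutions, slow_attempts = evaluate(query, triples)
fast_solutions, fast_attempts = evaluate(reorder(query, triples), triples)
print(slow_attempts, fast_attempts, slow_solutions == fast_solutions)
```

The answers are identical, but starting with the selective pattern keeps the intermediate results small. A real engine would also have to avoid orderings that produce cartesian products between patterns sharing no variable.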

15h00 - Scalable Peer-to-Peer RDF Query Algorithm

Denis Ranger, Jean-Francois Cloutier

Context: a dynamic P2P group (many peers join and leave the group). Each peer can contribute some RDF information. They use Pastry (a framework for routing messages from one peer to another) + Scribe (which offers basic publish/subscribe mechanisms).

After George's talk this week at CWI, I felt comfortable with this talk, since I had already heard about all these P2P concepts. The work presented here seemed very similar to what Georges has done in his master's thesis. Maybe a good point for him?

15h30 - Coffee break

Session 3: Practical Semantic Web Applications

16h00 - Towards Automatic Generation of Semantic Types in Scientific Workflows

Shawn Bowers, Bertram Ludascher

Presentation of the Kepler Scientific Workflow System: http://www.keplerproject.org.

16h30 - A Web Mining Method Based on Personal Ontology for Semi-structured RDF

Kotaro Nakayama, Takahiro Hara, Shojiro Nishio

I was not interested, but I finished a paper during this talk :-)