2nd Scalable Semantic Web Knowledge Base Systems Workshop (SSWS'2005), co-located with the
6th International Conference on Web Information Systems Engineering (WISE 2005)
New York City, Intercontinental Barclay Hotel
Author: Raphael
CWI participants: Raphael
# participants: 20
A VERY interesting workshop, following a previous one that took place on Sanibel Island two years ago (co-located with ISWC'03). They received many papers and the selection was quite strict for a workshop (50% acceptance). I tried to give a detailed summary of each talk. The talks were rather technical since they mainly presented available systems for the Semantic Web, so my notes are technical too. The proceedings covering the 3 workshops are available in my office.
Problem: interoperability between ontology construction tools using an interchange language (OWL/RDFS). The translation is usually poorly done. They propose a complete benchmark suite in order to:
Motivation: There is a natural trade-off between storing facts explicitly and inferring them from a set of base statements. Their idea is to compile the RDF databases to reduce the storage space and improve the overall performance. Sesame (the product of Aduna) starts by computing the closure of any RDF graph according to the RDF(S) semantics in order to minimize the query processing time.
Analysis: They perform an empirical analysis to study which kinds of statements are actually added to the closure and which derivation rules are actually used. In general, it is the subClassOf and type statements whose number increases drastically when the derivation rules of the RDF(S) semantics are applied.
Conclusion: The greatest potential for saving closure space lies in type and subclass statements. The idea is thus to avoid materializing the inferred type statements (they remove some rules from the inference engine), arguing that these rules can be evaluated at query time, i.e. as online reasoning.
Evaluation: They tested this restricted closure on a very large benchmark dataset. For the upload time, 15 minutes are saved for 1 million triples. As a result, the current version of Sesame (v 2.0) no longer contains these rules.
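To make the trade-off concrete, here is a minimal sketch (not code from the talk or from Sesame). It assumes that the removed rules correspond to the RDFS type-inheritance rule (rdfs9) and that subclass transitivity (rdfs11) is kept; the notes do not say exactly which rules were dropped. The function names and the toy data are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def closure(base: Set[Triple], infer_types: bool = True) -> Set[Triple]:
    """Forward-chain two RDFS rules until no new triple appears.

    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    With infer_types=False, rdfs9 is skipped (assumed "restricted closure").
    """
    triples = set(base)
    while True:
        sub = {(s, o) for s, p, o in triples if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        if infer_types:
            new |= {(x, TYPE, b) for x, p, a in triples if p == TYPE
                    for a2, b in sub if a == a2}
        if new <= triples:
            return triples
        triples |= new


def instances_of(triples: Set[Triple], cls: str) -> Set[str]:
    """Query-time reasoning: use the stored subclass closure to find instances."""
    subclasses = {s for s, p, o in triples if p == SUBCLASS and o == cls} | {cls}
    return {s for s, p, o in triples if p == TYPE and o in subclasses}


base = {
    ("Student", SUBCLASS, "Person"),
    ("Person", SUBCLASS, "Agent"),
    ("alice", TYPE, "Student"),
    ("bob", TYPE, "Student"),
}
full = closure(base)
restricted = closure(base, infer_types=False)
print(len(full), len(restricted))                  # 9 5
print(sorted(instances_of(restricted, "Agent")))   # ['alice', 'bob']
```

On this toy graph the full closure stores 9 triples against 5 for the restricted one, while the query for instances of Agent still returns the same answer because the missing type statements are derived when the query runs.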
Future work:
They begin by claiming that they have the fastest OWL repository in the world! OWLIM is a semantic repository that aims to replace RDBMS in a wide range of applications. It provides full support for RDF(S) and OWL DLP (in fact OWL Horst, which is more expressive; see the paper of ter Horst, H.J. in ISWC'05). It is nice that Horst from Philips, who really does good work, gave his name to this new OWL fragment :-). A nice diagram structuring the numerous OWL fragments (RDFS, OWL DLP, OWL Horst, OWLIM, OWL Lite, OWL DL, OWL Full) was presented!
OWLIM performs in-memory reasoning and query evaluation (forward-chaining reasoning). It offers very fast upload, retrieval and query evaluation for huge ontologies and KBs. OWLIM is available as a Storage and Inference Layer (SAIL) for the Sesame RDF database. The latest version, OWLIM v 2.8.1, will be available on the web next week. Many configuration options are available to switch the reasoning support for RDF, RDFS, OWL, etc. on or off, as well as to set the memory to be used (a default configuration is of course provided, which is basically equivalent to the Sesame in-memory RDFS reasoning).
Evaluation: OWLIM can manage millions of statements. Upload speed (including inference + storage) is about 10,000 to 100,000 statements per second. The delete operation is still (relatively) slow (a main problem of Sesame too): the delete time grows linearly, by about 20 seconds for each additional million statements in the repository. Finally, they tested OWLIM on the largest semantic web repository benchmark in the world, the Lehigh University Benchmark (LUBM). They can load the 7 million statements in 6 minutes. The only other system that can handle this KB took 12 hours. See Guo, Y., Pan, Z. and Heflin, J.: "An Evaluation of Knowledge Base Systems for large OWL Datasets", Journal of Web Semantics, 2005, which was also the best paper of ISWC 2004.
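As a quick sanity check on the LUBM figures quoted above, the following back-of-the-envelope calculation uses only the numbers from these notes (7 million statements, 6 minutes, 12 hours); the variable names are just for illustration.

```python
statements = 7_000_000          # LUBM size as quoted in the notes
owlim_seconds = 6 * 60          # 6 minutes
other_seconds = 12 * 60 * 60    # 12 hours for the other system

owlim_rate = statements / owlim_seconds   # statements loaded per second
speedup = other_seconds / owlim_seconds   # how many times faster OWLIM is

print(round(owlim_rate), speedup)  # 19444 120.0
```

Roughly 19,000 statements per second is consistent with the lower end of the reported upload speed, and the load is 120 times faster than the 12-hour run.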
Download OWLIM at http://www.ontotext.com/owlim.
Kowari is an open source database for the storage of RDF/OWL, written in Java. The latest stable version is v1.1, with binaries available in January 2006 on http://kowari.org. RDFS is (almost) implemented using the Kowari Rules Engine. SKOS support is coming soon. The OWL subset is carefully chosen (Lite + full cardinality) so that it remains scalable and can be implemented using rules.
Naming: how to name these systems?
3store is another storage system for RDF triples, implemented in ANSI C on top of MySQL. The talk addresses how to tune a traditional relational database through its custom parameters so that it copes well with storing RDF data (triples).
The talk then showed how SPARQL queries can be transformed into traditional SQL queries. Not all SPARQL features are covered yet in their implementation.
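The general idea of such a translation can be sketched as follows. This is not 3store's actual translation: it assumes a hypothetical single table triples(s, p, o), handles only a conjunction of triple patterns (no OPTIONAL, FILTER, etc.), and uses Python's sqlite3 as a stand-in for MySQL. Each triple pattern becomes one alias of the table, and a variable shared between patterns becomes a join condition.

```python
import sqlite3
from typing import Dict, List, Tuple

Pattern = Tuple[str, str, str]


def bgp_to_sql(patterns: List[Pattern], select: List[str]) -> Tuple[str, List[str]]:
    """Translate triple patterns into one SQL query with a self-join per pattern.

    Terms starting with '?' are variables; everything else is a constant.
    Returns the SQL text and the list of bound parameters.
    """
    first_seen: Dict[str, str] = {}   # variable -> column where it first appears
    conditions: List[str] = []
    params: List[str] = []
    for i, pattern in enumerate(patterns):
        for column, term in zip(("s", "p", "o"), pattern):
            ref = f"t{i}.{column}"
            if not term.startswith("?"):
                conditions.append(f"{ref} = ?")       # constant: bound parameter
                params.append(term)
            elif term in first_seen:
                conditions.append(f"{ref} = {first_seen[term]}")  # shared variable: join
            else:
                first_seen[term] = ref
    columns = ", ".join(f"{first_seen[v]} AS {v[1:]}" for v in select)
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    where = " AND ".join(conditions) or "1 = 1"
    return f"SELECT {columns} FROM {tables} WHERE {where}", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:rex", "rdf:type", "ex:Dog"),
    ("ex:rex", "foaf:name", "Rex"),
])

# SPARQL: SELECT ?name WHERE { ?x rdf:type ex:Person . ?x foaf:name ?name }
sql, params = bgp_to_sql(
    [("?x", "rdf:type", "ex:Person"), ("?x", "foaf:name", "?name")],
    select=["?name"],
)
rows = conn.execute(sql + " ORDER BY name", params).fetchall()
print(sql)
print(rows)  # [('Alice',), ('Bob',)]
```

The number of self-joins grows with the number of triple patterns, which is one reason why the tuning of the underlying relational database matters for this kind of store.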
Problem: OWL DL is complex (NExpTime). How to deal with that? By approximation and modularization. There are different possibilities:
Focus on a use-case: instance retrieval and boolean conjunctive queries. Use of the approximate entailment by Cadoli-Schaerf.
Rete-based inference engine: it is an optimized forward-chaining algorithm that remembers previously found results and does not compute them again. Discussion of various heuristics that can be implemented to improve the reasoning process: for example, it is possible to play with the order in which the rules are applied.
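The "remember previous results" idea can be illustrated with a heavily simplified sketch. It is not a full Rete network (no node sharing between rules, no beta memories for rules with more than two conditions) and not the engine presented in the talk: each rule keeps one memory per condition, and a newly added fact is joined only against the other condition's memory instead of re-matching all facts. All class and function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def match(pattern: Triple, fact: Triple) -> Optional[Bindings]:
    """Return variable bindings that make pattern equal to fact, or None."""
    result: Bindings = {}
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def merge(a: Bindings, b: Bindings) -> Optional[Bindings]:
    """Combine two binding sets, or return None if they disagree on a variable."""
    out = dict(a)
    for key, value in b.items():
        if out.setdefault(key, value) != value:
            return None
    return out


class TwoConditionRule:
    """A rule 'cond1, cond2 => conclusion' with one memory per condition."""

    def __init__(self, cond1: Triple, cond2: Triple, conclusion: Triple):
        self.conds = (cond1, cond2)
        self.conclusion = conclusion
        self.memory: Tuple[List[Bindings], List[Bindings]] = ([], [])

    def add(self, fact: Triple) -> List[Triple]:
        """Store the fact's matches and join only against the other memory."""
        derived: List[Triple] = []
        for i in (0, 1):
            bindings = match(self.conds[i], fact)
            if bindings is None:
                continue
            self.memory[i].append(bindings)
            for other in self.memory[1 - i]:
                joined = merge(bindings, other)
                if joined is not None:
                    s, p, o = (joined.get(t, t) for t in self.conclusion)
                    derived.append((s, p, o))
        return derived


def saturate(rules: List[TwoConditionRule], facts: Iterable[Triple]) -> Set[Triple]:
    """Feed facts one by one; each fact is processed exactly once per rule."""
    known: Set[Triple] = set()
    agenda = list(facts)
    while agenda:
        fact = agenda.pop()
        if fact in known:
            continue
        known.add(fact)
        for rule in rules:
            agenda.extend(rule.add(fact))
    return known


rules = [
    TwoConditionRule(("?a", "rdfs:subClassOf", "?b"), ("?b", "rdfs:subClassOf", "?c"),
                     ("?a", "rdfs:subClassOf", "?c")),
    TwoConditionRule(("?x", "rdf:type", "?a"), ("?a", "rdfs:subClassOf", "?b"),
                     ("?x", "rdf:type", "?b")),
]
facts = [
    ("Student", "rdfs:subClassOf", "Person"),
    ("Person", "rdfs:subClassOf", "Agent"),
    ("alice", "rdf:type", "Student"),
]
result = saturate(rules, facts)
print(len(result))  # 6
```

Because every fact is matched once and then only joined against stored partial matches, no join between two old facts is ever recomputed; the order in which rules and facts are processed changes the amount of intermediate work but not the final result.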
Context: dynamic P2P group (many peers join and leave the group). Each peer can contribute some RDF information. Use of Pastry (a framework for routing messages from one peer to another) + Scribe (which offers basic publish/subscribe mechanisms).
After George's talk this week at CWI, I felt comfortable with this talk, since I had already heard about all these P2P concepts. The work presented here seemed very similar to what Georges has done in his master's thesis. Maybe a good point for him?
Presentation of the Kepler Scientific Workflow System: http://www.keplerproject.org.
I was not interested, but I finished a paper during this talk :-)