FACETED BROWSER =============== by: michiel hildebrand at: $Date: 2007/02/23 20:18:12 $ ab: considerations and decisions in the implementation of a faceted browser td: http://media.cwi.nl/hildebra/blog/2006/03/02/facet-browser-todo/ 0. Intro -------- Faceted browsers allow navigation of a repository along the conceptual dimensions of its metadata. A single facet consists of hierarchically structured concepts that are used as metadata. User interaction in faceted navigation follows a step by step process of specifying, and possibly releasing, the constraints of a result set. At each stage a faceted browsers displays the current children and, additionally, the number of instances it annotates. Concepts with no instances (count 0) are hidden to protect users from oblivious navigation of empty branches. A multifaceted interface allows navigation from different dimensions. The user can combine concepts from multiple facets and in fact creates complex queries by simple clicking. 1. Transitive Closure --------------------- At every stage the system has to calculate for all direct children the number of instances that are annotated with that concept. We say that an instance is annotated with a concept if there exists a triple , and additionally all parents from that concept. In other words a direct property or through a transitive relation. ============================================================================== How can we do efficient computation with entailed relations? ============================================================================== There are at least two approaches to tackle this: 1. Let a query processor infer new statements as needed per query. 2. Compute and store the closure of the given graph as a basis for querying. (cited in reversed order from [2]) Example, the semantics of the rdfs:subClassOf relation state that the instances of the subject class are also instances of the value class. In the semweb package of SWI prolog the predicate rdfs_individual_of uses the subClassOf relation to find individuals (instances) of a class [1]. This is approach 1. Sesame uses approach 2. It computes the closure and adds all triples in advance. My first naive implementation used the first approach. The algorithm searched for all instances top down. Get all direct instances of a concept and recursively get the instances of a child of that concept. This strategy turned out to be very expensive, because a parent can have multiple children, and most of them will not lead to an annotation triple. For each subfacet the algorithm has to search the whole subtree. A bottom up approach is not much better because in this case all instances have to be checked for each subfacet, and most of them are not being used in that facet. Although the path from instance to parent is most of the time non branching (some concepts can have more than one parent) the set with all instances is usually very big. 1.1. Storage vs computation In a revised implementation (thanks to Jan) I followed the second approach. In an initialization stage I first compute the transitive closure and add additional triples for the entailed relations between an instance and a concept. If a concept is used to annotate an instance there must exist a triple . The algorithm does not have to compute any transitive relations anymore. 1.2. Reduction of additional triples If instances share a metadata concept additional derived triples are needed from both instances to the concept's parents. In fact the hole transitive closure for that concept is stored twice, once for each instance. The number of derived triples for the transitive closure can be reduced if the closure is calculated solely at the concept level. Instead of adding triples from the instance to all the parents of the metadata concept, triples are only added from the concept to all its parents. In our data set there are much less concepts used for one metadata property than that there are instances. In the worst case each instance has one or more concepts as metadata and each concept is used only once. (The number of triples needed can never be higher). 2. Multiple facets of the same dimension ---------------------------------------- Each facet describes the hierarchy in one dimension. However, the concepts from one dimension can be used for different properties of an instance. For example, the place of birth of an artist and the place describing the subject of a painting both use a hierarchy of geographical locations. (Would be nice to find a good visualization for this). The same dimension can also be used for the same type of instance. For example concept from a material hierarchy can be used as metadata for the medium used in an artwork (oil paint) or for support (canvas). Solution I choose uses different facets with the same dimension. Note, the hierarchy does not have to be exactly the same, because empty branches are hidden. To make a distinction between the facets the property between the instance and concept have to be taken into account. I call this the annotation property. As a result facets can be (partially) overlapping. Most implementations of faceted browsers require different facets to be orthogonal sets. (eg. Flamenco, more??) 3. Multiple instances --------------------- 4. Cross relations ------------------ RDF stores [1] SWI Prolog semweb package documentation [2] Jeen Broekstra , Arjohn Kampman , and Frank van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema Facted browsing [3] Ka-Ping Yee (et al). Faceted Metadata for Image Search and Browsing Sketches (are here because it is too early to throw them away) ============================================================================== Cross relations (4 march 2006 00:05) ------------------------------------ cross relations are tricky. I got most of the infrastructure ready to implement all kinds of cross relations. But already with only 1 cross relation (work-creator-person) many questions arise. especially in combination with the type facet: - what to do when a type is selected 1 -> user specifies which type so only show those but what if a constraint is already given e.g. artwork material next select person type result are those person that made artwork constructed with the previously selected material. All good. But if all artwork facets are hidden (including the material facet) it is not possible to further specialize the material query. This is not necessarily a bad thing. It's a choice! I does make some sense, if you want to be able to further specify the artwork's material don't select the person type. But at the other hand it is somewhat strange that you selected a constraint from the material facet, and after selecting the person type the material facet disappears but the constraint remains. Could it be a solution to keep only the material facet (Note the facets from which previous constraint have been selected) and hide the rest??? Facet types ----------- Until now only explored annotation facet. concept of an annotation property and display its hierarchy through a transitive property different facet types are also possible. - annotation - type - instance - collection -type facet is a special case of an annotation facet with annotation property is rdf:type difficult for composition, because what is the instance type of this facet -> all instance types. need to check for composition at concept level not on facet level -instance facet show instances as concepts in a facet instead of results useful if the instances are itself used for annotation (MH this seems to be an indication for composition) sorting on alphabet or instance count etc. - collection facet use fourth argument of triples to get origin. Facet hierarchy as an Interaction Module -------------------------------------------------------- Most implementations of faceted browsers take up the whole interface, MuseumFinland, Flamenco, as it's their key feature for browsing. I want a system built up from separate search and navigation modules, which can all cooperate together in one interface. Therefore the interface of the hierarchy should be presented in a compact form. A tree like structure with dynamically unfolding submenus can have a small initial size. All facets can be initially collapsed and only the selected facets are unfolded. The tree interface has two other advantages: First, at any stage the user keeps a global view of the hierarchy. Secondly, the tree can be unfolded without selecting each individual subfacet, which makes browsing to deeply embedded subfacets much faster. The result view can be shared with other navigational modules. SKOS ------- Facet browsing can only be used for hierarchical relations. Need a subclass relation on classes or a parent relation on concepts. SKOS is a W3C draft for a standard vocabulary and structure of thesauri. SKOS already Includes many concepts needed for facetted browsing. The standard SKOS vocabulary is not enough additional concepts and properties are needed. Facet declarations in RDF ----------------------------------- Store a facet description in RDF. MuseumFinland stores facet declarations in prolog. Currently I created all resources that I needed under the iface namespace. I should model this to use existing SKOS resources, and make subclasses/properties.