The role of semantics in information access: a survey

Keyword search is the method of choice for end users to access information on the current web. Several attempts have been made to provide keyword search access to resources described on the semantic web, as well as to existing web documents enriched with semantic annotations. We refer to both approaches as semantic search. The systems that provide some form of semantic search vary greatly. Our aim is to give an overview of the different notions of semantic search. In particular, we are interested in the role of explicit semantics in the search process.

In this survey we define keyword search as the process in which a user submits one or more search terms in free text, optionally with control structures, and the system returns a set of result items, which can be organized in various ways. We analyze the systems at four stages of the search process: input, processing, feedback and search results. For each stage we consider the functionality that the system provides and how this functionality is made available through the graphical user interface.

We aim to give a complete overview of the different notions of keyword search that have been applied to semantic web data. We believe that this does not require analyzing every system that supports a form of semantic search; instead, it suffices to thoroughly analyze the best representative of each type of semantic search. In total we considered xx systems, all of which are listed in the system overview. A thorough analysis is performed on 15 applications. Additionally, we included a few systems that do not (yet) support keyword search, but for which adding it would bring clear added value.

Prerequisites

Several applications support operations to create the semantic web data to be searched, such as named entity extraction, annotation, crawling and indexing. These operations lie outside the scope of this survey.

Many applications allow the user to view a specific resource and its direct metadata values (a local view). Some also allow browsing the semantic graph by rendering the metadata values as hyperlinks. This is a useful technique for exploring the RDF graph (see also browsers such as Tabulator and Disco). This functionality lies outside the scope of this survey; we mention it only when the local view offers additional features.

System Overview

Name and URL
Purpose: intended use of the system.
- Search engine
- Information management
- (Faceted) Browser
- Portal: gives access to multiple collections from a single entry point
Users: intended users of the system.
- End users (by domain expertise)
- Developers
- Software agents
Scope: data to which the system provides access.
- web
- semantic web
- RDF data collections
- ontologies
- thesauri
- other...
Store/Index: techniques used for literal indexing and storage of semantic web data.
- Store: Sesame, Jena, SWI-Prolog, etc.
- Index: Lucene, application-specific, etc.

Name	Purpose	Users	Collection Scope	Store/Index
Autofocus	Search engine, Browser	End users (medium-expert)	RDFized text documents	Sesame and application specific index.
DBin	Information management	Developers. End users (medium-expert)	?	?
e-Culture	Portal, Search engine	End users (novice-expert)	Multiple data collections and thesauri in RDFS. Thesauri mappings in OWL.	SWI Prolog Semantic Web Library Triple Store and literal index (porter stemming)
Flink	Browser	End users (medium-expert)	RDFized documents.	Sesame
Haystack	Information management	Developers and end users (medium-expert)	Semantic web
Hybrid Search	Search engine	End users	Single collection in RDF	Application specific triple store. Lucene index
KIM	Information management, Search engine, Faceted browser	Developers and end users (medium-expert)	Text documents and an upper level ontology in OWL	Sesame Triple store, Lucene indexing.
Longwell	Faceted browser	End users (medium-expert)	Single data collection in RDF	Sesame
mspace	Browser	End users (medium-expert)	Single data collection in RDF
Museum Finland	Portal, Faceted Browser, Search Engine	End users (novice-expert)	Multiple data collections in RDF and 1 upper ontology in OWL.	Ontogator
OpenAcademia	Search engine, Browser	End users (novice-expert)	Multiple collections in RDF. Instance mappings in OWL.	Sesame. Application specific index
OWLIR	Search engine	End users	Text Document + Extracted RDF triples + Additional scraped triples	Store: DAMLJessKB. WONDIR IR engine.
QuizRDF	Search Engine	End Users (novice-expert)	Text documents. RDFS Ontologies.	Application specific index.
SemSearch	Search Engine	End Users (medium-expert)	Single data collection in RDF	Sesame Triple Store and Lucene Index
Slashfacet	Faceted Browser	Developers. End users (medium-expert)	Multiple collections and thesauri in RDF	SWI prolog semantic web library
Squiggle	Search Engine	End Users (novice-medium)	RDFized image metadata and RDF thesaurus	Sesame Triple Store, Lucene literal index on record data and on literal values from triple store.
Swoogle	Ontology Search Engine	Developers and Software agents	Semantic web
Tap: Semantic Search	Context based search Engine	End users	Text document and RDF ontologies	Tap framework
Excluded systems
Ontokhoj	Search engine	Developers	Semantic Web	?
OntoSearch	Search engine	Developers	Semantic Web	?
SHOE search tool	Search Engine	Developers	Text document with SHOE annotations	Application specific
InWiss Search	Search Engine	End Users	Data collections in RDF	Sesame. Application specific index
DOSE	Search Engine	End Users	Web Documents and Domain Ontology	?

System analysis

Search Input

Functionality

Search term: the input that can be given as a search argument.
- Keyword(s)
- URI(s)
Structure: control operators that are supported in the search term.
- boolean operators
- special purpose operators
- regular expressions
Value selection: Input by direct selection of search terms.
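
To make the Structure item concrete, the following minimal Python sketch splits a free-text search term into keywords, a boolean operator and a result target type. The syntax is an assumption for illustration: it loosely mirrors the ":" operator listed for SemSearch in the analysis table, but `parse_query` and `ParsedQuery` are hypothetical names and do not reproduce the grammar of any surveyed system.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ParsedQuery:
    target_type: Optional[str]                 # word before a leading "type:" prefix, if any
    terms: List[str] = field(default_factory=list)
    operator: str = "AND"                      # how multiple terms are combined

def parse_query(text: str) -> ParsedQuery:
    """Parse e.g. 'painting: rembrandt OR vermeer' (hypothetical syntax)."""
    target_type = None
    head, sep, rest = text.partition(":")
    # Treat a single word before the first ":" as the result target type.
    if sep and head.strip() and " " not in head.strip():
        target_type, text = head.strip().lower(), rest
    # Default is conjunctive; systems differ (Squiggle, e.g., is disjunctive).
    terms, operator = [], "AND"
    for token in text.split():
        if token.upper() in ("AND", "OR"):
            # Simplification: the last operator applies to all terms.
            operator = token.upper()
        else:
            terms.append(token.lower())
    return ParsedQuery(target_type, terms, operator)
```

For example, `parse_query("painting: Rembrandt OR Vermeer")` yields target type `painting`, terms `rembrandt` and `vermeer`, and operator `OR`. A real parser would also need precedence and grouping for mixed operators, and would have to distinguish the prefix from URIs, which also contain ":".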

Interface

How can search terms be entered in the GUI?
- text entry box
- facet (value list)
Additional interface components.
- target type selection
- literal match type (prefix, substring)

Processing

Functionality

Literal matching: comparison of the search term to the literal index.
- Single keyword: prefix, substring
- Interpretation of multiple keywords
Retrieval: how the search results are determined.
- direct matching document/URI
- query extension
- graph search
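
The literal matching and direct retrieval steps can be sketched in a few lines of Python. The sketch assumes a toy in-memory list of (subject, property, literal) triples with hypothetical `ex:` URIs; "prefix" is taken to mean a prefix of any word in the literal, and multiple keywords are combined either conjunctively (intersection) or disjunctively (union). Real systems consult a literal index (e.g. Lucene) rather than scanning all triples.

```python
from typing import List, Set

# Toy data: (subject URI, property, literal value); URIs are hypothetical.
TRIPLES = [
    ("ex:rembrandt", "rdfs:label", "Rembrandt van Rijn"),
    ("ex:nightwatch", "rdfs:label", "The Night Watch"),
    ("ex:nightwatch", "ex:creatorName", "Rembrandt van Rijn"),
    ("ex:vermeer", "rdfs:label", "Johannes Vermeer"),
]

def literal_match(keyword: str, literal: str, mode: str) -> bool:
    """Compare one keyword to one literal value (case-insensitive)."""
    keyword, literal = keyword.lower(), literal.lower()
    if mode == "prefix":      # keyword is a prefix of some word in the literal
        return any(word.startswith(keyword) for word in literal.split())
    if mode == "substring":   # keyword occurs anywhere in the literal
        return keyword in literal
    raise ValueError(f"unknown match mode: {mode}")

def retrieve(keywords: List[str], mode: str = "prefix", conjunctive: bool = True) -> Set[str]:
    """Direct matching: resources with a literal value matching the keywords."""
    per_keyword = [
        {s for s, _p, lit in TRIPLES if literal_match(kw, lit, mode)}
        for kw in keywords
    ]
    if not per_keyword:
        return set()
    return set.intersection(*per_keyword) if conjunctive else set.union(*per_keyword)
```

With these toy triples, `retrieve(["brandt"])` finds nothing under prefix matching but finds both `ex:rembrandt` and `ex:nightwatch` under substring matching, and `retrieve(["night", "rembrandt"])` returns only `ex:nightwatch` when the keywords are combined conjunctively.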

Interface

Presentation of processing information.
- processing time
- number of results
- loading message

User Feedback

Functionality

Type: kind of feedback that is supported.
- disambiguation: clarify intended meaning
- refinement: constrain/extend the current result set
- exploration: find new (related) resources
Source: the object on which feedback is given.
- search input
- search results

Interface

How does the system allow a user to provide feedback?
- Select from discrete list of values
- New or extended search in text-entry box

Search Results

Functionality

Result Item: Nature of a single search result
- resource (document, image, concept)
- set of triples
Organization:
- ranking
- clustering
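
Ranking and clustering are often combined; several of the analysed systems cluster results by type. The following sketch assumes each result already carries an `rdf:type` and a relevance score (both hypothetical inputs). It sorts items by descending score within each cluster and orders clusters by their best item, which is only one of several possible policies.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def organize(results: List[Tuple[str, str, float]]) -> List[Tuple[str, List[str]]]:
    """Cluster (uri, rdf_type, score) results by type and rank them.

    Items are sorted by descending score inside each cluster; clusters are
    sorted by the score of their best item (ties broken alphabetically).
    """
    clusters: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for uri, rdf_type, score in results:
        clusters[rdf_type].append((score, uri))
    ranked = []
    for rdf_type, members in clusters.items():
        members.sort(key=lambda m: (-m[0], m[1]))
        ranked.append((members[0][0], rdf_type, [uri for _score, uri in members]))
    ranked.sort(key=lambda c: (-c[0], c[1]))
    return [(rdf_type, uris) for _best, rdf_type, uris in ranked]
```

For instance, two paintings scored 0.4 and 0.7 and a person scored 0.9 give a `Person` cluster first, followed by a `Painting` cluster with the 0.7 item on top.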

Interface

What is included in the presentation of a single search result?
- Resource (document, image, etc.)
- selected metadata
- Fresnel
- template

System	Aspect	Search Input	Processing	User Feedback	Search Results
Autofocus	Functionality	Keyword search with multiple terms; Value selection from predefined facets	Literal matching: subword match on extracted terms; Retrieval (keyword search): resources with matching literal value; Retrieval (value selection): resources with selected value as metadata	Refinement: add new search term or facet value. In contrast to faceted browsing, the values available for refinement are not restricted to the current selection. This allows the user to make multiple different intersections.	Set of items grouped by relation to constraints
	Interface	Keyword search: text entry box and keyword suggestion list; Search options: check boxes for specific metadata fields to search in; Value selection: selectable facets with grouped value list	Number of results per selected value or search term	Similar to initial search	Table with item metadata; Cluster map visualization
DBin	Functionality
	Interface
e-Culture basic	Functionality	Keyword search with single search term	Literal matching: minimal letter distance on stemmed index; Retrieval: backward graph search with weighted relations. Weights are manually assigned by relation type.	Disambiguation: result clusters grouped by result path	Set of items grouped by search path; Clusters: search path; Ranking: clusters are ranked by path length. Items within a cluster are ranked by score (= literal match * total path weight).
	Interface	Text entry box	Number of total results and number of results per cluster	Hyperlinks on cluster headers to zoom in on that cluster	Thumbnails with selected metadata
Flink	Functionality
	Interface
Haystack	Functionality	Browse concepts	Manually defined virtual properties.
	Interface
Hybrid Search	Functionality	Keyword search with multiple search terms	Literal matching: ?; Retrieval: spreading activation algorithm. Weights are determined by similarity and specificity measures plus manual assignment by relation type.	Refinement: related keywords	Set of items clustered by type; Ranking based on activation
	Interface	Text entry box	-	List of keywords	Items presented by title and visually grouped by type
KIM	Functionality	Keyword search with multiple terms and Lucene operators; Pattern search consisting of a structured query and a search term; Value selection from facets; Facet value autocompletion search	Literal matching: string match on extracted entities and metadata; Retrieval (keyword search): resources with matching literal value; Retrieval (pattern search): resources with matching literal value and exact query match; Retrieval (value selection): resources with selected value as metadata	Refinement (keyword search): add search term for another metadata field; Refinement (value selection): add new facet value or related concept	Set of items
	Interface	Keyword search: form with text entry boxes for title, keyword, author and content; Pattern search: a complex form representing the structure of the query; Value selection: facets with value list and text entry box for autocompletion. The facets shown in the interface can be selected manually.	Number of matching documents; Selected terms	Similar to initial search, but with available facet values updated to the current selection	Items presented by title and date
Longwell	Functionality	Value selection from facets; Facet value autocompletion search; Keyword search with single search term	Literal matching (autocompletion): prefix; Literal matching (keyword search): subword; Retrieval (keyword search): resources with matching literal value; Retrieval (value selection): resources with selected value as metadata for specific facet; Facet values are updated to current selection	Add new facet value or search term	Set of items
	Interface	Keyword search: text entry box; Value selection: facets with value list and text entry box for autocompletion. All facets are shown, but the value list can be hidden.	Loading message at every click; Number of total results; Number of results for each facet value; Selected facet values	Similar to initial search	Fresnel
mspace	Functionality	Value selection from facets; Facet value autocompletion search	Literal matching (autocompletion): prefix; Retrieval: results related to selected value by predefined graph paths	Refinement: select new value from facet; Change order of facets to construct a different view	Selected item
	Interface	Value selection: facets with value list and text entry box for autocompletion. Visible facets can be selected manually.	Selected facet values are highlighted	Facets can be dragged to change their order	All related values of the result item are shown
Museum Finland	Functionality	Value selection from facets; Keyword search with single search term	Literal matching: subword; Retrieval (keyword search): resources with matching literal value; Retrieval (value selection): resources with the selected value, or a narrower concept, as metadata for specific facet	Disambiguation: keyword search matches by use (facet in which they occur as a value); Refinement: add value from a new facet or select a more specific value from the active facet; Exploration (if a single result is selected): related results have similar values for predefined properties or paths	Set of items
	Interface	Keyword search: text entry box; Value selection: facets with value list	Number of results for each result cluster; Number of results for each facet value; Selected facet values	Similar to initial search	Thumbnail with selected metadata
OpenAcademia	Functionality	Keyword search with single search term for a metadata field; Value selection from metadata fields	Literal matching: subword; Retrieval (keyword search): resources with matching literal for specified metadata field; Retrieval (value selection): resources with selected value as metadata for specified field; Values in metadata fields are updated to selection	Add new search term or metadata field	Set of items
	Interface	Keyword search: search form; Value selection: drop-down lists for a fixed set of fields	Number of total results; Processing time	Same as initial search	Different visualization tools: tag cloud, topic graph, social network, timeline, cluster map and relation graph
OWLIR	Functionality
	Interface
QuizRDF	Functionality	Keyword search with multiple search terms; Search options: class of the provided input; options for literal match, case, exact match, only in title	Literal matching: defined by match options; Retrieval: documents with literal match on index table. The index table of a document contains the literal values from all direct annotations.	Disambiguation: select class and values for metadata fields used for instances of this class	Set of items; Ranking by a variant of tf.idf
	Interface	Keyword search: text entry field; Options: drop-down menu with classes; Options: checkboxes for search options	Number of results; Possible classes of the input are updated to result set	Drop-down for classes; Search form for properties with a literal value range	Title of document + all metadata
SemSearch	Functionality	Keyword search with multiple search terms; Structure: Boolean operators AND/OR; search-engine-specific operator ":" to indicate the result target type	Literal matching: subword; Interpretation: a formal query is constructed from the sets of resources matching the input; Retrieval: resources matching the constructed query, with RDFS reasoning over the class and property hierarchy	Disambiguation/Refinement: deselect class/property/instance matches of the search terms	Set of items; Ranking on literal match
	Interface	Keyword search: text entry box	Number of total results; Processing time	Form with the matches for each keyword, with checkboxes to toggle them	Title + the entities that matched the query + the relation from the keyword matches to the result
Slashfacet	Functionality	Value selection from facets; Facet value autocompletion search; Global facet autocompletion search	Literal matching (autocompletion): prefix; Retrieval (value selection): resources with the selected value, or a narrower concept, as metadata for specific facet; Facet values are updated to current selection; Complex query paths can be constructed through the interactive interface	Disambiguation (global facet search): by use of value (facet in which the value occurs); Disambiguation (in-facet search): by location in the value hierarchy; Refinement: add a new facet value	Set of items; Clustered by manually selected property
	Interface	Value selection: facets with value list and text entry box for autocompletion; all facets are shown, but the value list can be hidden; Global facet search: text entry box with autocompletion drop-down list	Loading message at every click; Number of results per cluster; Number of results for each facet value; Selected facet values are highlighted	Disambiguation (global search): grouped by class; Disambiguation (in-facet search): value in hierarchy shown as unfolded tree; Refinement: similar to initial search	Thumbnail with selected metadata
Squiggle	Functionality	Keyword search with multiple search terms	Literal matching: Lucene search engine; Retrieval: resources with matching literal value. After disambiguation by selecting a concept, resources are matched to all literal values known for the selected concept. Multiple terms in a query are interpreted disjunctively. Conjunctive queries on concepts can be made by selecting multiple values from the suggestions.	Disambiguation: by matching URI and by rdf:type; Exploration: related concepts grouped by rdf:type	Set of items
	Interface	Keyword search: text entry box	Total number of results; Processing time; Hits per matching literal	Disambiguation: list of concepts with checkboxes; Exploration: list of concepts	Thumbnail or title + selected metadata
Swoogle	Functionality	Keyword search: search term or URI; Structure: boolean operators AND, OR; specific constructs to indicate the domain for literal match: in URI, namespace, local name, literal values	Literal matching: subword; Retrieval (ontology): contains resource with matching literal value; Retrieval (term): resource with matching literal value	-	Set of items; Ranking: OntoRank [explain] and TermRank [explain]
	Interface	Keyword search: text entry box; Options: result type (document, ontology, term)	Number of total results; Processing time	-	rdfs:label for terms and URI for documents + selected metadata
Tap: Semantic Search	Functionality	Keyword search with at most two search terms	Literal matching: subword; Retrieval: full graph search, restricted to manually assigned properties for each class	Exploration: the semantic search results augment results from a traditional search engine	Set of items; Clustering: by type
	Interface	Keyword search: text entry box of host search engine	-	(see results)	Results are presented alongside traditional search results; Template for each result class
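
The e-Culture row combines weighted backward graph search with clustering by search path. The Python sketch below illustrates that combination on a hypothetical toy graph. The relation weights, the `Artwork` target type, the breadth-first traversal and the choice to multiply weights along a path are all assumptions; the table only states that weights are assigned manually per relation type and that an item's score is the literal match multiplied by the total path weight.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy graph of (subject, property, object) triples.
TRIPLES = [
    ("ex:nightwatch", "dc:creator", "ex:rembrandt"),
    ("ex:selfportrait", "dc:creator", "ex:rembrandt"),
    ("ex:portrait_of_rembrandt", "dc:subject", "ex:rembrandt"),
    ("ex:flinck", "ex:studentOf", "ex:rembrandt"),
    ("ex:flinck_portrait", "dc:creator", "ex:flinck"),
]
TYPES = {
    "ex:nightwatch": "Artwork", "ex:selfportrait": "Artwork",
    "ex:portrait_of_rembrandt": "Artwork", "ex:flinck_portrait": "Artwork",
    "ex:rembrandt": "Person", "ex:flinck": "Person",
}
# Manually assigned weight per relation type (assumed values).
WEIGHTS = {"dc:creator": 1.0, "dc:subject": 0.8, "ex:studentOf": 0.5}

Path = Tuple[str, ...]

def backward_search(matches: Dict[str, float], max_depth: int = 3
                    ) -> List[Tuple[Path, List[Tuple[str, float]]]]:
    """matches maps resources hit by literal matching to their literal-match score."""
    found: Dict[str, Tuple[Path, float]] = {}
    frontier = [(node, (), 1.0, lit) for node, lit in matches.items()]
    visited = set(matches)
    for _ in range(max_depth):
        next_frontier = []
        for node, path, weight, lit in frontier:
            # Follow triples backwards: from the object to its subject.
            for s, p, o in TRIPLES:
                w = WEIGHTS.get(p, 0.0)
                if o != node or w == 0.0 or s in visited:
                    continue
                visited.add(s)  # breadth-first, so the first path found is a shortest one
                new_path, new_weight = path + (p,), weight * w
                if TYPES.get(s) == "Artwork":
                    found[s] = (new_path, lit * new_weight)  # literal match * path weight
                else:
                    next_frontier.append((s, new_path, new_weight, lit))
        frontier = next_frontier
    # Cluster by search path; rank clusters by path length and items by score.
    clusters: Dict[Path, List[Tuple[str, float]]] = defaultdict(list)
    for item, (path, score) in found.items():
        clusters[path].append((item, score))
    return [
        (path, sorted(items, key=lambda i: (-i[1], i[0])))
        for path, items in sorted(clusters.items(), key=lambda c: (len(c[0]), c[0]))
    ]
```

Searching from the literal hit `ex:rembrandt` yields three clusters: the works linked by `dc:creator` and the portrait linked by `dc:subject` (both path length 1), then the student's work reached via `ex:studentOf` and `dc:creator` (path length 2, score 0.5). The matching by minimal letter distance on a stemmed index is not modelled; literal scores are simply passed in.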

Overview

	Search Input	Processing	User Feedback	Search Results
Functionality	Free text input of one or more search terms, optionally with control structures such as boolean operators or application-specific operators. Controlled input of search terms: the search term is restricted to a list of predefined values and/or to a certain value range. For example, in a search form the value range is restricted for each field (title, author, etc.) while the search term is unrestricted. When searching within a facet, both the value range and the search term are restricted. [List advantages of controlled input. No dead ends.]	Literal match. We limit the discussion to matches on literal values of RDF resources; indexing of documents lies outside the scope. Direct hits: matches on literal attributes, and on resource-valued attributes whose label matches. There is no difference between a literal value and a resource with a literal value as a label. Input interpretation: this only applies if multiple keywords are given. [SemSearch and Hybrid search support this] Query extension (see discussion below). Related results (see discussion below).	Disambiguation; Refinement
Interface	It is often mentioned that the interface for search terms should be simplistic. The google interface is regarded as very positive. Forms and facets [should we discuss interface issues]. Other options, type of literal match selection (prefix,substring).	Loading message, # of results, process time, warning message (no results, number of result limit etc.)	Semantics play an important role here. The properties, classes and concepts allow the system to explain the possible interpretations to disambiguate and possible dimensions for refinement. Allows give precise feedback. Trade-off between search and browsing. Problems arise with ontological resources that do not make sense to the user. There is a need for an interface ontology (natural categories).	Semantics play an important role in clustering. The semantics provide the dimensions for the grouping of results as well as the explanation of the groups.
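The literal-match options in the table (prefix, substring) and the remark that a literal value and a resource with a literal label are treated alike can be made concrete with a small Python sketch. The `MatchMode` enum, `literal_matches`, `direct_hits` and the toy data are hypothetical; treating PREFIX as "prefix of any word in the literal" is an assumption, since systems differ on this.

```python
from enum import Enum


class MatchMode(Enum):
    EXACT = "exact"
    PREFIX = "prefix"        # assumption: prefix of any word in the literal
    SUBSTRING = "substring"


def literal_matches(term, literal, mode):
    """Case-insensitive comparison of a search term against one literal value."""
    t, lit = term.casefold(), literal.casefold()
    if mode is MatchMode.EXACT:
        return lit == t
    if mode is MatchMode.PREFIX:
        return any(word.startswith(t) for word in lit.split())
    return t in lit


def direct_hits(term, triples, labels, mode=MatchMode.SUBSTRING):
    """Subjects with a matching literal value, or with a resource value whose label matches.

    Objects found in `labels` are resources and are matched through their label;
    all other objects are treated as literals, so both cases behave the same.
    """
    hits = set()
    for s, p, o in triples:
        text = labels.get(o, o)
        if literal_matches(term, text, mode):
            hits.add(s)
    return hits


triples = [
    ("ex:work1", "dc:title", "Night Watch"),
    ("ex:work1", "dc:creator", "ex:rembrandt"),
]
labels = {"ex:rembrandt": "Rembrandt van Rijn"}
print(direct_hits("rembr", triples, labels, MatchMode.PREFIX))  # {'ex:work1'}
print(direct_hits("watch", triples, labels, MatchMode.EXACT))   # set()
```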


Discussion

Semantic Search

What do we consider semantic search? [This can also explain why certain systems are chosen and others are not. Why do we not consider query languages such as SeRQL or SPARQL, or natural language interfaces? I am not sure yet. The underspecified input of keyword search has something to do with it. In NL interfaces the focus is on the interpretation of the expression; once the interpretation is made, it is clear how the database should be queried. To answer an underspecified query, it is uncertain what should be retrieved from the database, and how much. The SPARQL DESCRIBE form embodies a similar notion of vagueness: the content of its result is exactly the part that the SPARQL specification leaves underspecified.]

I think we should focus on systems in which the goal is to find instances. The target group typically consists of end users. This excludes systems such as Swoogle, Swangler and OntoSearch, where the target is a semantic web document, an ontology, or a resource used in an ontology (a class or predicate). The target group for these systems consists of developers and knowledge engineers.

General remarks about analysis of systems

We focus on certain aspects. Much more semantic processing may be done in the background, such as integration, smushing and extraction; we only look at the semantics used in the actual search process.

General remarks about the use of semantics

We distinguish three uses of semantics in the search process:
i) using metadata to extend the literal match;
ii) disambiguation of input, refinement of input and complex query patterns;
iii) complex paths to find related resources.

For the first there is some consensus. The properties used are synonyms, sameAs and narrower terms. The extension can be done on the fly, or the additional terms can be added to the index offline. Either way it increases recall, simply because the number of available "meaningful" terms increases. On how to do disambiguation there is also some agreement: typically rdf:type and the property between the matching value and the search result are used. Facets are very well suited for refinement and for constructing complex (union/intersection) queries. The third is the most exciting one, and here we see various solutions:
Rules: Ontogator/MuseumFinland uses logical rules, Squiggle uses predefined paths, and Tap Semantic Search uses predefined properties for each class.
User controlled: KIM offers a complex search query form, and /facet offers cross relations.
Graph search: e-Culture uses a weighted graph algorithm.
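A minimal sketch of on-the-fly query extension along synonym, sameAs and narrower links, assuming a toy concept graph. The property names, `expand`, `search` and the depth limit of two are illustrative choices, not taken from any surveyed system. Precomputing `expand` for every concept and storing the extra terms in the index would give the offline variant.

```python
from collections import deque

# Placeholder names for the extension properties mentioned in the text.
EXTENSION_PROPERTIES = {"synonym", "sameAs", "narrower"}
SYMMETRIC = {"synonym", "sameAs"}  # followed both ways; narrower only downwards


def expand(concept, edges, max_depth=2):
    """Breadth-first expansion of a concept along the extension properties."""
    seen = {concept}
    queue = deque([(concept, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for s, p, o in edges:
            if p not in EXTENSION_PROPERTIES:
                continue
            if s == node:
                nxt = o
            elif o == node and p in SYMMETRIC:
                nxt = s
            else:
                continue
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


def search(keyword, labels, edges, annotations, extend=True):
    """Items annotated with a concept whose label equals the keyword, optionally extended."""
    matched = {c for c, label in labels.items() if label.casefold() == keyword.casefold()}
    concepts = set()
    for c in matched:
        concepts |= expand(c, edges) if extend else {c}
    return {item for item, c in annotations if c in concepts}


labels = {"c:furniture": "furniture", "c:chair": "chair", "c:seat": "seat", "c:table": "table"}
edges = [
    ("c:furniture", "narrower", "c:chair"),
    ("c:furniture", "narrower", "c:table"),
    ("c:chair", "synonym", "c:seat"),
]
annotations = [("obj:1", "c:chair"), ("obj:2", "c:seat"), ("obj:3", "c:table"), ("obj:4", "c:furniture")]
print(sorted(search("furniture", labels, edges, annotations, extend=False)))  # ['obj:4']
print(sorted(search("furniture", labels, edges, annotations)))  # ['obj:1', 'obj:2', 'obj:3', 'obj:4']
```

Without extension only the item annotated with "furniture" itself is found; with extension the items annotated with narrower terms and their synonyms are found as well, which is the recall gain described above.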

Can we describe the need for complex paths? Which additional results can we find with them? We can make a link to relation search here: if the focus is not the item itself but the relation between two or more items, the complex paths are themselves the search targets \cite{Seth Work}.
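For relation search the path between two given items is itself the result. A minimal breadth-first sketch, assuming that edges may be followed in either direction and a hypothetical length limit `max_length`; it illustrates the idea and is not the method of the cited work.

```python
from collections import deque


def find_relation(start, goal, triples, max_length=3):
    """Shortest chain of triples linking start and goal, ignoring edge direction.

    Returns a list of triples, [] if start == goal, or None if no chain of at
    most `max_length` triples exists.
    """
    adjacency = {}
    for s, p, o in triples:
        adjacency.setdefault(s, []).append((o, (s, p, o)))
        adjacency.setdefault(o, []).append((s, (s, p, o)))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) == max_length:
            continue
        for nxt, triple in adjacency.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [triple]))
    return None


triples = [
    ("ex:rembrandt", "ex:studentOf", "ex:lastman"),
    ("ex:lievens", "ex:studentOf", "ex:lastman"),
    ("ex:lievens", "ex:bornIn", "ex:leiden"),
]
relation = find_relation("ex:rembrandt", "ex:lievens", triples)
# two hops: both are linked to ex:lastman through ex:studentOf
```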

I think we can distinguish two types of complex paths. The first serves query extension, in order to find more results. The second allows the system to suggest related concepts for further exploration.
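The weighted graph search mentioned for e-Culture can be illustrated with a generic best-first spreading scheme that ranks related resources. This is not the e-Culture algorithm: the per-property weights in (0, 1], the multiplicative decay and the pruning `threshold` are assumptions made for the sketch. Because every weight is at most 1, a score can only decrease along a path, so popping the highest score first (as in Dijkstra's algorithm) settles each node at its best score.

```python
import heapq


def spread(seeds, triples, weights, threshold=0.2):
    """Best-first spreading from seed nodes over a weighted graph.

    A node's score is the highest product of property weights along any path
    from a seed; nodes scoring below `threshold` are neither expanded nor returned.
    """
    adjacency = {}
    for s, p, o in triples:
        w = weights.get(p, 0.0)
        if w > 0:  # properties without a weight are not followed
            adjacency.setdefault(s, []).append((o, w))
            adjacency.setdefault(o, []).append((s, w))
    best = {}
    heap = [(-1.0, seed) for seed in seeds]
    heapq.heapify(heap)
    while heap:
        neg_score, node = heapq.heappop(heap)
        if node in best:
            continue  # already settled with a higher or equal score
        best[node] = -neg_score
        for neighbour, w in adjacency.get(node, []):
            score = best[node] * w
            if neighbour not in best and score >= threshold:
                heapq.heappush(heap, (-score, neighbour))
    return best


triples = [
    ("ex:nightwatch", "ex:creator", "ex:rembrandt"),
    ("ex:rembrandt", "ex:studentOf", "ex:lastman"),
    ("ex:nightwatch", "ex:locatedIn", "ex:rijksmuseum"),
    ("ex:rijksmuseum", "ex:locatedIn", "ex:amsterdam"),
]
weights = {"ex:creator": 0.9, "ex:studentOf": 0.5, "ex:locatedIn": 0.4}
scores = spread(["ex:nightwatch"], triples, weights)
# ex:amsterdam (0.4 * 0.4 = 0.16) falls below the default threshold of 0.2
```

The weights encode which relations a designer considers meaningful for relatedness; the threshold bounds how far the search wanders from the seeds.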

Complex paths introduced by modeling decisions. Here the value could just as well have been modeled as a direct value; the use of blank nodes, for example, introduces this kind of complexity. Another example is the complex annotation structure in the multimedia ontology. Dealing with this type of complex path is not a semantic issue: if the system knows which complex paths to follow, or if some mapping exists at the data level, the issue of relevance does not apply, because the value then effectively becomes a direct value of the item.
Complex paths in which the occurrence of one or more concepts is crucial to the relation between the items. In other words, the relation exists by virtue of an additional concept. In this case the system has to decide which concepts and relations are relevant, and many factors play a role here.
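The two kinds of complex paths can be contrasted in a short sketch. `flatten` removes modeling-induced indirection by folding one level of blank nodes into direct, composite properties; `related_via_concept` finds items that are related only by virtue of a shared concept. The data, the `_:` prefix test and the `p/q` naming of composite properties are hypothetical, and nested blank nodes are deliberately not handled.

```python
def is_blank(node):
    """Assumption: blank nodes are written with the '_:' prefix."""
    return node.startswith("_:")


def flatten(triples):
    """Fold one level of blank nodes: item -p-> _:b -q-> v becomes item -p/q-> v."""
    flat = []
    for s, p, o in triples:
        if is_blank(s):
            continue  # blank-node triples are folded into their parent below
        if is_blank(o):
            for s2, q, v in triples:
                if s2 == o:
                    flat.append((s, f"{p}/{q}", v))
        else:
            flat.append((s, p, o))
    return flat


def related_via_concept(item, triples, via_property):
    """Other items sharing a value of via_property with item."""
    concepts = {o for s, p, o in triples if s == item and p == via_property}
    return {s for s, p, o in triples if p == via_property and o in concepts and s != item}


triples = [
    ("ex:img1", "ex:depicts", "_:a1"),
    ("_:a1", "ex:subject", "ex:rembrandt"),
    ("_:a1", "ex:region", "top-left"),
    ("ex:img1", "ex:style", "ex:baroque"),
    ("ex:img2", "ex:style", "ex:baroque"),
    ("ex:img3", "ex:style", "ex:cubism"),
]
print(related_via_concept("ex:img1", triples, "ex:style"))  # {'ex:img2'}
```

After flattening, ex:rembrandt is a direct value of ex:img1 and no relevance decision is needed; the relation between ex:img1 and ex:img2, by contrast, only exists through ex:baroque, and a system must judge whether that concept makes the two items relevant to each other.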

Either the input is interpreted in advance (SemSearch), or presentation techniques are used that allow refinement and browsing on and from the result set. In SemSearch, issue iii) is problematic as well. Here the problem is approached from the other side: which combinations of the search terms are possible according to the data?
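Approaching interpretation from the data side can be sketched as follows: given two keywords, keep only those combinations of matching resources that are actually connected by a triple. The label-substring matching and the restriction to direct connections are simplifying assumptions; this is not SemSearch's implementation.

```python
def matching(keyword, labels):
    """Resources whose label contains the keyword (case-insensitive)."""
    k = keyword.casefold()
    return {r for r, label in labels.items() if k in label.casefold()}


def interpretations(kw1, kw2, labels, triples):
    """Triples linking a resource matching kw1 to one matching kw2, in either direction."""
    m1, m2 = matching(kw1, labels), matching(kw2, labels)
    return [
        (s, p, o)
        for s, p, o in triples
        if (s in m1 and o in m2) or (s in m2 and o in m1)
    ]


labels = {
    "ex:rembrandt": "Rembrandt",
    "ex:amsterdam": "Amsterdam",
    "ex:leiden": "Leiden",
    "ex:amsterdam-museum": "Amsterdam Museum",
}
triples = [
    ("ex:rembrandt", "ex:diedIn", "ex:amsterdam"),
    ("ex:rembrandt", "ex:bornIn", "ex:leiden"),
    ("ex:nightwatch", "ex:creator", "ex:rembrandt"),
]
print(interpretations("rembrandt", "amsterdam", labels, triples))
# [('ex:rembrandt', 'ex:diedIn', 'ex:amsterdam')]
```

"Amsterdam Museum" also matches the second keyword but yields no interpretation, because the data contains no connection between it and Rembrandt.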

Search Results

In the presentation of the results almost all applications use hand-configured templates. FRESNEL \cite{fresnel05} could easily be applied for this purpose. Most applications support some sort of local view on an individual search result, which often also allows some form of browsing. This connects to browsers such as Tabulator and Disco. Ranking of results is an open issue; the added value of semantics there is not clear. Clustering is often applied; here semantics add value in the form of meaningful explanations of the clusters.
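A hand-configured template per class amounts to little more than an ordered list of properties to display, which is roughly what a Fresnel lens declares with `fresnel:showProperties`. The Python sketch below hard-codes such templates; the `TEMPLATES` table, `render` and the data are hypothetical.

```python
# Hypothetical hand-configured templates: for each class, the properties to show, in order.
TEMPLATES = {
    "ex:Painting": ["rdfs:label", "ex:creator", "ex:date"],
    "ex:Person": ["rdfs:label", "ex:birthDate"],
}
DEFAULT_TEMPLATE = ["rdfs:label"]


def render(item, triples, labels):
    """Render an item as 'property: value' lines using the first template matching its type."""
    types = [o for s, p, o in triples if s == item and p == "rdf:type"]
    template = next((TEMPLATES[t] for t in types if t in TEMPLATES), DEFAULT_TEMPLATE)
    lines = []
    for prop in template:
        # Resource values are shown through their label when one is known.
        values = [labels.get(o, o) for s, p, o in triples if s == item and p == prop]
        if values:
            lines.append(f"{prop}: {', '.join(values)}")
    return "\n".join(lines)


triples = [
    ("ex:nw", "rdf:type", "ex:Painting"),
    ("ex:nw", "rdfs:label", "The Night Watch"),
    ("ex:nw", "ex:creator", "ex:rembrandt"),
    ("ex:nw", "ex:date", "1642"),
    ("ex:nw", "ex:inventory", "inv-0001"),
]
labels = {"ex:rembrandt": "Rembrandt"}
print(render("ex:nw", triples, labels))
# ex:inventory is not part of the template, so it is not shown
```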

User feedback

Disambiguation, refinement and suggestion of related items
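Disambiguation by rdf:type and by the property linking the matching value to the result, followed by facet refinement with intersection or union, can be sketched as below. The function names, the `mode` parameter and the toy data are hypothetical; real systems present the groups and facet values in the interface rather than returning them.

```python
from collections import defaultdict


def types_of(item, triples):
    return {o for s, p, o in triples if s == item and p == "rdf:type"}


def disambiguate(keyword, triples, labels):
    """Group result items by (rdf:type of the item, property linking it to a matching value)."""
    k = keyword.casefold()
    matched_values = {r for r, label in labels.items() if k in label.casefold()}
    groups = defaultdict(set)
    for s, p, o in triples:
        if o in matched_values:
            for t in types_of(s, triples):
                groups[(t, p)].add(s)
    return dict(groups)


def refine(items, triples, facet_property, facet_values, mode="and"):
    """Facet refinement: keep items having all ('and') or any ('or') of the selected values."""
    selected = set(facet_values)

    def values(item):
        return {o for s, p, o in triples if s == item and p == facet_property}

    if mode == "and":
        return {i for i in items if selected <= values(i)}
    if mode == "or":
        return {i for i in items if selected & values(i)}
    raise ValueError("mode must be 'and' or 'or'")


triples = [
    ("ex:nw", "rdf:type", "ex:Painting"),
    ("ex:nw", "ex:creator", "ex:rembrandt"),
    ("ex:book1", "rdf:type", "ex:Book"),
    ("ex:book1", "ex:subject", "ex:rembrandt"),
    ("ex:etch1", "rdf:type", "ex:Print"),
    ("ex:etch1", "ex:creator", "ex:rembrandt"),
    ("ex:nw", "ex:material", "ex:oil"),
    ("ex:nw", "ex:material", "ex:canvas"),
    ("ex:etch1", "ex:material", "ex:paper"),
]
labels = {"ex:rembrandt": "Rembrandt van Rijn"}
groups = disambiguate("rembrandt", triples, labels)
# Three interpretations: paintings and prints created by Rembrandt, books about him.
```

The group keys are exactly the explanations a user interface can show ("Painting, creator: Rembrandt van Rijn"), and the facet values of the selected group are the dimensions offered for refinement.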

Related Work

A Categorization Scheme for Semantic Web Search Engine, 2006. Kyumars Sheykh Esmaili, Hassan Abolhassani. Distinguishes ontology search engines (meta search, crawler based) and semantic search engines (context based, evolutionary, semantic association). The set of systems they cover seems very complete, and the categorization is straightforward. This probably means we can leave such a categorization out of our paper and simply refer to this one, which leaves more room to focus on the role of semantics.

Evaluation

Field experimenting with Semantic Web tools in a virtual organization, 2003. Victor Iosif, Peter Mika et al. How do we test SW tools? Design considerations for Semantic Web field experiments. Describes an experimental setup with the SW applications QuizRDF (search) and Spectacle (browse), and the traditional free-text search of EnerSearch.