Semantic annotation
-------------------

A basic task in semantic web user interaction is finding a controlled term by means of a textual description. Both keyword search and annotation are examples of this. Semantic autocompletion is one method that supports this functionality: an autocompletion component suggests the terms that match the user's input, typically by prefix matching. Semantic autocompletion allows the user to see the possible terms and to try different forms of input without actually submitting them. It is used in various interfaces (see the semantic search survey).
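As a minimal sketch of the prefix-matching idea (the label index and term identifiers below are invented for illustration; this is not the e-Culture implementation):

    # Minimal autocompletion sketch. A real index would be derived
    # from the labels in the vocabularies; this one is invented.
    LABEL_INDEX = {
        "paris": ["tgn:paris-france", "ulan:paris-trojan-prince"],
        "paris, texas": ["tgn:paris-texas"],
        "parison": ["aat:parison"],
    }

    def suggest(typed, limit=10):
        """Return term identifiers whose label starts with the typed string."""
        prefix = typed.strip().lower()
        hits = []
        for label in sorted(LABEL_INDEX):
            if label.startswith(prefix):
                hits.extend(LABEL_INDEX[label])
        return hits[:limit]

    print(suggest("pari"))
    # -> ['tgn:paris-france', 'ulan:paris-trojan-prince',
    #     'tgn:paris-texas', 'aat:parison']

Note that the label "paris" already maps to two different terms; this is exactly the selection problem discussed next.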
In any non-trivial dataset, selecting a term from the list of suggestions is itself often difficult. The selection process involves two issues. First, the user has to find the matching label in the suggestion list. Second, the same label can be used for different terms. Information other than the matching label is often available to help the user decide which term to use. In autocompletion interfaces, however, the amount of information that can be presented is limited by the available visualization and interaction space. Therefore, a decision must be made about what information to present and how to present it. We will present several strategies to do this. Our hypothesis is that some strategies are well suited for disambiguating particular types of terms and less well suited for others. For example, the instance-class relation can disambiguate Paris the location from the person named Paris. If we have to disambiguate between two cities, the countries in which they are located are often the best choice. From the visualization perspective, this information can be used to cluster the results, or it can simply be added as a sublabel.

[Explain why we choose to do this experiment on annotation]

We see semantic annotation as a symmetrical variant of fact finding. More precisely, it is the process of searching for a term given an object, whereas in fact finding (in the semantic web case) a user searches for an object by means of a controlled term.

[We believe that the annotation task allows us to constrain the search process. Explain!]

[We have to argue why ranking alone is not sufficient: 1. it gives no insight into whether a term is the right one; 2. popularity might not always be the most suitable ranking criterion for annotation. However, ranking does have a big influence on effectiveness.]

BENEFITS OF CONTROLLED TERMS OVER TAGS
- Often multiple related tags are used; this information is already present in the background knowledge.
- Additional information about terms is available, such as tags for mapping and dates for time visualization.
- Uniqueness.

HIGH LEVEL DESCRIPTION
The high-level goal of this experiment is to find out how the use of semantics improves the effectiveness of finding the right term for annotating an object.

USER GROUP
For now we have divided the user group into experts and non-experts in the cultural heritage domain. This choice is related to the type of annotations as well as to the availability of test subjects.
- It would be difficult to collect a sufficient number of experts for a valid quantitative evaluation. As a rough estimate, for every new variable we want to test we need to double the number of subjects.
- In general, experts provide information according to a fixed metadata structure. For example, at RMV an annotator has to provide at least 4 annotations, one from every branch of the SVCN thesauri. Non-experts annotate whatever they happen to know about an object and what is visually perceivable.
-> We choose to focus on non-expert users in the cultural heritage domain. The main reason is the availability of non-experts, or rather the unavailability of experts.

CRITERIA TO MEASURE
We want to measure the effect of various semantic facilities on the "effectiveness" of making "correct" annotations. We have to define criteria for measuring effectiveness and correctness. Furthermore, we have to describe annotation tasks in which we can measure how the semantic facilities influence effectiveness and correctness.
Possible ways to measure effectiveness:
- the time required for an annotation (or the number of annotations made in a given time)
- qualitative user opinion
and correctness:
- manually compare the annotations the user intended to make against the actual terms and relations used
- construct a gold standard
- use existing annotations as a gold standard
-> We choose to focus on effectiveness. Testing for correctness is tricky: it requires either a long preparation time to construct a gold standard or long qualitative experiments. We try to fix correctness as a variable as much as possible: a) we provide complete and unambiguous task descriptions; b) we still have to consider what to do with incorrect answers.

RESEARCH QUESTIONS
- Which semantically based features enhance search effectiveness for particular types of terms?

HYPOTHESIS
[Classify particular types of terms and the features expected to be useful for them.]
For proper nouns (person, place, object) autocompletion has added value: the names are unique, but disambiguation is required.
- person disambiguation: ?
- place disambiguation: hierarchical
- object disambiguation: ?

WHAT TO ANNOTATE
-> We focus on the annotation of artefacts.
- From which collection(s) do we select artefacts? Paintings, ethnographic objects, or a subset of these?
- In [1] Laura Hollink and others distinguish three types of annotations: 1) low-level perceptual information, such as colors and shapes visible in the image; 2) conceptual information about what is depicted; 3) non-visual information about the context.
The low-level information (1) is outside the scope of this experiment. For the contextual, non-visual information (3) the annotation slots are defined by metadata schemas, such as DC and VRA. Binding ontological descriptions to the annotation slots restricts the possible values available for a slot.
The VRA properties can be grouped into the wh+how groups:
- What (title, description)
- Who (creator, finder, trader, ...)
- Where (current location/repository, creation site)
- When (creation date, finding date, ...)
- How (material, technique, style, type)
- Why (?, comment)
The administrative properties (rights, source) might be added to the What category.
Content annotations (2) are best described by a situational model in which the roles of the entities are expressed. Such a model might contain: event, agent, action, object, recipient, instrument, place, time, objective, mood, theme. Hyvonen and others propose a situation ontology in [2]. The values for the content slots can also be bound to ontological descriptions.

WHAT TO ANNOTATE WITH?
- Vocabularies: AAT, ULAN, TGN, SVCN, WordNet?
- Within the e-Culture project several mappings between the concepts in the different vocabularies have been created. The mappings provide additional information that could aid the search process: more labels, more specific type definitions, or new metadata. Do we want to use the mappings?
- Do we restrict the possible terms by a schema? Typically the values allowed in an annotation slot are constrained. RDFS/OWL schemas define these constraints with ontological definitions. Constraints usually also reduce the number of different types of terms, at the risk of not allowing the term the user wants to use. A minimal sketch of such a slot check is given below.
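To make the slot restriction concrete, here is a minimal sketch of checking a candidate term against the range of an annotation slot. The class hierarchy, term types, and slot ranges are invented for illustration; they are not actual TGN/ULAN or VRA definitions.

    # Restrict suggestions to terms whose type falls within the slot's
    # range. Hierarchy, term types, and ranges are invented examples.
    SUBCLASS_OF = {"City": "Place", "Person": "Agent"}  # child -> parent
    TERM_TYPE = {
        "tgn:paris-france": "City",
        "ulan:paris-trojan-prince": "Person",
    }
    SLOT_RANGE = {"vra:creationSite": "Place", "vra:creator": "Agent"}

    def is_a(cls, target):
        """True if cls equals target or is a transitive subclass of it."""
        while cls is not None:
            if cls == target:
                return True
            cls = SUBCLASS_OF.get(cls)
        return False

    def allowed(term, slot):
        """Keep only terms whose type falls within the slot's range."""
        return is_a(TERM_TYPE.get(term), SLOT_RANGE[slot])

    print(allowed("tgn:paris-france", "vra:creationSite"))         # True
    print(allowed("ulan:paris-trojan-prince", "vra:creationSite")) # False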
Functionality
-------------

We choose text-based search with autocompletion as the entry point for the annotation interface. We think that autocompletion is a useful way to start exploring the very large conceptual space that describes our objects.

1. SYNTACTIC MATCHING
- match type: prefix, substring, whole word, whole expression, prefix of the whole expression, prefix of each word in an expression
- allow regular expressions
- allow approximate matching within a bounded edit distance
- which textual content to match on (labels and/or descriptions)
[Explain why textual descriptions that are not labels limit the semantic capabilities.]

2. SEMANTIC MATCHING
- Linguistic relations, such as synonymy, expand the query. The ontological definitions on a slot restrict the available terms; linguistic expansion makes these terms accessible through alternative expressions.
- Semantic relations such as owl:sameAs and skos:exactMatch indicate similarity between two terms, while associative relations, such as the related terms in thesauri, indicate some relationship between terms. As in linguistic expansion, these relations extend the scope of the query.

3. DISAMBIGUATION
- by type: which class to select for a resource is non-trivial. For example, the classes used in WordNet, "WordSense" and "Synset", are not very informative. Instead we use concepts from the hyponym hierarchy; at the moment we use the root concepts of the hierarchy, for example "entity" and "psychological feature". We could choose other concepts or, even better, dynamically determine which concept is best suited. The latter would be very interesting.
- by a specific property: for example, show a person's date of birth or a textual description.
- by location in the hierarchy: for example, the place Paris occurs multiple times in a place hierarchy. This is the method used in MIA and the e-Culture annotation tool.
- by relation to the annotation object: the possible relations to an artwork might be sufficient for disambiguation. For example, creationSite would be sufficient to indicate that the hit is a place, whereas creator means it is a person.

4. ORGANIZATION
Any of the properties used for disambiguation can also be used to group the results. Grouping gives an immediate overview of the possible classifications of the terms (a small sketch of grouping and ordering follows the references). A disadvantage is that only a limited number of items can be shown per group.

5. ORDERING
- on literals
  - string match: an exact match is the best result
  - content-based: tf.idf
  - structure-based: ?
- on resources
  - content-based: ?
  - structure-based: importance of a resource, for example measured by its number of incoming and outgoing links
- on groups
  - use the ranking of the resources within a group
  - predefine a preference order over the groups

6. SELECTION
- the number of items shown in the result list and within a group
- the number of groups shown

7. VISUALIZATION
- the size of the result box
- highlight the matching part of the string
- ...

8. INTERACTION
- on mouseover, additional information about a resource is displayed
- selecting a group header gives access to all resources within that group
- view all results (non-autocompletion)

[1] Laura Hollink. Semantic Annotation for Retrieval of Visual Resources.
[2] Junnila and Hyvonen. Describing and Linking Cultural Semantic Content by Using Situations and Actions.
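Closing sketch, referenced from the ORGANIZATION and ORDERING items above: grouping autocompletion hits by a disambiguating property (here the term's class) and ranking exact label matches first within each group. All terms and classes are invented for illustration.

    # Group hits by the term's class and rank exact label matches
    # before mere prefix matches within each group. Invented data.
    from collections import defaultdict

    HITS = [  # (matched label, term identifier, class)
        ("Paris", "tgn:paris-france", "Place"),
        ("Paris", "ulan:paris-trojan-prince", "Person"),
        ("Paris, Texas", "tgn:paris-texas", "Place"),
    ]

    def organize(hits, query):
        groups = defaultdict(list)
        for label, term, cls in hits:
            groups[cls].append((label, term))
        for members in groups.values():
            # exact matches first, then alphabetically by label
            members.sort(key=lambda h: (h[0].lower() != query.lower(), h[0]))
        return dict(groups)

    for cls, members in organize(HITS, "paris").items():
        print(cls, members)
    # Place [('Paris', 'tgn:paris-france'), ('Paris, Texas', 'tgn:paris-texas')]
    # Person [('Paris', 'ulan:paris-trojan-prince')]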