IPTC Annual General Meeting - Trip report by Raphaël

41st IPTC Meeting (Vienna 2006)

2-6 July 2006, Vienna, Austria

IPTC Annual General Meeting 2006
Author: Raphael, George
CWI participants: Raphael, George
# participants: around 50

Overall impression

Very well-organized meeting. Magnificent hotel in a nice city, with really nice social events. The audience consisted entirely of IT managers from the various news agencies and from private companies that process news data or provide content management solutions. That means mainly men in their 50s (I counted only 3 women!), which was quite scary.
All the IPTC working documents (past meetings, minutes, presentations, past decisions, standards, etc.) are available in the NewsML directory.

For the first social event, we visited the APA (Austrian News Agency) office to see how they work. It was really impressive! One giant room of 16000 m², 160 people working together, a special building architecture for the cooling/heating, no noise at all even though everybody is in the same room, etc. Visit of the IT department, which has 60 people (30 developers). Demonstration of their in-house tools for indexing / archiving / searching the news (mainly textual stories + photos).
The most innovative part of the demonstration was a dynamic clustering of the news stories by topic when one searches the database. A map is displayed with islands whose size depends on the number of stories found per topic. When one enters a topic, a new clustering is computed, etc. The indexing uses what I call the "brute force approach"! They do a full-text search on everything; they do not even distinguish whether a word appears in the title field, in a controlled vocabulary or in the textual description. The topic clustering is done by purely statistical techniques (text similarity), and no semantic/terminological techniques are used. They would like to, but they do not really know how ...
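To make the "pure statistics" idea concrete, here is a minimal sketch of topic clustering by text similarity. It is not APA's implementation, which was not shown in detail; the bag-of-words vectors, the cosine measure, the greedy single-pass grouping and the 0.3 threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word counts over the whole text; no field is treated specially."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(stories, threshold=0.3):
    """Greedy clustering: a story joins the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, [story indices])
    for index, story in enumerate(stories):
        vector = bag_of_words(story)
        for seed, members in clusters:
            if cosine(seed, vector) >= threshold:
                members.append(index)
                break
        else:
            clusters.append((vector, [index]))
    return [members for _, members in clusters]


stories = [
    "Election results announced in Vienna",
    "Vienna election results contested",
    "Football cup final tonight",
]
groups = cluster(stories)  # each group would become one "island" on the map
```

The size of each group would drive the size of its island; re-running `cluster` on the stories of one group gives the drill-down behaviour described above.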

Many technical discussions with Laurent and Misha. Laurent Le Meur (AFP) is really a technical guy who has many interesting ideas and would like to prototype many small applications for AFP. He is personally interested in the faceted browser, which he has known for 2 years, with the timeline view approach, etc. At AFP he has prototyped a geo-localization map of the stories (something we would also like to add to the faceted browser).
He would like to establish a CWI/AFP cooperation for testing new ideas. He is willing to give us A FULL YEAR of news (textual stories, photos, videos, audio). He could deliver all the data and NewsML2 metadata by August 30.

Late Breaking News

Creation of a new Working Group for Photo Metadata.
Approved Name: "Photo Metadata Working Group"
A public mailing list will be announced soon!

The European Broadcasting Union (EBU) is also an IPTC member. They would like to re-engineer their whole metadata system, which currently consists of thesauri that are not efficiently represented. The idea is to standardize a set of metadata vocabularies to describe, at a minimum, the TV programs (link with the NAR for describing any video). They are now very interested in SKOS and could use it to represent the next version of the EBU metadata.

News Architecture Working Party (NAR), Laurent Le Meur (AFP) - slides

Goals of the meeting:
- Report on the ongoing second experimental phase
- Intention to set up a Photo Metadata WG

Latest version of the specification, examples, tests, etc. available at: http://www.iptc.org/dev/

NAR = News Architecture G2. NAR is *not* an IPTC standard, but the core of all IPTC G2 standards. NAR provides a generic framework on which IPTC standards are built:
- NewsML G2: representation and management of news (textual stories, photo, video, audio, graphics)
- EventsML G2: representation and management of events
- SportsML G2: representation and management of sports results and statistics
Goals: simplify the processing of news objects, all managed in the same way; compatible with NewsML1 (except for the syntax); compact; storage-friendly; more semantic capabilities; using the latest XML technologies.
Work status: NAR should be formally adopted in January 2007; try to get CURIEs adopted as a W3C standard.

Use cases:

John is a stringer. He writes articles and takes photos for several publishers, and uses one standard = NewsML G2.
newsItem: embed the content (photos, textual stories) + metadata associated with the content + management metadata (versioning, identification).
Example: sample of a "newsItem"

<newsItem>
  <catalogRef/>    <!-- definition of all vocabularies -->
  <itemMeta/>      <!-- metadata about the newsItem -->
  <contentMeta/>   <!-- metadata about the content: subject, slugline, headline -->
  <contentSet>     <!-- wrapper for "alternative" representations of news content -->
    <inlineXML>
      <someXML/>   <!-- an XML structure from an external namespace: could be RDF-A -->
    </inlineXML>
  </contentSet>
</newsItem>

The metadata can help to adapt the content to some context, devices, users, etc. For example, the alternative rendition of a video can be a still image, etc.
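The newsItem sample above can be walked with any XML parser. The following is a minimal sketch using Python's standard library; it assumes the simplified, namespace-free element names of the sample, not the final NewsML G2 schema.

```python
import xml.etree.ElementTree as ET

# The sample from the notes, without comments and without namespaces.
SAMPLE = """<newsItem>
  <catalogRef/>
  <itemMeta/>
  <contentMeta/>
  <contentSet>
    <inlineXML>
      <someXML/>
    </inlineXML>
  </contentSet>
</newsItem>"""

root = ET.fromstring(SAMPLE)

# Top-level sections: vocabularies, item metadata, content metadata, content.
sections = [child.tag for child in root]

# Each child of contentSet is one alternative rendition of the same content.
renditions = [child.tag for child in root.find("contentSet")]
```

A consumer that adapts content to a device would pick one entry of `renditions` based on the metadata, for example a still image instead of a video.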

InstantNews: a News Agency.
- Alerts: use of newsItem (common set of metadata), provide large
- Breaking news: specific metadata associated with specific media types (e.g. video shot list), specific textual content markup, rules for representing other media objects
- Sports events: use of SportsML but common set of metadata
- Breaking events: to be detailed by the EventsML WP
The common metadata are:
- Administrative layer
- Core conformance level: date and location of the content created, source of information, creator, contributor and audience.
  Audience will have more and more importance, since the news will be delivered to particular communities (language, geographical, social-based, etc.)
- Descriptive layer
- Core and Power conformance level: language of content, subject (what the content is about), genre (what the content is), slugline (ordered keywords), headline (short introduction), description (caption, abstract)
  Genre and subject are the most important properties for indexing the news content
Global Photo: photographer's practice = list of (comma separated) keywords.
- IIM: object name, keyword
- XMP: title, keyword, subject; metadata inside the file
Compatibility XMP / NAR desired: Adobe should evolve the control panels of its products (e.g. store subject codes but display the labels!), subject definition to be extended
Addressing Communities

NAR engine: easy integration in professional editorial systems (user interface, web or client application). Many TO DOs: marketing the NAR, converting the metadata into relational databases, providing software code (Java, C#) for processing the metadata, etc.
Identity and versioning: persistent and universal identity
Multi-dimensional content: different content renditions, news content preview via slugline (keywords), headline, description, cross-media publishing (print, web, mobile)
Networked items: kind of see also links, e.g. text/photo, photo/video, video/HTML page, etc.
Signature: all content should be signed!
Transforms: official (XSLT) transform from NewsML1 to NewsML2 provided by AFP, open source.
IP (rights): (very important issue!) Choose a standard among the 3 proposals on the table: MPEG-REL, ODRL, PLUS

News + knowledge: would like to browse the news by: themes, people, organisation, geopolitical areas, point of interest, etc.
Concept definition: identifier, type, sameAs, broader, narrower, related (+ relation name, taken from a controlled vocabulary), name, definition, note! This is exactly SKOS!
4 types of concepts already defined:
- Organisation: founded, dissolved, sector, location, contact information
- Person: born, died, etc.
- Geographical areas: gps, altitude, geopolitical type
- Point of interest: open hours, capacity, facility, access, details, etc.
Use of <topicItem> instead of <newsItem> that contains the value of the properties of a concept.
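To illustrate why the concept definition looks "exactly" like SKOS, here is a small sketch that holds the listed properties in a Python dataclass and emits SKOS-style triples. The field-to-property mapping (name to skos:prefLabel, sameAs to owl:sameAs, etc.) is my assumption, not something IPTC presented, and the `ex:` identifiers are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    identifier: str            # code in a scheme, written here as a prefixed name
    type: str                  # e.g. "person", "organisation", "geoArea", "poi"
    name: str
    definition: str = ""
    note: str = ""
    same_as: list = field(default_factory=list)
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)


def to_skos_triples(concept):
    """Return (subject, predicate, object) tuples; the concept type is not mapped."""
    subject = concept.identifier
    triples = [
        (subject, "rdf:type", "skos:Concept"),
        (subject, "skos:prefLabel", concept.name),
    ]
    if concept.definition:
        triples.append((subject, "skos:definition", concept.definition))
    if concept.note:
        triples.append((subject, "skos:note", concept.note))
    for relation in ("broader", "narrower", "related"):
        for target in getattr(concept, relation):
            triples.append((subject, "skos:" + relation, target))
    for target in concept.same_as:
        triples.append((subject, "owl:sameAs", target))
    return triples


# Hypothetical geographical-area concept.
vienna = Concept("ex:vienna", "geoArea", "Vienna", broader=["ex:austria"])
triples = to_skos_triples(vienna)
```

The type-specific properties (born, founded, gps, open hours, ...) have no SKOS counterpart and would need another vocabulary.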

How to get information about concepts:
- concept identifier = code in a scheme
- scheme = URL (description of the scheme)
- {scheme, code} = URI = description (HTML pages) + definition (XML fragments) of the concept
- thesaurusItem and topicItem
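The {scheme, code} to URI step can be sketched as follows. The scheme alias, the scheme URL and the code are hypothetical, and plain string concatenation is only one possible way to combine them; the point is that the operation can be undone to recover the original pair.

```python
# Hypothetical catalog: scheme alias -> scheme URL (the URL describes the scheme).
SCHEMES = {
    "subj": "http://example.org/schemes/subject/",
}


def resolve(scheme_alias, code):
    """Build the concept URI from a {scheme, code} pair."""
    return SCHEMES[scheme_alias] + code


def split(uri):
    """Reverse operation: recover the {scheme, code} pair from a concept URI."""
    for alias, base in SCHEMES.items():
        if uri.startswith(base):
            return alias, uri[len(base):]
    raise ValueError("no known scheme for " + uri)


uri = resolve("subj", "c123")
pair = split(uri)
```

Dereferencing `uri` would then return the description (HTML page) or the definition (XML fragment) of the concept.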

Global model:

                                       AnyItem
                                          |
              /                           |                            \
             /                            |                             \
        NewsItem                     PackageItem                      TopicItem
            |                                               /             |               \
           / \                                             /              |                \
TextNewsItem  PictureNewsItem                PersonTopicItem  OrganisationTopicItem  GeoAreaTopicItem

Question: why not use SKOS for defining the concepts?
Answer: NAR wants to keep the syntax more compact, so a stylesheet will be provided to produce the equivalent SKOS/RDF syntax. Bad news: Misha argues that there is an outstanding issue: the relationship between concepts and schemes is not the same in SKOS as in real-world thesauri! I don't understand what the problem is, but it may be something to discuss with Alistair ...

Decisions:
- Proposed XML namespace policy: NewsML/NAR-NS-policy.doc

IPTC News Content Work Package, Henrik Stadler

IPTC Core - slides

Design goals: keep the focus on primary use for photography; add properties for better describing photo content; extend existing or create new photocentric News Codes (but keep the IPTC Core simple).
Long discussion: is it IPTC's job to build a giant taxonomy of concepts ("real things" but not abstract concepts)? What are the differences between keywords and subject codes?
Following the CEPIC meeting, proposal to set up a new working group dealing with image metadata only (rights metadata, stock-photography-specific fields). How to get interfaces that can access and update controlled vocabularies, such as custom panels in Adobe/XMP products?

Timeline:

Collecting requirements: until October 2006
Assessing requirements and sorting out what to adopt and starting to work on the specifications: until Spring 2007
Working on implementation issues like the user interface: in parallel

EventsML - slides

Try to map the specification to the NAR. Some issues remain, but globally OK.

News Content Working Party, General News Markup Working Group, SportsML Working Group

NewsML update - slides

Photo: the next things on their TO DO list:
* XMP - NewsML G2 bidirectional mapping;
* which physical characteristics are useful for pictures?

Video:
* EBU requirements
* Shot list = structural description of segments in the content, with time references and maybe associated rights (interested in TemporalURI)

Text markup:
* No existing markup fully covers the NAR requirements.
* 3 proposals: i/ define a core NITF, ii/ use XHTML2 modules and extend if needed, iii/ create a new ArticleML (will be introduced by NITF WP)
TO DO = define a global model for textual markup!

SportsML update - slides

Latest version, 1.7, posted! Big delay, they are very late.
New schema for representing all baseball statistics. Enhancements to the tennis schema.
Work is now under way on SportsML 1.8.

Profium company - see the slides

Based in Finland, member of IPTC and W3C. They provide content management solutions (which they claim to be semantic). News agencies are their customers: AFP (French), ANP (Dutch), STT (Finnish).
They were present at the Semantic Technology Conference 2006, organized in San Jose (see their presentation).

News Codes Working Party

The NewsCodes are the controlled vocabularies developed by IPTC. Review of the various taxonomies. Very funny discussions, such as the Japanese insisting that "BodyBuilding" should be considered a "Sport" and not "Lifestyle and Leisure" :-)
A proposal by Reuters that NewsCodes labels should start with a capital letter and be in the singular unless the term is a plural noun in British English. Decision: the proposal has not been accepted; the management of NewsCodes labels will NOT change.

Bringing NewsML2 into the Semantic Web, CWI - slides

I think the audience caught all the bits of the presentation.
One remark: forget the name "NewsML2" and replace it with "NewsML G2"!

Some comments:

IPTC would like to be a W3C member! ACTION for me: ask W3C whether or not they should pay the low admission fee!
Misha would like to join the XG and has registered on the mailing list. Laurent should too!
Technical discussion regarding the NewsCodes / SKOS conversion. ACTION for me: document the fundamental differences between the two approaches and see where it breaks!
* In short, for NewsML, the concepts exist independently of their representation. They have manifestations in schemes, whereas in SKOS, concepts are attached to particular schemes.
* Differences between URIs (SW world) and the tuple {scheme, localName} in NewsML. The two parts could be concatenated, but the concatenation has a special meaning for NewsML, and the operation should be reversible. They also want something very compact. It seems that CURIEs do not fully match their requirements. They are starting to propose "QCodes" instead of "QNames". They do not use namespaces for the same reason.
* GUID versus URI issue. They don't like the "host name" in http (a host name is volatile and non-persistent). They would like to have a dereferenceable mechanism, such as DOI.
Ask Laurent about the work he did with Mondeca on modeling a small ontology of entities (Person, Organization, Event, Place, Work) and the relations that they can extract automatically. Ask Laurent for the document. Obvious link with the equivalent MPEG-7 semantics part!
Ask Michael for a document on the mapping between standards ...

NITF Maintenance Working Party - slides

NITF future: Article Markup Language.
See: http://www.articleml.org/proposed/.
Goals: Any ArticleML document can be included within a NewsML wrapper, which provides in-depth layers of metadata covering subject matter, authorship, publication history, priority, and other publishing details. NewsML also allows news items of any media type to be ranked and grouped, providing the ultimate flexibility in content distribution.

There are 3 alternative "Article Markup Languages" that semantically structure editorial text:

Article Schema: own XML markup language for structuring articles
NITF Schema: a kind of mix between structural information and presentation information, most likely the worst situation.
HTML microformat: a microformat ... but no link with RDF-A for now

MESH (Multimedia sEmantic Syndication for enHanced news services), ATC - slides

Speakers: Nikos Sarris, Senior IT consultant, and Evi Varsou, Head of the Integrated Solutions Unit, both from the Athens Technology Centre company.

A very business-oriented presentation that you can see at NewsML/MESH-Presentation.ppt! There was an accompanying Flash presentation, nice, made by a communication company, but containing only buzzwords or presenting all the subtleties of EU Integrated Projects (like the presentation of Work Packages ...)

Well, I thus have a rather bad opinion of the presentation. First, it was the typical NTUA Greek presentation, where they talked more about image and video content analysis in general than about the news domain in particular! Everybody noticed that there are no news agencies in the project (pointed out as a weakness of the project by the EU). They are now trying to get news agencies onto their advisory board ... so the agencies could contribute, but for free, something that IPTC does not really like!

The presenter had practically just discovered that NewsML exists and that there is a Working Group in IPTC dealing with metadata standards and trying to develop a model and format for representing news metadata ...

There will be a "Media day" organized during SAMT, on the 6th of December! One of the questions dealt with the overlap with the aims of the Quaero project, which the presenters were not aware of!