IPTC Annual General Meeting 2006
Author: Raphael, George
CWI participants: Raphael, George
# participants: around 50
Very well organized meeting. Magnificent hotel in a nice city, with really nice social events. The
audience was composed of IT managers from the various news agencies and from private companies that
process news data or provide content management solutions. That means mainly men (I counted only 3
women!) in their 50s, so it was quite scary.
All the IPTC working documents (past meetings, minutes, presentations, past decisions, standards, etc.)
are available in the NewsML directory.
For the first social event, we visited the APA (Austrian News Agency) office and saw how they
work. It was really impressive! One giant room of 16000 m², 160 people working together, with a
special building architecture for the cooling/heating, and no noise at all even though everybody is in
the same room. We also visited the IT department, composed of 60 people (30 developers), and saw a
demonstration of their in-house tools for indexing / archiving / searching the news (mainly textual
stories + photos).
The most innovative part of the demonstration was a dynamic clustering of the news stories
by topic when one searches the database. A map is displayed with islands whose size
depends on the number of stories found for each topic. When one enters a topic, a new clustering is
computed, and so on. The indexing uses what I call the "brute force approach"! They do a full-text
search on everything; they do not even distinguish whether a word appears in the title field, in a
controlled vocabulary or in the textual description. The topic clustering is done by purely statistical
techniques (text similarity), and no semantic/terminological techniques are used. They would like to,
but they do not really know how ...
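As a rough illustration of what purely statistical topic clustering can look like, the sketch below groups stories by the cosine similarity of their bag-of-words vectors. It is not APA's tool: the function names, the greedy single-pass strategy and the 0.3 threshold are all assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count word occurrences (no field weighting)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(stories, threshold=0.3):
    """Greedy clustering: put each story in the first cluster whose seed
    story is similar enough, otherwise start a new cluster."""
    clusters = []  # list of (seed_vector, [story indices])
    for index, story in enumerate(stories):
        vector = bag_of_words(story)
        for seed, members in clusters:
            if cosine(seed, vector) >= threshold:
                members.append(index)
                break
        else:
            # No existing cluster was close enough: this story seeds a new one.
            clusters.append((vector, [index]))
    return [members for _, members in clusters]


# Invented example stories, only to exercise the functions.
stories = [
    "Vienna election results announced",
    "Election results in Vienna: final count",
    "Football cup final tonight",
]
groups = cluster(stories)
```

In a map display like the one described, the size of each island would simply follow the number of members in each group, and re-running `cluster` on the members of one group would give the next level of detail.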
Many technical discussions with Laurent and Misha. Laurent Le Meur (AFP) is really a technical guy,
who has many interesting ideas and would like to prototype many small applications for AFP. He is
personally interested in the faceted browser, which he has known about for 2 years, with the timeline
view approach, etc. He has prototyped at AFP a geo-localization map of the stories (something we would
also like to add to the faceted browser).
He would like to establish a cooperation CWI/AFP for testing new ideas. He is willing to give us A FULL
YEAR of news (textual stories, photos, videos, audio). He could deliver all the data and NewsML2
metadata by August 30.
Creation of a new Working Group for Photo Metadata.
Approved name: "Photo Metadata Working Group"
A public mailing list will be announced soon!
The European Broadcasting Union (EBU) is also an IPTC member. They would like to re-engineer their whole metadata system, which currently consists of thesauri that are not efficiently represented. The idea is to standardize a set of metadata vocabularies to describe, at a minimum, the TV programs (link with the NAR for describing any video). They are now very interested in SKOS and could use it to represent the next version of the EBU metadata.
Goals of the meeting:
- Report on the ongoing second experimental phase
- Intention to set up a Photo Metadata WG
Latest version of the specification, examples, tests, etc. available at: http://www.iptc.org/dev/
NAR = News Architecture G2. NAR is *not* an IPTC standard, but the core of all IPTC G2 standards. NAR
provides a generic framework on which IPTC standards are built:
- NewsML G2: representation and management of news (textual stories, photo, video, audio, graphics)
- EventsML G2: representation and management of events
- SportsML G2: representation and management of sports results and statistics
Goals: simplify the processing of news objects, manage them all in the same way, stay compatible with
NewsML1 (though not in syntax), be compact and storage-friendly, offer more semantic capabilities, and
use the latest XML technologies.
Work status: NAR should be formally adopted in January 2007; try to get CURIEs adopted as a W3C standard.
<newsItem>
  <catalogRef/>    <!-- definition of all vocabularies -->
  <itemMeta/>      <!-- metadata about the newsItem -->
  <contentMeta/>   <!-- metadata about the content: subject, slugline, headline -->
  <contentSet>     <!-- wrapper for "alternative" representations of news content -->
    <inlineXML>
      <someXML/>   <!-- an XML structure from an external namespace: could be RDF-A -->
    </inlineXML>
  </contentSet>
</newsItem>
The metadata can help to adapt the content to some context, devices, users, etc. For example, the alternative rendition of a video can be a still image, etc.
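To make the newsItem skeleton concrete, here is a minimal sketch that reads it with Python's standard XML parser. It only uses the element names shown in these notes; real NAR documents carry namespaces and attributes that are not reproduced here, so treat this as an assumption-laden toy rather than a NewsML G2 reader.

```python
import xml.etree.ElementTree as ET

# Skeleton copied from the notes; namespaces and attributes of real
# NAR documents are deliberately left out.
SKELETON = """
<newsItem>
  <catalogRef/>
  <itemMeta/>
  <contentMeta/>
  <contentSet>
    <inlineXML>
      <someXML/>
    </inlineXML>
  </contentSet>
</newsItem>
"""

root = ET.fromstring(SKELETON)

# Direct children are the four top-level parts of a news item.
parts = [child.tag for child in root]

# Each child of <contentSet> is one alternative rendition of the content.
renditions = [child.tag for child in root.find("contentSet")]

print(parts)
print(renditions)
```

A consumer that wants to adapt content to a device would pick one entry from `renditions` based on the metadata, which is the point of wrapping the alternatives in a single contentSet.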
NAR engine: easy integration in professional editorial systems (user interface, web or client
application). Many TO DOs: marketing the NAR, converting the metadata into relational databases,
providing software code (Java, C#) for processing the metadata, etc.
Identity and versioning: persistent and universal identity
Multi-dimensional content: different content renditions, news content preview via slugline
(keywords), headline, description, cross-media publishing (print, web, mobile)
Networked items: a kind of "see also" link, e.g. text/photo, photo/video, video/HTML page,
etc.
Signature: all content should be signed!
Transforms: official (XSLT) transform from NewsML1 to NewsML2 provided by AFP, open source.
IP (rights): (very important issue!) Choose a standard among the 3 proposals on the table:
MPEG-REL, ODRL, PLUS
News + knowledge: would like to browse the news by: themes, people, organisation, geopolitical
areas, point of interest, etc.
Concept definition: identifier, type, sameAs, broader, narrower, related (+ relation name, taken
from a controlled vocabulary), name, definition, note.
This is exactly SKOS!
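The overlap with SKOS can be made explicit with a small sketch. The property names on the right (`skos:broader`, `skos:prefLabel`, etc.) are real SKOS/OWL terms, but the correspondence itself is my assumption, not the official NAR-to-SKOS stylesheet; the `Concept` class, the codes used in the example and the omission of the relation name on "related" are all simplifications.

```python
from dataclasses import dataclass, field

# Assumed correspondence between the concept properties listed above and
# SKOS/OWL property names; the official mapping may differ.
SKOS_EQUIVALENT = {
    "sameAs": "owl:sameAs",
    "broader": "skos:broader",
    "narrower": "skos:narrower",
    "related": "skos:related",
    "name": "skos:prefLabel",
    "definition": "skos:definition",
    "note": "skos:note",
}


@dataclass
class Concept:
    """Hypothetical in-memory form of a concept definition."""
    identifier: str
    type: str
    name: str
    definition: str = ""
    note: str = ""
    same_as: list = field(default_factory=list)
    broader: list = field(default_factory=list)
    narrower: list = field(default_factory=list)
    related: list = field(default_factory=list)

    def triples(self):
        """Return (subject, property, value) tuples using the SKOS names."""
        values = {
            "sameAs": self.same_as,
            "broader": self.broader,
            "narrower": self.narrower,
            "related": self.related,
            "name": [self.name],
            "definition": [self.definition] if self.definition else [],
            "note": [self.note] if self.note else [],
        }
        return [
            (self.identifier, SKOS_EQUIVALENT[prop], value)
            for prop, items in values.items()
            for value in items
        ]


# "geo:vienna" and "geo:austria" are invented codes for illustration.
vienna = Concept(
    identifier="geo:vienna",
    type="geoArea",
    name="Vienna",
    broader=["geo:austria"],
)
result = vienna.triples()
```

Only the generic properties are covered; the type-specific ones (founded, born, gps, etc.) have no SKOS counterpart and would need another vocabulary.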
4 types of concepts already defined:
- Organisation: founded, dissolved, sector, location, contact information
- Person: born, died, etc.
- Geographical areas: gps, altitude, geopolitical type
- Point of interest: open hours, capacity, facility, access, details, etc.
A <topicItem> is used instead of a <newsItem>; it contains the values of the properties of
a concept.
How to get information about concepts:
- concept identifier = code in a scheme
- scheme = URL (description of the scheme)
- {scheme, code} = URI = description (HTML pages) + definition (XML fragments) of the concept
- thesaurusItem and topicItem
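The {scheme, code} = URI rule above amounts to a prefix expansion, in the spirit of CURIEs. The sketch below shows the idea under clearly invented assumptions: the `CATALOG` aliases, the example.org URLs and the `resolve` function are hypothetical and do not come from the NAR specification.

```python
# Hypothetical catalog: scheme alias -> scheme URL. Both the aliases and
# the URLs are invented for this example.
CATALOG = {
    "subj": "http://example.org/schemes/subject/",
    "geo": "http://example.org/schemes/geo/",
}


def resolve(qcode, catalog=CATALOG):
    """Expand 'alias:code' into the concept URI (scheme URL + code)."""
    alias, sep, code = qcode.partition(":")
    if not sep or not code:
        raise ValueError(f"not a scheme:code pair: {qcode!r}")
    try:
        scheme_url = catalog[alias]
    except KeyError:
        raise ValueError(f"unknown scheme alias: {alias!r}") from None
    return scheme_url + code


uri = resolve("geo:vienna")
```

Dereferencing the resulting URI is then what would return the description (HTML page) or the definition (XML fragment) of the concept.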
Global model:
AnyItem
  +-- NewsItem
  |     +-- TextNewsItem
  |     +-- PictureNewsItem
  +-- PackageItem
  +-- TopicItem
        +-- PersonTopicItem
        +-- OrganisationTopicItem
        +-- GeoAreaTopicItem
Question: why not use SKOS for defining the concepts?
Answer: NAR wants to keep the syntax more compact, so a stylesheet will be provided to produce the
equivalent SKOS/RDF syntax. Bad news: Misha argues that there is an outstanding issue: the relationship
between concepts and schemes is not the same in SKOS as in real-world thesauri! I don't
understand what the problem is, but it may be something to discuss with Alistair ...
Decisions:
- Proposed XML namespace policy:
NewsML/NAR-NS-policy.doc
Design goals: keep the focus on primary use for photography; add properties for better describing
photo content; extend existing or create new photocentric News Codes (but keep the IPTC Core
simple).
Long discussion: is it the job of IPTC to build a giant taxonomy of concepts ("real things", not
abstract concepts)? What are the differences between keywords and subject codes?
Following the CEPIC meeting, proposal to set up a new working group dealing with image metadata
only (rights metadata, fields specific to stock photography). How can we have interfaces that are able
to access and update controlled vocabularies, like custom panels in Adobe/XMP products?
An attempt was made to map the specification onto the NAR. Some issues remain, but overall it is OK.
Photo: the next items on their TO DO list:
* XMP - NewsML G2 bidirectional mapping;
* what is the list of physical characteristics useful for pictures?
Video:
* EBU requirements
* Shot list = structure description of segments in the content, with time
references and maybe associated rights (interested in TemporalURI)
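A shot list as described here is essentially a list of timed segments with optional rights. The sketch below is one possible in-memory shape for it; the `Shot` class, its field names and the example values are assumptions of mine, and neither the EBU requirements nor TemporalURI addressing are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shot:
    """One segment of the video (hypothetical structure)."""
    start: float                   # seconds from the beginning of the video
    end: float                     # seconds, exclusive
    description: str
    rights: Optional[str] = None   # free-text rights note, if any


def shot_at(shots, t):
    """Return the shot covering time t (in seconds), or None."""
    for shot in shots:
        if shot.start <= t < shot.end:
            return shot
    return None


# Invented example content.
shot_list = [
    Shot(0.0, 12.5, "Anchor introduces the story"),
    Shot(12.5, 40.0, "Street footage", rights="editorial use only"),
]
current = shot_at(shot_list, 20.0)
```

Attaching rights per shot rather than per video is what makes it possible to say that only one segment of an item is restricted.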
Text markup:
* No existing markup fully covers the NAR requirements.
* 3 proposals: i/ define a core NITF, ii/ use XHTML2 modules and extend them if needed,
iii/ create a new ArticleML (will be introduced by NITF WP)
TO DO = define a global model for textual markup!
The NewsCodes are the controlled vocabularies developed by IPTC. The various taxonomies were reviewed.
Very funny discussions, such as the Japanese delegates insisting that "BodyBuilding" should be
considered a "Sport" and not a "Lifestyle and Leisure" topic :-)
Reuters proposed that NewsCodes labels should start with a capital letter and be in
the singular unless the label is a plural noun in British English. Decision: the proposal was not
accepted; the management of NewsCodes labels will NOT change.
I think the audience caught all the bits of the presentation.
One remark: forget the name "NewsML2" and replace it with "NewsML G2"!
NITF future: Article Markup Language.
See: http://www.articleml.org/proposed/.
Goals: Any ArticleML document can be included within a NewsML wrapper, which provides in-depth
layers of metadata covering subject matter, authorship, publication history, priority, and other
publishing details. NewsML also allows news items of any media type to be ranked and grouped,
providing the ultimate flexibility in content distribution.
Speakers: Nikos Sarris is a Senior IT consultant and Evi Varsou is the Head of the Integrated Solutions Unit at the Athens Technology Centre company.
A very business-oriented presentation that you can see at NewsML/MESH-Presentation.ppt! There was also an accompanying Flash presentation, nicely made by a communication company but containing only buzzwords or a presentation of all the subtleties of EU Integrated Projects (such as the Work Packages ...)
Well, I thus have quite a bad impression of the presentation. First, it was the typical NTUA Greek presentation, where they talked more about image and video content analysis in general than about the news domain in particular! Everybody noticed that there are no news agencies in the project (emphasized as a weakness of the project by the EU). They are now trying to get news agencies onto their advisory board ... so the agencies could work, but for free, something that IPTC does not really like!
The presenter more or less discovered today that NewsML exists and that there is a Working Group in IPTC that deals with metadata standards and tries to develop a model and format for representing news metadata ...
There will be a "Media day" organized during SAMT, on the 6th of December! One of the questions dealt with the overlap with the aims of the Quaero project, which the presenters were not aware of!