Jacco, Lloyd and Lynda attended. Report by Lynda. Additional comments by Lloyd styled like this.
The papers are in the ACM DL: conference proceedings from conference 10 onwards, and all conferences. Links to slides for all the W3C track sessions.
Some in the organization for the series as a whole felt this year was a bit under-managed. They gathered a committee later than usual, for example. This variation in organizational quality happens in all series: each year has a chair with much more authority and responsibility than anyone managing the long run, and the quality of this person varies from year to year. New York was also under-managed, probably more so. Fortunately, this year's dinner was only awkward compared to last year's disaster. In general, the conference seemed to go well. But there wasn't the "gezellig" (cosily convivial) buzz of Budapest 2003 or Hawaii 2002.
Furthermore, as is usual for this series, the papers weren't as engaging as the panels and W3C track. The paper-track research fields are so broad that each researcher attendee is involved in only a few of them. The panels and W3C track, on the other hand, are about the underlying infrastructure, both technical and conceptual, of the work all attendees do. The panels and W3C track thus each got a big room downstairs, while the paper tracks each got one of four small rooms upstairs.
However, submission numbers this year were high, and the quality of the papers seemed good, so the quality of the series as a whole is riding out a pair of slightly off years. Furthermore, next year's conference in Edinburgh sits in the capable hands of Southampton, with Les Carr and David DeRoure as general co-chairs. They gave WWW2006 a strong presence at the conference. The track chairs have all already been assigned by the very capable Program Chair Carole Goble, and we already started meeting with each other at the conference. And Les walked around the whole time in a kilt, giving away free whisky to luckily selected pre-registrants for WWW2006 (yes, you can pre-register now, but the whisky is "op" - Dutch for "all gone"). Looks like it will be high-quality fun next year.
The workshop is a collection of people who want to make the (current) web accessible rather than a bunch of researchers trying to carve out a new research area. It feels more like a proto-working-group discussion. (This is not necessarily bad, but it should change the perspective with which the papers in the workshop proceedings are viewed.) There are two things I want. The first is web pages that are accessible (which is what the workshop participants want). The second is to be able to incorporate the knowledge in the WCAG guidelines into (our) web page generation processes.
Main body of notes on papers/presentations at workshop for those very keen.
Don't educate the content providers, but the tool builders. http://www.w3.org/WAI/intro/wcag.html
WCAG 1.0 is official, but WCAG 2.0 is a working draft. Macromedia
are working on putting this into Flash. SVG and SMIL need to be
updated. Trying for recommendation this year - so trying to stay
focussed. Mentioned Flickr as a
site using flash but not being accessible (yet).
Chatted to Wendy Chisholm about what I/we can do by directing resources rather
than dedicating them. Source of funding for this? (CHIP) Semantic
web. NL Net?
Afternoon keynote: Eric Meyer. Can you have accessible design? Humans are very visually oriented and many designs use images. CSS Zen Garden shows wildly different designs on the same content. Screen-scraping a web page and rendering the (visual) presentation audibly is deeply flawed. Documents are becoming semantic and structural = accessible. If it is audio browsing then forget about how it looks - it is audio. (Amazon and eBay are still all visually based.) Screen readers should become audio browsers. Try putting a DOCTYPE in - if there is a DOCTYPE then use standards mode, otherwise use backward-compatibility ("crutch") mode. We need some form of audio styling for an audio browser. In summary, there is tension between accessibility and visual design, but not a huge amount of it. We need to build audio browsers, so we need audio styling (VoiceXML didn't go anywhere, but there is the newer Speech Synthesis Markup Language (SSML) Version 1.0). More at the "Voice Browser" Activity.
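For a flavour of what "audio styling" could look like, here's a minimal sketch based on the SSML 1.0 recommendation (the content is invented for illustration):

```xml
<?xml version="1.0"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  Welcome to the site.
  <break time="500ms"/>
  <emphasis level="strong">New items</emphasis> are listed first.
  <prosody rate="slow" volume="loud">Navigation links follow.</prosody>
</speak>
```

The prosody and emphasis elements play roughly the role that font size and weight play in visual CSS: presentation hints layered over structure.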
EARL - Evaluation and Report Language. It would be interesting to
talk to Shadi
Abou-Zahra, but unfortunately he couldn't make it. Wendy Chisholm is
working fairly closely with him - I chatted to her during lunch.
Shadi is chair of the Evaluation
and Repair Tools WG.
Designers, content authors, programmers, managers, evaluators. Boils
down to needing machine-readable syntax for test results to allow
tools to check pages automatically. Builds on RDF. Purpose is
generic quality assurance. EARL is to use SemWeb technology for
testing existing pages - we want it the other way round, to use the
same info to generate pages. OWL gives them more options. Their
problems (e.g., describing location for test results; improving persistence of test reports) are completely different from our own.
(Hey, cute page. It is an interface to itself. http://www.w3.org/Talks/. Mind
you, I can't find the slides for this talk :-( . They will be put
on the W3C slides site. )
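To make the "machine-readable syntax for test results" concrete, here's a small sketch of what an EARL assertion might look like in Turtle (URIs and vocabulary details are indicative only - check the current EARL working draft):

```turtle
@prefix earl: <http://www.w3.org/ns/earl#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<#a1> a earl:Assertion ;
    earl:assertedBy <http://example.org/our-checker> ;
    earl:subject    <http://example.org/somepage.html> ;
    earl:test       <http://example.org/tests#img-alt-text> ;
    earl:result [ a earl:TestResult ;
                  earl:outcome earl:failed ;
                  dct:description "Image element lacks alt text" ] .
```

Being RDF, results like this from different tools can be merged and queried generically - the "generic quality assurance" purpose mentioned above.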
Kanpachi :-)
Finalists for best paper award: Algorithmic Detection of Semantic Similarity; Sampling Search-Engine Results; Three-Level Caching for Efficient Query Processing in Large Web Search Engines; G-ToPSS: Fast Filtering of Graph-based Metadata.
Writer and reader battle over the communication space. Style sheets are the battleground. Send HTML mail in a size just a little bit smaller than the reader wants (default is -1). There is nothing like plain old text!
(Slide 5) The Web works because of expected reuse of information. Need to preserve the balance between reuse and keeping the original intent of the content.
(Slide 8) What do people hate about the internet? Spam. What do they
hate about the Web? Pop-ups. But whose problem is it? Most spam
uses html. Phishing is costing banks a huge amount of money at the
moment.
(Slide 10) Causes are the user executing untrusted code, and confusion between code and data - by users, browser software and the operating system.
(Slide 13) Safe languages are: Declarative, Visible (REST); Not Turing-complete (scripts); Maybe not as expressive as first-order logic (logic); Have a standard meaning; Separation of form and content.
(Slide 21) Great picture of mobile phones.
(Slide 22) Really brings home the difference in numbers between PC
access to the Web and mobile access.
(Slide 29) The Mobile Web Initiative has just started.
(Cathy's notes from her panel contribution in 2004. (With thanks to Nick Gibbins for sending me the URI.))
How can we encourage growth of the Sem Web?
Open source software: Making tools to make tools, components, toolkits.
Commercialised implementations: Making centralized...
Where is the Web in the Semantic Web? Desired features: distributed, open world, data manipulation by others' machines. Best candidates: FOAF...
Q1 How would you characterize today's Sem Web?
Jim Hendler: What growth are we seeing? Why are we seeing it? How can we encourage it? The number of RDF documents is growing encouragingly.
Zavisa Bjelogrlic (he had a paper at ISWC in 2003): Important to move from initial academic
applications to real applications.
Bernadette Hyland: Founder/CEO of Sem Web company. Still at stage of
engineers developing good toolkits. Research scientists and
developers are current champions within their institutes.
Kanzaki Masahide: People mean different things by Sem Web. Sem Web
is a concept, not something to achieve. Advanced search on not
necessarily RDF data. Many applications use own vocabularies.
Zavisa Bjelogrlic: slides. Is
there a viral model (slide 3) for spreading the Sem Web?
Context - why should a sustainable business or cooperation get
involved? Who puts in investment and who gets return? Model can be
complex and there may be a long time lag for return. Is there a
contradiction about (global) "knowledge" and "sharing"? [[He's going
too fast for my note-taking :-( ]]
Chinglish (web site open in a month), Sem Web based web site.
Sem Web is between Web and Semantics.
Jim Hendler: Too much effort is going into the business space rather
than the personal space. Need to create the information space that
commerce will come to. E.g. Amazon exploited the newly accessible
collection of users.
Bernadette Hyland: Thomas L. Friedman book "The World is Flat". Young people doing web sites
and what this has meant to large companies.
Kanzaki Masahide: Semantic Web should be easy, fun and useful - for
both users and developers.
Q from audience. What will the Sem Web give me - as a developer
familiar with Web tools.
A The panel is not going to address what the Sem Web is, but there will be plenty of time at the end of the panel session. (This was a polite but firm statement that the panel would not debate the uses/utility of the Semantic Web (which has been done at least at WWW10 (Hong Kong) and WWW2003 (Budapest) - by yours truly as it happens :-) ), but would move on from these issues.)
Moderator: What does the Semantic Web require to become mainstream?
Zavisa Bjelogrlic: Lower barriers for people to enter. Need to reach
more than above average programmers and engineers. Start from
something people know well.
Jim Hendler: Oracle has RDF support, Adobe embedding RDF into
documents. The Q is not how to get this thing started, but how do we
get it running. The larger the company the longer it waits to deploy
new technology.
Kanzaki Masahide: There is existing metadata. We need to help people
use this existing metadata.
Q How will the Sem Web become mainstream?
Bernadette Hyland: Firm commitment into limited deployment. [My impression of the projects she is listing is that it is behind uptake in Europe - but this is based on gut feel and not on fact... Talking to Carole and FrankvH is probably the right answer.] Fine research, incredibly valuable.
Jim Hendler: We need to get functionality out there; the end user won't see the Sem Web. They know Google works better than the old search engines. The real question is how do we motivate people to use more of it.
Kanzaki Masahide: Delicious and Flickr use metadata. One specific domain. No utilisation of metadata. Semantic Web metadata portal. [I'm afraid he talks like I think - lots of related topics but no overall narrative.]
Zavisa Bjelogrlic: Using a lot of open source. Need common low-level
tools. Examples to explain why it is an interesting application.
[Didn't quite get the train of thought here.]
Q from audience: (From user perspective.) One set of bookmarks? One
address book? These are current frustrations. Problems are
currently solved individually on different platforms. We need to
know what the user wants. [This is where I vehemently disagree. We
are at the stage with the Semantic Web that we were with hypertext in
1987 - we had some bits of technology and understood they were
valuable. Asking a user would not have resulted in "I'd like the Web
please". Reading Bush (V., not G. :-) ) or Nelson was a much better
idea. We can ask the users where they would like the gear lever
when we have the gearbox sorted out.</rant>]
Jim Hendler: We are ready to move from tools for builders to tools
for users. But not shrink-wrapped yet. Need to do transition from
the first tools. [Their Sem Web driven web site looks like it is Sem
Web driven. Ours looks pretty normal.]
Nigel Shadbolt: We can solve medium-sized applications. But what
about "real web scale"? What can you tell people?
[Actually, now I think about it, where was Steven with RDF/A? One of the ways to feed the Semantic Web is to have web page developers add in (lots of) little bits of RDF in disguise in XHTML pages. At this stage of the conference I wasn't thinking RDF/A...]
Describe functionalities of services provided. Example domain is bioscience. A domain expert built 550 concepts during 4 months. They used only 125 concepts, so only 23% of the learned ontology. Then the number of services jumped to 600.
Goal is to support domain experts to learn ontology in less time.
Extraction method for ontology learning. Identify nouns and verbs and give relationship
between them. In biology domain there is a lot of composition, so
relatively easy to apply this method. The first experiment gave "bad" results. The problem was that there were concepts in the gold standard that could not be extracted from the corpus. The evaluation found 5 "I wish I had that" concepts - concepts that were found that he had missed.
Nice summary slide: Broad coverage domains are important but hard to build.
Textual descriptions are good sources to extract descriptions from.
Domain ontologies (DOs) can be semi-automatically learned. Semi-automatically learned ontologies are suitable for semantic WS descriptions.
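The noun-verb extraction idea can be reduced to a toy sketch (this is my illustration, not the paper's system; the verb list and heuristics are invented):

```python
# Toy sketch of noun-verb relation extraction (NOT the paper's system):
# pair each verb with the nearest preceding and following content word
# to propose a candidate relation for the learned ontology.

VERBS = {"binds", "activates", "inhibits"}   # hypothetical domain verb list
STOPWORDS = {"the", "a", "an", "and", "to", "of"}

def candidate_relations(sentence):
    """Return (noun, verb, noun) triples guessed from one sentence."""
    words = [w.strip(".,;").lower() for w in sentence.split()]
    relations = []
    for i, w in enumerate(words):
        if w in VERBS:
            # content words before/after the verb, skipping stopwords
            before = [x for x in words[:i] if x not in STOPWORDS | VERBS]
            after = [x for x in words[i + 1:] if x not in STOPWORDS | VERBS]
            if before and after:
                relations.append((before[-1], w, after[0]))
    return relations

print(candidate_relations("The protein kinase activates the receptor."))
# -> [('kinase', 'activates', 'receptor')]
```

The biology-domain observation above (lots of composition) is what makes even such a crude adjacency heuristic plausible there.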
Hard to tell when you're the one presenting, but I got the feeling we made our thesis clear throughout the talk: that the Semantic Web can serve as a repository for knowledge, and for media conveying it, that can be rendered as helpful document-based presentations. The talk discussed both the paper's context of Semantic Browsing and its focus of hierarchy generation from search returns.
We submitted this paper to the "UI and Browsers" track, and wrote and presented it as "how hypermedia can use the Semantic Web". However, program scheduling constraints put this paper in the "Semantic Web misc." session, simply labelled as "Semantic Web". As a result the audience was largely Semantic Web people, and saw the paper more as "how the Semantic Web can use hypermedia", which is understandable given the paper's title.
In this context, Jeremy Carroll asked at the end of the talk how we propose to display blank nodes. I had to admit I had no idea what blank nodes were, which was unfortunate because the audience had several hard-core Semantic Web programmers who were aware of them as a rather fundamental construct in RDF and were quite interested in the question posed. Furthermore, the issue of how to present them has received discussion from semantic developers lately, making the question quite appropriate for the paper and talk from the Semantic Web perspective. Oops.
Jacco then brushed up a bit on blank nodes and guided the three of us in investigating further. A blank node is simply an element serving as an object in a triple that (a) has no identifier and (b) serves as a group for property assignments for the unidentified object. You can assign them properties just like any other resource. Their not having identifiers means (I conclude, perhaps incorrectly) that each blank node can serve as the object of only one triple: the one whose defining element it is the child of. We still haven't figured out what they're good for. My current best guess is that they help authors of RDF who encounter a need for an object that the RDFS doesn't provide for. If so, it is an author's work-around for unforesighted ontology design. There's probably more to it, though.
However, despite our previous ignorance, Noadster would actually handle blank nodes quite well. The typical browsing problem with blank nodes is that, lacking identifiers, it is hard to give them referential text. However, Sesame automatically assigns unique strings to each blank node and passes them back to Noadster just like resource URIs, meaning Noadster would display such strings as it would URIs. Better still, blank nodes have properties just like other nodes do. This means that giving a blank node an <rdfs:label> makes that text its title, so the user sees no difference between displays involving blank nodes and those that don't.
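A small sketch (my own, not Noadster or Sesame code) of the display behaviour described: the store mints a unique placeholder string per blank node, and the browser prefers an rdfs:label when one exists:

```python
# Toy illustration of blank-node display (all data invented): mint a
# unique ID per blank node, then prefer its rdfs:label when rendering.

import itertools

_bnode_counter = itertools.count(1)

def new_bnode():
    """Mint a unique placeholder ID, as a store like Sesame does."""
    return f"_:node{next(_bnode_counter)}"

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def display_text(node, triples):
    """Prefer an rdfs:label value; fall back to the node's ID or URI."""
    for s, p, o in triples:
        if s == node and p == RDFS_LABEL:
            return o
    return node

addr = new_bnode()  # a blank node: no URI, but it can still carry properties
triples = [
    ("http://example.org/lynda", "http://example.org/address", addr),
    (addr, RDFS_LABEL, "Lynda's address"),
]
print(display_text(addr, triples))  # the label, not "_:node1"
```

With the label triple present, the rendered text is indistinguishable from that of a normal, URI-identified resource.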
Daniel Schwabe came up to me (Lynda) afterwards and warned me that Lloyd's desire to make the Semantic Web presentable was so convincing that we need to be careful to acknowledge that the Semantic Web can also process machine-readable information!
I thoroughly enjoyed this talk. It is the first time I have seen a connection made between research-level computing science and providing useful aid in the developing world. If I can work out how to do the same... (accessibility helps those closer as well).
3-4 billion people, purchasing power $2 per day,
could grow to 6-8 billion in next 25 years.
Set up companies to be self-sustaining. Fixing one village doesn't
help. Needs to work on a large scale. Good example is eradicating
river blindness in West Africa. Carried by mosquitoes. Put in a
sensor network and find where the larvae breed. Then spray
targeted areas. 30,000,000 were protected from infection. Freed
up 100,000 square miles of land - capable of feeding 17,000,000
people.
Other good examples, using computers in primary teaching. Often in
developing countries the teachers are not that familiar with the
material. Attendance rates at schools with computers are higher
(but note that you need the teachers as well).
Being poor is expensive. Water, medicine and credit are very
expensive. The distribution systems are not there.
Technology pays for itself by making things more efficient
(e.g. cell phone uptake). Even very poor people have a disposable
income. TV/radio access, pressure cooker. 7% of rural income in
Bangladesh is spent on telephony. Lots of money comes from foreign
relatives - and telephones help coordinate this.
Micro credit is a big enabler. Grameen Bank 1976 started by
Mohammed Yunus(?). 2.6 mln
borrowers (95%) over 1,000 branches in 42,000 villages. 12,000
staff. Mothers use the money reliably for the children. (Men are
worse borrowers.)
US$ 3.9 B loaned since inception. Repaid with a 98.75% recovery
rate. Has never accepted any charity.
46.5% borrowers have crossed the poverty line. Most loans go to
people who have had loans in the past - i.e. they have learned how
to create more wealth from a loan.
One idea was to have a village phone. 95,000 loans of US$200 to buy
a mobile phone per village. The phone owner ("she") charges users per minute of use.
This scales and the loan taker maintains the system since her
income depends on it.
Aravind eye hospital group does cataract surgery for US$10. $2 to
make own lens, $3 for the surgeon and $5 profit.
7 surgeries per hour. Statistically safer and cheaper to get surgery
done there than in home country. Why are they good at this? They
do a factor 10 more cases and cases are much worse.
They make their own lenses for $2. 200,000 surgeries in a year.
2mln patients.
Solution has to do with sharing computers. TIER: Technology and Infrastructure for
Developing Regions. NSF 5 year grant.
Need to develop computers that are less dependent on a stable power
supply. Video link to hospital to give easier access to a doctor.
Also want to free up doctor's time to do more surgery. They can pay
$2,000 for connectivity, and it costs only $1,000.
Work on getting connectivity set up. Need to build towers and get
power and set them up properly.
Make better use of speech recognition. Each word said by different
speakers. 98% recognition accuracy - not as good as the best, but
good enough. So you can speak to the device in your own language (in this
case Tamil). Runs as a parallel processor at 4MHz so power
consumption is very low. (No US students know anything about power
systems these days!)
Tsunami disaster. Broadcast over loud speakers what the weather is.
Side effect of system, people could be told to get off the beach.
A different village got a warning and they were able to clear the
beach, but there was no further information dissemination.
Security should be easy for the end user. One new idea is to have
people remember not passwords, but pass-pictures. E.g. show picture
on lake and user has to click on door, window and tree.
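A toy sketch of the pass-picture idea (regions and coordinates are invented for illustration):

```python
# Toy sketch of pass-picture authentication (all regions hypothetical):
# the "password" is a remembered sequence of regions clicked on one image.

REGIONS = {  # axis-aligned boxes (x1, y1, x2, y2) on a lake picture
    "door":   (10, 40, 30, 80),
    "window": (35, 45, 50, 60),
    "tree":   (70, 10, 95, 90),
}
SECRET_SEQUENCE = ["door", "window", "tree"]

def region_at(x, y):
    """Name of the region containing the click, or None."""
    for name, (x1, y1, x2, y2) in REGIONS.items():
        if x1 <= x <= x2 and y1 <= y <= y2:
            return name
    return None

def check_clicks(clicks):
    """Accept only if the clicks hit the secret regions in order."""
    return [region_at(x, y) for x, y in clicks] == SECRET_SEQUENCE

print(check_clicks([(20, 50), (40, 50), (80, 50)]))  # True: door, window, tree
print(check_clicks([(80, 50), (20, 50), (40, 50)]))  # False: wrong order
```

The appeal is that picture-plus-location sequences are easier to remember than arbitrary character strings.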
The wall
of sheep! Wireless networks are not secure and your passwords
can be sucked off with comparative ease. (After the talk I installed
my Windows updates... Baaah :-) )
Her talks are online, but at the time of the talk the slides - nice style! - were not available.
Jeremy J Carroll,
[[where are the slides?? in my emailbox]]
Combine machine processable and human-processable information. RDF/A is the
answer, but it's not finished yet!
David Wood (standing in for Eric Miller), Semantic Web
Applications
Piggybank is bookmarks "on steroids"! (Sounds like
writing hypertext on the web is almost here!)
Eric Prud'hommeaux, DAWG (Data Access Working Group), SPARQL Overview. A SPARQL query includes information about the expected/desired structure of the results.
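For a flavour of how a SPARQL query describes the desired result structure, a small example over the FOAF vocabulary: the SELECT clause names the variables that shape each result row, and OPTIONAL marks structure that may be absent.

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?name ?mbox
WHERE {
  ?person foaf:name ?name .
  OPTIONAL { ?person foaf:mbox ?mbox }
}
ORDER BY ?name
```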
Open source, Helix
community. Started 3 years ago for communal building of media
player software.
Harmony
is about digital music and interoperability. Apple, Microsoft are
going down route of proprietary software.
Harmony gives meta DRM (digital rights management), where content is downloaded
in a format which can then be transcoded/transcripted to one of a
large range of devices. Real is committed to interoperability.
Music subscribers, (Rhapsody?) listeners average around 200 tracks a month (8 per day).
If you listen to fewer than 25 a month then you don't need to become a
subscriber.
Long term interoperability is crucial in DRM. Don't know how to open
source the DRM. There is strong cryptography in system, so difficult
to know how to open source it.
Lloyd asked why Real had resigned from the W3C last month. Answer was
that Real is right behind W3C and will rejoin real soon now.
Real formally notified the W3C in February that they would not renew their contract with the W3C when it ended at the end of March. Rob Glaser said in his answer that Real not renewing was more a negligent oversight than a company-wide change of vision. Based on my conversations with W3C staff after the keynote, there may be some truth to this. Often W3C membership is managed by lower-level divisions rather than by the company head, and thus division leaders struggling with small budgets often opt out of the large company fee to balance the books without fully consulting the directorship. But clearly there's bitterness toward W3C membership in at least some of Real's managerial levels.
I only knew of Real's withdrawal from a one-line official SYMM posting from the W3C staff representative. At the keynote I confirmed it by seeing that Real was no longer listed as a member on the W3C website. However, much Googling during the keynote found no other mention of Real's withdrawal from the W3C. Judging also by how many heads in the audience suddenly perked up and turned my way when I asked the question, it was a big surprise to most in the audience, especially after Rob's very pro-standards keynote. Rob got applause from the audience with his "prodigal son returns" answer.
Based on conversations afterward, the W3C staff were clearly (and not surprisingly) pleased by the question, which they of course couldn't ask themselves, but which certainly gave them leverage when W3C staff member Steve Bratt had lunch with Rob after the keynote to discuss renewing. I caught up with Steve after that, at which point he discussed what I mentioned above about lower-level management bailing out of W3C only to have upper-level management be surprised and reverse the decision. Steve added that, as with Real in this case, W3C attempts to follow up with upper-level management are often slowed by general unavailability of, and lack of W3C access to, the directors. The impending keynote gave W3C an opportunity to discuss the withdrawal directly with Rob, who told Steve then, as he told the audience, that it was a lower-level decision that caught him by surprise and that he would reverse it.
However, other W3C staff, not sharing Steve's need to put a happy membership spin on things, said that Real's letter of notification of non-renewal included as an explanation Real's disappointment with how SMIL and SVG were handled. The staff were surprised by the mention of SVG because Real was only marginally involved in its development, and only marginally implemented it. Concerns about SMIL from various members are more widely known, but these are generally considered to be no greater than with other standards and from other companies: the W3C is a consensus process in which no one gets exactly what they envisioned when starting. Thus Real's objections were considered by the W3C staff as, for lack of a better word, childish ... or at least "not consensus-oriented".
My guess is that Rob Glaser is no W3C angel back at Real, but was not completely behind the withdrawal. Someone who was both in charge of a division-level budget and who worked directly enough in W3C working groups to have a grudge made and executed the decision, combining financial frugality with venting frustration. All parties at Real then got together and decided that, in the broader sense, involvement with W3C balanced out with more benefit than petty technical and political frustrations.
Note that Dieter didn't show up.
Two models for accessing the Web: passive and ad hoc, vs active and event-based. The Web actively responds and reports: real-time train schedules, or films showing near the hotel I am staying in.
(?)
Do we need a web time machine? Yes, for both past and future.
Querying the past - Google and Yahoo could collect all the pages they
index. Way back machine in the internet archive is pretty good too.
But there is no single snapshot of the Web at a particular moment in
time. [My thought is - so what?] In many cases the shift of content
of links doesn't matter, but sometimes it might.
Privacy problems in monitoring changes in pages in a timely fashion.
Can query past/future of the Web but to a limited extent.
Andrei Z. Broder
The Web decays and the temporality of links. People expect that a web
page is up-to-date unless stated otherwise explicitly. But this is
not true for many pages. Humans are better at recognising patterns,
so we can see better that pages are out of date. Machines can look at
last modified date.
Carole Goble
Life sciences, encode genome information and need to be able to query
it. Every resource is on the Web. (They keep old (decaying) web pages -
since it is useful to know what they used to know.) They also use raw data as part of
the documents. In sciences we are about to collect more information
in next 5 years than has been collected in total up until now.
Information is recorded so you can see where a biologist has been and
what they have done. Biologists used to use notebooks and kept stuff
in archives. They have people who check whether everything is OK.
One database has 70 curators for keeping it up-to-date.
(When Edinburgh AI dept burned down, the electronic lists were in
Google and in the internet archive - but they lost all the paper
stuff.)
As they run queries they build up a web of knowledge that they have
discovered. They need to provide this as evidence later, so need to
record it.
RDF description of their information track. But how do I know that
the linked-to thing hasn't changed? The data is guaranteed to be the
same; the metadata describing it may change. So build webs of
relationships between life science identifiers (LSIs). An LSI can be used
to ask: where do you come from, what is your history? And what is the
relationship between LSIs?
Get event busses between a whole bunch of different information
sources (RSS feeds, data flow resources etc.).
This is all being built in the Grid community. Sem Web offers
techniques for representing the past. The Grid is developing the
middleware to manage all this.
Calton Pu
Georgia-Navigator.com shows
realtime traffic. Lots of different services: flight paths, river
salinity, real-time supply chains.
Bert Bos, The device-independent browser: CSS and grid layout
Define a grid in CSS3 and put elements into it. Then have alignment
as in table and order-independence (as in positioning).
Can also make grids where elements can overlap with other
elements. (See slide 15).
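Bert's 2005 proposal had its own syntax, but the idea - define a grid, place elements into it independent of source order, allow overlap - survives in CSS Grid as eventually standardised. A sketch in the later syntax (class names invented):

```css
.page {
  display: grid;
  grid-template-columns: 1fr 3fr;
  grid-template-rows: auto 1fr auto;
}
/* Placement is independent of source order: */
.nav     { grid-column: 1; grid-row: 2; }
.content { grid-column: 2; grid-row: 2; }
/* Two items in the same cell overlap, as on slide 15: */
.badge   { grid-column: 2; grid-row: 2; justify-self: end; align-self: start; }
```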
Bert also mentioned using screen size as a measurement unit, letting font size and placement be in terms of dimensions of the current system's screen. This is very helpful for full-screen displays such as slides shows.
Mark Birbeck (presented by Steven Pemberton), The Semantic Browser:
Improving the User Experience
How can you relate XHTML to the Semantic Web? Putting RDF into XHTML.
RDF/A was the original paper. Bnodes still have to be finalised. ("name" in
XHTML is now "property".)
Q Lloyd: since this is obviously RDF, why not just use RDF?
A Steven: The syntax is useful for validating the XHTML. And because it looks like XHTML - which is good for XHTML creators - it gives you the power of RDF.
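To illustrate the idea (the syntax was still in flux at the time, so this follows the later-standardised RDFa style; all values are invented): the existing XHTML text doubles as RDF simply by adding property attributes.

```xml
<p xmlns:dc="http://purl.org/dc/elements/1.1/"
   about="http://example.org/report">
  <span property="dc:title">A Trip Report</span> was written by
  <span property="dc:creator">A. N. Author</span>.
</p>
```

The same markup renders normally in a browser and yields triples about the page to an RDF-aware tool.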
TV Raman, IBM Research, Web
Applications in XML
Dean Jackson, Welcome back browser
He had a cool demo of a number of web sites being displayed within
one page. (Not using frames but CSS, I think.)
Client side. SVG Tiny is on 50,000,000 phones!
Questions
Q Lloyd: Are you sure that RDF/A will remain compatible with RDF?
A Steven: That is one of the design requirements.