WWW 2005, Chiba, Japan

This Year's Conference in General

Some in the organization for the series as a whole felt this year was a bit under-managed. They gathered a committee later than usual, for example. Such variation in organizational quality happens in all conference series: each year has a chair with much more authority and responsibility than anyone managing the series over the long run, and the quality of this person varies from year to year. New York, last year, was also under-managed, probably even more so. Fortunately, this year's dinner was merely awkward, compared with last year's disaster. In general, the conference seemed to go well. But there wasn't the "gezellig" (cosy, convivial) buzz of Budapest 2003 or Hawaii 2002.

Furthermore, as is usual for this series, the papers weren't as engaging as the panels and W3C track. The paper tracks cover such a broad range of research fields that each attending researcher is involved in only a few of them. The panels and W3C track, on the other hand, are about the underlying infrastructure, both technical and conceptual, of the work all attendees do. The panels and W3C track thus each got a big room downstairs, while the paper tracks each got one of four small rooms upstairs.

However, submissions were high this year and the quality of the papers seemed good, so the series as a whole is riding out a pair of slightly off years. Furthermore, next year's conference in Edinburgh is in the capable hands of Southampton, with Les Carr and David De Roure as general co-chairs. They gave WWW2006 a strong presence at the conference. The track chairs have all already been appointed by the very capable Program Chair, Carole Goble, and we already started meeting with each other at the conference. And Les walked around the whole time in a kilt, giving away free whisky to lucky pre-registrants for WWW2006 (yes, you can pre-register now, but the whisky is "op", all gone). Looks like it will be high-quality fun next year.

International Cross-Disciplinary Workshop on Web Accessibility

The workshop is a collection of people who want to make the (current) web accessible rather than a bunch of researchers trying to carve out a new research area. It feels more like a proto-working group discussion. (This is not necessarily bad, but it should change the perspective from which the papers in the workshop proceedings are viewed.) There are two things I want. The first is web pages that are accessible (which is what the workshop participants want). The second is to be able to incorporate the knowledge in the WCAG guidelines into (our) web page generation processes.

Interdependent Components of Web Accessibility

Don't educate the content providers; educate the tool builders. http://www.w3.org/WAI/intro/wcag.html WCAG 1.0 is official, but WCAG 2.0 is a working draft. Macromedia are working on putting this into Flash. SVG and SMIL need to be updated. Aiming for Recommendation this year, so trying to stay focussed. Mentioned Flickr as a site using Flash but not (yet) accessible.
Chatted to Wendy Chisholm about what I/we can do by directing resources rather than dedicating them. Source of funding for this? (CHIP) Semantic web. NL Net?

Is Accessible Design A Myth?

Afternoon keynote by Eric Meyer. Can you have accessible design? Humans are very visually oriented and many designs use images. The CSS Zen Garden shows very different, far-out designs applied to the same content. Screen-scraping a web page and rendering its (visual) presentation audibly is deeply flawed. Documents are becoming semantic and structural, which makes them accessible. If it is audio browsing, then forget about how it looks: it is audio. (Amazon and eBay are still entirely visually based.) Screen readers should become audio browsers. Try putting a DOCTYPE in: if there is a DOCTYPE, the browser uses standards mode, and otherwise it uses backward compatibility ("crutch") mode. An audio browser needs some form of audio styling. In summary, there is tension between accessibility and visual design, but not a huge amount. We need to build audio browsers, and these will need audio styling (voice ML didn't go anywhere), but there is the newer Speech Synthesis Markup Language (SSML) Version 1.0. More at the W3C "Voice Browser" Activity.
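The DOCTYPE switch can be sketched in a few lines. This is a deliberately simplified assumption: real browsers look at which public and system identifiers the DOCTYPE declares, not merely whether one is present, and the mode names below are just labels chosen for illustration.

```python
import re

# Simplified DOCTYPE switch (an assumption for illustration): real browsers
# also look at *which* DOCTYPE is declared, not just whether one exists.
# An optional XML declaration may precede the DOCTYPE.
_DOCTYPE_AT_START = re.compile(r"\s*(?:<\?xml[^>]*\?>\s*)?<!DOCTYPE\s", re.IGNORECASE)


def rendering_mode(document: str) -> str:
    """Return 'standards' if the document opens with a DOCTYPE, otherwise
    'compatibility' (the backward-compatible "crutch" mode)."""
    return "standards" if _DOCTYPE_AT_START.match(document) else "compatibility"


if __name__ == "__main__":
    print(rendering_mode('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
                         '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html/>'))
    print(rendering_mode("<html><body>No DOCTYPE here</body></html>"))
```

The same kind of switch is what an audio browser could use to decide whether a page's structure can be trusted.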

SemanticWeb Enabled Web Accessibility Evaluation Tools

EARL - Evaluation and Report Language. It would be interesting to talk to Shadi Abou-Zahra, but unfortunately he couldn't make it. Wendy Chisholm is working fairly closely with him - I chatted to her during lunch. Shadi is chair of the Evaluation and Repair Tools WG.
Audience: designers, content authors, programmers, managers, evaluators. It boils down to needing a machine-readable syntax for test results so that tools can check pages automatically. Builds on RDF. The purpose is generic quality assurance. EARL uses SemWeb technology for testing existing pages; we want it the other way round, to use the same information to generate pages. OWL gives them more options. Their problems (e.g., describing the location of test results, improving the persistence of test reports) are completely different from our own.
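As a rough illustration of a machine-readable test result, the sketch below writes one assertion as N-Triples. The namespace and the terms (Assertion, assertedBy, subject, test, result, TestResult, outcome, passed/failed) follow the later EARL 1.0 schema and are an assumption here; the 2005 draft vocabulary may differ. The tool, page and test URIs are hypothetical.

```python
# Assumed vocabulary: EARL 1.0 namespace and terms (may differ from the 2005 draft).
EARL = "http://www.w3.org/ns/earl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def term(value: str) -> str:
    """Write blank nodes as-is and URIs in angle brackets (N-Triples style)."""
    return value if value.startswith("_:") else f"<{value}>"


def earl_assertion(tool: str, page: str, test: str, passed: bool) -> str:
    """Describe one automated check of one page as EARL-style triples."""
    assertion, result = "_:assertion", "_:result"
    outcome = EARL + ("passed" if passed else "failed")
    triples = [
        (assertion, RDF_TYPE, EARL + "Assertion"),
        (assertion, EARL + "assertedBy", tool),
        (assertion, EARL + "subject", page),
        (assertion, EARL + "test", test),
        (assertion, EARL + "result", result),
        (result, RDF_TYPE, EARL + "TestResult"),
        (result, EARL + "outcome", outcome),
    ]
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical checker, page and test identifiers.
    print(earl_assertion(
        tool="http://example.org/checker",
        page="http://example.org/index.html",
        test="http://example.org/tests/img-alt",
        passed=False,
    ))
```

Because the output is plain RDF, another tool could read the same report back, which is also what would make reuse for page generation conceivable.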
(Hey, cute page: it is an interface to itself. http://www.w3.org/Talks/ Mind you, I can't find the slides for this talk :-( They will be put on the W3C slides site.)

Dinner Tuesday

Wednesday

Finalists for best paper award: Algorithmic Detection of Semantic Similarity; Sampling Search-Engine Results; Three-Level Caching for Efficient Query Processing in Large Web Search Engines; G-ToPSS: Fast Filtering of Graph-based Metadata.

Tim Berners-Lee, Web for real people

Writer and reader battle over the communication space. Style sheets are the battleground. HTML mail is sent in a font size just a little smaller than the reader wants (the default is -1). There is nothing like plain old text!
(Slide 5) The Web works because of the expected reuse of information. We need to preserve a balance between that reuse and keeping the original intent of the content.
(Slide 8) What do people hate about the Internet? Spam. What do they hate about the Web? Pop-ups. But whose problem is it? Most spam uses HTML. Phishing is costing banks a huge amount of money at the moment.
(Slide 10) Causes: users executing untrusted code; users confusing code and data; the browser software; and the operating system.
(Slide 13) Safe languages are: Declarative, Visible (REST); Not Turing complete (scripts); Maybe not as expressive as first order logic (logic); Have a standard meaning; Separation of form and content.
(Slide 21) Great picture of mobile phones.
(Slide 22) Really brings home the difference in numbers between PC access to the Web and mobile access.
(Slide 29) The Mobile Web Initiative has just started.

Panel: Can the Semantic Web be Made to Flourish?

How can we encourage growth of the Sem Web? Open source software: Making tools to make tools, components, toolkits. Commercialised implementations: Making centralized...
Where is the Web in the Semantic Web? Desired features: distributed, open world, data manipulation by others' machines. Best candidates: FOAF...
Q1 How would you characterize today's Sem Web?
Jim Hendler: What growth are we seeing? Why are we seeing it? How can we encourage it? The number of RDF documents is increasing encouragingly.
Zavisa Bjelogrlic (he had a paper at ISWC in 2003): Important to move from initial academic applications to real applications.
Bernadette Hyland: Founder/CEO of Sem Web company. Still at stage of engineers developing good toolkits. Research scientists and developers are current champions within their institutes.
Kanzaki Masahide: People mean different things by Sem Web. Sem Web is a concept, not something to achieve. Advanced search over data that is not necessarily RDF. Many applications use their own vocabularies.
Zavisa Bjelogrlic: slides. Is there a viral model (slide 3) for spreading the Sem Web? Context - why should a sustainable business or cooperation get involved? Who puts in investment and who gets return? Model can be complex and there may be a long time lag for return. Is there a contradiction about (global) "knowledge" and "sharing"? [[He's going too fast for my note-taking :-( ]] Chinglish (web site open in a month), Sem Web based web site. Sem Web is between Web and Semantics.
Jim Hendler: Too much effort is going into the business space rather than the personal space. Need to create the information space that commerce will come to. E.g. Amazon exploited the newly accessible collection of users.
Bernadette Hyland: Thomas L. Friedman book "The World is Flat". Young people doing web sites and what this has meant to large companies.
Kanzaki Masahide: Semantic Web should be easy, fun and useful - for both users and developers.
Q from audience: What will the Sem Web give me, as a developer familiar with Web tools? A: The panel is not going to address what the Sem Web is, but there will be plenty of time at the end of the panel session. (This was a polite but firm statement that the panel would not debate the uses and utility of the Semantic Web, which had already been done at least at WWW10 (Hong Kong) and WWW2003 (Budapest), by yours truly as it happens :-), but would move on from these issues.)
Moderator: What does the Semantic Web require to become mainstream?
Zavisa Bjelogrlic: Lower barriers for people to enter. Need to reach more than above average programmers and engineers. Start from something people know well.
Jim Hendler: Oracle has RDF support, Adobe embedding RDF into documents. The Q is not how to get this thing started, but how do we get it running. The larger the company the longer it waits to deploy new technology.
Kanzaki Masahide: There is existing metadata. We need to help people use this existing metadata.
Q How will the Sem Web become mainstream?
Bernadette Hyland: Firm commitment to limited deployment. [My impression from the projects she is listing is that uptake here is behind uptake in Europe, but this is based on gut feeling and not on fact... Talking to Carole and FrankvH is probably the right way to find out.] Fine research, incredibly valuable.
Jim Hendler: We need to get functionality out there; the end user won't see the Sem Web. They know Google works better than the old search engines. The real question is how we motivate people to use more of it.
Kanzaki Masahide: Delicious and Flickr use metadata. One specific domain. No utilisation of metadata. Semantic Web metadata portal. [I'm afraid he talks like I think: lots of related topics but no overall narrative.]
Zavisa Bjelogrlic: Using a lot of open source. Need common low-level tools. Examples to explain why it is an interesting application. [Didn't quite get the train of thought here.]
Q from audience: (From user perspective.) One set of bookmarks? One address book? These are current frustrations. Problems are currently solved individually on different platforms. We need to know what the user wants. [This is where I vehemently disagree. We are at the stage with the Semantic Web that we were with hypertext in 1987 - we had some bits of technology and understood they were valuable. Asking a user would not have resulted in "I'd like the Web please". Reading Bush (V., not G. :-) ) or Nelson was a much better idea. We can ask the users where they would like the gear lever when we have the gearbox sorted out.</rant>]
Jim Hendler: We are ready to move from tools for builders to tools for users, but not shrink-wrapped ones yet. We need to make the transition from the first tools. [Their Sem Web driven web site looks like it is Sem Web driven. Ours looks pretty normal.]
Nigel Shadbolt: We can solve medium-sized applications. But what about "real web scale"? What can you tell people?

[Actually, now I think about it, where was Steven with RDF/A? One of the ways to feed the semantic web is to have web page developers add (lots of) little bits of RDF in disguise in XHTML pages. At this stage of the conference I wasn't thinking RDF/A...]

16:30 Semantic Web papers

Marta Sabou, Learning domain ontologies for Web service descriptions: an experiment in bioinformatics

Describe the functionalities of the services provided. The example domain is bioscience. A domain expert built 550 concepts over 4 months. Only 125 concepts were used, so only 23% of the ontology. Then the number of services jumped to 600. The goal is to support domain experts by learning the ontology in less time.
Extraction method for ontology learning: identify nouns and verbs and the relationships between them. The biology domain has a lot of compositional terms, so this method is relatively easy to apply. The first experiment gave "bad" results. The problem was that there were concepts in the gold standard that could not be extracted from the corpus. Evaluation: 5 "I wish I had that" concepts, i.e., concepts that were found that the expert had missed.
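The compositional naming of bioinformatics terms is what makes this kind of extraction tractable. A minimal sketch of one common heuristic, assuming English compounds whose head noun is the last word: each multi-word term is treated as a subclass of the same term with its leftmost modifier removed. This is an illustration, not the extraction pipeline from the paper, which also uses verbs and linguistic analysis.

```python
def head_noun_hierarchy(terms: list[str]) -> dict[str, str]:
    """Map each multi-word term (and each of its suffixes) to its parent term,
    obtained by dropping the leftmost modifier: the head noun is assumed last."""
    parent_of: dict[str, str] = {}
    for term in terms:
        words = term.lower().split()
        # e.g. "protein sequence alignment" -> "sequence alignment" -> "alignment"
        for i in range(len(words) - 1):
            child = " ".join(words[i:])
            parent = " ".join(words[i + 1:])
            parent_of.setdefault(child, parent)
    return parent_of


if __name__ == "__main__":
    terms = ["DNA sequence", "protein sequence", "protein sequence alignment"]
    for child, parent in sorted(head_noun_hierarchy(terms).items()):
        print(f"{child}  is-a  {parent}")
```

Concepts the corpus never mentions cannot appear in such a hierarchy, which matches the gold-standard problem noted above.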
Nice summary slide: Broad-coverage domain ontologies are important but hard to build. Textual descriptions are good sources to extract them from. Domain ontologies (DOs) can be semi-automatically learned. Semi-automatically learned ontologies are suitable for semantic Web service descriptions.

Lloyd, Jacco and Lynda, Making RDF Presentable: Integrating Global and Local Semantic Web Browsing

Hard to tell when you're the one presenting, but I got the feeling we made our thesis clear throughout the talk: that the Semantic Web can serve as a repository of knowledge, and of the media conveying it, that can be rendered into helpful document-based presentations. The talk discussed both the paper's context of Semantic Browsing and its focus of hierarchy generation from search returns.

We submitted this paper to the "UI and Browsers" track, and wrote and presented it as "how hypermedia can use the Semantic Web". However, program scheduling constraints put this paper in the "Semantic Web misc." session, simply labelled "Semantic Web". As a result, the audience was largely from the Semantic Web community, and saw the paper more as "how the Semantic Web can use hypermedia", which is understandable given the paper's title.

In this context, Jeremy Carroll asked at the end of the talk how we propose to display blank nodes. I had to admit I had no idea what blank nodes were, which was unfortunate because the audience had several hard-core Semantic Web programmers who were aware of them as a rather fundamental construct in RDF and were quite interested in the question posed. Furthermore, the issue of how to present them has received discussion from semantic developers lately, making the question quite appropriate for the paper and talk from the Semantic Web perspective. Oops.

Jacco then brushed up a bit on blank nodes and guided the three of us in investigating further. A blank node is simply an element serving as an object in a triple that (a) has no identifier and (b) serves as a group for property assignments for the unidentified object. You can assign them properties just like any other resource. Their lack of identifiers means (I conclude, perhaps incorrectly) that each blank node can serve as the object of only one triple: the one whose element contains the element defining it. (This holds for nested RDF/XML, although the rdf:nodeID attribute does let one document refer to the same blank node more than once.) We still haven't figured out what they're good for. My current best guess is that they help RDF authors who encounter a need for an object that the RDFS doesn't provide for. If so, they are an author's work-around for short-sighted ontology design. There's probably more to it, though.
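To make the structure concrete, here is a minimal sketch of a tiny graph held as Python tuples (plain Python, not any particular RDF library; the vocabulary, paper URI and names are hypothetical). The author is a blank node: it has an identity within this graph but no URI, it is the object of one triple, and it is the subject of its own property triples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BNode:
    """A blank node: it has an identity within one graph but no URI."""
    local_name: str


EX = "http://example.org/vocab#"  # hypothetical vocabulary

author = BNode("b1")
graph = {
    ("http://example.org/paper42", EX + "author", author),  # blank node as object
    (author, EX + "name", "Jane Doe"),                      # blank node as subject...
    (author, EX + "affiliation", "Example Institute"),      # ...of its own properties
}


def properties_of(node, triples):
    """All (predicate, object) pairs whose subject is `node`."""
    return {(p, o) for s, p, o in triples if s == node}


if __name__ == "__main__":
    for predicate, obj in sorted(properties_of(author, graph)):
        print(predicate, obj)
```

Nothing in this data model stops the same BNode from appearing as the object of a second triple; only the nested XML serialization makes that awkward.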

However, despite our previous ignorance, Noadster would actually handle blank nodes quite well. The typical browsing problem with blank nodes is that, lacking identifiers, it is hard to give them referential text. However, Sesame automatically assigns a unique string to each blank node and passes it back to Noadster just like a resource URI, so Noadster displays such strings as it would URIs. Better still, blank nodes have properties just like other nodes do. This means that if a blank node is given an <rdfs:label>, that text becomes its title, so the user sees no difference between displays involving blank nodes and those that don't.
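The title fallback can be sketched as below. This is a hypothetical illustration, not Noadster's actual code, and the "_:nodeNN" strings merely stand in for whatever unique identifiers the store generates (their format is an assumption).

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def display_title(node: str, triples: list[tuple[str, str, str]]) -> str:
    """Title to show for a node: its rdfs:label if it has one, otherwise the
    identifier string itself (a URI, or a store-generated blank node ID)."""
    for subject, predicate, obj in triples:
        if subject == node and predicate == RDFS_LABEL:
            return obj
    return node


if __name__ == "__main__":
    # "_:node17" and "_:node18" stand in for the store's generated blank node IDs.
    triples = [("_:node17", RDFS_LABEL, "Lynda Hardman"),
               ("http://example.org/paper", "http://example.org/vocab#author", "_:node17"),
               ("http://example.org/paper", "http://example.org/vocab#author", "_:node18")]
    print(display_title("_:node17", triples))
    print(display_title("_:node18", triples))
```

Since the same rule applies to URI-identified resources, labelled blank nodes and labelled resources look identical to the user.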

Daniel Schwabe came up to me (Lynda) afterwards and warned me that Lloyd's desire to make the Semantic Web presentable was so convincing that we need to be careful to acknowledge that the Semantic Web can also process machine-readable information!

Thursday

Keynote Eric A. Brewer, The Case for Technology for Developing Regions

I thoroughly enjoyed this talk. It is the first time I have seen a connection made between research-level computing science and providing useful aid in the developing world. If only I could work out how to do the same... (accessibility work helps those closer to home as well).

3-4 billion people with purchasing power of $2 per day; this could grow to 6-8 billion in the next 25 years. Set up companies to be self-sustaining. Fixing one village doesn't help; it needs to work on a large scale. A good example is eradicating river blindness in West Africa. It is carried by blackflies. Put in a sensor network and find where the larvae breed, then spray the targeted areas. 30,000,000 people were protected from infection. This freed up 100,000 square miles of land, capable of feeding 17,000,000 people.
Another good example is using computers in primary teaching. In developing countries the teachers are often not that familiar with the material. Attendance rates at schools with computers are higher (but note that you need the teachers as well).
Being poor is expensive. Water, medicine and credit are very expensive. The distribution systems are not there. Technology pays for itself by making things more efficient (e.g. cell phone uptake). Even very poor people have some disposable income: TV/radio access, a pressure cooker. 7% of rural income in Bangladesh is spent on telephony. Lots of money comes from relatives abroad, and telephones help coordinate this.
Micro credit is a big enabler. Grameen Bank, started in 1976 by Muhammad Yunus: 2.6 million borrowers (95%), over 1,000 branches in 42,000 villages, 12,000 staff. Mothers use the money reliably for the children. (Men are worse borrowers.) US$3.9 billion loaned since inception, repaid with a 98.75% recovery rate. It has never accepted any charity. 46.5% of borrowers have crossed the poverty line. Most loans go to people who have had loans in the past, i.e. they have learned how to create more wealth from a loan.
One idea was a village phone: 95,000 loans of US$200, each to buy a mobile phone for a village. The phone owner ("she") charges users per minute of use. This scales, and the loan taker maintains the system since her income depends on it.
The Aravind Eye Hospital group does cataract surgery for US$10: $2 to make the lens, $3 for the surgeon and $5 profit. 7 surgeries per hour. It is statistically safer and cheaper to get surgery done there than in your home country. Why are they good at this? They do ten times as many cases, and the cases are much worse. They make their own lenses for $2. 200,000 surgeries in a year; 2 million patients.
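The per-surgery breakdown adds up as follows. The per-hour line is only an extrapolation, assuming every one of the 7 surgeries in an hour is charged the full $10 with the same $5 margin; that assumption is not a figure from the talk.

```latex
\underbrace{\$2}_{\text{lens}} + \underbrace{\$3}_{\text{surgeon}} + \underbrace{\$5}_{\text{profit}} = \$10 \text{ per surgery}
\qquad
7~\frac{\text{surgeries}}{\text{hour}} \times \$5 = \$35 \text{ profit per hour}
```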
The solution has to do with sharing computers. TIER: Technology and Infrastructure for Emerging Regions. NSF 5-year grant.
Need to develop computers that are less dependent on a stable power supply. A video link to the hospital gives easier access to a doctor, and also frees up the doctors' time to do more surgery. They can pay $2,000 for connectivity, and it costs only $1,000. Work on getting connectivity set up: need to build towers, get power and set them up properly.
Make better use of speech recognition. Each word is recorded by different speakers. 98% recognition accuracy: not as good as the best, but good enough. So you can speak to the device in your own language (in this case Tamil). It runs as a parallel processor at 4MHz, so power consumption is very low. (No US students know anything about power systems these days!)
Tsunami disaster. The system broadcasts the weather over loudspeakers. As a side effect of the system, people could be told to get off the beach. A different village got a warning and was able to clear the beach, but there was no further dissemination of the information.

Lorrie Cranor, Towards Usable Web Privacy and Security

Security should be easy for the end user. One new idea is to have people remember not passwords but pass-pictures, e.g. show a picture of a lake scene and the user has to click on the door, window and tree.
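A minimal sketch of checking such a click-based pass-picture, assuming the user must click the same points in the same order and that a click counts if it lands within a fixed pixel tolerance (the 10-pixel value is an arbitrary assumption). Storing raw coordinates, as done here for clarity, would be insecure; a real scheme would store something like a hash of discretised click positions.

```python
import math

# Each click is an (x, y) position in image pixels.
Point = tuple[float, float]


def clicks_match(stored: list[Point], attempt: list[Point], tolerance: float = 10.0) -> bool:
    """Accept the attempt only if it has the same number of clicks and each
    click lands, in order, within `tolerance` pixels of the stored one."""
    if len(stored) != len(attempt):
        return False
    return all(math.dist(s, a) <= tolerance for s, a in zip(stored, attempt))


if __name__ == "__main__":
    # Hypothetical enrolled points: door, window, tree.
    secret = [(120, 80), (300, 95), (410, 260)]
    print(clicks_match(secret, [(124, 83), (298, 99), (405, 262)]))  # close enough, in order
    print(clicks_match(secret, [(300, 95), (120, 80), (410, 260)]))  # right spots, wrong order
```

The tolerance trades usability against security: a larger radius forgives sloppy clicks but makes guessing easier.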
The wall of sheep! Wireless networks are not secure and your passwords can be sniffed with comparative ease. (After the talk I installed my Windows updates... Baaah :-) )

Her talks are online, but at the time of the talk the slides - nice style! - were not available.

W3C track: Semantic Web Activity: Query and Best Practices

Friday

Rob Glaser, RealNetworks, Real and the Future of Digital Media

Open source, Helix community. Started 3 years ago for communal building of media player software.
Harmony is about digital music and interoperability. Apple and Microsoft are going down the route of proprietary software. Harmony gives meta DRM (digital rights management), where content is downloaded in a format that can then be transcoded for any of a large range of devices. Real is committed to interoperability.
Music subscribers (Rhapsody?) listen to an average of around 200 tracks a month (8 per day). If you listen to fewer than 25 a month, you don't need to become a subscriber.
Long-term interoperability is crucial in DRM. They don't know how to open-source the DRM: the system contains strong cryptography, which makes that difficult.
Lloyd asked why Real had resigned from the W3C last month. Answer was that Real is right behind W3C and will rejoin real soon now.

Real formally notified the W3C in February that they would not renew their contract with the W3C when it ended at the end of March. Rob Glaser said in his answer that Real not renewing was more of a negligent oversight than a company-wide change of vision. Based on my conversations with W3C staff after the keynote, there may be some truth to this. W3C membership is often managed by lower-level divisions rather than by the company head, so division leaders struggling with small budgets often opt out of the large company membership fee to balance their books without fully consulting the directorship. But clearly there's bitterness toward W3C membership within Real, at least at some managerial levels.

I only knew of Real's withdrawal from a one-line official SYMM posting from the W3C staff representative. At the keynote I confirmed it by seeing that Real was no longer listed as a member on the W3C website. However, much Googling during the keynote found no other mention of Real's withdrawal from the W3C. Judging also by how many heads in the audience suddenly perked up and turned my way when I asked the question, it was a big surprise to most in the audience, especially after Rob's very pro-standards keynote. Rob got applause from the audience with his "prodigal son returns" answer.

Based on conversations afterward, the W3C staff were clearly (and not surprisingly) pleased by the question, which they of course couldn't ask themselves, but which certainly gave them leverage when W3C's Steve Bratt had lunch with Rob after the keynote to discuss renewing. I caught up with Steve after that, and he described what I mentioned above about lower-level management bailing out of W3C only to have upper-level management be surprised and reverse the decision. Steve added that, as with Real in this case, W3C's attempts to follow up with upper-level management are often slowed by the general unavailability of, and lack of W3C access to, the directors. The impending keynote gave W3C an opportunity to discuss the withdrawal directly with Rob, who told Steve then, as he told the audience, that it was a lower-level decision that caught him by surprise and that he would reverse it.

However, other W3C staff, not sharing Steve's need to put a happy membership spin on things, said that Real's letter of notification of non-renewal gave as an explanation Real's disappointment with how SMIL and SVG were handled. The staff were surprised by the mention of SVG because Real was only marginally involved in its development, and only marginally implemented it. Concerns about SMIL from various members are more widely known, but these are generally considered to be no greater than those about other standards from other companies: the W3C is a consensus process in which no one gets exactly what they envisioned at the start. Thus the W3C staff considered Real's objections, for lack of a better word, childish ... or at least "not consensus-oriented".

My guess is that Rob Glaser is no W3C angel back at Real, but was not completely behind the withdrawal. Someone who was both in charge of a division-level budget and who worked directly enough in W3C working groups to have a grudge made and executed the decision, combining financial frugality with venting frustration. All parties at Real then got together and decided that, in the broader sense, involvement with W3C brings more benefit than the petty technical and political frustrations it causes.

Panel: Querying the Past, Present, and Future: Where the Web is and where the future Web will be

Two models for accessing the Web: passive and ad hoc, versus active and event-based. In the latter, the Web actively responds and reports things like real-time train schedules, or films showing near the hotel I am staying in.
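The active, event-based model is essentially publish/subscribe. A minimal sketch, with hypothetical topic names and event contents: interested parties register once, and matching events are pushed to them instead of being polled for.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe bus: interested parties register once per
    topic and are pushed every matching event, instead of polling for it."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Deliver only to subscribers of this exact topic.
        for callback in self._subscribers[topic]:
            callback(event)


if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe("trains/chiba", lambda e: print("Train update:", e))
    bus.publish("trains/chiba", {"service": "08:15", "delay_min": 5})
    bus.publish("films/chiba", {"title": "ignored: nobody subscribed"})
```

Real systems would filter on richer predicates than an exact topic string, but the push-instead-of-poll inversion is the same.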

(?)
Do we need a web time machine? Yes, for both past and future. Querying the past: Google and Yahoo could collect all the pages they index. The Wayback Machine at the Internet Archive is pretty good too. But there is no single snapshot of the Web at a particular moment in time. [My thought is: so what?] In many cases the shifting content behind links doesn't matter, but sometimes it might. There are privacy problems in monitoring changes in pages in a timely fashion. We can query the past and future of the Web, but only to a limited extent.

Andrei Z. Broder
The Web decays, and links are temporal. People expect a web page to be up to date unless it explicitly states otherwise, but this is not true for many pages. Humans are better at recognising patterns, so we are better at seeing that pages are out of date. Machines can look at the last-modified date.
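A machine-side version of that check can be sketched using the HTTP Last-Modified header. The one-year threshold is an arbitrary assumption, and the heuristic is weak in practice: many servers omit the header or report the time a page was regenerated rather than when its content changed.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def looks_stale(last_modified: str, now: datetime,
                max_age: timedelta = timedelta(days=365)) -> bool:
    """Heuristic: flag a page as possibly out of date when the HTTP
    Last-Modified header is older than `max_age` (an arbitrary threshold)."""
    modified = parsedate_to_datetime(last_modified)  # HTTP dates are in GMT
    return now - modified > max_age


if __name__ == "__main__":
    conference_week = datetime(2005, 5, 13, tzinfo=timezone.utc)
    print(looks_stale("Mon, 03 Jun 2002 12:00:00 GMT", conference_week))
    print(looks_stale("Tue, 10 May 2005 09:00:00 GMT", conference_week))
```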

Carole Goble
Life sciences: they encode genome information and need to be able to query it. Every resource is on the Web. (They keep old (decaying) web pages, since it is useful to know what they used to know.) They also use raw data as part of the documents. In the sciences we are about to collect more information in the next 5 years than has been collected in total up until now. Information is recorded so you can see where a biologist has been and what they have done. Biologists used to use notebooks and kept stuff in archives. They have people who check whether everything is OK; one database has 70 curators keeping it up to date. (When the Edinburgh AI department burned down, the electronic lists were in Google and in the Internet Archive, but they lost all the paper material.) As they run queries they build up a web of the knowledge they have discovered. They need to provide this as evidence later, so they need to record it: an RDF description of their information track. But how do I know whether the linked-to thing has changed? The data is guaranteed to stay the same; the metadata describing it may change. So build webs of relationships between Life Science Identifiers (LSIDs). An LSID can be used to ask: where do you come from, what is your history? And what is the relationship between LSIDs? Get event buses between a whole bunch of different information sources (RSS feeds, data flow resources, etc.). This is all being built in the Grid community. The Sem Web offers techniques for representing the past; the Grid is developing the middleware to manage all this.
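The question of whether a linked-to item has changed can also be checked from the consumer's side. A minimal sketch, assuming the consumer keeps a SHA-256 digest of the exact data it used alongside its provenance record. This technique is chosen here for illustration and is not part of the LSID mechanism described in the talk; the record format and the LSID below are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """SHA-256 digest of the exact bytes retrieved for an identifier."""
    return hashlib.sha256(data).hexdigest()


def record_use(identifier: str, data: bytes) -> dict:
    """Provenance entry to keep alongside a query result (hypothetical format)."""
    return {"identifier": identifier, "sha256": fingerprint(data)}


def unchanged(entry: dict, data_now: bytes) -> bool:
    """True if the bytes fetched now are identical to those recorded earlier."""
    return fingerprint(data_now) == entry["sha256"]


if __name__ == "__main__":
    # Hypothetical LSID and sequence data, for illustration only.
    entry = record_use("urn:lsid:example.org:sequences:42", b">seq42\nACGTACGT\n")
    print(unchanged(entry, b">seq42\nACGTACGT\n"))
    print(unchanged(entry, b">seq42\nACGTACGA\n"))
```

Only the data would be fingerprinted this way; the metadata is expected to change, so a digest of it would flag changes that are not errors.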

Calton Pu
Georgia-Navigator.com shows realtime traffic. Lots of different services: flight paths, river salinity, real-time supply chains.

W3C track: Interaction and the Web: The Future Browser

Bert Bos, The device-independent browser: CSS and grid layout
Define a grid in CSS3 and put elements into it. You then get alignment as in tables and order independence as in positioning. You can also make grids where elements overlap other elements (see slide 15).
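The grid idea can be illustrated with a small sketch. The letter-template notation below is loosely modelled on template-style grid proposals for CSS3, but it is an assumption for illustration, not CSS syntax: a template names the grid cells, each element is assigned to a named slot, and its position follows from the slot alone, independent of source order. Overlapping slots are not modelled.

```python
def slot_areas(template: list[str]) -> dict[str, tuple[int, int, int, int]]:
    """For a template such as ["aab", "ccb"], return each slot's
    (first_row, first_col, row_span, col_span). '.' marks an empty cell.
    Assumes every slot forms a rectangle."""
    bounds: dict[str, tuple[int, int, int, int]] = {}
    for r, row in enumerate(template):
        for c, name in enumerate(row):
            if name == ".":
                continue
            r0, c0, r1, c1 = bounds.get(name, (r, c, r, c))
            bounds[name] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    return {name: (r0, c0, r1 - r0 + 1, c1 - c0 + 1)
            for name, (r0, c0, r1, c1) in bounds.items()}


if __name__ == "__main__":
    template = ["hhh",
                "nmm",
                "fff"]
    # Source order differs from visual order: placement comes only from the slot.
    source_order = [("footer", "f"), ("main", "m"), ("nav", "n"), ("header", "h")]
    areas = slot_areas(template)
    for element, slot in source_order:
        print(element, areas[slot])
```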

Bert also mentioned using screen size as a measurement unit, letting font size and placement be expressed in terms of the dimensions of the current system's screen. This is very helpful for full-screen displays such as slide shows.

Mark Birbeck (presented by Steven Pemberton), The Semantic Browser: Improving the User Experience
How can you relate XHTML to the Semantic Web? By putting RDF into XHTML. RDF/A was the original paper. Blank nodes (bnodes) still have to be finalised. (What was name in XHTML is now property.)
Q Lloyd: Since this is obviously RDF, why not just use RDF? A Steven: The syntax is useful for validating the XHTML. And because it looks like XHTML, it is good for XHTML creators, yet it gives you the power of RDF.
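A minimal sketch of reading RDF out of XHTML attributes. It assumes the about/property/content attribute names from the RDF/A proposal (the syntax was still being finalised, so treat these as assumptions) and leaves prefixed property names such as dc:title unexpanded. A real processor would also handle namespace prefixes, rel/rev links and blank nodes.

```python
import xml.etree.ElementTree as ET


def extract_triples(xhtml: str, document_uri: str = "") -> list[tuple[str, str, str]]:
    """Collect (subject, property, literal) triples from elements that carry
    a `property` attribute. The subject is the nearest enclosing `about`
    value (default: the document itself); the object is the `content`
    attribute if present, otherwise the element's text."""
    triples = []

    def walk(elem: ET.Element, subject: str) -> None:
        subject = elem.get("about", subject)
        prop = elem.get("property")
        if prop is not None:
            value = elem.get("content")
            if value is None:
                value = "".join(elem.itertext()).strip()
            triples.append((subject, prop, value))
        for child in elem:
            walk(child, subject)

    walk(ET.fromstring(xhtml), document_uri)
    return triples


if __name__ == "__main__":
    page = """<div xmlns="http://www.w3.org/1999/xhtml" about="http://example.org/talks/semantic-browser">
  <h2 property="dc:title">The Semantic Browser</h2>
  <p>By <span property="dc:creator">Mark Birbeck</span></p>
</div>"""
    for triple in extract_triples(page):
        print(triple)
```

The markup stays ordinary, validatable XHTML, while the attributes carry the triples: the trade-off Steven described.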

Dean Jackson, Welcome back browser
He had a cool demo of a number of web sites being displayed within the one page. (Not using frames but CSS, I think.)
Client side. SVG Tiny is on 50,000,000 phones!

Questions
Q Lloyd: Are you sure that RDF/A will remain compatible with RDF? A Steven: That is one of the design requirements.

WWW 2005, Chiba, Japan, 10-14 May 2005