16th International World Wide Web Conference
(WWW 2007)
Authors: Raphael, Zeljko, Lynda
CWI participants: Raphael, Zeljko, Lynda, Steven, Ivan
# participants: around 1000
A wonderful venue for this year's Web conference. A nice (recorded) video showed the history of the Web conferences. A new record in the number of papers submitted!
Some notes about the W3C Advisory Committee Meeting:
http://www.w3.org/2005/Incubator/mmsem/talks/AC2007/
Raphael's slides
This is what the audience got out of my talk (from the IRC log)!
MMSEM XG had 37 participants from 15 organizations ... many organizations work on multimedia metadata standards ... we wanted to show the benefits of combining all these metadata formats into RDF ... and provide best practices and use cases for using multimedia on the web ... example: EXIF + low-level feature extraction using MPEG7, stored into flickr, combined with metadata from other standards ... another use case on music: combining MP3 metadata with FOAF ... knowing where your favorite artists are giving concerts ... see panel in W3C track this Thursday ... and Photo Metadata Conference next month ... making a call for a follow-up XG
I had interesting chats with the AC reps of Adobe and Apple about a possible involvement in a future follow-up XG activity. I still need to contact IBM (John Smith), Yahoo! (Mor) and Boeing (Mike Uschold) as well.
Interesting slides; Tim made a parallel between Web 2.0 and the open source paradigm shift. People don't realize that Google is one of the biggest Linux apps!
Interesting slide on "Understanding open source": modular architecture, internet-enabled collaboration, users as co-contributors, viral distribution and marketing.
Key sentence: Data is the next "Intel Inside". Web 2.0 sites put friendly front-ends on existing databases and open them up to user contributions.
One way of looking at the value of the Web: when you put something on the Web, you get unexpected and serendipitous reuse. With the Web, basic reuse happens by following links. It's a myth to say that we don't need links... Google is using the links.
Web 2.0, compared to the Semantic Web, gives you _limited_ reuse. As TimO points out, the current incentive is for sites to jealously guard data. But that limits reuse. Users will demand their data back, and we'll start to see more reuse outside of single sites. When you have more linking among user data (e.g., same person, same protein, same place, etc.), you'll get reuse across organization and application boundaries. I want to view all the photos of me, no matter who uploaded them and where. Or I want to reuse information about a protein no matter where the data is. We are currently only getting a fraction of the potential reuse of our data.
Two new groups: the WebAPI and Web Application Formats (WAF) WGs. WAF WG: the focus is new syntax and new languages. WebAPI WG: Web application programming interfaces - the DOM and other interfaces.
On our internal systems and how they are Web 2.0-like. The vast majority of our systems are geared towards being collaborative systems for a broad number of users. Some of this data is already available in RDF, and through UIs. The challenge we will face is to make more of it available, and to associate people with the data they seek. We need to check whether this is in accordance with our privacy policies. Tim's Tabulator is a playpen for browsing data. We are using our own technologies, but also external (non-W3C) systems, especially for inspiration.
Kingsley: Perhaps the Semantic Web is web^3. I think open data is a feasible business model. TimO: "licensed access" rather than "open access".
Paul Downey: Some people think about Web 2.0 as "rich user experience"; TimO indicates that it's really a business model. Privacy today is like sex + drugs in the 1960s. W3C could help with identity around open data. I might want to mash up my photos with my bank account information, but I don't want to make the bank info public. I think the next stage is to mix data that is public with data that I don't want to be public.
TBL: The value of the Web is to make life smoother for people (e.g., finding flights, etc.). A world where people have access to all the data they logically and legally can access is a more powerful one for the user.
TimO: Look at how the credit card emerged. The credit card is an aggregator (of access to banks). Banks were not made more interoperable; credit card companies are aggregators. There is an opportunity for aggregation where there is no interop. Those become the new centralization sites.
TBL: Yes, they'll be able to do this for bank data, but not for all data. And that bank aggregator will be an element within a much larger Web of data. At the end of the day, we are still going to end up with an economy and then an ecology.
TG: We need to find ways to delegate authority to other services in the face of aggregation. For example, W3C wants to make a lot of member-only data available to members in RDF, and wants to ensure that it remains confidential when used by an aggregation service.
The web2.0 maturity model? http://files.skyscrapr.net/users/jevdemon/YetAnotherAcronym_A1D2/image0_thumb8.png
General impressions:
W4A tries to bring together a very diverse set of researchers. It tries to provide a unified view on many diverse areas, and to stimulate discussion among different communities. Although identified as important by many, Web accessibility and general accessibility are not very popular topics. Work in this area is distributed and carried out by small teams. The discussion about, and influence of, accessibility research is still at an early stage.
A particularly important aspect of the conference is that disabled users are directly involved and participate in the conference. For example, two out of three reviewers for the speech-browser challenge competition are blind users.
The most useful elements of the conference for me were the contact with the main researchers in Web accessibility and the contact with disabled users. I definitely learned new things, especially about the motivation for my work. I also had lots of discussions with people from the W3C Web Accessibility Initiative (WAI). Some new ideas I got include using multimedia metadata to improve the accessibility of multimedia data on the Web (for example, reusing K-Space annotations for this). This will also be an issue in the next Multimedia Semantics XG.
KEYNOTE: Enabling an Accessible Web 2.0
Becky Gibson
KEYNOTE: Web 2.0: Hype or Happiness?
Mary Zajicek
Communications Paper: Position Paper: Accessible Image File Formats - The Need and the Way
Sandeep Patil
Communications Paper: The National Accessibility Portal: An Accessible Information Sharing Portal for the South African Disability Sector
Louis Coetzee et al
Communications Paper: A Preliminary Usability Evaluation of Strategies for Seeking Online Information with Elderly People
Sergio Sayago
Keynote: Accessibility of Emerging Rich Web Technologies: Web 2.0 and the Semantic Web
Michael Cooper
Technical Paper: Quantitative Metrics for Measuring Web Accessibility
Markel Vigo et al
Keynote: Semantic Web: The Story So Far
Ian Horrocks
Technical Paper: Experimental Evaluation of Usability and Accessibility of Heading Elements
Takayuki Watanabe
Slides: http://www.w3.org/2007/Talks/0509-www-keynote-tbl/
The keynote has been recorded, we have the video!
My opinion: the first part was interesting and I think that the circles for Web science will be reused a lot. The second part, about the shape of the data and the challenges, consisted of general statements and known things, not really interesting. Overall, a good TBL talk because of the first part (I have seen worse from him :-)
Poll: who has attended 1 WWW conference? 2? 3? 4? etc. 15? 16? (only him!)
Analogy with Physics: from the micro phenomenon to the macro phenomenon. This is the whole story of the Web. Nice diagram showing how the sum of micro effects makes an emergent phenomenon (6*10^9 users, 10*10^9 web pages). The macro phenomenon can then be analyzed, which raises issues. Is this what we want? This is the magic of Web science!
Between the lines: I think TBL tried to distance himself from OWL, rules, and all these inference mechanisms. For him, the key point is URIs + RDF + HTTP. URIs need to be dereferenceable and use the HTTP protocol. The key point is to have linked data ... and the Tabulator! I see this as another shift away from the academic view of the KR-SW world. The key question is then why people should expose their data (the protein example): what is the added value for them? This is the linking incentive!
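To make the "URIs + RDF + HTTP" point concrete, here is a minimal sketch of dereferencing a URI with content negotiation to get RDF back. This is my own illustration, not something from the talk; the resource URI is hypothetical.

```python
# Minimal sketch (not from the talk): dereference a URI over HTTP and
# ask for an RDF representation via content negotiation.
import urllib.request

uri = "http://example.org/people/alice"  # hypothetical resource URI

req = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
with urllib.request.urlopen(req) as resp:
    print(resp.headers.get("Content-Type"))  # ideally application/rdf+xml
    print(resp.read()[:300])                 # start of the RDF description
```

A client like the Tabulator then follows the links it finds in that RDF to pull in more data.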
Shapes of data: lines (tape, cards); matrix (databases); trees (SGML, XML,
OO); Net (internet?, WWW?)
What shape IS the WWW? The shape is, and should be, a fractal tangle! We should engineer for that.
A Web Science attitude is necessary: problems combine technical and social aspects, and problems are a function of the very large scale (Web Science is multidisciplinary!)
Question and Answers:
Steven Pemberton: in the circles, every time you end up with spam ... except for the SW. Can we block it now so that we don't get SW spam? TBL: good question! Studies on the provenance of triples are fundamental. Trust will also be an important issue. The web is passive; you don't get web spam. You can have scrappy web sites, but that is not the same thing. ??: About your view on the shape of the web. TBL: the collective consciousness of the web. There is a common culture here. The working groups are very important for making new technical specifications.
Raphael: I find that the defenders of Web science have very weak arguments to justify that it is not just about having a slogan (is the Web not hype enough?) for getting more money, or, as Peter said: an initiative → an institute → an empire!
Zeljko: Conclusion = NO CONCLUSIONS. Lots of philosophical discussion, little compromise in views.
Main questions: What is Web Science? Is it a new discipline or a new name for an old discipline? Is it a genuine academic discipline at all? What is a Web Science methodology? What is the core knowledge set that Web Science practitioners share? What does a Web Science paper look like?
Some interesting comments:
He is Head of Yahoo! Research and a professor at Stanford.
Content: editorial, free, commercial
Audience: consume, enrich, transact
In the middle: AOL, Google, IAC, MSN, NewsCorp, Yahoo! → they make business by putting that together.
Search on the web: algorithmic results = audience (left side) and advertisements = monetization (right side).
Search and content supply: people don't want to search, they want to get tasks done (e.g., I want to book a vacation in Tuscany).
Information integration: information extraction and schema normalization. Semantic structure is not easy!
How do we circumvent this? Provide incentives.
Statistics about the growth of content on the web. User-generated metadata
(tags=100Mb/day; reviews=around 5Mb/day; ratings=small)
START metadata: Stars, Tags (label for retrieval or sharing), Access, Routing
(community), Text
Example: Flickr. No image analysis, but use of the community phenomenon. Why do millions of users share and tag each other's photographs?
Challenges: How do we use these tags better? How do we cope with spam? What's the rating and reputation system? What are the incentive mechanisms? (the ESP Game! or Yahoo! Answers)
What assignment of incentives leads to good user behavior? What is "good" user behavior? Good questions, good answers, new questions ...? Whom do you trust and why?
Grand challenges: How do we retain and enrich participation? (online media
experiences)
"I'm looking for a science that will retain and enrich participation"!
Flashback: HCI and CHI → the science of online audience engagement, not
just about people interacting with computers or the web, but about people
interacting with other people with the web as a medium.
Audience engagement in Second Life is much bigger. Sweden to set up
embassy in Second Life!
What does it mean to have an engaged audience? Who cares? (advertisers, plus
media and users)
New audience metrics? → funny formula but not totally implausible!
Grand challenge (again!): devise and standardize defensible metrics of online engagement and use these to predictively devise online experiences (not a substitute for creativity).
Microeconomics meets CS? The talk covered how to match ads to the query and its context (IR), and how to order the ads and price a click-through (economics).
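The talk did not spell out a mechanism, but as a rough sketch of "ordering the ads and pricing a click-through", here is a generalized second-price style ranking by bid times estimated click-through rate; this is one common approach, and all bids, CTRs and advertiser names below are invented.

```python
# Illustrative only: a generalized-second-price-style auction.
# Ads are ranked by bid * estimated CTR; each winner pays the minimum
# price per click needed to keep its slot. All numbers are made up.
ads = [
    {"advertiser": "A", "bid": 1.20, "ctr": 0.04},
    {"advertiser": "B", "bid": 0.90, "ctr": 0.07},
    {"advertiser": "C", "bid": 2.00, "ctr": 0.01},
]

ranked = sorted(ads, key=lambda ad: ad["bid"] * ad["ctr"], reverse=True)

for i, ad in enumerate(ranked):
    if i + 1 < len(ranked):
        below = ranked[i + 1]
        # Smallest bid that would still out-rank the ad in the slot below.
        price = below["bid"] * below["ctr"] / ad["ctr"]
    else:
        price = 0.0  # no competitor below; a real system would use a reserve price
    print(f"slot {i + 1}: {ad['advertiser']} pays {price:.2f} per click")
```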
A new convergence? Computing meets humanities like never before
(sociology, economics, anthropology, ...).
Conclusion: Vannevar Bush (As We May Think) ... quoted sentence.
Where are the SW data and documents? GRDDL is markup for declaring that an XML document contains semantic metadata. Microformats (hCal, XFN, hCard). Microformats have limits: they can't be validated, there is no standard way to get the data out of the HTML, and they are too domain-specific. GRDDL makes the microformat data viewable as SW data.
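As a rough sketch of what a GRDDL-aware agent does (my own illustration, not from the talk): fetch the page, fetch the XSLT transformation the page declares, and apply it to obtain RDF. Both URLs below are placeholders, and the sketch assumes the lxml library.

```python
# Sketch of the GRDDL idea, not the W3C reference implementation:
# apply the XSLT transformation declared by an XHTML page to extract
# RDF from its microformat markup. Both URLs are placeholders.
from urllib.request import urlopen
from lxml import etree  # assumes lxml is installed

page_url = "http://example.org/contacts.html"       # page with hCard markup
transform_url = "http://example.org/hcard2rdf.xsl"  # XSLT linked via GRDDL

page = etree.parse(urlopen(page_url), etree.HTMLParser())
to_rdf = etree.XSLT(etree.parse(urlopen(transform_url)))

rdf = to_rdf(page)  # result is an RDF/XML document
print(etree.tostring(rdf, pretty_print=True).decode())
```

In real GRDDL the transformation URL is discovered from the document itself (its profile and link elements); it is hard-coded here for brevity.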
Four documents produced: Spec, Test Cases, Primer and Use Cases.
Showed a lot of use cases that benefit from using GRDDL to extract the RDF data from the microformats.
Main message: don't just take the benefits of the Web; we also have to take responsibility.
Christian Halaschek-Wiener: Toward Expressive Syndication on the Web
Web Ontology Language (OWL) for syndication: the motivating example is defining what a Risky Company is in the financial domain, so that, given news information about the products of a company, a reasoner can predict the risk of this company and anticipate the auctions.
OWL-based syndication framework: OWL reasoning is hard and static (the consistency of the entire KB needs to be rechecked), so how do we make this practical?
Recent work on incremental consistency checking under instance updates.
A simpler problem for query answering: reduce the portion of the KB that must be considered for a query given an update.
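To make the "Risky Company" motivation concrete, here is a minimal sketch of defining such a class in OWL and letting a DL reasoner classify instances as news arrives. This is my own illustration using the owlready2 library, with all class, property and individual names invented; it is not the authors' actual framework.

```python
# Illustrative sketch only: a "risky company" as an OWL defined class,
# classified by a DL reasoner when new instance data (news) arrives.
# Uses owlready2 (which bundles the HermiT reasoner); names are invented.
from owlready2 import Thing, ObjectProperty, get_ontology, sync_reasoner

onto = get_ontology("http://example.org/syndication.owl")

with onto:
    class Company(Thing): pass
    class Product(Thing): pass
    class produces(ObjectProperty):
        domain = [Company]
        range = [Product]
    class RecalledProduct(Product): pass
    # A risky company is any company producing at least one recalled product.
    class RiskyCompany(Company):
        equivalent_to = [Company & produces.some(RecalledProduct)]

# A syndicated news item arrives: ACME's widget has been recalled.
acme = Company("ACME")
widget = RecalledProduct("widget_x")
acme.produces = [widget]

sync_reasoner()                   # re-classifies the whole knowledge base
print(RiskyCompany in acme.is_a)  # ACME is now inferred to be risky
```

The sync_reasoner() call illustrates exactly the problem the talk addresses: it re-checks the whole KB on every update, which is what the incremental consistency-checking work tries to avoid.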
David Huynh: Exhibit: Light-weight Structured Data Publishing
OK, it is a cool presentation from the PiggyBank MIT guy. You just add more attributes to your HTML, and thanks to some cool JSON/AJAX stuff, you display cool things on your web page (calendar, timeline, maps, faceted browser, etc.) → Exhibit: a new microformat? It claims to address the publishing needs and desires of people who want to publish structured Semantic Web data with sophisticated interfaces ... BUT there is no evaluation whatsoever that 1/ there are user needs for such functionality and 2/ Exhibit actually addresses these needs.
Some posters I have found interesting: