15th International World Wide Web Conference (WWW 2006)
Author: Lynda, Raphael
CWI participants: Lynda, Jacco, Raphael, Steven, Ivan
# participants: 1200
Raphael found the conference very well organized despite the huge number of participants (1200?). Definitely the place to be! As usual, there were many parallel events, from which one can pick and choose. The interesting things happen during the breaks, at dinner, and while wandering around to talk with people.
I think it was a very successful workshop. The talks were generally good. The poster and demo session,
completely improvised, was very nice. New potential members for the XG: CNR in Pisa (Italy), Southampton,
Oracle (confirmed), Yahoo?, etc. The co-chairs have a lot of work to do to formally invite all these people :-)
Proceedings and Slides of the workshop will be posted on:
http://multimedia.semanticweb.org/
A nice joint dinner afterwards with the Collaborative Web Tagging workshop. The best papers of the
two workshops should appear in the International Journal of Web Semantics.
Nigel Shadbolt (deputy president BCS), Jim Hendler, Tim B-L, Clare Hart, Richard Benjamins
Nigel: 5 years since Sci Am Sem Web article. Tim - web invented 1989. Richard - National award 2005 CS Spain, prof Tech Uni Madrid.
What have been real achievements toward Sem Web?
TBL: layer cake: RDF, ontology, query and rules. We have been making our way through this. RDF OK, OWL OK. SPARQL is a candidate recommendation - implementation feedback welcome. Starting to work on rules. SPARQL makes a huge difference.
Jim: 10 year vision. Biggest surprise is that the pieces fell together sooner than expected. Underestimated deployment scale. DAML workshop - think of 10^6 as small. Think of 10^9 triples as routine. Graphs with URIs, organised vocabularies for science.
Richard - SW is an enabling technology, not a business. Big step in adopting technology: risks? Costs? Benefits? People feel that the risk is limited now, with OWL.
Clare - Sem Web can help in analysing. Untapped resources - structured and unstructured information. Build apps on top of them?
Nigel: Ontologies and taxonomies are necessary - are they usable?
Clare: useful w.r.t. unstructured content. IBM is found millions of times in news articles, but not all articles are about IBM.
TBL - Database columns don't change very often - that there are people in working groups. But people are in the working groups. Their details can be included using FOAF. Taxonomies are one side. Infrastructure ontologies, no excuse not to do them right away. The value is in the data. Other taxonomies have value in themselves.
Jim: Getting started in an important area. "Messy world", "cleaning up Tim's messy world". Subway map, different maps of SW info. Ontology helps humans to find things. In the Web there wasn't a single browser, didn't have bookmarks. Then there was a search engine - and could find stuff. Fewer words in human language than pages on the Web. Search databases age person between 10-20. Ability to locally organise to find way down into data.
Find way from this rich space, through data to this rich space and find things I didn't know.
Richard - Industry sees ontologies as a necessary evil to enjoy the good things of the Sem Web. Ontologies are do-able, manageable and affordable. What are the cost factors for creating ontologies? Complexity of ontology? Access to experts?
Nigel - Web 2.0. How does it relate to SW?
Tim - Ajax and tagging, and people tag with Ajax. Bunch of standards implemented in browsers. Using HTML and SVG model as UI. Power of HTML formatting engine rather than pixels and stuff. Tim writes Ajax using RDF. Access web of RDF data to user. [[US! :-) AJAX -> AJAR (RDF instead of XML) ]] No-one expected keyword search to work. Keyword index everything. Mis-used keywords, not controlled vocabulary. Everyone excited about tagging - will it end up like the old keywords? Tags don't have well-defined meaning that bit by bit get well-defined meaning. Maybe Web 2.0 becomes a big mess of rather unreliable stuff.
Clare - Web 2.0 designed around the user. Easy access to info. Applications smart enough to understand the user. Business can't take the concept of folksonomy and use it. Web 2.0 encourages the notion of individual tagging, simplicity. Technology underlying driver - consistency of codes - mine is different from yours, but could be mapped together.
Jim - fan of Web 2.0 tagging stuff. Other direction really amazing. Take CYC tag. Go to flickr (Yahoo), type the word cat and at the top is a URI. So here are the pictures of this concept. He can put in a concept and have people tagging pictures of it. Netting/web part/linking really starts to make sense in stuff. Links don't go just one way.
Nigel - achieve critical mass of sem annotated content?
Tim - Sem ann waiting for everyone to go and edit their files in notepad and put in the tags. So sem annotation is effective for whole database.
Nigel - exposing data with well-defined semantics. [did I get the mesg here?]
Tim - file format that already exists.
This column corresponds to this part of existing public/friend's ontology. Browse sem web then get to dead end - email address, social security number. For a little bit of input - FOAF mailbox. Now when look at display lots more properties have been added, everything begins to connect up.
Richard - Lot of investment in past. NL technology to auto find what piece of content belongs to particular class. Trade-off. Millions of concepts - hard to classify. 100 or fewer, then software does a good job. SW to knowledge mgt. Libraries or cultural heritage. Culture of annotation. Take advantage of that.
Clare - strategic view of info, what relationships are, people tagging but technology has to be driven of annotation. Too many categories is the same as searching full text. How to organise concepts and relink them.
Jim - google "person ext:owl": 500 different definitions of person. Heuristic. Text similarity: 800 definitions. Some pointing at each other. What is the right search engine in ontology space? How do I provide one link to start up connections?
TBL - huge ontology: too difficult to classify things. Small: not enough richness.
Clare - million company codes. Subjects 500. Regions 200. Languages 50. Ontologies, more sophisticated: company, partners, suppliers, Exxon Mobil. Multiple connections to it. Wrapper ontology relation.
Nigel - Trust content? Conclusions arrived at?
TBL - Every agent is aware of where the data is from. Basic RDF graph. But when you have an RDF graph and present results, then why do I think I believe this? Triples have a fourth part (where it came from). "Oh yeah?" Why do you know this? RDF stores can find whether it was derived or from a particular source. Not just inference rules about data, but where they came from.
Nigel - Trust lives in mechanised rule-based systems. Email. Did the email come from Tim? And should I do what it says?
Alan Bundy - What are killer applications?
Jim - Killer application of Sem Web. Web is killer app of internet.
Health care and life science. DB community coming to work very closely with sem comm on web stuff using SW things. Lilly, a large pharmaceutical company. Oracle RDF support for mgt review in clinical trial databases.
Clare - enterprise, individuals.
TBL - Advantage of sem web data is all the other data on the sem web. Take database maintainer to lunch, ask what the columns mean, and write down a light-weight ontology. Allow SPARQL query across all data.
Clare - analyse how much time people spend on searching for information. People should be analysing the information. The search time is currently increasing. Sem Web should reduce the search time.
Q: Search engines use sem web?
TBL - Search engines make order out of chaos. If you give them order then they can't make any money.
Clare - Exxon Mobil, BP. Can't find the information you really want.
Richard - web search and search within companies. Semantic search engines for islands of the web.
TBL - Search engines have oo in their names - also ontology? :-)
Jim - You ain't seen nothing yet.
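Tim's point in the panel notes above, that a triple gets a fourth part saying where it came from so that an agent can answer "oh yeah?", can be illustrated with a small sketch. This is only an illustration in Python: the QuadStore class, the oh_yeah method and the example URIs are hypothetical names, not anything shown at the panel.

```python
from collections import defaultdict


class QuadStore:
    """Toy store: each (subject, predicate, object) remembers its sources."""

    def __init__(self):
        self._sources = defaultdict(set)

    def add(self, s, p, o, source):
        # The fourth element records where the statement came from,
        # e.g. a document URI or the name of the rule that derived it.
        self._sources[(s, p, o)].add(source)

    def oh_yeah(self, s, p, o):
        # "Why do you know this?": list every recorded source, sorted.
        return sorted(self._sources.get((s, p, o), set()))


store = QuadStore()
# Hypothetical data: one asserted source and one derived-by-rule source.
store.add("ex:Tim", "foaf:mbox", "mailto:tim@example.org", "http://example.org/foaf.rdf")
store.add("ex:Tim", "foaf:mbox", "mailto:tim@example.org", "derived:rule-1")
```

A statement with no recorded source simply returns an empty list, which is the case where an agent has no reason to believe it.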
Peter laid out the pros and cons of the two paradigms (Datalog vs classical logic) for the Semantic Web.
Domain of age is person. => integrity constraints vs restrictions
John and Jack are friends of Bill => How do you know names are distinct?
Incomplete vs complete info.
How do paradigms fit sem web? Classical paradigm vs datalog paradigm.
Always narrow down interpretations. "John is friend of Bill" and "John is not a friend of Bill" throws away everything. Checking all these acceptable interpretations.
Datalog. Extensions of notions from databases. Different names, different objects; all objects named. Info not in situation is false. Can be generated from facts and rules.
Datalog:
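The contrast in these notes (unstated information is false and different names are different objects on the Datalog side, versus incomplete information on the classical side) can be sketched roughly as follows. This is a toy illustration in Python, not the speaker's formalism; all function names are made up for the example.

```python
# Facts from the notes: John and Jack are friends of Bill.
FACTS = {("friend", "John", "Bill"), ("friend", "Jack", "Bill")}


def datalog_holds(fact):
    # Closed world: information not in the fact base is false.
    return fact in FACTS


def classical_holds(fact):
    # Open world: an unstated fact is unknown (None), not false.
    return True if fact in FACTS else None


def datalog_friend_count(person):
    # Unique names: different names denote different objects,
    # and all objects are named, so the count is exact.
    return len({a for (pred, a, b) in FACTS if pred == "friend" and b == person})


def classical_friend_count(person):
    # Without unique names John and Jack may be the same individual,
    # and unnamed friends may exist: only a lower bound is known.
    named = {a for (pred, a, b) in FACTS if pred == "friend" and b == person}
    lower = 1 if named else 0
    return (lower, None)  # (lower bound, unknown upper bound)
```

Under the Datalog reading Bill has exactly two friends; under the classical reading all that follows is that he has at least one.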
When an ontology gets really big (like the medical ontologies), it is no longer possible to use the classical OWL reasoners we all know. General problem: how to deal with big ontologies/thesauri? Thus, really relevant for MultimediaN. The solution proposed in this paper is to partition the ontology. How? I did not understand the whole process, but it seems that the way they proceed does not guarantee any quality. They reduce each piece until it can be computed, but you have no clue about what you are losing in terms of inferences during the approximation phase.
As technologies get bigger, get more difficult to use. SNOMED-CT: 364,000 concepts.
Partitioning: start at one concept in ontology and extract out of main ontology. Ontology in own right. Goes down concept and selects all children and descendants, and all ancestor concepts. Also all cross-links.
GALEN complicated and large. Heart segment -> 6000 from 24000. Then add boundary limit.
Modified definitions -> unexpected inferences. Classification time.
Segmentation by traversal can be done semi-automatically.
[Implications for mapping ontologies?]
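The traversal described in these notes (start at one concept, take all descendants and all ancestors, then follow cross-links up to a boundary limit) can be sketched as below. This is a simplified reading of the notes in Python, not the paper's algorithm: the function name, the dict-based ontology and the toy concepts are assumptions, and cross-linked concepts are added without their own ancestors.

```python
from collections import deque


def segment(start, parents, links, boundary=1):
    """Extract a segment around `start`.

    parents: concept -> list of direct superconcepts (is-a).
    links:   concept -> list of cross-linked concepts (non-hierarchical).
    boundary: how many cross-link hops to follow (the "boundary limit").
    """
    # Invert the is-a relation so we can walk downwards as well.
    children = {}
    for child, supers in parents.items():
        for sup in supers:
            children.setdefault(sup, set()).add(child)

    def closure(seed, edges):
        # Breadth-first transitive closure over one edge relation.
        seen, queue = {seed}, deque([seed])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    # All descendants plus all ancestors of the start concept.
    selected = closure(start, children) | closure(start, parents)

    # Follow cross-links outwards, at most `boundary` hops.
    frontier = set(selected)
    for _ in range(boundary):
        frontier = {t for n in frontier for t in links.get(n, ())} - selected
        selected |= frontier
    return selected


# Hypothetical toy ontology, not taken from GALEN or SNOMED-CT.
PARENTS = {
    "Heart": ["Organ"], "Ventricle": ["Heart"], "Organ": ["BodyPart"],
    "Lung": ["Organ"], "Blood": ["Fluid"], "Plasma": ["Blood"],
}
LINKS = {"Ventricle": ["Blood"], "Blood": ["Plasma"]}
```

With boundary=1 the heart segment pulls in Blood through the Ventricle cross-link but stops before Plasma; raising the limit grows the segment, which is the trade-off the boundary limit controls.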
A new algorithm (V-Doc) developed in Falcon, one of the best-performing ontology alignment tools. Nothing really new, I wonder why this paper was accepted. Actually, after checking, the results they showed are worse than the ones they obtained last year on the same data ... See http://xobjects.seu.edu.cn/project/falcon/.
See the short summary of each paper on the Developers Track page
Omar Alonso, Oracle, USA. He isn't really interested in the user interface side, but had some different viewing methods (of DBLP - interesting in and of itself). (A potential participant for the SWUIG workshop at ISWC.)
Lynda: Very close to our work on browsing RDF repositories - a sort of Noadster-/facet system. (Definite potential participant for SWUIG workshop.)
Ontology Construction: how to build new ontologies, not starting from scratch, but reusing some existing pieces. Current tools (Protégé, Swoop, KAON) are not made for reusing existing ontologies. But more ontologies (libraries) are coming online.
Example: let's assume that one needs to represent the concept "Conference". Swoogle returns 115 ontologies with this concept! How to rank the different ontologies? With what criteria? The system needs to compare and merge different ontologies. Proposal of a whole process consisting of: search for relevant ontologies, rank the returned list of ontologies, compare them, segment them, merge them, show the result to the user, and loop if necessary. The user relevance feedback should not be ignored!
Challenge: make a full chain of these processes. Danger of producing a Frankensteined ontology.
More ontologies are coming online, and many people sweated over those ontologies. Time to start
planning for proper reuse!
Questions: as usual, people complain that ontology editors are made by ontology-aware computer scientists and are not user friendly (which is true!). Should the ontologies be expressed in the same KR language? Of course an issue ... Last good question: an ontology is built in a given context, and reusing pieces of an ontology means putting these pieces into a different context, but the modeling of this context is not present in the picture proposed! The speaker agrees ...
Gloss: an auxiliary informal (but controlled) account, for the commonsense perception of humans, of the intended meaning of a linguistic term. Proposal to semi-formalize the way the gloss should be written in order to make some inferences. Looks really like a formalization of the differential principles in DOE.
A wide variety of schemas are exposed on the web. They convey a clear meaning to humans but not to machines, since it is not formally and explicitly represented. The paper proposes to automatically make explicit the intended semantics of the various schemas, mainly by guessing the semantics of the links between the concepts. The approach uses mainly WordNet and begins with a traditional schema matching task. In the end, this work aims to "lexicalize" an ontology, in order to close the gap between structural and intended meaning. The tool is CtxMatch 2.0.
Web search scenario: sparse user trust and distrust. Content trust is a judgment on the veracity of a specific piece of information. Many factors affect trust: topic, context, age, popularity, recommendation, expertise, appearance, provenance, direct experience, related resources, etc.
Challenge 1: acquiring and representing trust.
Challenge 2: using trust relationships.
Presentation of a model and simulations that take into account how a user can trust web content.
Using Wikipedia means reading articles. Inconsistencies between different language versions: the population
of a country differs depending on whether you look at the English, French or German version of Wikipedia :-)
Solution: marry Wikipedia and the Semantic Web.
Proposal: mark up the Wikipedia articles with extra information. The idea is to put a semantic label on the links between the Wikipedia pages. The wiki concepts are mapped to OWL: a Relation in Wikipedia is an ObjectProperty, an Attribute in Wikipedia is a DatatypeProperty and a Category in Wikipedia is an OWL Class. RDF export at the end.
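The mapping from wiki concepts to OWL described above can be sketched as a small export step. This is an illustrative Python sketch, not the Semantic MediaWiki code: the function name, the base URI and the example elements are hypothetical; only the OWL and RDF vocabulary URIs are the standard ones.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Wiki concept -> OWL construct, as listed in the notes.
WIKI_TO_OWL = {
    "relation": OWL + "ObjectProperty",
    "attribute": OWL + "DatatypeProperty",
    "category": OWL + "Class",
}


def export_declarations(base, elements):
    """Turn (kind, name) wiki elements into rdf:type triples."""
    triples = []
    for kind, name in elements:
        # Hypothetical URI scheme: base URI plus the page name,
        # with spaces replaced by underscores.
        uri = base + name.replace(" ", "_")
        triples.append((uri, RDF_TYPE, WIKI_TO_OWL[kind]))
    return triples


# Hypothetical wiki content used only for illustration.
EXAMPLE = export_declarations(
    "http://example.org/wiki/",
    [("relation", "capital of"), ("attribute", "population"), ("category", "Country")],
)
```

Each resulting triple could then be serialized in whatever RDF syntax the export uses.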
Benefits for Wikipedians: ask for information, maintain pages more easily, etc.
Already some users of the system:
Semantic Media Wiki
Future work: performance and scalability!
More expressivity: transitivity, inverse relationship, symmetric, etc.
Alexander Linden: Semantic technologies still have hugely underestimated potential. Metadata is expensive, can be about everything, must be machine readable. (I didn't find his talk enlightening. A long complaint as to why the Sem Web is so difficult to explain to industry. It didn't give insights to industrial people on how to improve things, nor to academics on how to steer their research. He did call for better dissemination of best practice, but made no mention of the Best Practices group.)
Ian Ritchie: Perspective from the investment community's point of view. 1984 OWL. Director Scottish Enterprise 199-2005. Cartoon of investors (shake a tree, get $4,000,000, go public, go find a tree). VC-backed businesses, 10% US economy. Why is VC interested in hi-tech? Innovation is difficult to manage, but easier in a small start-up team.
Crossing the Chasm. 1990 Geoffrey Moore. There is a break between the visionaries and pragmatists. Early adopter is full of enthusiasts. Pragmatists are commercially driven and need to know there is a market. Chasm needs fund raising. Angels, grants, loans, venture capital sources. Marketing and development is much more expensive. Angel investor UKL 10,000 - 100,000. Bring more than money to the table. Gregory's girl - he's a bit slow and awkward. That's like VCs. Angels are quite fast. VCs don't like high risk. They need return on investment. Of 10 investments: 3 will fail, 5 will do OK, 1 will do well and 1 will do really well.
Mountains of data "are making it so people can't act on it without being overwhelmed". Bill Gates May 17th 2006. Web 2.0 and Sem Web.
Need to get the business plan right. Business plans are the worst written documents on the planet. Potential of business. Builds confidence in team. Plan will not match performance. There is a great new growth market out there. Niche player can sell well. Has to be readable. 5 pages, colourful charts... Write it yourself. Don't get consultants to do it. Do financial projections yourself - you need to understand them. Do I want to be in trouble with these people in next 5, 10, 15 years? Honesty and integrity. Shorter plans are better. Say 10 pages. Exec summary 1 page. Whole story including money to be raised. Market opportunity. Boring stuff in appendices. Find a good angel. Worth their weight in gold.
Ralph Hodgson: Making the business case for SW technology. Europe is ahead of the US in terms of development of knowledge management technologies - fuelled by EU funding. "Instant messenger" is a concept we can now talk about. Establishing value propositions. Plans for new house - put sketches in front of people. Elevator pitch for Semantic Web. Top Quadrant, described in ontology. First product came from Strathclyde. FACT - deductive database. '85 funded. Fell foul in the market place - too far ahead of its time. They have trained 400 people in sem technology in USA. Now asking "how" questions, no longer "why". Ontoprise and larger companies working together.
Proofs of concept are key. Federation, integration and interoperability are the unique value propositions in Semantic Web. This makes a small proof of concept difficult to make. No ontology modelling design patterns. What is the business capability "gap"? Then explore potential solutions (not at technical level). Asked Ontoprise for capability cases. What would you do if you had an expert locator? Discuss the undiscussable. 5 adoption areas of Semantic Technology (slides on web?) (Talk got interesting at the end, but no time to go through the slides :-() NASA cube and other applications.
Trained 400 people - who? Knowledge engineers? S/w engineers? Mostly software architects, database designers. More federation of DBs. They do sessions on the technologies and then hands-on sessions for building things. RDF gateway, also does OWL. Rapid prototyping and round trips.
On the same wave length and disagree, or First two speakers - all the things that are really hard and can't be done.
FrankH: not work out well from tagging point of view.
FrankS: tagging has been successful. What do we mean by successful? By whom, for what purpose, does it scale? flickr and del.icio.us, raw sugar.
What is first major success, and how will it take us all by storm?
frank - which sem web? Web of data? Blog? Boeing, Oracle, UK health care system, E-science.
Ron: Great stuff happening, but not part of movement of the masses.
FrankS: Not a sem web. They could do it with HTML, SQL.
FrankH: Could you do it with "X" - if you put enough sweat in it. Boeing stopped counting how many sem web databases and integrating them. Currently not talking about the sem web, but about sem islands.
John Davies, British Telecoms. Company stops working if they pull the website plug.
Dan: Don't have top down admin hierarchy and use sem web technology to deal with it.
Rohit: How do you make it acceptable to the masses?
FrankS: Bloggers tag so blog is easier to find. They are tagging since it is the only way people can find stuff on their blog.
Ron: But search engines could be better in indexing blogs.
FrankS: We can't ignore it and it works.
----
Steven: Extreme position - these two things are not different. Two presentations of same idea - of adding semantics. Microformats, they are rowing with the oars that they have. How can we leverage this? Sem Web trying to build new oars. Microformats are great at what they can do - but can't distinguish between different vocabularies. RDFa attempts to address the remaining problems, so it still feels like tagging, but integrated into RDF.
FrankS: People can tag any content.
FrankH: question back - tags are about stuff we can do now. Need gradual transition. Make little hierarchy of tags. That could be a growth path.
Ron: More people use email clients than tagging.
Rohit: Pragmatic path - got to be a bridge - griddle. Data that exists today so can be scraped.
Critical issue for microformats movement. Cards, calendars, views, listings. What about this other stuff?
Annotea: Looks from users point of view. Store it to services.
Don: Annotea are getting repackaged in other ways.
Ron: 30-40 years ago. Semantic networks. Community reinventing the hard way it was done before. Simple inferences - if you don't have well-founded semantics then you will start making mistakes. Be nice not to make all the mistakes of the past.
Rohit: automate mistakes of the past!
question: (long - I skipped it) rawsugar demonstrates that tags by themselves are not enough. Need to create order out of chaos - so you need order. Auto or manually. No need for single meaning. No need for manual markup - structure automatically.
question: question answering - how do you ask the question?
FrankH: search engines give you pile of documents and answer is in there somewhere. Sem web is part of that, not all of it.
Marc Davis: Sem Web 2.0 and . Reference and inference. URIs are concepts, Physical world, this place in physical. URI or physical based. Sem Web logical model of inference. Language and meaning. Reference, inference.
FrankH: statistical vs logical. Deep fundamental question. Do they meet or not? Not solve very quickly. I would use any hammer to hit the nail.
Discussion about philosophical distinction.
Marc: Reference?
Don: workshop on Tuesday.
Marc: tagging has theory of reference?
FrankS: only statistical methods can be used [[I missed this]]
Patrick's "pal": categorization, cannot tag my database. So different things. Where is overlap?
Dan: 7000 people tagged it and accept as truth. Cryptography to go between things.
FrankH: Sem Web is about categorisation and hierarchies, tagging flat descriptions.
Ron: Our own languages evolved over millennia
Benefits from globalization, need to accept and deal with change and innovation. Common theme with advancements is the Web - forced them to look at society and social issues. Web is abstract information space. Need mixed backgrounds: technical, and new media artists. Enterprise is no longer a local entity and need to deal with different cultures. New workforce needs to understand technical and business side of things. Competitive, value of solutions to users or company. Increase market share. Collaborative engineering in virtual enterprise. Need to share information, but at the same time protect it from specific parties. Ontologies - bring intelligence to applications? Realtime applications.
Complexity of biological data. Submit sequence information to one source and other information to another source. Top scientists could integrate patterns and insights. Human genome has been sequenced. Variety in human genes, activity levels of genes. Now 1,000 different data banks publicly available. Find out expression of every gene in a cell. No longer possible for people to integrate all this information in their heads. Challenging environment. Genes, proteins, metabolites - experiments themselves become bigger. Share data.
Put information into model to ensure data has correct contextual meaning. Quality of data varies - need to know this when integrating the data.
Different vocabularies - term collision.
Looking at relational and XML approaches which were not meeting their requirements. Cost of new drug is still increasing.
RDF model is more flexible, statement about a fact, also about a statement. Provide information about how a particular piece of info has been generated. Sem Web m/c processable approach to analysing data. Manual is no longer possible. Browse data and find data of interest.
Integrate data sources and use data in UNANTICIPATED ways. Aggregate disparate data sets for first time, then serendipitous approach. Accessed all data sets from the web, put them in RDF facet browser sedirium OWL constructs such as same-as.
Not that many data sets are published in RDF. Can also allow people to do on-the-fly mappings to RDF.
Oracle and RDF - motivation : customer requests/ graph model in a version of the database, so could build on it for RDF. Oracle RDF is available in latest release. Object relational capabilities - take all triples out of document. Users can work with data on top of highly optimised infrastructure.
Link represents complete triple (?) useful for annotating data and reification. Support for blank nodes. NRA (?) type relationship.
Extend SQL, so intend to stay compliant with SPARQL standard. Inferencing based on RDF, RDFS and user-defined rules. Parent of parent is a grandparent. Pull out grandparent relations and also store them.
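The user-defined rule mentioned above ("parent of parent is a grandparent", with the derived relations pulled out and stored) can be sketched in a few lines. This is a generic Python illustration of materializing one rule over triples, not Oracle's rule syntax or API; the predicate names and the sample data are made up.

```python
def infer_grandparents(triples):
    """Apply: (a parentOf b) and (b parentOf c) => (a grandparentOf c)."""
    parent_of = {}
    for s, p, o in triples:
        if p == "parentOf":
            parent_of.setdefault(s, set()).add(o)

    inferred = set()
    for grandparent, kids in parent_of.items():
        for kid in kids:
            for grandkid in parent_of.get(kid, ()):
                inferred.add((grandparent, "grandparentOf", grandkid))
    return inferred


# Hypothetical asserted triples.
ASSERTED = {
    ("Ann", "parentOf", "Bob"),
    ("Bob", "parentOf", "Cid"),
    ("Bob", "parentOf", "Dee"),
}

# Store the derived triples next to the asserted ones, so later
# queries can read them without re-running the rule.
STORE = ASSERTED | infer_grandparents(ASSERTED)
```

Storing inferred triples trades space for query time, which is why the triple count grows after inferencing.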
Oracle 80,000,000 triples [Sesame 1,000,000,000 triples - inferencing becomes 1.8 billion.] Download the database! You can try it!
Use cases:
Stanford University, RDF data model. Three different data sets and
integrated using biopax for pathways data. Separated different
graphs: different species in different models. Simple query
interface. Enter own query syntax.
See web site.
Eli Lilly:
Target assessment tool. Drug discovery -> development.
Ontologies -> model, so can compare drug candidates. Very
mainstream.
University of Texas Health Science Center:
Patterns of disease spreading and visits to emergency rooms.
Bio RDF is sub group of life sciences group.
Question Frank: Great talk. Semantics is there. Where is the web?
Answer: Focus on scalable back-end. People to provide own applications at front-end. Focus more on the back-end.
Q2 Frank: All in life sciences domain - head of the pack. Are there other domains also heating up to the message?
A: Governments are starting to become interested (USA). Geospatial market. Nokia - MIT SwapMe, with Oracle as back-end.
Q Brian Matthews:
A: Company talking about "maybe we should do this". Why did we wait for 3 months? Took us half a day to start to get going. Subsets of data, looking at connectivity rather than scalability. Three days to get it all set up. Not rigorous enough for "real" drug discovery. Easy to get going and to get started, then would take a while.
Q: Ontologies existed?
A: LSID were URIs to link datasets together. Converted unique ID to LSID where didn't exist. UniTexas, linking technologies is far from trivial.
ACTION ITEM Lynda: Find links to slides from these 3 speakers.
Coordinate bio rdf group. Getting data and making available in RDF. Demo over different data sources. Explore tools and technologies. See list BioRDF participants. Fertile environment to allow people to work together. (meadow) [J: Eculture meadow. This is what ACHE should be.] Cross-pollination between the flowers. Use tools from one group and pass on requirements to other. http://w3.org/2001/sw/hcls BioRDF wiki
Integrating Life Sciences Data on the Web using SPARQL JavaScript SPARQL (I missed his URL, maybe this is useful), JSON (JavaScript Object Notation)
Bridging the HT and Semantic Webs http://rdfa.info Web works because interoperable. Recombining data from different pieces in a way not necessarily intended. Tens of billions of HTML pages and Sem Web will evolve _in part_ from this. DRY writing - Don't Repeat Yourself. If there for rendering then don't repeat in structuring. Contact address - metadata extracted from that in context of location of the page. RDFa (Semantic HTML). SweetWiki - Semantic Web enabled in Wikis. World Cup kick-off. Microformats -> RDFa and then use RDFa toolsets. Retain in context info. GRDDL. Ben will talk to other creative commons people to try to find someone to participate in the MM Sem XG.
ACTION ITEM: We all add RDFa to our home pages.
I really don't get this. It seems like a two-person project that is meant to be supporting 500 media content owners.
Inigo Surguy, MyCarEvent EU project. SPARQL syntax editor and use Ajax to display stuff. Going to be on CSCW site as open source in a few weeks.
Nuxeo CPS
Microformats - encode semantics using CSS and XHTML. "Pave the cow paths." E.g. VCARD. His slides are also Xforms! Mozilla xforms implementation.
Some posters I have found interesting ...
Tim Berners-Lee presented his latest application, Tabulator, a generic RDF browser. See http://www.w3.org/2005/10/ajaw/About.html. Many features: you can browse the schema and the data (not a tree model). Uses AJAX. More on Tim's blog.
Bijan Parsia abandoned Mindswap on Mon May 15, 2006!
From:
Mindswap news:
"Bijan Parsia, long time inmate of the MINDSWAP asylum, has busted loose, finding his way to a
faculty position at the University of Manchester. We congratulate Bijan on his position,
miss him a lot, and wish the best for him as he joins the second best Semantic Web group around :-)"