Next: GLR Parsing versus Parallel
Up: Research Issues
Previous: Research Issues
Terms and Term Databases
From a theoretical point of view, terms are a straightforward data
type, but in the context of industry-sized applications various non-trivial
problems have to be solved:
-
- What is the most concise representation for terms? For large
terms in the mega-byte range, the time needed for file transfers
becomes comparable to the time needed for term rewriting.
-
- Which statistics of term usage are relevant when choosing a term
representation? Typically, statistics on the average arity of
function symbols and the average length of lists can be used to select
term representations that are either optimized for space or for access
time.
-
- How can sharing of subterms be maximized?
If maximal sharing is to be implemented--when a new term
is about to be created, a lookup has to be performed to check
whether that term already exists--what are the best
table lookup strategies?
-
- How can terms be represented in a language-neutral form,
so that they can be exchanged between tools that are implemented in
different languages?
-
- If we take term rewriting seriously, global information has to
be stored in term databases. How do we define and implement efficient
storage, retrieval and query operations on such persistent databases?
Currently, we have found reasonable answers to most of the above
questions in the design and implementation of the Annotated Term
Format (ATF), a language-independent prefix format that is supported
by efficient implementations in C and Java. However, several issues
still need further research. For instance, how do we organize the
access of several concurrent tools to a common term database? Initial
work on waitfree algorithms to achieve this appears in [30].
Next: GLR Parsing versus Parallel
Up: Research Issues
Previous: Research Issues
Paul Klint
2001-06-12