Terms and Term Databases

Next: GLR Parsing versus Parallel Up: Research Issues Previous: Research Issues

Terms and Term Databases

From a theoretical point of view, terms are a straightforward data type, but in the context of industry-sized applications various non-trivial problems have to be solved:

$\bullet$: What is the most concise representation for terms? For large terms in the mega-byte range, the time needed for file transfers becomes comparable to the time needed for term rewriting.
$\bullet$: Which statistics of term usage are relevant when choosing a term representation? Typically, statistics on the average arity of function symbols and the average length of lists can be used to select term representations that are either optimized for space or for access time.
$\bullet$: How can sharing of subterms be maximized? If maximal sharing is to be implemented--when a new term is about to be created, a lookup has to be performed to check whether that term already exists--what are the best table lookup strategies?
$\bullet$: How can terms be represented in a language-neutral form, so that they can be exchanged between tools that are implemented in different languages?
$\bullet$: If we take term rewriting seriously, global information has to be stored in term databases. How do we define and implement efficient storage, retrieval and query operations on such persistent databases?

Currently, we have found reasonable answers to most of the above questions in the design and implementation of the Annotated Term Format (ATF), a language-independent prefix format that is supported by efficient implementations in C and Java. However, several issues still need further research. For instance, how do we organize the access of several concurrent tools to a common term database? Initial work on waitfree algorithms to achieve this appears in [30].

Next: GLR Parsing versus Parallel Up: Research Issues Previous: Research Issues

Paul Klint 2001-06-12