A General-Purpose Database

Steven Pemberton
CWI, Amsterdam

Someone came to the database experts in our department and said that the mailing lists she had to administer were getting unmanageable. They had originally been prepared by different people, so they were all in a different format, using different conventions, and in different files. She needed a program to help her manage them, and she sketched the facilities she needed:

The database would be quite small, a couple of thousand records or so. Each record would have a number of standard fields: name, institute, department, address, city, country, email address, and so on. There should also be a 'code' field where she could say which mailing lists this address belonged to, such as ABC, the Operating System list she managed, and so on.

She should then be able to look up entries, and above all make selections which could then be printed off as labels. She should be able to find out how large a selection was. She should also be able to print all records out for a card index on her desk.

To aid searching, certain fields such as country, should be constrained so that only unique values are used; not United Kingdom in one case, and Great Britain in another.

At this point, I got called in on the discussions, and after rejecting some possibilities (the standard database package is only available on one computer that is not accessible for everyone who needs to access the mailing lists), without further ado, three of us sat down at a workstation, and in an afternoon wrote the program in ABC.

Data Representation

A first decision we had to take was how we were going to represent the database.

It was obvious from the specification that we didn't know exactly how many fields there were going to be, nor what they were, and furthermore that it was likely to change, so we needed to be as flexible as possible.

Therefore we decided to have a structure defining the allowable field names, and each record would then be a table from these field names to the field value. Each record would be regarded as having an entry for each field, though if it were empty, it wouldn't be physically there. This meant that if we added a new field-name to the defining structure, all records effectively got an empty field with that name.

It also meant that if a field-name was later deleted, all those fields in the database became no longer accessible, they apparently disappeared, though they were physically still there, so that if later the field name was reinstated, the values would reappear.

So to define the allowable field names:

PUT split "Name Institute Dept Address City Postcode Land Code" IN field.names

An entry in the database might then look like:

{["Name"]: "Jane Smith"; ["Institute"]: "Univ. of Lighf"; ["Land"]: "Erewhon"}

Missing fields are considered present but empty.

Now, given this format, we can immediately write a how-to to show a record:

HOW TO SHOW RECORD record:
    SHARE field.names
    FOR name IN field.names:
        IF name IN keys record:
            WRITE name, ": ", record[name]/

and one to read a record:

HOW TO GET record:
    SHARE field.names
    PUT {} IN record
    FOR name IN field.names:
        WRITE name, ": "
            READ field RAW
            IF field <> "":
                PUT field IN record[name]

The whole database is just a set of records (with no implied ordering) So to display the whole database, record by record, we can use:

HOW TO SHOW db:
    FOR r IN db:
        SHOW RECORD r
        WRITE /

To add a record to the database:

HOW TO ADD TO db:
    GET record
    IF record <> {}: INSERT record IN db

Selection

A basic action you want to do with a database is select records on the basis of certain criteria. To do this, we shall write some functions that given a database and a set of selection criteria, deliver a database that is a sub-set of the original one.

The represention we shall use for the criteria is just a record: for each record in the database if the record matches in the required way with the criteria-record, then it will form a part of the result:

HOW TO RETURN db equals criteria:
    PUT {} IN result
    FOR record IN db:
        IF matches:
            INSERT record IN result
    RETURN result
matches:
    REPORT EACH name IN keys criteria HAS field.match
field.match:
    REPORT name in keys record AND record[name] = criteria[name]

This version gives an exact match, so if we say

SHOW db matches {["Land"]: "UK"}

then we'll get all records with the Land field equal to UK. If we want a partial match, we can use:

HOW TO RETURN db contains criteria:
    PUT {} IN result
    FOR record IN db:
        IF matches:
            INSERT record IN result
    RETURN result
matches:
    REPORT EACH name IN keys criteria HAS field.match
field.match:
    REPORT name IN keys record AND lower record[name] includes lower criteria[r]

and includes reports whether the one text includes the other:

HOW TO REPORT t includes s:
    REPORT SOME i IN {1..#t-#s+1} HAS t@i|#s = s

The how-to contains matches any record where the field contains the relevant criteria, ignoring case. So:

SHOW db contains {["Land"]: "UK"}

would match for instance UK and Ukraine.

Note that because these functions take a database as parameter, we can chain them:

SHOW (db equals {["Land"]: "UK"}) contains {["City"]: "York"}

which selects all records that contain the city York, in the country UK. Also note that

SHOW db contains {["Land"]: ""}

would show all entries with a Land field.

Formatting

Each country has a different way of formatting its addresses, so we need a flexible method of specifying address formats.

What we are going to use here is a table of land names to formats, where each format is a single text. For instance:

>>> WRITE format["NL"]
Name / Dept / Institute / Address / Postcode _ _ City / _
>>> WRITE format["UK"]
Name / Dept / Institute / Address / City _ Postcode / UK / _

The text is a number of words. A "/" represents a new line, a "_" represents a space. Other words are either field names, in which case the corresponding entry in the record is substituted, or literal words. Completely empty lines are not output, but lines containing spaces are (so the last part of the format above ensures a blank line between records). Here's how we output a set of records:

HOW TO FORMAT records WITH formats:
    SHARE field.names, format
    FOR r IN records:
        PUT "" IN out
        FORMAT RECORD
chosen.format:
    SELECT:
        "Land" in keys r AND r["Land"] in keys format:
            RETURN format[r["Land"]]
        ELSE:
            RETURN format["default"]
FORMAT RECORD:
    FOR word IN split chosen.format:
        SELECT:
            word in field.names:
                IF word in keys r:
                    PUT out^r[word] IN out
            word = "/":
                IF out <> "": WRITE out/
                    PUT "" IN out
            word = "_":
                PUT out^" " IN out
            ELSE:
                PUT out^word IN out
    IF out <> "": WRITE out/

This same code then lets us get an overview of a set of entries, just by using another set of formats. For instance:

>>> WRITE brief
{["default"]: "Name , _ City , _ Land"}
>>> FORMAT db WITH brief

Changing

Just as we had a method of inputting entries, we also need to supply a way of changing them. Here we do more or less the same as with input, except we display the field name and entry before asking for input. If the user types a newline, the entry is unchanged; if the user types other data that then replaces the old. We then need a way of deleting an entry: we do this by saying that if the user types one or more spaces as input, it has the effect of deleting the entry:

HOW TO RETURN modified record:
    SHARE field.names
    PUT record IN new
    FOR field IN field.names:
        WRITE field, ": "
        IF field in keys record:
            WRITE record[field], " "
        READ answer RAW
        SELECT:
            answer = "": PASS
            stripped answer = "":
                IF field in keys new:
                    DELETE new[field]
            ELSE:
                PUT answer IN new[field]
    RETURN new

HOW TO REPLACE old WITH new IN db:
    IF old <> new:
        IF old IN db: REMOVE old FROM db
        IF new <> {}: INSERT new IN db

Note that we could now alter ADD TO db to:

HOW TO ADD TO db:
    REPLACE {} WITH modified {} IN db

which would have the same effect.

Putting it all together

Now that we've got a number of useful how-tos, we can put them together with a driving program. In our final version we had a parser with an extensive query language. For now here is a simple version.

There is a concept of the current selection. Initially the selection is the whole database. The selection command has the form "key = value" or "key ~ value" for exact or approximate matches (for instance "Land = UK"). Subsequent selection commands act on the current selection. The command 'all' selects the whole database again.

You use 'show' to show the current selection, 'brief' to summarize the current selection, 'format' to format the selection, 'change' to modify records in the selection. Finally 'help' gives a help message.

HOW TO DATABASE:
    SHARE db, field.names
    PUT db IN selection
    WRITE #db, "entries"/
    GET COMMAND
    WHILE command <> "quit":
        IF stripped command <> "": OBEY
        GET COMMAND
GET COMMAND:
    WRITE "> "
    READ command RAW
    PUT lower command IN command
OBEY:
    PUT split command IN words
    SELECT:
        command = "new":
            ADD TO db
        #words = 3: \Selection command
            PUT obey.select IN selection
        command = "all": \Select all
            PUT db IN selection
        command = "show":
            SHOW selection
        command = "brief":
            FORMAT selection WITH brief
        command = "format":
            FORMAT selection WITH formats
        command = "change":
            FOR r IN selection:
                PUT modified r IN new
                REPLACE r WITH new IN db
                REPLACE r WITH new IN selection
        command = "help": GIVE HELP
        ELSE:
            WRITE "Not recognised"/
            WRITE "Use 'help' for help" /
obey.select:
    SELECT:
        words[1] not.in field.names:
            WRITE "No such field name" /
            WRITE "Ignored" /
            RETURN selection
        words[2] = "=":
            PUT selection matches {[words[1]]: words[3]} IN s
        words[2] = "~":
            PUT selection contains {[words[1]]: words[3]} IN s
        ELSE:
            WRITE "Operator ", words[2]
            WRITE "unrecognised"/
            WRITE "Ignored" /
            RETURN selection
    SELECT:
        #s = 0:
            WRITE "No matches" /
            WRITE "Ignored" /
            RETURN selection
        #s = 1: WRITE "1 entry"/
        ELSE: WRITE #s, "entries"/
    RETURN s

You could easily add an undo command, by keeping a copy of the current selection every time it gets changed. For instance, replace

PUT obey.select IN selection

with

PUT selection IN undo
PUT obey.select IN selection

Then the undo command only has to swap the values of the copy and the current selection:

PUT undo, selection IN selection, undo

Initially, undo should be empty.