Steven Pemberton
CWI, Amsterdam
Someone came to the database experts in our department and said that the mailing lists she had to administer were getting unmanageable. They had originally been prepared by different people, so they were all in a different format, using different conventions, and in different files. She needed a program to help her manage them, and she sketched the facilities she needed:
The database would be quite small, a couple of thousand records or so. Each record would have a number of standard fields: name, institute, department, address, city, country, email address, and so on. There should also be a 'code' field where she could say which mailing lists this address belonged to, such as ABC, the Operating System list she managed, and so on.
She should then be able to look up entries, and above all make selections which could then be printed off as labels. She should be able to find out how large a selection was. She should also be able to print all records out for a card index on her desk.
To aid searching, certain fields such as country, should be constrained so that only unique values are used; not United Kingdom in one case, and Great Britain in another.
At this point, I got called in on the discussions, and after rejecting some possibilities (the standard database package is only available on one computer that is not accessible for everyone who needs to access the mailing lists), without further ado, three of us sat down at a workstation, and in an afternoon wrote the program in ABC.
A first decision we had to take was how we were going to represent the database.
It was obvious from the specification that we didn't know exactly how many fields there were going to be, nor what they were, and furthermore that it was likely to change, so we needed to be as flexible as possible.
Therefore we decided to have a structure defining the allowable field names, and each record would then be a table from these field names to the field value. Each record would be regarded as having an entry for each field, though if it were empty, it wouldn't be physically there. This meant that if we added a new field-name to the defining structure, all records effectively got an empty field with that name.
It also meant that if a field-name was later deleted, all those fields in the database became no longer accessible, they apparently disappeared, though they were physically still there, so that if later the field name was reinstated, the values would reappear.
So to define the allowable field names:
PUT split "Name Institute Dept Address City Postcode Land Code" IN field.names
An entry in the database might then look like:
{["Name"]: "Jane Smith"; ["Institute"]: "Univ. of Lighf"; ["Land"]: "Erewhon"}
Missing fields are considered present but empty.
Now, given this format, we can immediately write a how-to to show a record:
HOW TO SHOW RECORD record: SHARE field.names FOR name IN field.names: IF name IN keys record: WRITE name, ": ", record[name]/
and one to read a record:
HOW TO GET record: SHARE field.names PUT {} IN record FOR name IN field.names: WRITE name, ": " READ field RAW IF field <> "": PUT field IN record[name]
The whole database is just a set of records (with no implied ordering) So to display the whole database, record by record, we can use:
HOW TO SHOW db: FOR r IN db: SHOW RECORD r WRITE /
To add a record to the database:
HOW TO ADD TO db: GET record IF record <> {}: INSERT record IN db
A basic action you want to do with a database is select records on the basis of certain criteria. To do this, we shall write some functions that given a database and a set of selection criteria, deliver a database that is a sub-set of the original one.
The represention we shall use for the criteria is just a record: for each record in the database if the record matches in the required way with the criteria-record, then it will form a part of the result:
HOW TO RETURN db equals criteria: PUT {} IN result FOR record IN db: IF matches: INSERT record IN result RETURN result matches: REPORT EACH name IN keys criteria HAS field.match field.match: REPORT name in keys record AND record[name] = criteria[name]
This version gives an exact match, so if we say
SHOW db matches {["Land"]: "UK"}
then we'll get all records with the Land field equal to UK. If we want a partial match, we can use:
HOW TO RETURN db contains criteria: PUT {} IN result FOR record IN db: IF matches: INSERT record IN result RETURN result matches: REPORT EACH name IN keys criteria HAS field.match field.match: REPORT name IN keys record AND lower record[name] includes lower criteria[r]
and includes reports whether the one text includes the other:
HOW TO REPORT t includes s: REPORT SOME i IN {1..#t-#s+1} HAS t@i|#s = s
The how-to contains matches any record where the field contains the relevant criteria, ignoring case. So:
SHOW db contains {["Land"]: "UK"}
would match for instance UK and Ukraine.
Note that because these functions take a database as parameter, we can chain them:
SHOW (db equals {["Land"]: "UK"}) contains {["City"]: "York"}
which selects all records that contain the city York, in the country UK. Also note that
SHOW db contains {["Land"]: ""}
would show all entries with a Land field.
Each country has a different way of formatting its addresses, so we need a flexible method of specifying address formats.
What we are going to use here is a table of land names to formats, where each format is a single text. For instance:
>>> WRITE format["NL"] Name / Dept / Institute / Address / Postcode _ _ City / _ >>> WRITE format["UK"] Name / Dept / Institute / Address / City _ Postcode / UK / _
The text is a number of words. A "/" represents a new line, a "_" represents a space. Other words are either field names, in which case the corresponding entry in the record is substituted, or literal words. Completely empty lines are not output, but lines containing spaces are (so the last part of the format above ensures a blank line between records). Here's how we output a set of records:
HOW TO FORMAT records WITH formats: SHARE field.names, format FOR r IN records: PUT "" IN out FORMAT RECORD chosen.format: SELECT: "Land" in keys r ANDÂ r["Land"] in keys format: RETURN format[r["Land"]] ELSE: RETURN format["default"] FORMAT RECORD: FOR word IN split chosen.format: SELECT: word in field.names: IF word in keys r: PUT out^r[word] IN out word = "/": IF out <> "": WRITE out/ PUT "" IN out word = "_": PUT out^" " IN out ELSE: PUT out^word IN out IF out <> "": WRITE out/
This same code then lets us get an overview of a set of entries, just by using another set of formats. For instance:
>>> WRITE brief {["default"]: "Name , _ City , _ Land"} >>> FORMAT db WITH brief
Just as we had a method of inputting entries, we also need to supply a way of changing them. Here we do more or less the same as with input, except we display the field name and entry before asking for input. If the user types a newline, the entry is unchanged; if the user types other data that then replaces the old. We then need a way of deleting an entry: we do this by saying that if the user types one or more spaces as input, it has the effect of deleting the entry:
HOW TO RETURN modified record: SHARE field.names PUT record IN new FOR field IN field.names: WRITE field, ": " IF field in keys record: WRITE record[field], " " READ answer RAW SELECT: answer = "": PASS stripped answer = "": IF field in keys new: DELETE new[field] ELSE: PUT answer IN new[field] RETURN new
HOW TO REPLACE old WITH new IN db: IF old <> new: IF old IN db: REMOVE old FROM db IF new <> {}: INSERT new IN db
Note that we could now alter ADD TO db to:
HOW TO ADD TO db: REPLACE {} WITH modified {} IN db
which would have the same effect.
Now that we've got a number of useful how-tos, we can put them together with a driving program. In our final version we had a parser with an extensive query language. For now here is a simple version.
There is a concept of the current selection. Initially the selection is the whole database. The selection command has the form "key = value" or "key ~ value" for exact or approximate matches (for instance "Land = UK"). Subsequent selection commands act on the current selection. The command 'all' selects the whole database again.
You use 'show' to show the current selection, 'brief' to summarize the current selection, 'format' to format the selection, 'change' to modify records in the selection. Finally 'help' gives a help message.
HOW TO DATABASE: SHARE db, field.names PUT db IN selection WRITE #db, "entries"/ GET COMMAND WHILE command <> "quit": IF stripped command <> "": OBEY GET COMMAND GET COMMAND: WRITE "> " READ command RAW PUT lower command IN command OBEY: PUT split command IN words SELECT: command = "new": ADD TO db #words = 3: \Selection command PUT obey.select IN selection command = "all": \Select all PUT db IN selection command = "show": SHOW selection command = "brief": FORMAT selection WITH brief command = "format": FORMAT selection WITH formats command = "change": FOR r IN selection: PUT modified r IN new REPLACE r WITH new IN db REPLACE r WITH new IN selection command = "help": GIVE HELP ELSE: WRITE "Not recognised"/ WRITE "Use 'help' for help" / obey.select: SELECT: words[1] not.in field.names: WRITE "No such field name" / WRITE "Ignored" / RETURN selection words[2] = "=": PUT selection matches {[words[1]]: words[3]} IN s words[2] = "~": PUT selection contains {[words[1]]: words[3]} IN s ELSE: WRITE "Operator ", words[2] WRITE "unrecognised"/ WRITE "Ignored" / RETURN selection SELECT: #s = 0: WRITE "No matches" / WRITE "Ignored" / RETURN selection #s = 1: WRITE "1 entry"/ ELSE: WRITE #s, "entries"/ RETURN s
You could easily add an undo command, by keeping a copy of the current selection every time it gets changed. For instance, replace
PUT obey.select IN selection
with
PUT selection IN undo PUT obey.select IN selection
Then the undo command only has to swap the values of the copy and the current selection:
PUT undo, selection IN selection, undo
Initially, undo should be empty.