The package sgfutils
contains a few command-line utilities for working with
SGF files that describe go (igo, weiqi, baduk) games.
This page is about sgfinfo, sgfdb and sgfdbinfo.
% sgfinfo [options] [--] [file(s)]
Given a list of SGF files, select those that satisfy some conditions,
and print information about them.
For me, sgfinfo scans 120000 games in less than 3 sec.
(And sgfdbinfo scans a database with 600000 games in 0.6 sec.)
Typically, if -FOO is an option that asks for a certain value
to be printed, then -FOO=val is an option that selects games
where the value is as specified, and -FOO:part selects the games
where that value contains the specified part.
E.g., -propDT asks to print the DT field
of the game record, -propDT=1846-09-11 asks for the games
with DT field equal to "1846-09-11", and -propDT:1846
selects those where the DT field contains the substring "1846", while
-propDT: asks for the games where a DT field is present.
Negation is specified using !
(which may need to be escaped for the shell).
The selections -FOO!, -FOO!=val and -FOO!:part
select the games where the value of FOO is unspecified,
is not equal to val, or doesn't contain part, respectively.
E.g., -propKM! asks for the game records where no KM
field is present.
Several selection flags can be combined; together they specify a logical AND
of the conditions.
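The selection rules above can be summarized in a small Python sketch. This is only an illustration of the documented semantics, not the code sgfinfo uses: the names matches and select are hypothetical, a game is modelled as a plain dict of property values, and the treatment of a missing value under != and !: is an assumption.

```python
def matches(value, op, arg=None):
    """Test one selection condition against a property value.

    value is the property value as a string, or None when the
    property is absent. op is the part that follows -FOO:
    "=", ":", "!", "!=" or "!:".
    """
    if op == "=":            # -FOO=val: value equals val
        return value is not None and value == arg
    if op == ":":            # -FOO:part: value contains part ("" = present)
        return value is not None and arg in value
    if op == "!":            # -FOO!: value is unspecified
        return value is None
    if op == "!=":           # -FOO!=val (assumption: a missing value also matches)
        return value != arg
    if op == "!:":           # -FOO!:part (assumption: a missing value also matches)
        return value is None or arg not in value
    raise ValueError("unknown operator: %r" % op)


def select(game, tests):
    """Combine several (property, op, arg) tests with a logical AND."""
    return all(matches(game.get(prop), op, arg) for prop, op, arg in tests)


game = {"DT": "1846-09-11"}
# Corresponds to: -propDT:1846 -propKM!
print(select(game, [("DT", ":", "1846"), ("KM", "!", None)]))
```

Note that an empty part, as in -propDT:, matches any game where the field is present, because the empty string is a substring of every value.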
As input, sgfinfo takes a list of files (or, when no files
are specified, stdin). Optionally, all files in a directory
tree can be searched.
- Do a recursive search. When an input argument is a directory,
it is searched recursively for files with a selected extension,
by default .sgf.
- Specify the desired extension for a recursive search.
Usually, EXT will start with a '.' (or be empty).
- Select the games with a filename
containing the given string (case sensitive).
- Select games with board size 13x13
- From a multi-game file, select game #26.
- Select games with handicap between 6 and 9 stones.
- -m123, -m-50, -m400-, -m200-250
- Select games with 123, resp. at most 50, resp. at least 400,
resp. between 200 and 250 moves.
- -p-cf,dd, -p1-12cf,dd, -p120C14
- Select games where positions cf and dd were played sometime in the game,
resp. between moves 1 and 12; or where move 120 was at C14.
- -Bp, -Wp
- Idem, with black/white moves.
- Select games where the pattern of FILE.sgf occurs.
Here FILE.sgf is an SGF file with a single node
that has only AB, AW and AE (and possibly SZ) properties,
denoting the restrictions that certain positions must be Black, White or Empty.
For example, the file with contents
selects the games with a position containing these six black stones.
The first move where this position occurs is given by the
-k information option, see below.
option asks for the same pattern with B/W swapped.
The -alltra option searches not only for the given pattern,
but also for all 16 patterns obtained by rotating, reflecting, and
interchanging colors. If the board size is not 19, then the pattern
needs a SZ property, for example SZ, so that transformed patterns
can be computed.
The -truncN option, for some number N,
truncates a game to N moves.
To select files where a pattern occurs in the first 50 moves, search
with -pat=FILE.sgf -trunc50.
- Select all games with given md5 signature (see below).
- Select all games with given can signature (see below).
- -DsA=SIG, -DsB=SIG
- Select all games with given Dyer signature A and/or B.
- -DnA=SIG, -DnB=SIG
- Select all games with given normalized Dyer signature A and/or B.
- -propXY:, -propXY=,
- Select all games containing a property (root property, non-root property)
with a value that contains or equals a given string.
- --XY:, --XY=
- Synonym of -propXY:, -propXY=.
- Select all games with given player (regardless of color).
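The range forms accepted by the move-count selection (-m123, -m-50, -m400-, -m200-250) can be read as follows. The Python sketch below is a hypothetical illustration of that syntax, not sgfinfo's parser; the function names are invented here, and treating both bounds as inclusive is an assumption.

```python
def parse_range(spec):
    """Parse 'N', '-N', 'N-' or 'N-M' into (low, high).

    None stands for a missing (unbounded) end of the range.
    """
    if "-" not in spec:
        n = int(spec)
        return (n, n)                      # exactly N
    low, high = spec.split("-", 1)
    return (int(low) if low else None,     # 'N-'  -> at least N
            int(high) if high else None)   # '-N'  -> at most N


def in_range(count, spec):
    """Return True if count lies within the range given by spec."""
    low, high = parse_range(spec)
    return ((low is None or count >= low) and
            (high is None or count <= high))


for spec in ["123", "-50", "400-", "200-250"]:
    print(spec, parse_range(spec))
```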
- Report the number of games in a collection.
- Report board size.
- Report game number.
- Report handicap.
- Report number of moves.
- Print the actual moves, preceded by their number.
- -MI, -MA
- Idem for the initial setup moves or all moves.
- Print move 37 (in aa-format).
- Print move 37 (in aa-format) preceded by its player (B or W).
- Print color (player) of move 37.
- Print all moves (including the initial setup) as a single long string.
- Print the 32-byte md5 signature of this long string. If two games
are equal (but the files are not, due to whitespace, different information
about players, event, place, etc.) this is most easily recognized
from equality of their md5 signatures. Game records are often
incomplete, and one can use for example -md5 -trunc100
to compute the md5 signature of games over the first 100 moves.
- Print the 32-byte canonical signature (namely, the smallest of the eight
md5 signatures obtained by rotating and/or reflecting the board). This
signature is the same for all rotated/reflected versions of a game.
- Print the 34-byte extended canonical signature (the canonical signature
with a 2-byte suffix indicating the symmetry operation that transforms
the game into the version with minimal md5).
- -Ds20,40,60, -DsA, -DsB, ...
- Print Dyer-type signature. Here -DsA is equivalent to
-Ds20,40,60, and -DsB to -Ds31,51,71.
Their concatenation is given by -DsC.
- -Dn21,32,41,52,61,72, -DnC, ...
- Print normalized Dyer-type signature, that is the smallest
of the eight Dyer signatures obtained by rotating and/or reflecting
the board.
- Report the first time a given pattern occurs.
- Report the number of Black stones captured.
- Report the number of White stones captured.
- Equivalent to -Bcapt -Wcapt.
- Print all property labels (only) that occur in the file.
- Print the value of property XY. (E.g. -propRE for the
result, -propKM for the komi, -propDT for the date.)
The additional option -replacenl will replace newlines by
spaces in this output, so that a property value is reported on a
single line. Only the first occurrence of XY is considered.
- -rprop, -rpropXY
- Idem, but for root properties (properties in the root node) only.
- -nrprop, -nrpropXY
- Idem, but for non-root properties only.
- Synonym of -propXY.
- -winner, -loser
- Print the name of the winner (loser), if any.
Nothing is printed in case of insufficient information or jigo.
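The idea behind the canonical signature described above (the smallest of the eight md5 signatures obtained by rotating and/or reflecting the board) can be sketched in Python. This is a minimal illustration under stated assumptions: moves are given as 0-based (column, row) pairs, and the string that is hashed is simply the concatenated aa-style coordinates. The exact string that sgfinfo hashes is not specified here, so the values produced by this sketch are not expected to agree with sgfinfo's output.

```python
import hashlib

LETTERS = "abcdefghijklmnopqrs"


def symmetries(n):
    """Return the eight symmetries of an n x n board as coordinate maps."""
    m = n - 1
    return [
        lambda x, y: (x, y),          # identity
        lambda x, y: (m - y, x),      # rotation by 90 degrees
        lambda x, y: (m - x, m - y),  # rotation by 180 degrees
        lambda x, y: (y, m - x),      # rotation by 270 degrees
        lambda x, y: (m - x, y),      # reflection left-right
        lambda x, y: (x, m - y),      # reflection top-bottom
        lambda x, y: (y, x),          # reflection in the main diagonal
        lambda x, y: (m - y, m - x),  # reflection in the other diagonal
    ]


def md5_signature(moves):
    """md5 (32 hex digits) of the moves written as one long string."""
    text = "".join(LETTERS[x] + LETTERS[y] for x, y in moves)
    return hashlib.md5(text.encode("ascii")).hexdigest()


def canonical_signature(moves, n=19):
    """Smallest md5 signature over all eight transformed versions."""
    return min(md5_signature([t(x, y) for x, y in moves])
               for t in symmetries(n))


game = [(15, 3), (3, 15), (16, 16)]
rotated = [(18 - y, x) for x, y in game]
print(canonical_signature(game) == canonical_signature(rotated))
```

Because the eight symmetries form a group, every rotated or reflected version of a game yields the same set of eight strings, and hence the same minimum.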
(for -md5, -can, -M, -s*, -pat=)
- -truncN, -trunc-N
- Truncate to N moves, resp. remove the final N moves.
- Rotate left over N times 90 degrees. (N=0,1,2,3)
- Apply transformation N (N=0,1,...,7),
one of the 8 symmetries of the board.
Here -rotM = -traN with N=2M.
See also sgftf.
Instead of specifying the date or the md5 signature (etc.),
one can give a reference file F and ask for "the same as in F".
The file F is specified using -ref F.
The syntax for "the same" is @. Thus,
% sgfinfo -ref thisfile -DsA=@ -DsB=@ *.sgf
will find all games in the current directory with the same Dyer signatures
A and B as thisfile.
- Set exit status of the program to 0 (true) if precisely one
file was found that matched the requirements, to -1 if no file
was found, to 1 if several files were found.
Normally, the exit status is 0 when all was OK, and nonzero
in case of a problem.
- Bare output: print values all on a single line, without labels.
This is now the default.
- Multiline output: print values one per line, with labels.
- No filename. Suppress printing the filenames (when other output
- Ignore errors. Normally, these programs exit on error, so that you
can fix your files. When building a large data base, or searching a
large number of SGF files, there will be many problems and it is easier
to ignore problematic files and continue.
- Be a bit more quiet.
- Trace: report input read. Sometimes this helps to find at what point
of the input file there is a problem.
The program sgfdb takes the input files, parses them,
and stores the result in a database. Typical calls:
% sgfdb -o foo.sgfdb *.sgf
% sgfdb -i -q -r .
% sgfdb -i -r -e ".mgt" .
Here -r asks for a recursive tree walk and finds all .sgf
files. Use -e EXT to specify a different (or no) extension.
The -i flag asks to ignore errors. Without it an error causes
an abort. The -q flag asks not to report errors in the SGF.
Roughly, sgfdbinfo [options] database is equivalent to
sgfinfo [options] files when the database was created
using sgfdb -o database files. The default database is
out.sgfdb and need not be specified explicitly.
For recursive searches sgfdbinfo uses the default extension
.sgfdb instead of .sgf.
Creating the database takes about the same time as searching.
Searching in the database is (for me) 20 to 40 times as fast.
The database typically takes half a kB per game.
Presently there are some differences between the results of
sgfinfo and sgfdbinfo, mainly because
sgfdb only preserves the moves, but strips
comments and other fields, so that the -prop and
-propXY options only work with sgfinfo.