sgfutils

The package sgfutils contains a few command-line utilities for working with SGF files that describe go (igo, weiqi, baduk) games. This page is about sgfinfo, sgfdb and sgfdbinfo.

See also sgf, sgfcharset, sgfcheck, sgfcmp, sgfdb, sgfdbinfo, sgfinfo, sgfmerge, sgfsplit, sgfstrip, sgftf, sgftopng, sgfvarsplit, sgfx, ugi2sgf.

sgfinfo

% sgfinfo [options] [--] [file(s)]
Given a list of SGF files, select those that satisfy some conditions, and print information about them. For me, sgfinfo scans 120000 games in less than 3 sec. (And sgfdbinfo scans a database with 600000 games in 0.6 sec.)

Typically, if -FOO is an option that asks for a certain value to be printed, then -FOO=val is an option that selects games where the value is as specified, and -FOO:part selects the games where that value contains the specified part. E.g., -propDT asks to print the DT field of the game record, -propDT=1846-09-11 asks for the games with DT field equal to "1846-09-11", and -propDT:1846 selects those where the DT field contains the substring "1846", while -propDT: asks for the games where a DT field is present.

Negation is specified using ! (which may need to be escaped for the shell). The selections -FOO!, -FOO!=val and -FOO!:part select the games where the value of FOO is unspecified, or is not equal to val, or doesn't contain part, respectively. E.g., -propKM! asks for the game records where no KM field is present.

Several selection flags can be combined, and then specify a logical AND of the conditions.
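
The selection syntax described above can be sketched in a few lines of Python. This is only an illustration of the stated semantics, not the sgfinfo source: the names parse_selector, matches and select are hypothetical, a game is modelled as a plain dictionary of property values, and letting a negated test succeed when the property is absent is an assumption.

```python
import re

def parse_selector(flag):
    """Split e.g. '-propDT!:1846' into (name, negated, op, arg).

    Returns None for a plain '-propXY', which asks for printing only.
    """
    m = re.fullmatch(r"-prop([A-Za-z]+)(!?)([=:]?)(.*)", flag, re.S)
    if m is None or (m.group(4) and not m.group(3)):
        raise ValueError(f"unrecognized flag: {flag!r}")
    name, bang, op, arg = m.groups()
    if not bang and not op:
        return None
    return name, bool(bang), op, arg

def matches(props, selector):
    """Test one selector against a dict of property values."""
    name, negated, op, arg = selector
    value = props.get(name)          # None when the property is absent
    if op == "=":
        hit = value == arg
    elif op == ":":
        hit = value is not None and arg in value
    else:                            # bare '-propXY!': presence test
        hit = value is not None
    return hit != negated            # assumption: '!' simply inverts the test

def select(props, flags):
    """Several selection flags form a logical AND."""
    selectors = [s for s in map(parse_selector, flags) if s is not None]
    return all(matches(props, s) for s in selectors)

game = {"DT": "1846-09-11", "RE": "B+2"}
print(select(game, ["-propDT:1846", "-propKM!"]))   # both conditions hold
print(select(game, ["-propDT=1846", "-propKM!"]))   # DT is not exactly "1846"
```

Note that "-propDT:" (empty part) matches any game that has a DT field, because every string contains the empty string.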

Input options

As input, sgfinfo takes a list of files (or, when no files are specified, stdin). Optionally, all files in a directory tree can be searched.
-r
Do a recursive search. When an input argument is a directory, it is searched recursively for files with a selected extension, by default .sgf.
-eEXT
Specify the desired extension for a recursive search. Usually, EXT will start with a '.' (or be empty).

Selection options

-fn:ChoChikun
Select the games with a filename containing the given string (case sensitive).
-sz13
Select games with board size 13x13.
-x26
From a multi-game file, select game #26.
-h6-9
Select games with handicap between 6 and 9 stones.
-m123, -m-50, -m400-, -m200-250
Select games with 123, resp. at most 50, resp. at least 400, resp. between 200 and 250 moves.
-p-cf,dd, -p1-12cf,dd, -p120C14
Select games where positions cf and dd were played sometime in the game, resp. between moves 1 and 12; or where move 120 was at C14.
-Bp, -Wp
Idem, but for Black's, resp. White's, moves only.
-pat=FILE.sgf
Select games where the pattern of FILE.sgf occurs. Here FILE.sgf is an SGF file with a single node that has only AB, AW and AE (and possibly SZ) properties, denoting the restrictions that certain positions must be Black, White or Empty. For example, the file with contents
(;AB[dd][ci][dp][pd][qi][pp])
selects the games with a position containing these six black stones.

The first move where this position occurs is given by the -k information option, see below.

The -swapcolors option asks for the same pattern with B/W swapped.

The -alltra option searches not only for the given pattern, but also for all 16 patterns obtained by rotating, reflecting, and interchanging colors. If the board size is not 19, then the pattern needs a SZ property, for example SZ[9], so that transformed patterns can be computed.

The -truncN option, for some number N, truncates a game to N moves. To select files where a pattern occurs in the first 50 moves, search with -pat=FILE.sgf -trunc50.

-md5=MD5
Select all games with given md5 signature (see below).
-can=CAN
Select all games with given can signature (see below).
-DsA=SIG, -DsB=SIG
Select all games with given Dyer signature A and/or B.
-DnA=SIG, -DnB=SIG
Select all games with given normalized Dyer signature A and/or B.
-propXY:, -propXY=, -rpropXY:, -rpropXY=, -nrpropXY:, -nrpropXY=
Select all games containing a property (root property, non-root property) with a value that contains or equals a given string.
--XY:, --XY=
Synonym of -propXY:, -propXY=.
-player:
Select all games with given player (regardless of color).
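
The pattern search with -alltra can be illustrated by the following Python sketch. It is not the sgfinfo implementation: the function names are hypothetical, compressed SGF point lists are not handled, and only a single static position is tested (replaying a game with captures is left out). A position is modelled as a dictionary from (x, y) to "B" or "W".

```python
import re

def parse_pattern(sgf_text):
    """Read AB/AW/AE/SZ from a single-node SGF pattern; returns (size, pattern)."""
    size = 19                                    # default board size
    pattern = {"B": set(), "W": set(), "E": set()}
    for prop, raw in re.findall(r"(AB|AW|AE|SZ)((?:\[[^\]]*\])+)", sgf_text):
        values = re.findall(r"\[([^\]]*)\]", raw)
        if prop == "SZ":
            size = int(values[0])
        else:
            for v in values:                     # 'dd' -> (3, 3)
                pattern[prop[1]].add((ord(v[0]) - 97, ord(v[1]) - 97))
    return size, pattern

def symmetries(size):
    """The 8 rotations/reflections of a square board, as coordinate maps."""
    m = size - 1
    return [
        lambda x, y: (x, y),
        lambda x, y: (m - x, y),
        lambda x, y: (x, m - y),
        lambda x, y: (m - x, m - y),
        lambda x, y: (y, x),
        lambda x, y: (m - y, x),
        lambda x, y: (y, m - x),
        lambda x, y: (m - y, m - x),
    ]

def all_variants(pattern, size):
    """8 symmetries, each with and without colors swapped: 16 patterns."""
    variants = []
    for t in symmetries(size):
        moved = {c: {t(x, y) for x, y in pts} for c, pts in pattern.items()}
        variants.append(moved)
        variants.append({"B": moved["W"], "W": moved["B"], "E": moved["E"]})
    return variants

def occurs(pattern, board):
    """board maps (x, y) to 'B' or 'W'; points not in the dict are empty."""
    return (all(board.get(p) == "B" for p in pattern["B"])
            and all(board.get(p) == "W" for p in pattern["W"])
            and all(p not in board for p in pattern["E"]))

size, pattern = parse_pattern("(;AB[dd][ci][dp][pd][qi][pp])")
# A position holding the transposed pattern in White stones.
board = {(y, x): "W" for x, y in pattern["B"]}
print(occurs(pattern, board))                                      # plain search
print(any(occurs(v, board) for v in all_variants(pattern, size)))  # -alltra style
```

The sketch also shows why a SZ property is needed on other board sizes: every transformation except the identity and the diagonal swap depends on the board size.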

Information options

-N
Report the number of games in a collection.
-sz
Report board size.
-x
Report game number.
-h
Report handicap.
-m
Report number of moves.
-M
Print the actual moves, preceded by their number.
-MI, -MA
Idem for the initial setup moves or all moves.
-M37
Print move 37 (in aa-format).
-Mx37
Print move 37 (in aa-format) preceded by its player (B or W).
-Mc37
Print color (player) of move 37.
-s
Print all moves (including the initial setup) as a single long string.
-md5
Print the 32-byte md5 signature of this long string. If two games are equal (but the files are not, due to whitespace, different information about players, event, place, etc.) this is most easily recognized from equality of their md5 signatures. Game records are often incomplete, and one can use for example -md5 -trunc100 to compute the md5 signature of games over the first 100 moves.
-can
Print the 32-byte canonical signature (namely, the smallest of the eight md5 signatures obtained by rotating and/or reflecting the board). This signature is the same for all rotated/reflected versions of a game.
-canx
Print the 34-byte extended canonical signature (the canonical signature with a 2-byte suffix indicating the symmetry operation that transforms the game into the version with minimal md5).
-Ds20,40,60, -DsA, -DsB, ...
Print Dyer-type signature. Here -DsA is equivalent to -Ds20,40,60, and -DsB to -Ds31,51,71. Their concatenation is given by -DsC.
-Dn21,32,41,52,61,72, -DnC, ...
Print normalized Dyer-type signature, that is the smallest of the eight Dyer signatures obtained by rotating and/or reflecting the board.
-k
Report the first time a given pattern occurs.
-Bcapt
Report the number of Black stones captured.
-Wcapt
Report the number of White stones captured.
-capt
Equivalent to -Bcapt -Wcapt.
-prop
Print all property labels (only) that occur in the file.
-propXY
Print the value of property XY. (E.g. -propRE for the result, -propKM for the komi, -propDT for the date.) The additional option -replacenl will replace newlines by spaces in this output, so that a property value is reported on a single line. Only the first occurrence of XY is considered.
-rprop, -rpropXY
Idem, but for root properties (properties in the root node) only.
-nrprop, -nrpropXY
Idem, but for non-root properties only.
--XY
Synonym of -propXY.
-winner, -loser
Print the name of the winner (loser), if any. Nothing is printed in case of insufficient information or jigo.
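
The idea behind the canonical signature (-can) can be sketched as follows. This is an illustration only: the exact string that sgfinfo hashes is not specified here, so the move_string format below (concatenated two-letter coordinates) is an assumption, and the digests it produces should not be expected to agree with sgfinfo output. All function names are hypothetical.

```python
import hashlib

def symmetries(size):
    """The 8 rotations/reflections of a square board, as coordinate maps."""
    m = size - 1
    return [
        lambda x, y: (x, y),
        lambda x, y: (m - x, y),
        lambda x, y: (x, m - y),
        lambda x, y: (m - x, m - y),
        lambda x, y: (y, x),
        lambda x, y: (m - y, x),
        lambda x, y: (y, m - x),
        lambda x, y: (m - y, m - x),
    ]

def move_string(moves):
    """Assumed format: moves as one long string of aa-style coordinates."""
    return "".join(chr(97 + x) + chr(97 + y) for x, y in moves)

def md5_sig(moves):
    """32 hexadecimal characters."""
    return hashlib.md5(move_string(moves).encode("ascii")).hexdigest()

def canonical_sig(moves, size=19):
    """Smallest of the eight md5 signatures over all board symmetries."""
    return min(md5_sig([t(x, y) for x, y in moves]) for t in symmetries(size))

game = [(15, 3), (3, 15), (16, 16), (2, 3), (5, 2)]
rotated = [(y, 18 - x) for x, y in game]       # the same game, board rotated

print(md5_sig(game) == md5_sig(rotated))              # plain md5 differs
print(canonical_sig(game) == canonical_sig(rotated))  # canonical form agrees
```

Because the eight symmetries form a group, transforming a game first only permutes the eight candidate signatures, so the minimum is unchanged.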

Operations

(for -md5, -can, -M, -s*, -pat=)
-truncN, -trunc-N
Truncate to N moves, resp. remove the final N moves.
-rotN
Rotate left over N times 90 degrees. (N=0,1,2,3)
-traN
Apply transformation N (N=0,1,...,7), one of the 8 symmetries of the board. Here -rotM = -traN with N=2M. See also sgftf.
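
As a rough illustration of what these operations do to a move list, here is a small Python sketch. The function names are hypothetical, coordinates are zero-based (x, y) pairs with the origin in the top left corner as in SGF, and reading "rotate left" as a counterclockwise rotation is an assumption.

```python
def truncate(moves, n):
    """n >= 0: keep the first n moves (-truncN).
    n < 0: remove the final -n moves (-trunc-N)."""
    if n >= 0:
        return moves[:n]
    return moves[:len(moves) + n] if -n < len(moves) else []

def rotate_left(point, size, times=1):
    """Rotate a point counterclockwise over `times` * 90 degrees."""
    x, y = point
    for _ in range(times % 4):
        x, y = y, size - 1 - x
    return (x, y)

moves = [(15, 3), (3, 15), (16, 16), (2, 3), (5, 2)]
print(truncate(moves, 2))              # first two moves only
print(truncate(moves, -2))             # last two moves removed
print(rotate_left((18, 0), 19))        # top right corner moves to top left
```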

Reference file

Instead of specifying the date or the md5 signature (etc.), one can give a reference file F and ask for "the same as in F". The file F is specified using -ref F. The syntax for "the same" is @. Thus,
% sgfinfo -ref thisfile -DsA=@ -DsB=@ *.sgf
will find all games in the current directory with the same Dyer signature as thisfile.

Miscellaneous

-E
Set exit status of the program to 0 (true) if precisely one file was found that matched the requirements, to -1 if no file was found, to 1 if several files were found. Normally, the exit status is 0 when all was OK, and nonzero in case of a problem.
-b
Bare output: print values all on a single line, without labels. This is now the default.
+b
Multiline output: print values one per line, with labels.
-nf
No filename. Suppress printing the filenames (when other output was requested).
-i
Ignore errors. Normally, these programs exit on error, so that you can fix your files. When building a large database, or searching a large number of SGF files, there will be many problems and it is easier to ignore problematic files and continue.
-q
Be a bit more quiet.
-t
Trace: report input read. Sometimes this helps to find at what point of the input file there is a problem.
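
When the -E exit status is inspected from a script, it helps to remember that on POSIX systems a parent process only sees the low 8 bits of the status, so an exit status of -1 is observed as 255. The following Python sketch interprets such a status; the function name describe_E_status and the returned strings are hypothetical, and how error conditions interact with -E is not covered.

```python
def describe_E_status(returncode):
    """Interpret the exit status of a run with the -E option.

    A process exiting with -1 is reported to its parent as 255,
    because only the low 8 bits of the status are passed on.
    """
    status = returncode & 0xFF
    if status == 0:
        return "exactly one match"
    if status == 255:
        return "no match"
    if status == 1:
        return "several matches"
    return "unexpected status"

for code in (0, 1, 255, -1):
    print(code, describe_E_status(code))
```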

sgfdb

The program sgfdb takes the input files, parses them, and stores the result in a database. Typical calls:
% sgfdb -o foo.sgfdb *.sgf
% sgfdb -i -q -r .
% sgfdb -i -r -e ".mgt" .
Here -r asks for a recursive tree walk and finds all .sgf files. Use -e EXT to specify a different (or no) extension. The -i flag asks to ignore errors. Without it an error causes an abort. The -q flag asks not to report errors in the SGF.

sgfdbinfo

Roughly, sgfdbinfo [options] database is equivalent to sgfinfo [options] files when the database was created using sgfdb -o database files. The default database is out.sgfdb and need not be specified explicitly. For recursive searches sgfdbinfo uses the default extension .sgfdb instead of .sgf.

Creating the database takes about the same time as searching. Searching in the database is (for me) 20 to 40 times as fast. The database typically takes half a kB per game.

Presently there are some differences between the results of sgfinfo and sgfdbinfo, mainly because sgfdb only preserves the moves, but strips comments and other fields, so that the -prop and -propXY options only work with sgfinfo.