The UGF format
On Japanese sites one encounters records of go games
in the UGF format. The files have the extension .ugi.
I have never seen a specification of this format.
Below is a description of what can be gleaned from a handful
of example files.
converts from UGF/UGI to SGF.
UGZ files are UGF files compressed using the LHA compression algorithm.
On my Linux machine, the default lha decompressor is not
able to handle these files, but the files I have seen start with
PP%D-lh5-, and if I remove the first two bytes, all is well:
% file wa22-507.ugz
% dd if=wa22-507.ugz ibs=1 skip=2 of=zzz
% file zzz
zzz: LHa (2.x) archive data [lh5]
% lha x zzz
wa22-507.ugf - Melted : o
% rm zzz
% ls -l wa22-507*
-rw-rw-r-- 1 aeb aeb 1634 Jun 21 2000 wa22-507.ugf
-rw-rw-r-- 1 aeb aeb 851 Oct 20 22:58 wa22-507.ugz
that is, lha turns the UGZ file minus 2 bytes into a UGF file
(with extension .ugf, not .ugi).
So the tiny script (let me call it ugz2ugf)
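if [ $# = 0 ]; then echo "Call: ugz2ugf files"; exit 1; fi
TMP=$(mktemp) || exit 1     # temporary file for the stripped archive
for i in "$@"; do
    dd if="$i" ibs=1 skip=2 of="$TMP" 2>/dev/null   # drop the first two bytes
    lha x "$TMP"
done
rm -f "$TMP"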
unpacks UGZ files into UGF files.
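The two-byte trick can also be sketched in Python. This is only an illustration under the assumptions stated above: the function name strip_ugz_prefix is made up here, and the check relies on the -lh5- signature sitting at offset 4, as in the files described. The actual decompression is still left to an external lha program.

```python
def strip_ugz_prefix(data: bytes) -> bytes:
    """Return the LHA archive inside a UGZ file by dropping its 2-byte prefix.

    Assumption: the file looks like the examples described above,
    i.e. two extra bytes followed by an LHA header with method -lh5-.
    """
    # In "PP%D-lh5-" the method signature occupies offsets 4..8.
    if data[4:9] != b"-lh5-":
        raise ValueError("no -lh5- signature at offset 4; unexpected UGZ layout")
    return data[2:]
```

The result can be written to a temporary file and unpacked with lha x, exactly as the shell script does.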
A UGF file consists of a number of sections. Each section starts
with a header line giving the section name enclosed in square brackets.
The section end is not designated. A section ends when the next one
starts, or at end-of-file. The last line need not end with a newline.
At the start of the file there may be comment lines starting with '#'.
Lines are separated as usual on the system (by \r, \n, or \r\n).
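The section structure just described is simple enough to split with a few lines of Python. The following is a minimal sketch, not a reference parser: the name split_sections is invented for this illustration, empty lines are dropped, and anything before the first header line (such as the '#' comments) is ignored.

```python
import re

def split_sections(text):
    """Split UGF text into a list of (section_name, lines) pairs, in file order."""
    sections = []
    current = None
    # Accept \r\n, \r and \n as line separators.
    for line in re.split(r"\r\n|\r|\n", text):
        header = re.fullmatch(r"\[(\w+)\]\s*", line)
        if header:
            # A header line opens a new section and implicitly ends the previous one.
            current = (header.group(1), [])
            sections.append(current)
        elif current is not None and line:
            current[1].append(line)
        # Lines before the first header (e.g. '#' comments) are skipped.
    return sections
```

A list of pairs is used rather than a dictionary so that the order of the sections is preserved and a repeated section name would not be lost.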
All .ugi files I have seen contained a [Header]
section, and almost all a [Data] section.
Some contain further sections.
The order seems to be [Header], [Remote],
[Files], [Data], [Figure].
Also [MessageLine] and [ReviewNode] occur.
The sections [Header], [Remote] and
[Files] consist of the header line followed by a number
of variable definitions, one per line, of the form name=value.
If the value is structured, it consists of a series of items
separated by commas. If an item is structured, it consists of
a series of subitems separated by semicolons.
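As an illustration of the name=value layout, here is a small Python sketch that splits one definition line into items and subitems. The function name parse_definition is hypothetical. Note the assumption that every comma and semicolon is a separator; free-text values such as comments may well contain commas of their own, so a real converter would have to treat those fields specially.

```python
def parse_definition(line):
    """Split 'name=value' into (name, items), each item being a list of subitems."""
    # Split on the first '=' only, so the value itself may contain '='.
    name, _, value = line.partition("=")
    # Items are separated by commas, subitems by semicolons.
    items = [item.split(";") for item in value.split(",")]
    return name, items
```

Every item is returned as a list, also when it has a single subitem, so that callers need not distinguish the structured from the unstructured case.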
The Header section
The Header section consists of the line [Header]
followed by variable definitions. For example,
Below is an alphabetical list of the variables encountered so far.
- See PlayerB.
- A label or serial number. Often empty or 0.
The first game in the first round of the 2001 World Amateur Go Championship
commented game in the PandaNet Library has Code=PLB-0001.
- Empty or IGS or JPN.
For type IGS the 2nd coordinate is flipped compared to SGF.
- Comment. E.g. 半コウ黒勝ち.
- Commentator. E.g. M. Redmond,8-dan,,0.
Sometimes the Header and Remote sections each have
a Commentator= field.
- See RedirectURL.
- See RedirectURL.
- The entity asserting copyright over the game record.
- Encryption. Two or three fields. The first is 0 or
a 4-digit decimal integer.
The second is always PLAIN_UGF_FILE.
The third, when present, is READ_ONLY or READ_WRITE.
- Start and end date and time of the game.
- Handicap and komi. E.g. 0,5.50 or 0,6.50.
- Character set (not language!). Seen empty and SJIS.
Also when this field is empty, the character set is often SJIS.
To be more precise, I encounter CP932, a.k.a. windows-31J,
the Microsoft extension to the Japanese standard SJIS.
For example, Chen Yaoye 陳耀燁 is spelled with fb 59 for the
3rd radical, and this is in CP932 but not in SJIS.
- The number of moves (or 0 or 999).
- Place of the event. E.g. Nihon Ki-in Tokyo Japan.
- Black player. In older files one also sees the equivalent BMemb1=.
The value is structured with four fields, of which the first two
are name and rank (or country). The third is 0 or a serial number.
- Second Black player, e.g. in pairgo.
- White player. One also sees the equivalent WMemb1=.
- Second White player.
- Probably an indication of time available and byo-yomi system.
E.g. I9015;600,I9015;600,JJ0,JJ0.
There are four fields of which the first two are structured
(and equal in all my examples).
Maybe the first two indicate the time available to each player,
and the last two the time actually used (in minutes).
- Some UGF files have only a Header section, with the five
variables Ver=, CommunityAuth=,
RedirectURL=, CommunityID=, UGIKey=.
The RedirectURL gives a URL XORed with a constant string.
This is used in cases where some kind of authorization is required.
- The rule set used. E.g. JPN.
- The board size. E.g. 19 or 9.
- Name of the event.
E.g. ,pair go RICOH CUP 2006,Quarterfinal or
23rd,JAL Cup World Amateur Go Championship,8th round.
We see three parts: year or number in a series, actual name, round.
- See RedirectURL.
- Version. Usually UGF,100 or UGF2,100
or UGF3,200 or PANDA-EGG.
- Result of the game (winner and score).
E.g. B,2.5 or W,2.50
or B,T or B,C or B,F2
or P,E or P,0.
In SGF notation the first five would have been
B+2.5, W+2.5, B+T, B+R, B+F. Probably the digit in B,F2
indicates a reason. In this case it was an illegal ko capture.
The P,- may be a game that is being played now.
For the first field, Jan van Loenen's code also knows about
D (draw), O (both lost), N (no result),
P (playing), A (abandoned), E (other).
Usually each is followed by ,E.
A void game with triple ko was marked N1,E.
- See PlayerW.
- Name of a person, sometimes of a program.
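The correspondence between the result values above and SGF can be sketched as follows. Only the five cases spelled out above are translated; everything else (P, D, O, N, A, E and anything unrecognized) is mapped to "?", which SGF uses for an unknown result. The function name result_to_sgf is made up for this example.

```python
def result_to_sgf(value):
    """Translate a UGF result value such as 'B,2.5' into an SGF result string."""
    fields = value.split(",")
    who = fields[0]
    how = fields[1] if len(fields) > 1 else ""
    if who not in ("B", "W"):
        return "?"                 # P, D, O, N, A, E, ...: not translated here
    if how == "T":
        return who + "+T"          # win on time
    if how == "C":
        return who + "+R"          # resignation
    if how.startswith("F"):
        return who + "+F"          # forfeit; the digit giving the reason is dropped
    try:
        # Numeric margin: normalise 2.50 to 2.5.
        return "{}+{:g}".format(who, float(how))
    except ValueError:
        return "?"
```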
The Remote section
Like [Header], the [Remote] section consists
of a series of variable definitions. For example
This example had an empty Copyright= field in the Header section,
but a copyright notice here. The three fields PhotoB,
PhotoW and AdvertisingIcon refer to photographs
given in the Files section. Sometimes there is also a PhotoC
(of the commentator).
The Files section
In almost all games I have seen, if there was a [Files] section,
there were precisely three entries. The [Remote] section
would contain the lines
and then the [Files] section would be
where the (long) lines are photographs of the players
and an icon, all as hexdump of a JPEG image.
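Turning such a hexdump back into an image takes one call in Python. The sketch below assumes that the hex string has already been isolated from its line (how the line is labelled is not covered here) and that the image is a JPEG, which starts with the bytes FF D8. The name decode_hex_image is hypothetical.

```python
def decode_hex_image(hexdump: str) -> bytes:
    """Convert a hexdump string from the [Files] section into raw image bytes."""
    data = bytes.fromhex(hexdump.strip())
    # Every JPEG file starts with the start-of-image marker FF D8.
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("decoded data does not start with the JPEG marker FF D8")
    return data
```

The returned bytes can be written to a file opened in binary mode and viewed with any image viewer.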
The Data section
The [Data] section gives the moves, one per line,
e.g. AL,W2,250,0, where this denotes
move number 250, which is W[al], by the 2nd White player.
The player color is B or W. The following digit indicates
the player. For an ordinary game it will always be 1.
For pairgo one expects the pattern 1,1,2,2,1,1,2,2,1,1,2,2,...
Larger numbers may occur for a multi-player relay game.
Move numbers count from 1.
The fourth field gives the time used in seconds.
Coordinates here count from left and bottom when CoordinateType=IGS,
and from left and top when CoordinateType=JPN.
A pass is indicated by YA, e.g. YA,B1,221,0.
Once seen: YZ,MK,278,0.
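The coordinate rules can be put into a short Python sketch that converts a UGF coordinate into an SGF one. Several things here are assumptions: the letters are taken to run consecutively from A (consistent with AL corresponding to [al] above), any CoordinateType other than IGS is treated like JPN, and a pass is returned as the empty string, which is one of the ways SGF writes a pass. Unexplained values such as YZ are rejected. The function name ugf_to_sgf is invented for this illustration.

```python
def ugf_to_sgf(coord, size=19, coordinate_type="JPN"):
    """Convert a two-letter UGF coordinate to SGF; a pass (YA) becomes ''."""
    if coord == "YA":
        return ""
    col = ord(coord[0]) - ord("A")   # 0-based, counted from the left
    row = ord(coord[1]) - ord("A")   # 0-based, from the top (JPN) or bottom (IGS)
    if not (0 <= col < size and 0 <= row < size):
        raise ValueError("coordinate %r is off the board" % coord)
    if coordinate_type == "IGS":
        # IGS counts rows from the bottom, SGF from the top: flip.
        row = size - 1 - row
    return chr(ord("a") + col) + chr(ord("a") + row)
```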
Some files have move lines like QO,B1,20,0 '00:01:13
and ZZ,W1,283,0 '00:15:00, where the parts after the quote
are per-player cumulative times used.
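A move line in the common form can be taken apart as in the following Python sketch. The names Move and parse_move are made up for this example, and the field meanings are the ones guessed above. Odd lines such as YZ,MK,278,0 do not fit the pattern and make the function raise ValueError, so a caller has to decide what to do with them.

```python
from typing import NamedTuple, Optional

class Move(NamedTuple):
    coord: str            # two-letter coordinate, e.g. "AL", or "YA" for a pass
    color: str            # "B" or "W"
    player: int           # 1 for an ordinary game, 1 or 2 in pairgo
    number: int           # move number, counting from 1
    seconds: int          # time used for this move
    clock: Optional[str]  # cumulative time after the quote, if present

def parse_move(line):
    """Parse a [Data] line such as "QO,B1,20,0 '00:01:13"."""
    # An optional cumulative time follows a single quote.
    body, _, clock = line.partition("'")
    coord, who, number, seconds = body.strip().split(",")[:4]
    return Move(coord, who[0], int(who[1:]), int(number), int(seconds),
                clock.strip() or None)
```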
The MessageLine section
Some of the pairgo files examined had a [MessageLine] section.
For example, the line
.Text,121,125,Rotation error from 121 to 124.
indicates that moves 121-124 of that game
deviate from the 1,1,2,2 pattern.
The Figure section
Instead of the hane, Black A would have been more severe. The cuts at O-9 and P-6 are miai, so White is in trouble.
This section gives the per-move comments.
There can be .Text and .Fig subsections.
Also the Comment section can have .Fig subsections.
A .Fig subsection consists of a series of moves
(setup moves with move number 0 and numbered moves)
and a .Text subsubsection. The first parameter
of .Fig is the move number this figure is related to,
but these figures are not variations in the SGF sense of the word.
They are just diagrams that have some relation to the current game.
Inside a .Text (sub)subsection, there can be
.# lines defining the point referred to.