The UGF format

On Japanese sites one encounters records of go games in the UGF format. The files have the extension .ugi. I have never seen a specification of this format. Below is a description of what can be gleaned from a handful of example files.

The utility ugi2sgf converts from UGF/UGI to SGF.

UGZ

UGZ files are UGF files compressed using the LHA compression algorithm. On my Linux machine, the default lha decompressor is not able to handle these files, but the files I have seen start with PP%D-lh5-, and if I remove the first two bytes, all is well:
% file wa22-507.ugz
wa22-507.ugz: data
% dd if=wa22-507.ugz ibs=1 skip=2 of=zzz
% file zzz
zzz: LHa (2.x) archive data [lh5]
% lha x zzz
wa22-507.ugf    - Melted   :  o
% rm zzz
% ls -l wa22-507*
-rw-rw-r-- 1 aeb aeb 1634 Jun 21  2000 wa22-507.ugf
-rw-rw-r-- 1 aeb aeb  851 Oct 20 22:58 wa22-507.ugz
that is, lha turns the UGZ file minus 2 bytes into a UGF file (with extension .ugf, not .ugi).

So the tiny script (let me call it ugz2ugf)

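#!/bin/sh
# ugz2ugf: unpack UGZ files into UGF files in the current directory
if [ $# -eq 0 ]; then echo "Call: ugz2ugf files" >&2; exit 1; fi
TMP=/tmp/ugz2ugf.$$
for i in "$@"; do
  # drop the 2-byte prefix so that lha recognizes the archive
  dd if="$i" ibs=1 skip=2 of="$TMP" 2>/dev/null
  lha x "$TMP"
done
rm -f "$TMP"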
unpacks UGZ files into UGF files.
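
The prefix stripping can also be sketched in Python, which makes it easy to check the archive signature before handing the result to lha. This is only a sketch based on the few files described above: the function name strip_ugz_prefix is made up here, and the check assumes the -lh5- method string that those files showed at offset 4.

```python
def strip_ugz_prefix(data: bytes) -> bytes:
    """Return the LHA archive inside a UGZ file by dropping 2 bytes.

    Assumption: as in the files described above, the data starts with
    "PP%D-lh5-", i.e. a 2-byte prefix followed by an LHA header whose
    method string "-lh5-" sits at offset 4 of the UGZ file.
    """
    if data[4:9] != b"-lh5-":
        raise ValueError("no -lh5- signature at offset 4")
    return data[2:]
```

Writing the returned bytes to a temporary file and running lha x on it then corresponds to what the shell script does.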

UGF

A UGF file consists of a number of sections. Each section starts with a header line giving the section name enclosed in square brackets. The end of a section is not marked: a section ends when the next one starts, or at end-of-file. The last line need not end with a newline. At the start of the file there may be comment lines starting with '#'. Lines are separated as usual on the system (by \r, \n, or \r\n).
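
The layout just described can be split into sections with a few lines of Python. This is a minimal sketch, not a reference parser: the name split_sections is hypothetical, the input is assumed to be already decoded text, and anything before the first section header (such as '#' comment lines) is simply dropped.

```python
import re

# A section header is a line consisting of a name in square brackets.
SECTION_RE = re.compile(r"^\[([^\]]+)\]\s*$")


def split_sections(text: str) -> dict:
    """Split UGF text into {section name: list of lines}."""
    sections = {}
    current = None
    # Accept \r\n, \r and \n as line separators.
    for line in re.split(r"\r\n|\r|\n", text):
        m = SECTION_RE.match(line)
        if m:
            # A repeated section name extends the earlier list.
            current = sections.setdefault(m.group(1), [])
        elif current is not None:
            current.append(line)
        # Lines before the first header are ignored.
    return sections
```

For instance, split_sections("# c\r\n[Header]\r\nSize=19\r[Data]\nAL,W2,250,0") yields a dictionary with the keys Header and Data.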

The sections

All .ugi files I have seen contained a [Header] section, and almost all a [Data] section. Some contain further sections. The order seems to be [Header], [Remote], [Files], [Data], [Figure]. Also [MessageLine] and [ReviewNode] occur.

Variables

The sections [Header], [Remote] and [Files] consist of the header line followed by a number of variable definitions, one per line, of the form name=value. If the value is structured, it consists of a series of items separated by commas. If an item is structured, it consists of a series of subitems separated by semicolons.
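
The nesting of items and subitems can be illustrated by a small Python sketch. The name parse_variable is made up for this example. Note that not every value is really structured: free text such as a Comment= value may contain commas of its own, so a real reader would apply the splitting only to variables known to be structured.

```python
def parse_variable(line: str):
    """Parse 'name=value' into (name, items).

    items has one entry per comma-separated item, and each entry is the
    list of its semicolon-separated subitems.
    """
    # Split at the first '=' only: values such as URLs may contain '='.
    name, sep, value = line.partition("=")
    if not sep:
        raise ValueError(f"not a variable definition: {line!r}")
    items = [item.split(";") for item in value.split(",")]
    return name, items
```

With this, parse_variable("Ptime=J;0;0;0,J;0;0;0,0,0") gives the name Ptime and four items, the first two of which have four subitems each.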

The Header section

The Header section consists of the line [Header] followed by variable definitions. For example,
[Header]
Ver=UGF3,200
Lang=SJIS
Crypt=0,PLAIN_UGF_FILE,READ_WRITE
Code=
Title=2013,...
Place=Miyagi Japan
Date=2013/08/01,10:00:00,2013/08/01,22:05:00
Rule=JPN
Size=19
Hdcp=0,6.50
Ptime=J;0;0;0,J;0;0;0,0,0
Winner=W,C
Moves=0
Writer=
Copyright=
CoordinateType=IGS
Comment=
PlayerB=山下 敬吾,七段,0,
PlayerW=...

Below is an alphabetical list of the variables encountered so far.

BMemb1=
See PlayerB.
Code=
A label or serial number. Often empty or 0. The first game in the first round of the 2001 World Amateur Go Championship has Code=WA01-101. The first commented game in the PandaNet Library has Code=PLB-0001.
CoordinateType=
Empty or IGS or JPN. For type IGS the 2nd coordinate is flipped compared to SGF.
Comment=
Comment. E.g. 半コウ黒勝ち.
Commentator=
Commentator. E.g. M. Redmond,8-dan,,0. Sometimes the Header and Remote sections each have a Commentator= field.
CommunityAuth=
See RedirectURL.
CommunityID=
See RedirectURL.
Copyright=
The entity asserting copyright over the game record.
Crypt=
Encryption. Two or three fields. The first is 0 or a 4-digit decimal integer. The second is always PLAIN_UGF_FILE. The third, when present, is READ_ONLY or READ_WRITE.
Date=
Start and end date and time of the game.
Hdcp=
Handicap and komi. E.g. 0,5.50 or 0,6.50.
Lang=
Character set (not language!). Seen empty and SJIS. Even when this field is empty, the character set is often SJIS. To be more precise, what I encounter is CP932, a.k.a. Windows-31J, the Microsoft extension of the Japanese standard SJIS. For example, the 3rd character of Chen Yaoye 陳耀燁 is encoded as fb 59, which is in CP932 but not in SJIS.
Moves=
The number of moves (or 0 or 999).
Place=
Place of the event. E.g. Nihon Ki-in Tokyo Japan.
PlayerB=
Black player. In older files one also sees the equivalent BMemb1=. The value is structured with four fields, of which the first two are name and rank (or country). The third is 0 or a serial number.
PlayerB2=
Second Black player, e.g. in pairgo.
PlayerW=
White player. One also sees the equivalent WMemb1=.
PlayerW2=
Second White player.
Ptime=
Probably an indication of time available and byo-yomi system. E.g. I9015;600,I9015;600,JJ0,JJ0 or I;90;15;600,I;90;15;600,90,90 or J;300;60;5,J;300;60;5,292,287 or N;0;10;30,N;0;10;30,0,0. There are four fields of which the first two are structured (and equal in all my examples). Maybe the first two indicate the time available to each player, and the last two the time actually used (in minutes).
RedirectURL=
Some UGF files have only a Header section, with the five variables Ver=, CommunityAuth=, RedirectURL=, CommunityID=, UGIKey=. The RedirectURL gives a URL XORed with a constant string. This is used in cases where some kind of authorization is required.
Rule=
The rule set used. E.g. JPN.
Size=
The board size. E.g. 19 or 9.
Title=
Name of the event. E.g. ,pair go RICOH CUP 2006,Quarterfinal or 2008,国際新鋭囲碁対抗戦,1回戦 or 23rd,JAL Cup World Amateur Go Championship,8th round. We see three parts: year or number in a series, actual name, round.
UGIKey=
See RedirectURL.
Ver=
Version. Usually UGF,100 or UGF2,100 or UGF3,200 or PANDA-EGG.
Winner=
Result of the game (winner and score). E.g. B,2.5 or W,2.50 or B,T or B,C or B,F2 or P,E or P,0. In SGF notation the first five would have been B+2.5, W+2.5, B+T, B+R, B+F. The digit in B,F2 probably indicates the reason for the forfeit; in this case it was an illegal ko capture. A value starting with P may denote a game that is still being played. For the first field, Jan van Loenen's code also knows about D (draw), O (both lost), N (no result), P (playing), A (abandoned), E (other). Usually each of these is followed by ,E. A void game with triple ko was marked N1,E.
WMemb1=
See PlayerW.
Writer=
Name of a person, sometimes of a program.
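
Two of the entries above, Lang= and Winner=, translate directly into code. The Python sketch below uses made-up function names (decode_ugf, winner_to_sgf). It treats an empty or SJIS Lang value as CP932, which is an assumption that holds only "often" according to the observations above, and it converts only the five result forms whose SGF equivalents are listed under Winner=; everything else is returned as None rather than guessed.

```python
import re


def decode_ugf(raw: bytes, lang: str = "") -> str:
    """Decode UGF bytes, reading Lang empty or SJIS as CP932."""
    if lang in ("", "SJIS"):
        # Undecodable bytes become U+FFFD instead of raising an error.
        return raw.decode("cp932", errors="replace")
    raise ValueError(f"unknown Lang value: {lang!r}")


def winner_to_sgf(value: str):
    """Translate a Winner= value into an SGF result, or None if unsure."""
    fields = value.split(",")
    if len(fields) != 2 or fields[0] not in ("B", "W"):
        return None                      # e.g. P,E or N1,E
    color, how = fields
    if re.fullmatch(r"\d+(\.\d+)?", how):
        return f"{color}+{float(how):g}"  # 2.50 is written as 2.5
    if how == "T":
        return f"{color}+T"              # win on time
    if how == "C":
        return f"{color}+R"              # win by resignation
    if re.fullmatch(r"F\d*", how):
        return f"{color}+F"              # forfeit; the digit is dropped
    return None
```

For the header example above, winner_to_sgf("W,C") gives W+R.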

The Remote section

Like [Header], the [Remote] section consists of a series of variable definitions. For example
[Remote]
HostName=telnet:live.pandanet.co.jp:28155
Player=
Commentator=
PhotoB=meijin38-1-iyama
PhotoW=meijin38-1-yamashita
PhotoC=
AdvertisingIcon=asahicom-igo
AdvertisingPage=http://www.asahi.com/igo/
Notice=(C) 朝日新聞社
NoticePage=http://www.asahi.com/
AutoSpeed=1000
SendInfo=0
SoundKey=
SoundDLDelay=0
SoundDLHostName=
CloseURLGuest=https://sec.pandanet.co.jp/asp/forms/asahicom/enquete38.asp?k=1
CloseURLMember=https://sec.pandanet.co.jp/asp/forms/asahicom/enquete38.asp?k=1&e=Y
This example had an empty Copyright= field in the Header section, but a copyright notice here. The three fields PhotoB, PhotoW and AdvertisingIcon refer to photographs given in the Files section. Sometimes there is also a PhotoC (of the commentator). Another example:
[Remote]
HostName=IGS-PandaNet
Player=!ricohcup2
...

The Files section

In almost all games I have seen, if there was a [Files] section, there were precisely three entries. The [Remote] section would contain the lines
PhotoB=foofile
PhotoW=barfile
AdvertisingIcon=iconfile
and then the [Files] section would be
[Files]
foofile=FFD8FFE0...
barfile=FFD8FFE0...
iconfile=FFD8FFE0...
where the (long) lines are photographs of the players and an icon, each given as a hex dump of a JPEG image.
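
Recovering an image from such a line is a matter of undoing the hex dump. A minimal Python sketch follows; the function name decode_file_entry is hypothetical, and the JPEG check merely looks for the FF D8 start-of-image marker seen in the examples.

```python
def decode_file_entry(line: str):
    """Turn 'name=HEXDIGITS' from [Files] into (name, data, is_jpeg)."""
    name, sep, hexdump = line.partition("=")
    if not sep:
        raise ValueError("not a name=value line")
    # bytes.fromhex raises ValueError if the text is not valid hex.
    data = bytes.fromhex(hexdump.strip())
    # JPEG data starts with the bytes FF D8.
    is_jpeg = data[:2] == b"\xff\xd8"
    return name, data, is_jpeg
```

The data returned for the name given in PhotoB= can then be written to a file such as foofile.jpg.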

The Data section

The [Data] section gives the moves, one per line, e.g. AL,W2,250,0, which denotes move number 250, W[al], played by the 2nd White player. The four fields are the coordinates, the player, the move number and the time. The player color is B or W. The digit that follows indicates which player of that color moved. For an ordinary game it will always be 1. For pairgo one expects the pattern 1,1,2,2,1,1,2,2,1,1,2,2,... Larger numbers may occur for a multi-player relay game. Move numbers count from 1. The fourth field gives the time used in seconds.

Coordinates here count from left and bottom when CoordinateType=IGS, and from left and top when CoordinateType=JPN.
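
The conversion to SGF coordinates can be sketched in Python as follows. The function name ugf_to_sgf is made up, and the sketch rests on an assumption not confirmed above: that the letters A, B, C, ... stand for 1, 2, 3, ... with no letter skipped. An empty CoordinateType is not handled, since its meaning is not known.

```python
def ugf_to_sgf(coord: str, size: int = 19, coordinate_type: str = "IGS") -> str:
    """Convert a two-letter UGF coordinate to a two-letter SGF point.

    Assumption: A, B, C, ... mean 1, 2, 3, ... (no letter is skipped).
    """
    if len(coord) != 2:
        raise ValueError(f"bad coordinate: {coord!r}")
    col = ord(coord[0]) - ord("A")
    row = ord(coord[1]) - ord("A")
    if not (0 <= col < size and 0 <= row < size):
        raise ValueError(f"{coord!r} is not on a {size}x{size} board")
    if coordinate_type == "IGS":
        # IGS rows count from the bottom, SGF rows from the top.
        row = size - 1 - row
    elif coordinate_type != "JPN":
        raise ValueError(f"unhandled CoordinateType: {coordinate_type!r}")
    return chr(ord("a") + col) + chr(ord("a") + row)
```

On a 19x19 board, AL becomes al for type JPN and ah for type IGS.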

A pass is indicated by YA, e.g. YA,B1,221,0. Once seen: YZ,MK,278,0.

Some files have move lines like QO,B1,20,0 '00:01:13 and ZZ,W1,283,0 '00:15:00, where the parts after the quote are per-player cumulative times used.
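
A move line as described in this section can be taken apart with a regular expression. The Python sketch below uses hypothetical names (Move, parse_move_line) and assumes that the quoted part has the form hh:mm:ss. Lines that do not fit the pattern, such as YZ,MK,278,0, yield None instead of a guess.

```python
import re
from typing import NamedTuple, Optional

MOVE_RE = re.compile(
    r"^([A-Z]{2}),([BW])(\d+),(\d+),(\d+)"
    r"(?:\s+'(\d+):(\d\d):(\d\d))?\s*$"
)


class Move(NamedTuple):
    coord: str            # two letters, e.g. AL; YA is a pass
    color: str            # B or W
    player: int           # 1 normally, 1 or 2 in pairgo
    number: int           # move number, counting from 1
    seconds: int          # fourth field: time used in seconds
    clock: Optional[int]  # quoted cumulative time in seconds, if present
    is_pass: bool


def parse_move_line(line: str) -> Optional[Move]:
    m = MOVE_RE.match(line)
    if m is None:
        return None
    coord, color, player, number, seconds, hh, mm, ss = m.groups()
    clock = None
    if hh is not None:
        clock = int(hh) * 3600 + int(mm) * 60 + int(ss)
    return Move(coord, color, int(player), int(number), int(seconds),
                clock, coord == "YA")
```

For example, parse_move_line("QO,B1,20,0 '00:01:13") has clock equal to 73.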

The MessageLine section

Some of the pairgo files examined had a [MessageLine] section. For example
[MessageLine]
 .Text,121,125,Rotation error from 121 to 124.
indicates that moves 121–124 in
FL,B1,117,0
HK,W1,118,0
EK,B2,119,0
MM,W2,120,0
NM,B2,121,0
NL,W1,122,0
LN,B1,123,0
LQ,W2,124,0
MQ,B2,125,0
LR,W1,126,0
GK,B2,127,0
MR,W2,128,0
NR,B1,129,0
FK,W1,130,0
deviate from the 1,1,2,2 pattern.
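
Such rotation errors can be found mechanically. The Python sketch below (with the made-up name rotation_errors) assumes a pairgo game without handicap in which Black plays move 1 and the order is strictly B1, W1, B2, W2 from the first move on; it reports every move whose color or player differs from that expectation.

```python
def rotation_errors(moves):
    """Return the move numbers that break the B1, W1, B2, W2 rotation.

    moves is an iterable of (color, player, number) tuples.
    Assumption: Black plays move 1 and the rotation starts there.
    """
    bad = []
    for color, player, number in moves:
        expected_color = "B" if number % 2 == 1 else "W"
        # Moves 1,2 belong to player 1, moves 3,4 to player 2, and so on.
        expected_player = 1 if ((number - 1) // 2) % 2 == 0 else 2
        if (color, player) != (expected_color, expected_player):
            bad.append(number)
    return bad
```

Applied to moves 117 to 130 above, this strict check reports moves 121, 123 and 125, all within the range 121 to 125 named in the .Text line.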

The Figure section

For example,
[Figure]
.Text,0,1,0
この対局には観戦記があります
.EndText

.Text,135,2,0
135手まで、
張栩名人が中押し勝ち
.EndText
or
...
.Text,107
Instead of the hane, Black A would have been more severe. The cuts at O-9 and P-6 are miai, so White is in trouble.
.#,13,6,A
.EndText
...
This section gives the per-move comments. There can be .Text and .Fig subsections. The Comment section can also have .Fig subsections.

A .Fig subsection consists of a series of moves (setup moves with move number 0 and numbered moves) and a .Text subsubsection. The first parameter of .Fig is the move number this figure is related to, but these figures are not variations in the SGF sense of the word. They are just diagrams that have some relation to the current game.

Inside a .Text (sub)subsection, there can be .# lines defining the point referred to.
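
The .Text blocks can be collected with a simple loop. The Python sketch below uses the hypothetical name parse_text_blocks and keeps all parameters as raw strings, since only the first parameter (the move number) is understood. It does not interpret .Fig subsections: the moves inside them are skipped, and a .Text nested in a .Fig is collected like any other.

```python
def parse_text_blocks(lines):
    """Collect .Text ... .EndText blocks from the lines of a section.

    Each block is a dict with the raw .Text parameters ("args"), the
    comment lines ("text") and the parameters of any .# lines ("marks").
    """
    blocks = []
    current = None
    for line in lines:
        if line.startswith(".Text"):
            current = {"args": line.split(",")[1:], "text": [], "marks": []}
        elif line.startswith(".EndText"):
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None:
            if line.startswith(".#,"):
                current["marks"].append(tuple(line.split(",")[1:]))
            else:
                current["text"].append(line)
        # Lines outside a .Text block are ignored here.
    return blocks
```

For the second example above, the block for move 107 has args equal to ["107"] and a single mark ("13", "6", "A").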

Mail corrections and additions to aeb@cwi.nl.