Computer program detects author gender
Simple algorithm suggests words and syntax bear sex and genre
stamp.
18 July 2003
PHILIP BALL
A. S. Byatt confuses the computer; will it see through George Eliot?
A new computer program can tell whether a book was written by a man or a
woman. The simple scan of key words and syntax is around 80% accurate on both
fiction and non-fiction [1,2].
The program's success seems to confirm the stereotypical perception of
differences in male and female language use. Crudely put, men talk more about
objects, and women more about relationships.
Female writers use more pronouns (I, you, she, their, myself), say the
program's developers, Moshe Koppel of Bar-Ilan University in Ramat Gan, Israel,
and colleagues. Males prefer words that identify or determine nouns (a, the,
that) and words that quantify them (one, two, more).
So this article has probably already betrayed its author as male through
sentences such as this: there is a prevalence of plural pronouns
(they, them), indicating the male tendency to categorize rather than
personalize.
If I were female, the researchers imply, I'd be more likely to write
sentences like this, which assume that you and I share common knowledge or
engage us in a direct relationship. These differing styles have previously been
called 'informational' and 'involved', respectively.
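To make the idea concrete, here is a minimal sketch of how such word counting might work. It is not the researchers' program: the word lists contain only the examples quoted above, and the function names (marker_rates, lean) and the simple "whichever rate is higher" rule are assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative word lists built only from the examples quoted in the article;
# the researchers' real feature set is not given here.
PRONOUNS = {"i", "you", "she", "their", "myself"}
DETERMINERS = {"a", "the", "that"}
QUANTIFIERS = {"one", "two", "more"}


def marker_rates(text):
    """Return the share of tokens that fall in each marker category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    involved = sum(counts[w] for w in PRONOUNS)
    informational = sum(counts[w] for w in DETERMINERS | QUANTIFIERS)
    return {"involved": involved / total, "informational": informational / total}


def lean(text):
    """Hypothetical rule: report whichever style has the higher marker rate."""
    rates = marker_rates(text)
    if rates["involved"] > rates["informational"]:
        return "involved"
    if rates["informational"] > rates["involved"]:
        return "informational"
    return "undecided"


if __name__ == "__main__":
    print(lean("I told you she lost their keys myself"))
    print(lean("The report lists one bridge and two more that cross a river"))
```

A crude comparison like this only hints at a text's style. The reported 80% accuracy comes from an algorithm trained on many features, not from a hand-picked list of a dozen words.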
Koppel and colleagues trained their algorithm on a few test cases to
identify the most prevalent fingerprints of gender and of fiction and
non-fiction. They then set it searching for these fingerprints in 566
English-language works in a variety of genres, ranging from A Guide to Prague
to A. S. Byatt's novel Possession - which, intriguingly, the program
misclassified by gender, along with Kazuo Ishiguro's The Remains of the
Day.
Strikingly, the distinctions between male and female writers are much
the same as those that, even more clearly, differentiate non-fiction and
fiction. The program can tell these two genres apart with 98% accuracy. This
is perhaps unsurprising, given that non-fiction is more informational and
fiction more involved.
Most of the works studied were published after 1975. The Israeli team
now intends to probe whether the differences extend further back in time - and
so whether George Eliot was wasting her time disguising herself with a male nom
de plume - and also whether they occur in other languages.