Consider the UK political landscape.
Because of its ancient first-past-the-post voting system, the UK tends to have a small number of parties: two large ones, plus a handful of regional and other parties.
Because there are so few parties, the two main ones tend to be very broad, each a sort of pre-arranged coalition of interests.
Normally the UK parties are described on a left-right axis, running from left through centre to right, with Labour on the left, the Libdems in the centre and the Tories on the right.
Because there is a large group of people who would never vote Tory, and another large group who would never vote Labour, the parties tend to drift towards the centre, where the voters who switch between parties are found.
You could describe each British party by a single number from -1 (left) to 1 (right), giving its approximate position on this axis:
Labour: -0.25
Libdem: 0
Tory: 0.7
Another axis might reflect their current position on Europe, running from anti-European to pro-European, with the Tories at the anti end, Labour in the middle and the Libdems at the pro end:
Tory: -1
Labour: 0
Libdem: 1
You could then build a two-dimensional picture of the parties by combining these axes:
Labour: (-0.25, 0)
Libdem: (0, 1)
Tory: (0.7, -1)
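Treating each party as a pair of numbers means you can measure how far apart two parties are. A minimal Python sketch, assuming ordinary straight-line (Euclidean) distance is a sensible measure of "far apart" (the text does not commit to a particular measure):

```python
import math

# (left-right, anti/pro-Europe) positions from the text, each on a -1..1 scale.
parties = {
    "Labour": (-0.25, 0.0),
    "Libdem": (0.0, 1.0),
    "Tory": (0.7, -1.0),
}

def distance(a, b):
    """Straight-line (Euclidean) distance between two positions."""
    return math.dist(a, b)

# Print every pair of parties with the distance between them.
names = sorted(parties)
for i, p in enumerate(names):
    for q in names[i + 1:]:
        print(p, q, round(distance(parties[p], parties[q]), 3))
```

On these numbers Labour and the Libdems come out closest (about 1.03 apart) and the Libdems and Tories furthest apart (about 2.12), mostly because of the Europe axis. `math.dist` needs Python 3.8 or later.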
There is nothing essential about using -1 to +1 as the range.
You could just as well use 0 to 1 with the same effect, with 0.5 representing 'in the middle'; each value x simply becomes (x + 1)/2:
Labour: (0.375, 0.5)
Libdem: (0.5, 1)
Tory: (0.85, 0)
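Moving from the -1..1 scale to the 0..1 scale is just the mapping (x + 1)/2 applied to every coordinate. A short Python sketch that reproduces the numbers above and checks what "the same effect" means here: every distance between parties is exactly halved, so which parties are close to which is unchanged.

```python
import math

# Positions on the -1..1 scale, from the text: (left-right, Europe).
positions = {
    "Labour": (-0.25, 0.0),
    "Libdem": (0.0, 1.0),
    "Tory": (0.7, -1.0),
}

def rescale(x):
    """Map -1..1 onto 0..1: -1 -> 0, 0 -> 0.5, 1 -> 1."""
    return (x + 1) / 2

rescaled = {name: tuple(rescale(v) for v in pos) for name, pos in positions.items()}
for name, pos in rescaled.items():
    print(name, tuple(round(v, 3) for v in pos))

# Sanity check: every pairwise distance on the new scale is half the old one.
for a in positions:
    for b in positions:
        old = math.dist(positions[a], positions[b])
        new = math.dist(rescaled[a], rescaled[b])
        assert math.isclose(new, old / 2, abs_tol=1e-12)
```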
More modern voting systems, such as proportional representation, allow a greater range of parties.
For instance, the Netherlands had 25 parties standing at the last election, of which 15 won seats.
Displaying them on a single left-right axis is much less informative.
One way they are displayed there is on two axes: left-right and progressive-conservative.
So you could represent each party on this diagram by two coordinates. For instance, D66, roughly the equivalent of the UK Libdems, sits at about (0, 0.5).
The CDA and the VVD are very close on the above diagram, both similar to the (pre-Brexit) Conservatives, but the CDA are Christian, and the VVD secular.
So you could add another dimension of religion.
Two parties, the Dutch Labour Party and the Green-Left party, considered themselves close enough to join forces, at least for the election; the main difference between them was on the environment.
So you could add the environment as a dimension, or Europeanism versus nationalism.
Similarly there's a party for older people, and one for animal rights, and so on.
The website that produced the above image helps voters discover which party they should vote for.
It asks 30 questions and, on that basis, tells you which parties you are closest to.
This means that they use 30 dimensions to represent the parties, so really the 'semantics' of a party is a list of 30 numbers.
Your position is also a list of 30 numbers, and then a good match is the party that is the 'nearest' to you in those 30 dimensions.
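The matching step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the website actually scores answers: the party names and answers are hypothetical, there are only 5 statements instead of 30, each answer is coded -1 (disagree), 0 (neutral) or 1 (agree), and "nearest" means straight-line distance.

```python
import math

# Hypothetical answers to 5 statements (the real site asks 30).
# -1 = disagree, 0 = neutral, 1 = agree. Names and answers are made up.
party_answers = {
    "Party A": [1, 1, -1, 0, 1],
    "Party B": [-1, 0, 1, 1, -1],
    "Party C": [0, 1, 0, -1, 1],
}

def rank_parties(voter, parties):
    """Return (party, distance) pairs, nearest party first."""
    scored = [(name, math.dist(voter, answers)) for name, answers in parties.items()]
    return sorted(scored, key=lambda pair: pair[1])

me = [1, 1, 0, -1, 1]
for name, d in rank_parties(me, party_answers):
    print(f"{name}: {d:.2f}")
```

For this voter, Party C comes first (distance 1.00), then Party A (about 1.41) and Party B (about 3.74). The code works unchanged with 30 answers per list.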
You could subtract the lists of numbers for two parties, and get a list of numbers that would expose the differences in approach between them, or between a party and you.
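Subtracting two answer lists position by position gives exactly that list of differences. A Python sketch using the same kind of hypothetical, made-up answers (-1 disagree, 0 neutral, 1 agree): a zero entry means the two agree on that statement, and the larger the magnitude, the sharper the disagreement.

```python
# Hypothetical answer lists, made up for illustration: -1 disagree, 0 neutral, 1 agree.
party_a = [1, 1, -1, 0, 1]
party_b = [-1, 0, 1, 1, -1]

def difference(a, b):
    """Element-wise a - b; the lists must be the same length."""
    if len(a) != len(b):
        raise ValueError("answer lists must have the same length")
    return [x - y for x, y in zip(a, b)]

diff = difference(party_a, party_b)
# Statements (numbered from 1) on which the two parties are furthest apart.
largest = max(abs(v) for v in diff)
biggest = [i + 1 for i, d in enumerate(diff) if abs(d) == largest]
print(diff, biggest)
```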
We are very bad at visualising anything above 3 dimensions, so the site reduces the picture to the two axes above.
Computers don't have that problem, so they can find clusters and tell you your semantic 'distance' from the various parties.
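"Finding clusters" can be done in many ways; k-means is one common, simple method, used here as a stand-in since the text does not say which technique is used. A minimal Python sketch with six hypothetical parties at made-up positions in 3 dimensions, split into k = 2 clusters. The starting centres are simply the first k points, which keeps the example deterministic but is not how more robust implementations initialise.

```python
import math

# Six hypothetical parties, each a made-up position in 3 dimensions.
points = [
    (0.9, 0.8, 0.7), (-0.8, -0.9, -0.7), (0.8, 0.9, 0.9),
    (-0.9, -0.7, -0.8), (0.7, 0.8, 0.8), (-0.7, -0.8, -0.9),
]

def kmeans(points, k, iterations=10):
    """Basic k-means: return a cluster number for each point."""
    centres = [list(p) for p in points[:k]]  # naive initialisation: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda i: math.dist(p, centres[i])) for p in points]
        # Update step: move each centre to the mean of its members.
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centre if a cluster ends up empty
                centres[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels

print(kmeans(points, k=2))
```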
This is the basis of the method GPT programs use to represent the meaning of words: each word has a list of numbers, each number representing that word's position on a particular axis of meaning.
Words that are synonyms or near-synonyms are then close to each other in the semantic space.
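A toy version in Python makes "close in semantic space" concrete. The 4-number vectors below are invented for illustration (real models learn hundreds or thousands of numbers per word or word piece), and closeness is measured with cosine similarity, a common choice for word vectors, where 1 means pointing the same way and -1 pointing opposite ways.

```python
import math

# Invented 4-number vectors; a real model learns these from text.
vectors = {
    "big":    [0.9, 0.1, 0.0, 0.2],
    "large":  [0.85, 0.15, 0.05, 0.25],
    "small":  [-0.9, 0.1, 0.0, 0.2],
    "banana": [0.0, 0.9, 0.8, -0.1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(word):
    """The other word whose vector is most similar to this one."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))

print(most_similar("big"))
```

Here "big" comes out closest to "large" (similarity about 0.99), while "small" points almost the opposite way because of the first number, which in this toy plays the role of a big-small axis.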
There are two notable things:
The axes likely include male-female, big-small, young-old, singular-plural, and so on, but because machine learning is so good at spotting patterns that we can't even see, there are probably axes that we don't even have a name for.
There are interesting properties of those lists of numbers: you can do a sort of arithmetic on them.
For instance, you can subtract Man from Woman:
D = Woman - Man
The resulting list of numbers then represents the semantic 'difference' between the words Woman and Man. The extraordinary thing is that you can do things with this difference. For instance
Father + D
gives you a position very close to Mother.
Similarly
Uncle + D
gives you a position very close to Aunt.
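The arithmetic can be tried on toy vectors in Python. The three numbers per word are invented so that they loosely mean male-female, parent and parent's sibling; real embeddings are learned and far larger. Note the direction of the subtraction: to move from Father towards Mother, D has to point from male to female, that is, D = Woman - Man. As is common in such analogy tests, the input words themselves are excluded when searching for the nearest word.

```python
import math

# Invented 3-number vectors; axes loosely mean (male..female, parent, parent's sibling).
words = {
    "man":    [1.0, 0.0, 0.0],
    "woman":  [-1.0, 0.0, 0.0],
    "father": [1.0, 1.0, 0.0],
    "mother": [-0.9, 1.0, 0.1],
    "uncle":  [0.9, 0.2, 1.0],
    "aunt":   [-1.0, 0.2, 0.9],
}

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def nearest_word(target, exclude=()):
    """Word whose vector is closest (Euclidean) to target, skipping excluded words."""
    candidates = (w for w in words if w not in exclude)
    return min(candidates, key=lambda w: math.dist(target, words[w]))

D = sub(words["woman"], words["man"])  # D = Woman - Man
print(nearest_word(add(words["father"], D), exclude={"father", "man", "woman"}))
print(nearest_word(add(words["uncle"], D), exclude={"uncle", "man", "woman"}))
```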
Another example is
F = Pizza - Italy
If you add F to Germany
Germany + F
you get a position very close to Bratwurst.
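The same recipe works for the food example. Here F is Pizza minus Italy, roughly a "national dish" direction with the country taken out, so adding it to Germany should land near Germany's equivalent. Again the vectors are invented (the three numbers loosely mean Italian, German and food), and the input words are excluded from the search. Reversing the subtraction shows why the order matters.

```python
import math

# Invented vectors; the numbers loosely mean (Italian, German, food).
vocab = {
    "italy":     (1.0, 0.0, 0.0),
    "germany":   (0.0, 1.0, 0.0),
    "pizza":     (0.9, 0.0, 1.0),
    "pasta":     (1.0, 0.1, 0.9),
    "bratwurst": (0.1, 0.9, 1.0),
    "beer":      (0.0, 1.0, 0.6),
}

def analogy(base, plus, minus):
    """Nearest word to base + (plus - minus), ignoring the three input words."""
    target = [b + p - m for b, p, m in zip(vocab[base], vocab[plus], vocab[minus])]
    candidates = [w for w in vocab if w not in {base, plus, minus}]
    return min(candidates, key=lambda w: math.dist(target, vocab[w]))

print(analogy("germany", plus="pizza", minus="italy"))  # Germany + (Pizza - Italy)
print(analogy("germany", plus="italy", minus="pizza"))  # Germany + (Italy - Pizza)
```

The first line finds "bratwurst"; the reversed subtraction points away from food and, in this toy vocabulary, lands nearest "beer" instead.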
So when GPTs produce the next word, they do not choose it on the basis of syntax alone (as we have been doing up to now); they also use meaning to help choose it.