To tell the history of programming languages, we first have to talk about the history of computers.

The first computer so cheap that they gave it away on the cover of a magazine
The Elliott ran for about a decade, 24 hours a day.
How long do you think it would take the Raspberry Pi Zero to duplicate that amount of computing?
The Raspberry Pi is about one million times faster...
The Raspberry Pi is not only one million times faster. It is also one millionth the price.
A factor of a million million times better.
A terabyte is a million million bytes: nowadays we talk in terms of very large numbers.
Want to guess how long a million million seconds is?
A really big number...
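How big, exactly? A quick back-of-the-envelope sketch in Python, converting a million million seconds into years:

```python
# A million million = 10**12 seconds, expressed in years.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365.25  # using 365.25-day years

def seconds_to_years(seconds):
    return seconds / SECONDS_PER_YEAR

years = seconds_to_years(10**12)
print(f"{years:,.0f} years")  # about 31,700 years
```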
In fact a million million times improvement is about what you would expect from Moore's Law over 58 years.

In 1965 Gordon Moore proposed that the number of 'components' on a chip would double per year at constant price (and size).
In 1975, he adjusted it to 18 months.
In 2025 Moore's Law has turned 60 years old.
Or: Moore's Law is 40 iterations of itself old.
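A quick check of that arithmetic: 40 doublings is 2⁴⁰, which is indeed about a million million:

```python
# 40 iterations of Moore's Law = 40 doublings.
doublings = 40
factor = 2 ** doublings
print(f"{factor:,}")  # 1,099,511,627,776 — about a million million
```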
The first time I heard that Moore's Law was nearly at an end was in 1977, from no less than Grace Hopper, at Manchester University.
Since then I have heard many times that it was close to its end, or even has already ended (including recently again).
People get confused about what Moore's Law says. The orange line on top is Moore's Law, not the others.

So a million million times improvement is about what you would expect from Moore's Law over 58 years.
Except: the Raspberry Pi is two million times smaller as well, so it is much better than even that.
This probably has to do with how early computers were priced.
When Edison introduced electric light in the home, he charged not what it cost to produce the electricity, but what it would cost you to produce the same light without electricity, but now with extra advantages, such as instant-on, and less danger.
Similarly, the first computers were not priced on how much they cost to produce, but on how much you would have to pay to do the same work without computers.
In the 50's, computers were so expensive that nearly no one bought them, nearly everyone leased them.
To rent time on a computer then would cost you on the order of $1000 per hour: several times the annual salary of a programmer!
When you leased a computer in those days, you would get programmers for free to go with it.
Compared to the cost of a computer, a programmer was almost free.
In the 50's, since the computer's time was expensive, a programmer would:
Why? Because it was much cheaper to let three people check it than to let the computer discover the errors.
In the very beginning, there was only machine code.
Each machine type had different instruction codes.
Each machine instruction is a combination of instruction code and operand(s).
001 5635 015 1 002 5635
where the first number is the operation, and the second the operand.
In the very early days, programmers had to memorise those codes, deal with memory storage, and write entirely in numerical form.
It is obvious that it is easier to program without having to memorise those codes.
It is also obvious that it is easier if the computer assigns addresses to storage locations for you.
LOAD I
ADDC 1
STOR I
but in fact there were people at the time who thought this was a complete waste of (expensive) computer time, when the programmer could do it just as well, for far less money.
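To make the step from numeric codes to symbolic names concrete, here is a toy sketch of what such a translator did. The opcode numbers and the address of I are hypothetical, chosen only to match the numeric example above:

```python
# A toy illustration of early symbolic translation: mnemonics and names
# are turned into numeric instruction codes and addresses.
# The opcode table and the address of I are hypothetical examples.

OPCODES = {"LOAD": "001", "ADDC": "015", "STOR": "002"}

def assemble(program, symbols):
    """Translate [(mnemonic, operand), ...] into numeric machine code."""
    machine_code = []
    for mnemonic, operand in program:
        # Names refer to storage locations; bare numbers are literals.
        address = symbols.get(operand, operand)
        machine_code.append(f"{OPCODES[mnemonic]} {address}")
    return machine_code

symbols = {"I": "5635"}  # the computer assigns I a storage address
program = [("LOAD", "I"), ("ADDC", "1"), ("STOR", "I")]
print(assemble(program, symbols))
# ['001 5635', '015 1', '002 5635'] — the numeric form shown earlier
```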
Since every machine type was different, if you wanted to move a program to a different machine, you had to translate it from the one machine code to another.
So one advantage of inventing programming languages was that you could more easily move the programs, share them, or sell them. Also they were much easier to read.
There were still people who thought it was a waste of computer resources, and that they could personally produce better code than the compiler (right into the 70's!)
The first programming languages were designed in the 50s, with the economic relationship of computer and programmer in mind.
It was much cheaper to let the programmer spend lots of time producing a program than to let the computer do some of the work for you.
Programming languages were designed so that you tell the computer exactly what to do, in its terms, not what you want to achieve in yours.
In the mid 50's the first important programming languages were:
COBOL: for business applications, used on mainframes in industries like banking, aviation, insurance, government, and healthcare.
"English-like"
Still widely in use today (in 300+ versions); maybe 800 billion lines of code remain in production.
Big problems now with maintenance, and programmers retiring.
CALC1.
    EVALUATE FUNCTION UPPER-CASE(INPUT-1)
        WHEN "END"
            MOVE "END" TO INPUT-1
        WHEN "LOAN"
            PERFORM CALCULATE-LOAN
        WHEN "PVALUE"
            PERFORM CALCULATE-VALUE
        WHEN OTHER
            DISPLAY "Invalid input: " INPUT-1
    END-EVALUATE.
Fortran: designed for 'scientific programming'.
Since been modernised (several times).
Still in use in some specialist cases, but mostly replaced by more modern languages.
90 WRITE (5,100)
100 FORMAT(1X,'Would you like to review the data?'$)
READ (5,110) LITERL(1)
110 FORMAT(A1)
IF (LITERL(1).EQ.'N') GOTO 140
DO 130 I=1,N
WRITE (5,120) I, X(I), Y(1,I)
120 FORMAT(1X,'DATA PAIR',I3,'). ',2F20.10)
130 CONTINUE
140 WRITE (5,150)
150 FORMAT(1X,'Would you like to make any changes?'$)
READ (5,110) LITERL(1)
IF (LITERL(1).EQ.'N') GOTO 200
Lisp: functional programming, a major area of study.
Versions of Lisp still widely used in some special areas.
(defun prime (n)
  (and (> n 1)
       (forall (nums 2 (floor (sqrt n)))
               #'(lambda (divisor) (not (= (mod n divisor) 0))))))
Algol: for describing algorithms. "Machine independent".
Developed partly at CWI in Amsterdam, where the first Algol compiler was written.
Most modern programming languages are descended from Algol, including C, Python, Java, Javascript, and many many more.
procedure Transpose(a) Order:(n) ; value n ;
array a ; integer n ;
begin real w ; integer i, k ;
  for i := 1 step 1 until n do
    for k := 1+i step 1 until n do
    begin w := a[i,k] ;
      a[i,k] := a[k,i] ;
      a[k,i] := w
    end
end Transpose
This was the age of hundreds of new programming languages being researched and defined. For instance
C: for low-level systems programming, in particular the operating system Unix
Pascal: a newer version of Algol
Basic: a very simple and easy-to-implement interactive language, running on very cheap and slow machines
10 INPUT "What is your name: "; U$
20 PRINT "Hello "; U$
30 INPUT "How many stars do you want: "; N
40 S$ = ""
50 FOR I = 1 TO N
60 S$ = S$ + "*"
70 NEXT I
80 PRINT S$
90 INPUT "Do you want more stars? "; A$
100 IF LEN(A$) = 0 THEN GOTO 90
110 A$ = LEFT$(A$, 1)
120 IF A$ = "Y" OR A$ = "y" THEN GOTO 30
130 PRINT "Goodbye "; U$
140 END
10% of the cost of programming is writing the code.
90% is debugging.
The number of bugs grows not with the number of lines of code L, but with L^1.5
If a program is 10 times longer, it is 30 times as difficult to write.
Put another way: if a program is 1/10th the size, it costs 3% of the larger program.
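The claimed scaling (difficulty growing with the number of lines raised to the power 1.5) can be checked directly; this sketch just evaluates the two figures quoted above:

```python
# Difficulty as a function of relative program size, assuming the
# size**1.5 scaling claimed above.
def relative_difficulty(size_ratio):
    return size_ratio ** 1.5

print(round(relative_difficulty(10), 1))      # 31.6: ~30x harder at 10x the size
print(round(1 / relative_difficulty(10), 3))  # 0.032: ~3% of the cost at 1/10 the size
```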
Compact programming languages have fewer bugs!
Languages that report errors before running have fewer bugs.
Low-level::Medium-level::High-level
Compiled::Interpreted
Checks before running::Dynamic
Simple data::Structured data
Procedural::Functional
You know from mathematics:
(a+b)² = a² + 2ab + b²

Some versions of functional programming are like mathematics.
I have some lists
a = {1; 2; 3}
b = {3; 4}
c = {6; 7; 8; 9}
If I have a list, I can get its size:
#a = 3
I can concatenate lists:
a ++ b = {1; 2; 3; 3; 4}
I can also have lists of lists:
{a; b; c}
I can get a list of their sizes by saying
#*{a; b; c} = {3; 2; 4}
This is called a map.
If I have a list of numbers, I can add them up
+/{3; 2; 4} = 9
This is called a reduce.
So I can find the sum of the sizes of the lists by saying:
+/#*{a; b; c} = 9
But I can also combine the lists:
++/{a; b; c} = {1; 2; 3; 3; 4; 6; 7; 8; 9}
to concatenate the three lists into one.
And so I can also say
#++/{a; b; c} = 9
So in other words, these two are equivalent
+/#*list = #++/list
In functional programming you can prove these are equal.
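The same identity, sketched in Python for comparison: len plays the role of #, map of *, and concatenation of ++.

```python
# +/#*list = #++/list: the sum of the sizes of the lists
# equals the size of their concatenation.
from itertools import chain

a = [1, 2, 3]
b = [3, 4]
c = [6, 7, 8, 9]
lists = [a, b, c]

sum_of_sizes = sum(map(len, lists))        # +/#*list
size_of_concat = len(list(chain(*lists)))  # #++/list

print(sum_of_sizes, size_of_concat)  # 9 9
assert sum_of_sizes == size_of_concat
```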
Most languages now in use are based on Algol. For instance
Java: compiled, checks; mostly for applications
Javascript: interpreted, no checks; mostly for web pages
Python: interpreted, some checks; for applications. Developed at CWI in Amsterdam.
It happened slowly, almost unnoticed, but nowadays we have the exact opposite position:
Compared to the cost of a programmer, a computer is almost free.
I call this Moore's Switch.

Relative costs of computers and programmers, 1957-now
But, we are still programming in programming languages that are direct descendants of the languages designed in the 1950s!
We are still telling the computers what to do.
A new way of programming: declarative programming.
This describes what you want to achieve, but not how to achieve it.
Declarative approaches describe the solution space.
A classic example is when you learn in school that
The square root of a number is a number that multiplied by itself gives the original number.
This doesn't tell you how to calculate the square root; but no problem, because we have machines to do that for us.
function f a: {
x ← a
x' ← (a + 1) ÷ 2
eps ← 1.19209290e-07
while abs(x − x') > eps × x: {
x ← x'
x' ← ((a ÷ x') + x') ÷ 2
}
return x'
}
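The same procedure as runnable Python, for comparison (the eps value is taken from the pseudocode above; it is Heron's method for square roots):

```python
# Heron's (Newton's) method: iterate x' = (a/x' + x')/2 until the
# estimates stop changing by more than a relative tolerance.
def heron_sqrt(a, eps=1.19209290e-07):
    x = a
    x_next = (a + 1) / 2
    while abs(x - x_next) > eps * x:
        x = x_next
        x_next = ((a / x_next) + x_next) / 2
    return x_next

print(heron_sqrt(2))  # ≈ 1.4142135...
```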
This is why documentation is so essential in procedural programming, because there is such a large gap between the problem space and the solution space.

1000 lines, almost all of it administrative. Only 2 or 3 lines have anything to do with time.
And this was the smallest example I could find. The largest was more than 4000 lines.
type clock = (h, m, s)
  displayed as circled(combined(hhand; mhand; shand; decor))
shand = line(slength) rotated (s × 6)
mhand = line(mlength) rotated (m × 6)
hhand = line(hlength) rotated (h × 30 + m ÷ 2)
decor = ...
slength = ...
...
clock c
c.s = system:seconds mod 60
c.m = (system:seconds div 60) mod 60
c.h = (system:seconds div 3600) mod 24
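The three time equations of the declarative clock can be checked in Python; system:seconds is stood in for by a sample value:

```python
# The clock's three defining equations: seconds, minutes, and hours
# derived from a running seconds counter.
def clock_fields(seconds):
    s = seconds % 60
    m = (seconds // 60) % 60
    h = (seconds // 3600) % 24
    return h, m, s

print(clock_fields(45296))  # (12, 34, 56) — i.e. 12:34:56
```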

XForms is a declarative programming language for defining applications.
I've got a position in the world as x and y coordinates, and I want to display the map tile of that location at a certain zoom.
My data:
x, y, zoom
Openstreetmap has an interface for getting such a thing:
http://openstreetmap.org/<zoom>/<x>/<y>.png
However, the Openstreetmap coordinate system changes at each level of zoom.
As you zoom out, there are in each axis half as many tiles, so there are ¼ as many tiles. And the interface indexes tiles, not locations.
So to get a tile:
The data: x, y, zoom
scale = 2^(26 − zoom)
tilex = floor(x/scale)
tiley = floor(y/scale)
url = concat("http://tile.openstreetmap.org/", zoom, "/", tilex, "/", tiley, ".png")
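The same computation as a Python sketch. It reads the scale as a power of two, 2^(26 − zoom), to match the halving of tiles per zoom level; the constant 26 (the size of the assumed coordinate space) and the sample coordinates are assumptions for illustration:

```python
# Map a world position (x, y) at a given zoom to an OpenStreetMap tile URL.
# Assumes coordinates in a fixed 2**26-unit space, so scale = 2**(26 - zoom).
def tile_url(x, y, zoom):
    scale = 2 ** (26 - zoom)
    tilex = x // scale          # floor(x / scale)
    tiley = y // scale          # floor(y / scale)
    return f"http://tile.openstreetmap.org/{zoom}/{tilex}/{tiley}.png"

print(tile_url(x=33554432, y=22369621, zoom=10))
# http://tile.openstreetmap.org/10/512/341.png
```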
That is really all that is needed (except the syntax, which looks like this:)
<bind ref="tilex" calculate="floor(../x div ../scale)"/>
That's the form. Now the content:
<input ref="zoom" label="zoom"/>
<input ref="x" label="x"/>
<input ref="y" label="y"/>
<output ref="url" mediatype="image/*"/>
and the tile will be updated each time any of the values change.
Programming languages have been around for 70 years.
There is still reason to design new ones, in particular to make the programmer's life easier.