A Short Primer on Assemblers, Compilers and Interpreters

Beginnings

In the early days of computing, hardware was expensive and programmers were cheap. In fact, programmers were so cheap they weren’t even called “programmers”; they were usually mathematicians or electrical engineers. Early computers were used to solve complex mathematical problems quickly, so mathematicians were a natural fit for the job of “programming”.

First a little background on what a program is.

Computers can’t do anything by themselves; they require programs to drive their behavior. Programs can be thought of as very detailed recipes that take an input and produce an output. The steps in the recipe are composed of instructions that operate on data. While that sounds complicated, you probably know how this statement works:

1 + 2 = 3

The plus sign is the “instruction” while the numbers 1 and 2 are the data. Mathematically, the equal sign indicates that both sides of an equation are “equivalent”; however, most computer languages use some variant of equals to mean “assignment”. If a computer were executing that statement, it would store the result of the addition, the “3”, somewhere in memory.
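
To make that distinction concrete, here is a quick sketch in Python (a language we’ll meet again at the end of this primer): a single equal sign performs assignment, while a doubled one asks whether two things are equivalent.

    x = 1 + 2       # "=" means assignment: compute 1 + 2 and store the result in x
    print(x == 3)   # "==" asks whether the two sides are equivalent; prints True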

Computers know how to do math with numbers and move data around the machine’s memory hierarchy. I won’t say too much about memory except to say it generally comes in two different flavors: fast/small, and slow/big. CPU registers are very fast, very small and act as scratch pads. Main memory is typically very big and not nearly as fast as register memory. CPUs shuffle the data they are working with from main memory to registers and back again while a program executes.

Assembler

Computers were very expensive and people were cheap. Programmers spent endless hours translating hand-written math into computer instructions that the computer could execute. The very first computers had terrible user interfaces, some consisting only of toggle switches on the front panel. The switches represented 1s and 0s in a single “word” of memory. The programmer would configure a word, indicate where to store it, and then commit the word to memory. It was time-consuming and error-prone.

Eventually, an electrical engineer decided his time wasn’t cheap and wrote a program whose input was a recipe expressed in terms people could read and whose output was a computer-readable version. This was the first “assembler” and it was very controversial. The people who owned the expensive machines didn’t want to “waste” compute time on a task that people were already doing, albeit slowly and with errors. Over time, people came to appreciate the speed and accuracy of the assembler versus a hand-assembled program, and the amount of “real work” done with the computer increased.

While assembler programs were a big step up from toggling bit patterns into the front panel of a machine, they were still pretty specialized. The addition example from above might have looked something like this:

   01  MOV R0, 1       ; move the value 1 into register R0
   02  MOV R1, 2       ; move the value 2 into register R1
   03  ADD R0, R1, R2  ; add R0 and R1, placing the result in R2
   04  MOV R0, 64      ; move the main memory address 64 into R0
   05  STO R2, R0      ; store the contents of R2 at the address held in R0

Each line is a computer instruction, beginning with a shorthand name of the instruction followed by the data the instruction works on. This little program will first “move” the value 1 into a register called R0, then 2 into register R1. Line 03 adds the contents of registers R0 and R1 and stores the resulting value into register R2. Finally, lines 04 and 05 identify where the result should be stored in main memory (address 64). Managing where data is stored in memory is one of the most time-consuming and error-prone parts of writing computer programs.

Compiler

Assembly was much better than writing computer instructions by hand; however, early programmers yearned to write programs the way they were accustomed to writing mathematical formulae. This drove the development of higher-level compiled languages, some of which are historical footnotes while others are still in use today. ALGOL is one such footnote, while real problems continue to be solved today with languages like FORTRAN and C.

These new “high level” languages allowed programmers to write their programs in simpler terms. In the C language, our addition assembly program would be written as:

    int x;
    x = 1 + 2;

The first statement describes a piece of memory that the program will use. In this case, the memory should be the size of an integer and its name is ‘x’. The second statement is the addition, although written “backwards”. A C programmer would read that as “x is assigned the result of one plus two”. Notice the programmer doesn’t need to say where to put ‘x’ in memory; the compiler takes care of that.

A new type of program, called a “compiler”, would turn the program written in a high level language into an assembly language version and then finally run it through the assembler to produce a machine-readable version of the program. This composition of programs is often called a “tool chain”, in that one program’s output is sent directly to another program’s input.
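
To make the “one program’s output is sent directly to another program’s input” idea concrete, here is a purely illustrative Python sketch of a two-stage tool chain. The stage names and their toy translations are hypothetical; no real compiler works this simply.

    def compile_to_assembly(source):
        # A real compiler would translate the source into assembly; this toy
        # stage just hands back the assembly listing from the earlier example.
        return ["MOV R0, 1", "MOV R1, 2", "ADD R0, R1, R2", "MOV R0, 64", "STO R2, R0"]

    def assemble(assembly_lines):
        # A real assembler would emit machine code; this toy stage just reports
        # how many instructions it was given.
        return f"{len(assembly_lines)} machine instructions"

    # The output of one stage feeds directly into the input of the next.
    print(assemble(compile_to_assembly("x = 1 + 2;")))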

The huge advantage of compiled languages over assembly language programs was porting from one computer model or brand to another. In the early days of computing there was an explosion of different types of computing hardware from companies like IBM, Digital Equipment Corporation, Texas Instruments, UNIVAC, Hewlett-Packard and others. None of these computers shared much in common besides needing to be plugged into an electrical power supply. Memory and CPU architectures differed wildly, and it often took man-years to translate programs from one computer to another.

With high level languages, it was only necessary to port the compiler tool chain to the new platform. Once the compiler was available, high level language programs could be re-compiled for the new computer with little or no modification. Compilation of high level languages was truly revolutionary.

Life was now very good for programmers. It was much easier to express the problems they wanted to solve using high level languages. The cost of computer hardware was falling dramatically due to advances in semiconductors and the invention of integrated circuits. Computers were getting faster and more capable in addition to becoming much less expensive. At some point, possibly in the late 80s, there was an inversion and programmers became more expensive than the hardware they used.

Interpreter

Over time, a new programming model arose where a special program called an “interpreter” would read a program and turn it into computer instructions to be executed immediately. The interpreter takes the program as input and interprets it into an intermediate form, much like a compiler. Unlike a compiler, the interpreter then executes the intermediate form of the program. This happens every time an interpreted program runs, whereas a compiled program is compiled only once, and afterward the computer simply executes the machine instructions “as written”.

As a side note, when people say that “interpreted programs are slow”, this repeated interpretation step is the main source of the perceived lack of performance. Modern computers are so amazingly capable that most people usually can’t tell the difference between compiled and interpreted programs.

Interpreted programs, sometimes called “scripts”, are even easier to port to different hardware platforms. Because the script doesn’t contain any machine-specific instructions, a single version of a program can run on many different computers without change. The catch, of course, is that the interpreter must be ported to the new machine to make that possible.

One example of a very popular interpreted language is Python. A complete Python expression of our addition problem would be:

   x = 1 + 2

While it looks and acts much like the C version, it lacks the variable declaration statement. There are other differences which are beyond the scope of this article, but you can see that we are able to write a computer program that is very close to how a mathematician would write it by hand with pencil and paper.
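
If you are curious what the interpreter’s “intermediate form” mentioned above actually looks like, CPython (the standard Python interpreter) includes a dis module that disassembles the bytecode it produces before executing a program. A minimal sketch; the exact output depends on your Python version:

    import dis

    # Disassemble the bytecode CPython produces for our addition before running it;
    # this bytecode is the interpreter's "intermediate form".
    dis.dis("x = 1 + 2")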