4.2.2 TYPES OF PROGRAMMING LANGUAGES (CIE)

4.2.2 | ASSEMBLY LANGUAGE AND MACHINE CODE

Topics from the Cambridge IGCSE (9-1) Computer Science 0984 syllabus 2023 - 2025

OBJECTIVES
Understand that assembly language is a form of low-level language that uses mnemonics, and that an assembler is needed to translate an assembly language program into machine code.

ALSO IN THIS TOPIC
4.1.1 TYPES OF SOFTWARE AND INTERRUPTS
4.1.2 TYPES OF SOFTWARE AND INTERRUPTS
4.1.3 TYPES OF SOFTWARE AND INTERRUPTS
4.1.4 TYPES OF SOFTWARE AND INTERRUPTS
4.2.1 TYPES OF PROGRAMMING LANGUAGES
YOU ARE HERE | 4.2.2 TYPES OF PROGRAMMING LANGUAGES
4.2.3 - 4.2.4 TYPES OF PROGRAMMING LANGUAGES
4.2.5 TYPES OF PROGRAMMING LANGUAGES
SOFTWARE REVISION CARDS
TOPIC 4 KEY TERMINOLOGY
TOPIC 4 ANSWERS
TOPIC 4 TEACHER RESOURCES

LANGUAGE DEVELOPMENT

Programming languages have changed a lot over the last 60 years. One of the first algorithms designed to be computed by a machine was written by Ada Lovelace in the 1840s, nearly 100 years before the first computer was built. Later, in the mid 20th century, when computers were introduced, many programs were first written in machine code: a programmer would first write the algorithm in structured English and then convert the instructions to machine code. Machine code is not easy to read or write. A program to output 'Hello World' in machine code (shown here in hexadecimal) would look something like this:

0x55 0x89 0xe5 0xe8 0xfc 0xff 0xff 0xff
0x83 0xf8 0x41 0x75 0x68 0x0d 0x00 0x00
0x00 0x00 0xe8 0xfc 0xff 0xff 0xff 0x83
0xb8 0x00 0x04 0xc4 0x00 0x00 0x00 0x89
0xec 0x5d 0xc3
CODE FROM: helperByte

ASSEMBLY LANGUAGE

Because machine code is not easy to work with, and would make it very difficult to write complex programs, computer scientists started to develop languages that are easier to use and understand. One of the first steps was assembly language, which uses mnemonics in place of machine code instructions to make programs easier for humans to understand. Assembly language still needs to be translated into machine code by an assembler before the computer can process it. High-level languages also need to be translated, but by a compiler or an interpreter rather than an assembler. The code shown below, adapted from Towards Data Science, is an example program to output 'Hello world' in assembly language.

Assembly language is a low-level computer programming language that provides a direct correspondence between the language's mnemonics and the machine code instructions executed by the computer's central processing unit (CPU). Unlike high-level programming languages, which are translated into machine code by compilers or interpreters, assembly code is translated into machine code by an assembler. The mnemonics used in assembly language serve as shorthand for specific machine code instructions, making it easier for the programmer to remember and write the code. Assembly language gives the programmer control over the hardware and scope for performance optimization, but it is harder to read and maintain than high-level languages. It is classed as a low-level language because each instruction corresponds closely to a machine code instruction for a specific processor.

A mnemonic is a memory aid used to help recall information more easily. In the context of computer programming, a mnemonic is a symbol or abbreviation that represents a machine instruction or operation in an assembly language. Mnemonics are used to make assembly code more readable and easier to remember, as they serve as a shorthand representation of the underlying machine instruction. For example, the mnemonic "MOV" is used to represent the instruction to move data from one location to another, while "ADD" represents the instruction to add two values together. The use of mnemonics in assembly language enables the programmer to write code that is easier to understand and maintain, compared to writing raw machine code.
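The link between mnemonics and machine code can be sketched with a very small assembler written in Python. This is only an illustration: the four mnemonics and their opcode numbers below are invented for the example and do not belong to any real processor, and every operand is assumed to fit in one byte.

```python
# Hypothetical 8-bit instruction set, invented for illustration only.
OPCODES = {
    "LDA": 0x01,  # load a value into the accumulator
    "ADD": 0x02,  # add a value to the accumulator
    "STA": 0x03,  # store the accumulator at a memory address
    "HLT": 0xFF,  # stop the program
}


def assemble(source: str) -> bytes:
    """Translate each assembly line into its machine code bytes."""
    machine_code = bytearray()
    for line in source.splitlines():
        line = line.split(";")[0].strip()  # drop comments and spaces
        if not line:
            continue  # skip blank lines
        parts = line.split()
        mnemonic, operands = parts[0].upper(), parts[1:]
        machine_code.append(OPCODES[mnemonic])  # mnemonic -> opcode byte
        for operand in operands:
            machine_code.append(int(operand, 0))  # operand -> one byte
    return bytes(machine_code)


program = """
LDA 5      ; put 5 in the accumulator
ADD 3      ; add 3 to it
STA 0x10   ; store the result at address 0x10
HLT        ; stop
"""

code = assemble(program)
print(" ".join(f"0x{byte:02x}" for byte in code))
# prints: 0x01 0x05 0x02 0x03 0x03 0x10 0xff
```

Each mnemonic becomes exactly one opcode byte, which is the one-to-one relationship described above. A real assembler does the same kind of lookup, but for the full instruction set of a specific CPU.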

MACHINE CODE

Machine code, also known as machine language, is the lowest-level programming language: it is the set of instructions that a computer's central processing unit (CPU) executes directly. It is not the same as assembly code, which must first be translated by an assembler. Machine code consists of binary digits (0s and 1s) that the CPU can directly decode and execute, without the need for further translation. Each machine instruction is represented by a specific pattern of binary digits, and the CPU decodes these patterns to perform the desired operations. Because long binary patterns are hard to read, machine code is often written out in hexadecimal. Unlike high-level programming languages, which are more abstract and easier for humans to read and write, machine code is difficult for humans to understand and is often considered the most primitive form of computer programming. Programmers rarely write machine code by hand today. Specialized work such as writing device drivers or firmware, where the programmer needs direct control over the hardware and maximum performance, is normally done in a low-level language such as assembly language, which is then translated into machine code.
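To see that a hexadecimal listing is just a shorter way of writing binary, the short Python sketch below converts the first four bytes of the hexadecimal 'Hello World' listing in the LANGUAGE DEVELOPMENT section into 8-bit binary patterns. The function name to_binary is made up for this example.

```python
def to_binary(hex_byte: str) -> str:
    """Convert one hexadecimal byte such as '0x55' to an 8-bit binary string."""
    value = int(hex_byte, 16)  # '0x55' -> 85
    return f"{value:08b}"      # 85 -> '01010101'


# First four bytes of the hexadecimal listing earlier on this page.
hex_listing = "0x55 0x89 0xe5 0xe8"

for hex_byte in hex_listing.split():
    print(hex_byte, "=", to_binary(hex_byte))
# prints:
# 0x55 = 01010101
# 0x89 = 10001001
# 0xe5 = 11100101
# 0xe8 = 11101000
```

Each pair of hexadecimal digits stands for exactly eight binary digits, so the CPU receives the same bit pattern whichever way a human chooses to write it down.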

What is the main difference between high-level and low-level programming languages?
Can you provide an example of a commonly used high-level programming language?
What is the advantage of using a high-level programming language over a low-level language?
What type of tasks are low-level programming languages commonly used for?
Can you explain the concept of abstraction in relation to programming languages?
How does the level of a programming language impact the performance of a program?
Why might a programmer choose to use a low-level programming language over a high-level one?
Can you provide an example of a low-level programming language commonly used for system programming?
What are the disadvantages of using a low-level programming language?
How does the choice of programming language impact the development process?

EXTENDED LEARNING

THE PROCESS FROM HIGH LEVEL TO LOW LEVEL

The process of converting high-level source code to low-level code involves several steps, typically preprocessing, compilation, assembly, and linking. Here's a step-by-step explanation of how a program written in a compiled high-level language such as C or C++ is translated into machine code that a computer can execute:

Preprocessing | Before the actual compilation process begins, the preprocessor takes the source code and processes directives such as #include in C/C++. These directives instruct the preprocessor to include other files, define macros, or conditionally compile parts of the program.
Compilation | The compiler translates the high-level source code into an intermediate form known as assembly language. This step involves parsing the code (checking syntax and building the structure of the code into a parse tree), and then generating a corresponding assembly language code for each instruction. The output at this stage is a set of assembly instructions that are specific to the target processor architecture but still readable by humans.
Assembly | The assembler takes the assembly language code and translates it into machine code, which consists of binary instructions that are specific to the processor’s instruction set architecture (ISA). The output of this step is an object file, which contains machine code but might not yet be executable because it can include placeholders for addresses that need to be resolved.
Linking | Finally, the linker takes one or more object files produced by the assembler and combines them into a single executable program. During this process, the linker resolves references to external libraries or other parts of the program, assigns final memory addresses to functions and variables, and fixes the placeholders left by the assembler. The result is a standalone executable that can be run on the operating system.

Each of these steps is critical for turning human-readable source code into a form that the computer hardware can directly execute. Different programming languages and development environments might optimise or change certain aspects of this process, but fundamentally, these are the steps involved in going from source code to executable machine code.
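The first two steps can be sketched in Python as a toy model. This is a teaching simplification and not how a real compiler is built: the source language only allows lines of the form target = a + b, the only directive is #define, and the mnemonics LDA, ADD and STA belong to an invented accumulator machine.

```python
def preprocess(source: str) -> str:
    """Expand simple '#define NAME value' macros, as a preprocessor would."""
    macros = {}
    output_lines = []
    for line in source.splitlines():
        if line.startswith("#define"):
            _, name, value = line.split()
            macros[name] = value  # remember the macro, emit nothing
        elif line.strip():
            # Replace every word that is a macro name with its value.
            words = [macros.get(word, word) for word in line.split()]
            output_lines.append(" ".join(words))
    return "\n".join(output_lines)


def compile_to_assembly(source: str) -> list[str]:
    """Compile lines of the form 'target = a + b' into toy assembly."""
    assembly = []
    for line in source.splitlines():
        target, _, left, _, right = line.split()
        assembly += [f"LDA {left}", f"ADD {right}", f"STA {target}"]
    return assembly


source = """#define BONUS 3
score = 5 + BONUS
"""

expanded = preprocess(source)
print(expanded)  # prints: score = 5 + 3
for instruction in compile_to_assembly(expanded):
    print(instruction)
# prints:
# LDA 5
# ADD 3
# STA score
```

One line of high-level code becomes three assembly instructions here, which is typical: a single high-level statement usually translates into several low-level instructions.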

CREATING AN EXE FILE

LINKING
After the source code is preprocessed, compiled into assembly, and then assembled into machine code (creating object files), the linker steps in. The linker's job is to take these object files and possibly static libraries (collections of object files) and produce a single executable file. During this process, the linker:

Resolves References | The linker looks at all the external function calls or references to external variables in the object files and finds out where their definitions are. This could be in other object files or in libraries included in the linking process.
Assigns Memory Locations | It assigns final memory addresses to all of the functions and variables. This includes adjusting the code within the object files so that all calls and references to functions and variables point to the correct addresses.
Combines Object Files | All the object files are combined into a single file. If there are any libraries that are statically linked, their code is also included directly into the final executable.
Creates an Executable Header | The linker adds a header to the file, which includes important metadata that the operating system needs to load and run the executable. This includes the type of executable, start of the code, start of the data, size of the executable, and more.
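The linker's four jobs listed above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the two object files, their instructions and the header fields are invented, and each instruction is treated as occupying exactly one memory address.

```python
# Two hypothetical object files. 'defines' maps each symbol the file
# provides to its offset inside that file's own code.
object_files = [
    {"name": "main.o",
     "code": ["CALL print_total", "HLT"],
     "defines": {"main": 0}},
    {"name": "utils.o",
     "code": ["OUT", "RET"],
     "defines": {"print_total": 0}},
]


def link(objects):
    """Combine object files and resolve symbol names to final addresses."""
    symbols = {}
    combined = []
    # Pass 1: combine the code and assign every symbol a final address.
    for obj in objects:
        base = len(combined)  # where this file's code starts
        for symbol, offset in obj["defines"].items():
            symbols[symbol] = base + offset
        combined.extend(obj["code"])
    # Pass 2: replace each symbol reference with its final address.
    linked = []
    for instruction in combined:
        parts = [str(symbols[p]) if p in symbols else p
                 for p in instruction.split()]
        linked.append(" ".join(parts))
    # A minimal header telling the loader where to start and how big it is.
    header = {"entry_point": symbols["main"], "size": len(linked)}
    return header, linked


header, program = link(object_files)
print(header)   # prints: {'entry_point': 0, 'size': 4}
print(program)  # prints: ['CALL 2', 'HLT', 'OUT', 'RET']
```

The placeholder name print_total in main.o has been replaced by address 2, which is where the code from utils.o ended up after the files were combined.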

The output from this process is an executable file that is ready to be run directly by the operating system. On Windows this is an .exe file. On a Mac the linker also produces an executable file, which is usually packaged inside an .app bundle. The executable contains the binary instructions and data needed to run the program as specified by the original high-level source code.
