02 September 2016

Learning about ARM architecture using ARMSim#

ARMSim# is a desktop application that simulates the operation of an ARM-based system. It was designed as an educational tool, which may make it a good choice for learning how the ARM architecture works.

Installation

Go to the ARMSim# home page and download the program. Optional plugins are also available to simulate external hardware, such as an eight-segment display; I didn't download any of these. The ARMSim# 1.91 package includes two standard plugins:

ARMSim# plugins preferences dialog

EmbestBoardPlugin -- simulates the components of the Embest S3CEV40 Evaluation Board.
SWIInstructions -- allows use of SWIs (software interrupts) to perform basic input/output such as reading from stdin and writing to stdout.

Hello, world!

When getting started with a new tool, a traditional programming example is to make a program which prints the string "Hello, world!". In this context, it means we should try to get the string "Hello, world!" to appear on the stdout device, using an SWI call (also known as a supervisor call). Here is one version of the program, based on the code I found in ARM System-on-Chip Architecture, 2nd edition:

; HelloW: ARM assembly language Hello World.
;
        AREA    HelloW, CODE, READONLY
SWI_WriteC      EQU     &0
SWI_Exit        EQU     &11
        ENTRY
START   ADR     r1, TEXT                    
LOOP    LDRB    r0, [r1], #1                ; r0 = *r1; r1 += 1
        CMP     r0, #0
        SWINE   SWI_WriteC                  ; if (r0 != 0) call SWI_WriteC
        BNE     LOOP                        ; if (r0 != 0) goto LOOP
        SWI     SWI_Exit                    ; call SWI_Exit
TEXT    =       "Hello, world!",&0a,&0d,0   ; "Hello, world!\n\r\0"
        END

Here is a brief description for each notation:
  • #n -- Immediate value. e.g. #1 is the value 1.
  • &n -- Hexadecimal constant. e.g. &11 means hexadecimal 11.
  • ; -- Comment introducer. When writing assembly language, I sometimes write C-like pseudocode in comments to remind myself of the meaning of the assembly code.
  • [ reg ] -- Dereference operator. reg is taken to be the address of a location in memory.
  • ADR ("address to register")
  • CMP ("compare")
  • EQU ("equivalent") -- Defines a constant.
  • LDRB ("load register byte") -- Load one byte into a register.
  • SWI ("software interrupt")
  • SWINE ("software interrupt if not equal")

The method used to print the string is as follows:

While there is another character:
   Read the next character into r0. (SWI_WriteC looks in r0; ARMSim# calls this SWI_PrChr.)
   Call SWI_WriteC.
Call SWI_Exit to finish.
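
As a rough sketch, the same loop can be modelled in Python. The names print_string, memory, and out are illustrative only and are not part of ARMSim#; a bytes object stands in for memory and a list stands in for stdout.

```python
def print_string(memory: bytes, r1: int) -> str:
    """Model of the Hello World loop: walk memory from address r1 until a zero byte."""
    out = []                      # stands in for stdout
    while True:
        r0 = memory[r1]           # LDRB r0, [r1], #1  (load one byte ...
        r1 += 1                   # ... then post-increment r1)
        if r0 == 0:               # CMP r0, #0
            break                 # on the terminator, neither SWINE nor BNE executes
        out.append(chr(r0))       # SWINE SWI_WriteC
    return "".join(out)


text = b"Hello, world!\n\r\0"
print(repr(print_string(text, 0)))
```

Note that the zero byte is tested after the load, so the terminator itself is never written.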

To get this program to work in ARMSim, I adapted the code to read as follows:

; HelloW: ARM assembly language Hello World.
; Adapted for running within the ARMSim# simulator.
;
        .equ SWI_WriteC, 0x00       ; This is called SWI_PrChr in the
                                    ; ARMSim# documentation.
        .equ SWI_Exit, 0x11         ; For numerical codes to define these
                                    ; standard functions, see the ARMSim#
                                    ; documentation ARM-Tutorial.pdf.
        .text                       ; AREA HelloW,CODE,READONLY
        .global _start              ; ENTRY
_start:                             ; START
        ldr     r1, =TEXT           ; ADR r1, TEXT
                                    ; See ARMSim# User Guide p23.
LOOP:   ldrb    r0, [r1], #1        ; r0 = *r1; r1 += 1
        cmp     r0, #0
        swine   SWI_WriteC          ; if (r0 != 0) call SWI_WriteC
        bne     LOOP                ; if (r0 != 0) goto LOOP
        swi     SWI_Exit
TEXT:   .ascii "Hello, world!\012\015\000"
        .end                        ; END

Observations

AREA -- This directive is not used in ARMSim.

EQU -- In ARMSim, the .equ directive is used instead.
SWI_WriteC and SWI_Exit are names that the author gave to SWIs 0x0 and 0x11, respectively. SWI 0x0's purpose is to output the character in r0 to stdout, and SWI_Exit is used to finish the program. In ARMSim, the same interrupt numbers are used for the same purpose and are documented in the ARMSim documentation.

ENTRY -- The closest translation of this in ARMSim seems to be to declare a label called _start near the beginning and to mark it using the .global directive. Notice that labels in the original version are declared by simply writing a label name in the first column. In the ARMSim version, labels are declared using C-style label declarations, with a colon following the label's name.

ADR r1, TEXT -- In ARM assembly language, ADR (address-to-register) is a pseudo-instruction that tells the assembler to generate code which loads the memory address TEXT into the specified register r1. In the ARMSim version, ldr is used for this purpose instead. Notice also that the label must be prefixed by an = sign in this version.

TEXT = ... -- In the ARMSim version, I used the label syntax for this purpose along with an .ascii directive. ARMSim does not understand the &0a,&0d,0 notation (line feed, carriage return, zero), so I wrote this using C-style octal escape sequences in a string "\012\015\000", which has the same meaning as the original byte sequence.
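
The equivalence of the two notations is easy to check outside the assembler. Python happens to accept the same C-style octal escapes, so a quick sanity check (the variable names are illustrative) might look like this:

```python
original = bytes([0x0A, 0x0D, 0x00])   # &0a, &0d, 0 in the book's notation
adapted = b"\012\015\000"              # octal escapes, as used with .ascii

assert original == adapted
print(list(adapted))                   # prints [10, 13, 0]
```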

& (ampersand) -- In some places where a hexadecimal constant would have appeared using the & notation, I have replaced it with a C-style 0x... (hexadecimal constant) notation in the ARMSim version. For example, the line

SWI_Exit   EQU    &11

in the original program becomes the following when translated to ARMSim Assembly:

.equ SWI_Exit, 0x11

END -- In ARMSim, this is written .end.

Running the program from the ARMSim GUI

To run the program interactively from ARMSim, use the File → Load menu command to load the assembly file into the GUI. If there are assembly errors, they will be reported in the GUI. If this happens, make the appropriate changes in your text editor (the ARMSim GUI itself includes no editor) and then click File → Reload to check whether ARMSim accepts the changes.

ARMSim# after loading a source file without errors

After the program is deemed to be error-free, the GUI commands for running and debugging the program become available.

Running from the command line

To run the program from a command line, use a command such as

armsim /Stdout:stdout.txt HelloW.s

where stdout.txt is the file in which you want stdout to be stored, and HelloW.s is the file where you saved the Hello World assembly language source. According to the documentation, there is a way to write a script file (using C#) that can perform functions such as reading a location in memory in order to implement automated tests. However, I could not yet get this facility working, so I'm relying on the GUI to run my code.

ARM instruction mnemonics

Even in this simple program I noticed what seems to be a characteristic of the ARM instruction set: instructions can be modified by appending instruction modifier letters to the mnemonic, such as a two-letter condition code that makes the instruction execute only when the condition holds. The notation leads to some strange-looking mnemonics, such as

ble
addgt
swine

The way to read these mnemonics is to identify the base instruction mnemonic and then pronounce the modifying letters separately. For example,

ble -- branch if less than or equal
addgt -- add if greater than
swine -- software interrupt if not equal

There are many combinations possible; I'm sure a comprehensive list can be found in an appropriate manual from ARM. However, for learning purposes, I find it sufficient to experiment by running the code in question in ARMSim using the Step Into and Step Over debugging commands.
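
To make the reading rule concrete, here is a small Python sketch that splits a mnemonic into its base and condition suffix. The condition suffixes are the standard ARM ones (the always-condition al is left out); the BASES list and the split_mnemonic function are hypothetical and deliberately tiny, and other suffixes such as s are not handled.

```python
CONDITIONS = {
    "eq": "equal",
    "ne": "not equal",
    "cs": "carry set",
    "cc": "carry clear",
    "mi": "minus (negative)",
    "pl": "plus (positive or zero)",
    "vs": "overflow set",
    "vc": "overflow clear",
    "hi": "unsigned higher",
    "ls": "unsigned lower or same",
    "ge": "signed greater than or equal",
    "lt": "signed less than",
    "gt": "signed greater than",
    "le": "signed less than or equal",
}

# Hypothetical short list of base mnemonics; a real assembler knows many more.
# "bl" is listed before "b" so that a plain "bl" is not read as "b" plus a suffix.
BASES = ["add", "sub", "mov", "cmp", "swi", "bl", "b"]


def split_mnemonic(mnemonic: str):
    """Return (base, condition) for a lowercase mnemonic, or None if not recognized."""
    for base in BASES:
        if mnemonic == base:
            return base, None
        suffix = mnemonic[len(base):]
        if mnemonic.startswith(base) and suffix in CONDITIONS:
            return base, suffix
    return None


for m in ("ble", "addgt", "swine", "bl"):
    print(m, "->", split_mnemonic(m))
```

The case of ble shows why the order matters: "bl" followed by "e" is not a valid combination, so the sketch falls through to "b" followed by "le".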

Output a register's value in hexadecimal

For the second test program, we assume that we have an integer stored in a (32-bit) register, and we would like to print a hexadecimal representation of the register to stdout. For example,

Let r1 = 0x2468ABCD.

To do the conversion, the following procedure is used:

Repeat 8 times (once for each nibble of the 32-bit register):
   Read the top nibble from the register. (1)
   If the nibble is in 0...9, then output an appropriate digit character '0' through '9'.
   If the nibble is in A...F, then output an appropriate capital letter 'A' through 'F'.
   Shift the register left by one nibble.

(1) Recall that a nibble refers to a 4-bit group of binary digits or a single hexadecimal digit; the top nibble in this context refers to the leading or leftmost hexadecimal digit -- the 2.

Since we are using ASCII for output, looking at an ASCII table confirms the following relevant facts:

Decimal 48 is the digit '0'.
Decimal 65 is the letter 'A'.

If n is the nibble and we are to output a digit (case 1 in the above pseudocode), then we are to output the ASCII character n + 48. If we are to output a capital letter (case 2), then we are to output the ASCII character n + 65 - 10 = n + 55. For example, suppose we need to output the letter 'B'. That implies n = 11, and so the ASCII character to output is n + 55 = 66 = 'B', as expected.
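
The arithmetic above can be checked with a short Python sketch before writing any assembly. The function names nibble_to_ascii and hex_out are my own. The sketch loops a fixed eight times, once per nibble, so that zero digits are printed too, and it masks to 32 bits because Python integers do not wrap the way a register does.

```python
def nibble_to_ascii(n: int) -> int:
    """Map a nibble (0..15) to the ASCII code of its hexadecimal digit."""
    if n <= 9:
        return n + 48      # '0'..'9'
    return n + 55          # 'A'..'F', since 65 - 10 = 55


def hex_out(r1: int) -> str:
    """Render a 32-bit value as eight hexadecimal digits, top nibble first."""
    out = []
    for _ in range(8):                       # one pass per nibble
        n = (r1 >> 28) & 0xF                 # read the top nibble
        out.append(chr(nibble_to_ascii(n)))
        r1 = (r1 << 4) & 0xFFFFFFFF          # shift left by one nibble, keep 32 bits
    return "".join(out)


print(hex_out(0x2468ABCD))                   # prints 2468ABCD
```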

; Hex_Out: Dump register to stdout in hexadecimal.
;
; To get the test value into a register, one way is to use ldr with an .equ 
; assignment. e.g.
;
;   .equ   VALUE, 0x12345678
;    ...
;    ldr   r1, =VALUE
;
; Another way is to place the test value into memory somewhere, load the 
; address into a register, and then dereference that address. This is the way
; I used in the code below.
;
        .equ SWI_WriteC, 0x00
        .equ SWI_Exit, 0x11

        .text                       ; AREA Hex_Out,CODE,READONLY
        .global _start              ; ENTRY
_start:
        ldr     r1, =VALUE          
        ldr     r1, [r1]            ; r1 = *r1
        bl      HexOut              ; call HexOut
        swi     SWI_Exit
        
HexOut: mov     r2, #8              ; r2 = nibble_count = 8
loop:   mov     r0, r1, lsr #28     ; Copy r1 to r0, but shift it right by 28 
                                    ; bits as the copy is made. This causes 
                                    ; the high nibble of r1 to be placed into
                                    ; the low nibble of r0.
        cmp     r0, #9
        addle   r0, r0, #48         ; if (r0 <= 9) r0 = r0 + 48
        addgt   r0, r0, #55         ; if (r0 > 9) r0 = r0 + 55
        swi     SWI_WriteC
        mov     r1, r1, lsl #4      ; r1 = (r1 << 4)
                                    ; This shift sets up r1 for the next loop
                                    ; iteration, so that the next nibble to
                                    ; print is now in the high nibble of r1.
        subs    r2, r2, #1          ; nibble_count -= 1
        bne     loop                ; if (nibble_count != 0) goto loop
        mov     pc, lr              ; return

        .data
VALUE:  .word   0x2468ABCD
        .end

Observations

bl stands for branch and link and is basically a way of implementing a subroutine call. If you set a breakpoint at point (1) below, you will notice that the call to the subroutine causes the address of point (2) (the next instruction after the bl line) to be placed into lr (the link register), which allows us to return later.

(1) bl HexOut
(2) ???
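
The bookkeeping can be sketched in Python as follows. This is only a model: the addresses are hypothetical (the real ones depend on where the program is loaded), and the function names are made up for illustration. The one hard fact it relies on is that each instruction in ARM (non-Thumb) state occupies 4 bytes.

```python
INSTRUCTION_SIZE = 4  # bytes per instruction in ARM (non-Thumb) state


def branch_and_link(bl_address: int, target: int):
    """Return (pc, lr) after a bl located at bl_address branches to target."""
    lr = bl_address + INSTRUCTION_SIZE   # address of the instruction after bl
    pc = target
    return pc, lr


def return_from_subroutine(lr: int) -> int:
    """Model of 'mov pc, lr': execution resumes at the saved address."""
    return lr


# Hypothetical layout: bl at 0x1008, subroutine entry at 0x1010.
pc, lr = branch_and_link(0x1008, 0x1010)
print(hex(pc), hex(lr))
print(hex(return_from_subroutine(lr)))
```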

You may have noticed that the registers r10 through r15 may be used for special purposes and have aliases to suggest their use. For example, lr is an alias for r14, and pc (program counter) is an alias for r15. When referring to a special register, either name can be used in assembly language programs.

subs (subtract and set condition codes) shows another example of using instruction modifier letters. The base instruction mnemonic is sub (subtract). But appending an s to the mnemonic tells the processor to set the condition codes in the CPSR (Current Program Status Register).

  • Would the Hex_Out program still work if subs were changed to sub in the above program? What changes, if any, would be necessary to ensure correct operation in that case?
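
As a starting point for thinking about that question, here is a rough Python model of what the s suffix adds. It tracks only the N and Z flags (the real instruction also updates C and V), and the function name subs is simply borrowed from the mnemonic.

```python
def subs(rn: int, operand: int):
    """Model of 'subs': return the 32-bit result plus the N and Z flags it sets."""
    result = (rn - operand) & 0xFFFFFFFF
    n_flag = bool(result & 0x80000000)   # N: top bit of the result
    z_flag = result == 0                 # Z: result is zero
    return result, n_flag, z_flag


count = 8
iterations = 0
while True:
    iterations += 1
    count, n_flag, z_flag = subs(count, 1)   # subs r2, r2, #1
    if z_flag:                               # bne falls through once Z is set
        break

print(iterations)                            # prints 8
```

A plain sub would compute the same result but leave the flags as the earlier cmp set them, so the bne that follows would be testing something unrelated to the counter.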

Bit shifting can be performed as part of another instruction. For example, the line

mov r0, r1, lsr #28

copies r1 into r0, but also performs a logical right shift (lsr) by 28 bits as part of the same instruction. The mnemonics for bit shifting operations are as follows:

  • ASL (arithmetic shift left) -- Synonym for LSL.
  • ASR (arithmetic shift right) -- Vacated bits will be filled with sign bits.
  • LSL (logical shift left)
  • LSR (logical shift right) -- Vacated bits will be filled with zeros.
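
The difference between the shifts is easiest to see on a value with the top bit set. The following Python sketch models them on 32-bit values; the function names mirror the mnemonics but are otherwise my own, and the sketch assumes shift amounts from 1 to 31 only.

```python
MASK32 = 0xFFFFFFFF


def lsl(value: int, amount: int) -> int:
    """Logical shift left: bits shifted past bit 31 are discarded."""
    return (value << amount) & MASK32


def lsr(value: int, amount: int) -> int:
    """Logical shift right: vacated high bits are filled with zeros."""
    return (value & MASK32) >> amount


def asr(value: int, amount: int) -> int:
    """Arithmetic shift right: vacated high bits are copies of the sign bit."""
    value &= MASK32
    if value & 0x80000000:        # reinterpret as a negative two's-complement number
        value -= 1 << 32
    return (value >> amount) & MASK32


print(hex(lsr(0x2468ABCD, 28)))   # prints 0x2
print(hex(lsl(0x2468ABCD, 4)))    # prints 0x468abcd0
print(hex(lsr(0x80000000, 4)))    # prints 0x8000000
print(hex(asr(0x80000000, 4)))    # prints 0xf8000000
```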


The notation for performing a bit shift as part of an instruction is sometimes associated with the term barrel shifter, which refers to a way that a digital circuit can be implemented.

References

ARM
ARMSim# home page
ARM System-on-Chip Architecture, 2nd edition [goodreads.com]
Embest S3CEV40 EVB - User Guide [PDF]


Files

armsim_examples-0.0.1.zip

