Writing an OS: 0x02

What we have, where to go

In the last post I discussed the plan for getting started. In this post I want to show off some code, along with some (possibly) simple build instructions for those of you who want to try things out.

I am using Git for version control (aka undo on steroids) and am hosting with Drew DeVault's SourceHut. If you are new to Git, I recommend reading the documentation on the previously linked Git page as well as Git for Ages 4 and Up, a talk by Michael Schwern.

Get with Git

git clone https://git.sr.ht/~slylax/rvLispOS.git
cd rvLispOS
git checkout hello_world
cd sketch/

Build Dependencies

You will need a RISC-V 64-bit toolchain:

# apt install binutils-riscv64-linux-gnu gcc-riscv64-linux-gnu

The goal of this foray is to eventually drop the -linux-gnu part of the toolchain and use the -unknown-elf toolchain instead, but for now we are relying on Linux features to get a better grasp on rv64asm.

Code

Let us have a look at a RISC-V 64-bit assembly program that simply prints "Hello World!" using Linux syscalls.

# hello.S
.text
.global _start

# The linker uses the symbol "_start" as the default entry point of the program
_start:

  # Call write(), outputting the text of 'helloworld' to stdout
  la a0, 1
  la a1, helloworld
  la a2, endHW
  la a7, 64
  ecall

  # Call exit() with status code 0 (typically means success)
  la a0, 0
  la a7, 93
  ecall

# Constants used in this program
.data
helloworld: .ascii "Hello World!\n"
# Despite its name, endHW is the length of the string in bytes (current location minus start)
.equiv endHW, . - helloworld
.end

By default the GNU linker starts the program at a symbol called "_start". The ".global" directive exports that symbol so the linker and other object files can see it, a bit like giving a C/C++ function external linkage. Getting to the meat of the _start function we see:

la a0, 1
la a1, helloworld
la a2, endHW
la a7, 64

The usual RISC-V pseudo-instruction for putting the immediate value 'imm' into the register 'reg' is li reg, imm, which for small values expands to addi reg, x0, imm. la really means "load address", which is exactly what we want for a1; in this listing the GNU assembler accepts the constants too and loads them as if li had been written, which is why la works on every line. The RISC-V calling convention passes function arguments in the a registers, and Linux system calls follow suit: arguments go in a0-a5 and the syscall number goes in a7. Since printing "Hello World!" is the object of this function, we need the write() system call that Linux provides. 'a7' being the syscall register, we load it with 64, the number for write(). 'a0' gets a 1, the file descriptor for standard output. 'a1' gets the memory address of our ASCII string "helloworld". And finally, so write() knows how much to write, endHW (the length of the string in bytes, 13 here) goes into the third argument register, a2. RISC-V has a dedicated instruction, ecall, for making the call into the kernel. At this point, "Hello World!" will be written to stdout, which typically is your terminal window.
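For readers more at home in C, here is a minimal sketch of the same write call, assuming a Linux system with glibc-style headers. The helper name write_hello is made up for illustration and is not part of the repository. It uses the SYS_write macro instead of a hard-coded 64 so that it also runs on other architectures.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Same bytes as the helloworld label in hello.S. */
static const char helloworld[] = "Hello World!\n";

/* Hypothetical helper: issues the raw write syscall the way _start does.
 * Returns the number of bytes written, or -1 on error. */
long write_hello(void)
{
    /* Register mapping in the assembly version:
     *   a7 = SYS_write, a0 = 1 (stdout), a1 = buffer, a2 = length.
     * sizeof counts the trailing NUL that .ascii does not emit, so drop it. */
    return syscall(SYS_write, 1, helloworld, sizeof helloworld - 1);
}
```

Called from main, write_hello() prints the message and returns 13, the same length that endHW holds in the assembly listing.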

la a0, 0
la a7, 93
ecall

To exit cleanly, we set the return code to 0 in the 'a0' register and set the syscall number in 'a7' to 93, which selects exit(). I got the values 64 and 93 from Stephen Smith's Blog. They match the syscall numbers for write and exit in the Linux kernel's generic syscall table (include/uapi/asm-generic/unistd.h), which is the table the RISC-V port uses.
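One way to check the numbers 64 and 93 yourself is to ask the C headers. The following is a rough sketch, assuming a Linux toolchain whose libc ships <sys/syscall.h>; the helper name print_syscall_numbers is hypothetical. The compile-time checks only fire when building for RISC-V.

```c
#include <stdio.h>
#include <sys/syscall.h>  /* SYS_write, SYS_exit for the target architecture */

#if defined(__riscv)
/* On RISC-V Linux these should match the constants used in hello.S. */
_Static_assert(SYS_write == 64, "write is syscall 64 on RISC-V Linux");
_Static_assert(SYS_exit == 93, "exit is syscall 93 on RISC-V Linux");
#endif

/* Hypothetical helper: prints the syscall numbers for whatever
 * architecture this file was compiled for. Returns printf's result. */
int print_syscall_numbers(void)
{
    return printf("write = %ld, exit = %ld\n", (long)SYS_write, (long)SYS_exit);
}
```

Built with the riscv64-linux-gnu-gcc cross compiler, this should report the same two values used in hello.S. Built natively on another architecture such as x86-64, it will report that architecture's numbers instead, which is a reminder that syscall numbers are not portable.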

Lastly, the '.data' section holds our message and the length of the message. The .end directive tells the assembler to stop processing the file at that point. To build our hello world example, run the following:

riscv64-linux-gnu-gcc -nostartfiles -o HelloWorld hello.S

Now this is where you'll need QEMU, specifically qemu-riscv64, to run the program:

$ qemu-riscv64 ./HelloWorld
Hello World!

Success! Our initial RISC-V application works and our development environment is set up.

When 0x03 is ready, this post will be updated with the link here.