Welcome back to my small series about creating an assembler and linker for the original Game Boy.

This time we will create the parser for the assembler code. I originally started by reading this excellent blog post about parser combinators. It might be helpful to read it first! The resulting code was good enough for the first few bits of the assembler's parser.

Imagine the following assembler code:

#include "io_registers.asm" ; add a bunch of constants to name the constant IO register addresses.

MyLabel:
  LD A, [LCD_Y] ; LCD_Y is imported from the io_registers.asm

This small snippet raises two problems for the parser:

1) The file include: an include can appear anywhere, and it is valid as long as the content of the included file is valid at the current position.
2) The constant value: we need to carry the values of all constants along while parsing, because a constant could change on the very next line after it is used…

To solve the first problem, the position type from the original parser combinator got a new filename property, and the InputState got a new parentState, which is an InputState option. This gives us a small stack of input files onto which we can push a new file whenever we hit a file include. That way we can switch to a different input no matter what we are currently parsing. It only takes a few modifications to the nextChar function to pop the parent state once we reach the end of the file.
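To make the idea of the input stack a bit more concrete, here is a minimal sketch. It is not the actual code from the modified parser combinator: the field names, the line-based storage and the helpers fromLines, pushInclude and readAll are assumptions made for illustration. Only filename, InputState, parentState and nextChar are names taken from the text above.

```fsharp
// Hypothetical types: the real definitions in the modified library may differ.
type Position = { filename : string; line : int; column : int }

type InputState =
    { lines : string[]
      position : Position
      parentState : InputState option }

/// Create a state for the content of one file, optionally stacked on a parent.
let fromLines filename (lines : string[]) parent =
    { lines = lines
      position = { filename = filename; line = 0; column = 0 }
      parentState = parent }

/// Push an included file: its content is read before the rest of the parent.
let pushInclude filename lines current =
    fromLines filename lines (Some current)

/// Return the advanced state and the next character (if any).
/// When the current file is exhausted, pop back to the parent state.
let rec nextChar (input : InputState) : InputState * char option =
    let pos = input.position
    if pos.line >= input.lines.Length then
        match input.parentState with
        | Some parent -> nextChar parent   // end of the include: resume the parent
        | None -> input, None              // end of the outermost file
    else
        let current = input.lines.[pos.line]
        if pos.column < current.Length then
            let next = { input with position = { pos with column = pos.column + 1 } }
            next, Some (current.[pos.column])
        else
            // end of line: emit a newline and move to the next line
            let next = { input with position = { pos with line = pos.line + 1; column = 0 } }
            next, Some '\n'

/// Helper for the example: read everything that nextChar yields.
let readAll input =
    let rec loop state acc =
        match nextChar state with
        | _, None -> System.String(Array.ofList (List.rev acc))
        | next, Some c -> loop next (c :: acc)
    loop input []

let mainFile = fromLines "main.asm" [| "B" |] None
// Pretend the parser has just consumed an include directive at the start of main.asm.
let included = pushInclude "inc.asm" [| "A" |] mainFile
let text = readAll included   // "A\nB\n": the include first, then the parent
```

Because the parent state is stored at the moment of the push, the parser that handles the include only has to return the new state. Everything that runs afterwards keeps calling nextChar and never needs to know that the file changed.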

The second problem could be solved by carrying some state around while parsing. This state might be read and modified by the parser implementations. That way we always have access to all defined constants and could also keep track of other things like the current global label.
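A rough sketch of what carrying state around can look like follows. To keep it short it works on a list of tokens instead of characters, and everything in it (ParserState, the Parser type, defineConstant, constantRef, andThen and the EQU syntax) is a simplified assumption for illustration, not the API of the modified parser combinator.

```fsharp
// Hypothetical user state that is threaded through every parser.
type ParserState = { constants : Map<string, int> }

// A parser gets the remaining tokens and the state. On success it returns
// a value, the remaining tokens and the (possibly updated) state.
type Parser<'a> = string list -> ParserState -> ('a * string list * ParserState) option

let tryInt (s : string) =
    match System.Int32.TryParse s with
    | true, v -> Some v
    | _ -> None

/// Parses `NAME EQU VALUE` and stores the constant in the state.
let defineConstant : Parser<unit> =
    fun tokens state ->
        match tokens with
        | name :: "EQU" :: value :: rest ->
            tryInt value
            |> Option.map (fun v -> ((), rest, { state with constants = Map.add name v state.constants }))
        | _ -> None

/// Parses a constant name and looks up its current value in the state.
let constantRef : Parser<int> =
    fun tokens state ->
        match tokens with
        | name :: rest ->
            Map.tryFind name state.constants
            |> Option.map (fun v -> (v, rest, state))
        | [] -> None

/// Runs p1, then p2 on the tokens and the state that p1 left behind.
let andThen (p1 : Parser<'a>) (p2 : Parser<'b>) : Parser<'a * 'b> =
    fun tokens state ->
        match p1 tokens state with
        | None -> None
        | Some (a, rest, state1) ->
            p2 rest state1
            |> Option.map (fun (b, rest2, state2) -> ((a, b), rest2, state2))

let emptyState = { constants = Map.empty }
// The reference sees the value that was defined one step earlier.
let result = andThen defineConstant constantRef [ "LCD_Y"; "EQU"; "68"; "LCD_Y" ] emptyState
```

The important part is andThen: the second parser receives the state returned by the first one, so a redefinition of a constant is visible from the very next token on.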

The global label problem can be seen in this example:

GlobalLabel:
  NOP
  NOP
  .Local::
  JP GlobalLabel

At the point where we parse the local label, we have no idea what the name of the parent global label is. If we also carry this around in the state, the problem is solved :)
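As a small illustration of how the state solves this, the sketch below resolves label definitions one after another. LabelState, Label and defineLabel are hypothetical names, and the rule that a leading dot marks a local label is an assumption based on the example above.

```fsharp
// Hypothetical state: only the most recent global label is remembered.
type LabelState = { globalLabel : string option }

type Label =
    | Global of string
    | Local of parent : string * name : string

/// Interpret one label definition using the state carried along so far.
/// Returns the resolved label and the updated state, or an error message.
let defineLabel (text : string) (state : LabelState) : Result<Label * LabelState, string> =
    let name = text.TrimEnd(':')
    if name.StartsWith(".") then
        match state.globalLabel with
        | Some parent -> Ok (Local (parent, name.TrimStart('.')), state)
        | None -> Error (sprintf "local label '%s' has no parent global label" name)
    else
        // A global label becomes the parent of all following local labels.
        Ok (Global name, { globalLabel = Some name })

let start = { globalLabel = None }
let afterGlobal = defineLabel "GlobalLabel:" start
// Ok (Global "GlobalLabel", { globalLabel = Some "GlobalLabel" })
let afterLocal = defineLabel ".Local::" { globalLabel = Some "GlobalLabel" }
// Ok (Local ("GlobalLabel", "Local"), { globalLabel = Some "GlobalLabel" })
```

A local label that appears before any global label can be reported as an error right away, because the state tells us that no parent exists yet.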

Once we have include files, we could also simply add macros. Just treat them as a file within a file :)

I’ve also modified the parser combinator to distinguish between syntax errors and parser mismatches. On a mismatch you might try another parser which is also valid at the current position, but on an error you can simply stop the whole parsing process. This also improves the error reporting.

Take the following code for example:

let opCode = 
  let add = pString "ADD" >>. Register .>> pChar ',' .>>. Register <?> "ADD"
  let sub = pString "SUB" >>. Register .>> pChar ',' .>>. Register <?> "SUB"

  add <|> sub

If you try to parse ADD A, "this is invalid" you get an error message telling you that either ADD or SUB is required. With a little helper function and the difference between mismatch and error, the error reporting can tell you that a register was expected but a quote was found.

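// Turns any failure of p into a hard error (Fail), so that <|> stops trying alternatives.
let must p =
    let parserFn input =
        match run p input with
        | Ok r -> Ok r
        // Keep label, message and position; only the severity changes.
        | Error (label, msg, pos, _) -> Error (label, msg, pos, Fail)
    { p with fn = parserFn }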

let opCode = 
  let add = pString "ADD" >>. must (Register .>> pChar ',' .>>. Register) <?> "ADD"
  let sub = pString "SUB" >>. must (Register .>> pChar ',' .>>. Register) <?> "SUB"

  add <|> sub
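If you want to play with the idea without the full library, here is a stripped-down, self-contained sketch. The types and the functions orElse and keepRight are simplified stand-ins that I made up for this example (roughly playing the roles of <|> and >>.), and the operand parser just accepts the letter A instead of a real register. It only shows how the two kinds of failure interact.

```fsharp
type Severity = Mismatch | Fail

type ParseResult<'a> =
    | Parsed of 'a * string        // value and remaining input
    | Failed of string * Severity  // message and severity

type Parser<'a> = string -> ParseResult<'a>

let pString (expected : string) : Parser<string> =
    fun (input : string) ->
        if input.StartsWith(expected, System.StringComparison.Ordinal) then
            Parsed (expected, input.Substring(expected.Length))
        else
            Failed (sprintf "expected '%s'" expected, Mismatch)

/// Try p1. On a Mismatch fall back to p2, but let a Fail abort immediately.
let orElse (p1 : Parser<'a>) (p2 : Parser<'a>) : Parser<'a> =
    fun input ->
        match p1 input with
        | Failed (_, Mismatch) -> p2 input
        | other -> other

/// Run p1, then p2 on the rest, keeping only the result of p2.
let keepRight (p1 : Parser<'a>) (p2 : Parser<'b>) : Parser<'b> =
    fun input ->
        match p1 input with
        | Parsed (_, rest) -> p2 rest
        | Failed (msg, sev) -> Failed (msg, sev)

/// Upgrade any failure of p to Fail: past this point no alternative applies.
let must (p : Parser<'a>) : Parser<'a> =
    fun input ->
        match p input with
        | Failed (msg, _) -> Failed (msg, Fail)
        | ok -> ok

let add = keepRight (pString "ADD ") (must (pString "A"))
let sub = keepRight (pString "SUB ") (must (pString "A"))
let opCode = orElse add sub

let badOperand = opCode "ADD \"oops\""  // Failed ("expected 'A'", Fail)
let noOpCode = opCode "NOP"             // Failed ("expected 'SUB '", Mismatch)
```

In the first case the mnemonic already matched, so the failure comes from the operand parser and is not replaced by a generic message about ADD or SUB. In the second case nothing matched at all, so the result stays a plain mismatch that a surrounding parser may still recover from.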

You can find the modified parser combinator here. I’ve also added some other helper functions to create parsers, for example terminatedList, which helps with parsing things like Item Item Item STOP, or an EOF parser which succeeds when the end of the file is reached.
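To give an impression of what these two helpers do, here is a small token-based sketch. The signatures are guesses for illustration and will not match the real helpers in the modified library, which work on characters and carry positions and state.

```fsharp
// Simplified parser type: consume tokens, return a value and the rest.
type Parser<'a> = string list -> ('a * string list) option

/// Matches exactly one expected token.
let token (expected : string) : Parser<string> =
    fun tokens ->
        match tokens with
        | t :: rest when t = expected -> Some (t, rest)
        | _ -> None

/// Succeeds only when no input is left.
let eof : Parser<unit> =
    fun tokens ->
        match tokens with
        | [] -> Some ((), [])
        | _ -> None

/// Collects items until the terminator matches. The terminator is consumed.
/// Fails if neither the terminator nor an item matches.
let terminatedList (item : Parser<'a>) (terminator : Parser<'t>) : Parser<'a list> =
    fun tokens ->
        let rec loop acc remaining =
            match terminator remaining with
            | Some (_, rest) -> Some (List.rev acc, rest)
            | None ->
                match item remaining with
                | Some (value, rest) -> loop (value :: acc) rest
                | None -> None
        loop [] tokens

let untilStop = terminatedList (token "Item") (token "STOP") [ "Item"; "Item"; "Item"; "STOP" ]
// Some (["Item"; "Item"; "Item"], [])
let untilEof = terminatedList (token "Item") eof [ "Item"; "Item" ]
// Some (["Item"; "Item"], [])
```

Checking the terminator before the item keeps the loop simple, and it means that eof can be used as a terminator like any other parser.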

I will not post the whole assembler parser code here, since it’s very trivial once you’ve read the parser combinator blog post mentioned above. The whole thing can be found here.

The only thing left to do is to glue the whole thing together. Check out the final source code on GitHub.