My book on iOS interface design, Design Teardowns: Step-by-step iOS interface design walkthroughs is now available!

Writing a parser in Swift (II)

This is a follow up to Writing a Parser in Swift (I), I'd highly suggest reading that first if you haven't already.

Last time, we looked at Context-free Grammars, Recursive Descent Parsing and Operator Precedence Parsing. Let's complete the missing pieces in our trivial language, Kaleidoscope.

Here's some grammar that describes declaring a function:

Function -> Prototype Expression  
Prototype -> Define Identifier ( Arguments )  
Arguments -> Arguments, Identifier | Identifier | ε  

Functions are declared with a Prototype followed by a single Expression. The Prototype consists of the keyword def, which is represented by the Define symbol, followed by an Identifier and a pair of parenthesis. Any arguments are included comma-separated within the parenthesis. In the grammar above, we denote the empty string "" with epsilon, ε.

Similarly we can write production rules for calling a function:

PrimaryExpression -> Call | Identifier | Number | ( Expression )  
Call -> Identifier ( Parameters )  
Parameters -> Parameters, Expression | Expression | ε  

To call a function, we first have an Identifier that represents the name of the function, followed by any parameters comma-separated within the parenthesis.

Let's look at how we can create the corresponding nodes in our AST:

struct PrototypeNode: CustomStringConvertible {  
    let name: String
    let argumentNames: [String]
    var description: String {
        return "PrototypeNode(name: \(name), argumentNames: \(argumentNames))"
    }
}

struct FunctionNode: CustomStringConvertible {  
    let prototype: PrototypeNode?
    let body: ExprNode
    var description: String {
        return "FunctionNode(prototype: \(prototype), body: \(body))"
    }
}

We'll explain why the prototype property is optional later. Similarly, here's how function calls are represented:

struct CallNode: ExprNode {  
    let name: String
    let arguments: [ExprNode]
    var description: String {
        return "CallNode(name: \(name), arguments: \(arguments))"
    }
}

Notice the symmetry between function declarations and function calls. This would hint to us that we should write code that share logic in some way. In particular, we can factor the logic that parses a comma-separated list into a method that takes as input the logic for generating the list elements (specifically ExprNode for parameters and String for argument names.)

Here's the parsing logic for identifiers and function calls:

func readIdentifier() throws -> String {  
    guard case let Token.Identifier(name) = popCurrentToken() else {
        throw ParseError.ExpectedIdentifier
    }
    return name
}

func parseIdentifierOrFunctionCall() throws -> ExprNode {  
    let name = try readIdentifier()

    guard tokensAvailable else {
        return VariableNode(name: name)            
    }

    guard case Token.ParensOpen = peekCurrentToken() else {
        return VariableNode(name: name)
    }

    let arguments = try parseParensList(parseExpression)
    return CallNode(name: name, arguments: arguments)
}

If there is a ( symbol after the Identifier, we assume it to be a function call. We also assume the method func parseParensList will parse a comma-separated list of something. In this case, we have it return parameters which are just Expressions via the func parseExpression() method defined earlier.

Next, let's take a look at parsing function declarations:

enum ParseError: ErrorType {  
    ...
    case ExpectedFunctionDeclaration
    case ExpectedFunctionName
}

func parsePrototype() throws -> PrototypeNode {  
    guard case let Token.Identifier(name) = popCurrentToken() else {
        throw ParseError.ExpectedFunctionName
    }

    let argumentNames = try parseParensList(readIdentifier)
    return PrototypeNode(name: name, argumentNames: argumentNames)
}

func parseDefinition() throws -> FunctionNode {  
    guard case Token.Define = popCurrentToken() else {
        throw ParseError.ExpectedFunctionDeclaration
    }

    let prototype = try parsePrototype()
    let body = try parseExpression()
    return FunctionNode(prototype: prototype, body: body)
}

This time, our customisation to func parseParensList() is func readIdentifier() which returns the String of the name of an Identifier. The parsing of the function definition is rather trivial.

Now, we're ready to talk about how we can actually parse the list of ExprNodes or Strings in each case. We allow customisation with 2 swift features:

1) We take a closure as input to perform the actual reading
2) We use a generic method that allows us to have the correct return type depending on the input closure

enum ParseError: ErrorType {  
    ...
    case ExpectedCommaSeparatedList
}

func parseParensList<T>(@noescape read: () throws -> T) throws -> [T] {  
    guard case Token.ParensOpen = popCurrentToken() else {
        throw ParseError.ExpectedCharacter("(")
    }

    if case Token.ParensClose = peekCurrentToken() {
        return []
    }

    var list = [T]()
    while true {
        let element = try read()
        list.append(element)

        if case Token.ParensClose = peekCurrentToken() {
            break
        }

        guard case Token.Comma = popCurrentToken() else {
            throw ParseError.ExpectedCommaSeparatedList
        }
    }

    guard case Token.ParensClose = popCurrentToken() else {
        throw ParseError.ExpectedCharacter(")")
    }

    return list
}

We're using the @noescape attribute here to tell the Swift compiler that this closure will not be used outside of this method and allows us to ignore any reference semantics for capturing variables.

We begin by looking for ( followed by ) which essentially means we have an empty list. If we don't find the closing parenthesis, we should read the element and add it to our list. Each element should be followed by the closing parenthesis or a comma if there are more elements after. Finally, we discard the closing ) and return the list.

Next, we also need to update our method func parsePrimary() to understand function calls:

func parsePrimary() throws -> ExprNode {  
    switch (peekCurrentToken()) {
    case .Identifier:
        return try parseIdentifierOrFunctionCall()
    case .Number:
        return try parseNumber()
    case .ParensOpen:
        return try parseParens()
    default:
        throw ParseError.ExpectedExpression
    }
}

Now, we are ready to finish our Parser class. We're going to simplify things by representing top-level expressions as a FunctionNode without a prototype (we made this optional earlier.)

func parseTopLevelExpr() throws -> FunctionNode {  
    let body = try parseExpression()
    return FunctionNode(prototype: nil, body: body)
}

func parse() throws -> [Any] {  
    index = 0

    var nodes = [Any]()
    while index < tokens.count {
        switch peekCurrentToken() {
        case .Define:
            let node = try parseDefinition()
            nodes.append(node)
        default:
            let expr = try parseTopLevelExpr()
            nodes.append(expr)
        }
    }

    return nodes
}

Now we can parse a series of function definitions or expressions:

let parser = Parser(tokens: tokens)  
do {  
    print(try parser.parse())
}
catch {  
    print(error)
}

If we pass in the following input:

def sum(x, y) x+y  
sum(4, 3)  

We get a nice list of FunctionNodes:

[
FunctionNode(  
    prototype: Optional(PrototypeNode(
        name: sum, 
        argumentNames: ["x", "y"])), 
    body: BinaryOpNode(
        +, 
        lhs: VariableNode(x), 
        rhs: VariableNode(y))), 
FunctionNode(  
    prototype: nil, 
    body: CallNode(
        name: sum, 
        argument: [NumberNode(4.0), NumberNode(3.0)]))
]

Coming up next, we'll talk about how we can use LLVM to run this program!