Recently, I've been developing a command-line JSON processor called hson. hson is implemented in Haskell and comprises a CLI and interpreter for manipulating JSON data with the hson language.
hson takes two inputs, an hson script and some JSON data, and enables the user to format the JSON data via the hson script. For example, given the following sinistcha.json
,
we can get the front_default
sprite like so:
The semantics of the above hson script $.sprites.front_default
are intuitive:
- The contents of
sinistcha.json
are parsed and bound to the$
identifier. - The
sprites
field ofsinistcha.json
is accessed. - The
front_default
field ofsprites
is accessed. - hson prints the value of the
front_default
field.
The meat of this expression is parsed in the call
function of my recursive descent parser, which is responsible for
- Get expressions that access a property on an object, like
$.sprites
. - Call expressions that execute code within a callable value. For example,
keys()
executes thekeys
function. - Index expressions that access an element at an index within an indexable value. For example,
items[0]
accesses the index0
initems
.
The call
function must also consider that each expression can be chained. As we saw above, Get expressions can be chained to access properties within properties (e.g., $.sprites.front_default
). The same kind of chaining can occur with Call and Index expressions, too, with expressions like logging.getLogs()[0].toJSON()
and listOfFunctions[0]().result
.
All of these responsibilities are captured in the following function definition.
At first glance, this definition may seem daunting. However, the responsibilities of call
are just as I shared above.
- Parse a primary expression,
expr
, which will evaluate to the value being called/indexed/accessed by the call/index/get expression. - If
expr
is not called, indexed, or accessed, simply returnexpr
- Otherwise, handle many calls, indexes, or property accesses.
The complexity in this function arises with the last point: handling a chain of many Get, Call, or Index expressions. Because getting, calling, and indexing are all left-associative, the original expression being accessed/called/indexed becomes the most deeply nested expression in the parse tree. To illustrate, the parse tree for our original expression $.sprites.front_default
has DollarExpr
as its deepest expression node:
The challenge is that our parser scans left to right but must nest the expressions right to left. In an imperative language with mutability, one could repeatedly set the previously parsed expression as the child of the next Get/Call/Index expression. For example, one might handle chained Get expressions in Go as follows:
However, a pure functional language such as Haskell doesn't afford us the liberty of reassigning variables like expr
.
So, how can we parse chained expressions?
The answer lies in our definitions of parseGet
, parseCall
, and parseIndex
, as well as in our use of threadl
. First, let's examine those definitions.
The parsing performed by each function is straightforward:
parseGet
parses a dot followed by an identifierparseCall
parses arguments surrounded by parenthesesparseIndex
parses an expression surrounded by brackets
The unique aspect of these functions, though, is they each return
a function, namely Expr -> Expr
. What do these functions do? They each return a newly constructed parent expression with the input expression as its child. Essentially, these functions map child expressions to parent expressions.
So, when we parse many
instances of Get, Call, or Index, we receive a list of Expr -> Expr
.
Now our goal is to use the leftmost expression (e.g., $
) to repeatedly construct nested Get/Call/Index expressions. We can achieve this by sequential, left-associative application of each function in [Expr -> Expr]
to the result of the previous function. More specifically,
- Pass the leftmost expression as the argument to the first function (
Expr -> Expr
) in the list, producing anExpr
- Take the newly constructed
Expr
and pass it to the next function in the list, producing anotherExpr
- Repeat through all the functions in the list.
This repeated function application is carried about by our threadl
function, which is simply a left-associative fold over the list of functions.
The (&)
operator comes from Data.Function
and is the flip
of infix function application ($)
:
To summarize, we can parse chained expressions in Haskell by creating a list of functions that map child expressions to newly constructed parent expressions, then thread the original, leftmost expression through those functions to produce the final root expression. This solution exemplifies a broader pattern of returning and applying "setter" functions wherever we might employ mutability and reassignment in an imperative language.