DCG object level representation in prolog
Prolog is an excellent language for writing parsers using Definite Clause Grammars (DCG) which are supported as first class objects in the language. The language provides syntactic sugar to make writing these grammars more easily than without.
Writing a DCG isn’t far off translating a given grammar into prolog syntax, but before we get to that we’ll first look at writing DCGs the hard way without syntactic sugar. It’s important to understand the object level representation of a DCG as this is what prolog actually executes.
Parsing a sentence using a DCG makes use of a standard prolog programming
pattern: the difference
difference list programming pattern represents a list
L as the difference
between two other lists
L2 is a sublist of
append(L, L2, L1) (that is,
L appended with
A list in difference list form is represented by two lists. Let’s say we want
L = [element1, element2, ..., elementN] in difference list form.
We need two lists
L2 in which we’ll encode
L as the difference
between those lists. We’ll represent the two lists as a pair, pairs in prolog
use the syntax
A is the first element of the pair, and
second. This is not necessary, and later we’ll drop the pair syntax and just
use arguments to hold each of the lists. Getting back to encoding
L2 we can form the difference list
L1-L2 = [element1, element2,
..., elementN | Rest]-Rest.
L prepended to
L2 is some
other list, we don’t really care what.
“Why do this, you’re insane!?” You might ask, well we now have
Rest on the end of the list which allows us to do some clever tricks
to write efficient predicates, and it makes writing parsers super
Appending with difference lists
One of the simplest examples of the difference list pattern is an
implementation of the
append/3 predicate. You may have seen the naive
append(, L, L). append([Head|Tail],List,[Head|Rest]) :- append(Tail, List, Rest).
However this takes linear time in execution… for something as frequently used
as appending, this isn’t good enough, we want constant time execution! To achieve
constant time execution of appending we’ll implement
append/3 using difference lists.
Let’s start with a simple example, appending
[1, 2, 3] to
[4, 5, 6, 7].
First lets convert these to difference list representation:
3 | Rest1]-Rest1,
[4, 5, 6, 7 | Rest2]-Rest2 (suspend your suspicion that this is
a pointless, useless representation, we’ll get there!). What should the result
of appending the two lists result in (in difference list form?)? Well it should
at least start with
[1, 2, 3, 4, 5, 6, 7 | Something]-Something as this contains the
result, but how about the
Rest part of the list which we’ve left as
Something? I suppose we could put anything here but there’s an alternative
that makes a bit more sense: reuse the
Rest part from the second list, this
makes a bit more sense, as we preserve more information about our inputs this
way. In conclusion we’ve decided that appending the difference lists
3 | Rest1]-Rest1 and
[4, 5, 6, 7 | Rest2]-Rest2 should result in
[1, 2, 3,
4, 5, 6, 7 | Rest2]-Rest2.
How we write
append/3 such that it uses difference lists? Lets distinguish it
from the recursive definition we presented above my denoting it as
(for difference-list append). We know the inputs, so they form the first
2 arguments of the predicate, and the last argument will be the appended
difference lists. Let’s sketch this out:
dappend([L1element1, L1element2, ..., L1elementN | Rest1]-Rest1, [L2element1, L2element2, ..., L2elementM | Rest2]-Rest2, [L1element1, L1element2, ..., L1elementN, L2element1, L2element2, ..., L2elementM | Rest2]-Rest2]).
We could write out a bunch of predicates for all different sizes of lists, but
that’d be insanity! Instead we can make clever use of the
Rest parts of the
difference lists to combine them!
dappend(L1-Rest1, L2-Rest2, L1-Rest2) :- Rest1 = L2. % or more simply: dappend(L1-L2, L2-Rest, L1-Rest).
L2 we force the difference part of the
difference list to be the
L2 difference list. Going back to our example, we
[1, 2, 3 | Rest1]-Rest1 with
[4, 5, 6, 7 | Rest2] resulting in
[1, 2, 3, 4, 5, 6, 7 | Rest2], then make this into a difference list by lifting
it into the pair functor
-/2 in the output argument
[1, 2, 3, 4, 5, 6,
7 | Rest2]-Rest2. No recursion, just unification!
Now what if we want to wrap this up so we don’t have to use difference lists
directly? Let’s try and write a wrapper
my_append/3 that takes 2 normal lists
L2 and concatenates them to form
L, first we have to convert
L2 into difference lists, then we can call
dappend/3 and pull out the
difference of the result to get
diff_list(, End-End). diff_list([Head|Rest], [Head|DiffListRest]-End) :- diff_list(Rest, DiffListRest-End). my_append(L1, L2, L) :- diff_list(L1, DiffList1), dappend(L1-_, L2-_, L-).
And some examples using
?- L2 = [4, 5, 6], dappend([1,2,3|L2]-L2, L2-, L-). L = [1, 2, 3, 4, 5, 6]. % We force the difference list representing `L1` to be the % difference between L1 appended to L2. ?- dappend([1, 2, 3]-_, [4, 5, 6]-, L-). L = [1, 2, 3]. % By failing to create a proper difference list in the L1 position, namely % `[1, 2, 3]-_`, `_` unifies with `[4, 5, 6]` and L unifies with `[1, 2, 3]`.
To make the normal lists into difference lists we have to traverse them to the
end… defeating the whole object of the exercise: to make
append/3 work in
constant time, so DON’T use this unless you already have your lists in
difference list form.
Exercise: Take a few example difference lists and try appending them with the
DCGs: Grammars with difference lists
You should now be familiar with the concept of difference lists from the
dappend/3 example. We now show how to construct parsers from difference
lists. What does it mean to parse a sentence? How are we going to represent
a sentence? What does it mean to partially parse a sentence? Keep these
questions in mind.
Prolog represents sentences as lists of words, an example being
name, is, will]. How about parsing this sentence? To parse it, we need
a grammar. A grammar specifies the legal sentences in the language; we’ll constrain
our language to be sentences of the form
hello my name is <name> where
<name> is either
Lets define this in EBNF, a standard form for representing grammars.
sentence = greeting, introduction; greeting = "hello"; introduction = "my", "name", "is", name; name = "will" | "tom" | "henry";
What are the valid sentences represented in Prolog?
[hello, my, name, is, will] [hello, my, name, is, tom] [hello, my, name, is, henry]
Parsing the sentence top-down from
sentence involves substituting the LHS of
a rule for the RHS repeatedly until all non-terminals are replaced with
[hello, my name, is will] parsed by the rule
sentence takes the
, greeting, introduction
[hello, my, name, is], name
[hello, my, name, is, will]
At each stage we replace a non-terminal with the RHS of its rule, we accumulate the non-terminals into a list starting with the empty list. Parsers consume terminals, that is given a sentence, rules will match part of the sentence.
How would we parse the
greeting rule in prolog using difference lists?
We represent the parsed terminals as the difference between two lists:
greeting(L1, L2) the difference between
L2 should be
we can write this as
greeting([hello|Rest], Rest), we match
hello from the
head of the first list, consuming it, leaving
Let’s see how this predicate functions in a few queries:
?- greeting([hello], Rest). Rest =  ?- greeting([hello, my, name, is ,will], Rest). Rest = [my, name is will] ?- greeting(Sentence, [my, name, is will]). Sentence = [hello, my, name, is will] ?- greeting(, ) false. ?- greeting([hello], ) true. ?- greeting([hello, my, name, is will], ) false.
greeting was arguably the simplest of the grammar rules, the concept
applies to all other grammar rules, all we’re doing is consuming terminals from
the first list, putting the resulting list in the second argument.
Let’s convert the rest of the grammar rules to Prolog in difference list form.
name is very similar to
greeting, we simply consume the name.
name([will|Rest], Rest). name([tom|Rest], Rest). name([henry|Rest], Rest).
introduction is slightly more complicated as now we have a non-terminal in
the RHS of the rule
introduction([my, name, is|Rest1], Rest2) :- name(Rest1, Rest2).
This deserves some explanation.
introduction consumes the non-terminals on the
RHS of its corresponding grammar rule, but then we also need to give the rest of the
sentence to the
name parser so it can check that the rest of the sentence is
a name. To do this we pass the remainder of the sentence,
results in another list, containing the remainder of the sentence after
matching the name in
Rest2. Let’s give some pseudo-Prolog examples to further
demonstrate how this works:
% This is not a real prolog session, I've written out the intermediate % variables to aid the understanding of which parts of the sentence are % consumed by which parsers ?- introduction([my, name, is, will], X). Rest1 = [will], Rest2 = , X = Rest2 =  ?- introduction([my], X). false % Prolog can't match the [my] to [my, name, is|Rest1] as the list is missing [name, is] ?- introduction([my, name, is, Name], ). Name = will; Name = tom; Name = henry
Finally the sentence rule:
sentence(L1, L3) :- greeting(L1, L2), introduction(L2, L3).
and some examples:
% This is not a real prolog session, I've written out the intermediate % variables to aid the understanding of which parts of the sentence are % consumed by which parsers ?- sentence([hello, my, name, is will], ). L1 = [hello, my, name, is will], L2 = [my, name, is will], L3 =  true. % greeting/2 consumes `hello` from the list leaving L2, % which is then fully consumed by introduction/2 resulting % in L3 =  ?- sentence(Sentence, ). Sentence = [hello, my, name, is will]; Sentence = [hello, my, name, is tom]; Sentence = [hello, my, name, is henry]. ?- sentence([my, name, is, will], [will]). false. % Because name is called like `name([will|Rest], [will])`, it doesn't match the % definition of `name/2` so fails.
Translating these grammar rules is very mechanical, which is why Prolog introduces syntactic sugar for them! Yay! The grammar we gave before can be rewritten as:
sentence --> greeting, introduction. greeting --> [hello]. introduction --> [hello, my, name, is], name. name --> [will]. name --> [tom]. name --> [henry].
We also don’t have to explicitly call the predicates like
), we can use
phrase(sentence, Sentence) is equivalent
to the previous call.