A GF grammar consists of a set of modules, which can be combined in different ways to build different grammars. There are several different types of modules:
abstract
concrete
resource
interface
instance
incomplete concrete
We will go through the module types in this order, which is also their order of "importance" from the most basic to the more advanced ones.
This document presupposes knowledge of GF judgements and expressions, which can be gained from the GF tutorial. It aims to give a systamatic description of the module system; some tutorial information is repeated to make the document self-contained.
Any GF grammar that is used in an application
will probably contain at least one module
of the abstract
module type. Here is an example of
such a module, defining a fragment of propositional logic.
abstract Logic = { cat Prop ; fun Conj : Prop -> Prop -> Prop ; fun Disj : Prop -> Prop -> Prop ; fun Impl : Prop -> Prop -> Prop ; fun Falsum : Prop ; }
The name of this module is Logic
.
An abstract
module defines an abstract syntax, which
is a language-independent representation of a fragment of language.
It consists of two kinds of judgements:
cat
judgements telling what categories there are
(types of abstract syntax trees)
fun
judgements telling what functions there are
(to build abstract syntax trees)
There can also be def
and data
judgements in an
abstract syntax.
The GF grammar compiler expects to find the module Logic
in a file named
Logic.gf
. When the compiler is run, it produces
another file, named Logic.gfc
. This file is in the
format called canonical GF, which is the "machine language"
of GF. Next time that the module Logic
is needed in
compiling a grammar, it can be read from the compiled (gfc
)
file instead of the source (gf
) file, unless the source
has been changed after the compilation.
In order for a GF grammar to describe a concrete language, the abstract
syntax must be completed with a concrete syntax of it.
For this purpose, we use modules of type concrete
: for instance,
concrete LogicEng of Logic = { lincat Prop = {s : Str} ; lin Conj a b = {s = a.s ++ "and" ++ b.s} ; lin Disj a b = {s = a.s ++ "or" ++ b.s} ; lin Impl a b = {s = "if" ++ a.s ++ "then" ++ b.s} ; lin Falsum = {s = ["we have a contradiction"]} ; }
The module LogicEng
is a concrete syntax of
the
abstract syntax Logic
. The GF grammar compiler checks that
the concrete is valid with respect to the abstract syntax of
which it is claimed to be. The validity requires that there has to be
lincat
judgement for each cat
judgement, telling what the
linearization types of categories are
lin
judgement for each fun
judgement, telling what the
linearization functions corresponding to functions are
Validity also requires that the linearization functions defined by
lin
judgements are type-correct with respect to the
linearization types of the arguments and value of the function.
There can also be lindef
and printname
judgements in a
concrete syntax.
When a concrete
module is successfully compiled, a gfc
file is produced in the same way as for abstract
modules. The
pair of an abstract
and a corresponding concrete
module
is a top-level grammar, which can be used in the GF system to
perform various tasks. The most fundamental tasks are
In the current grammar, infinitely many trees and strings are recognized, although no very interesting ones. For example, the tree
Impl (Disj Falsum Falsum) Falsum
has the linearization
if we have a contradiction or we have a contradiction then we have a contradiction
which in turn can be parsed uniquely as that tree.
When GF compiles the module LogicEng
it also has to compile
all modules that it depends on (in this case, just Logic
).
The compilation process starts with dependency analysis to find
all these modules, recursively, starting from the explicitly imported one.
The compiler then reads either gf
or gfc
files, in
a dependency order. The decision on which files to read depends on
time stamps and dependencies in a natural way, so that all and only
those modules that have to be compiled are compiled. (This behaviour can
be changed with flags, see below.)
To use a top-level grammar in the GF system, one uses the import
command (short name i
). For instance,
i LogicEng.gf
It is also possible to specify the imported grammar(s) on the command line when invoking GF:
gf LogicEng.gf
Various compilation flags can be added to both ways of compiling a module:
-src
forces compilation form source files
-v
gives more verbose information on compilation
-s
makes compilation silent (except if it fails with an error message)
A complete list of flags can be obtained in GF by help i
.
Importing a grammar makes it visible in GF's internal state. To see
what modules are available, use the command print_options
(po
).
You can empty the state with the command empty
(e
); this is
needed if you want to read in grammars with a different abstract syntax
than the current one without exiting GF.
Grammar modules can reside in different directories. They can then be found by means of a search path, which is a flag such as
-path=.:api/toplevel:prelude
given to the import
command or the shell command invoking GF.
(It can also be defined in the grammar file; see below.) The compiler
writes every gfc
file in the same directory as the corresponding
gf
file.
The path
is relative to the working directory pwd
, so that
all directories listed are primarily interpreted as subdirectories of
pwd
. Secondarily, they are searched relative to the value of the
environment variable GF_LIB_PATH
, which is by default set to
/usr/local/share/GF
.
Parsing and linearization can be performed with the parse
(p
) and linearize
(l
) commands, respectively.
For instance,
> l Impl (Disj Falsum Falsum) Falsum if we have a contradiction or we have a contradiction then we have a contradiction > p -cat=Prop "we have a contradiction" Falsum
Notice that the parse
command needs the parsing category
as a flag. This necessary since a grammar can have several
possible parsing categories ("entry points").
One abstract
syntax can have several concrete
syntaxes.
Here are two new ones for Logic
:
concrete LogicFre of Logic = { lincat Prop = {s : Str} ; lin Conj a b = {s = a.s ++ "et" ++ b.s} ; lin Disj a b = {s = a.s ++ "ou" ++ b.s} ; lin Impl a b = {s = "si" ++ a.s ++ "alors" ++ b.s} ; lin Falsum = {s = ["nous avons une contradiction"]} ; } concrete LogicSymb of Logic = { lincat Prop = {s : Str} ; lin Conj a b = {s = "(" ++ a.s ++ "&" ++ b.s ++ ")"} ; lin Disj a b = {s = "(" ++ a.s ++ "v" ++ b.s ++ ")"} ; lin Impl a b = {s = "(" ++ a.s ++ "->" ++ b.s ++ ")"} ; lin Falsum = {s = "_|_"} ; }
The four modules Logic
, LogicEng
, LogicFre
, and
LogicSymb
together form a multilingual grammar, in which
it is possible to perform parsing and linearization with respect to any
of the concrete syntaxes. As a combination of parsing and linearization,
one can also perform translation from one language to another.
(By language we mean the set of expressions generated by one
concrete syntax.)
Any combination of abstract syntax and corresponding concrete syntaxes
is thus a multilingual grammar. With many languages and other enrichments
(as described below), a multilingual grammar easily grows to the size of
tens of modules. The grammar developer, having finished her job, can
package the result in a multilingual canonical grammar, a file
with the suffix .gfcm
. For instance, to compile the set of grammars
described by now, the following sequence of GF commands can be used:
i LogicEng.gf i LogicFre.gf i LogicSymb.gf pm | wf logic.gfcm
The "end user" of the grammar only needs the file logic.gfcm
to
access all the functionality of the multilingual grammar. It can be
imported in the GF system in the same way as .gf
files. But
it can also be used in the
Embedded Java Interpreter for GF
to build Java programs of which the multilingual grammar functionalities
(linearization, parsing, translation) form a part.
In a multilingual grammar, the concrete syntax module names work as names of languages that can be selected for linearization and parsing:
> l -lang=LogicFre Impl Falsum Falsum si nous avons une contradiction alors nous avons une contradiction > l -lang=LogicSymb Impl Falsum Falsum ( _|_ -> _|_ ) > p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )" Conj Falsum Falsum
The option -multi
gives linearization to all languages:
> l -multi Impl Falsum Falsum if we have a contradiction then we have a contradiction si nous avons une contradiction alors nous avons une contradiction ( _|_ -> _|_ )
Translation can be obtained by using a pipe from a parser to a linearizer:
> p -cat=Prop -lang=LogicSymb "( _|_ & _|_ )" | l -lang=LogicEng if we have a contradiction then we have a contradiction
The concrete
modules shown above would look much nicer if
we used the main idea of functional programming: avoid repetitive
code by using functions that capture repeated patterns of
expressions. A collection of such functions can be a valuable
resource for a programmer, reusable in many different
top-level grammars. Thus we introduce the resource
module type, with the first example
resource Util = { oper SS : Type = {s : Str} ; oper ss : Str -> SS = \s -> {s = s} ; oper paren : Str -> Str = \s -> "(" ++ s ++ ")" ; oper infix : Str -> SS -> SS -> SS = \h,x,y -> ss (x.s ++ h ++ y.s) ; oper infixp : Str -> SS -> SS -> SS = \h,x,y -> ss (paren (infix h x y)) ; }
Modules of resource
type have two forms of judgement:
oper
defining auxiliary operations
param
defining parameter types
A resource
can be used in a concrete
(or another
resource
) by open
ing it. This means that
all operations (and parameter types) defined in the resource
module become usable in module that opens it. For instance,
we can rewrite the module LogicSymb
much more concisely:
concrete LogicSymb of Logic = open Util in { lincat Prop = SS ; lin Conj = infixp "&" ; lin Disj = infixp "v" ; lin Impl = infixp "->" ; lin Falsum = ss "_|_" ; }
What happens when this variant of LogicSymb
is
compiled is that the oper
-defined constants
of Util
are inlined in the
right-hand-sides of the judgements of LogicSymb
,
and these expressions are partially evaluated, i.e.
computed as far as possible. The generated gfc
file
will look just like the file generated for the first version
of LogicSymb
- at least, it will do the same job.
Several resource
modules can be open
ed
at the same time. If the modules contain same names, the
conflict can be resolved by qualified opening and
reference. For instance,
concrete LogicSymb of Logic = open Util, Prelude in { ... } ;
(where Prelude
is a standard library of GF) brings
into scope two definitions of the constant SS
.
To specify which one is used, you can write
Util.SS
or Prelude.SS
instead of just SS
.
You can also introduce abbreviations to avoid long qualifiers, e.g.
concrete LogicSymb of Logic = open (U=Util), (P=Prelude) in { ... } ;
which means that you can write U.SS
and P.SS
.
Judgements of param
and oper
forms may also be used
in concrete
modules, and they are then considered local
to those modules, i.e. they are not exported.
The compilation of a resource
module differs
from the compilation of abstract
and
concrete
modules because oper
operations
do not in general have values in gfc
. A gfc
file is generated, but it contains only
param
judgements (also recall that oper
s
are inlined in their top-level use sites, so it is not
necessary to save them in the compiled grammar).
However, since computing the operations over and over
again can be time comsuming, and since type checking
resource
modules also takes time, a third kind
of file is generated for resource modules: a .gfr
file. This file is written in the GF source code notation,
but it is type checked and type annotated, and oper
s
are computed as far as possible.
If you look at any gfc
or gfr
file generated
by the GF compiler, you see that all names have been replaced by
their qualified variants. This is an important first step (after parsing)
the compiler does. As for the commands in the GF shell, some output
qualified names and some not. The difference does not always result
from firm principles.
The typical use is through open
in a
concrete
module, which means that
resource
modules are not imported on their own.
However, in the developing and testing phase of grammars, it
can be useful to evaluate oper
s with different
arguments. To prevent them from being thrown away after inlining, the
-retain
option can be used:
> i -retain Util.gf
The command compute_concrete
(cc
)
can now be used for evaluating expressions that may contain
operations defined in Util
:
> cc ss (paren "foo") {s = "(" ++ "foo" ++ ")"}
To find out what oper
s are available for a given type,
the command show_operations
(so
) can be used:
> so SS Util.ss : Str -> SS ; Util.infix : Str -> SS -> SS -> SS ; Util.infixp : Str -> SS -> SS -> SS ;
The most characteristic modularity of GF lies in the division of
grammars into abstract
, concrete
, and
resource
modules. This permits writing multilingual
grammar and sharing the maximum of code between different
languages.
In addition to this special kind of modularity, GF provides inheritance, which is familiar from other programming languages (in particular, object-oriented ones). Inheritance means that a module inherits all judgements from another module; we also say that it extends the other module. Inheritance is useful to divide big grammars into smaller units, and also to reuse the same units in different bigger grammars.
The first example of inheritance is for abstract syntax. Let us
extend the module Logic
to Arithmetic
:
abstract Arithmetic = Logic ** { cat Nat ; fun Even : Nat -> Prop ; fun Odd : Nat -> Prop ; fun Zero : Nat ; fun Succ : Nat -> Nat ; }
In parallel with the extension of the abstract syntax
Logic
to Arithmetic
, we can extend
the concrete syntax LogicEng
to ArithmeticEng
:
concrete ArithmeticEng of Arithmetic = LogicEng ** open Util in { lincat Nat = SS ; lin Even x = ss (x.s ++ "is" ++ "even") ; lin Odd x = ss (x.s ++ "is" ++ "odd") ; lin Zero = ss "zero" ; lin Succ x = ss ("the" ++ "successor" ++ "of" ++ x.s) ; }
Another extension of Logic
is Geometry
,
abstract Geometry = Logic ** { cat Point ; cat Line ; fun Incident : Point -> Line -> Prop ; }
The corresponding concrete syntax is left as exercise.
Inheritance can be multiple, which means that a module may extend many modules at the same time. Suppose, for instance, that we want to build a module for mathematics covering both arithmetic and geometry, and the underlying logic. We then write
abstract Mathematics = Arithmetic, Geometry ** { } ;
We could of course add some new judgements in this module, but it is not necessary to do so. If no new judgements are added, the module body can be omitted:
abstract Mathematics = Arithmetic, Geometry ;
The module Mathematics
shows that it is possibe
to extend a module already built by extension. The correctness
criterion for extensions is that the same name
(cat
, fun
, oper
, or param
)
may not be defined twice in the resulting union of names.
That the names defined in Logic
are "inherited twice"
by Mathematics
(via both Arithmetic
and
Geometry
) is no violation of this rule; the usual
problems of multiple inheritance do not arise, since
the definitions of inherited constants cannot be changed.
Inheritance can be restricted, which means that only some of the constants are inherited. There are two dual notations for this:
A [f,g]
meaning that only f
and g
are inherited from A
, and
A-[f,g]
meaning that everything except f
is g
are inherited from A
.
Constants that are not inherited may be redefined in the inheriting module.
Inherited judgements are not copied into the inheriting modules.
Instead, an indirection is created for each inherited name,
as can be seen by looking into the generated gfc
(and
gfr
) files. Thus for instance the names
Mathematics.Prop Arithmetic.Prop Geometry.Prop Logic.Prop
all refer to the same category, declared in the module
Logic
.
The command visualize_graph
(vg
) shows the
dependency graph in the current GF shell state. The graph can
also be saved in a file and used e.g. in documentation, by the
command print_multi -graph
(pm -graph
).
The vg
command uses the free software packages Graphviz (commad dot
)
and Ghostscript (command gv
).
Top-level grammars have a straightforward translation to
resource
modules. The translation concerns
pairs of abstract-concrete judgements:
cat C ; ===> oper C : Type = T ; lincat C = T ; fun f : A ; ===> oper f : A = t ; lin f = t ;
Due to this translation, a concrete
module
can be open
ed in the same way as a
resource
module; the translation is done
on the fly (it is computationally very cheap).
Modular grammar engineering often means that some grammarians focus on the semantics of the domain whereas others take care of linguistic details. Thus a typical reuse opens a linguistically oriented resource grammar,
abstract Resource = { cat S ; NP ; A ; fun PredA : NP -> A -> S ; } concrete ResourceEng of Resource = { lincat S = ... ; lin PredA = ... ; }
The application grammar, instead of giving linearizations explicitly, just reduces them to categories and functions in the resource grammar:
concrete ArithmeticEng of Arithmetic = LogicEng ** open ResourceEng in { lincat Nat = NP ; lin Even x = PredA x (regA "even") ; }
If the resource grammar is only capable of generating grammatically correct expressions, then the grammaticality of the application grammar is also guaranteed: the type checker of GF is used as grammar checker. To guarantee distinctions between categories that have the same linearization type, the actual translation used in GF adds to every linearization type and linearization a lock field,
cat C ; ===> oper C : Type = T ** {lock_C : {}} ; lincat C = T ; fun f : C_1 ... C_n -> C ; ===> oper f : C_1 ... C_n -> C = \x_1,...,x_n -> lin f = t ; t x_1 ... x_n ** {lock_C = <>};
(Notice that the latter translation is type-correct because of
record subtyping, which means that t
can ignore the
lock fields of its arguments.) An application grammarian who
only uses resource grammar categories and functions never
needs to write these lock fields herself. Having to do so
serves as a warning that the grammaticality guarantee given
by the resource grammar no longer holds.
Note. The lock field mechanism is experimental, and may be changed to a stronger abstraction mechnism in the future. This may result in hand-written lock fields ceasing to work.
One difference between top-level grammars and resource
modules is that the former systematically separete the
declarations of categories and functions from their definitions.
In the reuse translation creating and oper
judgement,
the declaration coming from the abstract
module is put
together with the definition coming from the concrete
module.
However, the separation of declarations and definitions is so
useful a notion that GF also has specific modules types that
resource
modules into two parts. In this splitting,
an interface
module corresponds to an abstract syntax,
in giving the declarations of operations (and parameter types).
For instance, a generic markup interface would look as follows:
interface Markup = open Util in { oper Boldface : Str -> Str ; oper Heading : Str -> Str ; oper markupSS : (Str -> Str) -> SS -> SS = \f,r -> ss (f r.s) ; }
The definitions of the constants declared in an interface
are given in an instance
module (which is always of
an interface, in the same way as a concrete
is always
of
an abstract). The following instance
s
define markup in HTML and latex.
instance MarkupHTML of Markup = open Util in { oper Boldface s = "<b>" ++ s ++ "</b>" ; oper Heading s = "<h2>" ++ s ++ "</h2>" ; } instance MarkupLatex of Markup = open Util in { oper Boldface s = "\\textbf{" ++ s ++ "}" ; oper Heading s = "\\section{" ++ s ++ "}" ; }
Notice that both interface
s and instance
s may
open
resource
s (and also reused top-level grammars).
An interface
may moreover define some of the operations it
declares; these definitions are inherited by all instances and cannot
be changed in them. Inheritance by module extension
is possible, as always, between modules of the same type.
An interface
or an instance
can be open
ed in
a concrete
using the same syntax as when opening
a resource
. For an instance
, the semantics
is the same as when opening the definitions together with
the type signatures - one can think of an interface
and an instance
of it together forming an ordinary
resource
. Opening an interface
, however,
is different: functions that are only declared without
having a definition cannot be compiled (inlined); neither
can functions whose definitions depend on undefined functions.
A module that open
s an interface
is therefore
incomplete, and has to be completed with an
instance
of the interface to become complete. To make
this situation clear, GF requires any module that opens an
interface
to be marked as incomplete
. Thus
the module
incomplete concrete DocMarkup of Doc = open Markup in { ... }
uses the interface Markup
to place markup in
chosen places in its linearization rules, but the
implementation of markup - whether in HTML or in LaTeX - is
left unspecified. This is a powerful way of sharing
the code of a whole module with just differences in
the definitions of some constants.
Another terminology for incomplete
modules is
parametrized modules or functors.
The interface
gives the list of parameters
that the functor depends on.
To complete an incomplete
module, each inteface
that it opens has to be provided an instance
. The following
syntax is used for this:
concrete DocHTML of Doc = DocMarkup with (Markup = MarkupHTML) ;
Instantiation of Markup
with MarkupLatex
is
another one-liner.
If more interfaces than one are instantiated, a comma-separated list of equations in parentheses is used, e.g.
concrete MusicIta = MusicI with (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ;
This example shows a common design pattern for building applications:
the concrete syntax is a functor on the generic resource grammar library
interface Syntax
and a domain-specific lexicon interface, here
LexMusic
.
All interfaces that are open
ed in the completed model
must be completed.
Notice that the completion of an incomplete
module
may at the same time extend modules of the same type (which need
not be completions). It can also add new judgements in a module body,
and restrict inheritance from the functor.
concrete MusicIta = MusicI - [f] with (Syntax = SyntaxIta), (LexMusic = LexMusicIta) ** { lin f = ... } ;
Interfaces, instances, and parametric modules are purely a
front-end feature of GF: these module types do not exist in
the gfc
and gfr
formats. The compiler has
nevertheless to keep track of their dependencies and modification
times. Here is a summary of how they are compiled:
interface
is compiled into a resource
with an empty body
instance
is compiled into a resource
in union with its
interface
incomplete
module (concrete
or resource
) is compiled
into a module of the same type with an empty body
concrete
or resource
) is compiled
into a module of the same type by compiling its functor so that, instead of
each interface
, its given instance
is used
This means that some generated code is duplicated, because those operations that
do have complete definitions in an interface
are copied to each of
the instances
.
Syntax:
abstract
A =
(A1,...,An **
)?
{
J1 ;
... ;
Jm ; }
where
-[
f,..,g]
or Ai[
f,..,g]
cat, fun, def, data
Semantic conditions:
Syntax:
incomplete
? concrete
C of
A =
(C1,...,Cn **
)?
(open
O1,...,Ok in
)?
{
J1 ;
... ;
Jm ; }
where
-[
f,..,g]
(
Q=
R)
where R is a resource, instance, or concrete, and Q is any identifier
lincat, lin, lindef, printname
; also the forms oper, param
are
allowed, but they cannot be inherited.
If the modifier incomplete
appears, then any R in
an open specification may also be an interface or an abstract.
Semantic conditions:
cat
judgement in A
must have a corresponding, unique
lincat
judgement in C
fun
judgement in A
must have a corresponding, unique
lin
judgement in C
Syntax:
resource
R =
(R1,...,Rn **
)?
(open
O1,...,Ok in
)?
{
J1 ;
... ;
Jm ; }
where
-[
f,..,g]
(
Q=
R)
where P is a resource, instance, or concrete, and Q is any identifier
oper, param
Semantic conditions:
Syntax:
interface
R =
(R1,...,Rn **
)?
(open
O1,...,Ok in
)?
{
J1 ;
... ;
Jm ; }
where
-[
f,..,g]
(
Q=
R)
where P is a resource, instance, or concrete, and Q is any identifier
oper, param
Semantic conditions:
Syntax:
instance
R of
I =
(R1,...,Rn **
)?
(open
O1,...,Ok in
)?
{
J1 ;
... ;
Jm ; }
where
-[
f,..,g]
(
Q=
R)
where P is a resource, instance, or concrete, and Q is any identifier
oper, param
Semantic conditions:
Syntax:
concrete
C of
A =
(C1,...,Cn **
)?
B
with
(
I1 =
J1),
...
, (
Ip =
Jp)
(-
? [
c1,...,cq ]
)?
(**
?
(open
O1,...,Ok in
)?
{
J1 ;
... ;
Jm ; }
)? ;
where
-[
f,..,g]
(
Q=
R)
where R is a resource, instance, or concrete, and Q is any identifier
lincat, lin, lindef, printname
; also the forms oper, param
are
allowed, but they cannot be inherited.