Grammatical Framework Version 2

Highlights, versions 2.0, 2.1, and 2.2 (2.2 coming soon)

13/10/2003 - 25/11 - 2/4/2004 - 18/6 - 13/10 - 16/2/2005

Aarne Ranta

Syntax of GF

An accurate language specification is now available.

Summary of novelties in Versions 2.0 to 2.2

Module system

  • Separate modules for abstract, concrete, and resource.
  • Replaces the file-based include system
  • Name space handling with qualified names
  • Hierarchic structure (single inheritance **) + cross-cutting reuse (open)
  • Separate compilation, one module per file
  • Reuse of abstract+concrete as resource
    Version 2.2: separate reuse modules no longer needed
  • Parametrized modules: interface, instance, incomplete.
  • New experimental module types: transfer, union.
  • Version 2.1: multiple inheritance in module extension.

    Canonical format GFC

  • The target of GF compiler; to reuse, just read in.
  • Readable by Haskell/Java/C++/C applications.
  • Version 2.1: Java interpreter available for GFC (by Björn Bringert).
  • Version 2.2: new optimizations to reduce the size of GFC files

    New features in expression language

  • Disjunctive patterns P | ... | Q.
  • String patterns "foo".
  • Binding token &+ to glue separate tokens at unlexing phase, and unlexer to resolve this.
  • New syntax alternatives for local definitions: let without braces and where.
  • Pattern variables can be used on lhs's of oper definitions.
  • New Unicode transliterations (by Harald Hammarström).
  • Version 2.1: Initial segments of integers (Intsn) available as parameter types.

    New shell commands and command functionalities

  • pi = print_info: information on an identifier in scope.
  • h = help now in long or short form, and on individual commands.
  • gt = generate_trees: all trees of a given category or instantiations of a given incomplete term, up to a given depth.
  • gr = generate_random can now be given an incomplete term as an argument, to constrain generation.
  • so = show_opers shows all oper operations with a given value type.
  • pm = print_multi prints the multilingual grammar resident in the current state to a ready-compiled .gfcm file.
  • Version 2.2: several new command options
  • Version 2.2: vg visualizes the module dependency graph
  • All commands have both long and short names (see help). Short names are easier to type, whereas long names make scripts more readable.
  • Meaningless command options generate warnings.
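
    As a rough illustration of what generating all trees of a category up to a given depth involves, here is a minimal Python sketch. It is not GF's implementation: the dictionary encoding of fun signatures, the function name generate_trees, and the way depth is counted (a constant has depth 1) are all assumptions made for this example. The signatures are those of the Sums grammar used later in this document.

```python
from itertools import product

# Hypothetical encoding of fun judgements: name -> (argument categories, value category).
SUMS = {
    "One": ([], "Exp"),
    "plus": (["Exp", "Exp"], "Exp"),
}

def generate_trees(funs, cat, depth):
    """Return all trees of category `cat` whose depth is at most `depth`.

    A tree is a tuple (function name, subtree, ...). Assumption: a function
    without arguments forms a tree of depth 1.
    """
    if depth <= 0:
        return []
    trees = []
    for name, (arg_cats, value_cat) in funs.items():
        if value_cat != cat:
            continue
        # All candidate subtrees for each argument place, one level shallower.
        candidates = [generate_trees(funs, a, depth - 1) for a in arg_cats]
        for subtrees in product(*candidates):
            trees.append((name,) + subtrees)
    return trees

shallow = generate_trees(SUMS, "Exp", 2)
```

    With this encoding, depth 2 yields One and plus One One; the number of trees grows quickly with depth, which is why a depth bound is needed.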

    New editor features

  • Active text field: click the middle button in the focus to send in refinement through the parser.
  • Clipboard: copy complex terms into the refine menu.
  • Version 2.2: text corresponding to subtrees with constraints marked with red colour

    Improved implementation

  • Haskell source code is organized into subdirectories.
  • BNF Converter is used for defining the languages GF and GFC, which also give reliable LaTeX documentation.
  • Lexical rules sorted out by option -cflexer for efficient parsing with large lexica.
  • GHC optimizations and strictness flags are used for improving performance.
  • Version 2.2: started haddock documentation by using uniform module headers

    New parser (work in progress)

  • By Peter Ljunglöf, based on MCFG.
  • Much more efficient for morphology and discontinuous constituents.
  • Treatment of cyclic rules.
  • Version 2.1: improved generation of speech recognition grammars (by Björn Bringert).
  • Version 2.1: output of Labelled BNF files readable by the BNF Converter.

    Abstract, concrete, and resource modules

    Judgement forms are sorted by module type. Example:
      abstract Sums = {
        cat 
          Exp ;
        fun 
          One : Exp ;
          plus : Exp -> Exp -> Exp ;
      }
    
      concrete EnglishSums of Sums = open ResEng in {
        lincat 
          Exp = {s : Str ; n : Number} ;
        lin
          One = expSg "one" ;
          plus x y = expSg ("the" ++ "sum" ++ "of" ++ x.s ++ "and" ++ y.s) ;
      }
    
      resource ResEng = {
        param 
          Number = Sg | Pl ;
        oper 
          expSg : Str -> {s : Str ; n : Number} = \s -> {s = s ; n = Sg} ;
      }
    

    Opening and extending modules

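    A concrete or resource can open a resource. This means that the names defined in the opened resource can be used in the opening module, but are not exported from it.

    A module of any type can moreover extend a module of the same type. This means that all names of the extended module are inherited, and are exported together with the module's own names.

    Examples of extension: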
      abstract Products = Sums ** {
        fun times : Exp -> Exp -> Exp ;
      }
      -- names exported: Exp, plus, times
    
      concrete English of Products = EnglishSums ** open ResEng in {
        lin times x y = expSg ("the" ++ "product" ++ "of" ++ x.s ++ "and" ++ y.s) ;
      }
    

    Opening, but not extension, can be qualified:

      concrete NumberSystems of Systems = open (Bin = Binary), (Dec = Decimal) in {
        lin 
          BZero = Bin.Zero ;
          DZero = Dec.Zero ;
      }
    

    Version 2.1 introduces multiple inheritance: a module can extend several modules at the same time, for instance,

      abstract Dialogue = User, System ** { ...}
    
    may be used to put together "User's moves" and "System's moves" into one Dialogue System grammar.

    Compiling modules

    Separate compilation assumes there is one module per file.

    The module header is the beginning of the module code up to the first left bracket ({). The header gives

    filename = modulename . extension
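
    The relation between module header and file name can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the set of module-type keywords is limited to those appearing in this document's examples and feature list, and comments preceding the header are not handled.

```python
# Assumption: module-type keywords taken from this document; other types are omitted.
MODULE_TYPES = {"abstract", "concrete", "resource", "interface", "instance"}

def module_header(source):
    """The header is the module code up to the first '{'."""
    return source.split("{", 1)[0].strip()

def module_name(header):
    """The module name is the word following the module-type keyword."""
    words = header.split("=", 1)[0].split()
    for i, word in enumerate(words[:-1]):
        if word in MODULE_TYPES:
            return words[i + 1]
    raise ValueError("no module type keyword found in header")

def expected_filename(source, extension="gf"):
    """filename = modulename . extension"""
    return f"{module_name(module_header(source))}.{extension}"

example = 'concrete EnglishSums of Sums = open ResEng in {\n  lin One = expSg "one" ;\n}'
name = expected_filename(example)
```

    Under these assumptions, the module above is expected to live in EnglishSums.gf.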

    File name extensions:

    Only gf files should ever be written/edited manually!

    What the make facility does when compiling Foo.gf

    1. read the module header of Foo.gf, and recursively all headers from the modules it depends on (i.e. extends or opens)
    2. build a dependency graph of these modules, and do topological sorting
    3. starting from the first module in topological order, compare the modification times of each gf and gfc file:
      • if gf is later, compile the module and all modules depending on it
      • if gfc is later, just read in the module
    Inside the GF shell, also time stamps of modules read into memory are taken into account. Thus a module need not be read from a file if the module is in the memory and the file has not been modified.
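
    The three steps above can be sketched in Python as follows. This is a simplified model, not the actual GF code: the function name plan_build and its data structures are hypothetical, time stamps of modules already in memory are ignored, and a missing gfc file is assumed to count as out of date. It uses graphlib from the Python standard library (3.9 or later) for the topological sorting.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def plan_build(deps, gf_mtime, gfc_mtime):
    """Return {module: "compile" or "read"} for every module in the graph.

    deps maps each module to the set of modules it depends on.
    gf_mtime and gfc_mtime map module names to modification times.
    """
    # static_order() yields dependencies before the modules that use them.
    order = TopologicalSorter(deps).static_order()
    recompiled = set()
    plan = {}
    for module in order:
        # Assumption: a missing gfc file counts as out of date.
        source_is_newer = gf_mtime[module] > gfc_mtime.get(module, float("-inf"))
        # A module is also recompiled if anything it depends on was recompiled.
        depends_on_recompiled = any(d in recompiled for d in deps.get(module, ()))
        if source_is_newer or depends_on_recompiled:
            recompiled.add(module)
            plan[module] = "compile"
        else:
            plan[module] = "read"
    return plan

# Dependencies of the example grammars in this document.
deps = {
    "English": {"Products", "EnglishSums", "ResEng"},
    "EnglishSums": {"Sums", "ResEng"},
    "Products": {"Sums"},
}
# Made-up time stamps: only ResEng.gf is newer than its gfc file.
gf_mtime = {"Sums": 1, "Products": 1, "ResEng": 5, "EnglishSums": 1, "English": 1}
gfc_mtime = {"Sums": 2, "Products": 2, "ResEng": 3, "EnglishSums": 2, "English": 2}
plan = plan_build(deps, gf_mtime, gfc_mtime)
```

    In this example, editing ResEng.gf causes ResEng, EnglishSums, and English to be compiled, while Sums and Products are just read in.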

    If the compilation of a grammar fails at some module, the state of the GF shell contains all modules read up to that point. This makes it faster to compile the faulty module again after fixing it.

    Use the command po = print_options to see what modules are in the state.

    To force compilation:

    Compiler optimizations

    Version 2.2

    The sometimes exploding size of generated gfc and gfr files has made it urgent to find optimizations that reduce the size of the code. There are five combinations of optimizations that can be chosen as the value of the optimize flag:

    The share and parametrize optimizations are always beneficial, whereas the values optimization may slow down the use of tables. However, it is very effective for grammars consisting mostly of the inflection tables of lexical items: it can reduce the file size by a factor of 4.

    An optimization can be selected individually for each resource and concrete module by including the judgement

      flags optimize=(share|parametrize|values|all|none) ;
    
    in the module body. These flags can be overridden by a flag given in the i command, e.g.
      i -src -optimize=none Foo.gf
    
    Notice that the option -src is needed if there already are generated files created with other optimization flags.

    Module search paths

    Modules can reside in different directories. Use the path flag to extend the directory search path. For instance,
      -path=.:../resource/russian:../prelude
    
    enables files to be found in three different directories. By default, only the current directory is included. If a path flag is given, the current directory . must be explicitly included if it is wanted.
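
    The effect of the path flag can be modelled with a short Python sketch. The function names are hypothetical, and it is an assumption of this sketch that directories are tried in the order given and that the first match wins. The exists parameter is only there so that the lookup can be tried without touching the file system.

```python
import os.path

def search_dirs(path_flag=None):
    """Directories to search: only '.' by default, else exactly those listed."""
    if path_flag is None:
        return ["."]
    # The current directory is NOT added implicitly once a path is given.
    return [d for d in path_flag.split(":") if d]

def find_module(name, path_flag=None, exists=os.path.exists):
    """Return the first existing candidate file for module `name`, or None."""
    for directory in search_dirs(path_flag):
        candidate = f"{directory}/{name}.gf"
        if exists(candidate):
            return candidate
    return None

# A pretend file system for illustration.
files = {"./Local.gf", "../prelude/Prelude.gf"}
found = find_module("Prelude", ".:../resource/russian:../prelude", exists=files.__contains__)
```

    Note that with the flag value ../prelude alone, Local.gf in the current directory would not be found.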

    The path flag can be set in any of the following places:

    A flag set on a command line overrides ones set in files.

    How to use GF 1.* files

    Backward compatibility with respect to old GF grammars has been a central goal. All GF grammars, from version 0.9, should work in the old way in GF2. The main exceptions are some features that are rarely used.

    Very old GF grammars (from versions before 0.9), with the completely different notation, do not work. They should be first converted to GF1 by using GF version 1.2.

    The import command i can be given the option -old. E.g.

      i -old tut1.Eng.g2
    
    But this is no longer necessary: GF2 detects automatically if a grammar is in the GF1 format.

    Importing a set of GF1 files generates, internally, three modules:

      abstract tut1 = ...
      resource ResEng = ...
      concrete Eng of tut1 = open ResEng in ...
    
    (The names are different if the file name has fewer parts.)

    The option -o causes GF2 to write these modules into files.

    The flags -abs, -cnc, and -res can be used to give custom names to the modules. In particular, it is good to use the -abs flag to guarantee that the abstract syntax module has the same name for all grammars in a multilingual environment:

      i -old -abs=Numerals hungarian.gf
      i -old -abs=Numerals tamil.gf
      i -old -abs=Numerals sanskrit.gf
    

    The same flags as in the import command can be used when invoking GF2 from the system shell. Many grammars can be imported on the same command line, e.g.

      % gf2 -old -abs=Tutorial tut1.Eng.gf tut1.Fin.gf tut1.Fra.gf
    

    To write a GF2 grammar back to GF1 (as one big file), use the command

      > pg -old
    

    GF2 has more reserved words than GF 1.2. When old files are read, a preprocessor replaces every identifier that has the shape of a new reserved word with a variant where the last letter is replaced by Z, e.g. instance is replaced by instancZ. This method is of course unsafe and should be replaced by something better.
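
    The renaming scheme, and the reason it is unsafe, can be illustrated with a small Python sketch. This is not the actual preprocessor: the function name is hypothetical, the set of reserved words is an illustrative subset (only instance is confirmed by the example above), and the sketch does not distinguish identifiers from words inside string literals or comments.

```python
import re

# Illustrative subset only; the full list of new GF2 reserved words is not given here.
NEW_RESERVED = {"instance", "interface", "incomplete"}

# Rough approximation of an identifier.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_']*")

def rename_reserved(source, reserved=NEW_RESERVED):
    """Replace the last letter of every reserved-word-shaped identifier by Z."""
    def fix(match):
        word = match.group(0)
        return word[:-1] + "Z" if word in reserved else word
    return IDENT.sub(fix, source)

renamed = rename_reserved("fun instance : Exp ;")
# Two distinct identifiers in the old grammar collapse into one:
clash = rename_reserved("oper instance = instancZ ;")
```

    The second call shows the problem: if an old grammar already uses instancZ, the renamed identifier collides with it.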

    Missing features of GF 1.2 (13/10/2004)

    Generally, GF1 grammars can be automatically translated to GF2, although the result is not as good as a manual translation, since indentation and comments are destroyed. The results can be saved in GF2 files, but this is not necessary. Some rarely used GF1 features are no longer supported (see next section). It is also possible to write a GF2 grammar back to GF1, with the command pg -printer=old.

    Resource libraries and some example grammars have been converted. Most old example grammars work without any changes. However, there is a new resource API with many new constructions, and which is recommended.

    Soundness checking of module dependencies and completeness is not yet fully implemented. This means that some errors may show up too late.

    LaTeX and XML printing of grammars do not work yet.