Using GF grammars in Java and OAA

Björn Bringert, bringert@cs.chalmers.se

Introduction

The Grammatical Framework (GF) is a grammar formalism well suited to writing multilingual grammars for natural languages. The system presented in this document implements parsing and linearization using GF grammars.

Notes

Download

If you just need to run the interpreter, getting gfc2java.jar should be enough.

Get the source code for the latest version of the GF interpreter here.

There is also a gfc2java darcs repo with the development version. Get darcs here.

License

The Embedded GF Interpreter is distributed under the GNU Lesser General Public License.

Documentation

Building

The README file describes how to build the library and included demo applications.

Java API

JavaDoc for the GF API.

JavaDoc for the API can also be built using make javadoc.

Writing a program using the Java API

Have a look at the simple demo program. Given the files created as described under "Producing a grammar for the interpreter", you can run the program with:

$ java -cp gfc2java.jar:. se.chalmers.cs.gf.translategui.Test test.properties TestSwe TestEng

Note that the current directory must be included in the classpath, since the properties file is loaded as a Java resource from the classpath.
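To illustrate why the classpath matters here, the following is a minimal sketch of how a properties file can be loaded as a classpath resource using only the standard Java library. The class ResourceProperties and its load method are hypothetical names chosen for this example; they are not part of gfc2java, and the interpreter's own loading code may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

/** Hypothetical helper (not part of gfc2java): loads a properties file from the classpath. */
final class ResourceProperties {

    /** Loads the named resource from the classpath, or throws if it cannot be found. */
    static Properties load(String resourceName) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ResourceProperties.class.getClassLoader();
        }
        // getResourceAsStream returns null when the resource is not on the classpath.
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourceName);
            }
            Properties props = new Properties();
            props.load(in); // accepts both "key: value" and "key = value" lines
            return props;
        }
    }

    public static void main(String[] args) throws IOException {
        Properties props = load(args.length > 0 ? args[0] : "test.properties");
        System.out.println("gfcm = " + props.getProperty("gfcm"));
    }
}
```

If the directory containing test.properties is missing from the classpath, a lookup of this kind finds nothing, which is why the commands in this document include the current directory (.) in the -cp argument.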

Typed abstract syntax trees

The standard parsing and linearization functions use untyped abstract syntax trees. The Grammar2API program in the gfc2java distribution generates Java classes that represent abstract syntax trees as typed trees. To generate a Java package named test with classes for the abstract syntax in the file test.gfcm, run:

$ java -cp gfc2java.jar Grammar2API test.gfcm test
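The difference between untyped and typed trees can be sketched as follows. Every name in this example (TreeSketch, Untyped, Item, Order, Pizza, OrderItem) is hypothetical and invented for illustration; the classes that Grammar2API actually generates are not shown in this document and may look quite different. The sketch assumes an abstract syntax with two categories and two functions.

```java
import java.util.Arrays;
import java.util.List;

/** Illustration only: all names are hypothetical, not Grammar2API output. */
final class TreeSketch {

    /** Untyped tree: any function name can be applied to any children. */
    static final class Untyped {
        final String fun;
        final List<Untyped> children;

        Untyped(String fun, Untyped... children) {
            this.fun = fun;
            this.children = Arrays.asList(children);
        }

        @Override
        public String toString() {
            if (children.isEmpty()) return fun;
            StringBuilder sb = new StringBuilder("(" + fun);
            for (Untyped child : children) sb.append(' ').append(child);
            return sb.append(')').toString();
        }
    }

    /** Typed trees: one interface per category, one class per function. */
    interface Item { Untyped toUntyped(); }
    interface Order { Untyped toUntyped(); }

    /** Function "Pizza : Item" (hypothetical). */
    static final class Pizza implements Item {
        public Untyped toUntyped() { return new Untyped("Pizza"); }
    }

    /** Function "OrderItem : Item -> Order" (hypothetical). */
    static final class OrderItem implements Order {
        final Item item;
        OrderItem(Item item) { this.item = item; }
        public Untyped toUntyped() { return new Untyped("OrderItem", item.toUntyped()); }
    }

    public static void main(String[] args) {
        // An ill-formed untyped tree compiles without complaint:
        Untyped illFormed = new Untyped("OrderItem", new Untyped("OrderItem"));
        System.out.println(illFormed);

        // The typed equivalent, new OrderItem(new OrderItem(...)), would be a compile error.
        Order order = new OrderItem(new Pizza());
        System.out.println(order.toUntyped());
    }
}
```

With typed trees, the Java compiler rejects trees that do not match the abstract syntax, and application code can work with named classes instead of comparing function names as strings.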

For an example of a simple dialog system which uses the Embedded GF Interpreter and typed abstract syntax trees, see SimpleDemoText.java.

There is also a slightly larger version of the demo program, SimpleDemo.java, which uses speech recognition for input and speech synthesis for output. This version requires the Dialogutil package, a speech recognizer and a speech synthesizer.

OAA interface

See "GF OAA Agent".

Producing a grammar for the interpreter

The GF interpreter needs two representations of the grammar to do linearization and parsing. These two representations can be generated from a GF source grammar by using the GF system. This example assumes that we have the concrete syntax modules TestEng and TestSwe.

  1. Load all the source modules into GF:

    > i TestEng.gf
    > i TestSwe.gf
    
  2. Create a GFCM file:

    > pm -utf8 -utf8id -printer=header | wf test.gfcm
    
  3. Create a CFGM file:

    > pm -utf8 -utf8id -printer=cfgm | wf test.cfgm
    
  4. Create a properties file (here test.properties) so that the interpreter can find these files. The file should have these contents:

    name: test
    gfcm: test.gfcm
    cfgm: test.cfgm
    

See the build-translet script for an example of how to do this automatically.

Using the translation GUI

The build-translet script will create a JAR file which runs the translation GUI.

To run the translation GUI on files you have created manually, as in the description above, run:

$ java -cp gfc2java.jar:. se.chalmers.cs.gf.translategui.TranslateApp test.properties

Note that the current directory must be included in the classpath, since the properties file is loaded as a Java resource from the classpath.

Specifying which lexer to use

By default, the Embedded GF Interpreter uses a fairly simple lexer. You can write new lexers, which must implement the Lexer interface. To specify which lexer to use, you can add lines such as:

lexer = se.chalmers.cs.gf.parse.lex.SimpleLexer
decimal.lexer = se.chalmers.cs.gf.parse.lex.DigitLexer

to the properties file. The lexer key specifies the default lexer. A key of the form <language>.lexer specifies the lexer to use for that particular language, as decimal.lexer does for the language decimal in the example. In both cases, the value should be the name of a class (on the classpath) which implements the Lexer interface.
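The relationship between the default key and the per-language keys can be sketched as a simple fallback lookup. This is only an illustration of the rule described above, written against java.util.Properties; the class LexerLookup and the method lexerClassFor are hypothetical names, and the interpreter's actual lookup code is not shown in this document.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical sketch (not gfc2java code): resolve the lexer class name for a language. */
final class LexerLookup {

    /**
     * Returns the value of "<language>.lexer" if it is set, otherwise the value
     * of the default "lexer" key, otherwise null.
     */
    static String lexerClassFor(Properties props, String language) {
        String specific = props.getProperty(language + ".lexer");
        return specific != null ? specific : props.getProperty("lexer");
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "lexer = se.chalmers.cs.gf.parse.lex.SimpleLexer\n"
              + "decimal.lexer = se.chalmers.cs.gf.parse.lex.DigitLexer\n"));

        // "decimal" has its own entry; "TestEng" falls back to the default.
        System.out.println(lexerClassFor(props, "decimal"));
        System.out.println(lexerClassFor(props, "TestEng"));
    }
}
```

A resolved class name could then be instantiated reflectively, for example with Class.forName, but that step depends on the Lexer interface and is left out here.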

Logging

The Embedded GF Interpreter uses the Java Logging API to handle log messages. The following logging namespaces are used:

By default, little information is logged. To change the log level or other logging parameters, you can use the normal Logging API mechanisms. For example, to print FINER-level messages for the parse and linearize packages to the standard error stream, create a file called logging.properties in the current directory:

# Log to System.err:
handlers=java.util.logging.ConsoleHandler

# Set the log level of the log handler to FINEST
java.util.logging.ConsoleHandler.level=FINEST
# Use our log formatter, which just prints log messages and timestamps
java.util.logging.ConsoleHandler.formatter=se.chalmers.cs.gf.util.MilliLogFormatter

# Set global logging level
.level=INFO

# Set log levels for our packages
se.chalmers.cs.gf.parse.level=FINER
se.chalmers.cs.gf.parse.TreeBuilder.level=FINER
se.chalmers.cs.gf.linearize.level=FINER

Then instruct the logging API to use the properties from this file, for example:

$ java -Djava.util.logging.config.file=logging.properties -jar translate-numerals.jar
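When the interpreter is embedded in your own program, roughly the same configuration can be applied in code instead of through a properties file. The following is a minimal sketch using only the standard java.util.logging API. The logger names are taken from the properties file above; the class LoggingSetup is a hypothetical name, and the sketch uses the default console formatter rather than MilliLogFormatter.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper: programmatic counterpart of the logging.properties example. */
final class LoggingSetup {

    // Keep strong references, since java.util.logging only holds loggers weakly.
    private static final Logger PARSE = Logger.getLogger("se.chalmers.cs.gf.parse");
    private static final Logger LINEARIZE = Logger.getLogger("se.chalmers.cs.gf.linearize");

    /** Call once at startup; calling it again would attach a second handler. */
    static void enableFinerLogging() {
        ConsoleHandler handler = new ConsoleHandler(); // writes to System.err
        handler.setLevel(Level.FINEST);                // let the loggers do the filtering
        for (Logger logger : new Logger[] { PARSE, LINEARIZE }) {
            logger.setLevel(Level.FINER);
            logger.addHandler(handler);
            logger.setUseParentHandlers(false);        // avoid duplicates via the root handler
        }
    }

    public static void main(String[] args) {
        enableFinerLogging();
        // Child loggers inherit the FINER level from their package logger.
        Logger child = Logger.getLogger("se.chalmers.cs.gf.parse.TreeBuilder");
        child.finer("shown: at the configured level");
        child.finest("not shown: below the configured level");
    }
}
```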

Demos and examples

Numerals translator

The Numerals translator is a demo applet which uses the Java GF interpreter and the GF numerals grammar to translate numerals between a number of languages.

Tramdemo

Tramdemo demonstrates the use of multimodal grammars as a method of implementing multimodal dialog systems.