Using GF grammars in Java and OAA

Björn Bringert, bringert@cs.chalmers.se

Introduction

The Grammatical Framework (GF) is a grammar formalism well suited to writing multilingual grammars for natural languages. The system presented in this document implements parsing and linearization using GF grammars.

Notes

Download

If you just need to run the interpreter, getting gfc2java.jar should be enough.

Get the source code for the latest version of the GF interpreter here.

There is also a gfc2java darcs repo with the development version. Get darcs here. Check out a copy:

$ darcs get --partial --set-scripts-executable http://www.cs.chalmers.se/~bringert/darcs/gfc2java/

License

The Embedded GF Interpreter is distributed under the GNU Lesser General Public License.

Documentation

Building

The README file describes how to build the library and included demo applications.

Producing a grammar for the interpreter

The GF interpreter needs two representations of the grammar to do linearization and parsing. These two representations can be generated from a GF source grammar by using the GF system. This example assumes that we have the concrete syntax modules TestEng and TestSwe.

  1. Load all the source modules into GF:

    > i TestEng.gf
    > i TestSwe.gf
    
  2. Create a GFCM file:

    > pm -utf8 -utf8id -printer=header | wf test.gfcm
    
  3. Create a CFGM file:

    > pm -utf8 -utf8id -printer=cfgm | wf test.cfgm
    
  4. Create a properties file (here test.properties) so that the interpreter can find these files. The file should have these contents:

    name: test
    gfcm: test.gfcm
    cfgm: test.cfgm
    

See the build-translet script for an example of how to do this automatically.
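The properties file in step 4 uses the standard Java properties syntax, in which `key: value` and `key = value` are equivalent. As a minimal sketch (not part of the gfc2java API; the class name GrammarProperties and its read method are made up for illustration), such a file can be read with java.util.Properties and checked for the three keys shown above:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of gfc2java: reads a grammar properties
// file like test.properties and checks that the expected keys exist.
final class GrammarProperties {
    // The three keys used in the example properties file.
    static final String[] REQUIRED_KEYS = { "name", "gfcm", "cfgm" };

    static Properties read(Reader in) {
        Properties props = new Properties();
        try {
            props.load(in); // accepts both "key: value" and "key = value"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : REQUIRED_KEYS) {
            if (props.getProperty(key) == null) {
                throw new IllegalArgumentException("missing key: " + key);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        String text = "name: test\ngfcm: test.gfcm\ncfgm: test.cfgm\n";
        Properties props = read(new StringReader(text));
        System.out.println(props.getProperty("gfcm"));
    }
}
```

A check like this can catch a misspelled key before the interpreter is started.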

Java API

JavaDoc for the GF API.

JavaDoc for the API can also be built using make javadoc.

Writing a program using the Java API

Have a look at the simple demo program. Given the files created above, you can run the program with:

$ java -cp gfc2java.jar:. se.chalmers.cs.gf.translategui.Test test.properties TestSwe TestEng

Note that the current directory must be included in the classpath, since the properties file is loaded as a Java resource from the classpath.
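To see why the classpath matters, here is a sketch of how a properties file is typically loaded as a classpath resource. It assumes the usual class loader lookup; this document does not show how the interpreter does it internally, and the class name ResourceLookup is hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Illustration only: generic classpath resource loading, not gfc2java code.
final class ResourceLookup {
    static Properties loadFromClasspath(String resourceName) {
        ClassLoader loader = ResourceLookup.class.getClassLoader();
        try (InputStream in = loader.getResourceAsStream(resourceName)) {
            if (in == null) {
                // The lookup returns null when no classpath entry contains
                // the file, e.g. when "." is missing from -cp.
                throw new IllegalArgumentException(
                    "not found on classpath: " + resourceName);
            }
            Properties props = new Properties();
            props.load(in);
            return props;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "test.properties";
        System.out.println(loadFromClasspath(name).getProperty("name"));
    }
}
```

With this kind of lookup, a file in the current directory is only found if the current directory is one of the classpath entries.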

Typed abstract syntax trees

The standard parsing and linearization functions use untyped abstract syntax trees. The Grammar2API program in the gfc2java distribution generates Java classes which can represent an abstract syntax tree using typed trees. To generate a Java package named test with classes for the abstract syntax in the file test.gfcm, run:

$ java -cp gfc2java.jar Grammar2API test.gfcm test

For an example of a simple dialog system which uses the Embedded GF Interpreter and typed abstract syntax trees, see SimpleDemoText.java. This example uses these grammars. There is a Darcs repo for the complete demo program.

There is also a slightly larger version of the demo program, SimpleDemo.java, which uses speech recognition for input and speech synthesis for output. This version requires the Dialogutil package, a speech recognizer and a speech synthesizer.

OAA interface

See "GF OAA Agent".

Using the translation GUI

The build-translet script will create a JAR-file which runs the translation GUI.

To run the translation GUI on files you have created manually, as in the description above, run:

$ java -cp gfc2java.jar:. se.chalmers.cs.gf.translategui.TranslateApp test.properties

Note that the current directory must be included in the classpath, since the properties file is loaded as a Java resource from the classpath.

Specifying which lexer to use

By default, the Embedded GF Interpreter uses a fairly simple lexer. You can write new lexers, which must implement the Lexer interface. To specify which lexer to use, you can add lines such as:

lexer = se.chalmers.cs.gf.parse.lex.SimpleLexer
decimal.lexer = se.chalmers.cs.gf.parse.lex.DigitLexer

to the properties file. The lexer key specifies the default lexer. A key of the form language.lexer, where language is the name of a language (decimal in the example above), specifies the lexer for that language. Each value should be the name of a class (on the classpath) which implements the Lexer interface.
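The lookup rule described above (a language-specific entry takes precedence over the default entry) can be sketched as follows. This is an illustration of the rule, not the interpreter's own code; the class name LexerSetting and the method lexerClassFor are hypothetical:

```java
import java.util.Properties;

// Sketch of the lookup rule for lexer settings; not part of gfc2java.
final class LexerSetting {
    // Returns the class name under "<language>.lexer" if present,
    // otherwise the default "lexer" entry, otherwise null.
    static String lexerClassFor(Properties props, String language) {
        return props.getProperty(language + ".lexer",
                                 props.getProperty("lexer"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("lexer", "se.chalmers.cs.gf.parse.lex.SimpleLexer");
        props.setProperty("decimal.lexer", "se.chalmers.cs.gf.parse.lex.DigitLexer");
        System.out.println(lexerClassFor(props, "decimal")); // language-specific entry
        System.out.println(lexerClassFor(props, "TestEng")); // falls back to the default
    }
}
```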

Specifying which unlexer to use

By default, the Embedded GF Interpreter uses a fairly simple unlexer. You can write new unlexers, which must implement the Unlexer interface. To specify which unlexer to use, you can add lines such as:

unlexer = se.chalmers.cs.gf.linearize.unlex.TextUnlexer
decimal.unlexer = se.chalmers.cs.gf.linearize.unlex.SimpleUnlexer

to the properties file. The unlexer key specifies the default unlexer. A key of the form language.unlexer, where language is the name of a language (decimal in the example above), specifies the unlexer for that language. Each value should be the name of a class (on the classpath) which implements the Unlexer interface.

When implementing a custom unlexer, the easiest path is probably to extend an existing unlexer class, such as BasicUnlexer or SimpleUnlexer.
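Since the properties file names the unlexer by class name, the class must be loadable by name and must implement the expected interface. The following sketch shows the general Java technique for this. The methods of the real Unlexer interface are not shown in this document, so the example uses a stand-in interface; TokenJoiner, SpaceJoiner and PluginLoader are all hypothetical names:

```java
import java.util.Arrays;
import java.util.List;

// Stand-in for the real Unlexer interface (hypothetical name and method).
interface TokenJoiner {
    String join(List<String> tokens);
}

// Example implementation: joins tokens with single spaces.
class SpaceJoiner implements TokenJoiner {
    public String join(List<String> tokens) {
        return String.join(" ", tokens);
    }
}

final class PluginLoader {
    // Loads the named class, checks that it implements the expected
    // interface, and creates an instance with its no-argument constructor.
    static <T> T instantiate(String className, Class<T> expected) {
        try {
            Class<? extends T> cls =
                Class.forName(className).asSubclass(expected);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "cannot load " + className + " as " + expected.getName(), e);
        }
    }

    public static void main(String[] args) {
        TokenJoiner joiner =
            instantiate(SpaceJoiner.class.getName(), TokenJoiner.class);
        System.out.println(joiner.join(Arrays.asList("one", "hundred")));
    }
}
```

A class that is missing from the classpath, or that does not implement the interface, is rejected with an IllegalArgumentException in this sketch.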

Logging

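The Embedded GF Interpreter uses the Java Logging API to handle log messages. The logging namespaces used include se.chalmers.cs.gf.parse, se.chalmers.cs.gf.parse.TreeBuilder and se.chalmers.cs.gf.linearize, which appear in the example configuration below.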

By default, little information is logged. To change the log level or other logging parameters, you can use the normal Logging API mechanisms. For example, to print FINER level messages for the parse and linearize packages to the standard error stream, create a file called logging.properties in the current directory:

# Log to System.err:
handlers=java.util.logging.ConsoleHandler

# Set the log level of the log handler to FINEST
java.util.logging.ConsoleHandler.level=FINEST
# Use our log formatter, which just prints log messages and timestamps
java.util.logging.ConsoleHandler.formatter=se.chalmers.cs.gf.util.MilliLogFormatter

# Set global logging level
.level=INFO

# Set log levels for our packages
se.chalmers.cs.gf.parse.level=FINER
se.chalmers.cs.gf.parse.TreeBuilder.level=FINER
se.chalmers.cs.gf.linearize.level=FINER

Then instruct the logging API to use the properties from this file, for example:

$ java -Djava.util.logging.config.file=logging.properties -jar translate-numerals.jar
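The same levels can also be set from code with the standard java.util.logging API, which may be convenient when embedding the interpreter in a larger program. This is a rough sketch of such a setup, not part of gfc2java; the class name LoggingSetup is hypothetical, and the sketch keeps the default formatter instead of MilliLogFormatter:

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: programmatic counterpart of logging.properties.
final class LoggingSetup {
    // Keep strong references: java.util.logging holds loggers weakly, so
    // settings on an otherwise unreferenced logger may be lost.
    private static final Logger PARSE =
        Logger.getLogger("se.chalmers.cs.gf.parse");
    private static final Logger LINEARIZE =
        Logger.getLogger("se.chalmers.cs.gf.linearize");

    static void enableFinerLogging() {
        Handler console = new ConsoleHandler(); // writes to System.err
        console.setLevel(Level.FINEST);         // let the loggers decide
        for (Logger logger : new Logger[] { PARSE, LINEARIZE }) {
            logger.setLevel(Level.FINER);
            logger.addHandler(console);
            // Avoid printing records a second time via the root handler.
            logger.setUseParentHandlers(false);
        }
    }

    public static void main(String[] args) {
        enableFinerLogging();
        PARSE.finer("parse logging enabled");
    }
}
```

Loggers below these namespaces, such as se.chalmers.cs.gf.parse.TreeBuilder, inherit the level unless they set their own.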

Demos and examples

SimpleDemo

The smallest example program is SimpleDemo (darcs repository). It is a very small dialog system which uses typed abstract syntax trees.

SimpleDemoText is the simplest version; it uses text for input and output. The SimpleDemo program takes input from a speech recognizer and sends output to a speech synthesizer. The latter program requires the Dialogutil package, a speech recognizer and a speech synthesizer.

Both programs use the same user and system grammars. The user and system packages used by these programs are produced by the Grammar2API tool included with the Embedded GF Interpreter. Look at the SimpleDemo Makefile to see how this is done.

Numerals translator

The Numerals translator is a demo applet which uses the Java GF interpreter and the GF numerals grammar to translate numerals between a number of languages.

Tramdemo

Tramdemo demonstrates the use of multimodal grammars as a method of implementing multimodal dialog systems.