The GF Resource Grammar Library defines the basic grammar of ten languages: Danish, English, Finnish, French, German, Italian, Norwegian, Russian, Spanish, Swedish. A still incomplete implementation for Arabic is also included.
New in Version 1.1
C
can be built by the function mkC
.
An example of use is logic
.
The API of version 1.0 remains valid and can be used in combination with this.
Inger Andersson and Therese Soderberg (Spanish morphology), Nicolas Barth and Sylvain Pogodalla (French verb list), Ali El Dada (Arabic modules), Janna Khegai (Russian modules), Bjorn Bringert (many Swadesh lexica), Carlos Gonzalía (Spanish cardinals), Harald Hammarström (German morphology), Patrik Jansson (Swedish cardinals), Andreas Priesnitz (German lexicon), Aarne Ranta.
We are grateful for contributions and comments to several other people who have used this and the previous versions of the resource library, including Ludmilla Bogavac, Ana Bove, David Burke, Lauri Carlson, Gloria Casanellas, Karin Cavallin, Robin Cooper, Hans-Joachim Daniels, Elisabet Engdahl, Markus Forsberg, Kristofer Johannisson, Anni Laine, Peter Ljunglöf, Saara Myllyntausta, Wanjiku Ng'ang'a, Jordi Saludes.
The GF Resource Grammar Library is open-source software licensed under GNU General Public License. See the file LICENSE for more details.
Coverage, for each language:
Organization:
Presentation:
gfdoc
for generating HTML from grammars
Go to the main directory, compile the grammars, and run a test.
cd GF/lib/resource-1.0 make make test
This will take quite some time. An alternative is to use the
precompiled grammar package compiled.tgz
.
This package has the necessary gfc
and gfr
files directly under GF/lib
.
GF/lib/alltenses GF/lib/mathematical GF/lib/multimodal GF/lib/present
Do for instance
cd GF/lib/ gf > i -path=present:prelude present/LangEng.gfc > gr -cat=S -number=3 -cf | tb
For more examples, see the Overview slides.
The make
procedure does not make Arabic, but it can
be compiled in a similar way as the other languages.
Finnish, German, Romance, and Scandinavian languages are in isolatin-1.
Arabic and Russian are in UTF-8.
English is in pure ASCII.
This API is accessible by both present
and alltenses
.
The API is divided into a bunch of abstract
modules.
The following figure gives the dependencies of these modules.
The documentation of the individual modules:
Lexicon
Grammar
and Lexicon
This is the structure of each language-dependent top module.
The API is the same as for the full ground API, but the compiler
has ignored all verb and sentence tenses except the present.
Lines ignored in the source files are marked by --# notpresent
.
The result is a smaller and more efficient grammar, which is still
sufficient for many applications.
The API is the same as for the full ground API, but with modified
linearization types of NP
and Adv
, and all other categories
depending on them: an extra field is added to a demonstrative pointing
gesture. Some functions for constructing demonstratives are provided.
The simplest way to get the library is to install the precompiled version
lib/compiled.tgz
. Just do
cd GF/lib tar xvfz compiled.tgz
There is no need to link application grammars to the source directories of the library. Use one (or several) of the following packages instead:
lib/alltenses
the complete ground-API library with all forms
lib/present
a pruned ground-API library with present tense only
lib/mathematical
special-purpose API for mathematical applications
lib/multimodal
the complete ground-API with demonstratives for
multimodal dialogue applications
Typically, open one of
GrammarX
for just syntax
LangX
for both syntax and a small lexicon
X
(e.g. English
) for syntax, lexicon, and language-dependent extensions
Usually you also need your own lexicon, and hence have to open
ParadigmsX
for lexicon-building functions
It is advisable to use the bare package names in paths pointing to the
libraries. Here is an example, from examples/dialogue/LightsEng.gf
:
--# -path=.:alltenses:multimodal:prelude
To reach these directories from anywhere, set the environment variable
GF_LIB_PATH
to point to the directory GF/lib/
. For instance,
I have the following line in my .bashrc
file:
export GF_LIB_PATH=/home/aarne/GF/lib
The mathematical
API shares modules with
present
. It is therefore not a good idea to use it in combination with
alltenses
.
If you have done make
in lib/resource-1.0
, you will have
a file langs.gfcm
. This file can be used with fast startup for
tasks such as treebank generation:
> i -nocf langs.gfcm > gr -cat=S -cf -number=10 | tb
The -nocf
flag saves startup time and memory by preventing the
creation of context-free parse grammars.
The resource grammar libraries do not support
parsing very well. While it is theoretically possible to parse with any
GF grammar, the resource grammars are so abstract and complex that
building the actual parser in memory may just need too much resources
to succeed.
An exception is LangEng
. It is actually feasible to parse with
both alltenses/LangEng
and present/LangEng
- the latter being
much faster than the former. The -fcfg
flag (fast multiple context-free grammar)
must be used:
p -lang=LangEng -fcfg "this man is old"
Parsing with the -fcfg
flag takes a few extra seconds the first time during
each session, but gets faster at later runs. From GF 2.6, fcfg
is the
default parser of GF and the flag is not needed.
It is also possible to parse in Scandinavian languages
(Danish, Norwegian, Swedish) and, with enough memory (gf +RTS -K512M
),
German.
These applications are meant to serve as starting points for new applications, showing how the libraries can be used in typical situations.
The examples/bronzeage grammar set implements a language fragment based on the Swadesh list of 200 words. It is useful for things like language training.
The examples/dialogue grammar set implements the user grammars of some multimodal dialogue system. Its purpose is to serve as a prototype for applications in the TALK project.
The examples/animal grammar set implements some queries about animals. Its purpose is to serve as a prototype for example-based grammar writing.
Danish
English
Finnish
French
German
Italian
Norwegian
Russian
Spanish
Swedish
GF Resource Grammar Library (pdf). Printable user manual with API documentation (version 1.0).
Grammars as Software Libraries. Slides with background and motivation for the resource grammar library.
GF Resource Grammar Library Version 1.0. Slides giving an overview of the library and practical hints on its use.
How to write resource grammars. Helps you start if you want to add another language to the library.
Parametrized modules for Romance languages. Slides explaining some ideas in the implementation of French, Italian, and Spanish.
Grammar writing by examples. Slides showing how the method is used.
Multimodal Resource Grammars.
Slides showing how to use the multimodal resource library. N.B. the library
examples are from multimodal/old
, which is a reduced-size API.