25/6 (BB) Added new speech recognition grammar printers for non-recursive SRGS grammars, as used by Nuance Recognizer 9.0. Try pg -printer=srgs_xml_non_rec or pg -printer=srgs_abnf_non_rec.
19/6 (AR) Extended the functor syntax (with modules) so that the functor can have restricted import and a module body (whose function is normally to complete restricted import). Thus the following format is now possible:
concrete C of A = E ** CI - [f,g] with (...) ** open R in {...}
At the same time, the possibility of an empty module body was added to other modules for symmetry. This can be useful for "proxy modules" that just collect other modules without adding anything, e.g.
abstract Math = Arithmetic, Geometry ;
18/6 (AR) Added a warning for clashing constants. A constant coming from multiple opened modules was interpreted as "the first" found by the compiler, which was a source of difficult errors. Clashing is officially forbidden, but we chose to give a warning instead of raising an error to begin with (in version 2.8).
30/1/2007 (AR) Semantics of variants fixed for complex types. Officially, it was only defined for basic types (Str and parameters). When used for records, results were multiplicative, which was not usable. But now variants should work for any type.
22/12 (AR) Release of GF version 2.7.
21/12 (AR) Overloading rules for GF version 2.7:
21/12 (BB) Java Speech Grammar Format with SISR tags can now be generated. Use pg -printer=jsgf_sisr_old. The SISR tags are in Working Draft 20030401 format, which is supported by the OptimTALK VoiceXML interpreter and the IBM XHTML+Voice implementation used by the Opera web browser.
21/12 (BB)
VoiceXML 2.0 dialog systems can now be generated from GF grammars.
Use pg -printer=vxml.
21/12 (BB)
JavaScript code for linearization and type annotation can now be
generated from a multilingual GF grammar. Use pm -printer=js.
5/12 (BB)
A new tool for generating C linearization libraries
from a GFCC file. make gfcc2c in src
compiles the tool. The generated
code includes header files in lib/c and should be linked
against libgfcc.a in lib/c. For an example of
using the generated code, see src/tools/c/examples/bronzeage.
make in that directory generates a GFCC file, then generates
C code from that, and then compiles a program bronzeage-test.
The main function for that program is defined in
bronzeage-test.c.
20/11 (AR) Type error messages in concrete syntax are printed with a
heuristic where a type of the form {... ; lock_C : {} ; ...}
is printed as C. This gives more readable error messages, but
can produce wrong results if lock fields are hand-written or if subtypes
of lock-fielded categories are used.
17/11 (AR)
Operation overloading: an oper can have many types,
from which one is picked at compile time. The types must have different
argument lists. Exact match with the arguments given to the oper
is required. An example is given in
Constructors.gf.
The purpose of overloading is to make libraries easier to use, since
only one name for each grammatical operation is needed: predication, modification,
coordination, etc. The concrete syntax is, at this experimental level, not
extended but relies on using a record with the function name repeated
as label name (see the example). The treatment of overloading is inspired
by C++, and was first suggested by Björn Bringert.
3/10 (AR) A new low-level format gfcc ("Canonical Canonical GF").
It is going to replace the gfc format later, but is already now
an efficient format for multilingual generation.
See GFCC document
for more information.
1/9 (AR) New way for managing errors in grammar compilation:
16/8 (AR) New generation algorithm: slower but works with less
memory. Default of gt; use gt -mem for the old
algorithm. The new option gt -all lazily generates all
trees until interrupted. It cannot be piped to other GF commands,
hence use gt -all -lin to print out linearized strings
rather than trees.
20/6 (AR) The FCFG parser is now the default, as it even handles literals.
The old default can be selected by p -old. Since
FCFG does not support variable bindings, -old is automatically
selected if the grammar has bindings - and unless the -fcfg flag
is used.
17/6 (AR) The FCFG parser is now the recommended method for parsing
heavy grammars such as the resource grammars. It does not yet support
literals and variable bindings.
1/6 (AR) Added the FCFG parser written by Krasimir Angelov. Invoked by
p -fcfg. This parser is as general as MCFG but faster.
It needs more testing and debugging.
1/6 (AR) The command r = reload repeats the latest
i = import command.
30/5 (AR) It is now possible to use the flags -all, -table, -record
in combination with l -multi, and also with tb.
18/5 (AR) Introduced a wordlist format gfwl for
quick creation of language exercises and (in future) multilingual lexica.
The format is now very simple:
3/4 (AR) The predefined abstract syntax type Int now has two
inherent parameters indicating its last digit and its size. The (hard-coded)
linearization type is
31/3 (AR) Added flags and options to some commands, to help generation:
16/3 (AR) Added two flag values to pt -transform=X:
nodup which excludes terms where a constant is duplicated,
and
nodupatom which excludes terms where an atomic constant is duplicated.
The latter, in particular, is useful as a filter in generation:
6/3 (AR) Generalized the gfe file format in two ways:
4/3 (AR) Added command use_treebank = ut for lookup in a treebank.
This command can be used as a fast substitute for parsing, but also as a
way to browse treebanks.
3/3 (AR) Added option -treebank to the i command. This adds treebanks to
the shell state. The possible file formats are
1/3 (AR) Added option -trees to the command tree_bank = tb.
By this option, the command just returns the trees in the treebank. It can be
used for producing new treebanks with the same trees:
1/3 (AR) A .gfe file can have a --# -path=PATH on its
second line. The file given on the first line (--# -resource=FILE)
is then read w.r.t. this path. This is useful if the resource file has
no path itself, which happens when it is gfc-only.
25/2 (AR) The flag preproc of the i command (and thereby
of gf itself) causes GF to apply a preprocessor to each source file
it reads.
8/2 (AR) The command tb = tree_bank for creating and testing against
multilingual treebanks. Example uses:
10/1 (AR) Forbade variable binding inside negation and Kleene star
patterns.
7/1 (AR) Full set of regular expression patterns, with
as-patterns to enable variable bindings to matched expressions:
6/1 (AR) Concatenative string patterns to help morphology definitions...
This can be seen as a step towards regular expression string patterns.
The natural notation p1 + p2 will be considered later.
Note. This was done on 7/1.
5/1/2006 (BB) New grammar printers slf_sub and slf_sub_graphviz
for creating SLF networks with sub-automata.
21/12 (AR) It now works to parse escaped string literals from command
line, and also string literals with spaces:
20/12 (AR) Support for full disjunctive patterns (P|Q) i.e.
not just on top level.
14/12 (BB) The command si (speech_input) which creates
a speech recognizer from a grammar for English and admits speech input
of strings has been added. The command uses an
ATK recognizer and
creates a recognition
network which accepts strings in the currently active grammar.
In order to use the si command,
you need to install the
atkrec library
and configure GF with ./configure --with-atk before compiling.
You need to set two environment variables for the si command to
work. ATK_HOME should contain the path to your copy of ATK
and GF_ATK_CFG should contain the path to your GF ATK configuration
file. A default version of this file can be found in
GF/src/gf_atk.cfg.
11/12 (AR) Parsing of float literals now possible in object language.
Use the flag lexer=literals.
6/12 (AR) Accept param and oper definitions in
concrete modules. The definitions are just inlined in the
current module and not inherited. The purpose is to support rapid
prototyping of grammars.
2/12 (AR) The built-in type Float added to abstract syntax (and
resource). Values are stored as Haskell's double-precision
floats (Double). For the syntax of float literals, see BNFC document.
NB: some bug still prevents parsing float literals in object
languages. Bug fixed 11/12.
1/12 (BB,AR) The command at = apply_transfer, which applies
a transfer function to a term. This is used for noncompositional
translation. Transfer functions are defined in a special transfer
language (file suffix .tr), which is compiled into a
run-time transfer core language (file suffix .trc).
The compiler is included in GF/transfer. The following is
a complete example of how to try out transfer:
17/11 (AR) Made it possible for lexers to be nondeterministic.
The current implementation is simple-minded: the parser is sent
each lexing result in turn. The option -cut is used for
breaking after first lexing leading to successful parse. The only
nondeterministic lexer right now is -lexer=subseqs, which
first filters with -lexer=ignore (dropping words neither in
the grammar nor literals) and then starts ignoring other words from
longest to shortest subsequence. This is usable for parser tasks
of keyword spotting type, but expensive (2^n) in long input.
A smarter implementation is therefore desirable.
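The longest-to-shortest enumeration described above can be sketched as follows. This is only an illustration in Python, not the GF implementation: the function name subsequences_longest_first is hypothetical, and the sketch assumes that the lexer tries every subsequence of the remaining words, which is what makes the cost exponential in the input length.

```python
from itertools import combinations

def subsequences_longest_first(words):
    """Yield every subsequence of words, longest first (order preserved)."""
    n = len(words)
    for size in range(n, -1, -1):
        # combinations() yields index tuples in increasing order,
        # so the original word order is kept inside each subsequence.
        for indices in combinations(range(n), size):
            yield [words[i] for i in indices]

candidates = list(subsequences_longest_first(["go", "to", "town"]))
print(candidates[0])    # the full input is tried first
print(len(candidates))  # 2**3 candidates for three words
```

With -cut, the search would stop at the first candidate that parses, so the full enumeration is only the worst case.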
14/11 (AR) Functions can be made unparsable (or "internal" as
in BNFC). This is done by i -noparse=file, where
the nonparsable functions are given in file using the
line format --# noparse Funs. This can be used e.g. to
rule out expensive parsing rules. It is used in
lib/resource/abstract/LangVP.gf to get parse values
structured with VP, which is obtained via transfer.
So far only the default (= old) parser generator supports this.
14/11 (AR) Removed the restrictions on what a lincat may look like.
Now any record type that has a value in GFC (i.e. without any
functions in it) can be used, e.g. {np : NP ; cn : Bool => CN}.
To display linearization values, only l -record shows
nice results.
9/11 (AR) GF shell state can now have several abstract syntaxes with
their associated concrete syntaxes. This allows e.g. parsing with
resource while testing an application. One can also have a
parse-transfer-lin chain from one abstract syntax to another.
7/11 (BB) Running commands can now be interrupted with Ctrl-C, without
killing the GF process. This feature is not supported on Windows.
1/11 (AR) Yet another method for adding probabilities: append
--# prob Double to the end of a line defining a function.
This can be (1) a .cf rule (2) a fun rule, or
(3) a lin rule. The probability is attached to the
first identifier on the line.
1/11 (BB) Added generation of weighted SRGS grammars. The weights
are calculated from the function probabilities. The algorithm
for calculating the weights is not yet very good.
Use pg -printer=srgs_xml_prob.
31/10 (BB) Added option for converting grammars to SRGS grammars in XML format.
Use pg -printer=srgs_xml.
31/10 (AR) Probabilistic grammars. Probabilities can be used to
weight random generation (gr -prob) and to rank parse
results (p -prob). They are read from a separate file
(flag i -probs=File, format --# prob Fun Double)
or from the top-level grammar file itself (option i -prob).
To see the probabilities, use pg -printer=probs.
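As a rough illustration of the separate-file format, the Python sketch below reads lines of the form --# prob Fun Double and uses the result to weight a random choice between functions. The names parse_probs and pick and the function names Walk and Run are hypothetical, and the sketch makes no claim about how GF itself normalizes or defaults the probabilities.

```python
import random

def parse_probs(text):
    """Collect {function name: probability} from '--# prob Fun Double' lines."""
    probs = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0] == "--#" and parts[1] == "prob":
            probs[parts[2]] = float(parts[3])
    return probs

def pick(funs, probs, rng=random):
    """Choose one function, weighted by its listed probability."""
    weights = [probs[f] for f in funs]  # assumes every candidate is listed
    return rng.choices(funs, weights=weights, k=1)[0]

probs = parse_probs("--# prob Walk 0.8\n--# prob Run 0.2\n")
print(probs)
print(pick(["Walk", "Run"], probs))
```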
12/10 (AR) Flag -atoms=Int to the command gt = generate_trees
takes away all zero-argument functions except Int per category. In
this way, it is possible to generate a corpus illustrating each
syntactic structure even when the lexicon (which consists of
zero-argument functions) is large.
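The effect of -atoms=Int on the set of functions can be pictured with the Python sketch below. It is an assumption-laden illustration: functions are modelled as (name, argument categories, value category) triples, the name limit_atoms is hypothetical, and the sketch simply keeps the first k zero-argument functions of each category, whereas the source does not say which ones GF keeps.

```python
def limit_atoms(funs, k):
    """Keep all functions with arguments, but at most k atoms per category."""
    kept, seen = [], {}
    for name, arg_cats, cat in funs:
        if not arg_cats:  # a zero-argument function, i.e. a lexical item
            if seen.get(cat, 0) >= k:
                continue
            seen[cat] = seen.get(cat, 0) + 1
        kept.append((name, arg_cats, cat))
    return kept

# Hypothetical toy grammar: one syntactic rule and a small lexicon.
grammar = [
    ("PredVP", ["NP", "VP"], "S"),
    ("john_NP", [], "NP"), ("mary_NP", [], "NP"), ("bill_NP", [], "NP"),
    ("walk_VP", [], "VP"), ("run_VP", [], "VP"),
]
print([name for name, _, _ in limit_atoms(grammar, 1)])
```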
6/10 (AR) New commands dc = define_command and
dt = define_tree to define macros in a GF session.
See help for details and examples.
5/10 (AR) Printing missing linearization rules:
pm -printer=missing. Command g = grep,
which works in a way similar to Unix grep.
5/10 (PL) Printing graphs with function and category dependencies:
pg -printer=functiongraph, pg -printer=typegraph.
20/9 (AR) Added optimization by common subexpression elimination.
It works on GFC modules and creates oper definitions for
subterms that occur more than once in lin definitions. These
oper definitions are automatically reinlined in functionalities
that don't support opers in GFC. This conversion is done by
module and the opers are not inherited. Moreover, the subterms
can contain free variables which means that the opers are not
always well typed. However, since all variables in GFC are type-specific
(and local variables are lin-specific), this does not destroy
subject reduction or cause illegal captures.
18/9 (AR) Removed superfluous spaces from GFC printing. This shrinks
the GFC size by 5-10%.
15/9 (AR) Fixed some bugs in dependent-type type checking of abstract
modules at compile time. The type checker is stricter now, which means
that some old grammars may fail to compile - but this is usually the
right result. However, the type checker of def judgements still
needs work.
14/9 (AR) Added printing of grammars to a format without parameters, in
the spirit of Peano's "Latino sine flexione". The command pg -unpar
does the trick, and the result can be saved in a gfcm file. The generated
concrete syntax modules get the prefix UP_. The translation is briefly:
14/9 (BB) Added finite state approximation of grammars.
Internally the conversion is done cfg -> regular -> fa -> slf, so the
different printers can be used to check the output of each stage.
The new options are:
4/9 (AR) Added the option pg -printer=stat to show
statistics of gfc compilation result. To be extended with new information.
The most important stats now are the top-40 sized definitions.
1/7 (AR) Added the flag -o to the vt command
to just write the .dot file without going to .ps
(cf. 20/6).
29/6 (AR) The printer used by Embedded Java GF Interpreter
(pm -header) now produces
working code from all optimized grammars - hence you need not select a
weaker optimization just to use the interpreter. However, the
optimization -optimize=share usually produces smaller object
grammars because the "unoptimizer" just undoes all optimizations.
(This is to be considered a temporary solution until the interpreter
knows how to handle stronger optimizations.)
27/6 (AR) The flag flags optimize=noexpand placed in a
resource module prevents the optimization phase of the compiler when
the .gfr file is created. This can prevent serious code
explosion, but it will also make the processing of modules using the
resource slower. A favourable example is lib/resource/finnish/ParadigmsFin.
23/6 (HD,AR) The new editor GUI gfeditor by Hans-Joachim
Daniels can now be used. It is based on Janna Khegai's jgf.
New functionality includes HTML display (gfeditor -h) and
programmable refinement tooltips.
23/6 (AR) The flag unlexer=finnish can be used to bind
Finnish suffixes (e.g. possessives) to preceding words. The GF source
notation is e.g. "isä" ++ "&*" ++ "nsa" ++ "&*" ++ "ko",
which unlexes to "isänsäkö". There is no corresponding lexer
support yet.
22/6 (PL,AR) The MCFG parser (p -mcfg) now works on all
optimized grammars - hence you need not select a weaker optimization
to use this parser. The same concerns the CFGM printer (pm -printer=cfgm).
20/6 (AR) Added the command visualize_tree = vt, to
display syntax trees graphically. Like vg, this command uses
GraphViz and Ghostview. The foremost use is to pipe the parser to this
command.
17/6 (BB) There is now support for lists in GF abstract syntax.
A list category is declared as:
cat [C]{n} is equivalent to the declarations:
A lincat declaration of the form:
10/6 (AR) Preprocessor of .gfe files can now be performed as part of
any grammar compilation. The flag -ex causes GF to look for
the .gfe files and preprocess those that are younger
than the corresponding .gf files. The files are first sorted
and grouped by the resource, so that each resource only need be compiled once.
10/6 (AR) Editor GUI can now be alternatively invoked by the shell
command gf -edit (equivalent to jgf).
10/6 (AR) Editor GUI command pc Int to pop Int
items from the clip board.
4/6 (AR) Sequence of commands in the Java editor GUI now possible.
The commands are separated by ;; (notice the space on
both sides of the two semicolons). Such a sequence can be sent
from the "GF Command" pop-up field, but is mostly intended
for external processes that communicate with GF.
3/6 (AR) The format .gfe defined to support
grammar writing by examples. Files of this format are first
converted to .gf files by the command
31/5 (AR) Default of p -rawtrees=k changed to 999999.
31/5 (AR) Support for restricted inheritance. Syntax:
29/5 (AR) Parser support for reading GFC files line per line.
The category Line in GFC.cf can be used
as entrypoint instead of Grammar to achieve this.
28/5 (AR) Environment variables and path wild cards.
26/5/2005 (BB) Notation for list categories.
Predef.Error : Type ;
Predef.error : Str -> Predef.Error ;
Denotationally, Error is the empty type and thus a
subtype of any other type: it can be used anywhere. But the
error function is not canonical. Hence the compilation
is interrupted when (error s) is translated to GFC, and
the message s is emitted. An example use is given in
english/ParadigmsEng.gf:
regDuplV : Str -> V ;
regDuplV fit =
case last fit of {
("a" | "e" | "i" | "o" | "u" | "y") =>
Predef.error (["final duplication makes no sense for"] ++ fit) ;
t =>
let fitt = fit + t in
mkV fit (fit + "s") (fitt + "ed") (fitt + "ed") (fitt + "ing")
} ;
This function thus cannot be applied to a stem ending with a vowel,
which is exactly what we want. In future, it may be good to add similar
checks to all morphological paradigms in the resource.
22/6 (AR) Release of GF version 2.6.
# Svenska - Franska - Finska
berg - montagne - vuori
klättra - grimper / escalader - kiivetä / kiipeillä
but can be extended to cover paradigm functions in addition to just
words.
{s : Str ; size : Predef.Ints 1 ; last : Predef.Ints 9}
The size field has value 1 for integers greater than 9, and
value 0 for other integers (which are never negative). This parameter can
be used e.g. in calculating number agreement,
Risala i = {s = i.s ++ table (Predef.Ints 1 * Predef.Ints 9) {
<0,1> => "risalah" ;
<0,2> => "risalatan" ;
<0,_> | <1,0> => "rasail" ;
_ => "risalah"
} ! <i.size,i.last>
} ;
Notice that the table has to be typed explicitly for Ints k,
because type inference would otherwise return Int and therefore
fail to expand the table.
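To make the two inherent parameters concrete, here is a small Python sketch that computes size and last for a non-negative integer and then replays the table of the Risala example, with the branches tried in the order in which they are written. The names int_params and risala are hypothetical, and the space used to join the two tokens stands in for GF's ++ operator.

```python
def int_params(n):
    """Model the record {s ; size ; last} described for Int."""
    if n < 0:
        raise ValueError("Int values are never negative")
    return {"s": str(n), "size": 1 if n > 9 else 0, "last": n % 10}

def risala(n):
    """Select a noun form from <size, last>, first matching branch wins."""
    i = int_params(n)
    key = (i["size"], i["last"])
    if key == (0, 1):
        form = "risalah"
    elif key == (0, 2):
        form = "risalatan"
    elif key[0] == 0 or key == (1, 0):
        form = "rasail"
    else:
        form = "risalah"
    return i["s"] + " " + form

for n in (1, 2, 5, 10, 11):
    print(risala(n))
```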
21/3/2006 Release of GF 2.5.
gt -cat=Cl | pt -transform=nodupatom
This gives a corpus where words don't (usually) occur twice in the same clause.
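The filtering idea behind nodupatom can be sketched in Python as below. Trees are modelled as (function, list of subtrees) pairs, which is a hypothetical encoding, and the constant names are invented for the example. Under this reading, nodup would count every constant rather than only the atomic ones.

```python
def atoms(tree):
    """Yield the zero-argument constants of a tree, left to right."""
    fun, args = tree
    if not args:
        yield fun
    for arg in args:
        yield from atoms(arg)

def has_duplicate_atom(tree):
    found = list(atoms(tree))
    return len(found) != len(set(found))

def nodupatom(trees):
    """Keep only the trees in which no atomic constant occurs twice."""
    return [t for t in trees if not has_duplicate_atom(t)]

t1 = ("PredVP", [("John", []), ("ComplV2", [("Love", []), ("Mary", [])])])
t2 = ("PredVP", [("John", []), ("ComplV2", [("Love", []), ("John", [])])])
print(nodupatom([t1, t2]) == [t1])
```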
A minor novelty is that the --# -resource=FILE flag can now be
relative to GF_LIB_PATH, both for grammars and treebanks.
The flag --# -treebank=IDENT gives the language whose treebank
entries are used, in case of a multilingual treebank.
ut "He adds this to that" | l -multi -- use treebank lookup as parser in translation
ut -assocs | grep "ComplV2" -- show all associations with ComplV2
Notice that the treebanks in shell state are unilingual, and have strings as keys.
Multilingual treebanks have trees as keys. In case 1, one unilingual treebank per
language is built in the shell state.
rf old.xml | tb -trees | tb -xml | wf new.xml
Recall that only treebanks in the XML format can be read with the -trees
and -c flags.
gr -cat=S -number=100 | tb -xml | wf tb.xml -- random treebank into file
rf tb.txt | tb -c -- read comparison treebank from file
The last three apply to all types of patterns, the first two only to token strings.
Example: plural formation in Swedish 2nd declension
(pojke-pojkar, nyckel-nycklar, seger-segrar, bil-bilar):
plural2 : Str -> Str = \w -> case w of {
pojk + "e" => pojk + "ar" ;
nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ;
bil => bil + "ar"
} ;
Semantics: variables are always bound to the first match, in the sequence defined
as the list Match p v as follows:
Match (p1|p2) v = Match p1 v ++ Match p2 v
Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s]
Match p* s = Match "" s ++ Match p s ++ Match (p + p) s ++ ...
Match c v = [[]] if c == v -- for constant patterns c
Match x v = [[(x,v)]] -- for variable patterns x
Match x@p v = [[(x,v)]] + M if M = Match p v /= []
Match p v = [] otherwise -- failure
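The Match equations can be read as a small interpreter. The Python sketch below is one such reading, not GF's implementation: patterns are encoded as tagged tuples (a hypothetical encoding), a concatenation pairs every environment of the left part with every environment of the right part at each split point, an as-pattern adds its own binding to each environment of the inner pattern, and a star only reports success because no variables may be bound inside it. The plural2 function replays the Swedish example by taking the first environment of the first matching branch.

```python
# Pattern encoding (hypothetical): ("const", "e"), ("var", "x"),
# ("alt", p1, p2), ("seq", p1, p2), ("as", "x", p), ("star", p)

def match(p, s):
    """Return every binding environment (a dict) under which p matches s."""
    tag = p[0]
    if tag == "const":
        return [{}] if p[1] == s else []
    if tag == "var":
        return [{p[1]: s}]
    if tag == "alt":
        return match(p[1], s) + match(p[2], s)
    if tag == "seq":
        # Try the split points from left to right and merge the bindings.
        return [{**e1, **e2}
                for i in range(len(s) + 1)
                for e1 in match(p[1], s[:i])
                for e2 in match(p[2], s[i:])]
    if tag == "as":
        return [{p[1]: s, **e} for e in match(p[2], s)]
    if tag == "star":
        if s == "":
            return [{}]
        ok = any(match(p[1], s[:i]) and match(p, s[i:])
                 for i in range(1, len(s) + 1))
        return [{}] if ok else []
    raise ValueError(f"unknown pattern tag: {tag}")

def first_match(p, s):
    envs = match(p, s)
    return envs[0] if envs else None

def plural2(w):
    env = first_match(("seq", ("var", "pojk"), ("const", "e")), w)
    if env is not None:
        return env["pojk"] + "ar"
    liquid = ("alt", ("const", "l"), ("alt", ("const", "r"), ("const", "n")))
    branch = ("seq", ("var", "nyck"), ("seq", ("const", "e"), ("as", "l", liquid)))
    env = first_match(branch, w)
    if env is not None:
        return env["nyck"] + env["l"] + "ar"
    return w + "ar"  # the catch-all variable pattern

for word in ("pojke", "nyckel", "seger", "bil"):
    print(word, plural2(word))
```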
Examples:
22/12 Release of GF 2.4.
gf examples/tram0/TramEng.gf
> p -lexer=literals "I want to go to \"Gustaf Adolfs torg\" ;"
QInput (GoTo (DestNamed "Gustaf Adolfs torg"))
% cd GF/transfer
% make -- compile the trc compiler
% cd examples -- GF/transfer/examples
% ../compile_to_core -i../lib numerals.tr
% mv numerals.trc ../../examples/numerals
% cd ../../examples/numerals -- GF/examples/numerals
% gf
> i decimal.gf
> i BinaryDigits.gf
> i numerals.trc
> p -lang=Cncdecimal "123" | at num2bin | l
1 0 0 1 1 0 0 1 1 1 0
Other relevant commands are:
For more information on the commands, see help. Documentation on
the transfer language: to appear.
As a by-product, the probabilistic random generation algorithm is
available for any context-free abstract syntax. Use the flag
gr -cf. This algorithm is much faster than the
old (more general) one, but it may sometimes loop.
The optimization is triggered by the flag optimize=OPT_subs,
where OPT is any of the other optimizations (see h -optimize).
The most aggressive value of the flag is all_subs. In experiments,
the size of a GFC module can shrink by 85% compared to plain all.
(P => T)* = T*
(t ! p)* = t*
(table {p => t ; ...})* = t*
In order for this to be maximally useful, the grammar should be written in such
a way that the first value of every parameter type is the desired one. For
instance, in Peano's case it would be the ablative for noun cases, the singular for
numbers, and the 2nd person singular imperative for verb forms.
1/7 Release of GF 2.3.
cat [C]
or
cat [C]{n}
where C is a category and n is a non-negative integer.
cat [C] is equivalent to cat [C]{0}. List category
syntax can be used wherever categories are used.
cat ListC
fun BaseC : C^n -> ListC
fun ConsC : C -> ListC -> ListC
where C^0 -> X means X, and C^m (where
m > 0) means C -> C^(m-1).
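The expansion of C^n can be spelled out mechanically. The following Python sketch prints the three declarations that a list category stands for, given a category name and a size. The function name expand_list_cat is hypothetical, and the output only mirrors the declarations shown above.

```python
def expand_list_cat(c, n=0):
    """Return the declarations that cat [c]{n} abbreviates, as strings."""
    list_cat = f"List{c}"
    # C^n -> ListC unfolds to n copies of C followed by the list category.
    base_type = " -> ".join([c] * n + [list_cat])
    return [
        f"cat {list_cat}",
        f"fun Base{c} : {base_type}",
        f"fun Cons{c} : {c} -> {list_cat} -> {list_cat}",
    ]

for line in expand_list_cat("C", 2):
    print(line)
```

With n = 0 the base constructor takes no arguments, so its type is just the list category itself.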
lincat [C] = T
is equivalent to
lincat ListC = T
The linearizations of the list constructors are written
just like they would be if the function declarations above
had been made manually, e.g.:
lin BaseC x_1 ... x_n = t
lin ConsC x xs = t'
gf -examples File.gfe
See
../lib/resource/doc/examples/QuestionsI.gfe
for an example.
M -- inherit everything from M, as before
M [a,b,c] -- only inherit constants a,b,c
M-[a,b,c] -- inherit everything except a,b,c
Caution: there is no check yet for completeness and
consistency, so restricted inheritance can create
run-time failures.
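The three inheritance forms amount to simple filters on the constants of the inherited module. The Python sketch below illustrates this reading. The function name inherited and its keyword arguments are hypothetical, and the sketch performs no completeness or consistency checking, in line with the caution above.

```python
def inherited(constants, only=None, excluding=None):
    """Return the constants a module inherits from M under each form."""
    names = list(constants)
    if only is not None:        # M [a,b,c]
        return [c for c in names if c in only]
    if excluding is not None:   # M-[a,b,c]
        return [c for c in names if c not in excluding]
    return names                # M

m = ["a", "b", "c", "d"]
print(inherited(m))
print(inherited(m, only=["a", "b", "c"]))
print(inherited(m, excluding=["a", "b", "c"]))
```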