Third Version, 22 May 2005. Completed 1 July.
Second Version, 1 March 2005
First Draft, 7 February 2005
Aarne Ranta
aarne@cs.chalmers.se
Designed to be nice for ordinary programmers to use: by this we mean programmers without training in linguistics.
Mission: to make natural-language applications available for ordinary programmers, in tasks like
cat Prop ; Nat ;
fun Even : Nat -> Prop ;
Concrete syntax: mapping from abstract syntax trees to strings in a language (English, French, German, Swedish, ...):
lin Even x = {s = x.s ++ "is" ++ "even"} ;    -- English
lin Even x = {s = x.s ++ "est" ++ "pair"} ;   -- French
lin Even x = {s = x.s ++ "ist" ++ "gerade"} ; -- German
lin Even x = {s = x.s ++ "är" ++ "jämnt"} ;   -- Swedish
We can translate between languages via the abstract syntax.
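To try these rules in GF, they have to be placed in a complete abstract module and one concrete module per language. The sketch below shows a minimal version of this setup. The module names Arith, ArithEng and ArithFre and the number constants Two and Three are hypothetical additions, not part of the original example. Only the English and French rules are included. As usual in GF, each module goes in its own file named after the module.

```gf
-- Arith.gf: abstract syntax (Two and Three are hypothetical constants)
abstract Arith = {
  flags startcat = Prop ;
  cat Prop ; Nat ;
  fun
    Even : Nat -> Prop ;
    Two, Three : Nat ;
}

-- ArithEng.gf: English concrete syntax, plain strings only
concrete ArithEng of Arith = {
  lincat Prop, Nat = {s : Str} ;
  lin
    Even x = {s = x.s ++ "is" ++ "even"} ;
    Two    = {s = "2"} ;
    Three  = {s = "3"} ;
}

-- ArithFre.gf: French concrete syntax, plain strings only
concrete ArithFre of Arith = {
  lincat Prop, Nat = {s : Str} ;
  lin
    Even x = {s = x.s ++ "est" ++ "pair"} ;
    Two    = {s = "2"} ;
    Three  = {s = "3"} ;
}
```

After importing both concrete modules in the GF shell (i ArithEng.gf, i ArithFre.gf), a command like p -lang=ArithEng "2 is even" | l -lang=ArithFre should translate through the shared abstract tree Even Two.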
Is it really so simple?
The previous multilingual grammar breaks these rules in many situations:
2 and 3 is even
la somme de 3 et de 5 est pair
wenn 2 ist gerade, dann 2+2 ist gerade
om 2 är jämnt, 2+2 är jämnt
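The first error, 2 and 3 is even, occurs because Nat is treated as a bare string. The sketch below shows a minimal English fix: Nat carries a number parameter, and the copula agrees with it. The modules Agr and AgrEng and the function And are hypothetical additions for illustration only.

```gf
-- Agr.gf: abstract syntax with a hypothetical conjunction of numbers
abstract Agr = {
  flags startcat = Prop ;
  cat Prop ; Nat ;
  fun
    Even : Nat -> Prop ;
    Two, Three : Nat ;
    And : Nat -> Nat -> Nat ;
}

-- AgrEng.gf: English concrete syntax with number agreement
concrete AgrEng of Agr = {
  param Number = Sg | Pl ;
  lincat
    Prop = {s : Str} ;
    Nat  = {s : Str ; n : Number} ;
  lin
    Two   = {s = "2" ; n = Sg} ;
    Three = {s = "3" ; n = Sg} ;
    -- a conjunction of two numbers is plural
    And x y = {s = x.s ++ "and" ++ y.s ; n = Pl} ;
    -- the copula agrees in number with its subject
    Even x = {s = x.s ++ case x.n of {Sg => "is" ; Pl => "are"} ++ "even"} ;
}
```

With this grammar, the tree Even (And Two Three) is linearized as 2 and 3 are even.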
Instead of just strings, we need parameters, tables, and record types. For instance, French:
param Mod = Ind | Subj ;
param Gen = Masc | Fem ;
lincat Nat = {s : Str ; g : Gen} ;
lincat Prop = {s : Mod => Str} ;
lin Even x = {s = table {
  m => x.s ++ case m of {Ind => "est" ; Subj => "soit"}
           ++ case x.g of {Masc => "pair" ; Fem => "paire"}
  }} ;
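The French rule above can be embedded in a self-contained module to show both parameters in use. The sketch below is a minimal version. The module names and the functions Three, Five, Sum and Doubt are hypothetical, chosen so that gender yields la somme de 3 et de 5 est paire and mood yields a subjunctive context.

```gf
-- Parity.gf: hypothetical abstract syntax exercising gender and mood
abstract Parity = {
  flags startcat = Prop ;
  cat Prop ; Nat ;
  fun
    Even : Nat -> Prop ;
    Doubt : Prop -> Prop ;      -- "it is doubtful that ..."
    Three, Five : Nat ;
    Sum : Nat -> Nat -> Nat ;
}

-- ParityFre.gf: French concrete syntax with gender and mood
concrete ParityFre of Parity = {
  param Mod = Ind | Subj ;
  param Gen = Masc | Fem ;
  lincat
    Nat  = {s : Str ; g : Gen} ;
    Prop = {s : Mod => Str} ;
  lin
    Three = {s = "3" ; g = Masc} ;
    Five  = {s = "5" ; g = Masc} ;
    -- "la somme" is feminine whatever its arguments are
    Sum x y = {s = "la somme de" ++ x.s ++ "et de" ++ y.s ; g = Fem} ;
    -- the rule from the text
    Even x = {s = table {
      m => x.s ++ case m of {Ind => "est" ; Subj => "soit"}
               ++ case x.g of {Masc => "pair" ; Fem => "paire"}
      }} ;
    -- "il est douteux que" takes the subjunctive, hence p.s ! Subj;
    -- the mood of the embedding clause itself is ignored for brevity
    Doubt p = {s = table {_ => "il est douteux que" ++ p.s ! Subj}} ;
}
```

The tree Doubt (Even (Sum Three Five)) then comes out as il est douteux que la somme de 3 et de 5 soit paire in both moods.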
Which kind of programmer is easier to find?
In mainstream programming, sorting algorithms are not written by hand but taken from libraries.
In the same way, we want to create grammar libraries that encapsulate basic linguistic facts.
Cf. the Java success story: the language is only half of the success; the libraries are the other half.
lin Even x =
  let jämn = case <x.n,x.g> of {
    <Sg,Utr> => "jämn" ; <Sg,Neutr> => "jämnt" ; <Pl,_> => "jämna"
  }
  in {s = table {
    Main => x.s ! Nom ++ "är" ++ jämn ;
    Inv  => "är" ++ x.s ! Nom ++ jämn ;
    Sub  => x.s ! Nom ++ "är" ++ jämn
  }} ;
To use library functions for syntax and morphology:
lin Even = predA (regA "jämn") ;
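The hand-written Swedish rule assumes parameter types and linearization types that are not shown on the slide. The sketch below declares them under assumed names and adds a hypothetical conditional If that selects the word orders. This repairs the error om 2 är jämnt, 2+2 är jämnt seen earlier. The constant Four stands in for 2+2, and the local variable is renamed to jamn to stay within plain ASCII identifiers. None of these modules come from the resource library.

```gf
-- Cond.gf: hypothetical abstract syntax with a conditional
abstract Cond = {
  flags startcat = Prop ;
  cat Prop ; Nat ;
  fun
    Even : Nat -> Prop ;
    If : Prop -> Prop -> Prop ;   -- "if p, then q"
    Two, Four : Nat ;
}

-- CondSwe.gf: Swedish concrete syntax with agreement and word order
concrete CondSwe of Cond = {
  param
    Number = Sg | Pl ;
    Gender = Utr | Neutr ;
    Case   = Nom | Gen ;
    Order  = Main | Inv | Sub ;   -- main, inverted, subordinate
  lincat
    Nat  = {s : Case => Str ; n : Number ; g : Gender} ;
    Prop = {s : Order => Str} ;
  lin
    -- numbers are treated as neuter singular, as in "2 är jämnt"
    Two  = {s = table {Nom => "2" ; Gen => "2:s"} ; n = Sg ; g = Neutr} ;
    Four = {s = table {Nom => "4" ; Gen => "4:s"} ; n = Sg ; g = Neutr} ;
    -- the adjective agrees in number and gender with its subject
    Even x =
      let jamn = case <x.n,x.g> of {
        <Sg,Utr> => "jämn" ; <Sg,Neutr> => "jämnt" ; <Pl,_> => "jämna"
      }
      in {s = table {
        Main => x.s ! Nom ++ "är" ++ jamn ;
        Inv  => "är" ++ x.s ! Nom ++ jamn ;
        Sub  => x.s ! Nom ++ "är" ++ jamn
      }} ;
    -- condition in subordinate order, consequent in inverted order
    If p q = {s = table {_ => "om" ++ p.s ! Sub ++ "så" ++ q.s ! Inv}} ;
}
```

The tree If (Even Two) (Even Four) is linearized as om 2 är jämnt så är 4 jämnt. This rule is exactly what the library version predA (regA "jämn") is meant to spare the application programmer from writing.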
How do we organize and present the library?
Where do we get the data from?
Basic lexicon of structural, common, and irregular words
Basic syntactic structures
Currently,
Semantic coverage: you can express whatever you want.
Usability as library for non-linguists.
(Bonus for linguists:) nice generalizations w.r.t. language families, using the module system of GF.
Not a goal: semantic correctness, i.e. producing only meaningful expressions.
Example: the following sentences can be generated:
colourless green ideas sleep furiously
the time is seventy past forty-two
However, an application grammar can use a domain-specific semantics to guarantee semantic well-formedness.
(Warning for linguists:) theoretical innovation in syntax is not among the goals (and it would be hidden from users anyway!).
But we do not try to give semantics once and for all for the whole language.
Instead, we expect semantics to be given in application grammars built on semantic models of different domains.
Example application: number theory
fun Even : Nat -> Prop ;          -- a mathematical predicate
lin Even = predA (regA "even") ;  -- English translation
lin Even = predA (regA "pair") ;  -- French translation
lin Even = predA (regA "jämn") ;  -- Swedish translation
How could the resource predict that just these translations are correct in this domain?
Application grammars are built by experts of these domains, who, thanks to resource grammars, no longer need to be experts in linguistics.
V ; NP ; CN ; Det ; -- verb, noun phrase, common noun, determiner
DetNP : Det -> CN -> NP ; -- combine Det and CN into NP
and_Conj : Conj ;
mkN : Str -> Str -> Str -> Str -> Gender -> N ; -- worst-case nouns
mkN : Str -> N ;                                -- regular nouns
angripa_V = irregV "angripa" "angrep" "angripit" ;
man_N = mkN "man" "mannen" "män" "männen" masculine ;
PassBli : V2 -> NP -> VP ; -- bli överkörd av ngn
Reservations:
Alternative views on sentence formation: Clause, Verbphrase
Finnish paradigms
example use of Finnish paradigms
French paradigms
example use of French paradigms
French verbs
Italian paradigms
example use of Italian paradigms
Italian verb conjugations
Norwegian paradigms
example use of Norwegian paradigms
Norwegian verbs
Spanish paradigms
example use of Spanish paradigms
Spanish verb conjugations
Swedish paradigms
example use of Swedish paradigms
Swedish verbs
i english/LangEng.gf
i swedish/LangSwe.gf
Test with random generation, translation, morphological analysis...
Morphology quiz with words (e.g. French verbs):
i french/VerbsFre.gf
mq -cat=V
Morphology quiz with phrases (e.g. Swedish clauses):
i swedish/LangSwe.gf
mq -cat=Cl
Translation quiz with sentences (e.g. sentences from English to Swedish):
i english/LangEng.gf
i swedish/LangSwe.gf
tq -cat=S LangEng LangSwe
concrete AppNor of App = open LangNor, ParadigmsNor in {...}
(Note for users of GF 2.1 and older: the dummy reuse modules and their bulky .gfr versions are no longer needed!)
If you need to convert resource records to strings without depending on their concrete type (which you never should), you can use
Predef.toStr : (L : Type) -> L -> Str ;
L must be a linearization type. For instance,
toStr LangNor.CN (ModAP (PositADeg old_ADeg) (UseN car_N)) ---> "gammel bil"
Using the -v option shows if the parser fails because of unknown words.
> p -cat=S -v "jag ska åka till Chalmers"
unknown tokens [TS "åka",TS "Chalmers"]
Then try to select words that LangX recognizes:
> p -cat=S "jag ska gå till Danmark"
UseCl (PosTP TFuture ASimul) (AdvCl (SPredV i_NP go_V) (AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark)))))
Use these API structures and extend the vocabulary to match your needs.
åka_V = lexV "åker" ; Chalmers = regPN "Chalmers" neutrum ;
jgf LangEng.gf LangFre.gf
opens the editor with English and French views. The Editor User Manual gives more information on using the editor.
A restriction of the editor is that it does not give access to ParadigmsX modules. An IDE extending the editor into a grammar programming tool is work in progress.
Who chases mice ?
Whom does the lion chase ?
The dog chases cats.
We build the abstract syntax in two phases:
The concrete syntax of English is built in three phases:
The concrete syntax of Swedish is built upon QuestionsI in a similar way, with the modules QuestionsSwe and AnimalsSwe.
The concrete syntax of French consists similarly of the modules QuestionsFre and AnimalsFre.
Just issue the following GF commands:
i -src AnimalsEng.gf ;; s
i -src AnimalsFre.gf ;; s
i -src AnimalsSwe.gf ;; s
pm | wf animals.gfcm
and you get an end-user grammar, animals.gfcm.
You can also write the commands in a gfs (GF script) file, say mkAnimals.gfs, and then call GF with
gf <mkAnimals.gfs
You can use the resource grammar as a parser on a special file format, .gfe ("GF examples"). Here is the new source, QuestionsI.gfe, which generates QuestionsI.gf when you execute the command
gf -examples QuestionsI.gfe
Of course, the grammar of any language can be created by parsing examples in any language, as long as the languages share a common resource API. Using the English resource is generally recommended, because it is smaller and faster to parse than the other languages.
lin Who love_V2 man_N = in Phr "who loves men ?" ;
This rule uses, as argument variables, constants for words that can be found in the lexicon. This is why the example can be parsed. When the resulting rule
lin Who love_V2 man_N = QuestPhrase (UseQCl (PosTP TPresent ASimul) (QPredV2 who8one_IP love_V2 (IndefNumNP NoNum (UseN man_N)))) ;
is read by the GF compiler, the identifiers love_V2 and man_N are not treated as constants, but, following the normal binding rules of functional languages, as bound variables. This is what gives the example method the generality that is needed.
To write linearization rules by examples one thus has to know at least one abstract syntax constant for each category for which one needs a variable.
lin Pope = in NP "the man" {man_N = regN "pope"} ;
The resulting linearization rule is initially
lin Pope = DefOneNP (UseN man_N) ;
but the substitution changes this to
lin Pope = DefOneNP (UseN (regN "pope")) ;
In this way, you do not have to extend the resource lexicon, but you need to open the Paradigms module to compile the resulting term.
Of course, the substituted expressions may come from another language than the main language of the example:
lin Pope = in NP "the man" {man_N = regN "pape" masculine} ;
If many substitutions are needed, semicolons are used as separators:
{man_N = regN "pope" ; walk_V = regV "pray"} ;
Language | v0.6 | v0.9 API | Paradigms | Basic lexicon | Verbs |
Danish | - | X | - | - | - |
English | X | X | X | X | X |
Finnish | X | + | X | X | 0 |
French | X | X | X | X | X |
German | X | - | * | - | - |
Italian | X | X | X | X | X |
Norwegian | - | X | X | X | X |
Russian | X | * | * | - | - |
Spanish | - | X | X | X | X |
Swedish | X | X | X | X | X |
Danish
English:
Finnish: missing many nominal forms of verbs; compiling the heuristic paradigms is slow; the basic lexicon has some erroneous inflectional forms; possessive and interrogative suffixes have no proper lexer.
French: no inverted questions; some verbs in Basic should be reflexive
German
Italian: no omission of unstressed subject pronouns; some verbs in Basic should be reflexive; bad forms of reflexive infinitives
Norwegian: possessives of the form bilen min ("my car") are not included
Russian
Spanish: no omission of unstressed subject pronouns; no switch to dative case for human objects; some verbs in Basic should be reflexive; bad forms of reflexive infinitives; spurious parameter for verb auxiliary inherited from Romance
Swedish: