Morphological Analysis of Hungarian in NooJ
Peter Vajda
Hungarian Academy of Sciences, Research Institute for Linguistics

Summary
1. Hungarian morphology
2. Linguistic resources
3. Some experiments with INTEX/NooJ
4. The solution
5. Examples
6. Derivation

Hungarian morphology
- Agglutinative (and sometimes inflectional)
- The suffixes
  - Can have many forms (vowel harmony)
  - Can change the form of the stem (there are groups of variants)
    - bokor (sg.), bokr-ok (pl.); alma (sg.), almá-k (pl.)
  - Sometimes begin with a linking vowel
    - plural: -k / -ak / -ek / -ok / -ök
- A noun (adj., num.) can have ~700-800 forms
- A verb can have ~80 forms
- Orthography: there are difficulties when digraphs are doubled
  - cs + cs is written ccs (not cscs); gy + gy is written ggy (not gygy)

Nominal inflections
- 18 cases (nominative, accusative, dative + grammatical relations which are expressed by prepositions in French/English)
- Possessives are expressed by suffixes
  - They mark the person and number of the possessor and the number of the possessed
    - ház-a-m, ház-a-d, ház-a (my/your/his house)
    - ház-a-i-m, ház-a-i-d, ház-a-i (my/your/his houses)
- Anaphorical possessive
  - A ház Péteré (The house is Péter's); A házak Péteréi (The houses are Péter's)
- The maximal number of inflections can be five
  - barát-ai-tok-é-i-t ((I can see) those (things) of your friends')

Verbal inflections
- Two tenses: present, past
- Three moods: indicative, conditional, imperative
- Definite and indefinite conjugations
  - Néz-ek egy asztalt (I watch a table)
  - Néz-em az asztalt (I watch the table)
- One special form where the subject is in the 1st person and the object is in the 2nd
  - néz-lek (I watch you)
- Infinitive and "conjugated infinitive" (sometimes subjunctive in French)

The resources
- Dictionary of Hungarian inflections (Elekfi, '92)
  - A traditional description, profound and exhaustive
  - Two-dimensional classification:
    - Vowel harmony (3 classes)
    - Complex features of the stems (stem types, linking vowel, etc., 55 classes)
  - Altogether: 1700 different sub-classes (paradigms)
  - Systematic differences and similarities are hidden
  - Not convenient to use in finite-state transducers
- We have converted it into a database from which we can retrieve all the forms

The experiments with INTEX/NooJ
- 'Brute-force' method
  - We created one graph per sub-class for testing INTEX
  - 1700 sub-graphs
  - 45000 paths in the graphs…
- Using only dictionaries (.nod)
  - Dictionary of stems (70000 words)
    - ház,N+C2A+stem=1+NW
  - Dictionary of suffixes (one million entries)
    - (*)ak,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PL}
    - (*)am,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PSe1}
    - (*)at,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=ACC}
    - (*)at,<$1=N+C2A1+stem=1>{$0,$1L,N$1S+ana=ACC}
    - (*)amat,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PSe1+ACC}
  - Dictionary of lexical forms (which have a zero morpheme as suffix)
    - ház,N+ana=NOM
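The dictionary-only experiment can be pictured with a small Python sketch. It is not NooJ code and does not reproduce the .nod format: the data structures and the function name `analyse` are hypothetical, and only the feature names (C2A, C2A1, stem=1, PL, PSe1, ACC, NOM) are taken from the entries above. A word is split at every position, and a split is accepted when the suffix entry's constraint is satisfied by the stem's features.

```python
# Hypothetical miniature of the three dictionaries; only the feature names
# come from the slide, the Python structures are assumptions.
STEMS = {"ház": {"N", "C2A", "stem=1"}}

# suffix -> list of (features required of the stem, analysis contributed)
SUFFIXES = {
    "ak": [({"N", "C2A", "stem=1"}, "PL")],
    "am": [({"N", "C2A", "stem=1"}, "PSe1")],
    "at": [({"N", "C2A", "stem=1"}, "ACC"),
           ({"N", "C2A1", "stem=1"}, "ACC")],
    "amat": [({"N", "C2A", "stem=1"}, "PSe1+ACC")],
}

# Forms whose suffix is a zero morpheme are listed directly.
LEXICAL_FORMS = {"ház": "N+ana=NOM"}


def analyse(word):
    """Return (lemma, analysis) pairs for every accepted stem/suffix split."""
    results = []
    if word in LEXICAL_FORMS:
        results.append((word, LEXICAL_FORMS[word]))
    for cut in range(1, len(word)):
        stem, suffix = word[:cut], word[cut:]
        stem_features = STEMS.get(stem)
        if stem_features is None:
            continue
        for required, analysis in SUFFIXES.get(suffix, []):
            # The suffix applies only if the stem carries all required features.
            if required <= stem_features:
                results.append((stem, "N+ana=" + analysis))
    return results


print(analyse("házamat"))
```

With one suffix entry per surface form and per stem class, the suffix table grows quickly, which is consistent with the one million entries reported above.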

The linguistic solution
- Transform the database into a grammar based on morphophonological features
  - The grammatical features of stems and morphemes are in the dictionary
  - The features of the stems and the suffixes can be unified
- Grammar
  - We have to describe the order of the morphemes
  - Introduce features which select from the allomorphs

The order of morphemes for nominals

barát    -a    -i    -tok           -é      -i    -t
barát,N  +PS   +PL   +ps_2 +ps_pl   +ANAP   +i    +ACC
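The fixed morpheme order can be sketched as a sequence of optional slots. The Python below is only an illustration, not the NooJ grammar: the slot table, the slot names, and the function `tag` are hypothetical, and each slot holds just the one allomorph that occurs in the barát example. The feature labels are the ones shown above.

```python
# Hypothetical slot table in the order shown above: (slot name, {form: features}).
# Only the allomorphs from the barát example are listed.
SLOTS = [
    ("possessive", {"a": "+PS"}),
    ("plural of the possessed", {"i": "+PL"}),
    ("possessor", {"tok": "+ps_2+ps_pl"}),
    ("anaphoric possessive", {"é": "+ANAP"}),
    ("plural of the anaphoric possessive", {"i": "+i"}),
    ("case", {"t": "+ACC"}),
]


def tag(segmented):
    """Tag a hyphen-segmented nominal, rejecting morphemes that are out of order."""
    stem, *morphemes = segmented.split("-")
    features = []
    slot_index = 0
    for morpheme in morphemes:
        # Skip unused optional slots until one accepts this morpheme.
        while slot_index < len(SLOTS) and morpheme not in SLOTS[slot_index][1]:
            slot_index += 1
        if slot_index == len(SLOTS):
            raise ValueError("morpheme out of order or unknown: " + morpheme)
        features.append(SLOTS[slot_index][1][morpheme])
        slot_index += 1
    return stem + ",N" + "".join(features)


print(tag("barát-a-i-tok-é-i-t"))
```

The sketch is greedy and only checks linear order; dependencies between slots (for instance, which suffixes require a preceding possessive) are left out.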

Morpho-phonological features
To introduce features we examine the allomorphs
- HÁZ-A      ház,,N+nonj
- HÁZ-AT     ház,,N+nonj+acclink
- HAJÓ-JA    hajó,,N+j
- HAJÓ-T     hajó,,N+j+accnolink
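How such features select an allomorph can be shown with a short Python sketch. The lexicon and allomorph tables and the function `inflect` are hypothetical; the feature names nonj, j, acclink and accnolink and the four forms are the ones listed above. Only the allomorphs needed for these two stems are included, so vowel-harmony variants are left out.

```python
# Stem features as in the entries above (each stem's features merged).
LEXICON = {
    "ház": {"nonj", "acclink"},
    "hajó": {"j", "accnolink"},
}

# category -> list of (feature the stem must carry, allomorph).
# Hypothetical table restricted to the forms shown on the slide.
ALLOMORPHS = {
    "PS": [("nonj", "a"), ("j", "ja")],
    "ACC": [("acclink", "at"), ("accnolink", "t")],
}


def inflect(stem, category):
    """Attach the allomorph whose required feature the stem carries."""
    features = LEXICON[stem]
    for required, form in ALLOMORPHS[category]:
        if required in features:
            return stem + form
    raise ValueError("no allomorph of " + category + " fits " + stem)


for stem in LEXICON:
    print(inflect(stem, "PS"), inflect(stem, "ACC"))
```

Each suffix is described once, and the choice among its variants is delegated to features stored with the stem, instead of listing every stem-class and suffix combination.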

The dictionary

The plural and the accusative
- kalap-ot (hat, SG+ACC)
- kalap-ok-at (hats, PL+ACC)

Derivation
- Can change or leave the category (POS)
- Introduce new features
  - kosár, kosar-ak (pl.): basket
  - kosar-as, kosar-as-ok (pl.): basketball player
- Simple cases are handled by graphs
- Others are listed as lemmas in the dictionary

Assimilation and digraphs
- Some suffixes (e.g. val/vel) enforce total assimilation:
  - LÉC + VEL: LÉCCEL
  - PÉCS + VEL: PÉCCSEL
  - PLÉD + VEL: PLÉDDEL
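The assimilation rule and the digraph spelling convention can be combined in a short Python sketch. The function `add_val_vel` and its tables are hypothetical and simplified: the caller chooses val or vel, input is lowercase, the trigraph dzs and stems that already end in a doubled consonant are not handled, and a final letter pair such as sz is always read as a digraph.

```python
# Hungarian consonant digraphs and vowels (the trigraph dzs is ignored here).
DIGRAPHS = ("cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")
VOWELS = set("aáeéiíoóöőuúüű")


def add_val_vel(stem, suffix):
    """Attach 'val' or 'vel' to a lowercase stem with total assimilation of v."""
    if stem[-1] in VOWELS:
        # After a vowel the suffix keeps its v.
        return stem + suffix
    if stem[-2:] in DIGRAPHS:
        # A doubled digraph is written by doubling only its first letter.
        digraph = stem[-2:]
        return stem[:-2] + digraph[0] + digraph + suffix[1:]
    # Otherwise v assimilates to the final consonant, which is doubled.
    return stem + stem[-1] + suffix[1:]


for word in ("léc", "pécs", "pléd"):
    print(add_val_vel(word, "vel").upper())
```

Treating the doubled digraph at the point of suffixation keeps the stem entries themselves unchanged in the dictionary.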

Conclusion
- We have adapted the traditional description
- We have described the inflectional morphology of Hungarian in NooJ grammars/dictionaries
- Handled some of the derivational morphology

Objectives
- Find a simpler method for derivation
- Disambiguation
- Automatic methods to expand the dictionary
  - Automatic delegation of features

