  • Number of slides: 19

Applying Word Sketches to Russian
Máša Khokhlova
St. Petersburg State University
khokhlova.marie@gmail.com

Word Sketches for Russian

  • Grammatical rules that take into account the syntactic constructions of the Russian language, based on a morphologically tagged corpus;
  • Regular expressions and the query language of the IMS Corpus Workbench;
  • The system searches for tags that correspond to word forms. For example, the tag Ncfpnn denotes a common noun (Nc), feminine gender (f), plural (p), nominative case (n).
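To make the positional reading of a tag such as Ncfpnn concrete, here is a minimal Python sketch. The position table is an assumption: it covers only the positions the slide explains, and both the "proper" noun type and the case names are inferred, the latter from the case letters used in the sketch rules. The helper name `decode_noun_tag` is hypothetical and not part of any tagger or of the Sketch Engine.

```python
# Assumed positional layout of a noun tag (positions after the leading "N").
# Only the positions explained on the slide are decoded; later ones are ignored.
NOUN_POSITIONS = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "p": "plural"}),
    ("case", {
        "n": "nominative", "g": "genitive", "d": "dative",
        "a": "accusative", "i": "instrumental", "l": "locative",
    }),
]


def decode_noun_tag(tag):
    """Expand a positional noun tag into a feature dictionary."""
    if not tag.startswith("N"):
        raise ValueError(f"not a noun tag: {tag!r}")
    features = {"category": "noun"}
    # zip stops at the shorter sequence, so undocumented trailing positions are skipped.
    for (name, values), code in zip(NOUN_POSITIONS, tag[1:]):
        # Unknown codes are kept verbatim rather than guessed.
        features[name] = values.get(code, code)
    return features


print(decode_noun_tag("Ncfpnn"))
```

Because each feature occupies a fixed position, a regular expression over the tag can select word forms by any single feature, which is what the sketch rules rely on.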

Word Sketch Rules

Below is an example of the grammatical rules for «adjective + noun» phrases:

    *DUAL
    =a_modifier/modifies
    2:"A..n." (([word=","]|[word="или"]){0,1} [tag="A..n."]){0,7} 1:"N...n."
    2:"A..g." (([word=","]|[word="или"]){0,1} [tag="A..g."]){0,3} 1:"N...g."
    2:"A..d." (([word=","]|[word="или"]){0,1} [tag="A..d."]){0,3} 1:"N...d."
    2:"A..a." (([word=","]|[word="или"]){0,1} [tag="A..a."]){0,3} 1:"N...a."
    2:"A..i." (([word=","]|[word="или"]){0,1} [tag="A..i."]){0,3} 1:"N...i."
    2:"A..l." (([word=","]|[word="или"]){0,1} [tag="A..l."]){0,3} 1:"N...l."
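The logic of one such «adjective + noun» rule can be approximated outside the corpus engine. The Python sketch below is an illustration, not the Sketch Engine implementation: it assumes tag patterns are anchored to the whole tag, it takes the number of extra adjectives as a parameter, and the function name `adj_noun_pairs` and the adjective tags in the sample are made up to fit the patterns.

```python
import re

# Tokens that may separate adjectives in a chain, as in the rules.
SEPARATORS = {",", "или"}


def adj_noun_pairs(tokens, case, max_extra=3):
    """Return (noun, adjective) pairs for one case letter.

    tokens is a list of (word, tag) tuples; case is one of "ngdail".
    """
    adj = re.compile(rf"A..{case}.")
    noun = re.compile(rf"N...{case}.")
    pairs = []
    n = len(tokens)
    for i, (word, tag) in enumerate(tokens):
        if not adj.fullmatch(tag):
            continue
        j = i + 1  # position just after the adjective chain so far
        extra = 0
        while extra < max_extra:
            k = j
            # Optional separator: a comma or "или".
            if k < n and tokens[k][0] in SEPARATORS:
                k += 1
            # A further adjective in the same case extends the chain.
            if k < n and adj.fullmatch(tokens[k][1]):
                j = k + 1
                extra += 1
            else:
                break
        # The chain must be followed directly by a noun in the same case.
        if j < n and noun.fullmatch(tokens[j][1]):
            pairs.append((tokens[j][0], word))
    return pairs


# Hypothetical tagged sample; the adjective tags are illustrative only.
SAMPLE = [
    ("крепкий", "Afsnf"),
    (",", ","),
    ("зелёный", "Afsnf"),
    ("чай", "Ncmsnn"),
]

print(adj_noun_pairs(SAMPLE, "n"))
```

Every adjective in the chain is paired with the noun, because matching can start at any adjective. Requiring the same case letter in both patterns is what stands in for adjective-noun agreement here.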

Word Sketch Rules (2)

    =Verb X/X Verb
    2:[tag="V.*"] 1:[tag!="SENT"&tag!=","&tag!="-"] [lemma="не"]? 2:[tag="V.*"]
    =Noun X
    2:[tag="N.*"&lemma!=")."] 1:[tag!="SENT"&tag!=","&tag!="-"&lemma!=")."]

Text Corpora

  • Russian Web Corpus – 190 million tokens
  • RBC (РосБизнесКонсалтинг) – 22.5 million tokens
  • ROMIP (Российский семинар по Оценке Методов Информационного Поиска, the Russian Information Retrieval Evaluation Seminar) – 2.7 million tokens
  • Corpus Linguistics – 2.7 million tokens

Word sketches for the word “čaj” (Russian Web Corpus)

Word sketches for the word “čaj” (news)

Word sketches for the word “zelenyj” (Russian Web Corpus)

Word sketches for the word “imet’” (Russian Web Corpus)

Word sketches for the word “korpus” (texts on corpus linguistics)

Word sketches for the word “korpus” (news)

Word sketches for the word “korpus” (Web corpus)

Word sketches for the word “polučit’” (texts on corpus linguistics)

Word sketches for the word “polučit’” (news)

Word sketches for the word “polučit’” (Russian Web Corpus)

Word sketches for the word “dat’” (Russian Web Corpus)

Word sketches for the word “dat’” (texts on corpus linguistics)

Word sketches for the word “dat’” (news)

Thank you!