9ed6bf56349c3f2614556574829e6990.ppt
- Number of slides: 10
Populating the Infrastructure using Standards
Daan Broeder
CLARIN NL EB
TLA - MPI for Psycholinguistics
CLARIN Coordinators Meeting, June 29, 30, Budapest

CLARIN NL Context
§ 4 Dutch CLARIN centers, each with their own interests and traditions
  § DANS, Dutch Academy data archiving service
  § INL, Dutch Institute for Lexicography
  § Meertens Institute, Dutch dialects and language variation
  § MPI for Psycholinguistics, endangered languages, acquisition corpora
§ Different cross-center relations
  § Organizational relations
  § Past and existing project cooperation
§ These can all lead to different preferences for technical solutions, interoperability approaches and data formats
§ All have production environments that need to deliver services, so they tend to be conservative with changes
  § New technology needs to be understood first, and usually parallel systems are created
  § General adaptations for CLARIN requirements can only be introduced slowly
§ Although centers made commitments, resources are limited.

CLARIN NL Goals
§ Build and support relevant central infrastructure services
§ Guide the harmonization of the relevant practices and systems at the centers through long-term funded projects
  § Accept and deliver CLARIN metadata (CMDI) for LRT resources
  § Use PIDs to identify resources
  § Federated identity management as an AAI solution
  § Use CLARIN recommended formats…
§ Connect these to the Dutch LRT research world
  § Offering access to resources and technology
  § Offering infrastructure services, e.g. a catalog of LRs
  § Run LT services as standardized web services
§ Therefore:
  § infrastructure projects for and by the centers
  § small short-term projects cross-linking research groups with CLARIN centers

Infrastructure Projects
§ Creating and testing CLARIN metadata components
  § Two major Dutch Language Resource centers testing CMDI for their resources
§ Infrastructure Integration Project
  § Building & maintaining registries:
    § ISO-Cat, REL-Cat
    § CMDI Component registry, ARBIL metadata editor
  § Planning and supporting the AAI for the CLARIN centers and user organizations
  § For format & tag set standards we look to CLARIN EU documentation, but..
    § Archivable format + installed base = ok
    § Should be reluctant to adopt new formats
§ Search and Development
  § Federated content search for the CLARIN centers
    § In cooperation with the CLARIN EU EDC initiative
    § We find we have to extend the SRU/CQL standard
  § CLAVAS, CLARIN Vocabulary Service

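The federated content search named on this slide sends one CQL query to the SRU endpoint of every participating center. A minimal sketch of that fan-out follows. It only builds the request URLs; the endpoint addresses and center labels are hypothetical placeholders, and the extensions to SRU/CQL that the slide says are needed are not modeled. The parameter names (operation, version, query, maximumRecords) are those of a standard SRU 1.2 searchRetrieve request.

```python
from urllib.parse import urlencode

# Hypothetical endpoints: the slides do not list real SRU addresses.
ENDPOINTS = {
    "center-a": "https://sru.center-a.example/sru",
    "center-b": "https://sru.center-b.example/sru",
}


def build_search_url(endpoint: str, cql_query: str, maximum_records: int = 10) -> str:
    """Build an SRU 1.2 searchRetrieve URL for a single endpoint."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,  # the CQL query string, URL-encoded below
        "maximumRecords": str(maximum_records),
    }
    return f"{endpoint}?{urlencode(params)}"


def federate(cql_query: str, maximum_records: int = 10) -> dict:
    """Return one request URL per center for the same CQL query."""
    return {
        name: build_search_url(url, cql_query, maximum_records)
        for name, url in ENDPOINTS.items()
    }


if __name__ == "__main__":
    for center, url in federate("taal", maximum_records=5).items():
        print(center, url)
```

Keeping URL construction separate from the HTTP call makes it easy to check what each center would receive before any request is sent.
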
CLARIN NL Sub-projects
Project | Description | Standard. & Interop. Issues | Center
AAM-LR | Automatic Annotation of Multi-modal Language Resources | ISO-Cat (audio TDG), Web-services |
Adelheid | A Distributed Lemmatizer for Historical Dutch | Web-services (CLAM) |
ADEPT | Assaying Differences via Edit-Distance of Pronunciation Transcriptions | Web-app PID (Cool-URI with username) |
DUELME-LMF | Converting DUELME into LMF format | ISO-Cat, LMF | INL
INTER-VIEWS | Curation of Interview Data | PIDs (URN resolver, resource fragments) | DANS
MIMORE | Microcomparative Morphosyntax Research Tool | Own format development | MI
SignLinc | Linking lexical databases and annotated corpora of signed languages | ISO-Cat (Gesture TDG), Open/closed metadata, formats (LMF, EAF) | MPI
TICClops | Text-Induced Corpus Clean-up online processing system | |
TDS-Curator | A web-services architecture to curate the Typological Database System | |
TQE | Transcription Quality Evaluation | AAI (CLAMless), Formats (WAV, TextGrid) |
WFT-GTB | Integrating the Wurdboek fan 'e Fryske Taal into the Geïntegreerde Taalbank | |
TTNWW (Long-term) | Dutch-Flemish project to enable SSH researchers access to existing (STEVIN) HLT tools via web services | Web-services (CLAM), corpus formats & tagsets (D-COI, CGN/SoNaR, LASSY, proposed Folia format) | several
Center entries whose row is not recoverable: MPI, MI (rows AAM-LR to ADEPT); DANS, INL, MPI, INL (rows TICClops to WFT-GTB).

CLARIN standards info
§ CLARIN EU website. The CLARIN EU FAQ has a few standard recommendations and a CLARIN Standardization Action Plan. There was some criticism about the 'too theoretical' content of this document.
§ CLARIN short guide (http://www.clarin.eu/files/standards-CLARINShortGuide.pdf). The references in this document are out of date.
§ The CLARIN EU standardization action plan (http://www.clarin.eu/node/2841) also has a list of recommended standards and best practices and points to open issues and the CLARIN position.
§ CLARIN official documents: there is a document with a very large enumeration of LR&T standards and best practices, but it contains no specific recommendation (http://www-sk.let.uu.nl/u/D5C-3.pdf).
§ CLARIN NL Helpdesk has a FAQ with a standards section and references to known CLARIN docs: http://trac.clarin.nl/trac/wiki/WikiStart#Formatsandstandards

CLARIN Standards for LRT v6
§ Standards for LRT V6-3.pdf (http://www.clarin.eu/system/files/Standards%20for%20LRT-v6.pdf): Marc Kemps-Snijders, Núria Bel, Peter Wittenburg, Daan Broeder, Dieter van Uytvanck (CLARIN), Laurent Romary (ISO TC 37, TEI), Erhard Hinrichs (CLARIN) and Gerhard Budin (Flarenet), January 2009
§ Each known name of a standard or best-practice guideline is commented according to a few criteria:
  § Standard indicates whether it is a standard (++), a best practice in the field (+) or simply known (0)
  § State indicates the state: proven (++), ready (+) or in progress (0)
  § Pivot indicates whether the guideline is meant as a pivot mechanism
  § Advise indicates whether in CLARIN the usage should be obligatory (++), recommended (+) or whether CLARIN is neutral (0)

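The four criteria above can be read as the schema of a registry entry. The sketch below shows one possible encoding, assuming Pivot is a simple yes/no flag. The class name, field names and the sample entry "ExampleFormat" are hypothetical and do not come from the Standards for LRT document; only the three rating scales are taken from the slide.

```python
from dataclasses import dataclass

# Rating scales as listed on the slide.
STANDARD_SCALE = {"++": "standard", "+": "best practice in the field", "0": "simply known"}
STATE_SCALE = {"++": "proven", "+": "ready", "0": "in progress"}
ADVISE_SCALE = {"++": "obligatory", "+": "recommended", "0": "neutral"}


@dataclass(frozen=True)
class RegistryEntry:
    """One rated standard or best-practice guideline (hypothetical schema)."""
    name: str
    standard: str
    state: str
    advise: str
    pivot: bool = False  # assumption: Pivot is a plain yes/no flag

    def __post_init__(self) -> None:
        # Reject marks that are not on the corresponding scale.
        checks = (
            ("standard", self.standard, STANDARD_SCALE),
            ("state", self.state, STATE_SCALE),
            ("advise", self.advise, ADVISE_SCALE),
        )
        for label, mark, scale in checks:
            if mark not in scale:
                raise ValueError(f"{label} mark {mark!r} is not one of {sorted(scale)}")

    def describe(self) -> str:
        """Spell the marks out in words."""
        text = (
            f"{self.name}: {STANDARD_SCALE[self.standard]}, "
            f"{STATE_SCALE[self.state]}, usage {ADVISE_SCALE[self.advise]}"
        )
        return text + (", pivot" if self.pivot else "")


# Hypothetical entry, not a rating taken from the document.
entry = RegistryEntry("ExampleFormat", standard="+", state="++", advise="+")
print(entry.describe())
# prints "ExampleFormat: best practice in the field, proven, usage recommended"
```

Validating the marks at construction time keeps typos such as "+++" out of a shared registry.
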
Standards for LRT v6 example
Name | Standard | State | Pivot | Advise | Function | Comment
TEI Tags | + | ++ | | + | various tag sets defined by TEI (P5) | will be supported by CLARIN when elements are required
ISO 16642 TMF | ++ | ++ | | + | Terminology Markup Framework |
OLAC | + | ++ | | + | Added refinements on DC elements | Should be supported as a simple pivot format
IMDI | + | ++ | | + | More detailed description set for various LRs | is a widely used format and will be supported in CLARIN; elements will be in ISOcat
TEI Header (header module) | + | ++ | | + | Specification of a wide number of elements that can be used as metadata elements | Selected set will be supported in CLARIN
… | … | | | + | |

Recommendations
§ Create a CLARIN EU standards registry modeled on the "Standards for LRT" document
§ Set up a governance structure
  § With adequate representation of the
    § National CLARIN partners
    § Kindred organizations & projects such as DARIAH, Flarenet, ISO TC 37
  § But with emphasis on practicality
§ Create additional documentation, such as recipe books, to support further uptake and application.

Thank you for your attention
CLARIN has received funding from the European Community's Seventh Framework Programme under grant agreement n° 212230