
  • Number of slides: 26

Using ESDS data in Linguistics and NLP
Dr. Kakia Chatsiou
ESDS/UK Data Archive
Language and Computation Group Day
07 Oct 2011
http://lac.essex.ac.uk/lacday2011

What is ESDS?
• Economic and Social Data Service
• national data archiving and dissemination service (since January 2003)
• access and specialist support for key economic and social data resources to UK Higher and Further Education users
• brings together centres of expertise in data creation, dissemination, preservation and use in Manchester and Essex
• managed by the UK Data Archive (established in 1967); jointly supported by the Economic and Social Research Council (ESRC) & the Joint Information Systems Committee (JISC)

http://www.esds.ac.uk

ESDS in numbers
• 6,000 datasets in the collection
• 230 new datasets added each year
• over 22,000 registered users
• approximately 60,000 downloads worldwide p.a.
• 3,000+ user support queries

Data collections we hold
Through our dedicated services we provide access to:
• surveys
• government data
• aggregate statistics
• censuses
• international data
• longitudinal data
• qualitative data - multimedia data sources
• historical data

ESDS Linguistics data offers
Year          Data offers
2004          2
2005          6
2006          11
2007          15
2008          13
2009          10
2010          7
2011 (Jan)    3
Total         67
• From ESRC grants
• 19 accepted
• the rest could not be accepted (for confidentiality or size reasons) or were referred to more suitable archives (e.g. Oxford Text Archive, CHILDES/Talkbank database)
• increase in depositing after the launch of the researcher self-archive (UKDA-Store)
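As a quick check on the figures in this slide, the yearly offers can be summed and the share of accepted offers worked out. This is only an illustrative sketch: the numbers are copied from the table, and the variable names are not from the presentation.

```python
# Data offers per year, copied from the table on this slide.
offers = {
    "2004": 2, "2005": 6, "2006": 11, "2007": 15,
    "2008": 13, "2009": 10, "2010": 7, "2011 (Jan)": 3,
}

total = sum(offers.values())   # should match the "Total" row
accepted = 19                  # "19 accepted" from the slide
share = accepted / total       # fraction of offers that were accepted

print(total, f"{share:.0%}")
```

The sum reproduces the stated total of 67, so a little over a quarter of the offers were accepted.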

ESDS data holdings on linguistics & related fields
• 40 main catalogue data collections with the language and linguistics subject category, accessible from the main ESDS Data Catalogue (14 qualitative, 18 quantitative, 8 historical)
• all qualitative studies consisting of in-depth interview transcripts or audio recordings can be used as corpus material or data sources for secondary analysis, e.g. Family Life And Work Experience Before 1918 (Edwardians) (SN 2000), Pioneers interview collections
• 13 UKDA-Store data collections with ‘linguistics’ as the primary discipline.
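To illustrate what using interview transcripts as corpus material can mean in practice, here is a minimal word-frequency sketch in Python. The sample sentence is invented for the example (it is not taken from any ESDS collection), and `token_frequencies` is a hypothetical helper name.

```python
import re
from collections import Counter

def token_frequencies(text):
    """Lower-case the text, split it into word tokens and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

# Invented stand-in for a line of an interview transcript.
sample = "We was working in the mill, and we was always tired."
freqs = token_frequencies(sample)

print(freqs.most_common(3))
```

A real study would read the transcript files supplied with the collection and would probably need a tokeniser that copes with transcription conventions such as pauses and speaker labels.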

Examples of ESDS data collections with subject term “Language and Linguistics”
6228 Discourse of the School Dinners Debate, 2004-2008
6402 Urban Classroom Culture and Interaction, 2005-2007
6790 Dynamic Variability in Speech: a Forensic Phonetic Study of British English, 2006-2007
6259 Identities in Neighbour Discourse: Community, Conflict and Exclusion, 2004-2006
5271 British Migrants in Spain: the Extent and Nature of Social Integration, 2003-2005
6127 Linguistic Innovators: the English of Adolescents in London, 2004-2005
5200 Devolution and Identity in Northern Ireland: a Longitudinal Discursive Study, 2003-2004
4457 Phonological Memory as a Predictor of Language Development in Down Syndrome, 1995 and 2001
4634 Transnational Seafarers, 1999-2001
4632 Dutch Map Task Corpus, 1999
3991 Profiling Elements of Prosodic Systems in Children (PEPS-C), 1997-1998
3556 Age of Acquisition, Frequency, Concreteness and Imageability Ratings for Welsh Words and Their English Equivalents, 1995-1996
5487 Literary Practices and the Mass-Observation Project, 1992-1993
3435 Welsh Social Survey, 1992; Including Welsh House Condition Survey, 1992
4896 English People, 1965-1990
4897 Language People, 1965-1986
2715 Northern Ireland Transcribed Corpus of Speech, 1973-1980
430 U.K. County Data, 1851-1966
5251 Study of the Abelam of Papua New Guinea and the Nso of Cameroon, 1939-1963
2947 Susanne Corpus, 1961
3821 Social History of the Welsh Language: Evidence of the 1891 Census; Project 2

Examples of linguistics data holdings in UKDA-Store

Linguist users of ESDS data
• 51 self-reported linguists (out of around 22,000)
• about 30 of these downloaded ESDS data, the majority of them being survey data, then qualitative interviews and a few historical data downloads
• the rest might well have accessed documentation, study methods and instruments about studies (but since these do not require registration, we cannot report usage)

How linguists have used ESDS data
• a researcher and their team based at the University of Sheffield used 2 audio collections for analysis of speech patterns (SN 2000 - Edwardians, SN 5407 - Health And Social Consequences Of The Foot And Mouth Disease Epidemic In North Cumbria)
• an ESRC joint project between the UK Data Archive and the Language Processing team at the University of Edinburgh used three classic social science collections to test natural language processing tools. They looked at named entity recognition on typical social science interview data. Person-based identification enabled the testing of an anonymisation tool.
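The link between person identification and anonymisation can be sketched as follows. This is not the tool tested in the project: it assumes the person names have already been found (here a hand-written list stands in for the output of a named entity recogniser), and the function name and placeholder format are hypothetical.

```python
import re

def pseudonymise(text, person_names):
    """Replace each listed name with a stable placeholder such as [PERSON_1]."""
    mapping = {}
    for name in person_names:
        # The same name always receives the same placeholder.
        mapping.setdefault(name, f"[PERSON_{len(mapping) + 1}]")
    # Replace longer names first so "Mary Smith" is handled before "Mary".
    for name in sorted(mapping, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(name)}\b", mapping[name], text)
    return text

# Invented example; the list stands in for named entity recognition output.
names = ["Mary Smith", "Tom"]
out = pseudonymise("Mary Smith said Tom left. Tom agreed.", names)
print(out)
```

Keeping one placeholder per person preserves who said what across the transcript, which matters for secondary analysis of interviews.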

ESDS data uses by Linguists
• a JISC project between EDINA and the UK Data Archive used the HISTPOP collection at the UK Data Archive to augment resource search and discovery methods.
  – data and metadata were fed to GeoDigRef and the LTG GeoParser
  – the enriched data were embedded in an experimental geographical service by EDINA
  – this allows users to search resource collections via a map-based interface, which provides links back to the reference of the place-name in the original resource
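The general idea of enriching text with place-name references can be shown with a toy gazetteer lookup. This is a simplified stand-in, not how the tools named in the slide work: the gazetteer, its approximate coordinates and the `find_places` helper are all assumptions made for illustration.

```python
import re

# Toy gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "Edinburgh": (55.95, -3.19),
    "Manchester": (53.48, -2.24),
    "Colchester": (51.89, 0.90),
}

def find_places(text):
    """Return (name, character offset, coordinates) for each gazetteer hit."""
    hits = []
    for name, coords in GAZETTEER.items():
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            hits.append((name, match.start(), coords))
    # Sort by position so hits follow the order of the source text.
    return sorted(hits, key=lambda hit: hit[1])

# Invented sentence standing in for a line of a historical report.
hits = find_places("Returns for Manchester and Colchester, then Manchester again.")
for name, offset, coords in hits:
    print(name, offset, coords)
```

The coordinates are what a map interface would plot, and the stored character offset is what makes it possible to link a map point back to the place-name in the original resource.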

That sounds interesting! Where to look for relevant data?
ESDS data catalogue (homepage)
Some of these options can be used to find data:
  – search the ESDS Catalogue (simple or advanced search)
  – search variables
  – browse Major Studies list
  – browse the latest releases

Finding data: Searching the Data Catalogue

Finding data: Sample data catalogue record

Finding Data: Sample Documentation

Where to find more data

Finding Data: our researcher self-archiving UKDA-Store

Accessing data
• Documentation is freely available to anyone
• Users must be registered with ESDS to download or access data
• You can use your university username & password to register
• Access to some data is limited to users at UK Higher or Further Education Institutions
• We currently have approx. 22,000 registered users

How to access data
• register with ESDS
• agree to the terms & conditions of the End User Licence
• select the dataset from the Data Catalogue and click ‘Download/Order’
• specify a usage/project for which the data are to be used
• then:
  – download data selecting your preferred format (SPSS, Stata, TAB etc.) or
  – place an online order for the data
• for more see http://www.esds.ac.uk/support/e2.asp
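Once a dataset has been downloaded in TAB (tab-delimited) format, it can be read with Python's standard `csv` module. The sketch below is self-contained, so it uses an in-memory string in place of a real file; the column names and values are invented and do not come from any ESDS dataset.

```python
import csv
import io

# Stand-in for a downloaded tab-delimited file; the columns are invented.
tab_text = "caseid\tregion\tage\n1\tEssex\t34\n2\tLothian\t51\n"

def read_tab(handle):
    """Read a tab-delimited file into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))

rows = read_tab(io.StringIO(tab_text))
ages = [int(row["age"]) for row in rows]   # values arrive as strings

print(len(rows), sum(ages) / len(ages))
```

With a real download, the same function could be given `open(path, newline="")`, where `path` is wherever the file was saved. SPSS and Stata files are binary formats and need a dedicated reader rather than `csv`.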

How to access data

Teaching resources
• ESDS can help provide support in many areas of teaching and research methods
  – teaching datasets
  – thematic guides, e.g. on health and crime
  – guides on:
      • data collection and use
      • data sharing and data management
      • confidentiality, consent and ethics issues
      • survey and research design and analysis
      • software for analysing data
  – case studies of re-use
  – training events and workshops
• recently involved in creating formal assessments based on Qualitative data collections (TALIF grant with Dept of Sociology, Essex)

Workshops and training
• Thematic data resources events
• Help with using data
  – specific datasets
  – data handling skills
  – methodological issues
  – analytical skills - introductory and advanced level
• We are pro-active and re-active, so ask us if you would like a workshop!
Forthcoming events: http://www.esds.ac.uk/news/esdsforthevents.asp

Other UK Data Archive services

Thank you! Questions?

References
• Corti, Louise. (2011, 11 Jan). Report on Linguists’ use of ESDS/UK Data Archive.