

  • Number of slides: 121

Archiving. David Nathan, Endangered Languages Archive, Hans Rausing Endangered Languages Project, SOAS, University of London

Topics § Introducing ELAR and digital language archives § Preservation § Archive interactions with documentation § What and how to archive § Protocol § Metadata § Evaluation of audio § Archives and revitalisation § Archivism § Mobilisation § Video § Conclusions

Introducing ELAR and digital language archives

Endangered Languages ARchive (ELAR) § one of 3 semi-autonomous programs of the Hans Rausing Endangered Languages Project § staff of 3: archivist, software developer, technician (plus research assistants etc) § develops preservation infrastructure, cataloguing and dissemination; policies; facilities; training and advice; materials development and publishing

What is a digital language archive? § a trusted repository created and maintained by an institution with a commitment to the long-term preservation of archived material § will have policies and processes for materials acquisition, cataloguing, preservation, dissemination, and migration to new digital formats § a collection of managed materials

What is archiving of language materials? § preparing materials in a structured form suitable for long-term preservation § creating long-term relationships § it is not backup § it is not dissemination/publication § it should not impinge on good linguistic practice

What can a language archive offer? § Security - keep your electronic materials safe § Preservation - store your materials for the long term § Discovery - help others to find out about your materials § Protocols - respect and implement sensitivities, restrictions § Sharing - share results of your work, if appropriate § Acknowledgement - create citable acknowledgement § Mobilisation - create usable language materials for communities § Quality and standards - advice on making sure your materials are of the highest quality and meet robust standards

Kinds of language archives § many cross-cutting classifications: § Indigenous vs outsider, eg Squamish Nation § regional vs international, eg AILLA, Paradisec; DoBeS, ELAR § associated with a research institute, eg AIATSIS, ANLC § grant-funded, eg DoBeS, ELAR, OTA § digital vs physical vs mixed, eg DoBeS vs Vienna Sound Archive, ANLC

Potential users § speakers and their descendants - up to 95% of users of UCB are community members § depositors - to create or renew materials § other researchers - comparative/historical linguists, typologists, theoreticians, anthropologists, historians, musicologists etc § other “stakeholders”, eg educationalists § journalists and the wider public

Archives networks and bodies § Digital Endangered Languages and Musics Archives Network (DELAMAN) § ELAR, DOBES, ANLC, Paradisec, EMELD, LACITO, AIATSIS, AMPM (Maori) § Open Language Archives Community (OLAC) § others, eg D-LIB § http://www.dlib.org/ § Open Archives Initiative

Digital archive architectures § OAIS archives define three types of ‘packages’: ingestion, archive, dissemination § [diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]
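
In OAIS (ISO 14721) terms the three packages are the Submission, Archival and Dissemination Information Packages (SIP, AIP, DIP). The sketch below is a minimal illustration of that flow under stated assumptions, not ELAR's implementation: the class names and the `ingest`/`disseminate` functions are hypothetical, and recording a SHA-256 checksum at ingest stands in for fuller preservation metadata.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SubmissionPackage:
    """What a producer (depositor) hands over: files plus their metadata."""
    files: dict[str, bytes]
    metadata: dict[str, str]


@dataclass
class ArchivalPackage:
    """What the archive keeps: the original bytes plus fixity information."""
    files: dict[str, bytes]
    metadata: dict[str, str]
    checksums: dict[str, str] = field(default_factory=dict)


@dataclass
class DisseminationPackage:
    """What a designated community receives: a permitted selection."""
    files: dict[str, bytes]
    metadata: dict[str, str]


def ingest(sip: SubmissionPackage) -> ArchivalPackage:
    # Record a SHA-256 digest per file so later copies can be verified.
    checksums = {name: hashlib.sha256(data).hexdigest() for name, data in sip.files.items()}
    return ArchivalPackage(dict(sip.files), dict(sip.metadata), checksums)


def disseminate(aip: ArchivalPackage, allowed: set[str]) -> DisseminationPackage:
    # Release only the files the access protocol permits for this user.
    selected = {name: data for name, data in aip.files.items() if name in allowed}
    return DisseminationPackage(selected, dict(aip.metadata))
```

Keeping the archival package separate from any dissemination package means delivery copies can be regenerated, or converted to new formats, without touching the preserved originals.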

‘Live Archives’ - architecture § boundary between depositors, users and archive: § users add, update content; customise outputs § [diagram: Producers → Ingestion → Archive → Dissemination → Designated communities]

The way we were... § eg 1993: ASEDA, the Aboriginal Studies Electronic Data Archive at AIATSIS, Canberra (modelled on the Oxford Text Archive) § opportunistically collect and catalogue electronic materials that were at risk or not accessible: § lexica § grammars § texts etc

How things have changed... § types of data (modalities and some genres) § means of storage § standardisation and metadata § dissemination (most explosive) § expanded into practice and workflow of linguists

ELAR’s holdings § ELAR currently holds about 45 deposits with a total volume of approx 1.1 TB § the average deposit is about 25 GB; however, sizes vary widely, with a few much larger deposits, and the median size is around 10 GB § we expect volume to nearly double over the next year § see next slides for distribution of data types
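
As a quick check on these figures (using decimal units, 1 TB = 1000 GB), the mean deposit size is the total volume divided by the number of deposits:

```latex
\bar{s} = \frac{1.1\ \text{TB}}{45} = \frac{1100\ \text{GB}}{45} \approx 24.4\ \text{GB} \approx 25\ \text{GB}
```

A mean of about 25 GB against a median of around 10 GB is the signature of a right-skewed distribution: a few very large deposits raise the mean but barely move the median, so the median is the better guide to a typical deposit.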

ELAR holdings by data type § data type by volume (MB) and number of files, sorted by volume § data types for a representative sample (70%) of holdings § [table: video and audio account for most of the volume; other types listed are image, msword, pdf, eaf, text, lex, trs, xls and imdi]

If you are a depositor, ELAR will § preserve your deposited materials § provide for making changes where possible § provide web-based metadata management § implement your access restrictions etc § give feedback about materials § provide advice and general and specific assistance, eg data conversion § provide some equipment and services on a case-by-case basis § develop resources

Preservation

Preservation issues § making materials robust § making storage robust § organisational, ownership and policy issues § changing technologies: § refreshing § migrating

Changing technologies § advantages of digital preservation § primarily: copying § items no longer unique § also transmission, dissemination § other implications § robust formats (standard, open, explicit) § formats with long horizons § formats easy to refresh § formats that don’t require particular software (sometimes software is intrinsic!) § may have to describe software or even archive the software

Two preservation models § “preserve the bytestream” § keep the exact original at all costs § LOCKSS § “lots of copies keep stuff safe” § http://lockss.stanford.edu/ § guess which community it came from!
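
Both models rest on being able to tell whether a copy is still bit-for-bit identical to the original. LOCKSS itself works by peer libraries polling one another over a network; the sketch below compresses that idea into a local majority vote over SHA-256 digests and is not the LOCKSS protocol. The function name `audit_copies` and the location names are hypothetical.

```python
import hashlib
from collections import Counter


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_copies(copies: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority digest and the locations whose copy disagrees with it.

    `copies` maps a location name (a server, a disk) to that copy's bytes.
    """
    digests = {location: sha256_of(data) for location, data in copies.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(copies) // 2:
        # Without a strict majority there is no trustworthy reference copy.
        raise ValueError("no strict majority; manual investigation needed")
    damaged = sorted(loc for loc, digest in digests.items() if digest != majority_digest)
    return majority_digest, damaged
```

A damaged copy would then be replaced from one of the agreeing copies and the audit rerun; with only two copies a disagreement cannot be resolved this way, which is one argument for keeping more.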

Some backup issues § risk management § undetected problems and useless backups § aspects of professional backup: § scheduled frequencies, eg monthly, weekly, daily § retention § media and locations § naming/versions § proven restoration
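
Scheduled frequencies and retention interact: a common approach is a daily/weekly/monthly rotation in which the newest backup of each recent day, week and month is kept and the rest are pruned. This is a minimal sketch of that general technique; the function name and the default counts (7 daily, 4 weekly, 12 monthly) are illustrative assumptions, not a recommendation from the slide.

```python
from datetime import date


def backups_to_keep(backup_dates: list[date], daily: int = 7, weekly: int = 4,
                    monthly: int = 12) -> set[date]:
    """Return the backup dates to retain under a daily/weekly/monthly rotation.

    For each granularity, keep the newest backup in each of the most recent
    `daily` days, `weekly` ISO weeks and `monthly` calendar months.
    """
    newest_first = sorted(set(backup_dates), reverse=True)
    keep: set[date] = set()

    def keep_newest_per(period_of, limit):
        periods_seen = set()
        for day in newest_first:
            period = period_of(day)
            if period in periods_seen:
                continue  # a newer backup already represents this period
            if len(periods_seen) == limit:
                break  # all remaining periods are older than the ones kept
            periods_seen.add(period)
            keep.add(day)

    keep_newest_per(lambda d: d, daily)
    keep_newest_per(lambda d: tuple(d.isocalendar())[:2], weekly)  # (ISO year, week)
    keep_newest_per(lambda d: (d.year, d.month), monthly)
    return keep
```

A retention policy only helps if the kept backups actually restore, so pruning is best paired with periodic test restores.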

Top 10 worst ways to collect/manage data § 1. No backup § 2. Divergent versions of same data § 3. Unlabeled disks/media § 4. Non-standard or undocumented filenames § 5. Master recordings used to review/analyse data § 6. Don’t know how characters are encoded § 7. Never tried to convert/export data § 8. Unprocessed or unedited audio and video § 9. Inconsistent recording § 10. Unmonitored recording

Archive interactions with documentation

Documenter and archive interactions § grant formulation and application § communications, questions, advice § training § archiving services

Query/interaction topics § analysis of approx 150 queries from documenters/linguists over nearly 2 years

What and how to archive

What can you archive (at ELAR)? § media - sound, video § graphics - images, scans § text - fieldnotes, grammars, description, analysis § structured data - aligned annotated transcriptions, databases, lexica § metadata - structured, standardised contextual information about the materials

Archive objects § informed by traditions, eg document archives § sometimes called “resources”, bundles § it could be a file, a set of files, a directory, a “session” or a coherent item with many parts § should have archival qualities, eg Bird & Simons’ “7 Dimensions” (or see Thieberger in LDD 2) § may impose standard structures or formats § need deposit event and processes: § legal and protocol verification § accession § ongoing processes

Archive objects should be selected § example: video: how much volume is allocated? § answer: ... § however, e.g.: § it is unlikely that a linguist is in a position to plan and consistently create excellent video, so selection is unavoidable § data has always been selected!

(... selection) § in your typical work you also: § selected § labeled § transformed/processed/edited § added, corrected, expanded § made links § made or assumed relationships between “whole” and processed units; invented labels, IDs, scope etc § imposed formats

Data portability § Bird and Simons 2003: (for language documentation) our data should have integrity, flexibility, longevity and utility

Data portability § complete § explicit § documented § preservable § transferable § accessible § adaptable § not technology-specific § (also appropriate, accurate, useful etc!!)

Formats - media - preferred § sound - WAV § image - BMP, TIFF, JPEG § video - MPEG-2

Formats - documents - preferred § plain text, with or without markup § PDF (PDF/A) § XML, other systematic markup (with description of markup system) § well-structured documents in common Office formats - ELAR will eventually convert them to archive formats § character encoding: § preferred encoding is ASCII or Unicode § clearly document any other encodings used, e.g. ISO 8859-5 § discuss with us if you use font substitution to handle non-Roman characters

Formats - characters - preferred § character encoding: § ASCII or Unicode (UTF-8) § you must clearly document any other encodings used, e.g. ISO 8859-9 § discuss with us if you use font substitution to handle non-Roman characters
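
A first check on a text file is whether its bytes are plain ASCII, valid UTF-8, or neither. This minimal sketch (the function name is hypothetical) relies on the fact that text in an 8-bit encoding such as ISO 8859-9 is rarely valid UTF-8 by accident; it cannot tell which 8-bit encoding was used, which is exactly why any such encoding has to be documented.

```python
def classify_encoding(raw: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'unknown' for a file's raw bytes.

    'unknown' means the bytes are not valid UTF-8, so the encoding
    (for example an ISO 8859 variant) must be documented by the depositor.
    """
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

For example, `classify_encoding(open("notes.txt", "rb").read())` returns one of the three labels for a file read as bytes.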

Filenames and directories § characters [A-Z], [a-z], [0-9], underscore and a single full stop before the extension § correct MIME extension § favour lower case letters § maximum length 30 characters § maximum directory depth 8 § = ASCII only, no spaces
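
These rules are mechanical enough to check before delivery. The sketch below is one possible encoding of them, with assumptions stated here and in the comments: the 30-character limit applies to the whole filename including its extension, "depth" counts the directories above the file, directory names use the same characters but no full stop, and lower case and a recognisable MIME extension are treated as warnings rather than errors. The function name `check_path` is hypothetical.

```python
import mimetypes
import re
from pathlib import PurePosixPath

DIRECTORY_PATTERN = re.compile(r"[A-Za-z0-9_]+")
# Letters, digits or underscores, exactly one full stop, then the extension.
FILENAME_PATTERN = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9]+")
MAX_NAME_LENGTH = 30  # assumed to include the extension
MAX_DEPTH = 8         # assumed to count directories above the file


def check_path(path: str) -> tuple[list[str], list[str]]:
    """Check a non-empty relative path; return (errors, warnings)."""
    *directories, name = PurePosixPath(path).parts
    errors, warnings = [], []
    if len(directories) > MAX_DEPTH:
        errors.append(f"directory depth {len(directories)} exceeds {MAX_DEPTH}")
    for directory in directories:
        if not DIRECTORY_PATTERN.fullmatch(directory):
            errors.append(f"directory {directory!r}: only A-Z, a-z, 0-9 and _ allowed")
    if not FILENAME_PATTERN.fullmatch(name):
        errors.append(f"{name!r}: only A-Z, a-z, 0-9, _ and one full stop before the extension")
    if len(name) > MAX_NAME_LENGTH:
        errors.append(f"{name!r}: longer than {MAX_NAME_LENGTH} characters")
    if name != name.lower():
        warnings.append(f"{name!r}: lower case letters are preferred")
    if mimetypes.guess_type(name)[0] is None:
        warnings.append(f"{name!r}: extension does not map to a known MIME type")
    return errors, warnings
```

Running this over every file in a deposit before copying it to disk catches problems such as spaces, accented letters and double extensions while they are still cheap to fix.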

Semantics of filenames § don’t stuff meaningful information into filenames - use metadata instead § versions § use directory structures wisely

Data format duty cycle examples (stages: raw, working, interchange, archive, dissemination; not every row fills every stage) § Video: DVI; software-specific; MPEG-2; MPEG-2, AVI, QT § Fieldnotes: Shoebox; FOSF; XML; WWW, print dictionary § Audio: ATRAC; WAV; BWF; MP3 § Complex data: multiple; FM Pro database; RTF, XML; interactive application § Multimodal: multiple; as above; multimedia application page

Evaluation and conversion examples

Characters § did my characters come through? § answer: ... § however: § perhaps ELAR should do it? § [examples: “há ki hená mázaska pa”; “wikcémna nú iyóphepa wa-ye ks”; “DBW t wóz? a-s? ni yeló DB OK wash things-NEG ASS.M ‘he didn't do the wash’”; “wóz a-s yeló DB OK az ni wash things-NEG ASS.M ‘he didn't do the wash’”]
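
One frequent way characters fail to come through is UTF-8 text being read as Windows-1252, so that “há” turns into “hÃ¡”. A single round of that damage can often be undone by reversing the two steps, as in this minimal sketch (the function name is hypothetical; dedicated tools handle many more cases). Characters already replaced by “?”, as in “wóz? a-s?” above, are lost and can only be recovered by re-exporting from the source.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of UTF-8 text having been misread as Windows-1252.

    If the text cannot be re-encoded as cp1252, or the resulting bytes are
    not valid UTF-8, it was probably not damaged this way and is returned as-is.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

Because correctly encoded accented text usually fails the UTF-8 decode step, applying the function to undamaged text leaves it unchanged.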

Preservation § Is my file preservable? § Note: § characters? § inconsistent segmentation § data as comments § conventions/metadata § [example transcription header: Text transcription: “Korimáka”; Language: Choguita Rarámuri; Language used for transcription: Spanish; Consultant: Luz Elena León Ramírez; Linguist: abriela Cabaero; Transcription: erth Fuen & Gabrela Cabaero; Date recorded: 11/02/2006; Date tranbscribed: 11/02/2006; Recording: rec6-LEL.wav]
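
Metadata kept as comment lines at the top of a transcription is easy to lose or misread. A minimal sketch of lifting leading “Key: value” lines into a structured record follows; it assumes the header ends at the first line that is not of that form (such as a blank line), and the function name and snake_case key normalisation are hypothetical choices.

```python
import re

HEADER_LINE = re.compile(r"^([A-Za-z][A-Za-z ]*?):\s*(.*)$")


def split_header(text: str) -> tuple[dict[str, str], list[str]]:
    """Separate leading 'Key: value' lines from the transcription body.

    Keys are normalised to lower_snake_case so they can be mapped onto a
    metadata schema; the body starts at the first non-header line.
    """
    lines = text.splitlines()
    metadata: dict[str, str] = {}
    index = 0
    while index < len(lines):
        match = HEADER_LINE.match(lines[index])
        if not match:
            break
        key = match.group(1).strip().lower().replace(" ", "_")
        metadata[key] = match.group(2).strip()
        index += 1
    return metadata, lines[index:]
```

Once extracted, values such as a date written 11/02/2006 can be checked and stored unambiguously (for example in ISO 8601 form), since that string could be read as either 11 February or 2 November.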

Knowledge representation 1 - before § wama momol chi naron mon chayako (LB) / wama momol chi naron chayako (MD) wama momol chi nan mon chayako (more emphatic(LB) / wama momol chi nan chayako (MD) Why don't you and him do it? + Notes have both of these sentences without the negator mon. OK runon naynangkroy ile ri He ate their sago. * kipin kannangkroy ngolu intended: We ate their cassowary. OK kipin kanangkroy ngolu We ate their cassowary.

Knowledge representation 1 - after § * Kipin kannangkroy ngolu (intended: We ate their cassowary.) § OK Kipin kanangkroy ngolu (We ate their cassowary.)

Knowledge representation 2 § avoid generic software “convert to XML” § [example of generically converted XML, with element text: Morly Beeta; Interview with Morly Beeta; Jan/13/05; Obu history by Morly Beeta]
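
The alternative to generic conversion is XML whose structure reflects what the data means. As a minimal sketch, assuming Toolbox/Shoebox records in MDF style where `\lx` starts an entry and `\ps` and `\ge` hold part of speech and English gloss, each record becomes an `<entry>` with meaningful child elements; the element names are hypothetical, and unmapped markers are kept rather than dropped.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from MDF-style markers to meaningful element names.
MARKER_NAMES = {"lx": "lexeme", "ps": "part_of_speech", "ge": "gloss_en"}


def sfm_to_xml(text: str) -> ET.Element:
    """Turn backslash-marker records (each starting with \\lx) into <entry> elements."""
    lexicon = ET.Element("lexicon")
    entry = None
    for line in text.splitlines():
        if not line.startswith("\\"):
            continue  # skip blank lines and unmarked text in this simple sketch
        marker, _, value = line[1:].partition(" ")
        if marker == "lx":
            entry = ET.SubElement(lexicon, "entry")
        if entry is None:
            continue  # ignore fields that appear before the first record
        element = ET.SubElement(entry, MARKER_NAMES.get(marker, "field"))
        if marker not in MARKER_NAMES:
            element.set("marker", marker)  # keep unknown markers, labelled
        element.text = value.strip()
    return lexicon
```

`ET.tostring(sfm_to_xml(text), encoding="unicode")` then yields XML in which an entry's lexeme, part of speech and gloss can be addressed directly, rather than a sequence of undifferentiated lines.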

ELAR conversion - original

ELAR conversion - XHTML

ELAR conversion - in browser

Delivery of materials § mostly we expect to receive copies on computer-readable media such as hard disks or CD/DVD § DVDs seem consistently unreliable § some digitisation of media may be possible
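
Because carriers like DVDs can fail silently, a checksum list made by the depositor before shipping lets the archive confirm on arrival that every file came through intact. This sketch assumes a manifest of “checksum  relative/path” lines in the style of a BagIt payload manifest; the function names are hypothetical, and it does not look for extra files that the manifest omits.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large media files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest_text: str) -> dict[str, str]:
    """Compare delivered files under `root` with 'checksum  relative/path' lines.

    Returns a mapping from path to problem ('missing' or 'checksum mismatch');
    an empty dict means every listed file arrived intact.
    """
    problems: dict[str, str] = {}
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        expected, relative_path = line.split(maxsplit=1)
        relative_path = relative_path.strip()
        target = root / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
        elif sha256_file(target) != expected.lower():
            problems[relative_path] = "checksum mismatch"
    return problems
```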

Protocol

Protocol § sensitivities, restrictions: identification, description and implementation

Protocol grows naturally with documentation § focus on recorded data » more people, more genres, less researcher knowledge § focus on revitalisation » which language to teach? who to host and teach? who can learn? etc § community participation » framework for speakers to shape documentation process and products § mobilisation » selecting, juxtaposing; community participation § time » significance and sensitivities change over time § access » increasing scope for dissemination, control of IP

ELAR Deposit Form “Section C” § ELAR pays careful attention to any sensitivities or restrictions that apply to any part of your deposit. There are four ways that Access Protocol is implemented: § you define permissions for the whole deposit or for individual files (or parts of files) § we provide defaults to protect your data if you do not define permissions § you/we keep permissions up to date § you list other rights holders

ELAR Deposit Form “Section C” § P1. Anyone: any person may view/listen to or receive a digital copy of any part of the deposit § P2. Certain people or groups: choose any combination of P2A, P2B and P2C: § P2A. Research community members. What level of access (choose only one)? P2A1. They can receive a digital copy of requested material. P2A2. They can view/listen but cannot receive a digital copy § P2B. Language community members (see below regarding identifying members). What level of access (choose only one)? P2B1. They can receive a digital copy of requested material. P2B2. They can view/listen but cannot receive a digital copy § P2C. Particular named people or bodies (see below regarding identifying people/bodies) § P3. Depositor is asked permission for each request: you will be contacted and asked for permission on each request. How do you want to be contacted? P3A. Requester is given address to contact you directly. P3B. ELAR will relay requests to you § P4. Only the depositor has access: persons other than the depositor will not be able to request access.

ELAR Deposit Form “Section C” § Identifying people/bodies: if you chose P2B or P2C, tell us how ELAR should determine who is a member of a group (e.g. language community, educational body). Choose one of the following: M1. You tell ELAR how to determine membership (tell us in Part D). M2. ELAR will ask you on each occasion. M3. ELAR will make a judgement about membership. If you chose P2C, then list the names of the people or bodies in Part D. § Contacting you: if you choose P3A or P3B, you will be able to decide about each particular request. If the choice is P3A, we will send your address to the requester, who can then ask you directly for permission; you then send us your decision. If the choice is P3B, ELAR will act as an intermediary and pass on the request to you, so that your privacy is maintained. However, if you chose P3A or P3B and you (or your delegate) are not contactable, ELAR will need to make the decision or change the access permissions. Similarly, if we need to contact you to ask about group membership and you (or your delegate) are not contactable, we will need to make the decision or change the access permissions.
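
The Section C options combine into a small decision procedure. The sketch below is one hypothetical encoding of it, not ELAR's software: group membership (M1 to M3) is assumed to have been settled before the call and is passed in as flags, people named under P2C are assumed to receive copies because the form does not specify their level, and when several grants apply the most generous one wins.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Access(Enum):
    COPY = "may receive a digital copy"
    VIEW = "may view/listen only"
    ASK_DEPOSITOR = "depositor is asked (P3A or P3B)"
    NONE = "no access"


@dataclass
class Protocol:
    """Section C choices for one deposit or file (hypothetical encoding)."""
    anyone: bool = False                                  # P1
    research_community: Optional[Access] = None           # P2A: COPY = P2A1, VIEW = P2A2
    language_community: Optional[Access] = None           # P2B: COPY = P2B1, VIEW = P2B2
    named_people: set[str] = field(default_factory=set)  # P2C (assumed to get COPY)
    ask_depositor: bool = False                           # P3
    # If none of the above is set, only the depositor has access (P4).


def decide(protocol: Protocol, requester: str, *, is_depositor: bool = False,
           is_researcher: bool = False, is_community_member: bool = False) -> Access:
    """Return the access a requester gets; membership is decided beforehand."""
    if is_depositor or protocol.anyone:
        return Access.COPY
    grants = []
    if is_researcher and protocol.research_community is not None:
        grants.append(protocol.research_community)
    if is_community_member and protocol.language_community is not None:
        grants.append(protocol.language_community)
    if requester in protocol.named_people:
        grants.append(Access.COPY)
    if Access.COPY in grants:
        return Access.COPY  # the most generous applicable grant wins
    if grants:
        return Access.VIEW
    if protocol.ask_depositor:
        return Access.ASK_DEPOSITOR
    return Access.NONE
```

Making P4 the fall-through case mirrors the form's protective default: anything not explicitly granted stays with the depositor.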

Other § deposit, file or object-level protocol § depositor-oriented § we will provide means to change/manage protocol § delegate § other rights holders § sunset clause

Metadata

Metadata § the data about data that enables the management, identification, retrieval and understanding of that data § reflects the knowledge and practice of data providers § defines and constrains audiences and usages for data § documentation’s data orientation heightens the importance of metadata

Metadata § ELAR metadata set = § selection from IMDI*, OLAC*, EAD, TEI § ELAR-specific (e.g. protocol, geographical) § depositor metadata § * i.e. a set of metadata elements that maps onto both IMDI and OLAC § [diagram: Archive, Deposit; ELAR metadata set, your metadata, all other files]

Types of metadata § depositor's / delegates' details § descriptive metadata § administrative metadata § preservation metadata § access protocols § metadata for individual files

Depositors and delegates § name § address § contact details (telephone, fax, email, URL) § role § affiliation § date of birth § nationality

Descriptive metadata § title, description, subject, summary § keywords § subject language, community § location § time span

Administrative metadata § project details § funding and hosting institutions § details of external copies § modifications and status § details of accession agreement § cf. deposit form

Preservation metadata § carrier media § formats, size § provenance (source) § access § access protocols (see elsewhere) § group membership identification

File-level metadata § media files § duration, file size § MIME type, content type § text files § font, character set, encoding § format, markup § metadata files § schema § scope § validity
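
Some of the media-file fields can be generated rather than typed. This minimal sketch collects size, a MIME type guessed from the extension, and, for uncompressed PCM WAV files readable by Python's `wave` module, duration and format details; the function and field names are hypothetical, and the guessed MIME type depends on the platform's type tables.

```python
import mimetypes
import wave
from pathlib import Path


def media_file_metadata(path: Path) -> dict:
    """Collect basic file-level metadata: size, MIME type and, for WAV, duration."""
    mime_type, _ = mimetypes.guess_type(path.name)
    record = {"name": path.name, "size_bytes": path.stat().st_size, "mime_type": mime_type}
    if path.suffix.lower() == ".wav":
        with wave.open(str(path), "rb") as audio:
            record["duration_s"] = audio.getnframes() / audio.getframerate()
            record["channels"] = audio.getnchannels()
            record["sample_rate_hz"] = audio.getframerate()
            record["bits_per_sample"] = audio.getsampwidth() * 8
    return record
```

Generated values like these complement, rather than replace, the descriptive metadata only the depositor can supply.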

Metadata formats § common or standard: § IMDI (‘ISLE Metadata Initiative’, from DoBeS) § OLAC (Open Language Archives Community) § EAD, and others § ELAR: has created its own set, currently being implemented § deposit-scope metadata in deposit form § file-level metadata (will be) by web form § also, depositor’s own metadata

Metadata formats § each depositor can also have different metadata! § our goal: to maximise the amount and quality of metadata § quality and extent are more important than standards and comparability § many depositors are sending extensive metadata in a variety of formats including spreadsheets - see examples

What’s missing from metadata? § pedagogy has typically been left out of the documentation agenda § linguists are better at problematising languages than teaching them § we should mobilise informed, effective and accountable pedagogy § a Hippocratic imperative

Relationships § relationships between documenters/documentation and pedagogy § nonexistent/poor cousin § by-product § documentation is a vector of language transmission!

Who could be documenters? § community members § audio recordists § videographers (documentary filmmakers) § educators § ethnobotanists § anthropologists § computer experts § activists, missionaries § linguists

Multipurpose documentation? § linguists of various specialisations § anthropologists, historians, botanists... § do any have priority? § who are documentation’s main beneficiaries? § can we tell?

... yes ... § Metadata § the data about data that enables the management, identification, retrieval and understanding of that data § reflects the knowledge and practice of data providers § defines and constrains audiences and usages for data

The key is metadata § examples: IMDI, tiered morphological glossing etc § standard (or “best practice”) metadata is strongly oriented to descriptive linguistics and typology (“aggregators”) § How could metadata serve pedagogy?

Pedagogically oriented metadata § demarcation, names and descriptions of socially/culturally relevant events such as songs (of great interest to community members, and valuable teaching materials): should enormous amounts of time be spent providing morpheme-by-morpheme glosses if we cannot simply retrieve a song? § phenomena that provide learning domains, such as “numbers”, “kinship”, “greetings”, “tense” § socially important phenomena such as register, code switching

Pedagogically oriented metadata § notes on learner levels § links to associated materials that have explanations, examples § notes on the previous selection and use of material for teaching § notes on how to use the material for teaching § notes and warnings about restricted materials or materials which are inappropriate for young or certain classes of people (e.g. profane, archaic etc) § and of course easily findable basic information such as name of language or variety, speaker, gender, speaker’s country etc
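
Metadata of this kind pays off when it can be queried, for instance “find all beginner-level songs, leaving out anything flagged as profane”. The record fields and the filter function below are a hypothetical sketch of how the listed categories (event type, learning domain, learner level, warnings) could support such retrieval; the example records are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A hypothetical pedagogically oriented metadata record for one archived item."""
    identifier: str
    language: str
    event_type: str                                          # e.g. "song", "narrative"
    learning_domains: set[str] = field(default_factory=set)  # e.g. {"kinship"}
    learner_level: int = 1                                   # 1 = beginner
    warnings: set[str] = field(default_factory=set)          # e.g. {"profane"}


def find_teaching_materials(items, *, event_type=None, domain=None,
                            max_level=None, exclude_warnings=frozenset()):
    """Return identifiers of items matching the filters, skipping flagged items."""
    results = []
    for item in items:
        if event_type is not None and item.event_type != event_type:
            continue
        if domain is not None and domain not in item.learning_domains:
            continue
        if max_level is not None and item.learner_level > max_level:
            continue
        if item.warnings & set(exclude_warnings):
            continue
        results.append(item.identifier)
    return results
```

For example, `find_teaching_materials(items, event_type="song", exclude_warnings={"profane"})` answers the “can we simply retrieve a song?” question directly from metadata.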

Evaluating audio

Dobbin § software for audio evaluation, processing and reporting
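
Dobbin's internals are not described here, so the following is an independent sketch of two basic checks that audio evaluation typically includes: peak level in dBFS and a count of clipped samples. It assumes 16-bit PCM WAV input and uses only Python's standard library; the function name and the clipping threshold are hypothetical choices.

```python
import array
import math
import sys
import wave


def evaluate_wav(path: str, clip_threshold: int = 32767) -> dict:
    """Report peak level (dBFS) and clipped-sample count for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as audio:
        if audio.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit samples")
        samples = array.array("h", audio.readframes(audio.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    # Samples at (or beyond) full scale in either direction count as clipped.
    clipped = sum(1 for s in samples if s >= clip_threshold or s <= -clip_threshold)
    peak_dbfs = 20 * math.log10(peak / 32768) if peak else float("-inf")
    return {"samples": len(samples), "peak_dbfs": peak_dbfs, "clipped_samples": clipped}
```

A recording whose peak sits at 0 dBFS with many clipped samples was probably recorded too hot, while a very low peak suggests an unmonitored or badly set input level.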

Archives and revitalisation

Keeping ‘means of transmission’ alive § Romaine: co-ordinated efforts at revitalisation mean that institutions increasingly become the vector of language transmission, cf. intergenerational transmission (Fishman) § at the limit, documentations, and the archives that foster, preserve and disseminate them, become the means of transmission

Archives and revitalisation § Penfield: toward a theory of documentation § collaborative efforts § onsite training § document for revitalisation § community-based protocols for the use of materials § these have implications for the lifecycle of ‘data’

Archivism

What have we missed? § Woodbury: most developments are "what's been happening around the emergence of a documentary linguistics", particularly technology, which has raised expectations more than changed practices

What have we missed? § contact with the wisdom and experience of established fields, e.g. § radio/broadcasting (eg mics, MD) § cinematography (eg quality and specialisation) § journalism (eg equipment handling) § audio archives (linguists had input to IASA before the 1980s or so)

What did we get? § advice about formats, parameters, what to avoid § 'silver bullet' equipment and formats § fundamentalism and format wars

Archivism § Archivism: capitulation of language documenters to the agenda and priorities of archives and information technology § why did this happen? § for historical reasons § rapid changes in technology § we left a vacuum

Mobilisation

Mobilisation § use of documentation resources to make relevant, useful, effective resources for language support and revitalisation

Gamilaraay/Yuwaalaraay song player § uses ‘familiar’ data such as from Shoebox, Transcriber § adds genre, functionalities, design etc

Song player data

newsong 14 [track 33] music verse 1 line 1 verse 1 line 2

Song player data § song 34 [track 28] ti Gugan gaaynggul /Brown-skin baby co Words and music: (c) Bob Randall s Roger Knox ln Gamilaraay verse 1 Dhayndalmuu ngaya dhurriyawaanhi dhayndalmuu ngaya dhurriya-y -waa-y -nhi priest I ride, -moving -Past s 20148 m 1590 m 721 -m 1733 -m 1699 As a preacher I used to ride Yarraamanda binaal yarraaman -ga binaal horse -in, at, on peaceful m 2020 -m 755 m 244 A quiet horse on the plains. Walaaybaaga walaay -baa -ga nhama that, the m 1686 wagibaaga. wagibaa -ga plain -in, at, on s 20467 -m 755 gamila ngaya muurr gigi gamila ngaya muurr gi-gi

Song player data § chunking data: § verses etc: [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24] § labels: [1: "Verse 1", 3: "Chorus", 4: "Verse 2", 6: "Chorus", 7: "Verse 3", 9: "Chorus", 10: "Verse 4", 12: "Chorus"]
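
One plausible reading of this chunking data, which is an assumption rather than something the slide states, is that the first list gives the line where each chunk ends and the labels are keyed by chunk number, with an unlabelled chunk continuing the section before it. Under that reading, a player can turn the two lists into labelled sections it can jump between; the function name is hypothetical.

```python
def build_sections(boundaries: list[int], labels: dict[int, str]) -> list[tuple[str, int, int]]:
    """Group chunks into (label, first_line, last_line) sections.

    Assumes chunk n (1-based) ends at boundaries[n - 1] and starts on the line
    after the previous boundary, and that an unlabelled chunk continues the
    previous section.
    """
    sections: list[tuple[str, int, int]] = []
    start = 1
    for number, end in enumerate(boundaries, start=1):
        if number in labels or not sections:
            sections.append((labels.get(number, "Untitled"), start, end))
        else:
            label, first, _ = sections[-1]
            sections[-1] = (label, first, end)  # extend the current section
        start = end + 1
    return sections
```

With the values on the slide this gives eight sections, alternating four two-chunk verses with one-chunk choruses, which fits a typical verse/chorus song shape.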

Other examples of ‘mobilisation’ § simple or conventional games etc can take on new significance § memory game § crossword

Video in documentation and archiving § “Questioning the role of video in language documentation & archiving: is a moving picture worth 1,000 texts?”

The rise and rise of video § increase in claims about video § rise from about 25% to 75% of ELDP applicants § funders have been demanding that some applicants make video

One size fits all? § Himmelmann: the core of a language documentation, then, is constituted by a comprehensive and representative sample of communicative events as natural as possible. Given the holistic view of linguistic behaviour, the ideal recording device is video recording.

Goals and methodology of documentation § cultural and cognitive aspects can be documented or augmented by video (examples from Harrison): § counting methods/systems § locative expressions § behaviours or appearances of plants, animals etc that are described as part of language-encoded knowledge: • information about plant toxicity and preparation could usefully be videoed • swimming formations (eg the Marovo people of the Solomon Islands, who have a rich set of terms for fish behaviour and its relationships to the calendar and hunting) • Gila Pima (Arizona) name a plum tree "dog's testicles", and an edible banana "looks like an erection" (umm, what will the videos show?) § however, David Crystal estimates that such culturally/environmentally specific aspects are only about 10% of any language’s content

Goals and methodology of documentation § discourse and genre § distinguishing participants (McConvell) § transparently capturing “stories” (Wittenburg) § adding or enhancing methodology § stimulus materials § the camera adds theatricality (Jukes) § the camera as a participant (Atkins) § enhancing transcription by motivating community participation § sign language work § treat video as inscription § cameras, lighting, orientation, clothing etc. § appreciated by communities

Goals and methodology of documentation § documentation can’t aim to capture everything (Austin) § and the video camera cannot either! § the argument for accountability has caused confusion between events and recordings; the result is the fantasy that video is what happened and that it provides empirical evidence for all kinds of claims § argument: § video can do X => we should do video § this fails without goals and methodology for X § many pro-video arguments could equally be applied to capturing other phenomena: § e.g. palatography § collecting other text-based metadata, e.g. on social setting

Goals and methodology of documentation § there must be different methodologies (linguistic AND video) for different purposes (cf. sign language) § Himmelmann: [each potential discipline’s usages] influence the recording and presentation of the data inasmuch as certain kinds of information are indispensable for a given analytical procedure (no phonetic analysis is possible without some high-quality sound recording, no analysis of gestures is possible without videotaping, etc.)

Goals and methodology of documentation § so if there are distinct methodologies for different purposes § how adequate could a generic video be? § how can video serve purposes that documenters don’t have?

Goals and methodology of documentation § explicit claimed purposes for video: § in ELDP applications, many applicants request funds for video equipment but have no video-related documentation goals § video exponents describe the potential of video, but few documenters actually have these goals

Goals and methodology of documentation § many phenomena can't be represented on video: § complex family structures and their terminologies § changes in moon shape and phase (better as still photos or diagrams); other calendric and geographic expressions § time and distance, e.g. the Tofa (Siberia) have words for the distance you can cover in a day on reindeer-back § morphological, grammatical and most lexical information § (also relationships, staging, motivations, histories...)

Video: a community-oriented technology § video is good for: § community-oriented content § community involvement § members will best know what/how to shoot § skills transfer § creating directly usable materials, including for revitalisation § why should a linguist shoot video at all?

Video workflow and workload: a disorder of magnitudes... § skills, workload, intrusion, volumes - all increase by orders of magnitude § skills - equipment, shooting, editing, production § equipment - choice, usage, maintenance § power supplies § capturing, conversion § annotation § editing, production § data volumes
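To make the “orders of magnitude” point concrete for data volumes, here is a rough Python calculation. The formats and rates are illustrative assumptions, not ELAR requirements: 48 kHz/24-bit stereo PCM audio, a camera stream at an assumed 25 Mbit/s, and uncompressed 1080p video at 25 frames per second with 3 bytes per pixel.

```python
SECONDS_PER_HOUR = 3600


def pcm_audio_bytes_per_hour(sample_rate_hz, bits_per_sample, channels):
    """Bytes in one hour of uncompressed PCM audio (WAV header ignored)."""
    return sample_rate_hz * (bits_per_sample // 8) * channels * SECONDS_PER_HOUR


def stream_bytes_per_hour(bitrate_bits_per_s):
    """Bytes in one hour of a constant-bitrate (compressed) stream."""
    return bitrate_bits_per_s * SECONDS_PER_HOUR // 8


def raw_video_bytes_per_hour(width, height, bytes_per_pixel, fps):
    """Bytes in one hour of uncompressed video frames."""
    return width * height * bytes_per_pixel * fps * SECONDS_PER_HOUR


audio = pcm_audio_bytes_per_hour(48_000, 24, 2)   # about 1.04 GB
camera = stream_bytes_per_hour(25_000_000)        # about 11.25 GB (assumed bitrate)
raw = raw_video_bytes_per_hour(1920, 1080, 3, 25)  # about 560 GB

for name, size in [("audio", audio), ("camera", camera), ("raw video", raw)]:
    print(f"{name}: {size / 1e9:.2f} GB/hour, {size / audio:.0f}x audio")
```

Under these assumptions even compressed camera video is roughly ten times the size of high-quality audio, and uncompressed video is several hundred times larger; real figures depend entirely on the codec and settings used.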

Workflow and workload § annotation: § could easily involve a time ratio of up to 100:1 (1 hour of video may take 100 hours to process) § in practice, most documenters do not annotate the phenomena that they did (or didn’t) identify § the fallacy that annotation etc. can be done later • video amplifies the value of event-participant knowledge
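The annotation ratio is easy to underestimate, so a small Python calculation may help. It takes the slide's upper figure of 100 hours of processing per hour of video; the 35-hour working week is purely an illustrative assumption.

```python
def annotation_hours(video_hours, ratio=100):
    """Processing time at a given time ratio (slide's upper estimate: 100)."""
    return video_hours * ratio


def working_weeks(hours, hours_per_week=35):
    """Convert hours to working weeks (35 h/week is an assumed figure)."""
    return hours / hours_per_week


for video in (1, 10, 50):
    hours = annotation_hours(video)
    print(f"{video} h of video -> {hours} h of work, about {working_weeks(hours):.1f} weeks")
```

At this ratio, ten hours of video could absorb more than half a year of full-time work, which is one reason why deferring annotation is risky.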

Video: conclusions § video can: § add to the representational methods used by linguistics § encourage us to look at diverse phenomena § challenge our methodologies § provide new and effective ways of disseminating language and cultural events and knowledge

Video: conclusions § video and multimedia § little encouragement to produce multimedia § multimedia: • distinguishes medium from mode of knowledge representation • allows richer and more explicit interleaving of various types of knowledge • imposes its costs in more appropriate areas

Video: conclusions § generic, amateur video fails to respect participants by not recognising linguistic specialisation, complexity or expertise to the same degree as “real” linguistic work § naive video achieves “authenticity” mainly by not editing (and thereby not producing usable products!)

Video: conclusions § there is a lot of tradition in evaluating the descriptive value of linguistic work, but little in defining the documentation value of video § if video really represents the claimed range of linguistic phenomena, it is a key mode of documentation: documenters (and their teachers) need to pay much closer attention to its methodologies! § it is not clear that it is linguists who should be making video

Conclusions

Conclusion: we ask depositors to § manage materials well § collect and provide protocol information § deliver materials and metadata § send trial samples etc. § not withhold materials § share/manage/delegate custodianship of materials § maintain relationships with language stakeholders and ELAR

Conclusion § digital language archives combine traditional preservation with new ways of supporting creators and users of materials § an archive can be more effective if materials are prepared as “portable” § ultimately it is up to documenters to define what good documentation is § ELAR welcomes you to discuss your archiving goals