How usable are digital collections for endangered languages? A review


  • Sarah Babinski Yale University
  • Jeremiah Jewell Yale University
  • Kassandra Haakman Yale University
  • Juhyae Kim Cornell University
  • Amelia Lake Yale University
  • Irene Yi Yale University
  • Claire Bowern Yale University



language reclamation, documentation, digital language archives, archival collections, usability and reusability, workflow, forced alignment


Here, we report on pilot research on the extent to which language collections in digital linguistic archives are discoverable, accessible, and usable for linguistic research. Using a test case of common tasks in phonetic and phonological documentation, we evaluate a small random sample of collections and find substantial, striking problems in all domains. Of the original 20 collections, only six had digitized audio files with associated transcripts (preferably phrase-aligned). That is, only 30% of the collections in our sample were even potentially suitable for any type of phonetic work (regardless of recording quality). Information about the contents of the collection was usually discoverable, though there was variation in the types of information that could be easily searched for in the collection. Though eventually three collections were aligned, only one collection was successfully force-aligned from the archival materials without substantial intervention. We close with recommendations for archive depositors to facilitate discoverability, accessibility, and functionality of language collections. Consistency and accuracy in file naming practices, data descriptions, and transcription practices are imperative. Providing a collection guide also helps. Including useful search terms about collection contents makes the materials more findable. Researchers need to be aware of the changes to collection structure that may result from archival uploads. Depositors need to consider how their metadata is included in collections and how items in the collection may be matched to each other and to metadata categories. Finally, if our random sample is indicative, linguistic documentation practices need to change rapidly if future phonetic work is to be done from archival collections.




How to Cite

Babinski, Sarah, Jeremiah Jewell, Kassandra Haakman, Juhyae Kim, Amelia Lake, Irene Yi, and Claire Bowern. 2022. “How Usable Are Digital Collections for Endangered Languages? A Review”. Proceedings of the Linguistic Society of America 7 (1): 5219.