Zb Multilingual Database

Project assistant: Thomas Schmidt (thomas.schmidt@uni-hamburg.de)

You can find more information about the project on the project homepage.

Goal

Many of the thirteen projects at the SFB "Mehrsprachigkeit" work on an empirical basis of recordings of spoken language, made accessible to scientific analysis by computer-based transcription. The transcription conventions applied, the software tools used to put them into practice, and the formats in which the transcriptions are stored are as diverse as the theoretical backgrounds of the respective projects and the languages they analyse. This makes the data difficult to exchange and to archive in the long term, a problem described by Bird/Liberman (2001) as follows:

"While the utility of existing tools, formats and databases is unquestionable, their sheer variety - and the lack of standards able to mediate among them - is becoming a critical problem. Particular bodies of data are created with particular needs in mind, using formats and tools tailored to those needs, based on the resources and practices of the community involved. Once created, a linguistic database may subsequently be used for a variety of unforeseen purposes, both inside and outside the community that created it. Adapting existing software for creation, update, indexing, search and display of 'foreign' databases typically requires extensive re-engineering. Working across a set of databases requires repeated adaptations of this kind."

The project "Multilingual Database" thus has two principal objectives. On the one hand, the existing language data at the SFB are to be converted from the numerous project-specific formats to a format that is largely independent of specific theories, languages, software or operating systems. The data will thus be suitable for flexible further processing and long-term storage.
On the other hand, the multilingual database is intended as an instrument that facilitates the handling as well as the quantitative analysis of such large amounts of data.

Method

Starting from an analysis of the structure of the existing data, we developed EXMARaLDA (EXtensible MARkup Language for Discourse Annotation), an XML language based on the concept of annotation graphs (Bird/Liberman 2001) that allows a content-based encoding of discourse transcriptions. EXMARaLDA's primary role is that of an "interlingua" between the existing data formats and of an interface between these formats and a (relational) database. However, together with several input and output methods also developed in this project, it may also be regarded as a system for computer-based discourse transcription in its own right.
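The annotation graph idea behind this approach can be illustrated with a small sketch. It assumes the general model of Bird/Liberman (2001): nodes are points on a timeline, and each annotation is a labelled arc between two nodes. The class `Arc`, the functions `to_xml` and `overlapping`, the tier names and the XML element names (`graph`, `timeline`, `node`, `arc`) are all hypothetical and chosen for illustration only; they are not the EXMARaLDA format itself.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    """One annotation: a labelled arc between two timeline nodes."""
    start: str  # id of the node where the annotation begins
    end: str    # id of the node where the annotation ends
    tier: str   # hypothetical layer name, e.g. a speaker or a translation
    label: str  # the transcribed or annotated content


def to_xml(nodes, arcs):
    """Serialise the graph to XML (illustrative element names only)."""
    root = ET.Element("graph")
    timeline = ET.SubElement(root, "timeline")
    for node_id, offset in nodes.items():
        ET.SubElement(timeline, "node", id=node_id, time=str(offset))
    for arc in arcs:
        element = ET.SubElement(
            root, "arc", start=arc.start, end=arc.end, tier=arc.tier
        )
        element.text = arc.label
    return ET.tostring(root, encoding="unicode")


def overlapping(nodes, arcs, target):
    """Return arcs on other tiers whose time span overlaps the target's."""
    return [
        arc
        for arc in arcs
        if arc.tier != target.tier
        and nodes[arc.start] < nodes[target.end]
        and nodes[target.start] < nodes[arc.end]
    ]


# Invented example data: node id -> time offset in seconds.
nodes = {"T0": 0.0, "T1": 1.2, "T2": 1.9, "T3": 2.6}
arcs = [
    Arc("T0", "T2", "A", "und dann bist du hingegangen"),
    Arc("T1", "T3", "B", "ja gestern"),
    Arc("T0", "T2", "A-en", "and then you went there"),
]

xml_text = to_xml(nodes, arcs)
```

Because every arc refers only to shared timeline nodes, simultaneous speech by two speakers and additional annotation layers such as a translation are represented uniformly. A query such as `overlapping(nodes, arcs, arcs[0])` then needs no knowledge of the transcription convention that produced the data, which is the property that makes such a format usable as an intermediate representation between project-specific formats.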
References

Bird, Steven / Liberman, Mark (2001): A formal framework for linguistic annotation. In: Speech Communication 33(1,2), pp. 23-60.

Schmidt, Thomas (2001): The transcription system EXMARaLDA: An application of the annotation graph formalism as the basis of a database of multilingual spoken discourse. In: Proceedings of the IRCS Workshop on Linguistic Databases, Philadelphia, pp. 219-227.

Schmidt, Thomas (2002a): Gesprächstranskription auf dem Computer: das System EXMARaLDA. In: Gesprächsforschung (Online-Zeitschrift zur verbalen Interaktion) 3. Freiburg, pp. 1-23.

Schmidt, Thomas (2002b): EXMARaLDA - ein System zur Diskurstranskription auf dem Computer. In: Arbeiten zur Mehrsprachigkeit, Serie B (34). Hamburg.

Schmidt, Thomas (2002c): Visualizing linguistic annotation as interlinear text. In preparation (as an AZM).

Schmidt, Thomas (2002d): EXMARaLDA - ein System zur computergestützten Diskurstranskription. To appear in: Mehler, Alexander / Lobin, Henning (2002): Automatische Textanalyse.