Hamburg Corpus of Argentinean Spanish
The Hamburg Corpus of Argentinean Spanish (HaCASpa) was compiled in December 2008 and November/December 2009 within the research project "The intonation of Spanish in Argentina" (H9, director: Christoph Gabriel), part of the Collaborative Research Centre "Multilingualism", funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) and hosted by the University of Hamburg. It comprises data from two varieties of Argentinean Spanish: a) the dialect spoken in the capital, Buenos Aires (also called Porteño, from puerto 'harbor'), and b) the variety of the Neuquén/Comahue area (Northern Patagonia). The seven parts of HaCASpa correspond to the seven tasks described in more detail below: five experiments were carried out to elicit specific data for research on prosody, with a main focus on intonation (Tasks 1–5); in addition, several speakers took part in a free interview (Task 6) and a Map Task experiment (Task 7). The task is encoded as the metadata attribute Task for each communication.
HaCASpa comprises three types of spoken data, depending on the task: spontaneous, semi-spontaneous, and scripted speech. This information is encoded in the metadata attribute Speech type. The regional dimension of the corpus is represented by the attribute Area (Buenos Aires or Neuquén/Comahue), and its diachronic dimension by the attribute Age group (Under 25 or Over 25).
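Because every communication carries the attributes Task, Speech type, Area, and Age group, subcorpora can be defined by combining them. The Python sketch below illustrates such a selection on a hypothetical in-memory representation; the class `Communication`, the function `select`, and the field names and value spellings are assumptions for illustration and do not reflect the actual structure of the corpus metadata (Coma) file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Communication:
    """Hypothetical record for one communication.

    The fields mirror the HaCASpa metadata attributes Task, Speech type,
    Area, and Age group; field names and value spellings are assumptions.
    """
    name: str
    task: str
    speech_type: str  # e.g. "spontaneous", "semi-spontaneous", "scripted"
    area: str         # e.g. "Buenos Aires", "Neuquén/Comahue"
    age_group: str    # e.g. "Under 25", "Over 25"


def select(communications, **criteria):
    """Return the communications whose attributes equal every given value.

    With no criteria, all communications are returned. An unknown
    attribute name raises AttributeError.
    """
    return [
        c for c in communications
        if all(getattr(c, key) == value for key, value in criteria.items())
    ]
```

For example, `select(records, area="Buenos Aires", age_group="Under 25", speech_type="spontaneous")` would return the spontaneous Porteño data from younger speakers, provided `records` has first been filled from the corpus metadata.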
The subjects are 60 native speakers of the relevant variety of Argentinean Spanish, i.e. Buenos Aires (Porteño) or Neuquén/Comahue Spanish. For each speaker, the following information is available: Age, Education, Occupation, Year of school enrollment, Year of school graduation, and Parents' mother tongue.
The current version (0.1) mainly contains orthographic transcriptions of verbal behaviour (141,000 transcribed words), together with codes that relate utterances to the materials used in the experimental tasks.
For technical questions regarding the corpus, please contact the EXMARaLDA team.
Task (1) consists of two subparts: reading a story (1a) and retelling it (1b). For (1a), the subjects were asked to read the short story "The North Wind and the Sun" twice; the text was presented on a computer screen. The fable is well known for its use in phonetic descriptions of different languages (see International Phonetic Association. 2005. Handbook of the International Phonetic Association. Cambridge: Cambridge University Press); the Latin American version used for our data stems from the Dialectoteca del español (coordination: C.-E. Piñeros). For (1b), the speakers were instructed to retell the story in their own words without consulting the text. Task (1) thus yields both scripted (1a) and semi-spontaneous (1b) speech.
Task (2) was designed to collect semi-spontaneous speech by asking the subjects to answer questions about a given picture story. First, the speakers were familiarized with the story, which was presented as two pictures on a computer screen. They were then asked to answer specific questions about the story. The questions were also presented on the computer screen and varied in design in order to elicit answers with different information-structural readings (such as broad vs. narrow focus, or different focus types). In general, the speakers were free to answer as they wished; however, to avoid single-word answers, they were asked to utter complete sentences.
Task (3) consisted of reading question-answer pairs whose content was based on the picture stories already familiar from Task (2). Each question was displayed on the computer screen together with its answer (one question, one answer), and the speakers simply read both aloud.
Task (4) was a reading task in which the subjects were asked to utter 10 simple subject-verb-object (SVO) sentences presented on a computer screen. The speakers were instructed to read them at both a normal and a fast speech rate. Following D'Imperio et al. 2005 ("Intonational Phrasing in Romance: The Role of Syntactic and Prosodic Structure", in: Prosodies: With Special Reference to Iberian Languages, ed. by S. Frota et al., Berlin: Mouton de Gruyter, 59-97), the subject and object constituents differed in their syntactic and prosodic complexity (e.g. determiner plus noun or determiner plus noun plus adjective, and one or three prosodic words, respectively). The participants were instructed to read the sentences as if they conveyed new information. The complete experimental design is described in Gabriel, C. et al. 2011 ("Prosodic phrasing in Porteño Spanish", in: Intonational Phrasing in Romance and Germanic: Cross-Linguistic and Bilingual Studies, ed. by C. Gabriel & C. Lleó, Amsterdam: Benjamins, 153-182).
Task (5), the so-called intonation survey, consisted of 48 situations designed to elicit various intonational contours with specific pragmatic meanings. In this inductive method, the researcher confronts the speaker with a series of hypothetical situations to which he or she is supposed to react verbally. In the Argentinean version of the questionnaire, the hypothetical situations were illustrated by appropriate pictures. The experimental design is described in more detail in Prieto, P. & Roseano, P. (eds.) 2010. Transcription of Intonation of the Spanish Language. Munich: Lincom; see also the Interactive atlas of Spanish intonation (coordination: P. Prieto & P. Roseano).
Task (6) consisted of free interviews designed to collect spontaneous speech. The subjects were asked to tell the interviewer about a past experience, such as a vacation or memories of Argentina as it was decades ago. Although the interviewer took part in the conversation, it was mainly the subjects who spoke during the recordings.
Task (7) consists of Map Task dialogs. The Map Task is a technique for collecting spontaneous speech in which two subjects cooperate to complete a specified task; it is designed to lead the subjects to produce particular interrogative patterns. Each of the two subjects receives a map of an imaginary town marked with buildings and other features. A route is marked on the map of one participant, who takes the role of the instruction-giver. The other participant, the instruction-follower, receives a version of the same map that does not show the route to be followed. The instruction-follower therefore has to ask the instruction-giver questions in order to reproduce the same route on his or her own map (see also the Interactive atlas of Spanish intonation).
To gain access to the Hamburg Corpus of Argentinean Spanish (HaCASpa), you need to apply to the rights holder, stating your purpose. This is done by filling in this form, signing it and sending it to:
Hamburger Zentrum für Sprachkorpora
Max Brauer-Allee 60
If your purpose is approved, you will be asked to sign a user agreement before you obtain a password. In it, you agree to:
- use the HaCASpa corpus according to the terms of the user agreement
- use the HaCASpa corpus for non-commercial research and teaching purposes only
- not redistribute the HaCASpa corpus or parts of it to third parties
- cite the following source in any published work which is based on the corpus:
With your password, you can use the links in the sections below.
The following documentation is available:
- A PDF document explaining the metadata available for the corpus
- A PDF document explaining the transcription conventions applied
- A HIAT transcription manual explaining the HIAT conventions used to segment the corpus
- A PDF document explaining online and offline use of EXMARaLDA corpora
Online data (password-protected)
The following data can be viewed online:
- A corpus overview that links to all transcriptions, recordings, visualizations, and export documents
- Materials and codes used for Task (1) / Parte 1
- Materials and codes used for Task (2) / Parte 2
- Materials and codes used for Task (3) / Parte 3
- Materials and codes used for Task (4) / Parte 4
- Materials and codes used for Task (5) / Parte 5
- The instruction giver's map and the instruction follower's map for the map task (Task (7) / Parte 7)
- Corpus statistics organised by communication
- Corpus statistics organised by task
- Corpus statistics organised by speaker
- A wordlist for the whole corpus
Downloadable data (password-protected)
The following data can be downloaded for offline use:
- A zip archive with all data in EXMARaLDA formats (basic transcriptions, segmented transcriptions, Coma file) and the materials used, in PDF format
- A zip archive with transcriptions in FOLKER format
- A zip archive with transcriptions in ELAN (*.eaf) format
- A zip archive with transcriptions in TEI format
- A zip archive with transcriptions in Praat TextGrid format
Audio recordings in ogg or wav format can be downloaded separately from the corpus overview.
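Once the EXMARaLDA archive has been downloaded, the basic transcriptions (*.exb) can be processed with standard XML tools. The Python sketch below assumes the usual layout of an EXMARaLDA basic transcription (timeline points as `tli` elements inside `common-timeline`, events as `event` elements inside `tier` elements that carry `type` and `speaker` attributes). The function names are hypothetical, and the whitespace-based token count is only a rough approximation, not the method behind the corpus statistics or the 141,000-word figure.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def events_from_exb(xml_data):
    """Return (speaker, start, end, text) tuples for all events on
    transcription tiers (type="t") of an EXMARaLDA basic transcription.

    xml_data may be str or bytes. Start/end times are in seconds, or None
    when the referenced timeline item carries no absolute time.
    """
    root = ET.fromstring(xml_data)

    # Map each timeline item id (e.g. "T0") to its absolute time, if any.
    times = {}
    for tli in root.iter("tli"):
        time = tli.get("time")
        times[tli.get("id")] = float(time) if time is not None else None

    events = []
    for tier in root.iter("tier"):
        # Skip non-transcription tiers (e.g. annotation "a", description "d").
        # Depending on the conventions, filtering on the tier's "category"
        # attribute may also be needed (assumption, not done here).
        if tier.get("type") != "t":
            continue
        speaker = tier.get("speaker")
        for event in tier.iter("event"):
            events.append((
                speaker,
                times.get(event.get("start")),
                times.get(event.get("end")),
                event.text or "",
            ))
    return events


def count_tokens(events):
    """Rough per-speaker token count: splits event texts on whitespace."""
    counts = Counter()
    for speaker, _start, _end, text in events:
        counts[speaker] += len(text.split())
    return counts
```

A typical call would read the file as bytes, e.g. `with open("example.exb", "rb") as f: events = events_from_exb(f.read())`, where `example.exb` stands for any basic transcription from the archive.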