CEASELESS - Chunk Learning and the Development of Speaking and Listening Fluency: Integrating Experimental and Computational Approaches

07.20–06.21 (Principal applicant, together with PD Dr. rer. nat. Ralf Schlüter, Lehrstuhl Informatik 6: Human Language Technology and Pattern Recognition, RWTH Aachen University.)

Funded as an Exploratory Research Space (ERS) project within the Excellence Strategy of the German federal and state governments. The ERS aims to identify and open up new interdisciplinary research fields. It is the most important funding instrument for interdisciplinary research projects at RWTH and supports integrative research that draws on scientific and practical knowledge to address societal challenges.
Effective communication skills are key to personal contentment, academic achievement and professional career success. Communicating effectively has also been found to facilitate social relationships, and competent language users are more successful in conveying their knowledge and views. However, producing and comprehending fluent and informationally dense speech are highly demanding communication skills that call upon many language-related and general cognitive abilities. The development of these skills in a non-native (second) language is even more challenging due to a lack of automatized procedural knowledge and reliance on additional memory resources. This results, among other things, from the fact that non-native speakers are exposed to less spoken input in the target language than native speakers, have few opportunities to practice, and receive only limited feedback on their target-language performance. The difficulty of mastering speaking and listening skills is exacerbated by the fact that both are subject to real-time constraints. Recent theoretical approaches to human language processing posit that, to ameliorate the effects of these constraints (the Now-or-Never Bottleneck), humans learn to rapidly and efficiently recode and compress the linguistic input into larger units (‘chunks’) and rely on such chunks to facilitate language production and comprehension. While the body of empirical research on chunking and its role in native language processing has been expanding rapidly in the last few years, there is virtually no research that investigates its role in the processing and learning of language by non-native speakers. Moreover, the existing body of research has been largely confined to experimental studies conducted under laboratory conditions with stimulus material consisting of isolated sentences rather than authentic connected speech.
This raises the question of to what extent the findings obtained in this research extend to real-life communicative situations. The available research assessing longer stretches of connected speech has relied on corpora of human transcriptions of speech. The research project CEASELESS is aimed at [1] advancing our understanding of the role of chunking mechanisms in non-native speech production and comprehension under real-time constraints and [2] paving the way for the development of an automatic scoring system for assessing speaking and listening competencies. Such a system would provide individualized feedback based on reliable performance metrics that go beyond the coarse-grained categories typically provided by human ratings based on descriptors on 5- to 7-point scales (e.g., Common European Framework of Reference, C1 level: “Can understand a wide range of demanding, longer clauses, [...]” or “Can express ideas fluently and spontaneously without much obvious search for expression”). The key to the success of the CEASELESS project is a transdisciplinary and multi-methodological approach, along with strong theory-driven investigation. This innovative research draws on cutting-edge methods from experimental psychology, psycholinguistics, natural language processing (NLP), automatic speech recognition (ASR) and machine learning (ML).
In pursuit of these overarching aims, CEASELESS pursues five mutually informative research axes in parallel: [1] we conduct psychometrically reliable behavioral experiments to assess listening fluency and to determine to what extent inter-individual variation in performance on listening tasks is related to individual differences in chunking ability and working memory capacity; [2] we employ NLP techniques and ML to automatically assess speaking performance in two populations of non-native speakers (secondary school children and university students) based on fluency and language complexity metrics, optionally derived from automatic speech recognition output; [3] we precisely characterize the statistics of various types of spoken input in the target language to establish benchmarks against which to evaluate the performance of the two populations; [4] we explore methods for automatic speech recognition that scale well with age, voice and language development; and [5] we investigate the influence of the uncertainty introduced by ASR recognition errors, along with potential approaches to exploiting it, and contrast speaking assessment based on text, speech transcription and automatic transcription.
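To make the notion of ASR recognition errors in axis [5] concrete, the sketch below computes a word error rate (WER), a common way of quantifying how far an automatic transcript deviates from a human reference transcript. This is only an illustration: the project description does not state which error measure CEASELESS uses, and the function name `word_error_rate` and the example sentences are assumptions made here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two transcripts, divided by reference length.

    Illustrative helper (not taken from the CEASELESS project): counts the
    minimum number of word substitutions, deletions and insertions needed to
    turn the reference into the hypothesis.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical human transcript vs. hypothetical ASR output.
    human = "the cat sat on the mat"
    asr = "the cat sat on mat"
    print(f"WER: {word_error_rate(human, asr):.3f}")
```

Because insertions count as errors, the value can exceed 1.0 when the automatic transcript is much longer than the reference. Any fluency or complexity metric computed on ASR output inherits these errors, which is why comparing assessments across human and automatic transcripts is informative.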