Empirically examining parsimony and redundancy in usage-based models

Edited Volume/Special issue (in prep.)

Editors:

Contact: wiechmann@anglistik.rwth-aachen.de and kerz@anglistik.rwth-aachen.de

This year’s LSA Annual Conference featured a workshop on "Empirically examining parsimony and redundancy in usage-based models", organized by Neal Snider (Nuance Communications, Inc.), Daniel Wiechmann (RWTH Aachen University), Elma Kerz (RWTH Aachen University), and T. Florian Jaeger (University of Rochester). The workshop brought together linguists, psycholinguists, and computational linguists to discuss which methodologies can best shed light on questions pertaining to the representational nature of constructions and the mechanisms involved in their on-line processing. The workshop abstract is given below.

Based on this workshop, we are currently preparing an edited volume that presents the results of the workshop and also includes additional work in this domain. The objective of the volume is to present the state of the art in research on parsimony and redundancy in usage-based models, drawing on a variety of methodologies and interdisciplinary approaches. To this end, we welcome papers that empirically examine these issues at any level of linguistic description.

Workshop abstract:

Recent years have seen a growing interest in usage-based (UB) theories of language, which assume that language use plays a causal role in the development of linguistic systems over historical time. A central assumption of the UB framework is that the shapes of grammars are closely connected to principles of human cognitive processing (Bybee 2006, Givon 1991, Hawkins 2004). UB accounts strongly gravitate towards sign- or construction-based theories of language, viz. theories that are committed to the belief that linguistic knowledge is best conceived of as an assembly of symbolic structures (e.g. Goldberg 2006, Langacker 2008, Sag et al. 2003). These constructionist accounts share (1) the postulation of a single representational format for all linguistic knowledge and (2) a strong commitment to the psychological plausibility of mechanisms for the learning, storage, and retrieval of linguistic units. They do, however, exhibit a considerable degree of variation with respect to their architectural and mechanistic details (cf. Croft & Cruse 2004).

A key issue is the balance between storage parsimony and processing parsimony: maximizing storage parsimony is taken to imply greater computational demand, and vice versa. The space of logical possibilities ranges from a complete inheritance model (minimal storage redundancy) to a full-entry model (maximal storage redundancy). The empirical evidence bearing on these theoretical options is not yet conclusive. On the one hand, the representations used in language processing involve extremely fine-grained lexical-structural co-occurrences: for example, frequent four-word phrases are processed faster than infrequent ones (Bannard and Matthews 2008, Arnon and Snider 2010). On the other hand, syntactic exemplar models (Bod 2006) have been argued to overfit and undergeneralize compared to models that do not store all structures in the training data (cf. Post and Gildea 2009, although they found that Tree Substitution Grammar representations induced in a Bayesian framework still split the parsimony continuum towards greater redundancy). Experimental work has also argued that models of categorization that directly map phonetic dimensions to phonological categories (and therefore more directly reflect the statistics of the training data) do not predict human behavior as well as models that assume independent, intermediate representations (Toscano and McMurray 2010). Additionally, recent work has indicated that the early evidence for full-entry models from item-based learning in acquisition (e.g. Pine & Lieven 1997) is confounded, reopening this line of research as well (Yang, unpublished manuscript).

References and representative work

Accepted papers: