2011 LSA Annual Meeting Organized Session

Empirically examining parsimony and redundancy in usage-based models

Saturday, 8 January 2011, 2:00-3:30 PM


Contact: nsnider@bcs.rochester.edu

URL: http://www.hlp.rochester.edu/lsa2011


Recent years have seen growing interest in usage-based (UB) theories of language, which assume that language use plays a causal role in the development of linguistic systems over historical time. A central assumption of the UB framework is that the shapes of grammars are closely connected to principles of human cognitive processing (Bybee 2006, Givon 1991, Hawkins 2004). UB accounts strongly gravitate towards sign- or construction-based theories of language, viz. theories committed to the view that linguistic knowledge is best conceived of as an assembly of symbolic structures (e.g. Goldberg 2006, Langacker 2008, Sag et al. 2003). These constructionist accounts share (1) the postulation of a single representational format for all linguistic knowledge and (2) a strong commitment to the psychological plausibility of the mechanisms posited for the learning, storage, and retrieval of linguistic units. They do, however, exhibit considerable variation in their architectural and mechanistic details (cf. Croft & Cruse 2004).

A key issue is the balance between storage parsimony and processing parsimony: maximizing storage parsimony is taken to imply greater computational demand, and vice versa. The space of logical possibilities ranges from a complete-inheritance model (minimal storage redundancy) to a full-entry model (maximal storage redundancy). The empirical evidence on this question is not yet conclusive. On the one hand, the representations involved in language processing encode extremely fine-grained lexical-structural co-occurrences; for example, frequent four-word phrases are processed faster than infrequent ones (Bannard and Matthews 2008, Arnon and Snider 2010). On the other hand, syntactic exemplar models (Bod 2006) have been argued to overfit and undergeneralize compared to models that do not store all structures in the training data (cf. Post and Gildea 2009, who nonetheless found that Tree Substitution Grammar representations induced in a Bayesian framework still fall towards the redundant end of the parsimony continuum). Likewise, experimental work has argued that models of categorization that directly map phonetic dimensions to phonological categories (and therefore more directly reflect the statistics of the training data) do not predict human behavior as well as models that assume independent, intermediate representations (Toscano and McMurray 2010). Finally, recent work has provided evidence that early support for full-entry models from item-based learning in acquisition (e.g. Pine & Lieven 1997) is confounded, reopening this line of research as well (Yang, unpublished manuscript).

This workshop will bring together linguists, psycholinguists, and computational linguists working within a UB framework to discuss which methodologies can best shed light on questions about the representational nature of constructions and the mechanisms involved in their on-line processing.

Schedule (all abstracts in order below):

Review and Position Talks:

Short data talks: