In spoken word recognition, the future predicts the past.

03 November 2016
Laura Gwilliams
PhD student, Cognition and Perception Doctoral Program, New York University

Speech is an inherently noisy and ambiguous signal. To derive meaning fluently, a listener must integrate top-down contextual information to guide interpretation of the sensory input. While many studies have demonstrated the influence of prior context on bottom-up processing, the neural mechanisms supporting the integration of subsequent information remain unknown. To address this issue, I will describe two magnetoencephalography experiments that investigate how later input determines the perception of previously heard speech sounds. Participants (n=25) listened to word pairs that, apart from the initial consonant, share an identical speech stream up to the point of disambiguation (POD) (e.g. “parak-eet”, “barric-ade”). Pre-POD onsets were morphed to create voicing or place-of-articulation continua (e.g. parak-eet <-> barak-eet). The results show that subphonemic detail of the onset sound, in terms of both phonetic features and phoneme ambiguity, is preserved over long timescales (at least 700 ms, the longest duration we tested) and re-evoked at POD. Phonological commitment, that is, the identification of discrete phoneme categories, resolves on the shorter timescale of ~450 ms. Together, the findings suggest that subphonemic information is maintained until it can be optimally integrated with top-down information, a computation that is distinct from phonological commitment and recruited in parallel to it.