Identifying a coffee mug on an otherwise empty desk is easy, even if you aren't looking directly at it. The same mug on a cluttered desk is much harder to identify. Why should this be so? The same visual information about the mug is reaching your eyes in both cases. It's the cluttered desk that makes the difference: when several objects are packed, or crowded, densely enough in the visual periphery, they become harder to recognize. We study the phenomenon of crowding to explore the processes underlying peripheral vision and object recognition.
The world is full of groups. Apples sit in neatly stacked pyramids at the market, and a crowd of people fills an auditorium for a concert. You can notice the "gist" of the group well before you have inspected each member: a shopper sees the apple pile as fresh or stale at a glance; a performer may see the audience's boredom or amusement. How does this happen? It turns out that our visual system is specialized to perceive groups, quickly pooling information about individuals to obtain an ensemble, or summary, representation. This ensemble perception allows us to process a lot of information at once in order to navigate a complex world efficiently.
We frequently encounter crowds of faces. Here we report that, when presented with a group of faces, observers quickly and automatically extract information about the mean emotion in the group. This occurs even when observers cannot report anything about the individual identities that comprise the group. The results reveal an efficient and powerful mechanism that allows the visual system to extract summary statistics from a broad range of visual stimuli, including faces.
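The pooling described above amounts to computing a summary statistic over the group. As a rough illustration only, the sketch below averages per-face affect ratings into a single ensemble value; the ratings, the two-dimensional (valence, arousal) encoding, and the -1 to 1 scale are invented for this example, not taken from the study.

```python
# Hypothetical sketch of ensemble pooling: summarize a crowd of faces by
# averaging per-face affect ratings. Each face is an invented
# (valence, arousal) pair, each dimension on a -1..1 scale.

def ensemble_affect(faces):
    """Pool individual (valence, arousal) ratings into one mean summary."""
    n = len(faces)
    mean_valence = sum(v for v, _ in faces) / n
    mean_arousal = sum(a for _, a in faces) / n
    return mean_valence, mean_arousal

# A small "crowd": four faces with made-up affect ratings.
crowd = [(0.8, 0.4), (0.2, 0.6), (-0.1, 0.3), (0.5, 0.7)]
valence, arousal = ensemble_affect(crowd)
print(valence, arousal)  # -> 0.35 0.5
```

Note that the summary is available without retaining the individual entries, which parallels the finding that observers report the mean emotion even when they cannot report the individual identities.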
From Chen & Whitney (2019). Emotion recognition is an essential human ability critical for social functioning. It is widely assumed that identifying facial expressions is the key to this, and models of emotion recognition have mainly focused on facial and bodily features in static, unnatural conditions. We developed a method called affective tracking to reveal and quantify the enormous contribution of visual context to affect (valence and arousal) perception. When characters' faces and bodies were masked in silent videos, viewers inferred the affect of the invisible characters successfully and with high agreement, based solely on visual context. We further show that context is not only sufficient but also necessary to accurately perceive human affect over time, as it provides a substantial and unique contribution beyond the information available from face and body. Our method (which we have made publicly available) reveals that emotion recognition is, at its heart, as much a matter of context as it is of faces.