Earlier studies suggested that the visual system processes information at the basic level (e.g., dog) faster than at the subordinate (e.g., Dalmatian) or superordinate (e.g., animal) levels. However, the advantage of the basic category over the superordinate category in object recognition has recently been challenged, and the hierarchical nature of visual categorization is now a matter of debate. To address this issue, we used a forced-choice saccadic task in which a target and a distractor image were displayed simultaneously on each trial and participants had to saccade as fast as possible toward the image containing an animal target, defined at different categorization levels. This protocol enables us to investigate the first 100-120 msec of visual object categorization, a previously unexplored temporal window. The first result is a surprising stability of the saccade latency (median RT ~155 msec) regardless of the animal target category and the dissimilarity of the target and distractor image sets. Accuracy was high (around 80% correct) for categorization tasks that can be solved at the superordinate level but dropped to almost chance levels for basic-level categorization. At the basic level, the highest accuracy (62%) was obtained when distractors were restricted to another, dissimilar basic category. Computational simulations based on the saliency map model showed that the results could not be predicted by pure bottom-up saliency differences between images. Our results support a model of visual recognition in which the visual system can rapidly access relatively coarse visual representations that provide information at the superordinate level of an object, but in which additional visual analysis is required to allow more detailed categorization at the basic level.
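The bottom-up saliency comparison mentioned above can be illustrated with a toy sketch. The snippet below is a minimal stand-in (intensity center-surround contrast only) for a full Itti-Koch-style saliency model, not the simulation actually used in the study; the `box_blur` implementation and the filter radii are illustrative assumptions.

```python
import numpy as np

def box_blur(img, k):
    """Mean filter of radius k (window 2k+1), computed with an
    integral image; edges are padded by replication."""
    n = 2 * k + 1
    pad = np.pad(img, k, mode="edge")
    c = np.zeros((pad.shape[0] + 1, pad.shape[1] + 1))
    c[1:, 1:] = pad.cumsum(0).cumsum(1)
    # Windowed sums via the integral-image identity, then normalize.
    s = c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]
    return s / n**2

def saliency(img, center=2, surround=8):
    """Crude bottom-up saliency: center-surround difference of the
    intensity channel (one channel of an Itti-Koch-like model)."""
    return np.abs(box_blur(img, center) - box_blur(img, surround))
```

With a map like this, "pure bottom-up saliency differences" between a target and a distractor image would amount to comparing, e.g., `saliency(target).mean()` against `saliency(distractor).mean()`.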
Basic-level categorization has long been thought to be the entry level for object representations. However, this view is now challenged. In particular, Macé et al. [M.J.-M. Macé et al. (2009) PLoS One, 4, e5927] showed that basic-level categorization (such as 'bird') requires a longer processing time than superordinate-level categorization (such as 'animal'). It has been argued that this result depends on the brief stimulus presentation times used in their study, which would degrade the visual information available. Here, we used a go/no-go paradigm to test whether the superordinate-level advantage could be observed with longer stimulus durations, and also investigated the impact of manipulating target and distractor set heterogeneity. Our results clearly show that presentation time had no effect on categorization performance. Both target and distractor diversity influenced performance, but basic-level categories were never accessed faster or with higher accuracy than superordinate-level categories. These results argue in favor of coarse-to-fine visual processing to access perceptual representations.
Rapid visual categorization of objects in briefly flashed natural scenes is influenced by the surrounding context. The neural correlates underlying reduced categorization performance in response to incongruent object/context associations remain unclear and were investigated in the present study using fMRI. Participants were instructed to categorize objects in briefly presented scenes (exposure duration = 100 ms). Half of the scenes consisted of objects pasted in an expected (congruent) context, whereas for the other half, objects were embedded in incongruent contexts. Object categorization was more accurate and faster in congruent relative to incongruent scenes. Moreover, we found that the two types of scenes elicited different patterns of cerebral activation. In particular, the processing of incongruent scenes induced increased activations in the parahippocampal cortex, as well as in the right frontal cortex. This higher activity may indicate additional neural processing of the novel (non-experienced) contextual associations that were inherent to the incongruent scenes. Moreover, our results suggest that the locus of the object categorization impairment due to contextual incongruence is in the right anterior parahippocampal cortex: in this region, activity correlated with the reaction-time increase observed with incongruent scenes. Representations of associations between objects and their usual context of appearance might be encoded in the right anterior parahippocampal cortex.
Visual categorization appears both effortless and virtually instantaneous. The study by Thorpe et al. (1996) was the first to estimate the processing time necessary to perform fast visual categorization of animals in briefly flashed (20 ms) natural photographs. They observed a large differential EEG activity between target and distractor correct trials that developed from 150 ms after stimulus onset, a value that was later shown to be even shorter in monkeys. With such strong processing time constraints, it was difficult to escape the conclusion that rapid visual categorization relied on massively parallel, essentially feed-forward processing of visual information. Since 1996, we have conducted a large number of studies to determine the characteristics and limits of fast visual categorization. The present chapter reviews some of the main results obtained. We will argue that rapid object categorizations in natural scenes can be done without focused attention and are most likely based on coarse and unconscious visual representations activated with the first available (magnocellular) visual information. Fast visual processing proved efficient for the categorization of large superordinate object or scene categories, but shows its limits when more detailed basic representations are required. The representations for basic objects (dogs, cars) or scenes (mountain or sea landscapes) need additional processing time to be activated. This finding is at odds with the widely accepted idea that such basic representations are at the entry level of the system. Interestingly, focused attention is still not required to perform these time-consuming basic categorizations. Finally, we will show that object and context processing can interact very early in an ascending wave of visual information processing. We will discuss how such data could result from our experience with a highly structured and predictable surrounding world that shaped neuronal visual selectivity.
Conceptual abilities in animals have been shown at several levels of abstraction, but it is unclear whether the analogy with humans results from convergent evolution or from shared brain mechanisms inherited from a common origin. Macaque monkeys can access "non-similarity-based concepts," such as when sorting pictures containing a superordinate target category (animal, tree, etc.) among other scenes. However, such performances could result from low-level visual processing based on learned regularities of the photographs, such as for scene categorization by artificial systems. By using pictures of man-made objects or animals embedded in man-made or natural contexts, the present study clearly establishes that macaque monkeys based their categorical decision on the presence of the animal targets regardless of the scene backgrounds. However, as is found with humans, monkeys performed better with categorically congruent object/context associations, especially when small object sizes favored background information. The accuracy improvements and the response-speed gains attributable to superordinate category congruency in monkeys were strikingly similar to those of human subjects tested with the same task and stimuli. These results suggest analogous processing of visual information during the activation of abstract representations in both humans and monkeys; they imply a large overlap between superordinate visual representations in humans and macaques as well as the implicit use of experienced associations between object and context.
We tested rapid categorization in a patient who was impaired in face and object recognition. Photographs of natural scenes were displayed for 100 ms. Participants had to press a key when they saw an animal among various objects as distractors, or a human face among animal faces as distractors. Though the patient was impaired at figure/ground segregation and recognized very few objects and faces, she categorized animals and faces with a performance ranging between 70% and 86% correct. Displaying pictures in isolation did not improve performance. The results suggest that rapid categorization can be accomplished on the basis of coarse information, without overt recognition.
An optimal correspondence of temporal information between the physical world and our perceptual world is important for survival. In the current study, we demonstrate a novel temporal illusion in which the cause of a perceptual event is perceived after the event itself. We used a paradigm referred to as motion-induced blindness (MIB), in which a static visual target presented on a constantly rotating background disappears and reappears from awareness periodically, with the dynamic characteristics of bistable perception. A sudden stimulus onset (e.g., a flash) presented during a period of perceptual suppression (i.e., during MIB) is known to trigger the almost instantaneous reappearance of the suppressed target. Surprisingly, however, we report here that although the sudden flash is the cause of the static target's reappearance (the corresponding effect), it is systematically perceived as occurring after this reappearance. Further investigation revealed that this illusory temporal reversal is caused by an approximately 100 ms advantage for the unconscious representation of the perceptually suppressed target to access consciousness, as compared to the newly presented flash. This new temporal illusion therefore reveals the normally hidden delays in bringing new visual events to awareness.
Since the pioneering studies by Rosch and colleagues in the 1970s, it has been commonly agreed that basic-level perceptual categories (dog, chair...) are accessed faster than superordinate ones (animal, furniture...). Nevertheless, the speed at which objects presented in natural images can be processed in a rapid go/no-go superordinate visual categorization task has challenged this "basic-level advantage".
The ability of monkeys to categorize objects in visual stimuli such as natural scenes might rely on sets of low-level visual cues without any underlying conceptual abilities. Using a go/no-go rapid animal/non-animal categorization task with briefly flashed achromatic natural scenes, we show that both human and monkey performance is very robust to large variations of stimulus luminance and contrast. When mean luminance was increased or decreased by 25-50%, accuracy and speed impairments were small. The largest impairment was found at the highest luminance value, with monkeys being mainly impaired in accuracy (drop of 6% correct vs. <1.5% in humans), whereas humans were mainly impaired in reaction time (20 ms increase in median reaction time vs. 4 ms in monkeys). Contrast reductions induced a large deterioration of image definition, but performance was again remarkably robust. Subjects scored well above chance level even when the contrast was only 12% of the original photographs (approximately 81% correct in monkeys; approximately 79% correct in humans). Accuracy decreased with contrast reduction but only reached chance level, in both species, for the most extreme condition, when only 3% of the original contrast remained. A progressive reaction time increase was observed that reached 72 ms in monkeys and 66 ms in humans. These results demonstrate the remarkable robustness of the primate visual system in processing objects in natural scenes with large random variations in luminance and contrast. They illustrate the similarity with which performance is impaired in monkeys and humans with such stimulus manipulations. They finally show that, in an animal categorization task, the performance of both monkeys and humans is largely independent of cues relying on global luminance or the fine definition of stimuli.
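The luminance and contrast manipulations described above can be sketched as simple pixel operations. This is an illustrative reconstruction, not necessarily the exact procedure applied to the photographs; the function names and the [0, 1] intensity convention are assumptions.

```python
import numpy as np

def shift_luminance(img, factor):
    """Raise or lower mean luminance by `factor` (e.g. 1.25 = +25%)
    while preserving deviations from the mean, clipped to [0, 1]."""
    return np.clip(img + img.mean() * (factor - 1.0), 0.0, 1.0)

def reduce_contrast(img, fraction):
    """Keep only `fraction` of the original contrast around the mean
    (e.g. 0.12 for the 12% condition, 0.03 for the 3% condition)."""
    mean = img.mean()
    return mean + fraction * (img - mean)
```

Note that `reduce_contrast` leaves the mean luminance untouched and scales only the deviations, so the two manipulations probe independent cues.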
This study aimed to determine the extent to which rapid visual context categorization relies on global scene statistics, such as diagnostic amplitude spectrum information. We measured performance in a Natural vs. Man-made context categorization task using a set of achromatic photographs of natural scenes equalized in average luminance, global contrast, and spectral energy. Results suggest that the visual system might use amplitude spectrum characteristics of the scenes to speed up context categorization processes. In a second experiment, we measured performance impairments with a parametric degradation of phase information applied to power-spectrum-averaged scenes. Results showed that accuracy was virtually unaffected up to 50% of phase blurring, but then rapidly fell to chance level following a sharp sigmoid curve. Response time analysis showed that subjects tended to make their fastest responses based on the presence of diagnostic man-made information; if no man-made characteristics allowed a decision threshold to be reached rapidly, whether because a natural scene was displayed or because of a high level of noise, the alternative "natural" response became increasingly favored. This two-phase strategy could maximize categorization performance if the diagnostic features of man-made environments tolerate higher levels of noise than natural features, as proposed recently.
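The amplitude-spectrum equalization and the parametric phase degradation can be sketched in the Fourier domain. This is a generic reconstruction of such stimulus manipulations (the study's exact phase-blurring parameterization may differ), with illustrative function names:

```python
import numpy as np

def equalize_amplitude(images):
    """Give every image the set-average amplitude spectrum while
    keeping its own phase (power-spectrum-averaged stimuli)."""
    specs = [np.fft.fft2(im) for im in images]
    mean_amp = np.mean([np.abs(s) for s in specs], axis=0)
    return [np.real(np.fft.ifft2(mean_amp * np.exp(1j * np.angle(s))))
            for s in specs]

def phase_blur(img, noise_level, rng):
    """Mix the image's Fourier phase with uniform random phase;
    noise_level = 0 leaves the image intact, 1 fully scrambles it."""
    f = np.fft.fft2(img)
    noise = rng.uniform(-np.pi, np.pi, img.shape)
    blurred = np.abs(f) * np.exp(1j * (np.angle(f) + noise_level * noise))
    return np.real(np.fft.ifft2(blurred))
```

Because `equalize_amplitude` removes all amplitude differences between scenes, any remaining categorization performance on such stimuli must rely on phase information, which `phase_blur` then degrades parametrically.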
The processes underlying object recognition are fundamental for the understanding of visual perception. Humans can recognize many objects rapidly even in complex scenes, a task that still presents major challenges for computer vision systems. A common experimental demonstration of this ability is the rapid animal detection protocol, where human participants' earliest responses to report the presence/absence of animals in natural scenes are observed at 250-270 ms latencies. One of the hypotheses to account for such speed is that people would not actually recognize an animal per se, but rather base their decision on global scene statistics. These global statistics (also referred to as spatial envelope or gist) have been shown to be computationally easy to process and could thus be used as a proxy for coarse object recognition. Here, using a saccadic choice task, which allows us to investigate a previously inaccessible temporal window of visual processing, we showed that animal - but not vehicle - detection clearly precedes scene categorization. This asynchrony is further validated by a late contextual modulation of animal detection, starting simultaneously with the availability of the scene category. Interestingly, the advantage for animal over scene categorization is in opposition to the results of simulations using standard computational models. Taken together, these results challenge the idea that rapid animal detection might be based on early access to global scene statistics, and rather suggest a process based on the extraction of specific local complex features that might be hardwired in the visual system.
Complex visual scenes can be categorized at the superordinate level (e.g., animal/non-animal or vehicle/non-vehicle) without focused attention. However, rapid visual categorization at the basic level (e.g., dog/non-dog or car/non-car) requires additional processing time. Such finer categorization might, thus, require attentional resources. This hypothesis was tested in the current study with a dual-task paradigm in which subjects performed a basic-level categorization task in peripheral vision either alone (single-task condition) or concurrently with an attentionally demanding letter discrimination task (dual-task condition). Our results indicate that basic-level categorization of either biological (dog/non-dog animal) or man-made (car/non-car vehicle) stimuli requires more information uptake but can, nevertheless, be performed when attention is not fully available, presumably because it is supported by hardwired, specialized neuronal networks.