But is this how event detection has been realized in the aforementioned experiments that studied the interpretation of volumetric medical images? The answer is mixed. Two studies used only a single image from volumetric cases, so fixations were calculated per individual slice.48,54 Conversely, fixations that span multiple slices were presumably not a problem in experiments where radiologists were only allowed to scroll in one direction, because the time they spent on each slice was relatively long to avoid missing something that they could not return to.50,51 Five of the studies in which scrolling was allowed distinguished between fixations and saccades49,52,56–58 but did not explicitly state how fixations were scored. For the majority of them, the relatively short fixation durations that are reported suggest that fixations were calculated per individual slice.49,56–58 Interestingly, six of the studies omitted the calculation of fixations and saccades altogether45–47,60,61 and used raw data instead. While there is no definite right or wrong in the detection of events in eye tracking data, there are problems associated with calculating fixations per slice and with using raw data. As mentioned above, raw data do not account for saccadic suppression, so the analysis includes samples that do not represent the intake of visual information. Furthermore, fixation duration is often used as an indicator of physiological and mental processes, such as fatigue and mental workload. For this purpose, however, the physiological duration of the fixation is needed, and this duration is only valid when it is calculated across slices. So what might be the reasons for implementing event detection as it has been done? An important reason for not calculating fixations across slices may be that standard software usually does not support it (see, e.g., OGAMA, SMI BeGaze).
Fixations are usually either mapped to one of the images or, if they are long enough to exceed the minimum duration, broken into several fixations that fall on consecutive images. Hence, custom-made software is needed that calculates fixations independently of the imaging material and subsequently maps the proportions of the fixations to the respective slices. This is laborious to implement. Additionally, fixation detection algorithms cannot account for all phenomena that are associated with the interpretation of volumetric medical images. When structures move across the screen, as is the case in fly-through colonography46,47,53,61 or in stack-mode viewing of slices of large organs, as in chest CT,45,60 the eyes perform smooth pursuit movements. Smooth pursuit eye movements are physical movements of the eye, but they are functionally similar to fixations in that they serve to keep visual content stable on the fovea and no suppression occurs. However, unlike fixations, smooth pursuit eye movements do not have a single center, because they represent a path. So, while smooth pursuit movements can be detected by velocity-based algorithms that use multiple velocity thresholds, they cannot be mapped to a single location. Hence, if the imaging material elicits smooth pursuit movements, a possible solution is to use detection algorithms that can classify them. For the analysis of temporal characteristics, the detected events can be used. When mapping the eye movements to the specific content, however, it may be warranted to first exclude all saccades. The raw data of fixations and smooth pursuit movements can subsequently be mapped to the specific image locations to capture visual attention on all structures that are displayed and not just the center of the smooth pursuit movements.
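To make the cross-slice approach concrete, the following minimal Python sketch detects fixations without regard to the displayed slice and then apportions each fixation's duration to the slices shown during it. It is an illustration under stated assumptions, not the procedure of any of the reviewed studies or of any existing software package: the sample layout `(t_ms, x, y, slice_index)`, the function names, the simple single-threshold velocity criterion, and the threshold values are all hypothetical, and smooth pursuit is not handled.

```python
from collections import defaultdict
from math import hypot


def detect_fixations(samples, max_speed=0.05, min_duration_ms=100):
    """Group consecutive slow samples into fixations, ignoring the slice.

    samples: list of (t_ms, x, y, slice_index) tuples sorted by time.
    max_speed: hypothetical velocity threshold in pixels per millisecond.
    Returns a list of fixations, each a list of consecutive samples.
    """
    fixations, current = [], []

    def close_current():
        # Keep the candidate only if it reaches the minimum duration.
        if current and current[-1][0] - current[0][0] >= min_duration_ms:
            fixations.append(list(current))
        current.clear()

    for prev, cur in zip(samples, samples[1:]):
        dt = cur[0] - prev[0]
        if dt <= 0:
            continue  # skip duplicated or unordered timestamps
        speed = hypot(cur[1] - prev[1], cur[2] - prev[2]) / dt
        if speed < max_speed:
            if not current:
                current.append(prev)
            current.append(cur)
        else:
            close_current()  # a fast movement ends the fixation
    close_current()
    return fixations


def slice_proportions(fixation):
    """Return the share of a fixation's duration spent on each slice."""
    time_on_slice = defaultdict(float)
    for prev, cur in zip(fixation, fixation[1:]):
        # Each inter-sample interval is credited to the slice shown at its start.
        time_on_slice[prev[3]] += cur[0] - prev[0]
    total = fixation[-1][0] - fixation[0][0]
    return {s: t / total for s, t in time_on_slice.items()}


# Hypothetical recording: the reader scrolls from slice 5 to slice 6
# in the middle of one fixation, then makes a saccade.
samples = [
    (0, 100, 100, 5), (50, 101, 100, 5), (100, 100, 101, 6),
    (150, 101, 101, 6), (200, 100, 100, 6), (250, 400, 400, 6),
    (300, 401, 400, 7), (350, 400, 400, 7),
]
fixations = detect_fixations(samples)
first = fixations[0]
print(first[-1][0] - first[0][0])  # duration of the first fixation in ms
print(slice_proportions(first))    # share of that duration per slice
```

In this toy recording the first fixation lasts 200 ms and is split evenly between slices 5 and 6. A per-slice calculation would instead report two fixations of 100 ms each, which illustrates why per-slice scoring underestimates the physiological fixation duration.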