A suite of spatiotemporal processing methods is presented for analyzing human trajectory data, such as those collected with a GPS device, for the purpose of modeling pedestrian space-time activities.
It is well recognized that human movement in the spatial and temporal dimensions has a direct influence on disease transmission1-3. An infectious disease typically spreads via contact between infected and susceptible individuals in their overlapping activity spaces. Therefore, daily mobility-activity information can be used as an indicator to measure exposure to risk factors of infection. However, a major difficulty, and thus the reason for the paucity of studies of infectious disease transmission at the micro scale, is the lack of detailed individual mobility data. Previously, in transportation and tourism research, detailed space-time activity data often relied on the time-space diary technique, which requires subjects to actively record their activities in time and space. This is highly demanding for participants, and the quality of the data depends heavily on their cooperation4.
Modern technologies such as GPS and mobile communications have made possible the automatic collection of trajectory data. The data collected, however, is not ideal for modeling human space-time activities, limited by the accuracies of existing devices. There is also no readily available tool for efficient processing of the data for human behavior study. We present here a suite of methods and an integrated ArcGIS desktop-based visual interface for the pre-processing and spatiotemporal analyses of trajectory data. We provide examples of how such processing may be used to model human space-time activities, especially with error-rich pedestrian trajectory data, that could be useful in public health studies such as infectious disease transmission modeling.
The procedure presented includes pre-processing, trajectory segmentation, activity space characterization, density estimation and visualization, and a few other exploratory analysis methods. Pre-processing is the cleaning of noisy raw trajectory data. We introduce an interactive visual pre-processing interface as well as an automatic module. Trajectory segmentation5 involves the identification of indoor and outdoor parts from pre-processed space-time tracks. Again, both interactive visual segmentation and automatic segmentation are supported. Segmented space-time tracks are then analyzed to derive characteristics of one’s activity space, such as activity radius. Density estimation and visualization are used to examine large amounts of trajectory data to model hot spots and interactions. We demonstrate both density surface mapping6 and density volume rendering7. We also include a few other exploratory data analysis (EDA) and visualization tools, such as Google Earth animation support and connection analysis. The suite of analytical and visual methods presented in this paper may be applied to any trajectory data for space-time activity studies.
1. Getting Data
2. Pre-processing
3. Trajectory Segmentation & Activity Space Characterization
4. Density Surface Mapping
5. Density Volume Estimation and Volume Rendering
6. Other Exploratory Data Analyses (EDA) and Visualizations
Trajectory data were collected by volunteer undergraduate students from Kean University (NJ, USA) in spring 2010. The purpose was to study activity patterns of students who caught influenza (diagnosed by a doctor or self-diagnosed) in comparison to those who did not. To illustrate the methods and procedure presented in this paper, we used the trajectories collected within the suburban campus area to generate representative results. Trajectories within the campus area are mostly pedestrian trajectories, with only a small portion resulting from driving between the various parking lots and outside of the campus.
The space-time cube representation of trajectories with reference to buildings on the university campus is shown in Figure 1. Figure 1A is the raw data collected by a student recording one day of his activity on campus using an AGPS device (a commercial child tracker). Long indoor stays have clearly resulted in noisy data (indicated by the spiky portion of the track). This is very common in pedestrian trajectory data. Figure 1B shows the pre-processed and segmented trajectory. Figure 1C shows the pre-processed and segmented trajectory with color-coded indoor and outdoor segments in the space-time cube.
Figures 2 to 4 illustrate density surface mapping of a set of trajectories. Figure 2 shows the raw tracking points used for the ‘Track point density’ mapping option (Figure 2A) and the resulting density map (Figure 2B). Instead of mapping densities of tracking points, Figure 3 maps the densities of traveled paths. Density mapping is particularly useful when analyzing large numbers of trajectories. Figure 4A displays a total of 470 trajectories. Figures 4B and 4C show the density surface in 2D and 3D representations, respectively, using re-sampled points from these trajectories.
In addition to the interactive display of the temporal dimension in a space-time cube, the time variable can be processed through temporal focusing to examine spatial patterns at different time periods. Figure 5 shows examples of such analysis using the sample data set, which contains trajectory data collected by students during the flu season. Their activities are centered on different locations at different times of the day, and together these produce the composite activity density map at the bottom.
An example of density volume rendering is illustrated in Figure 6. Figure 6A shows that it is hard to detect patterns if all the space-time tracks are visualized in a space-time cube because of visual clutter. Figure 6B shows the corresponding density volume rendering results. The four illustrations represent different settings of the transfer function of our density rendering program, thus highlighting density volumes at different frequency ranges.
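Track point density mapping with temporal focusing can be sketched as follows. This is a minimal illustration and not the ArcGIS-based implementation used for the figures: it assumes track points are already projected to planar coordinates in metres with a time value in hours, and the function name `point_density`, the square grid, and the half-open time window are all assumptions made for the example.

```python
from collections import Counter

def point_density(points, cell_size, t_start=None, t_end=None):
    """Return {(col, row): points per unit area} for a square grid.

    points    : iterable of (x, y, t) tuples in projected coordinates
    cell_size : grid cell edge length, in the same units as x and y
    t_start, t_end : optional temporal focus; keeps points with
                     t_start <= t < t_end (assumed convention)
    """
    counts = Counter()
    for x, y, t in points:
        if t_start is not None and t < t_start:
            continue
        if t_end is not None and t >= t_end:
            continue
        # Floor division assigns each point to exactly one cell,
        # including points with negative coordinates.
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    cell_area = cell_size * cell_size
    return {cell: n / cell_area for cell, n in counts.items()}
```

Calling the function once without a time window gives the composite surface, while calling it once per time period gives the temporally focused surfaces. A kernel-based estimator would produce a smoother surface than this simple cell count.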
Another way of finding hotspots is through connection analysis. Figure 7 illustrates the result of such analysis with our sample data set. Figure 7A shows the straight line connections among all buildings on campus. The highlighted buildings are those with the highest outbound traffic volume. Figure 7B shows the same connections, with the most trafficked connections highlighted.
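The counting behind a connection analysis can be sketched as follows. This is only an illustration under stated assumptions: each segmented trajectory is taken to have already been reduced to an ordered list of building labels (one per indoor segment), connections are counted as undirected pairs, and outbound traffic is counted as departures from a building. The function name `connection_counts` and the labels in the usage note are hypothetical.

```python
from collections import Counter

def connection_counts(visit_sequences):
    """Count building-to-building connections and outbound traffic.

    visit_sequences : iterable of lists of building labels, each list being
                      the ordered indoor stops of one segmented trajectory
    Returns (edges, outbound):
      edges    : Counter keyed by a sorted (building, building) pair
      outbound : Counter of departures per building
    """
    edges = Counter()
    outbound = Counter()
    for visits in visit_sequences:
        for origin, destination in zip(visits, visits[1:]):
            if origin == destination:
                # Two consecutive segments in the same building are not a trip.
                continue
            edges[tuple(sorted((origin, destination)))] += 1
            outbound[origin] += 1
    return edges, outbound
```

With hypothetical sequences such as `[["A", "B", "A", "C"], ["B", "A"]]`, `edges.most_common()` ranks the connections by traffic and `outbound.most_common()` ranks the buildings by departures, which correspond to the two kinds of highlighting described above.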
Figure 1. Pre-processing and segmentation of trajectory data. A: 2D view of a raw trajectory on the background of campus buildings; B: pre-processed trajectory; C: space-time cube representation of segmented trajectory.
Figure 2. Density surface mapping. A: raw track points of a trajectory data set; B: density surface derived from track points.
Figure 3. Density surface of trajectory paths.
Figure 4. Colored density surface mapping. A: a total of 470 trajectories; B: colored density surface in 2D; C: colored density surface in 3D.
Figure 5. Temporal focusing for density mapping: student activity densities on campus at different time periods.
Figure 6. Density volume rendering and visualization. A: visual clutter resulting from raw trajectories; B: spatiotemporal clusters highlighted by visualizing density volumes at different frequency ranges.
Figure 7. Connection analysis results. A: straight line connections among all buildings on a university campus derived from trajectory data, with the most trafficked buildings highlighted; B: the most trafficked connections among buildings on campus.
Figure 8. Portion of an activity diary recorded by a student.
Figure 9. Activity density patterns of two groups of students. A: activity density patterns of students with mild flu symptoms during a flu season; B: activity density patterns of students with more notable flu symptoms.
Figure 10. Connection analysis results based on trajectory data of students who had notable flu symptoms during a flu season. A: straight line connections among buildings with the most trafficked buildings highlighted; B: the most trafficked connections taken by students with flu.
We used the add-in mechanism of ArcGIS to develop the interface. All the interactive operations were implemented in C++. All the automatic processing and analysis functions were developed in Python.
AGPS data, or GPS data collected by pedestrians, present a unique challenge in preprocessing because the errors can be massive due to proximity to buildings and frequent indoor stops. Moreover, unlike the preprocessing usually applied to vehicle GPS trajectory data, the focus should not be data reduction, because tracking points are already scarce. The obvious error patterns in pedestrian trajectory data, however, offer a distinctive route to preprocessing. Instead of using standard preprocessing algorithms14, we developed a heuristic method (2.3) that mimics the manual visual error detection approach (mentioned in 2.2) and cleans up errors in the trajectory data. Specifically, it first calculates attributes (speed and direction change) for each track point in a trajectory. Track points with unrealistically high speeds and/or direction changes are removed. It then re-calculates attributes (duration and direction change) for each remaining track point and detects clusters of track points with spiky shapes (a series of track points with abrupt direction changes). Finally, the spatiotemporal centroid of each cluster is calculated, and the trajectory is reduced and adjusted to pass through the centroid.
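The two passes of this heuristic can be sketched as follows. The sketch is a simplified illustration, not the code of the Trajectory Analyzer Extension: it assumes projected coordinates in metres and timestamps in seconds, filters on speed only in the first pass, and uses hypothetical thresholds (`max_speed`, `max_turn`, `min_run`) and function names chosen for the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x metres, y metres, t seconds), assumed projected

def speed(a: Point, b: Point) -> float:
    """Average speed from a to b; infinite if time does not advance."""
    dt = b[2] - a[2]
    return math.hypot(b[0] - a[0], b[1] - a[1]) / dt if dt > 0 else float("inf")

def turn_angle(a: Point, b: Point, c: Point) -> float:
    """Absolute heading change at b in degrees (0 = straight on, 180 = reversal)."""
    h1 = math.atan2(b[1] - a[1], b[0] - a[0])
    h2 = math.atan2(c[1] - b[1], c[0] - b[0])
    d = abs(math.degrees(h2 - h1)) % 360.0
    return 360.0 - d if d > 180.0 else d

def drop_fast_points(track: List[Point], max_speed: float = 10.0) -> List[Point]:
    """Pass 1: drop points reached at an unrealistic speed from the last kept point."""
    kept = [track[0]]
    for p in track[1:]:
        if speed(kept[-1], p) <= max_speed:
            kept.append(p)
    return kept

def collapse_spikes(track: List[Point], max_turn: float = 120.0,
                    min_run: int = 3) -> List[Point]:
    """Pass 2: replace each run of >= min_run sharp turns by its (x, y, t) centroid."""
    n = len(track)
    sharp = [False] * n  # end points have no turn angle and are never flagged
    for i in range(1, n - 1):
        sharp[i] = turn_angle(track[i - 1], track[i], track[i + 1]) >= max_turn
    out: List[Point] = []
    i = 0
    while i < n:
        j = i
        while j < n and sharp[j]:
            j += 1
        if j - i >= min_run:
            run = track[i:j]
            out.append(tuple(sum(c) / len(run) for c in zip(*run)))
            i = j
        else:
            end = max(j, i + 1)  # keep isolated sharp turns and ordinary points
            out.extend(track[i:end])
            i = end
    return out
```

Applying `collapse_spikes(drop_fast_points(track))` to a raw track removes isolated jumps first, so that the turn angles used to find spiky clusters are computed on the remaining points, as in the description above.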
The automatic pre-processing and trajectory segmentation algorithms have been evaluated against traditional activity diary data. Ten students were recruited to each carry an AGPS device to collect trajectory data and, at the same time, to actively record their stops and movements. A portion of a typical activity diary is illustrated in Figure 8. A three-day experiment generated 30 trajectories. Pre-processed and segmented trajectories were compared to the diary data. The results indicated that 1) the processed trajectories captured the majority of indoor activities; 2) time recorded in the trajectory data is more accurate, as diary takers often write down a rough estimate of time; 3) the trajectory data captured all details of walkway paths, while only straight line connections could be obtained from diary data; and 4) some activities are missing from the diary data, as participants often skip records due to the burden. One limitation of our approach, however, is that the segmentation sometimes assigns an indoor segment to the wrong building, especially when two buildings are connected to each other, which is the case with some buildings in our experiment. Improvement on this aspect of the algorithm is needed.
Density surface mapping is an effective tool for exploring activity patterns, especially when large amounts of trajectory data are involved. Figure 4 shows that a large number of trajectories leads to apparent visual clutter if displayed in their original form, while density mapping reveals interesting patterns. A simple application of this was conducted using the data set collected on the Kean University campus during the 2010 flu season. Students who caught the flu and students who did not generated two sets of trajectories. Students were also interviewed regarding the severity of their symptoms. Figure 9 illustrates the activity density patterns of two groups of students, one showing only mild symptoms (Figure 9A) and another showing more notable ones (Figure 9B). The activity space of the students with more notable symptoms tends to cluster around a particular building. Further investigation could thus be conducted to determine the causes of such clustering. This experiment indicates that the method has the potential to reveal hidden patterns in trajectory data.
The above density surface maps, however, collapse the temporal dimension. Density volume visualization uses the notion of a space-time cube and represents both spatial and temporal dimensions. Figure 6 indicates that such visualization is effective in dealing with visual clutter. Once made interactive, it allows one to manipulate the rendering to highlight different frequency ranges in the data to detect patterns. One limitation of our current approach, however, is that the rendered volume does not appear to be completely smooth. We are in the process of improving the density estimation algorithm to deal with the issue. One consideration is that kernel density estimation may improve the visual effect, but the computation time would become much longer. Sequential kernel density estimation15 is another option that we plan to investigate.
In addition to detecting activity clusters (hot spots) in time and space, the method we introduced in 7.2 detects another kind of hot spot that is related to connections among places. Again with our data set collected by students during a flu season, connection analysis was conducted to identify strong connections among campus buildings based on all trajectory data (Figure 7) as well as on the subset representing only students who demonstrated notable flu symptoms (Figure 10). Comparing the two figures, we see that one connection (between a building called the University Center and another called the CAS building) appears to be a strong connection in general (Figure 7) but is missing from the set of strong connections identified for sick students (Figure 10). The two strong connections that remain in the latter are one between the University Center and the Science Building and another among the Science Building, Hennings Hall and Hutchinson Hall. Knowledge of these campus buildings indicates that the University Center is the most heavily trafficked stop on campus, with a cafeteria and recreation rooms inside. It could be a potential high-risk hub during flu season, when students interact with each other for long periods of time in a crowded space. The three buildings involved in the second connection are all attached to each other by indoor pathways. These buildings have classrooms where students may spend many hours indoors taking classes without having to go outside. They are also relatively old constructions with aged ventilation systems that could increase the risk of respiratory disease transmission. The CAS building, which appears in the connection in Figure 7 but not in Figure 10, is on the other hand a brand new building that stands by itself in a large open space. Good ventilation, and the fact that students must spend time outdoors when taking classes elsewhere, could both lead to lower risks.
These are, of course, speculations, but they suggest that such analysis, like the other methods presented in this paper, can be a useful exploratory tool for revealing hidden patterns. This package, however, by no means includes all possible methods useful for trajectory data analyses. We continue to develop and incorporate more analytical and visual functions into our system.
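The idea of a density volume with an adjustable transfer function can be sketched as follows. This is a simple voxel-count illustration and not the rendering program described here: it assumes planar coordinates and numeric time values, and the function names `density_volume` and `transfer_function`, the voxel sizes, and the linear opacity mapping are all assumptions made for the example.

```python
from collections import Counter

def density_volume(points, cell_size, time_step):
    """Count track points per space-time voxel.

    points    : iterable of (x, y, t) tuples
    cell_size : voxel edge length along x and y
    time_step : voxel height along the time axis
    Returns a Counter keyed by (col, row, time_slice).
    """
    volume = Counter()
    for x, y, t in points:
        voxel = (int(x // cell_size), int(y // cell_size), int(t // time_step))
        volume[voxel] += 1
    return volume

def transfer_function(volume, low, high):
    """Map voxel counts to opacity in (0, 1], keeping only counts in [low, high].

    Voxels outside the range are omitted, i.e. treated as fully transparent.
    Opacity is the count divided by the largest count in the whole volume.
    """
    if not volume:
        return {}
    peak = max(volume.values())
    return {voxel: n / peak for voxel, n in volume.items() if low <= n <= high}
```

Changing `low` and `high` plays the role of the different transfer function settings: a high range isolates the densest space-time clusters, while a low range shows sparsely visited voxels. Because this sketch uses raw counts, its output would be blocky, and a kernel-based estimate would trade longer computation for a smoother volume.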
The authors have nothing to disclose.
This work is funded by NIH grant 1R03AI090465.
Name of the reagent | Company | Catalogue number | Comments (optional) |
WorldTracker GPRS | Tracking The World | | |
A personal computer for running the analysis | | | |
ArcGIS software | ESRI | | |
Trajectory Analyzer Extension | | | |