trackdf can handle multiple types of tracking data (in particular those generated by GPS units and video-tracking software) and multiple data frame classes (base::data.frame, tibble::tibble, and data.table::data.table).
This is a design choice meant to accommodate the data processing pipelines of as many users as possible. It lets you use your favorite data manipulation paradigm (base R, dplyr/tidyverse, or data.table) while still standardizing the data format across studies and applications.
A consequence of that versatility, however, is that building a “track table” (the name we give to the structure that will hold your tracking data) requires a little extra work from you. This vignette covers building a track table from raw data generated by, for instance, automated video-tracking software and GPS collars.
2.1 - Anatomy of a track table
At its core, a track table is just a wrapper around a data frame structure, as defined by one of the three main data frame classes in R: base::data.frame, tibble::tibble, and data.table::data.table.
Which data frame class is used underneath a track table is entirely up to you and depends on which framework you prefer. trackdf will remember that choice and do its best to maintain it throughout your data analysis pipeline.
A track table is a specialized version of a data frame structure aimed specifically at storing tracking data, that is, the positions over time of one or more individuals. To do that, trackdf imposes a few constraints on the construction of a track table compared to a traditional data frame. First, a track table must have at least the following 4 named columns:
- id: which contains the identity of the individual being tracked as character strings;
- t: which contains the time of each observation as date-time POSIXct objects;
- x and y: which contain the positions of the observations as numeric values along each of the axes of a Euclidean space (e.g., GPS coordinates or the pixel coordinates output by video-tracking software);
- z: an optional column similar to x and y that can be used in the case of 3-dimensional trajectories.
You can then add as many other columns as you want to store other data relevant to your work, but these 4 columns are required in a track table object (the z column remains optional).
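As a rough illustration of the column layout described above, the following base R sketch builds an ordinary data frame with the four required columns. It is not a track table (it lacks the class and attributes that trackdf adds), and the values are invented for the example.

```r
# Plain data frame mimicking the column layout of a track table.
# The values are invented; this is NOT a track table object.
obs <- data.frame(
  id = as.character(c(1, 2, 1, 2)),                            # identities as character strings
  t  = as.POSIXct("2019-03-24 12:55:23", tz = "UTC") + c(0, 0, 1, 1),  # POSIXct date-times
  x  = c(629.4, 1056.2, 629.5, 1056.1),                        # numeric positions along x
  y  = c(882.5, 656.5, 882.4, 656.6),                          # numeric positions along y
  stringsAsFactors = FALSE
)

# Inspect the column types.
str(obs)
```

Checking the column types this way before building a track table can help catch identities stored as numbers or times stored as plain text.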
In addition to these columns, a track table contains two additional attributes that are necessary for certain functions of the package:
- proj: which contains information about the coordinate reference system in which the coordinates are projected. This is mostly useful for geographic data as captured by GPS units, for instance. trackdf can use that information to automatically reproject the data into other coordinate reference systems, for instance for working with GIS data. For video-tracking data and other tracking systems that do not output geographic data, this can be set to NA.
- type: which contains information about the class of data frame stored in the track table object. This is mostly required for maintaining the data frame class when the track table object is manipulated using dplyr’s functions. It’s mostly irrelevant from a user’s point of view.
Sounds complicated? Don’t worry, trackdf provides a function to build track tables with just a little bit of input from you. See the rest of the vignette below.
2.2 - Building a track table from video-tracking data
Most video-tracking software generates output with information about the identity of each tracked individual, their position in some form of Euclidean space (using pixel coordinates or coordinates relative to the dimensions of the experimental setup), and the time of each observation (e.g., the frame number in a video). The output can also contain other information relevant to your work, and we will see here how to import it into a track table as well.
First, let’s load some data that was generated using the trackR video-tracking software:
raw <- read.csv(system.file("extdata/video/01.csv", package = "trackdf"))
print(raw, max = 10 * ncol(raw))
## id x y size frame track ignore track_fixed
## 1 1 629.3839 882.4783 1154 1 1 FALSE 1
## 2 2 1056.1692 656.5207 1064 1 2 FALSE 2
## 3 3 508.0092 375.2451 1624 1 3 FALSE 3
## 4 4 1277.6466 373.7491 1443 1 4 FALSE 4
## 5 5 1379.2844 343.0853 1431 1 5 FALSE 5
## 6 6 1137.1378 174.5110 1321 1 6 FALSE 6
## 7 7 737.1931 115.9394 1419 1 7 FALSE 7
## 8 8 921.8634 103.3508 1237 1 8 FALSE 8
## 9 1 629.4024 882.4129 1148 2 1 FALSE 1
## 10 2 1056.1704 656.4691 1068 2 2 FALSE 2
## [ reached 'max' / getOption("max.print") -- omitted 21972 rows ]
This data frame contains 8 columns. The positions are stored in the x and y columns as pixel coordinates. Time is stored in the frame column as the frame number in the video from which the data was collected. The identity of each tracked individual is stored in track_fixed (the track column contains the identities before manual inspection and correction; id can be ignored for the purpose of this tutorial).
From this raw data, you can create a track table using the track function as follows:
library(trackdf)
##
## Attaching package: 'trackdf'
## The following object is masked from 'package:stats':
##
## filter
tt <- track(x = raw$x, y = raw$y, t = raw$frame, id = raw$track_fixed)
## No timezone provided. Defaulting to UTC.
## No origin provided. Defaulting to Sys.time().
## No period provided. Defaulting to 1 second.
print(tt, max = 10 * ncol(tt))
## Track table [21982 observations]
## Number of tracks: 81
## Dimensions: 2D
## Geographic: FALSE
## Table class: data frame ('data.frame')
## id t x y
## 1 1 2023-03-26 16:15:24 629.3839 882.4783
## 2 2 2023-03-26 16:15:24 1056.1692 656.5207
## 3 3 2023-03-26 16:15:24 508.0092 375.2451
## 4 4 2023-03-26 16:15:24 1277.6466 373.7491
## 5 5 2023-03-26 16:15:24 1379.2844 343.0853
## 6 6 2023-03-26 16:15:24 1137.1378 174.5110
## 7 7 2023-03-26 16:15:24 737.1931 115.9394
## 8 8 2023-03-26 16:15:24 921.8634 103.3508
## 9 1 2023-03-26 16:15:25 629.4024 882.4129
## 10 2 2023-03-26 16:15:25 1056.1704 656.4691
## [ reached 'max' / getOption("max.print") -- omitted 21972 rows ]
track outputs a few messages, all related to the time component that we provided. Indeed, we provided frame numbers, which track doesn’t know how to convert to date-time POSIXct objects. It therefore defaulted to using the current time as the start of the experiment, UTC as the time zone, and 1 second as the time between two consecutive observations. We can, however, help track by providing the missing information through the origin (start of the experiment), tz (the time zone), and period (time between two successive observations) parameters of the function:
tt <- track(x = raw$x, y = raw$y, t = raw$frame, id = raw$track_fixed,
            origin = "2019-03-24 12:55:23",
            period = "0.04S", # 1/25 of a second
            tz = "America/New_York")
print(tt, max = 10 * ncol(tt))
## Track table [21982 observations]
## Number of tracks: 81
## Dimensions: 2D
## Geographic: FALSE
## Table class: data frame ('data.frame')
## id t x y
## 1 1 2019-03-24 12:55:23 629.3839 882.4783
## 2 2 2019-03-24 12:55:23 1056.1692 656.5207
## 3 3 2019-03-24 12:55:23 508.0092 375.2451
## 4 4 2019-03-24 12:55:23 1277.6466 373.7491
## 5 5 2019-03-24 12:55:23 1379.2844 343.0853
## 6 6 2019-03-24 12:55:23 1137.1378 174.5110
## 7 7 2019-03-24 12:55:23 737.1931 115.9394
## 8 8 2019-03-24 12:55:23 921.8634 103.3508
## 9 1 2019-03-24 12:55:23 629.4024 882.4129
## 10 2 2019-03-24 12:55:23 1056.1704 656.4691
## [ reached 'max' / getOption("max.print") -- omitted 21972 rows ]
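To make the roles of origin, period, and tz more concrete, here is a minimal base R sketch of how frame numbers could be turned into date-times. It is only an approximation of the idea, not trackdf's actual implementation. In particular, mapping frame 1 onto the origin itself (rather than one period after it) is an assumption made for this example.

```r
# Hypothetical frame numbers, as a video-tracking program might report them.
frames <- c(1, 2, 26, 51)

# Start of the experiment, expressed in the local time zone.
origin <- as.POSIXct("2019-03-24 12:55:23", tz = "America/New_York")

# Time between two consecutive frames, in seconds (25 frames per second).
period <- 0.04

# Assumption: frame 1 corresponds to the origin itself.
t <- origin + (frames - 1) * period

# Elapsed time since the first frame, in seconds.
as.numeric(difftime(t, t[1], units = "secs"))
```

With a period of 0.04 seconds, consecutive frames fall within the same second, which is why the first rows of the track table above all display the same time.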
If you would like to include in the track table some of the additional data contained in your raw data, it is as simple as adding extra columns when creating data frames. For instance, let’s include the ignore data from the raw data set:
tt <- track(x = raw$x, y = raw$y, t = raw$frame, id = raw$track_fixed,
            ignore = raw$ignore,
            origin = "2019-03-24 12:55:23",
            period = "0.04S", # 1/25 of a second
            tz = "America/New_York")
print(tt, max = 10 * ncol(tt))
## Track table [21982 observations]
## Number of tracks: 81
## Dimensions: 2D
## Geographic: FALSE
## Table class: data frame ('data.frame')
## id t x y ignore
## 1 1 2019-03-24 12:55:23 629.3839 882.4783 FALSE
## 2 2 2019-03-24 12:55:23 1056.1692 656.5207 FALSE
## 3 3 2019-03-24 12:55:23 508.0092 375.2451 FALSE
## 4 4 2019-03-24 12:55:23 1277.6466 373.7491 FALSE
## 5 5 2019-03-24 12:55:23 1379.2844 343.0853 FALSE
## 6 6 2019-03-24 12:55:23 1137.1378 174.5110 FALSE
## 7 7 2019-03-24 12:55:23 737.1931 115.9394 FALSE
## 8 8 2019-03-24 12:55:23 921.8634 103.3508 FALSE
## 9 1 2019-03-24 12:55:23 629.4024 882.4129 FALSE
## 10 2 2019-03-24 12:55:23 1056.1704 656.4691 FALSE
## [ reached 'max' / getOption("max.print") -- omitted 21972 rows ]
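The same approach should extend to several extra columns at once. The sketch below assumes that trackdf is installed and that track accepts more than one additional named column in a single call, which the text above suggests but does not show explicitly. Here, the size column of the raw data is carried along with ignore.

```r
# Assumes the 'trackdf' package is installed.
library(trackdf)

raw <- read.csv(system.file("extdata/video/01.csv", package = "trackdf"))

# Assumption: any number of extra named columns can be passed, like 'ignore' above.
tt2 <- track(x = raw$x, y = raw$y, t = raw$frame, id = raw$track_fixed,
             ignore = raw$ignore,
             size = raw$size,
             origin = "2019-03-24 12:55:23",
             period = "0.04S", # 1/25 of a second
             tz = "America/New_York")

# List the columns stored in the resulting track table.
names(tt2)
```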
Finally, track defaults to using base::data.frame as its data frame class for storing the data. If you prefer to work with tibble::tibble or data.table::data.table, you can specify this in the track function as follows.
For tibble::tibble:
tt <- track(x = raw$x, y = raw$y, t = raw$frame, id = raw$track_fixed,
            ignore = raw$ignore,
            origin = "2019-03-24 12:55:23",
            period = "0.04S", # 1/25 of a second
            tz = "America/New_York",
            table = "tbl")
print(tt)
## Track table [21982 observations]
## Number of tracks: 81
## Dimensions: 2D
## Geographic: FALSE
## Table class: tibble ('tbl_df')
## # A tibble: 21,982 × 5
## id t x y ignore
## <chr> <dttm> <dbl> <dbl> <lgl>
## 1 1 2019-03-24 12:55:23 629. 882. FALSE
## 2 2 2019-03-24 12:55:23 1056. 657. FALSE
## 3 3 2019-03-24 12:55:23 508. 375. FALSE
## 4 4 2019-03-24 12:55:23 1278. 374. FALSE
## 5 5 2019-03-24 12:55:23 1379. 343. FALSE
## 6 6 2019-03-24 12:55:23 1137. 175. FALSE
## 7 7 2019-03-24 12:55:23 737. 116. FALSE
## 8 8 2019-03-24 12:55:23 922. 103. FALSE
## 9 1 2019-03-24 12:55:23 629. 882. FALSE
## 10 2 2019-03-24 12:55:23 1056. 656. FALSE
## # ℹ 21,972 more rows
For data.table::data.table:
tt <- track(x = raw$x, y = raw$y, t = raw$frame, id = raw$track_fixed,
            ignore = raw$ignore,
            origin = "2019-03-24 12:55:23",
            period = "0.04S", # 1/25 of a second
            tz = "America/New_York",
            table = "dt")
print(tt)
## Track table [21982 observations]
## Number of tracks: 81
## Dimensions: 2D
## Geographic: FALSE
## Table class: data table ('data.table')
## id t x y ignore
## 1: 1 2019-03-24 12:55:23 629.3839 882.4783 FALSE
## 2: 2 2019-03-24 12:55:23 1056.1692 656.5207 FALSE
## 3: 3 2019-03-24 12:55:23 508.0092 375.2451 FALSE
## 4: 4 2019-03-24 12:55:23 1277.6466 373.7491 FALSE
## 5: 5 2019-03-24 12:55:23 1379.2844 343.0853 FALSE
## ---
## 21978: 82 2019-03-24 12:57:15 580.7614 587.2513 FALSE
## 21979: 34 2019-03-24 12:57:15 493.5477 529.5454 FALSE
## 21980: 47 2019-03-24 12:57:15 498.8001 432.5990 FALSE
## 21981: 58 2019-03-24 12:57:15 562.6123 266.9754 FALSE
## 21982: 67 2019-03-24 12:57:15 1046.3904 146.4723 FALSE
2.3 - Building a track table from GPS data
Building a track table from geographic data follows similar principles, except that track also expects to receive information about the coordinate reference system the data is using. You can pass that information to track using the proj parameter of the function. But first, let’s load some data that was generated by a GPS collar worn by a goat in Namibia:
raw <- read.csv(system.file("extdata/gps/02.csv", package = "trackdf"))
print(raw, max = 10 * ncol(raw))
## date time lon lat
## 1 2015-09-10 07:00:00 15.76459 -22.37971
## 2 2015-09-10 07:00:01 15.76459 -22.37971
## 3 2015-09-10 07:00:02 15.76459 -22.37971
## 4 2015-09-10 07:00:03 15.76459 -22.37971
## 5 2015-09-10 07:00:04 15.76459 -22.37971
## 6 2015-09-10 07:00:05 15.76459 -22.37971
## 7 2015-09-10 07:00:06 15.76459 -22.37971
## 8 2015-09-10 07:00:07 15.76459 -22.37971
## 9 2015-09-10 07:00:08 15.76459 -22.37971
## 10 2015-09-10 07:00:09 15.76459 -22.37971
## [ reached 'max' / getOption("max.print") -- omitted 3590 rows ]
track uses sf::st_crs to interpret information about coordinate reference systems. Therefore, any format accepted by sf::st_crs can be used to specify the coordinate reference system in track. For data generated using GPS units, the character string “+proj=longlat” is often all that’s needed.
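As a small illustration of the formats sf::st_crs accepts, the sketch below describes a longitude/latitude system in three ways. It assumes the sf package is installed. The EPSG code 4326 (WGS 84) is our own choice for the example and is not taken from the data set above.

```r
# Assumes the 'sf' package is installed.
library(sf)

crs_proj4 <- st_crs("+proj=longlat")  # PROJ string, as used in this vignette
crs_epsg  <- st_crs(4326)             # numeric EPSG code (WGS 84), example value
crs_auth  <- st_crs("EPSG:4326")      # authority:code string, example value

# Check whether each description refers to geographic (longitude/latitude) coordinates.
sapply(list(crs_proj4, crs_epsg, crs_auth), st_is_longlat)
```

Any of these forms should be usable as the proj argument, since track hands the value to sf::st_crs.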
We can then create our GPS-based track table as follows:
tt <- track(x = raw$lon, y = raw$lat, t = paste(raw$date, raw$time), id = 1,
            proj = "+proj=longlat", tz = "Africa/Windhoek")
print(tt, max = 10 * ncol(tt))
## Track table [3600 observations]
## Number of tracks: 1
## Dimensions: 2D
## Geographic: TRUE
## Projection: +proj=longlat
## Table class: data frame ('data.frame')
## id t x y
## 1 1 2015-09-10 07:00:00 15.76459 -22.37971
## 2 1 2015-09-10 07:00:01 15.76459 -22.37971
## 3 1 2015-09-10 07:00:02 15.76459 -22.37971
## 4 1 2015-09-10 07:00:03 15.76459 -22.37971
## 5 1 2015-09-10 07:00:04 15.76459 -22.37971
## 6 1 2015-09-10 07:00:05 15.76459 -22.37971
## 7 1 2015-09-10 07:00:06 15.76459 -22.37971
## 8 1 2015-09-10 07:00:07 15.76459 -22.37971
## 9 1 2015-09-10 07:00:08 15.76459 -22.37971
## 10 1 2015-09-10 07:00:09 15.76459 -22.37971
## [ reached 'max' / getOption("max.print") -- omitted 3590 rows ]
Note that because our raw data already contains the dates and times of the observations, we can simply combine them with paste and pass the result to track, which will interpret it automatically.
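To see what that combined string looks like and how it can be read as a date-time, here is a base R sketch using two rows in the same layout as the GPS file. The parsing shown uses as.POSIXct with an explicit format. This is an illustration only, and trackdf may parse the strings differently internally.

```r
# Two example rows in the same layout as the GPS file above.
gps <- data.frame(date = c("2015-09-10", "2015-09-10"),
                  time = c("07:00:00", "07:00:01"),
                  stringsAsFactors = FALSE)

# Combine date and time into a single string per observation.
stamp <- paste(gps$date, gps$time)

# Parse the strings as date-times in the local time zone of the study site.
t <- as.POSIXct(stamp, format = "%Y-%m-%d %H:%M:%S", tz = "Africa/Windhoek")

# Time elapsed between the two observations.
diff(t)
```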
Everything else works similarly to what was shown in the previous section about video-tracking data. The tutorial about manipulating data stored in a track table is provided in a separate vignette.