# Behavioral Change Point Analysis

**IMPORTANT UPDATE: A preliminary version of a BCPA R-package is now available here**
Eli 07:39, 25 July 2013 (PDT)

The **Behavioral Change Point Analysis** (**BCPA**) is a method of identifying hidden shifts in the underlying parameters of a time series, developed specifically to be applied to animal movement data which is irregularly sampled.

The original paper on which it is based is: E. Gurarie, R. Andrews and K. Laidre A novel method for identifying behavioural changes in animal movement data (2009) *Ecology Letters* 12:5 395-408. Most of the material is also present in Chapter 5 of my PhD dissertation (click to access).

I have received numerous requests for the R code behind the BCPA, so (after sending out more or less the same email with the same attachments several dozen times) have decided that it might be more efficient to post the code and a sample work flow of using the BCPA on this wiki. My apologies in advance for poor annotation.

Please feel free to ask questions or comment on the discussion page of the wiki-page (if you are logged on) or via email at: *eliezg (at) uw.edu*.

Eli 03:27, 19 July 2011 (PDT)

## Contents |

## Brief Introduction

Briefly, there are several hierarchically layered parts to the method. For location and time data {*Z*,*T*}, the analysis is performed on velocity (estimated as *V* = Δ*Z* / Δ*T* components *V*cos(θ) and *V*sin(θ). These components are (importantly) assumed to be observations from a continuous time, Gaussian process, (i.e. an Ornstein-Uhlenbeck process), with mean μ(*t*), variance σ^{2}(*t*) and autocorrelation ρ(*t*). The values of these parameters are assumed to change gradually or abruptly. The purpose of the BCPA is to identify the locations where changes are abrupt (assumed to correspond to discrete changes in an animal's behavior).

The distribution function of this process is given by:

- First, we identifying a likelihood function for ρ, given estimates of and , in a behaviorally homogenous region using the distribution function above:

- Second, we identify a "most likely change point" (MLCP) in a time series where the behavior may have changed by taking the product of the likelihoods of the estimates to the left and to the right of all possible change points in a time series. We identify which of the parameters (if any) have changed by comparing the BIC of eight possible models: M0 - no changes in any parameter, M1 - change in μ only, M2 - change in σ only, M3 - change in ρ only ... etc ... M7 - chance in all three parameters.

- Third, we sweep the MLCP changepoint across a complete data set, recording at every point what the parameter values are to the left and right of all MLCP's under the model with the highest BIC, and record the paraemters.

- Fourth, we somehow present this mass of analysis.

I present here the code for all these steps, as they were applied in the original paper, and apply them to a simulated dataset. Naturally, implementation of this type of analysis should be specific to the relevant application.

## Code pieces

### Likelihood of ρ parameter

*Usage*: GetRho(x, t)

*Description*: This function works first by estimating the mean and standard deviation directly from x, using these to standardize the time series, and then optimizes for the likelihood of the value of rho. The equation for the likelihood is given above (and in equations 10 and 11 in the BCPA paper).

*Value*: Returns a vector with two values (again - not a list or dataframe for greater speed). The first value is "rho" estimate, the second is the log likelihood of the estimate.

*Arguments*

- x Values of time series.
- t Times of measurements associated with x.

GetRho <- function (x, t) { getL <- function(rho) { dt <- diff(t) s <- sd(x) mu <- mean(x) n <- length(x) x.plus <- x[-1] x.minus <- x[-length(x)] Likelihood <- dnorm(x.plus, mean = mu + (rho^dt) * (x.minus - mu), sd = s * sqrt(1 - rho^(2 * dt))) logL <- sum(log(Likelihood)) if (!is.finite(logL)) logL <- -10^10 return(-logL) } o <- optimize(getL, lower = 0, upper = 1, tol = 1e-04) return(c(o$minimum, o$objective)) }

### Total likelihood within a behaviorally homogenous section

*Usage*: GetLL(x, t, mu, s, rho)

*Description*: Returns log-likelihood of a given parameter set for a gappy time series.

*Value*: Returns value of the log likelihood

*Arguments*:

- x Time series data.
- t Times at which data is obtained
- mu Mean estimate
- s Sigma (standard deviation) estimate (>0)
- rho Rho estimate (between 0 and 1)

GetLL <- function (x, t, mu, s, rho) { dt <- diff(t) n <- length(x) x.plus <- x[-1] x.minus <- x[-length(x)] Likelihood <- dnorm(x.plus, mean = mu + (rho^dt) * (x.minus - mu), sd = s * sqrt(1 - rho^(2 * dt))) LL <- -sum(log(Likelihood)) return(LL) }

### Likelihood of single change point

*Usage*: GetDoubleL(x, t, tbreak)

*Description*: Takes a time series with values "x" obtained at time "t" and a time break "tbreak" and returns the estimates of "mu", "sigma" and "rho" as well as the negative log-likelihood of those estimates (given the data) both before and after the break.

*Value*: Returns a labeled matrix (more efficient than a data frame) with columns: "mu", "sigma", "rho" and "LL" corresponding to the estimates and 2 rows for each side of the break point.

*Arguments*:

- x Values of time series.
- t Times of measurements associated with x.
- tbreak Breakpoint (in terms of the INDEX within "t" and "x", not actual time value).

GetDoubleL <- function(x,t,tbreak) { x1 <- x[1:tbreak] x2 <- x[tbreak:length(x)] t1 <- t[1:tbreak] t2 <- t[tbreak:length(t)] o1<-GetRho(x1,t1) o2<-GetRho(x2,t2) mu1 <- mean(x1) sigma1 <- sd(x1) rho1 <- o1[1] mu2 <- mean(x2) sigma2 <- sd(x2) rho2 <- o2[1] LL1 <- -o1[2] LL2 <- -o2[2] m <- matrix(c(mu1,mu2,sigma1,sigma2,rho1,rho2,LL1,LL2),2,4) colnames(m) <- c("mu","sigma","rho","LL") return(m) }

### Sweeping breaks

*Usage*: SweepBreaks(x, t, range=0.6)

*Description*: Finds a single change point within a time series.

*Arguments*:

- x Values of time series.
- t Times of measurements associated with x.
- range Range of possible breaks. Default (0.6) runs approximately from 1/5th to 4/5ths of the total length of the time series.

*Value*: Returns a matrix (not a data.frame for greater speed) with column headings: "breaks","tbreaks","mu1","sigma1","rho1","LL1","mu2","sigma2","rho2","LL2","LL". This is calculated for every possible break - which extends from 0.2l to 0.8l (where l is the length of the time series). The output of this function feeds WindowSweep.

SweepBreaks <- function(x,t,range=0.6) { n<-length(t) start <- (1-range)/2 breaks<-round((start*n):((1-start)*n)) Ls<-breaks*0 l<-length(breaks) BreakParams <- matrix(NA,l,8) #BreakParams <- data.frame(mu1=NA,s1=NA,rho1=NA,LL1=NA,mu2=NA,s2=NA,rho2=NA,LL2=NA) for(i in 1:l) { myDoubleL <- GetDoubleL(x,t,breaks[i]) BreakParams[i,] <- c(myDoubleL[1,],myDoubleL[2,]) } # remember: LL1 and LL2 are columns 4 and 8 BreakMatrix<- cbind(breaks,t[breaks], BreakParams, BreakParams[,4]+BreakParams[,8]) colnames(BreakMatrix) <- c("breaks","tbreaks","mu1","sigma1","rho1","LL1","mu2","sigma2","rho2","LL2","LL") return(BreakMatrix[2:nrow(BreakMatrix),]) }

### Choosing a model

There are eight model functions, named M0 to M7, with redundant code which I have placed here: BCPA/Model Specification. Each of these function take data "x", at times "t", and breakpoint "tbreak" and return a named dataframe of parameters, log-likelihoods and BIC values: "data.frame(LL,bic,mu1,s1,rho1,mu2,s2,rho2)"

The ouput of all the models is obtained using this function, which returns a data.frame including columns: "Model", followed by the estimate output of each model function M0-M7 (i.e. data.frame(LL,bic,mu1,s1,rho1,mu2,s2,rho2)):

GetModels <- function(x,t,tbreak,K=2) { for(i in 0:7) { f<-get(paste("M",i,sep="")) myr<-data.frame(Model=i,f(x,t,tbreak,K)) ifelse(i==0, r<-myr, r<-rbind(r,myr)) } return(r) }

### Window sweeping

This is the key function which sweeps the analysis above using windows of size "windowsize", stepping by "windowstep" and selecting the best model according to BIC and recording the estimated parameters.

The "K" is the coefficient in front of the likelihood in the BIC, which we allow to be tweaked. Note that: , where little *k* is the number of parameters. Thus, according to the definition of BIC, *K* should be equal to 2. However, if it is lower then the penalty of more complex models is greater, so it can be used to tweak the sensitivity of the analysis.

Note also that is "plotme" is "TRUE", a plot will appear showing the sweeping window and the selected MLCP. Along with letting you know what the algorithm is up to, it is also (I think) rather pleasantly hypnotic to watch the change points jump around.

Finally, this function returns what I call a "window sweep" object, which is basically a data.frame that includes the MLCP for each window location with all the parameter estimates, likelihoods, bic values, and selected models.

WindowSweep <- function (x, t, windowsize = 50, windowstep = 1, sine = 0, K = 2, plotme = TRUE) { low <- seq(1, (length(t) - windowsize), windowstep) hi <- low + windowsize for (i in 1:length(low)) { myx <- x[low[i]:hi[i]] myt <- t[low[i]:hi[i]] bp <- SweepBreaks(myx, myt) myestimate <- bp[bp[, 11] == max(bp[, 11]), ] breakpoint <- myestimate[1] tbreak <- myestimate[2] ifelse(sine, allmodels <- GetModelsSin(myx, myt, breakpoint, K), allmodels <- GetModels(myx, myt, breakpoint, K)) mymodel <- allmodels[allmodels$bic == min(allmodels$bic),] mymodel <- data.frame(mymodel, Break = tbreak) ifelse(i == 1, estimates <- mymodel, estimates <- rbind(estimates, mymodel)) if (plotme) { plot.ts(t, x, type = "l", col = "grey") lines(t, x, type = "l") lines(myt, myx, col = "green") abline(v = tbreak) print(estimates[i, ]) } } return(data.frame(estimates)) }

### Partitioning parameters

Finally, the output of the windowsweep function (which we call "ws"), is converted to parameter estimates over the entire complex timeseries using the following function:

*Usage*: PartitionParameters(ws, t, windowsize = 50, windowstep = 1)

*Description*: Estimation of all parameters as a rolling average of the window-sweep output.

*Arguments*:

- ws Output of WindowSweep
- t Time values of time-series measurements.
- windowsize Window size
- windowstep Increment of window step

*Value*: Returns a data frame with columns "mu.hat", "s.hat" and "rho.hat" for each location in the time-series (i.e., all of the time series minus the first and last range of windowsize/2).

PartitionParameters <- function(ws,t,windowsize=50,windowstep=1) { n.col<-length(t) n.row<-dim(ws)[1] mu.M <- matrix(NA,n.row,n.col) s.M <- matrix(NA,n.row,n.col) rho.M <- matrix(NA,n.row,n.col) for(i in 1:n.row) { myws<-ws[i,] dts <- abs(t-myws$Break) tbreak <- match(min(dts),dts) max <- min(n.col,i+windowsize) mu.M[i,i:tbreak] <- myws$mu1 mu.M[i,(tbreak+1):max] <- myws$mu2 s.M[i,i:tbreak] <- myws$s1 s.M[i,(tbreak+1):max] <- myws$s2 rho.M[i,i:tbreak] <- myws$rho1 rho.M[i,(tbreak+1):max] <- myws$rho2 } adjust <- colSums(!is.na(mu.M)) mu.hat<-colSums(mu.M,na.rm=1)/adjust s.hat<-colSums(s.M,na.rm=1)/adjust rho.hat<-colSums(rho.M,na.rm=1)/adjust return(data.frame(mu.hat,s.hat,rho.hat)) }

## Sample work flow

Here is a simulated movement track from a continuous auto-correlated process of total duration 60 time units with four behavioral phases that switch at times T=10, 20 and 50 from, variously, higher or lower velocity and longer or shorter characteristic time scale of autocorrelation. The original data was simulated with time intervals of 0.01 (leading to a complete track with 6000 data points). I randomly sampled 200 points from the "true" movement track, such that the intervals between locations are random with mean interval 0.3 units. The track is illustrated at right, and the data are available here: Simp.csv (for those of you curious about what a "Simp" is ... I think it might be a Simulated Chimp, something like this). While some periods of more or less intensive movement are clear, it is difficult to easily pick out the change points in this data set. So we apply the BCPA.

Load the data

Simp <- read.csv("http://wiki.cbr.washington.edu/qerm/sites/qerm/images/a/a1/Simp.csv")

Here is what the beginning of the file looks like:

"T", "X", "Y" 0.18, 23.74, -9.06 0.22, 26.74, -11.32 0.74, 13.25, 2.94 0.88, 28.46, 26.63 1.4 , 96.77, 121.78 ...

X, Y, and T are the essential ingredients of any movement analysis!

In order to extract the estimated velocity and turning angle vectors, we are going to convert the X and Y to complex numbers. Why? Because it is *extremely* convenient to work with complex numbers when analyzing movement data. Hopefully this will be made clear as we proceed:

Z <- Simp$X + 1i*Simp$Y # note that this command is equivalent to: Z <- complex(re=Simp$X, im=Simp$Y)

Very briefly, complex numbers contain two compenents (the "real" and "imaginary" component, which is multiplied by ). A single complex number can therefore represent a point on a surface, or a vector with magnitude and direction. The real and imaginary parts of a complex number are obtained with "Re(Z)" and "Im(Z)". More special: the magnitude of a complex number is (called the "Modulus") is obtained via: "Mod(Z)", and the direction (referred to as the "Argument") is called via: "Arg(Z)". The ability to immediately access lengths and angles of 2D vectors, and to manipulate vectors (add, substract, rotate, etc.) is very powerful. Thankfully, R is very comfortable with complex numbers. For example:

plot(Z)

works just as well as

plot(Simp$X, Simp$Y)

but more compactly.

Anyways, the step vectors, step lengths, absolute orientations, turning angles and velocities are obtained quickly via:

# step vectors dZ <- diff(Z) # orientation of each step Phi <- Arg(dZ) # turning angles Theta <- diff(Phi)

Note that there one fewer turning angles than absolute orientations. That is because we do not know the initial orientation of the trajectory.

# step lengths S <- Mod(dZ) # time intervals dT <- diff(Simp$T) # Magnitude of linear velocity between points V <- S/dT # We don't have the turning angle for the first velocity measurement, so we throw it out. V <- V[-1]

Now we can create the Gaussian time series we want to analyze:

VC <- V*cos(Theta) VS <- V*sin(Theta)

Finally, we want a time stamp for each velocity element. This is the average of the times associated with the locations before and after a step. Thus:

T <- (Simp$T[-1] + Simp$T[-nrow(Simp)])/2 T <- T[-1]

Again, we eliminate the first step because it is not associated with a known turning angle.

Once all the functions above are loaded (e.g. copy/paste all the code from here: BCPA/All BCPA Functions, an analysis run is simply:

vc.sweep <- WindowSweep(VC, T, windowsize = 50, windowstep = 1) vc.output <- PartitionParameters(vc.sweep, T, windowsize=50, windowstep = 1)

The first function performs the windowsweep and returns all the possible breakpoints and their respective "model" (M0-M7) based on BIC, and the second function uses that output to produce estimates of the parameter values across the time series.

I wrote up a plotting function that draws the model output of a complete BCPA here: PlotBCPA. None of the "M0" changepoints are drawn. There is an additional "threshold" parameter that is the number of times that a change points must be identified before it is deemed "significant" (or worth drawing). This is basically just another filtering tweak for the complex output. Thus:

PlotBCPA(T, VC, vc.sweep, vc.output, threshold=10)

Yields this plot:

We see that the change points at 10, 30 and 50 were identified very robustly, and that there is an additional spurious change point identified around T=20. Though if we had chosen a threshold of "20" it would have disappeared (but that is, of course, a very cheaty thing to do considering we knew from the beginning where/when the true change points where located). Also, if we had set the "K" in the WindowSweep function to some lower number (e.g. 1 or 0.5), the analysis would have been far more conservative at choosing complex models for the changepoints.

Finally, the function here: BCPA/PathPlot, applied as follows:

PathPlot(T, Z, vc.sweep, vc.output, threshold = 10)

yields a plot that looks like this:

In this plot, the colors reflect the value of the and the width of the line is proportional to the estimated persistence velocity . The red "X"s are located where the "notable" change points occur.

All around, it seems the BCPA performed fairly well for modelling the Simp's movements.

## Some concluding thoughts

Clearly, there are a few degrees of freedom to fiddle with. Also, the output of the analysis is rather complex. More than anything, the BCPA allows an analyst to have a visual summary for the complexities of a behaviorally dynamic dataset. Because this is perhaps its main purpose, I feel a little funny about the several less mathematically rigorous, hand-wavy steps. There is room for the implementer to extract

There are several ways the BCPA could be improved. The model selection could be adjusted. For example, one colleague suggested using the breaks that lead to the largest change in BIC compared to the null-model, and using the dBIC to pick out which of the changes are most significant, rather than the number of times the breakpoint is selected. I've also noticed that the likelihood profile of a single changepoint within a window can be rather rough. In practice, a smoothing spline of the likelihood often gives more precise results, though I've never heard of anyone smoothing a likelihood to find its maximum. Finally, what I personally consider to be the most significant drawback of the BCPA as implemented here is that the parameter values themselves are somewhat difficult to interpret. The most satisfying development would be to estimate meaningful parameters, for example the mean true velocity and characteristic time scale of auto-correlation, directly from the data. This is the focus of ongoing research.

Anyways, thanks to anyone who visits this page, and best of luck with implementation! And please send all comments, questions, critiques, possible improvements, or (heaven forbid) identification of errors via e-mail to: **eliezg (at) uw.edu**.