An Introduction to postCP

Malith Jayaweera

2016-08-19

This vignette documents the mechanism provided in postCP 2.0. The document will first explain the theoretical basis and it will move on to explain the usage of the package.

change-point problem

The postCP Package is based on the change point model \[ \boldsymbol y[I_k] \sim \boldsymbol X[I_k] \times \boldsymbol \beta^k \] where \(\textbf{y} \in \mathbb{R} ^{n\times d}\) is the response variable and \(\textbf{X} \in \mathbb{R}^{n\times p}\) is the covariate matrix. The response variable is generated with \(K>1\) segments according to any Generalized Linear Model (ex: Gaussian or binomial regressions).

\(I_k = [cp_{k−1} + 1, cp_k ]\) is the index of the individuals of the k th segment, and with the convention that \(cp_0 = 0\) and that \(cp_K = n\)

Posterior change-point distribution

For a given parameter \(\boldsymbol \theta\), our objective is to compute the posterior change-point distribution: \[ \mathbb{P}(\textbf{cp} | \boldsymbol y,\boldsymbol X ; \boldsymbol \theta) \] or, equivalently, the posterior segmentation distribution: \[ \mathbb{P}(\boldsymbol S | \boldsymbol y,\boldsymbol X ; \boldsymbol \theta) \]

The posterior distributions are calculated through a constrained HMM (Hidden Markov Model) coded in C++ with the aim of achieving efficiency in computations.

Let’s look at an example,

require(postCP)
## Loading required package: postCP
## Loading required package: MASS
## Loading required package: Segmentor3IsBack
## Segmentor3IsBack v1.8 Loaded
#prepare data
sigma=1.3
#Change point estimates
bp=c(7,10)
#Obtain data from longley dataset
data = longley
#Apply postcp function
res = postcp(Employed ~ GNP + Armed.Forces,family=gaussian(),data=data,bp=c(7,10),sigma)

#Plot the results
plot.postCP(res,main="Posterior Change Point Probability Distribution")

#Apply postcp function with maxFB=TRUE to obtain marginal distribution
res = postcp(Employed ~ GNP + Armed.Forces,family=gaussian(),data=data,bp=c(7,10),sigma,maxFB=TRUE)

#Plot the results
plot.postCP(res,main="Posterior Change Point Probability Distribution")

#plot.postCP also provides support to display the response variable easily
plot.postCP(res,show.response = TRUE)

The following example shows a more noticeable change.

require(postCP)
data = longley
plot(data$Armed.Forces)

res = postcp(Armed.Forces ~ 1,family=gaussian(),data=data,bp=c(10),sigma=1)
plot.postCP(res,main="Posterior Change Point Probability Distribution")

Some simulated examples are presented below for verification.

## change in the mean
data = data.frame(signal=rnorm(150) + rep(c(1, 4), each=75), position=1:150)
plot(data$signal)

res = postcp(signal ~ 1 ,family=gaussian(),data=data,bp=c(40),sigma=1)
plot.postCP(res,main="Posterior Change Point Probability Distribution")

## change in the slope
position <- 1:150
data = data.frame(signal = rep(c(1, 2), c(50, 100))*position + rnorm(150), position=position)
plot(data$signal)

res = postcp(signal ~ position ,family=gaussian(),data=data,bp=c(30),sigma=1)
plot.postCP(res,main="Posterior Change Point Probability Distribution")

Also the results can be observed for large simulated profiles. Consider the above mean model with number = 10000

## change in the mean with number = 10000
number = 10000
data = data.frame(signal=rnorm(number) + rep(c(2, 5), each=number/2), position=1:number)
plot(data$signal)

res = postcp(signal ~ 1 ,family=gaussian(),data=data,bp=c(3000),sigma=1)
plot.postCP(res,main="Posterior Change Point Probability Distribution")