This vignette documents the mechanism provided in postCP 2.0. The document will first explain the theoretical basis and it will move on to explain the usage of the package.
The postCP Package is based on the change point model \[ \boldsymbol y[I_k] \sim \boldsymbol X[I_k] \times \boldsymbol \beta^k \] where \(\textbf{y} \in \mathbb{R} ^{n\times d}\) is the response variable and \(\textbf{X} \in \mathbb{R}^{n\times p}\) is the covariate matrix. The response variable is generated with \(K>1\) segments according to any Generalized Linear Model (ex: Gaussian or binomial regressions).
\(I_k = [cp_{k−1} + 1, cp_k ]\) is the index of the individuals of the k th segment, and with the convention that \(cp_0 = 0\) and that \(cp_K = n\)
For a given parameter \(\boldsymbol \theta\), our objective is to compute the posterior change-point distribution: \[ \mathbb{P}(\textbf{cp} | \boldsymbol y,\boldsymbol X ; \boldsymbol \theta) \] or, equivalently, the posterior segmentation distribution: \[ \mathbb{P}(\boldsymbol S | \boldsymbol y,\boldsymbol X ; \boldsymbol \theta) \]
The posterior distributions are calculated through a constrained HMM (Hidden Markov Model) coded in C++ with the aim of achieving efficiency in computations.
Let’s look at an example,
require(postCP)
## Loading required package: postCP
## Loading required package: MASS
## Loading required package: Segmentor3IsBack
## Segmentor3IsBack v1.8 Loaded
#prepare data
sigma=1.3
#Change point estimates
bp=c(7,10)
#Obtain data from longley dataset
data = longley
#Apply postcp function
res = postcp(Employed ~ GNP + Armed.Forces,family=gaussian(),data=data,bp=c(7,10),sigma)
#Plot the results
plot.postCP(res,main="Posterior Change Point Probability Distribution")
#Apply postcp function with maxFB=TRUE to obtain marginal distribution
res = postcp(Employed ~ GNP + Armed.Forces,family=gaussian(),data=data,bp=c(7,10),sigma,maxFB=TRUE)
#Plot the results
plot.postCP(res,main="Posterior Change Point Probability Distribution")
#plot.postCP also provides support to display the response variable easily
plot.postCP(res,show.response = TRUE)
The following example shows a more noticeable change.
require(postCP)
data = longley
plot(data$Armed.Forces)
res = postcp(Armed.Forces ~ 1,family=gaussian(),data=data,bp=c(10),sigma=1)
plot.postCP(res,main="Posterior Change Point Probability Distribution")
Some simulated examples are presented below for verification.
## change in the mean
data = data.frame(signal=rnorm(150) + rep(c(1, 4), each=75), position=1:150)
plot(data$signal)
res = postcp(signal ~ 1 ,family=gaussian(),data=data,bp=c(40),sigma=1)
plot.postCP(res,main="Posterior Change Point Probability Distribution")
## change in the slope
position <- 1:150
data = data.frame(signal = rep(c(1, 2), c(50, 100))*position + rnorm(150), position=position)
plot(data$signal)
res = postcp(signal ~ position ,family=gaussian(),data=data,bp=c(30),sigma=1)
plot.postCP(res,main="Posterior Change Point Probability Distribution")
Also the results can be observed for large simulated profiles. Consider the above mean model with number = 10000
## change in the mean with number = 10000
number = 10000
data = data.frame(signal=rnorm(number) + rep(c(2, 5), each=number/2), position=1:number)
plot(data$signal)
res = postcp(signal ~ 1 ,family=gaussian(),data=data,bp=c(3000),sigma=1)
plot.postCP(res,main="Posterior Change Point Probability Distribution")