---
title: "An Introduction to postCP"
author: "Malith Jayaweera"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{An Introduction to postCP}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

This vignette documents the methodology provided in postCP 2.0. It first explains the theoretical basis of the package and then demonstrates its usage.

## Change-point problem

The postCP package is based on the change-point model

$$ \boldsymbol y[I_k] \sim \boldsymbol X[I_k] \times \boldsymbol \beta^k $$

where $\textbf{y} \in \mathbb{R}^{n\times d}$ is the response variable and $\textbf{X} \in \mathbb{R}^{n\times p}$ is the covariate matrix. The response variable is generated with $K > 1$ segments according to any generalized linear model (e.g., Gaussian or binomial regression). $I_k = [cp_{k-1} + 1, cp_k]$ is the set of indices of the individuals in the $k$-th segment, with the convention that $cp_0 = 0$ and $cp_K = n$.

### Posterior change-point distribution

For a given parameter $\boldsymbol \theta$, our objective is to compute the posterior change-point distribution:

$$ \mathbb{P}(\textbf{cp} \mid \boldsymbol y, \boldsymbol X ; \boldsymbol \theta) $$

or, equivalently, the posterior segmentation distribution:

$$ \mathbb{P}(\boldsymbol S \mid \boldsymbol y, \boldsymbol X ; \boldsymbol \theta) $$

The posterior distributions are computed through a constrained hidden Markov model (HMM) implemented in C++ for computational efficiency.

Let's look at an example:

```{r, fig.width=5, fig.align='center'}
require(postCP)
# prepare the data
sigma = 1.3
# initial change-point estimates
bp = c(7, 10)
# use the longley dataset
data = longley
# apply the postcp function
res = postcp(Employed ~ GNP + Armed.Forces, family = gaussian(), data = data,
             bp = bp, sigma = sigma)
# plot the results
plot.postCP(res, main = "Posterior Change Point Probability Distribution")
# apply postcp with maxFB = TRUE to obtain the marginal distribution
res = postcp(Employed ~ GNP + Armed.Forces, family = gaussian(), data = data,
             bp = bp, sigma = sigma, maxFB = TRUE)
# plot the results
plot.postCP(res, main = "Posterior Change Point Probability Distribution")
# plot.postCP can also display the response variable
plot.postCP(res, show.response = TRUE)
```

The following example shows a more noticeable change.

```{r, fig.width=5, fig.align='center'}
require(postCP)
data = longley
plot(data$Armed.Forces)
res = postcp(Armed.Forces ~ 1, family = gaussian(), data = data, bp = c(10), sigma = 1)
plot.postCP(res, main = "Posterior Change Point Probability Distribution")
```

Some simulated examples are presented below for verification.

```{r, fig.width=5, fig.align='center'}
## change in the mean
data = data.frame(signal = rnorm(150) + rep(c(1, 4), each = 75), position = 1:150)
plot(data$signal)
res = postcp(signal ~ 1, family = gaussian(), data = data, bp = c(40), sigma = 1)
plot.postCP(res, main = "Posterior Change Point Probability Distribution")

## change in the slope
position <- 1:150
data = data.frame(signal = rep(c(1, 2), c(50, 100)) * position + rnorm(150),
                  position = position)
plot(data$signal)
res = postcp(signal ~ position, family = gaussian(), data = data, bp = c(30), sigma = 1)
plot.postCP(res, main = "Posterior Change Point Probability Distribution")
```
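To connect these simulated examples back to the notation of the model section, the following sketch generates a two-segment Gaussian profile directly from $\boldsymbol y[I_k] \sim \boldsymbol X[I_k] \times \boldsymbol \beta^k$ and then applies `postcp` to it. The segment coefficients and the true change point at position 100 are illustrative choices, not part of the original examples.

```{r, fig.width=5, fig.align='center'}
## two Gaussian segments generated directly from the model y[I_k] ~ X[I_k] beta^k
## the coefficients beta1, beta2 and the change point at 100 are illustrative choices
n = 200
cp = 100                  # true change point cp_1
X = cbind(1, 1:n)         # covariate matrix: intercept and position
beta1 = c(0, 0.5)         # coefficients of the first segment
beta2 = c(10, -0.2)       # coefficients of the second segment
signal = c(X[1:cp, ] %*% beta1, X[(cp + 1):n, ] %*% beta2) + rnorm(n)
data = data.frame(signal = signal, position = 1:n)
plot(data$signal)
res = postcp(signal ~ position, family = gaussian(), data = data, bp = c(90), sigma = 1)
plot.postCP(res, main = "Posterior Change Point Probability Distribution")
```

As in the slope example above, `bp = c(90)` is only an initial estimate of the change-point location.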
The results can also be observed for large simulated profiles. Consider the mean-change model above with `number = 10000`.

```{r, fig.width=5, fig.align='center'}
## change in the mean with number = 10000
number = 10000
data = data.frame(signal = rnorm(number) + rep(c(2, 5), each = number/2),
                  position = 1:number)
plot(data$signal)
res = postcp(signal ~ 1, family = gaussian(), data = data, bp = c(3000), sigma = 1)
plot.postCP(res, main = "Posterior Change Point Probability Distribution")
```
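All of the examples above use `family = gaussian()`. Since the model section states that the segments may follow any generalized linear model (e.g., binomial regression), the chunk below sketches what such a call might look like. It assumes that `postcp` accepts `family = binomial()` with the same arguments as the Gaussian calls (without `sigma`), which is not demonstrated above, so the chunk is not evaluated.

```{r, eval=FALSE}
## hypothetical binomial example: the success probability changes after position 75
## this sketch assumes postcp accepts family = binomial() with the same arguments
## as the Gaussian calls above (without sigma); it is not evaluated here
data = data.frame(signal = rbinom(150, size = 1, prob = rep(c(0.2, 0.8), each = 75)),
                  position = 1:150)
plot(data$signal)
res = postcp(signal ~ 1, family = binomial(), data = data, bp = c(70))
plot.postCP(res, main = "Posterior Change Point Probability Distribution")
```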