Simultaneous CI/PIs and RTTO Models


Last updated: 2019-11-15

Knit directory: STA_463_563_Fall2019/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0).


Seed: set.seed(20190905)

The command set.seed(20190905) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.


File	Version	Author	Date	Message
Rmd	1c951d9	dleelab	2019-11-15	created

Simultaneous CI/PIs

Fit a simple linear regression model and estimate the expected number of minutes spent when there are 3, 5, and 7 copiers to be serviced. Using a 90% family confidence coefficient, calculate the simultaneous confidence intervals with the Working-Hotelling (W-H) procedure and with the Bonferroni procedure.

(a) W-H Method.

# Load the CH01PR20 data set: service time in minutes and number of copiers
data0120=read.table("http://www.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Chapter%20%201%20Data%20Sets/CH01PR20.txt")
colnames(data0120)=c("minutes","copiers_number")
fit=lm(minutes~copiers_number,data=data0120)

alphaF=1-0.9 # family significance level
n=nrow(data0120)
# Working-Hotelling multiplier
W=sqrt(2*qf((1-alphaF),2,(n-2)))#Or W=sqrt(2*qf((1-alphaF),length(fit$coefficients),fit$df.residual))
# Fitted mean responses at 3, 5 and 7 copiers
Yh_hat=predict(fit,newdata=data.frame(copiers_number=c(3,5,7)))
MSE=sum((fit$residuals)^2)/(n-2)
xbar=mean(data0120$copiers_number)
Sxx=sum((data0120$copiers_number-xbar)^2)
# Estimated standard error of each fitted mean response
s_Yh_hat=sqrt(MSE*(1/n+(c(3,5,7)-xbar)^2/Sxx))
simulCI=cbind(Yh_hat,Yh_hat-W*s_Yh_hat,Yh_hat+W*s_Yh_hat)
colnames(simulCI)=c("fitted","lower","upper")
rownames(simulCI)=c("3","5","7")
simulCI

     fitted     lower     upper
3  44.52559  40.83265  48.21853
5  74.59608  71.66417  77.52800
7 104.66658 101.11278 108.22038

The 90% simultaneous CIs using the W-H method are as above.
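For reference, the quantities computed in the chunk above correspond to the usual textbook form of the Working-Hotelling interval. This is a sketch in standard notation; \(S_{XX}\) matches the `Sxx` variable in the code, and \(\alpha = 0.10\) is the family significance level (`alphaF`).

```latex
\hat{Y}_h \pm W\, s\{\hat{Y}_h\}, \qquad
W^2 = 2F(1-\alpha;\, 2,\, n-2), \qquad
s^2\{\hat{Y}_h\} = MSE\left[\frac{1}{n} + \frac{(X_h-\bar{X})^2}{S_{XX}}\right], \qquad
S_{XX} = \sum_{i=1}^{n}(X_i-\bar{X})^2
```

The multiplier \(W\) does not depend on how many levels \(X_h\) are included in the family.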

(b) Bonferroni method.

g=3 # number of intervals in the family
alpha=alphaF/g # per-interval significance level
predict(fit,newdata=data.frame(copiers_number=c(3,5,7)),interval="confidence",level=1-alpha)

        fit       lwr       upr
1  44.52559  40.84285  48.20832
2  74.59608  71.67227  77.51989
3 104.66658 101.12260 108.21056

The 90% simultaneous CIs using the Bonferroni method are as above.
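The reason `level=1-alpha` with `alpha=alphaF/g` produces Bonferroni intervals can be sketched as follows, in standard notation with \(\alpha\) denoting the family significance level and \(g\) the number of intervals. A two-sided interval at level \(1-\alpha/g\) puts \(\alpha/(2g)\) in each tail, which is exactly the Bonferroni multiplier:

```latex
\hat{Y}_h \pm B\, s\{\hat{Y}_h\}, \qquad
B = t\!\left(1-\frac{\alpha}{2g};\; n-2\right)
```

Unlike the W-H multiplier, \(B\) grows as more intervals are added to the family.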

Discussion.

From the results above, which method for computing simultaneous confidence intervals do you prefer? Why? (In practice, before computing simultaneous CIs or PIs, compare the multipliers first and use whichever method yields the smaller multiplier.)

Comparing the results above, I prefer the Bonferroni method because it gives slightly shorter confidence intervals. Both methods use the same standard errors, so the interval widths differ only through the multipliers. In this example, the Bonferroni multiplier is slightly smaller than the W-H multiplier.

t=qt((1-alphaF/g/2),(n-2)) # Bonferroni multiplier
W # W-H multiplier
t

[1] 2.204725

[1] 2.198632
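Which multiplier is smaller depends on the family size. The sketch below assumes only the sample size n = 45 and the 90% family confidence coefficient from this example; the names `g_values`, `W_mult`, `B_mult`, and `comparison` are illustrative and do not appear in the analysis above.

```r
# Compare Working-Hotelling and Bonferroni multipliers for several family sizes.
# Assumes n = 45 observations and a 90% family confidence coefficient.
n <- 45
alphaF <- 1 - 0.90
g_values <- 1:6

# W-H multiplier: does not depend on the number of intervals g.
W_mult <- sqrt(2 * qf(1 - alphaF, 2, n - 2))

# Bonferroni multiplier: one value per family size, increasing in g.
B_mult <- qt(1 - alphaF / (2 * g_values), n - 2)

comparison <- data.frame(
  g = g_values,
  W = W_mult,
  B = B_mult,
  smaller = ifelse(B_mult < W_mult, "Bonferroni", "W-H")
)
print(comparison)
```

Under these assumptions Bonferroni gives the smaller multiplier for families of up to three intervals, and W-H becomes preferable for larger families because its multiplier stays fixed.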

Regression through the origin.

Review the simple linear regression fit and, recalling the discussion we had in class, fit a regression through the origin (RTTO) model instead. Obtain the estimated regression function.

summary(fit)


Call:
lm(formula = minutes ~ copiers_number, data = data0120)

Residuals:
     Min       1Q   Median       3Q      Max 
-22.7723  -3.7371   0.3334   6.3334  15.4039 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -0.5802     2.8039  -0.207    0.837    
copiers_number  15.0352     0.4831  31.123   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.914 on 43 degrees of freedom
Multiple R-squared:  0.9575,    Adjusted R-squared:  0.9565 
F-statistic: 968.7 on 1 and 43 DF,  p-value: < 2.2e-16

fit2=lm(minutes~copiers_number-1,data=data0120)
summary(fit2)


Call:
lm(formula = minutes ~ copiers_number - 1, data = data0120)

Residuals:
     Min       1Q   Median       3Q      Max 
-22.4723  -3.6306   0.2111   6.3694  15.2639 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
copiers_number  14.9472     0.2264   66.01   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.816 on 44 degrees of freedom
Multiple R-squared:   0.99, Adjusted R-squared:  0.9898 
F-statistic:  4358 on 1 and 44 DF,  p-value: < 2.2e-16

The estimated regression function is \(\hat{Y}_i=14.9472X_i\) (that is, estimated minutes = 14.9472 * copiers_number).
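A caution when reading the two summaries: the R-squared values (0.9575 with an intercept, 0.99 without) are not directly comparable, because `summary.lm` measures total variation about zero rather than about the mean when the model has no intercept. The sketch below illustrates this on simulated data; the data and the names `fit_int`, `fit_rtto`, `r2_int`, and `r2_rtto` are hypothetical and are not the copier data.

```r
# Illustrative simulated data (hypothetical, not the copier data).
set.seed(1)
x <- rep(1:10, each = 4)
y <- 15 * x + rnorm(length(x), sd = 9)

fit_int  <- lm(y ~ x)      # model with intercept
fit_rtto <- lm(y ~ x - 1)  # regression through the origin

# With an intercept, R^2 uses the total sum of squares about the mean of y.
r2_int <- 1 - sum(resid(fit_int)^2) / sum((y - mean(y))^2)

# Without an intercept, summary.lm uses the total sum of squares about zero.
r2_rtto <- 1 - sum(resid(fit_rtto)^2) / sum(y^2)

# Each pair should agree: reported R^2 versus the hand-computed version.
print(c(reported = summary(fit_int)$r.squared,  by_hand = r2_int))
print(c(reported = summary(fit_rtto)$r.squared, by_hand = r2_rtto))
```

Because the denominators differ, a higher R-squared for the RTTO fit does not by itself indicate a better fit.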

Check the residuals.

Check \(\sum_{i=1}^{n}e_i\neq0\) and \(\sum_{i=1}^{n}e_iX_i=0\).

res=fit2$residuals
sum(res)

[1] -5.862797

sum(data0120$copiers_number*res)

[1] -1.461054e-13

Based on the calculation above, \(\sum_{i=1}^{n}e_i=-5.86\), which is not equal to 0, while \(\sum_{i=1}^{n}e_iX_i=-1.46\times10^{-13}\), which is 0 up to floating-point rounding error.
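This pattern follows from the least squares criterion for the RTTO model. As a brief sketch in standard notation (\(Q\) denotes the sum of squared deviations and \(b_1\) the slope estimate, symbols not used elsewhere on this page):

```latex
Q(b_1) = \sum_{i=1}^{n}(Y_i - b_1X_i)^2, \qquad
\frac{dQ}{db_1} = -2\sum_{i=1}^{n}X_i(Y_i - b_1X_i) = 0
\;\Longrightarrow\;
\sum_{i=1}^{n}X_ie_i = 0, \quad
b_1 = \frac{\sum_{i=1}^{n}X_iY_i}{\sum_{i=1}^{n}X_i^2}
```

There is only one normal equation, and it constrains \(\sum e_iX_i\). Nothing forces \(\sum e_i\) to be zero, because that constraint comes from the intercept's normal equation, which the RTTO model does not have.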

Confidence interval.

Estimate \(\beta_1\) with a 95% confidence interval.

confint(fit2,level=0.95)

                 2.5 %   97.5 %
copiers_number 14.4909 15.40356

The 95% confidence interval for \(\beta_1\) is (14.49,15.40).
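As a cross-check, the interval can be rebuilt by hand from the rounded values printed by `summary(fit2)`. The point to note is the degrees of freedom: n - 1 = 44, since the RTTO model estimates a single parameter. The variable names below are illustrative, and because the inputs are rounded the result should agree with `confint()` only to about three decimals.

```r
# Rebuild the 95% CI for the slope from the printed summary(fit2) values.
b1 <- 14.9472     # slope estimate (rounded, from the summary output)
s_b1 <- 0.2264    # standard error of the slope (rounded)
df_rtto <- 44     # n - 1: only one parameter is estimated in the RTTO model

t_mult <- qt(1 - 0.05 / 2, df_rtto)
ci_b1 <- c(lower = b1 - t_mult * s_b1, upper = b1 + t_mult * s_b1)
print(ci_b1)
```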

Prediction interval.

Predict the service time on a call in which six copiers are serviced. Use a 98% confidence level.

predict(fit2,newdata=data.frame(copiers_number=6),interval="prediction",level=0.98)

       fit     lwr      upr
1 89.68338 68.1491 111.2177

The 98% prediction interval for the service time on a call in which 6 copiers are serviced is (68.15,111.22).
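The same interval can be approximated by hand. For the RTTO model the prediction variance is \(MSE\,(1+X_h^2/\sum X_i^2)\), which equals \(MSE + X_h^2\,s^2\{b_1\}\), so the rounded values printed by `summary(fit2)` are enough. This is a sketch with illustrative variable names; the inputs are rounded, so expect agreement with `predict()` to roughly two decimals.

```r
# Rebuild the 98% prediction interval at 6 copiers from printed summary values.
b1 <- 14.9472       # slope estimate (rounded)
s_b1 <- 0.2264      # standard error of the slope (rounded)
sigma_hat <- 8.816  # residual standard error, i.e. sqrt(MSE) (rounded)
df_rtto <- 44       # n - 1 degrees of freedom for the RTTO model
x_h <- 6

y_hat <- b1 * x_h
# Prediction standard error: new-observation variance plus variance of the fit.
s_pred <- sqrt(sigma_hat^2 + x_h^2 * s_b1^2)
t_mult <- qt(1 - 0.02 / 2, df_rtto)

pi_6 <- c(fit = y_hat, lwr = y_hat - t_mult * s_pred, upr = y_hat + t_mult * s_pred)
print(pi_6)
```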

Simultaneous prediction interval.

Predict the number of minutes spent when there are 2, 3, 4, and 5 copiers to be serviced. Use a 95% family confidence coefficient and the Bonferroni method.

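alphaF=1-0.95 # family significance level
g=4 # number of prediction intervals in the family
alpha=alphaF/g # per-interval significance level
predict(fit2,newdata=data.frame(copiers_number=c(2,3,4,5)),interval="prediction",level=1-alpha)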

       fit      lwr      upr
1 29.89446  6.90242 52.88650
2 44.84169 21.81186 67.87151
3 59.78892 36.70630 82.87154
4 74.73615 51.58583 97.88647

The 95% simultaneous PIs for the number of minutes spent when there are 2, 3, 4, and 5 copiers to be serviced are as above.
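Following the earlier advice to compare multipliers before choosing a method, the sketch below compares the Bonferroni multiplier with the Scheffé multiplier, the usual textbook alternative for simultaneous prediction intervals. The Scheffé procedure is not used elsewhere on this page, so treat it as an assumption brought in for comparison; the degrees of freedom are n - 1 = 44 for the RTTO model, and the variable names are illustrative.

```r
# Compare Bonferroni and Scheffe multipliers for g = 4 simultaneous PIs.
# Assumes n = 45 and an RTTO model, so the error degrees of freedom are n - 1.
n <- 45
df_rtto <- n - 1
alphaF <- 1 - 0.95
g <- 4

B_mult <- qt(1 - alphaF / (2 * g), df_rtto)      # Bonferroni
S_mult <- sqrt(g * qf(1 - alphaF, g, df_rtto))   # Scheffe

print(c(Bonferroni = B_mult, Scheffe = S_mult))
```

Under these assumptions the Bonferroni multiplier is the smaller of the two, which supports using the Bonferroni method here.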

sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] workflowr_1.4.0 Rcpp_1.0.2      digest_0.6.20   rprojroot_1.3-2
 [5] backports_1.1.4 git2r_0.26.1    magrittr_1.5    evaluate_0.14  
 [9] stringi_1.4.3   fs_1.3.1        whisker_0.3-2   rmarkdown_1.15 
[13] tools_3.6.1     stringr_1.4.0   glue_1.3.1      xfun_0.9       
[17] yaml_2.2.0      compiler_3.6.1  htmltools_0.3.6 knitr_1.24