Last updated: 2019-11-15
Knit directory: STA_463_563_Fall2019/
This reproducible R Markdown analysis was created with workflowr (version 1.4.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20190905) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know whether there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .DS_Store
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: analysis/.DS_Store
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 1c951d9 | dleelab | 2019-11-15 | created |
# CH01PR20: service time in minutes (Y) and number of copiers serviced (X)
data0120=read.table("http://www.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Chapter%20%201%20Data%20Sets/CH01PR20.txt")
colnames(data0120)=c("minutes","copiers_number")
fit=lm(minutes~copiers_number,data=data0120)   # simple linear regression with intercept
alphaF=1-0.9   # family error rate for a 90% family confidence coefficient
n=nrow(data0120)
# Working-Hotelling multiplier: W^2 = 2F(1 - alphaF; 2, n - 2)
W=sqrt(2*qf((1-alphaF),2,(n-2)))   # or W=sqrt(2*qf((1-alphaF),length(fit$coefficients),fit$df.residual))
Xh=c(3,5,7)   # numbers of copiers at which the mean service time is estimated
Yh_hat=predict(fit,newdata=data.frame(copiers_number=Xh))
MSE=sum((fit$residuals)^2)/(n-2)
xbar=mean(data0120$copiers_number)
Sxx=sum((data0120$copiers_number-xbar)^2)
# standard error of the estimated mean response at each Xh
s_Yh_hat=sqrt(MSE*(1/n+(Xh-xbar)^2/Sxx))
simulCI=cbind(Yh_hat,Yh_hat-W*s_Yh_hat,Yh_hat+W*s_Yh_hat)
colnames(simulCI)=c("fitted","lower","upper")
rownames(simulCI)=c("3","5","7")
simulCI
fitted lower upper
3 44.52559 40.83265 48.21853
5 74.59608 71.66417 77.52800
7 104.66658 101.11278 108.22038
The 90% simultaneous confidence intervals for the mean service time at 3, 5, and 7 copiers, using the Working-Hotelling (W-H) method, are shown above.
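As a cross-check on the hand computation of \(s\{\hat{Y}_h\}\), the sketch below takes the same standard errors from `predict(..., se.fit = TRUE)` and applies \(W=\sqrt{pF(1-\alpha_F;p,n-p)}\), the general form behind the commented alternative for `W` above. The helper name `wh_band()` is hypothetical, and the example runs on R's built-in `cars` data rather than the copier data so it needs no download.

```r
# Working-Hotelling band for the mean response, using the standard errors
# returned by predict(..., se.fit = TRUE) instead of computing s{Yh_hat} by hand.
# wh_band() is a hypothetical helper name introduced for this sketch.
wh_band <- function(fit, newdata, alphaF) {
  p  <- length(coef(fit))                           # number of parameters (2 for SLR)
  W  <- sqrt(p * qf(1 - alphaF, p, fit$df.residual)) # W-H multiplier
  pr <- predict(fit, newdata = newdata, se.fit = TRUE)
  cbind(fitted = pr$fit,
        lower  = pr$fit - W * pr$se.fit,
        upper  = pr$fit + W * pr$se.fit)
}

# Self-contained example on the built-in 'cars' data (not the copier data)
fit_cars <- lm(dist ~ speed, data = cars)
wh_band(fit_cars, data.frame(speed = c(10, 15, 20)), alphaF = 0.10)
```

For the copier model, `wh_band(fit, data.frame(copiers_number = c(3, 5, 7)), alphaF = 0.10)` uses the same W and \(s\{\hat{Y}_h\}\) as the code above.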
g=3              # number of intervals in the family
alpha=alphaF/g   # Bonferroni: each interval gets error rate alphaF/g
predict(fit,newdata=data.frame(copiers_number=c(3,5,7)),interval="confidence",level=1-alpha)
fit lwr upr
1 44.52559 40.84285 48.20832
2 74.59608 71.67227 77.51989
3 104.66658 101.12260 108.21056
The 90% simultaneous confidence intervals using the Bonferroni method are shown above.
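The `level = 1 - alpha` trick above works because `predict()` then uses the Bonferroni multiplier \(B=t(1-\alpha_F/(2g);n-2)\). A sketch that writes the multiplier out explicitly, using the hypothetical helper name `bonf_band()` and the built-in `cars` data for illustration:

```r
# Bonferroni joint CIs for g mean responses, written out by hand:
# Yh_hat +/- B * s{Yh_hat}, with B = t(1 - alphaF/(2g); n - p).
# bonf_band() is a hypothetical helper name introduced for this sketch.
bonf_band <- function(fit, newdata, alphaF) {
  g  <- nrow(newdata)                              # number of intervals in the family
  B  <- qt(1 - alphaF / (2 * g), fit$df.residual)  # Bonferroni multiplier
  pr <- predict(fit, newdata = newdata, se.fit = TRUE)
  cbind(fit = pr$fit,
        lwr = pr$fit - B * pr$se.fit,
        upr = pr$fit + B * pr$se.fit)
}

fit_cars <- lm(dist ~ speed, data = cars)   # built-in data, for illustration only
nd <- data.frame(speed = c(10, 15, 20))
bonf_band(fit_cars, nd, alphaF = 0.10)
# Same numbers as predict() with the per-interval level 1 - alphaF/g
predict(fit_cars, newdata = nd, interval = "confidence", level = 1 - 0.10 / 3)
```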
From the results above, which simultaneous confidence interval method do you prefer? Why? (In practice, before computing simultaneous CIs or PIs, compare the multipliers first and use whichever method yields the smaller multiplier.)
Comparing the results above, I prefer the Bonferroni method because it gives slightly shorter confidence intervals. Both methods center each interval at the same \(\hat{Y}_h\) and use the same \(s\{\hat{Y}_h\}\), so the lengths differ only through the multiplier, and in this example the Bonferroni multiplier (2.1986) is slightly smaller than the W-H multiplier (2.2047). Therefore, before computing simultaneous CIs (PIs), we compare the multipliers of the different methods and then do the calculation.
B=qt((1-alphaF/g/2),(n-2))   # Bonferroni multiplier t(1 - alphaF/(2g); n - 2); named B to avoid masking base::t()
W
[1] 2.204725
B
[1] 2.198632
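W does not depend on g, while B grows with g, so which method wins depends on the family size. A small sketch (hypothetical helper `compare_multipliers()`, assuming p = 2 parameters as in simple linear regression) that tabulates both multipliers for the copier example's n = 45, which matches the 43 residual degrees of freedom of the full model:

```r
# Compare the Working-Hotelling multiplier W (does not depend on g)
# with the Bonferroni multiplier B (grows with g) for g mean responses.
# compare_multipliers() is a hypothetical helper name; p = 2 for SLR.
compare_multipliers <- function(n, g, alphaF, p = 2) {
  W <- sqrt(p * qf(1 - alphaF, p, n - p))
  B <- qt(1 - alphaF / (2 * g), n - p)   # vectorized over g
  data.frame(g = g, W = W, B = B,
             smaller = ifelse(B < W, "Bonferroni", "W-H"))
}

# Copier example: n = 45 calls, 90% family confidence
compare_multipliers(n = 45, g = 1:6, alphaF = 0.10)
```

For g = 3 this reproduces W = 2.204725 and B = 2.198632 from above; for a large enough family the W-H multiplier becomes the smaller one.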
View the simple linear regression fit and, recalling the discussion we had in class, adjust your model by fitting a regression through the origin (RTTO) model instead. Obtain the estimated regression function.
summary(fit)
Call:
lm(formula = minutes ~ copiers_number, data = data0120)
Residuals:
Min 1Q Median 3Q Max
-22.7723 -3.7371 0.3334 6.3334 15.4039
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.5802 2.8039 -0.207 0.837
copiers_number 15.0352 0.4831 31.123 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.914 on 43 degrees of freedom
Multiple R-squared: 0.9575, Adjusted R-squared: 0.9565
F-statistic: 968.7 on 1 and 43 DF, p-value: < 2.2e-16
fit2=lm(minutes~copiers_number-1,data=data0120)   # "-1" drops the intercept (regression through the origin)
summary(fit2)
Call:
lm(formula = minutes ~ copiers_number - 1, data = data0120)
Residuals:
Min 1Q Median 3Q Max
-22.4723 -3.6306 0.2111 6.3694 15.2639
Coefficients:
Estimate Std. Error t value Pr(>|t|)
copiers_number 14.9472 0.2264 66.01 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.816 on 44 degrees of freedom
Multiple R-squared: 0.99, Adjusted R-squared: 0.9898
F-statistic: 4358 on 1 and 44 DF, p-value: < 2.2e-16
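In the full model the intercept is not significant (estimate -0.5802, p-value 0.837), which supports dropping it. The estimated regression function of the RTTO model is \(\hat{Y}_i=14.9472X_i\) (i.e., estimated minutes = 14.9472 * copiers_number).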
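Without an intercept, the least squares slope has the closed form \(b_1=\sum X_iY_i/\sum X_i^2\), with no centering. A quick sketch checking this against `lm()` with the intercept removed; `rto_slope()` is a hypothetical name, and `cars` is used only so the example runs offline.

```r
# Regression through the origin: least squares gives b1 = sum(X*Y) / sum(X^2)
# (no centering, unlike the usual Sxy/Sxx slope of the full model).
# rto_slope() is a hypothetical helper name introduced for this sketch.
rto_slope <- function(x, y) sum(x * y) / sum(x^2)

x <- cars$speed; y <- cars$dist   # built-in data, for illustration only
rto_slope(x, y)
coef(lm(y ~ x - 1))               # same estimate from lm() with the intercept removed
```

With the copier data, `rto_slope(data0120$copiers_number, data0120$minutes)` gives the estimate 14.9472 reported by `summary(fit2)`.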
Check that, for the RTTO fit, \(\sum_{i=1}^{n}e_i\neq0\) while \(\sum_{i=1}^{n}e_iX_i=0\).
res=fit2$residuals
sum(res)
[1] -5.862797
sum(data0120$copiers_number*res)
[1] -1.461054e-13
Based on the calculation above, \(\sum_{i=1}^{n}e_i=-5.86\neq0\), while \(\sum_{i=1}^{n}e_iX_i=-1.46\times10^{-13}\), which is 0 up to floating-point rounding error.
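Why the two sums behave differently: in the full model the normal equation for \(\beta_0\) forces \(\sum e_i=0\), but the RTTO model has only one normal equation, the one for \(\beta_1\). Minimizing the least squares criterion gives:

```latex
\begin{aligned}
Q(\beta_1) &= \sum_{i=1}^{n}\left(Y_i-\beta_1X_i\right)^2,\\
\frac{dQ}{d\beta_1} &= -2\sum_{i=1}^{n}X_i\left(Y_i-\beta_1X_i\right),\\
\left.\frac{dQ}{d\beta_1}\right|_{\beta_1=b_1}=0
  &\;\Longrightarrow\; \sum_{i=1}^{n}X_i\left(Y_i-b_1X_i\right)=\sum_{i=1}^{n}e_iX_i=0,\\
b_1 &= \frac{\sum_{i=1}^{n}X_iY_i}{\sum_{i=1}^{n}X_i^2}.
\end{aligned}
```

Nothing in this derivation constrains \(\sum e_i\), which is why it came out as -5.86.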
Estimate \(\beta_1\) with a 95% confidence interval.
confint(fit2,level=0.95)
2.5 % 97.5 %
copiers_number 14.4909 15.40356
The 95% confidence interval for \(\beta_1\) is (14.49,15.40).
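In the RTTO model only one parameter is estimated, so \(MSE=SSE/(n-1)\), \(s^2\{b_1\}=MSE/\sum X_i^2\), and the t quantile uses n - 1 degrees of freedom (44 here, as in the `summary(fit2)` output). A sketch of the interval by hand, with the hypothetical helper `rto_slope_ci()` and the built-in `cars` data standing in for the copier data:

```r
# CI for beta1 in regression through the origin, by hand:
# b1 +/- t(1 - alpha/2; n - 1) * s{b1}, with s^2{b1} = MSE / sum(X^2)
# and MSE = SSE / (n - 1), since only one parameter is estimated.
# rto_slope_ci() is a hypothetical helper name introduced for this sketch.
rto_slope_ci <- function(x, y, level = 0.95) {
  n   <- length(x)
  b1  <- sum(x * y) / sum(x^2)
  mse <- sum((y - b1 * x)^2) / (n - 1)
  se  <- sqrt(mse / sum(x^2))
  tq  <- qt(1 - (1 - level) / 2, n - 1)
  c(lower = b1 - tq * se, upper = b1 + tq * se)
}

rto_slope_ci(cars$speed, cars$dist)   # built-in data, for illustration only
```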
Predict the service time on a call in which six copiers are serviced. Use a 98% confidence level.
predict(fit2,newdata=data.frame(copiers_number=6),interval="prediction",level=0.98)
fit lwr upr
1 89.68338 68.1491 111.2177
The 98% prediction interval for the service time on a call in which 6 copiers are serviced is (68.15, 111.22).
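For a new observation at \(X_h\) under RTTO, \(s^2\{\text{pred}\}=MSE\,(1+X_h^2/\sum X_i^2)\) and the interval uses \(t(1-\alpha/2;n-1)\). A sketch that reproduces `predict(..., interval = "prediction")` by hand; `rto_pred_interval()` is a hypothetical name and `cars` stands in for the copier data:

```r
# Prediction interval for a new observation at xh under regression through the origin:
# Yh_hat +/- t(1 - alpha/2; n - 1) * s{pred}, with s^2{pred} = MSE * (1 + xh^2 / sum(X^2)).
# rto_pred_interval() is a hypothetical helper name introduced for this sketch.
rto_pred_interval <- function(x, y, xh, level) {
  n    <- length(x)
  b1   <- sum(x * y) / sum(x^2)
  mse  <- sum((y - b1 * x)^2) / (n - 1)
  s_pr <- sqrt(mse * (1 + xh^2 / sum(x^2)))
  tq   <- qt(1 - (1 - level) / 2, n - 1)
  yh   <- b1 * xh
  cbind(fit = yh, lwr = yh - tq * s_pr, upr = yh + tq * s_pr)
}

rto_pred_interval(cars$speed, cars$dist, xh = 15, level = 0.98)   # illustration only
```

With the copier data, `rto_pred_interval(data0120$copiers_number, data0120$minutes, xh = 6, level = 0.98)` uses the same formula as `predict()` above.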
Estimate the number of minutes spent when there are 2, 3, 4, and 5 copiers to be serviced. Use a 95% family confidence coefficient and the Bonferroni method.
alphaF=1-0.95
g=4
alpha=alphaF/g
predict(fit2,newdata=data.frame(copiers_number=c(2,3,4,5)),interval="prediction",level=1-alpha)   # each PI at level 1 - alphaF/g
fit lwr upr
1 29.89446 6.90242 52.88650
2 44.84169 21.81186 67.87151
3 59.78892 36.70630 82.87154
4 74.73615 51.58583 97.88647
The 95% Bonferroni simultaneous prediction intervals for the number of minutes spent when 2, 3, 4, and 5 copiers are serviced are shown above.
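Following the earlier note about comparing multipliers first, the Bonferroni choice for these four prediction intervals can be checked against the Scheffé multiplier \(S=\sqrt{gF(1-\alpha_F;g,n-p)}\) for g new observations. A sketch assuming p = 1 (the single RTTO parameter) and n = 45; `pi_multipliers()` is a hypothetical helper name:

```r
# For g simultaneous prediction intervals:
#   Bonferroni B = t(1 - alphaF/(2g); n - p)
#   Scheffe    S = sqrt(g * F(1 - alphaF; g, n - p))
# Use whichever is smaller. p = 1 because the RTTO model has one parameter.
# pi_multipliers() is a hypothetical helper name introduced for this sketch.
pi_multipliers <- function(n, g, alphaF, p = 1) {
  c(B = qt(1 - alphaF / (2 * g), n - p),
    S = sqrt(g * qf(1 - alphaF, g, n - p)))
}

pi_multipliers(n = 45, g = 4, alphaF = 0.05)   # copier example: four PIs, 95% family
```

For this family B is smaller than S, consistent with using the Bonferroni method here.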
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.6
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] workflowr_1.4.0 Rcpp_1.0.2 digest_0.6.20 rprojroot_1.3-2
[5] backports_1.1.4 git2r_0.26.1 magrittr_1.5 evaluate_0.14
[9] stringi_1.4.3 fs_1.3.1 whisker_0.3-2 rmarkdown_1.15
[13] tools_3.6.1 stringr_1.4.0 glue_1.3.1 xfun_0.9
[17] yaml_2.2.0 compiler_3.6.1 htmltools_0.3.6 knitr_1.24