Due: 8 pm, Feb. 2
Each of you receives 500 numbers, denoted as $X_1,\dots,X_{500}$, all of which follow a normal distribution with an unknown mean $\mu$ and an unknown variance $\sigma^2$. Please read the following questions carefully. Note that not all of the numbers will be used!
The goals include finding a point estimator and a confidence interval for $\mu$ with good accuracy.
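For reference, the intervals for $\mu$ that the solution code computes take the following standard forms. Here $\bar X$ and $s$ are the sample mean and sample standard deviation of $n$ observations, and $z_{\alpha/2}$ and $t_{\alpha/2,\,n-1}$ are upper-tail percentile points. The $z$ version plugs $s$ in for the unknown $\sigma$ and is therefore an approximation. The $t$ version is exact for normal data.

$$
\begin{aligned}
&\bar X \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}, && z_{\alpha/2} = \texttt{qnorm}(1-\alpha/2),\\
&\bar X \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}, && t_{\alpha/2,\,n-1} = \texttt{qt}(1-\alpha/2,\ n-1).
\end{aligned}
$$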
Take the first 10 numbers of `x` and save them as a new vector `y`: `y=x[1:10]`.
Recall that `sum( (y-mean(y))^2 )/(length(y)-1)` gives you the sample variance. So does `var(y)`. Run both commands to check that they match.
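A minimal sketch of that check follows. It uses hypothetical normal data from `rnorm()` instead of the assignment's `data_3.txt`. It compares the results with `all.equal()` rather than `==` because the two computations may differ in the last floating-point digits.

```r
# Hypothetical data: ten draws from a normal distribution
# (in the assignment, y would be the first 10 numbers of x instead)
set.seed(448)
y <- rnorm(10, mean = 5, sd = 2)

# Sample variance from the definition: sum of squared deviations over n - 1
v_manual <- sum((y - mean(y))^2) / (length(y) - 1)

# Built-in sample variance, which also uses the n - 1 denominator
v_builtin <- var(y)

print(c(manual = v_manual, builtin = v_builtin))
print(isTRUE(all.equal(v_manual, v_builtin)))  # TRUE
```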
To find a percentile point of the standard normal distribution, do not use the SOA normal table or Table 4 in the textbook. Instead, use commands such as `z=qnorm(0.995)`, `z=qnorm(0.975)`, or `z=qnorm(0.95)`. This request helps us grade your answers numerically. For example, `qnorm(0.975)` gives 1.959964, which corresponds to 1.96 in the normal tables. If you use 1.96 instead of 1.959964, your answer may be mistakenly graded as incorrect. Type `?qnorm` in R for more information on the function `qnorm()`.
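The short sketch below shows how `qnorm()` relates to its inverse `pnorm()`. It also shows the symmetry of the standard normal, which gives the lower percentile point as the negative of the upper one. All calls are base R.

```r
# qnorm() takes a LEFT-tail probability and returns the standard normal percentile point
z90 <- qnorm(0.95)   # for a two-sided 90% interval
z95 <- qnorm(0.975)  # for a two-sided 95% interval
z99 <- qnorm(0.995)  # for a two-sided 99% interval
print(c(z90, z95, z99))  # approx. 1.644854 1.959964 2.575829

# pnorm() undoes qnorm(): it maps the percentile point back to the left-tail probability
print(pnorm(z95))  # 0.975

# Symmetry: the 2.5% point is the negative of the 97.5% point
print(isTRUE(all.equal(qnorm(0.025), -qnorm(0.975))))  # TRUE
```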
To find a percentile point of a t distribution, do not use Table 5 in the textbook. Instead, use commands such as `t=qt(0.995,9)`, `t=qt(0.975,9)`, or `t=qt(0.95,9)`. The first argument is the left-tail (not right-tail) probability, and the second argument is the degrees of freedom (9 here). For example, `qt(0.995,9)` gives 3.249836, which corresponds to 3.250 in Table 5, row 9, last column. Compare `qt(0.995,9)`, `qt(0.99,9)`, `qt(0.975,9)`, and `qt(0.95,9)` with row 9 of Table 5. Type `?qt` in R for more information on the function `qt()`.
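The sketch below wraps `qt()` in a small helper for a two-sided t interval. `t_interval` is a hypothetical name introduced only for illustration and is not part of the assignment code. The sketch also shows that the t percentile point exceeds the normal one for small degrees of freedom and approaches it as the degrees of freedom grow.

```r
# Hypothetical helper: two-sided t interval for the mean of a normal sample
t_interval <- function(y, conf = 0.90) {
  n <- length(y)
  tcrit <- qt(1 - (1 - conf) / 2, df = n - 1)  # left-tail probability, n - 1 df
  half <- tcrit * sd(y) / sqrt(n)              # margin of error
  c(lower = mean(y) - half, upper = mean(y) + half)
}

# t percentile points shrink toward the normal one as df increases
print(c(t_9 = qt(0.995, 9), t_100 = qt(0.995, 100), z = qnorm(0.995)))

# Usage on hypothetical data (not the assignment's data file)
set.seed(1)
y <- rnorm(10, mean = 5, sd = 2)
print(t_interval(y, conf = 0.90))
```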
`x[1:n]`. Here $n$ is the minimum sample size that you obtained in the last question.

    setwd("C:/448wd")
    dat = read.csv('data_3.txt',header=FALSE)
    dat <- as.matrix(dat)
    x <- dat[1,]
    pilot = x[1:10]                   # pilot sample: first 10 numbers
    ans1 = var(pilot)                 # sample variance of the pilot
    # 90% z interval for mu from the pilot
    ans2a = mean(pilot) - qnorm(0.95) * sqrt(ans1/10)
    ans2b = mean(pilot) + qnorm(0.95) * sqrt(ans1/10)
    # 90% t interval for mu from the pilot (9 degrees of freedom)
    ans3a = mean(pilot) - qt(0.95,9) * sqrt(ans1/10)
    ans3b = mean(pilot) + qt(0.95,9) * sqrt(ans1/10)
    # smallest n with qnorm(0.995) * s / sqrt(n) <= 0.5, using the pilot s
    nsize = ceiling( ( qnorm(0.995)/0.5*sqrt(ans1) )^2 )
    # note: ceiling takes a single numeric argument x and returns a numeric vector
    # containing the smallest integers not less than the corresponding elements of x.
    ans4 <- nsize*12
    newdata = x[1:nsize]
    ans5 = mean(newdata)
    # 90% z interval for mu from the first nsize numbers
    ans6a = mean(newdata) - qnorm(0.95) * sd(newdata)/sqrt(nsize)
    ans6b = mean(newdata) + qnorm(0.95) * sd(newdata)/sqrt(nsize)
    print( c(ans1,ans2a,ans2b,ans3a,ans3b,ans4,ans5,ans6a,ans6b) )
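The `nsize` line in the solution code appears to pick the smallest $n$ for which the 99% margin of error is at most 0.5. In that calculation, the pilot standard deviation $s$ (the square root of `ans1`) stands in for $\sigma$. Under that reading, the step is:

$$
z_{0.005}\,\frac{s}{\sqrt{n}} \le 0.5
\iff n \ge \left(\frac{z_{0.005}\,s}{0.5}\right)^{2}
\quad\Longrightarrow\quad
n = \left\lceil \left(\frac{z_{0.005}\,s}{0.5}\right)^{2} \right\rceil,
\qquad z_{0.005} = \texttt{qnorm(0.995)}.
$$

`ceiling()` rounds up because rounding down would leave the margin of error slightly above 0.5.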
Round to at least 3 decimal places unless otherwise stated.
Each of you receives 225 numbers, denoted as $X_1,\dots,X_{n}$, where $n=225$. It is known that $X_i\sim \text{Unif}(0,\theta)$ independently, with $\theta$ unknown.
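For reference, the standard facts about a $\text{Unif}(0,\theta)$ sample below motivate three estimators of $\theta$: $2\bar X$, $\max_i X_i$, and the rescaled maximum $\frac{n+1}{n}\max_i X_i$.

$$
\begin{aligned}
E(X_i) &= \frac{\theta}{2}, \qquad \operatorname{Var}(X_i) = \frac{\theta^2}{12}
\;\Longrightarrow\; E(2\bar X) = \theta, \qquad \operatorname{SE}(\bar X) = \frac{\theta}{\sqrt{12\,n}},\\
P\Big(\max_i X_i \le t\Big) &= \left(\frac{t}{\theta}\right)^{n},\quad 0 \le t \le \theta
\;\Longrightarrow\; E\Big(\max_i X_i\Big) = \frac{n}{n+1}\,\theta .
\end{aligned}
$$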
You may use `sd(x)` to find the sample standard deviation, but you may want to use `sqrt(sum( (x-mean(x) )^2)/(length(x)-1))` to familiarize yourself with the calculation (the two should give the same answer). Moreover, the “2-standard-error bound” you find here should be reasonably close to that in the last question. Round to 5 decimal places for this question.

    setwd("C:/448wd")
    dat = read.csv('data_2.txt',header=FALSE)
    dat <- as.matrix(dat)
    x <- dat[1,]
    ans1 = mean(x)
    ans2 = 2*mean(x)                              # method-of-moments estimate of theta
    ans3 = 2*2*mean(x)/sqrt(12)/sqrt(length(x))   # 2 x SE of mean(x), theta estimated by 2*mean(x)
    ans4 = 2*sd(x)/sqrt(length(x))                # 2 x SE of mean(x), using the sample sd
    ans5 = max(x)                                 # maximum likelihood estimate of theta
    ans6 = max(x)*(length(x)+1)/length(x)         # unbiased rescaling of the maximum
    # 95% confidence interval for theta based on the maximum
    ans7a = max(x)/(0.975^(1/length(x)))
    ans7b = max(x)/(0.025^(1/length(x)))
    print( c(ans1,ans2,ans3,ans4,ans5,ans6,ans7a,ans7b) )
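As a sanity check on the interval computed as `ans7a` and `ans7b`, the sketch below simulates many samples from a uniform distribution with a known $\theta$. It records how often the interval covers that $\theta$, and it also averages the three point estimators. The values of `theta`, `reps`, and the seeds are arbitrary choices for illustration, not assignment data.

```r
# Hypothetical simulation settings (illustrative only, not assignment data)
set.seed(2024)
theta <- 3      # true upper endpoint, known here so coverage can be checked
n     <- 225    # same sample size as in the assignment
reps  <- 10000  # number of simulated samples

# Pivot: for M = max(X_1, ..., X_n), P(M / theta <= u) = u^n on [0, 1], so
# P(0.025^(1/n) <= M / theta <= 0.975^(1/n)) = 0.975 - 0.025 = 0.95,
# which rearranges to M / 0.975^(1/n) <= theta <= M / 0.025^(1/n).
covered <- replicate(reps, {
  m <- max(runif(n, min = 0, max = theta))
  (m / 0.975^(1 / n) <= theta) && (theta <= m / 0.025^(1 / n))
})
coverage <- mean(covered)  # fraction of intervals that contain theta
print(coverage)

# Average of the three point estimators over fresh simulated samples
est <- replicate(2000, {
  x <- runif(n, min = 0, max = theta)
  c(mom = 2 * mean(x), mle = max(x), unbiased = max(x) * (n + 1) / n)
})
print(rowMeans(est))
```

With 10,000 replications, the estimated coverage should land within about a percentage point of 0.95. The average of `max(x)` should sit slightly below `theta`, reflecting its downward bias.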
Each of you is given a different data set of 64 observations drawn from an unknown distribution. Please submit your answers at https://docs.google.com/a/binghamton.edu/forms/d/16jhF5eUpPY7pXXmzaAHAgiFZY7S8JypBYKcEfSUcBjE/viewform
Note that you need to log in to your Bmail account to submit the answers. Please do this by 7 pm on Feb. 2.
The following R code may help you get started. Copy each line into the R console and press Enter.

    ##### Assume that you have a Windows machine. First create a folder called "448wd" on the C drive.
    ##### I trust that you can do this on your own. If not, search for a solution on Google or YouTube.
    ##### Set the R working directory
    setwd("C:/448wd")
    ### Read the data file. Make sure that your data file has been copied to the folder.
    dat = read.csv('data_1.txt',header=FALSE)
    ### The variable named "dat" that you just read into R is a data frame.
    ### We need to convert it to a matrix.
    dat <- as.matrix(dat)
    ## dat is now a 1x64 matrix (1 row and 64 columns)
    x <- dat[1,]   ## Take the first row of this matrix as your sample
    # Try the following.
    mean(x)     # sample mean
    median(x)   # sample median
    max(x)      # maximum
    min(x)      # minimum
    y = (x > 2)
    y           # We can see that y is a logical vector of TRUE and FALSE.
    ### We can operate directly on a logical vector with the convention that TRUE = 1 and FALSE = 0. For example:
    mean(y)
    sum(y)
    ## OK. You are ready to answer the questions.
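The logical-vector convention in the starter code can be seen on a small vector. The numbers in `x` here are hypothetical and do not come from `data_1.txt`. Note that `>` is strict, so an observation exactly equal to 2 counts as FALSE.

```r
# Hypothetical sample (the assignment reads x from data_1.txt instead)
x <- c(1.5, 2.7, 0.4, 3.9, 2.0, 5.1)

y <- (x > 2)   # logical vector: TRUE where the observation exceeds 2
print(y)       # FALSE TRUE FALSE TRUE FALSE TRUE

# TRUE counts as 1 and FALSE as 0, so sum() counts and mean() gives a proportion
print(sum(y))  # 3 observations are greater than 2
print(mean(y)) # 0.5 is the proportion greater than 2
```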

    setwd("C:/448wd")
    dat = read.csv('data_1.txt',header=FALSE)
    dat <- as.matrix(dat)
    x <- dat[1,]
    ans1 <- mean(x)        # sample mean
    ans2 <- max(x)         # maximum
    ans3 <- min(x)         # minimum
    ans4 <- mean( x > 4 )  # proportion of observations greater than 4
    print( c(ans1,ans2,ans3,ans4) )