Selecting a sample in R is done with the sample() function. It takes two main arguments: the data, denoted by x, and the sample size, denoted by size. By default, sample() draws without replacement. To sample with replacement, add the argument replace = TRUE. Once the sample is chosen, we can do some simple inference, such as point estimation, using functions we're already familiar with. The mean() function gives the sample mean, which is the point estimate of the population mean. In a similar way, the sd() function gives the sample standard deviation, and the mean of a logical condition gives the sample proportion.
> sample(x, n) # Sampling
> sample(x, n, replace = TRUE) # Sampling with Replacement
> mean(x) # Sample Mean
> sd(x) # Sample Standard Deviation
> mean(condition) # Sample Proportion
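As a minimal sketch of the calls above, the following draws a sample from a made-up population and computes the three point estimates. The vector `population`, the sample size of 10, and the condition `s > 50` are assumptions chosen only for illustration. Note that the argument that switches on sampling with replacement is named `replace`.

```r
set.seed(42)                        # fix the random seed so the draws are reproducible
population <- 1:100                 # hypothetical population (illustration only)

s     <- sample(population, 10)                  # sampling without replacement (default)
s_rep <- sample(population, 10, replace = TRUE)  # sampling with replacement

mean(s)        # sample mean: point estimate of the population mean
sd(s)          # sample standard deviation
mean(s > 50)   # sample proportion of values greater than 50
```

The proportion works because the comparison `s > 50` yields a logical vector, and mean() treats TRUE as 1 and FALSE as 0.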
The first step in calculating a confidence interval is setting $1 - \alpha$ equal to the confidence coefficient, the decimal form of the confidence level. Solving for $\alpha/2$ gives $\alpha/2 = (1 - \text{confidence coefficient})/2$. For the sigma known case, the value of $z_{\alpha/2}$ can be found by passing $\alpha/2$ to the qnorm() function. Set the argument lower.tail to FALSE because $z_{\alpha/2}$ is the z-value with an area of $\alpha/2$ in the upper tail. The margin of error is then $z_{\alpha/2} \cdot \sigma / \sqrt{n}$, and the confidence interval is found by subtracting the margin of error from, and adding it to, the sample mean.
> ahalf <- (1 - conf) / 2 # α/2
> z <- qnorm(ahalf, lower.tail = FALSE) # z_α/2
> z * sigma / sqrt(length(x)) # Margin of Error
> mean(x) - z * sigma / sqrt(length(x)) # Lower Bound of Confidence Interval
> mean(x) + z * sigma / sqrt(length(x)) # Upper Bound of Confidence Interval
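The steps above can be put together in one short, runnable sketch. The data in `x`, the value of `sigma`, and the 95% confidence level are assumptions made up for this example.

```r
x     <- c(12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7)  # hypothetical sample
sigma <- 0.25    # population standard deviation, assumed known
conf  <- 0.95    # confidence coefficient

ahalf <- (1 - conf) / 2                  # alpha / 2
z     <- qnorm(ahalf, lower.tail = FALSE)  # z_{alpha/2}, about 1.96 for 95%
moe   <- z * sigma / sqrt(length(x))     # margin of error

ci <- c(lower = mean(x) - moe, upper = mean(x) + moe)
ci
```

Storing the margin of error in a variable avoids retyping the formula for each bound.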
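The steps for finding confidence intervals in the sigma unknown case are nearly identical to the steps for the sigma known case. The only differences are that you use the t-distribution instead of the z-distribution and the sample standard deviation, sd(x), in place of sigma. So, instead of using the qnorm() function, use the qt() function, which also takes the degrees of freedom, df; for a single sample this is length(x) - 1.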
> ahalf <- (1 - conf) / 2 # α/2
> t <- qt(ahalf, df, lower.tail = FALSE) # t_α/2
> t * sd(x) / sqrt(length(x)) # Margin of Error
> mean(x) - t * sd(x) / sqrt(length(x)) # Lower Bound of Confidence Interval
> mean(x) + t * sd(x) / sqrt(length(x)) # Upper Bound of Confidence Interval
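One way to check the manual calculation is to compare it with the interval that R's built-in t.test() reports. The sketch below does this for a made-up sample; the data in `x` and the 95% level are assumptions. The critical value is stored as `t_crit` rather than `t` here, since `t` is also the name of R's transpose function.

```r
x    <- c(12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7)  # hypothetical sample
conf <- 0.95

df     <- length(x) - 1                     # degrees of freedom for one sample
ahalf  <- (1 - conf) / 2                    # alpha / 2
t_crit <- qt(ahalf, df, lower.tail = FALSE) # t_{alpha/2}
moe    <- t_crit * sd(x) / sqrt(length(x))  # margin of error

manual  <- c(mean(x) - moe, mean(x) + moe)
builtin <- t.test(x, conf.level = conf)$conf.int

all.equal(manual, as.numeric(builtin))      # TRUE when both methods agree
```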
To calculate a confidence interval for the population mean when the population standard deviation is known, we will need four things. First, we need the point estimate of the population mean, which was shown above. Second, we need the
> z <- (mean(x) - mu) / (sigma / sqrt(length(x))) # Test Statistic
> pnorm(z) # Lower Tail p-Value
> pnorm(z, lower.tail = FALSE) # Upper Tail p-Value
> 2 * pnorm(z) # Two-Tailed p-Value (Negative z)
> 2 * pnorm(z, lower.tail = FALSE) # Two-Tailed p-Value (Positive z)
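A small worked example of the p-value calculations follows. The sample `x`, the hypothesized mean `mu`, and `sigma` are assumptions for illustration. The two-tailed line uses abs(z), which is one possible way to cover both the negative and the positive case with a single expression.

```r
x     <- c(12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7)  # hypothetical sample
mu    <- 12      # hypothesized population mean
sigma <- 0.25    # population standard deviation, assumed known

z <- (mean(x) - mu) / (sigma / sqrt(length(x)))   # test statistic

p_lower <- pnorm(z)                               # lower tail p-value
p_upper <- pnorm(z, lower.tail = FALSE)           # upper tail p-value
p_two   <- 2 * pnorm(abs(z), lower.tail = FALSE)  # two-tailed, works for either sign of z

c(lower = p_lower, upper = p_upper, two_tailed = p_two)
```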
> qnorm(alpha) # Lower Tail Critical Value
> qnorm(alpha, lower.tail = FALSE) # Upper Tail Critical Value
> qnorm(alpha / 2) # Two-Tailed Critical Value (Lower)
> qnorm(alpha / 2, lower.tail = FALSE) # Two-Tailed Critical Value (Upper)
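The critical values become a decision rule once the test statistic is compared against them. The sketch below shows the two-tailed case; the significance level and the value of `z` are hypothetical, and the variable names are illustrative only.

```r
alpha <- 0.05    # significance level (assumed)
z     <- 2.1     # hypothetical test statistic

crit_lower <- qnorm(alpha / 2)                      # lower critical value
crit_upper <- qnorm(alpha / 2, lower.tail = FALSE)  # upper critical value

# Reject the null hypothesis when z falls in either tail beyond a critical value
reject <- z < crit_lower || z > crit_upper
reject

# The p-value approach leads to the same decision
p_two <- 2 * pnorm(abs(z), lower.tail = FALSE)
p_two < alpha
```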
Hypothesis tests in the sigma unknown case are very similar to the sigma known case. The only difference is that you use the t-distribution instead of the z-distribution: use pt() instead of pnorm(), and qt() instead of qnorm(). Since the population standard deviation is unknown, the sample standard deviation must be used; it can be calculated using the sd() function. You also need to pass a degrees of freedom argument, df, to these functions, because the t-distribution requires one.
> t <- (mean(x) - mu) / (sd(x) / sqrt(length(x))) # Test Statistic
> pt(t, df) # Lower Tail p-Value
> pt(t, df, lower.tail = FALSE) # Upper Tail p-Value
> 2 * pt(t, df) # Two-Tailed p-Value (Negative t)
> 2 * pt(t, df, lower.tail = FALSE) # Two-Tailed p-Value (Positive t)
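As a sanity check on the manual formulas, the test statistic and two-tailed p-value can be compared with the output of R's built-in t.test(). The sample `x` and the hypothesized mean `mu` below are assumptions for illustration, and the statistic is named `t_stat` to avoid masking the transpose function t().

```r
x  <- c(12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7)  # hypothetical sample
mu <- 12                                                  # hypothesized population mean

n      <- length(x)
df     <- n - 1                                    # degrees of freedom
t_stat <- (mean(x) - mu) / (sd(x) / sqrt(n))       # test statistic
p_two  <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # two-tailed p-value

res <- t.test(x, mu = mu)                          # built-in one-sample t-test

all.equal(t_stat, unname(res$statistic))           # TRUE when the statistics agree
all.equal(p_two, res$p.value)                      # TRUE when the p-values agree
```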
> qt(alpha, df) # Lower Tail Critical Value
> qt(alpha, df, lower.tail = FALSE) # Upper Tail Critical Value
> qt(alpha / 2, df) # Two-Tailed Critical Value (Lower)
> qt(alpha / 2, df, lower.tail = FALSE) # Two-Tailed Critical Value (Upper)
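To see what the extra df argument does, the following sketch compares two-tailed upper critical values from qt() with the corresponding value from qnorm(). The significance level and the particular df values are arbitrary choices for illustration.

```r
alpha <- 0.05

z_crit   <- qnorm(alpha / 2, lower.tail = FALSE)       # z critical value
t_small  <- qt(alpha / 2, df = 7, lower.tail = FALSE)    # small sample
t_medium <- qt(alpha / 2, df = 30, lower.tail = FALSE)   # moderate sample
t_large  <- qt(alpha / 2, df = 1e6, lower.tail = FALSE)  # very large sample

c(z = z_crit, t_df7 = t_small, t_df30 = t_medium, t_df1e6 = t_large)
```

The t critical values are larger than the z critical value for small df and shrink toward it as df grows, which is why the t-based intervals and rejection regions are wider for small samples.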