
In simple linear regression, the starting point is the estimated regression equation: ŷ = b_{0} + b_{1}x. It provides a mathematical relationship between the dependent variable (y) and the independent variable (x). Furthermore, it can be used to predict the value of y for a given value of x. There are two things we need to get the estimated regression equation: the slope (b_{1}) and the intercept (b_{0}). The formulas for the slope and intercept are derived from the least squares method: min Σ(y - ŷ)^{2}. The graph of the estimated regression equation is known as the estimated regression line.
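As a concrete sketch, the least squares formulas for the slope and intercept, $ b_1 = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} $ and $ b_0 = \bar{y} - b_1\bar{x} $, can be computed directly. The dataset below is made up purely for illustration:

```python
def least_squares(x, y):
    """Return the intercept b0 and slope b1 of the estimated regression line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Illustrative data (not from the text)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6, so y-hat = 2.2 + 0.6x
```

Here the estimated regression equation works out to ŷ = 2.2 + 0.6x, which can then be used to predict y for any given x.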

Beyond the estimated regression equation itself, the second most important aspect of simple linear regression is the coefficient of determination. The coefficient of determination, denoted r^{2}, provides a measure of goodness of fit for the estimated regression equation. Before we can find r^{2}, we must find the values of three sums of squares: the Sum of Squares Total (SST), the Sum of Squares Regression (SSR) and the Sum of Squares Error (SSE). The relationship between them is given by SST = SSR + SSE, so given the values of any two sums of squares, the third one can be easily found.

| Sum of Squares | Formula |
| --- | --- |
| Error | $ \text{SSE} = \sum (y-\hat{y})^2 $ |
| Regression | $ \text{SSR} = \sum (\hat{y}-\bar{y})^2 $ |
| Total | $ \text{SST} = \sum (y-\bar{y})^2 $ |
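The three sums of squares, and the identity SST = SSR + SSE, can be checked numerically. The sketch below uses a small made-up dataset whose least squares line works out to ŷ = 2.2 + 0.6x:

```python
# Illustrative data; fitted line y-hat = 2.2 + 0.6x obtained by least squares
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]
y_bar = sum(y) / len(y)

SSE = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # error
SSR = sum((yh - y_bar) ** 2 for yh in y_hat)           # regression
SST = sum((yi - y_bar) ** 2 for yi in y)               # total

print(round(SSE, 4), round(SSR, 4), round(SST, 4))  # 2.4 3.6 6.0
assert abs(SST - (SSR + SSE)) < 1e-9  # SST = SSR + SSE
```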

Now that we know the sum of squares, we can calculate the coefficient of determination. The r^{2} is the ratio of the SSR to the SST. It takes a value between zero and one, with zero indicating the worst fit and one indicating a perfect fit. A perfect fit indicates all the points in a scatter diagram will lie on the estimated regression line. When interpreting the r^{2}, the first step is to convert its value to a percentage. Then it can be interpreted as the percentage of the variability in y explained by the estimated regression equation.

| Coefficient of Determination |
| --- |
| $ r^2 = \dfrac{\text{SSR}}{\text{SST}} $ |

The sample correlation coefficient can be calculated from the coefficient of determination, which highlights the close relationship between regression and correlation. Regression can be thought of as a stronger version of correlation: while correlation tells us the sign and strength of a relationship, regression quantifies the relationship to facilitate prediction. To get the sample correlation coefficient, simply take the square root of the coefficient of determination, giving it the same sign as the slope.
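Both steps can be sketched in a few lines. The SSR, SST, and slope values below come from a hypothetical fit, chosen only to make the arithmetic easy to follow:

```python
import math

# Hypothetical fit: SSR = 3.6, SST = 6.0, positive slope b1 = 0.6
SSR, SST, b1 = 3.6, 6.0, 0.6

r_squared = SSR / SST                        # coefficient of determination
r = math.copysign(math.sqrt(r_squared), b1)  # r takes the sign of the slope

print(round(r_squared, 4), round(r, 4))  # 0.6 0.7746
```

Interpreted as a percentage, r² = 0.6 means 60% of the variability in y is explained by the estimated regression equation.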

The next step in regression analysis is to test for significance. That is, we want to determine whether there is a statistically significant relationship between x and y. There are two ways of testing for significance: a t test or an F test. The first step in both tests is to calculate the Mean Square Error (MSE), which provides an estimate of the variance of the error. The square root of the MSE is called the Standard Error of Estimate and provides an estimate of the standard deviation of the error.

| Mean Square Error | Standard Error of Estimate |
| --- | --- |
| $ \text{MSE} = \dfrac{\text{SSE}}{n-2} $ | $ s = \sqrt{\text{MSE}} $ |
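These two formulas can be sketched directly. The SSE and sample size below are hypothetical values chosen for illustration:

```python
import math

# Hypothetical values: SSE = 2.4 from a fit on n = 5 observations
SSE, n = 2.4, 5

MSE = SSE / (n - 2)   # estimate of the variance of the error
s = math.sqrt(MSE)    # standard error of estimate

print(round(MSE, 4), round(s, 4))  # 0.8 0.8944
```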

The t test is a hypothesis test about the true value of the slope, denoted $\beta_1$. The test statistic is calculated by dividing the estimated slope, b_{1}, by the estimated standard deviation of b_{1}, $ s_{b_1} $. The latter is calculated using the formula $ s_{b_1} = \frac{s}{\sqrt{\sum (x-\bar{x})^2}} $. The test statistic is then used to conduct the hypothesis test, using a t distribution with n - 2 degrees of freedom. In simple linear regression, the F test amounts to the same hypothesis test as the t test; the only differences are the test statistic and the probability distribution used.

| t Test | Test Statistic |
| --- | --- |
| $ H_0: \beta_1 = 0 $, $ H_a: \beta_1 \neq 0 $ | $ t = \dfrac{b_1}{s_{b_1}} $ |
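The test statistic calculation can be sketched as follows, using a made-up dataset whose fit gives a slope of 0.6 and a standard error of estimate of √0.8:

```python
import math

# Illustrative values: slope b1 = 0.6 and s = sqrt(0.8) from a hypothetical fit
x = [1, 2, 3, 4, 5]
b1 = 0.6
s = math.sqrt(0.8)  # standard error of estimate

x_bar = sum(x) / len(x)
# s_b1 = s / sqrt(sum((x - x_bar)^2)): estimated standard deviation of b1
s_b1 = s / math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
t = b1 / s_b1

print(round(s_b1, 4), round(t, 4))  # 0.2828 2.1213
```

With n = 5, the statistic would be compared against a t distribution with n - 2 = 3 degrees of freedom; at the 0.05 level the two-tailed critical value is 3.182, so here |t| = 2.12 would not be significant.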

Confidence intervals and prediction intervals can be constructed around the estimated regression line. In both cases, the intervals are narrowest at the mean of x and get wider the further they move from the mean. The difference between them is that a confidence interval gives a range for the expected (mean) value of y at a given x, while a prediction interval gives a range for an individual predicted value of y. For that reason, confidence intervals are always narrower than prediction intervals.
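The standard half-width formulas for both intervals at a given point x_p can be sketched as below. The data and fitted line (ŷ = 2.2 + 0.6x, n = 5) are illustrative; 3.182 is the two-tailed t critical value with 3 degrees of freedom at the 95% level:

```python
import math

# Illustrative fit: y-hat = 2.2 + 0.6x on n = 5 points, s = sqrt(0.8)
x = [1, 2, 3, 4, 5]
n, b0, b1 = 5, 2.2, 0.6
s = math.sqrt(0.8)        # standard error of estimate
t_crit = 3.182            # t_{0.025, n-2} for a 95% interval

x_bar = sum(x) / n
ssx = sum((xi - x_bar) ** 2 for xi in x)

x_p = 4
y_p = b0 + b1 * x_p       # point estimate at x_p
# Confidence interval half-width (expected value of y at x_p)
half_ci = t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ssx)
# Prediction interval half-width (individual value of y at x_p): extra "1 +"
half_pi = t_crit * s * math.sqrt(1 + 1 / n + (x_p - x_bar) ** 2 / ssx)

print(round(y_p, 4), round(half_ci, 4), round(half_pi, 4))
assert half_ci < half_pi  # the prediction interval is always wider
```

The only difference between the two formulas is the extra "1 +" under the square root, which accounts for the variability of an individual observation and makes the prediction interval wider.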

In simple linear regression, there is only one independent variable (x). However, we may want to include more than one independent variable to improve the predictive power of our regression. This is known as multiple regression, which can be solved using our Multiple Regression Calculator. One of the most important parts of regression is testing for significance. The two tests for significance, the t test and the F test, are examples of hypothesis tests. Hypothesis testing can be done using our Hypothesis Testing Calculator.