Numerical Summary Calculator



How it Works:

Numerical summaries are used to describe characteristics of the data, such as the location or variability of the values. The most important measure of central location is the mean, or average. If the mean is computed from a sample of data, it is referred to as the sample mean; if it is computed from a population, it is referred to as the population mean. If there are extreme values in the data, the median is the preferred measure of central location. The median is the middle value if there is an odd number of values, and the average of the two middle values if there is an even number of values. The mode is the most common value.

Sample Mean: $\bar{x} = \dfrac{\sum x}{n}$
Population Mean: $\mu = \dfrac{\sum x}{N}$
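As a minimal sketch of these measures of central location, here is how the mean, median, and mode can be computed in Python with the standard `statistics` module (the data values are made up for illustration):

```python
# Mean, median, and mode of a small sample using Python's statistics module.
import statistics

data = [4, 7, 7, 9, 12, 15]

mean = statistics.mean(data)      # (4 + 7 + 7 + 9 + 12 + 15) / 6 = 9
median = statistics.median(data)  # even count: average of 7 and 9 = 8.0
mode = statistics.mode(data)      # 7 appears most often

print(mean, median, mode)
```

Note that with an even number of values, the median is the average of the two middle values, exactly as described above.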

Percentiles describe the location of a value relative to the rest of the data. The pth percentile is the value such that p% of the data is less than that value and (100-p)% of the data is greater than that value. Percentiles are calculated by first computing an index using the formula $i = \left(\frac{p}{100}\right)n$. If the index is not an integer, round up and take the value in that position; if the index is an integer, average the values in positions i and i+1. Quartiles are the names given to certain common percentiles: Quartile 1 (Q1) is the 25th percentile, Quartile 2 (Q2) is the 50th percentile, and Quartile 3 (Q3) is the 75th percentile. The three quartiles, together with the minimum and maximum values, are called the five-number summary.
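The index method described above can be sketched directly in Python (the dataset here is a hypothetical sorted sample of eight values):

```python
# Percentile via the index method: i = (p/100)*n; round up if i is not
# an integer, otherwise average the values in positions i and i+1.
import math

def percentile(values, p):
    data = sorted(values)
    n = len(data)
    i = (p / 100) * n
    if i != int(i):                      # not an integer: round up
        return data[math.ceil(i) - 1]    # position i (1-based) -> index i-1
    i = int(i)
    return (data[i - 1] + data[i]) / 2   # average positions i and i+1

data = [15, 20, 25, 25, 27, 28, 30, 34]  # n = 8
q1 = percentile(data, 25)  # i = 2 (integer): average of 20 and 25 = 22.5
q2 = percentile(data, 50)  # i = 4 (integer): average of 25 and 27 = 26.0
q3 = percentile(data, 75)  # i = 6 (integer): average of 28 and 30 = 29.0
```

For a percentile like the 80th, the index is i = 6.4, which rounds up to position 7, giving the value 30.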

Aside from knowing the location of the data values, it is also useful to know their spread, or variability. The simplest measure of variability is the range: Range = Largest Value − Smallest Value. When there are extreme values in the data, the interquartile range, IQR = Q3 − Q1, is preferred to the range because it is unaffected by them. Both the range and the interquartile range are limited by the fact that they use only two values from the entire dataset. The variance is a measure of variability that uses all of the data values.
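A small self-contained sketch of the range and interquartile range, using the same index rule for the quartiles (data values are made up; with n = 8, the indices i = 2 and i = 6 are integers, so each quartile averages two adjacent positions):

```python
# Range and interquartile range: each uses only two values from the data.
data = [15, 20, 25, 25, 27, 28, 30, 34]

data_range = max(data) - min(data)   # 34 - 15 = 19

s = sorted(data)
q1 = (s[1] + s[2]) / 2               # (20 + 25) / 2 = 22.5
q3 = (s[5] + s[6]) / 2               # (28 + 30) / 2 = 29.0
iqr = q3 - q1                        # 29.0 - 22.5 = 6.5
```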

Sample Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n-1}$
Population Variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$

The standard deviation is simply the square root of the variance. When computing the variance, the squared deviations about the mean, $(x - \bar{x})^2$, are used, so the resulting value is not in the same units as the data but in the squared units of the data. By taking the square root, the standard deviation brings the result back to the same units as the data. The coefficient of variation describes how large the standard deviation is relative to the mean: $\left(\frac{\text{Standard deviation}}{\text{Mean}} \times 100\right)\%$.
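The sample variance, standard deviation, and coefficient of variation can be sketched directly from the formulas above (the sample values are hypothetical):

```python
# Sample variance, standard deviation, and coefficient of variation.
import math

data = [4, 7, 7, 9, 12, 15]
n = len(data)
xbar = sum(data) / n                               # sample mean = 9.0

s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # sample variance (squared units)
s = math.sqrt(s2)                                  # back to the data's units
cv = (s / xbar) * 100                              # std dev as a % of the mean
```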


A z-score is a measure of relative location, as it tells you where a data value is located relative to the mean. A z-score can be interpreted as the number of standard deviations a value is from the mean. One use of z-scores is to identify extreme values (or outliers): an outlier can be identified as any value with a z-score greater than three in absolute value. Chebyshev's Theorem uses z-scores to determine what percentage of the data must be within a certain range. It says that at least $\left(1 - \frac{1}{z^2}\right)$ of the values must be within z standard deviations of the mean, for any z greater than one.

z-Score
$ z = \dfrac{x - \bar{x}}{s} $
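A short sketch of z-scores in practice: the data below are made up, with one deliberately extreme value so that the |z| > 3 outlier rule fires, and a Chebyshev bound is computed for z = 2.

```python
# z-scores, outlier flagging (|z| > 3), and a Chebyshev bound.
import statistics

data = [10] * 19 + [100]                 # one deliberately extreme value
xbar = statistics.mean(data)             # 14.5
s = statistics.stdev(data)               # sample standard deviation

z_scores = [(x - xbar) / s for x in data]
outliers = [x for x, z in zip(data, z_scores) if abs(z) > 3]

# Chebyshev's Theorem: at least 1 - 1/z^2 of the values lie within
# z standard deviations of the mean.
z = 2
chebyshev_bound = 1 - 1 / z ** 2         # at least 75% within 2 std devs
```

Note that in a very small sample a single extreme value inflates the standard deviation enough that its own z-score may stay below three, which is why a larger sample is used here.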

The Empirical Rule is similar to Chebyshev's Theorem in that it tells you the percentage of values within a certain range. However, the Empirical Rule only works for data having a bell-shaped distribution, while Chebyshev's Theorem works for any distribution. It says that for data having a bell-shaped distribution, approximately 68% of the values are within one standard deviation of the mean, approximately 95% are within two standard deviations of the mean, and approximately 99.7% (almost all) are within three standard deviations of the mean.
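The Empirical Rule's percentages come from the normal (bell-shaped) distribution, and Python's `statistics.NormalDist` can be used to check them:

```python
# Verify the Empirical Rule percentages with a standard normal distribution.
from statistics import NormalDist

nd = NormalDist(mu=0, sigma=1)   # a standard bell-shaped distribution

def within(z):
    """Proportion of values within z standard deviations of the mean."""
    return nd.cdf(z) - nd.cdf(-z)

print(round(within(1), 4))   # approximately 68%
print(round(within(2), 4))   # approximately 95%
print(round(within(3), 4))   # approximately 99.7%
```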

Sometimes, instead of summarizing just one variable, we're interested in summarizing two variables at the same time. The purpose of this is usually to determine the relationship between the variables. The covariance is a measure of the linear relationship between two variables. When interpreting the covariance, it's important to focus on the sign and ignore the size of the value. If the covariance is positive, it indicates a positive linear relationship between x and y. A negative value indicates a negative linear relationship. A value of zero indicates no linear relationship.

Sample Covariance: $s_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n-1}$
Population Covariance: $\sigma_{xy} = \dfrac{\sum (x - \mu_x)(y - \mu_y)}{N}$
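The sample covariance formula translates directly into Python (the paired x and y values below are made up; since y tends to rise with x, the result comes out positive):

```python
# Sample covariance: average product of paired deviations from the means.
x = [2, 4, 6, 8]
y = [3, 5, 8, 10]
n = len(x)

xbar = sum(x) / n   # 5.0
ybar = sum(y) / n   # 6.5

s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / (n - 1)
```

Here the sign (positive) is what matters: it indicates a positive linear relationship between x and y.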

The correlation coefficient, like the covariance, measures the linear relationship between two variables. However, both the sign and the size of the correlation coefficient are interpretable. As with the covariance, a positive sign indicates a positive relationship, a negative sign indicates a negative relationship, and zero indicates no relationship. A value close to one, whether negative or positive, indicates a strong relationship, while a value close to zero indicates a weak relationship. A value of one in absolute value indicates a perfect linear relationship.
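As a final sketch, the sample correlation coefficient is the sample covariance rescaled by the two sample standard deviations, which is why it always lands between -1 and 1 (the data values here are hypothetical):

```python
# Sample correlation: covariance divided by the product of std deviations.
import statistics

x = [2, 4, 6, 8]
y = [3, 5, 8, 10]
n = len(x)

xbar, ybar = statistics.mean(x), statistics.mean(y)
s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
r = s_xy / (statistics.stdev(x) * statistics.stdev(y))   # close to 1: strong
```

A value of r near one, as here, indicates a strong positive linear relationship.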