Numerical Summary Calculator



Numerical summaries describe characteristics of the data such as the location or variability of the values. The most important measure of central location is the mean, or average. If the mean is computed from a sample of data, it is referred to as the sample mean, and if it is computed from a population, it is referred to as the population mean. If there are extreme values in the data, the median is the preferred measure of central location because it is not pulled toward those values. With the values sorted, the median is the middle value if there is an odd number of values and the average of the two middle values if there is an even number of values. The mode is the most common value.

Sample Mean: $\bar{x} = \dfrac{\sum x}{n}$

Population Mean: $\mu = \dfrac{\sum x}{N}$
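
As a minimal sketch of the three measures of central location, the snippet below uses Python's standard `statistics` module. The data values are hypothetical and were chosen so that one extreme value (40) pulls the mean well above the median.

```python
from statistics import mean, median, mode

# Hypothetical sample; 40 is an extreme value.
data = [2, 4, 4, 5, 7, 9, 40]

sample_mean = mean(data)      # sum of the values divided by n
sample_median = median(data)  # middle value of the sorted data (n is odd here)
sample_mode = mode(data)      # most common value

print("mean:  ", sample_mean)
print("median:", sample_median)
print("mode:  ", sample_mode)
```

Here the mean is 71/7 (about 10.14) while the median is 5, which illustrates why the median is preferred when extreme values are present.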

Percentiles describe the location of a value relative to the rest of the data. The pth percentile is the value such that approximately p% of the data is less than that value and approximately (100-p)% of the data is greater than that value. Percentiles are calculated by first sorting the data from smallest to largest and then calculating an index using the formula $i = \left(\frac{p}{100}\right)n$, where n is the number of values. If the index is not an integer, round up and take the value in that position. If the index is an integer, average the values in positions i and i+1. Quartiles are the name given to some common percentiles. Quartile 1 (Q1) is the 25th percentile, Quartile 2 (Q2) is the 50th percentile and Quartile 3 (Q3) is the 75th percentile. All three quartiles along with the minimum and maximum values are called a five-number summary.
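
The index rule above can be sketched in a few lines of Python. The function name `percentile`, the restriction to whole-number p between 1 and 99, and the data values are assumptions made for this illustration. Other software often uses different interpolation rules, so results may not match other tools exactly.

```python
def percentile(values, p):
    """Return the p-th percentile using the index rule i = (p/100) * n.

    Assumes p is a whole number with 0 < p < 100 (a simplification for
    this sketch) so the index can be checked with integer arithmetic.
    """
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("p must be a whole number between 1 and 99")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)  # i = whole + remainder/100
    if remainder != 0:
        # Index is not an integer: round up and take that (1-based) position.
        return ordered[whole]  # position whole + 1 is list index `whole`
    # Index is an integer: average the values in positions i and i + 1.
    return (ordered[whole - 1] + ordered[whole]) / 2


# Hypothetical data set with n = 8 values.
data = [27, 15, 34, 25, 20, 30, 25, 28]

five_number_summary = (
    min(data),
    percentile(data, 25),  # Q1
    percentile(data, 50),  # Q2 (median)
    percentile(data, 75),  # Q3
    max(data),
)
print(five_number_summary)
```

For these eight values the indexes for Q1, Q2 and Q3 are 2, 4 and 6, all integers, so each quartile is the average of two neighboring sorted values: 22.5, 26.0 and 29.0.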

Aside from knowing the location of the data values, it is also useful to know their spread, or variability. The simplest measure of variability is the range as it only uses two values. The formula is Range = Largest Value - Smallest Value. When there are extreme values in the data, the interquartile range is preferred to the range. The formula for the interquartile range is IQR = Q3 - Q1. Both the range and interquartile range are limited by the fact that they only use two values from the entire dataset. The variance is a measure of variability that uses all data values.

Sample Variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n-1}$

Population Variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
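
To make the difference between the two variance formulas concrete, here is a short sketch that computes the range and both variances by hand and cross-checks them against Python's `statistics` module. The data are hypothetical, and the same eight values are treated first as a sample and then as a whole population purely for comparison.

```python
from statistics import pvariance, variance

# Hypothetical data set.
data = [15, 20, 25, 25, 27, 28, 30, 34]
n = len(data)

# Range = Largest Value - Smallest Value
data_range = max(data) - min(data)

# Squared deviations about the mean are shared by both variance formulas.
x_bar = sum(data) / n
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)

sample_var = sum_sq_dev / (n - 1)  # divide by n - 1 for a sample
population_var = sum_sq_dev / n    # divide by N for a population

print("range:", data_range)
print("sample variance:    ", sample_var, "library:", variance(data))
print("population variance:", population_var, "library:", pvariance(data))
```

The sample variance is always the larger of the two because the same sum of squared deviations is divided by a smaller number.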

The standard deviation is simply the square root of the variance. When computing the variance, the squared deviations about the mean, $(x - \bar{x})^2$, are used. As a result, the variance is not in the same units as the data, but rather in the squared units of the data. Taking the square root brings the result back to the same units as the data. The coefficient of variation describes how large the standard deviation is relative to the mean. The formula for this is $\left(\dfrac{\text{Standard deviation}}{\text{Mean}} \times 100\right)\%$.
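
A brief sketch of the sample standard deviation and coefficient of variation follows, again with hypothetical data. The variable names are illustrative only.

```python
import math
from statistics import mean, stdev

# Hypothetical sample.
data = [15, 20, 25, 25, 27, 28, 30, 34]

x_bar = mean(data)
sample_var = sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

# Standard deviation: square root of the variance, back in the data's units.
sample_sd = math.sqrt(sample_var)

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_percent = sample_sd / x_bar * 100

print("standard deviation:", sample_sd, "library:", stdev(data))
print("coefficient of variation (%):", cv_percent)
```

For this sample the standard deviation is roughly 23% of the mean.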

A z-score is a measure of relative location, as it tells you where a data value is located relative to the mean. A z-score can be interpreted as the number of standard deviations a value is from the mean. One use of z-scores is to identify extreme values (or outliers). An outlier can be identified as any value with a z-score greater than three in absolute value. Chebyshev's Theorem uses z-scores to determine what percentage of the data must be within a certain range. It says that at least $\left(1 - \dfrac{1}{z^2}\right)$ of the values must be within z standard deviations of the mean, for any z greater than 1.

z-Score
$ z = \dfrac{x - \bar{x}}{s} $
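
The following sketch applies the z-score formula to flag outliers and evaluates the Chebyshev bound. The function names `z_scores` and `chebyshev_bound`, the cutoff of 3 (taken from the text above) and the data are assumptions for illustration.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return (x - x_bar) / s for each value, using the sample standard deviation."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]


def chebyshev_bound(z):
    """Minimum proportion of values within z standard deviations of the mean (z > 1)."""
    if z <= 1:
        raise ValueError("Chebyshev's Theorem is only informative for z > 1")
    return 1 - 1 / z**2


# Hypothetical sample of 16 values; 95 is far from the rest.
data = [48, 50, 51, 49, 52, 50, 47, 53, 50, 51, 49, 50, 52, 48, 50, 95]

scores = z_scores(data)
outliers = [x for x, z in zip(data, scores) if abs(z) > 3]

print("outliers:", outliers)
print("at least", chebyshev_bound(2), "of values lie within 2 standard deviations")
```

Note that in a very small sample no z-score can exceed three in absolute value, because a single extreme value also inflates the standard deviation. That is why this example uses 16 values.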

The Empirical Rule is similar to Chebyshev's Theorem in that it tells you the percentage of values within a certain range. However, the Empirical Rule only works for data having a bell-shaped distribution while Chebyshev's Theorem works for any distribution. It says that for data having a bell-shaped distribution, approximately 68% of the values are within one standard deviation of the mean. Approximately 95% of the values are within two standard deviations of the mean. Almost all of the values are within three standard deviations of the mean.
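
As a rough check on these percentages, the sketch below assumes that "bell-shaped" means an exact normal distribution, which real data only approximate. It compares the resulting proportions with the Chebyshev guarantee of at least $1 - 1/z^2$. The function name `normal_within` is hypothetical. It relies on the standard identity that the proportion of a normal distribution within k standard deviations of the mean equals `erf(k / sqrt(2))`.

```python
import math


def normal_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    line = f"within {k} sd: normal = {normal_within(k):.4f}"
    if k > 1:
        # Chebyshev's guarantee holds for any distribution but is much weaker.
        line += f", Chebyshev lower bound = {1 - 1 / k**2:.4f}"
    print(line)
```

Under the normal assumption the three proportions are about 0.68, 0.95 and 0.997, compared with Chebyshev guarantees of only 0.75 and about 0.89 for two and three standard deviations.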

Sometimes, instead of summarizing just one variable, we're interested in summarizing two variables at the same time. The purpose of this is usually to determine the relationship between the variables. The covariance and correlation coefficient are measures of the linear relationship between two variables. When interpreting the covariance, it's important to focus on the sign and ignore the size of the value, because the size depends on the units in which the variables are measured. For the correlation coefficient, which always falls between -1 and +1, both the sign and size are interpretable.

Sample Covariance: $s_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n-1}$

Population Covariance: $\sigma_{xy} = \dfrac{\sum (x - \mu_x)(y - \mu_y)}{N}$
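
The sketch below computes the sample covariance from the formula above. The page does not state a correlation formula, so the sketch assumes the usual Pearson sample correlation, $r_{xy} = s_{xy} / (s_x s_y)$. The function names and paired data are hypothetical. Rescaling x shows why only the sign of the covariance is interpreted.

```python
import math


def sample_covariance(xs, ys):
    """Sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """Pearson correlation (assumed definition): covariance over the product of the standard deviations."""
    s_x = math.sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
r_xy = sample_correlation(x, y)

# Changing the units of x (for example, meters to centimeters) rescales
# the covariance but leaves the correlation unchanged.
x_scaled = [100 * value for value in x]
cov_scaled = sample_covariance(x_scaled, y)
r_scaled = sample_correlation(x_scaled, y)

print("covariance:", cov_xy, "after rescaling x:", cov_scaled)
print("correlation:", r_xy, "after rescaling x:", r_scaled)
```

The covariance grows by a factor of 100 when x is rescaled, while the correlation stays at about 0.77.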

An alternative to using numbers to summarize data is to use tables and graphs. Tables, such as a frequency distribution, and graphs, such as the bar chart, can be found using the Table and Graph Calculator. Summarizing data using methods such as tables, graphs and numbers is one of the foundations of statistical analysis. Another important building block of statistics is probability theory. Basic probabilities, such as those of the union and intersection, can be computed using the Basic Probability Calculator.