Numerical Summary Lessons


The mean, or average, is a measure of central location. It is the most commonly used measure of central location. There are other measures of central location, such as the median, but the mean is usually the preferred one. There are two means, depending on whether one is working with a sample or a population of data. If the data is a sample, then the sample mean is used. If the data is a population, then the population mean is used. Both are calculated the same way: add up all of the values and divide by the number of values. They differ only in notation and in whether the data represents a sample or the whole population. In statistics, we are usually working with a sample.
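
As a small illustration, the Python sketch below computes a mean by hand. The function name `mean` and the data values are hypothetical and chosen only for illustration. The same arithmetic applies whether the values are treated as a sample or as a population.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count.

    The formula is identical for a sample mean and a population mean.
    """
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


sample = [46, 54, 42, 46, 32]  # hypothetical sample data
print(mean(sample))  # 44.0
```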

The median, like the mean, is also a measure of central location. Although the mean is more popular, the median is preferred in certain situations. In particular, it is the preferred measure of central location when there are extreme values, or outliers, in the data. That is because the mean uses every value in the data, so a single extreme value can pull it far from the center, whereas the median depends only on the middle of the sorted data and is not affected by how large the largest value or how small the smallest value is. Calculation of the median depends on whether there is an odd or even number of values. If there is an odd number of values, the median is simply the middle value when the data is arranged in ascending order. If there is an even number of values, the median is the average of the two middle values when the data is arranged in ascending order.
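
The odd and even cases can be sketched in a few lines of Python. The function name `median` and the data are hypothetical. The second list adds one extreme value, which shows how much more the mean moves than the median.

```python
def median(values):
    """Return the middle value of the sorted data, or the average of the
    two middle values when the number of values is even."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:  # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middle values


odd = [3, 1, 7, 5, 9]          # sorted: 1 3 5 7 9
even = [3, 1, 7, 5, 9, 1000]   # sorted: 1 3 5 7 9 1000, with one outlier
print(median(odd), median(even))  # 5 6.0

# The outlier drags the mean much further than the median.
print(round(sum(odd) / len(odd), 2), round(sum(even) / len(even), 2))  # 5.0 170.83
```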

The mode is the value that occurs with the highest frequency. Depending on the data, there may be no mode, one mode, two modes or more than two modes. If there are two modes, the data is said to be bimodal. If there are more than two modes, the data is said to be multimodal. The relative values of the mean, median and mode provide information about how the data is distributed. For example, in a normal, or bell-shaped, distribution, the mean, median and mode are all equal. If the data is skewed right, then the mean will typically be larger than the median and mode. If the data is skewed left, then the mean will typically be less than the median and mode.
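
A minimal Python sketch for finding every mode, built on `collections.Counter`. The function name `modes` and the data are hypothetical. It also assumes one common convention: if every value occurs exactly once, there is no mode. The last lines use the standard `statistics` module to compare the mean and median on a small right-skewed data set.

```python
import statistics
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency.

    Assumed convention: if every value occurs exactly once, there is no mode
    and an empty list is returned.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 4, 4, 6]))     # [4]     one mode
print(modes([1, 1, 3, 5, 5]))  # [1, 5]  bimodal
print(modes([1, 2, 3]))        # []      no mode

skewed_right = [1, 2, 2, 3, 10]  # one large value stretches the right tail
print(statistics.mean(skewed_right), statistics.median(skewed_right), modes(skewed_right))
# 3.6 2 [2]
```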

Percentiles provide information about how the data is distributed when arranged in ascending order. Approximately p% of the data is less than or equal to the p-th percentile, and approximately (100-p)% of the data is greater than or equal to the p-th percentile. For example, if a value is the 60th percentile, then approximately 60% of the observations are less than or equal to that value and approximately 40% of the observations are greater than or equal to that value.
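
Textbooks and software use several slightly different rules for computing percentiles, so results can differ a little on small data sets. The Python sketch below assumes one common rule: linear interpolation between the two closest ranks, with the smallest value as the 0th percentile and the largest as the 100th. The function name `percentile` and the scores are hypothetical. The last lines check the "approximately p%" idea on the 60th percentile.

```python
import math


def percentile(values, p):
    """Return the p-th percentile (0 <= p <= 100) of values.

    Assumed convention: linear interpolation between the two closest ranks.
    """
    if not values:
        raise ValueError("percentile requires at least one value")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ordered = sorted(values)
    pos = p * (len(ordered) - 1) / 100  # zero-based, possibly fractional, position
    lower, upper = math.floor(pos), math.ceil(pos)
    fraction = pos - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


scores = [55, 62, 68, 70, 74, 78, 81, 85, 90, 96, 99]  # hypothetical data, n = 11
p60 = percentile(scores, 60)
at_or_below = sum(s <= p60 for s in scores) / len(scores)
at_or_above = sum(s >= p60 for s in scores) / len(scores)
print(p60, round(at_or_below, 2), round(at_or_above, 2))  # 81.0 0.64 0.45
```

With only 11 values, the shares are roughly, not exactly, 60% and 40%.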

Quartiles are three commonly used percentiles: the first, second and third quartiles. The first quartile corresponds to the 25th percentile, the second quartile corresponds to the 50th percentile (which is also the median) and the third quartile corresponds to the 75th percentile. So, if you are asked to find the first quartile, simply take the steps needed to calculate the 25th percentile. If asked to find the second quartile, simply take the steps needed to calculate the 50th percentile. If asked to find the third quartile, simply take the steps needed to calculate the 75th percentile.
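
Because quartiles are just the 25th, 50th and 75th percentiles, the Python sketch below reuses a percentile function. It assumes the same linear-interpolation rule as above, and `percentile` and the data are hypothetical. For comparison, the standard library's `statistics.quantiles` (Python 3.8+) uses this same interpolation when called with `method="inclusive"`. Its default method gives somewhat different values.

```python
import math
import statistics


def percentile(values, p):
    """p-th percentile by linear interpolation between closest ranks (assumed convention)."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1) / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


scores = [55, 62, 68, 70, 74, 78, 81, 85, 90, 96, 99]  # hypothetical data
quartiles = [percentile(scores, p) for p in (25, 50, 75)]
print(quartiles)  # [69.0, 78.0, 87.5]

# Same result from the standard library with the matching method.
print(statistics.quantiles(scores, n=4, method="inclusive"))  # [69.0, 78.0, 87.5]

# The second quartile equals the median.
print(statistics.median(scores))  # 78
```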

The range is the simplest measure of variability, or spread, because it uses only two values from the entire dataset: the largest and the smallest. The range is calculated by taking the difference between these two values. Because of the way the range is calculated, its value is highly susceptible to extreme values, or outliers. If there is an extremely large or extremely small value (or both) in the data, it will have a big impact on the value of the range. So despite the benefit of its simplicity, the range is not a good measure of variability when there are outliers in the data.
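
A short Python sketch of the range and its sensitivity to a single extreme value. The function is named `data_range` (a hypothetical name) so it does not shadow Python's built-in `range`. The data are also hypothetical: the second list is the first with its largest value replaced by an extreme one.

```python
def data_range(values):
    """Return the range: the largest value minus the smallest value."""
    if not values:
        raise ValueError("range requires at least one value")
    return max(values) - min(values)


typical = [12, 15, 17, 18, 20, 21, 23, 25, 28]        # hypothetical data
with_outlier = [12, 15, 17, 18, 20, 21, 23, 25, 280]  # last value replaced by an extreme one
print(data_range(typical), data_range(with_outlier))  # 16 268
```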

The interquartile range (IQR) is similar to the range, but it performs better as a measure of variability when there are outliers in the data. Like the range, the interquartile range uses only two values: the first quartile (25th percentile) and the third quartile (75th percentile). The IQR is the difference between these two values, that is, the third quartile minus the first quartile. Since the interquartile range ignores the lowest and highest 25% of the data, it is not as susceptible to outliers as the range is. So it is preferred to the range as a measure of variability when there are outliers.
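
The Python sketch below compares the range and the IQR on the same hypothetical data, with and without an extreme value. It assumes the linear-interpolation percentile rule, and the helper names `percentile` and `iqr` are hypothetical. Other percentile rules can give slightly different quartiles, but the qualitative result is the same.

```python
import math


def percentile(values, p):
    """p-th percentile by linear interpolation between closest ranks (assumed convention)."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1) / 100
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def iqr(values):
    """Interquartile range: third quartile minus first quartile."""
    return percentile(values, 75) - percentile(values, 25)


typical = [12, 15, 17, 18, 20, 21, 23, 25, 28]        # hypothetical data
with_outlier = [12, 15, 17, 18, 20, 21, 23, 25, 280]  # one extreme value
print(max(typical) - min(typical), max(with_outlier) - min(with_outlier))  # 16 268
print(iqr(typical), iqr(with_outlier))  # 6.0 6.0
```

The outlier changes the range dramatically but leaves the IQR untouched, because it falls in the top 25% of the data.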

The variance is a measure of variability that uses all of the data, whereas the range and interquartile range each use only two values. The variance is calculated by averaging the squared deviations about the mean. That is, the difference between each value and the mean is taken. Then these differences are squared, so that every term is non-negative and positive and negative deviations do not cancel each other out. Finally, the squared deviations are averaged. The sample variance is calculated slightly differently than the population variance: the population variance divides the sum of the squared deviations by the number of values, while the sample variance divides it by one less than the number of values (n - 1).
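
The steps above can be sketched in Python. The function name `variance` and its `sample` flag are hypothetical. The flag only switches the divisor between n - 1 (sample variance) and n (population variance). The data have a mean of 44, so the deviations are 2, 10, -2, 2 and -12, and their squares sum to 256.

```python
def variance(values, sample=True):
    """Return the variance of values.

    sample=True divides the sum of squared deviations by n - 1 (sample variance);
    sample=False divides it by n (population variance).
    """
    n = len(values)
    if n == 0:
        raise ValueError("variance requires at least one value")
    if sample and n < 2:
        raise ValueError("sample variance requires at least two values")
    m = sum(values) / n                            # the mean
    squared_devs = [(x - m) ** 2 for x in values]  # squared deviations about the mean
    return sum(squared_devs) / (n - 1 if sample else n)


data = [46, 54, 42, 46, 32]  # hypothetical data; mean = 44
print(variance(data), variance(data, sample=False))  # 64.0 51.2
```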

The standard deviation is a measure of variability that, unlike the variance, is in the same units as the original data. Since the variance is calculated by squaring the deviations about the mean, its value is in the squared units of the original data. For example, if the data is measured in dollars, the variance is in squared dollars. The standard deviation is the square root of the variance, which brings the value back to the original units. So the population standard deviation is the square root of the population variance. Similarly, the sample standard deviation is the square root of the sample variance.
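
A minimal Python sketch using the standard `statistics` module, with hypothetical data. It takes the square root of each variance and then checks the result against the module's own `stdev` (sample) and `pstdev` (population) functions.

```python
import math
import statistics

data = [46, 54, 42, 46, 32]  # hypothetical data; sample variance 64, population variance 51.2

s = math.sqrt(statistics.variance(data))        # sample standard deviation
sigma = math.sqrt(statistics.pvariance(data))   # population standard deviation
print(s, round(sigma, 3))  # 8.0 7.155

# The statistics module also computes these directly.
print(statistics.stdev(data), round(statistics.pstdev(data), 3))  # 8.0 7.155
```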