Tables and graphs are two of the three ways to summarize data, with the third being numerical summaries. A frequency distribution is a table used to summarize either quantitative or qualitative data. It is made up of two columns. The first column lists the different classes (or categories). The second column gives the frequency, or the number of times each class occurs in the data. The total of the frequencies is usually given at the bottom of a frequency distribution. The frequencies should sum to the sample size.
$ \text{Class} $ | $ \text{Frequency} $ |
$ \text{Pepsi} $ | $ 3 $ |
$ \text{Coke} $ | $ 5 $ |
$ \text{Sprite} $ | $ 2 $ |
$ \text{Total} $ | $ 10 $ |
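The counting step behind a frequency distribution can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `responses` list is a hypothetical sample invented to match the table above, and `collections.Counter` is just one convenient way to tally classes.

```python
from collections import Counter

# Hypothetical sample of 10 survey responses, chosen to match the table above.
responses = ["Coke", "Pepsi", "Coke", "Sprite", "Coke",
             "Pepsi", "Coke", "Sprite", "Pepsi", "Coke"]

# Tally how many times each class occurs in the data.
frequency = Counter(responses)

# The frequencies should sum to the sample size.
total = sum(frequency.values())

for cls, count in frequency.items():
    print(f"{cls:<8}{count}")
print(f"{'Total':<8}{total}")
```

Comparing `total` with `len(responses)` is a quick check that no observation was dropped or counted twice.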
There are two tables that we can derive from a frequency distribution: the relative frequency distribution and the percent frequency distribution. A relative frequency distribution gives the proportion of observations belonging to each class. Relative frequencies are computed by dividing the frequencies by the total number of observations. A percent frequency distribution gives the percentage of observations belonging to each class. Percent frequencies are calculated by multiplying the relative frequencies by 100%. Note that the relative frequencies always sum to 1 and the percent frequencies always sum to 100%.
$ \text{Class} $ | $ \text{Relative Frequency} $ | $ \text{Percent Frequency} $ |
$ \text{Pepsi} $ | $ .30 $ | $ 30 $ |
$ \text{Coke} $ | $ .50 $ | $ 50 $ |
$ \text{Sprite} $ | $ .20 $ | $ 20 $ |
$ \text{Total} $ | $ 1 $ | $ 100 $ |
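The two derived tables follow directly from the frequencies. The sketch below assumes the soft drink frequencies from the example; the variable names `relative` and `percent` are illustrative only.

```python
import math

# Frequencies from the soft drink example.
frequency = {"Pepsi": 3, "Coke": 5, "Sprite": 2}

# Total number of observations (the sample size).
n = sum(frequency.values())

# Relative frequency: frequency divided by the number of observations.
relative = {cls: count / n for cls, count in frequency.items()}

# Percent frequency: relative frequency times 100.
percent = {cls: 100 * count / n for cls, count in frequency.items()}

for cls in frequency:
    print(f"{cls:<8}{relative[cls]:.2f}  {percent[cls]:.0f}")

# Sanity checks: relative frequencies sum to 1, percent frequencies to 100.
assert math.isclose(sum(relative.values()), 1.0)
assert math.isclose(sum(percent.values()), 100.0)
```

The sums are compared with `math.isclose` rather than `==` because floating-point division can introduce tiny rounding errors.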
We can use the tables above to create graphs that allow us to visualize the data. A bar chart is a graph created from a frequency distribution. In a bar chart, we put the classes on the horizontal axis and the frequencies on the vertical axis, then draw a bar for each class up to its frequency. A pie chart is a graph created from a percent frequency distribution. In a pie chart, the pie is divided into slices whose sizes are proportional to the percent frequencies of the classes.
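As a rough sketch of how the two graphs relate to the tables, the snippet below draws a text-only bar chart and computes the angle of each pie slice. It assumes the soft drink frequencies from the example. For simplicity the bars run horizontally rather than vertically, and a real chart would normally be drawn with a plotting library.

```python
# Frequencies from the soft drink example.
frequency = {"Pepsi": 3, "Coke": 5, "Sprite": 2}
n = sum(frequency.values())

# Bar chart: one bar per class, with length equal to its frequency.
bars = {cls: "#" * count for cls, count in frequency.items()}
for cls, bar in bars.items():
    print(f"{cls:<8}{bar} ({frequency[cls]})")

# Pie chart: each slice takes the same share of the 360-degree circle
# as the class takes of the observations.
angles = {cls: 360 * count / n for cls, count in frequency.items()}
for cls, angle in angles.items():
    print(f"{cls:<8}{angle:.0f} degrees")
```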
The tables we use for quantitative data are a bit different from the ones we use for qualitative data. We can still use frequency, relative frequency and percent frequency distributions. However, we also have a cumulative version of each of these tables. That is, for quantitative data, we also have cumulative frequency, cumulative relative frequency and cumulative percent frequency distributions. The graphs we use for quantitative data are also different: we use a histogram instead of a bar chart.
When creating frequency distributions for quantitative data, we run into the problem that the classes are not predefined. Given a set of numerical values, the first step is to separate them into different classes. It is important that the classes all have the same width and that they don't overlap. Once the classes are defined, we can construct frequency, relative frequency and percent frequency distributions as we do for qualitative data. For quantitative data, we can also construct cumulative distributions. A cumulative frequency distribution gives the number of observations less than or equal to the upper limit of each class.
$ \text{Class} $ | $ \text{Frequency} $ |
$ 10-19 $ | $ 3 $ |
$ 20-29 $ | $ 4 $ |
$ 30-39 $ | $ 3 $ |
$ \text{Total} $ | $ 10 $ |
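The binning step and the cumulative frequencies can be sketched as follows. The `data` list is a hypothetical set of ten values invented to reproduce the table above, and the class limits are taken from that table. The code is only an illustration of the idea, under those assumptions.

```python
from itertools import accumulate

# Hypothetical data set of 10 values, chosen to reproduce the table above.
data = [12, 15, 18, 21, 24, 26, 29, 31, 35, 38]

# Equal-width, non-overlapping classes given as (lower limit, upper limit).
classes = [(10, 19), (20, 29), (30, 39)]

# Frequency: number of observations falling inside each class.
frequency = [sum(lower <= x <= upper for x in data) for lower, upper in classes]

# Cumulative frequency: number of observations less than or equal to
# the upper limit of each class (a running total of the frequencies).
cumulative = list(accumulate(frequency))

for (lower, upper), f, c in zip(classes, frequency, cumulative):
    print(f"{lower}-{upper}  {f}  {c}")
```

The last cumulative frequency always equals the sample size, since every observation is less than or equal to the upper limit of the final class.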
The graph we use to summarize quantitative data is different from the ones used to summarize qualitative data. For qualitative data, we use a bar chart or a pie chart. For quantitative data, we use a histogram. A histogram is similar to a bar chart in that we draw a bar up to the frequency of each class. The only difference is that in a histogram the bars are connected, while in a bar chart the bars are separated. The reason for this is to show the continuity of quantitative data and the separation between categories in qualitative data.
Sometimes we're interested in summarizing data for two variables. The purpose of this is usually to describe the relationship between the variables. The graph used to summarize data for two variables is known as a scatter diagram (or scatter plot). A scatter diagram indicates whether there is a positive relationship, a negative relationship or no relationship between the two variables. Note that a scatter diagram only works when both variables are quantitative. The table used to summarize data for two variables is called a crosstabulation. While a scatter diagram works only for quantitative variables, a crosstabulation works for any combination of quantitative and qualitative variables.
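A crosstabulation is essentially a frequency distribution over pairs of classes. The sketch below builds one from hypothetical paired observations, combining an age class (a quantitative variable grouped into classes) with a preferred drink (a qualitative variable). The data and variable names are invented for illustration.

```python
from collections import Counter

# Hypothetical paired observations: (age class, preferred drink).
observations = [
    ("10-19", "Coke"), ("10-19", "Pepsi"), ("10-19", "Coke"),
    ("20-29", "Sprite"), ("20-29", "Coke"), ("20-29", "Pepsi"), ("20-29", "Coke"),
    ("30-39", "Pepsi"), ("30-39", "Sprite"), ("30-39", "Coke"),
]

# Count how often each (row class, column class) pair occurs.
counts = Counter(observations)

# Row and column labels, sorted for a stable layout.
rows = sorted({age for age, _ in observations})
cols = sorted({drink for _, drink in observations})

# Crosstabulation as a nested dict: table[row][column] -> frequency.
# Counter returns 0 for pairs that never occur.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

print(f"{'':<8}" + "".join(f"{c:<8}" for c in cols))
for r in rows:
    print(f"{r:<8}" + "".join(f"{table[r][c]:<8}" for c in cols))
```

Summing each row of `table` recovers the frequency distribution of the age classes, and summing each column recovers the frequency distribution of the drinks.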
An alternative to using tables and graphs to summarize data is to use numerical summaries. Numerical summaries, such as the mean and standard deviation, can be found using the Numerical Summary Calculator. Summarizing data using methods such as tables, graphs and numbers is one of the foundations of statistical analysis. Another important building block of statistics is probability theory. Basic probabilities, such as those of the union and intersection, can be computed using the Basic Probability Calculator.