Frequency distributions can be made in R using the `table()` function. There is no function that directly gives you a relative frequency distribution, but it can easily be calculated by dividing the frequency distribution by the number of observations. The number of observations in a dataset can be computed using the `length()` function. To get the percent frequency distribution, simply use the same code that was used to get the relative frequency distribution and multiply it by 100. A pie chart can be made using the `pie()` function and a bar chart can be created using the `barplot()` function. The input for both of these functions is the frequency distribution.

> table(x) # Frequency Distribution

> table(x) / length(x) # Relative Frequency Distribution

> table(x) / length(x) * 100 # Percent Frequency Distribution

> pie(table(x))

> barplot(table(x))
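As a concrete illustration, here is how these functions behave on a small, made-up vector of category labels (the data are hypothetical):

```r
# Hypothetical qualitative data: six observations in three categories
x <- c("A", "B", "A", "C", "A", "B")

freq <- table(x)                   # frequency distribution: A = 3, B = 2, C = 1
rel  <- table(x) / length(x)       # relative frequencies (sum to 1)
pct  <- table(x) / length(x) * 100 # percent frequencies (sum to 100)
```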

Creating frequency distributions for quantitative data is trickier as you have to define the classes, or groupings. One quick way of defining the classes is to use the ones R chooses when creating a histogram. A histogram can be created in R using the `hist()` function. Use the `seq()` function to define the classes and the `cut()` function to place each observation into one of the classes. Once the classes are determined, a frequency distribution for quantitative data can be created using the same function as was used for qualitative data. A cumulative frequency distribution can be created using the `cumsum()` function.

> hist(x) # Histogram

> seq(from, to, by) # Define Classes

> cut(x, breaks) # Place Data in Classes

> cumsum(x) # Cumulative Frequencies

> table(x)
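Putting these steps together on some hypothetical quantitative data (note that `cut()` by default creates intervals that are open on the left and closed on the right, so a value of 20 falls in the (10,20] class):

```r
# Hypothetical quantitative data grouped into classes of width 10
x <- c(12, 15, 21, 23, 28, 34, 41, 45, 47, 52)

breaks  <- seq(10, 60, by = 10)  # class boundaries: 10, 20, ..., 60
classes <- cut(x, breaks)        # assign each observation to a class
freq    <- table(classes)        # frequency distribution
cumfreq <- cumsum(freq)          # cumulative frequency distribution
```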

A crosstabulation can be created using the same function that was used for a frequency distribution, `table()`. However, instead of inputting only one variable, input two variables. Row percentages can be calculated by dividing the crosstabulation by the row sums, computed with the `rowSums()` function, and multiplying by 100. Similarly, column percentages can be calculated by dividing the crosstabulation by the column sums, computed with the `colSums()` function, and multiplying by 100. A scatter diagram can be created using the `plot()` function and a trend line can be added using the `abline()` function.

> table(x, y) # Crosstabulation

> table(x, y) / rowSums(table(x, y)) * 100 # Row Percentages

> sweep(table(x, y), 2, colSums(table(x, y)), "/") * 100 # Column Percentages

> plot(x, y) # Scatter Diagram

> abline(lm(y ~ x)) # Trend Line
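Base R also provides `prop.table()`, which computes the same row and column proportions directly from the table; this is an alternative to the manual division above, shown here on hypothetical data:

```r
# Hypothetical two-variable data
x <- c("M", "F", "M", "F", "M")
y <- c("Yes", "Yes", "No", "No", "Yes")

tab <- table(x, y)                            # crosstabulation
row_pct <- prop.table(tab, margin = 1) * 100  # each row sums to 100
col_pct <- prop.table(tab, margin = 2) * 100  # each column sums to 100
```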

Many basic numerical summaries can be computed in R simply by using their name. For example, the function for calculating the mean is simply `mean()`. Similarly, the function for calculating the median is `median()`. Computing the mode is more difficult as there is no function for directly calculating it. To get the mode, use the `table()` function and identify which value has the highest frequency. The function for calculating percentiles is `quantile()`. While the previous functions only require one argument, the data, this function requires an additional argument: the desired percentile, expressed as a decimal. For example, if you want the 30th percentile, set the second argument to .30. Quartiles can be computed using the same function by inputting the appropriate value for the second argument: either .25, .50, or .75.

> mean(x)

> median(x)

> table(x) # Mode

> quantile(x, probs) # Percentiles

> quantile(x, probs = c(.25, .50, .75)) # Quartiles
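One way to extract the mode programmatically, rather than reading it off the frequency table by eye, is with `which.max()`. Note that the result comes back as a character string (the name of the table cell), so convert it with `as.numeric()` if needed. The data below are made up:

```r
# Hypothetical data: 4 occurs most often
x <- c(2, 4, 4, 4, 7, 7, 9)

freq <- table(x)
mode_value <- names(freq)[which.max(freq)]  # value with the highest frequency

q <- quantile(x, probs = c(.25, .50, .75))  # the .50 quantile is the median
```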

Although there is a function called `range()`, it does not calculate the range of the data. Instead, it returns the minimum and maximum values. Since the range is simply the difference between these two values, you can apply the `diff()` function afterwards to compute it. Alternatively, you can use the `max()` and `min()` functions. To compute the interquartile range, use the `IQR()` function. It is important to note that all the letters in this function are capitalized, as R is a case-sensitive programming language. To compute the variance, simply use the `var()` function. Similarly, to compute the standard deviation, use the `sd()` function. There is no function for calculating the coefficient of variation in R, so it must be computed manually using the functions for the mean and standard deviation.

> diff(range(x)) # Range

> IQR(x) # Interquartile Range

> var(x) # Variance

> sd(x) # Standard Deviation

> sd(x) / mean(x) * 100 # Coefficient of Variation
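A quick check on made-up data also confirms that `var()` and `sd()` use the sample formulas (n - 1 in the denominator):

```r
# Hypothetical data with mean 15
x <- c(10, 12, 14, 16, 18, 20)

rng <- diff(range(x))         # range: max - min = 10
v   <- var(x)                 # sample variance (n - 1 denominator)
s   <- sd(x)                  # sample standard deviation
cv  <- sd(x) / mean(x) * 100  # coefficient of variation, in percent
```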

Calculating skewness requires the use of a package as it is not part of base R. Install the "moments" package using the `install.packages()` function and then load it using the `library()` function. Note that quotation marks are required around the package name in the former function but not the latter. Then simply use the `skewness()` function to calculate the skewness. There is no function in R for calculating z-scores, so they need to be computed manually using the functions for the mean and standard deviation. To calculate the covariance, use the `cov()` function. Similarly, to calculate the correlation coefficient, use the `cor()` function.

> install.packages("moments") # Install Package

> library(moments) # Load Package

> skewness(x)

> (x - mean(x)) / sd(x) # z-Scores

> cov(x, y) # Covariance

> cor(x, y) # Correlation Coefficient
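These last few formulas can be verified on a small made-up pair of vectors where y is an exact multiple of x, so the correlation must be 1:

```r
# Hypothetical paired data: y = 2 * x
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

z <- (x - mean(x)) / sd(x)  # z-scores have mean 0 and standard deviation 1
cv <- cov(x, y)             # sample covariance
r  <- cor(x, y)             # correlation: 1 for a perfect linear relationship
```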