Probability Theory



The foundation of probability theory is set theory. Basic concepts in set theory include the complement, the union and the intersection. The complement of an event A is the event containing all points that are not in A. The union of events A and B is the event containing all points in A or B. The intersection of events A and B is the event containing all points in both A and B. So the term "not" refers to complement, "or" refers to union, and "and" refers to intersection. In probability theory, it is often useful to split an event into a disjoint union of simpler events.

Union:
A∪B = {s: s∈A or s∈B}

Intersection:
A∩B = {s: s∈A and s∈B}
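These set operations map directly onto Python's built-in set type. A quick sketch (the sample space and events below are illustrative, not from the text):

```python
# A small sample space and two events, modeled as Python sets.
S = set(range(1, 11))   # sample space: integers 1..10
A = {1, 2, 3, 4, 5}     # event A
B = {4, 5, 6, 7}        # event B

union = A | B           # "or"  -> {1, 2, 3, 4, 5, 6, 7}
intersection = A & B    # "and" -> {4, 5}
complement_A = S - A    # "not" -> {6, 7, 8, 9, 10}
```

The operators `|`, `&` and `-` compute the union, intersection and set difference (complement relative to S), respectively.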

There are several laws of set theory that are commonly used in probability theory. A few of these are commutativity, associativity, the distributive laws and DeMorgan's laws. Commutativity says that when you take the union or intersection of two sets, the order does not matter. Associativity says that when you take the union or intersection of three sets, it does not matter which two you combine first. The distributive laws and DeMorgan's laws are shown below.

Distributive Laws:
A∪(B∩C) = (A∪B)∩(A∪C)
A∩(B∪C) = (A∩B)∪(A∩C)

DeMorgan's Laws:
(A∪B)ᶜ = Aᶜ∩Bᶜ
(A∩B)ᶜ = Aᶜ∪Bᶜ
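Because these laws hold for arbitrary sets, they can be checked mechanically on any concrete example. A minimal sketch with made-up sets (complements are taken within the sample space S):

```python
# Illustrative sample space and events; any choice of sets would work.
S = set(range(10))
A, B, C = {0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}

# Distributive laws
assert A | (B & C) == (A | B) & (A | C)
assert A & (B | C) == (A & B) | (A & C)

# DeMorgan's laws: complement of a union is the intersection
# of the complements, and vice versa.
assert S - (A | B) == (S - A) & (S - B)
assert S - (A & B) == (S - A) | (S - B)
```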

Many of the properties of probability can be derived from a set of just three axioms. If these three axioms are satisfied for a function P, then P is called a probability function. The probability function, P, is defined on a collection of subsets of the sample space, denoted B. The first axiom is that the probability of any event is at least zero. The second is that the probability of the sample space is equal to 1. The third axiom is that for a countable sequence of disjoint events, the probability of their union is equal to the sum of their probabilities.

Axioms of Probability:
1. P(A) ≥ 0
2. P(S) = 1
3. P(∪Aᵢ) = ∑P(Aᵢ) for disjoint A₁, A₂, …
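These axioms can be verified directly for a simple finite example. Below is a sketch using a fair six-sided die, with probabilities assigned uniformly (the helper P is illustrative, not from the text; exact fractions avoid floating-point issues):

```python
from fractions import Fraction

# Uniform probability on a fair die: P(A) = |A| / |S|.
S = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event & S), len(S))

# Axiom 1: every event has nonnegative probability.
assert all(P({s}) >= 0 for s in S)

# Axiom 2: the sample space has probability 1.
assert P(S) == 1

# Axiom 3 (finite case): additivity over disjoint events.
evens, odds = {2, 4, 6}, {1, 3, 5}
assert P(evens | odds) == P(evens) + P(odds)
```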

There are a few basic properties of probability that are frequently useful. The first property is that the probability of any event is equal to one minus the probability of its complement. This follows from the second and third axioms and the fact that the union of any event and its complement is equal to the entire sample space. The second property is that the probability of the empty set is equal to zero, which follows from the first property and the fact that the complement of the empty set is the sample space. The rest of the properties of probability can be derived in a similar fashion.

Properties of Probability:
P(A) = 1 - P(Aᶜ)
P(∅) = 0
0 ≤ P(A) ≤ 1
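Continuing the die example from above, each of these properties can be checked numerically (the setup is again an illustrative sketch):

```python
from fractions import Fraction

# Same uniform probability function on a fair die.
S = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event & S), len(S))

A = {1, 2}

assert P(A) == 1 - P(S - A)   # complement rule: P(A) = 1 - P(A^c)
assert P(set()) == 0          # the empty set has probability zero
assert 0 <= P(A) <= 1         # probabilities lie in [0, 1]
```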

Along with these basic properties, there are also some more advanced properties which are often quite useful. One of these is called continuity of probability, which says that for a monotone sequence of events, the limit of the probabilities is equal to the probability of the limit. In other words, limit and probability can be interchanged. Another useful advanced property is known as Boole's inequality, which says that the probability of the union of events is less than or equal to the sum of their probabilities. Note that Boole's inequality makes a statement similar to, yet different from, the third axiom. This is because the third axiom applies only to disjoint sets, while Boole's inequality applies to any sets.

Advanced Properties:
lim P(Aₙ) = P(lim Aₙ)
P(∪Aᵢ) ≤ ∑P(Aᵢ)
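Boole's inequality is easy to illustrate with overlapping events, where the inequality is strict because the intersection is double-counted on the right-hand side. A sketch using the same uniform die setup (the events are made up for illustration):

```python
from fractions import Fraction

# Uniform probability on a fair die.
S = {1, 2, 3, 4, 5, 6}

def P(event):
    return Fraction(len(event & S), len(S))

# Overlapping (not disjoint) events: A ∩ B = {3}.
A, B = {1, 2, 3}, {3, 4}

# Boole's inequality: P(A ∪ B) <= P(A) + P(B); here 4/6 < 5/6.
assert P(A | B) <= P(A) + P(B)

# The inequality is strict exactly when the overlap has positive probability.
assert P(A | B) == P(A) + P(B) - P(A & B)
```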

It is often the case that the probability of each outcome of an experiment can be assumed to be equally likely. Examples of this include tossing a coin, rolling a die and picking a card from a deck. When it is reasonable to assume that outcomes are equally likely, probabilities can be assigned simply by determining how many outcomes are possible. The number of outcomes possible in an event, denoted |A|, can be calculated using combinatorial analysis, or counting rules. Important counting rules include the mn-rule (or multiplication rule), the extended mn-rule, permutations and combinations.

Equally Likely Probability:
P(A) = |A| / |S|
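The formula P(A) = |A|/|S| reduces probability to counting, which itertools can do by brute force for small problems. A sketch for the probability of drawing two aces from a standard 52-card deck (the card encoding is an assumption made for illustration):

```python
from itertools import combinations

# Encode a 52-card deck: 13 ranks repeated across 4 suits,
# with a unique ID per card so cards are distinguishable.
ranks = list(range(13)) * 4            # rank 0 plays the role of "ace"
deck = list(enumerate(ranks))          # (card_id, rank) pairs

# Sample space S: all 2-card hands, each equally likely. |S| = C(52,2) = 1326.
hands = list(combinations(deck, 2))

# Event A: both cards are aces. |A| = C(4,2) = 6.
aces = [h for h in hands if all(rank == 0 for _, rank in h)]

p = len(aces) / len(hands)             # P(A) = |A| / |S| = 6/1326
```

This brute-force count agrees with the combination formula C(4,2)/C(52,2), illustrating how counting rules replace enumeration when the sample space is large.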

Two events, A and B, are said to be independent if the probability of their intersection is equal to the product of their probabilities. That is, A and B are independent if P(A∩B) = P(A)P(B). If A and B are independent, then any combination involving their complements is also independent. That is, if A and B are independent, then Aᶜ and B, A and Bᶜ, and Aᶜ and Bᶜ are independent as well. So if P(A∩B) = P(A)P(B), we also have P(Aᶜ∩B) = P(Aᶜ)P(B), P(A∩Bᶜ) = P(A)P(Bᶜ) and P(Aᶜ∩Bᶜ) = P(Aᶜ)P(Bᶜ).

Independence:
P(A∩B) = P(A)P(B)

