Written by Luka Kerr on August 2, 2019
Introduction
Statistics is the science of the collection, processing, analysis, and interpretation of data.
Population
- Population: total collection of elements
- Individuals (or units): elements of a population
- Variable: a characteristic, which could be quantitative or qualitative
Sample
- Sample: subset of the population which is effectively observed
- Data: the measurements that are collected over the sample
- Random sampling: the only sampling scheme that guarantees a representative sample of the population
Descriptive Statistics
Data should be presented in ways that facilitate their interpretation and subsequent analysis.
Types Of Variables
- Categorical (or qualitative)
- Take a value that is one of several possible categories
- E.g. gender, color, status
- Types:
- Ordinal
- There is a clear ordering of the categories
- E.g. salary class
- Nominal
- There is no intrinsic ordering to the categories
- E.g. gender, color
- Numerical (or quantitative)
- Naturally measured as a number for which meaningful arithmetic operations make sense
- E.g. height, age, temperature
- Types:
- Discrete
- The variable can only take a finite number of distinct values
- E.g. number of courses enrolled in, number of people in a household
- Continuous
- The variable can take any value in an entire interval on the real line
- E.g. height, weight, temperature
Descriptive Measures
Measures Of Centre
The Sample Mean
\[\bar{x} = \dfrac{1}{n} \sum_{i = 1}^n x_i\]
The Sample Median
The median, usually denoted $m$ (or $\overset{\sim}{x}$), is the value which divides the data into two equal parts, half below the median and half above.
For ordered observations $x_1 \le x_2 \le \dots \le x_n$,
\[m = \begin{cases} x_{\frac{n + 1}{2}} & n \text{ odd} \\ \dfrac{1}{2} \left( x_{\frac{n}{2}} + x_{\frac{n}{2} + 1} \right) & n \text{ even} \end{cases}\]
Quartiles & Percentiles
The sample $(100 \times p)$th percentile (or quantile) is the value such that $100 \times p$% of the observations are below this value (or equal to it), and the other $100 \times (1 - p)$% are above this value (or equal to it).
Five Number Summary
\[\{ x_1, q_1, m, q_3, x_n \}\]where
- $x_1$ is the 0th percentile
- $q_1$ is the 25th percentile
- $m = q_2$ is the 50th percentile
- $q_3$ is the 75th percentile
- $x_n$ is the 100th percentile
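A minimal NumPy sketch computing these measures on made-up data (NumPy's default percentile interpolation may differ slightly from the convention used in class):

```python
import numpy as np

# Hypothetical sample, for illustration only
x = np.array([2.1, 3.4, 1.7, 4.0, 2.8, 3.9, 5.2, 2.5, 3.1, 4.4])

mean = x.mean()                              # sample mean: (1/n) * sum of x_i
median = np.median(x)                        # sample median
q1, m, q3 = np.percentile(x, [25, 50, 75])   # 25th, 50th and 75th percentiles

five_number = (x.min(), q1, m, q3, x.max())  # {x_1, q_1, m, q_3, x_n}
print(mean, median, five_number)
```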
Measure Of Variability
The Sample Variance
\[\begin{array}{rcl} s^2 & = & \dfrac{1}{n - 1} \sum_{i = 1}^n (x_i - \bar{x})^2 \\ & \equiv & \dfrac{1}{n - 1} \left( \sum_{i = 1}^n x_i^2 - n \bar{x}^2 \right) \end{array}\]
The number $n - 1$ is the number of degrees of freedom for $s^2$.
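A small NumPy sketch of the two equivalent forms of $s^2$ (the data are made up); note `ddof=1`, which gives the $n - 1$ divisor:

```python
import numpy as np

x = np.array([2.1, 3.4, 1.7, 4.0, 2.8])   # hypothetical data
n = len(x)
xbar = x.mean()

s2_direct = ((x - xbar) ** 2).sum() / (n - 1)        # definition
s2_alt = ((x ** 2).sum() - n * xbar ** 2) / (n - 1)  # algebraically equivalent form
s2_numpy = x.var(ddof=1)                             # ddof=1 -> divide by n - 1

assert np.isclose(s2_direct, s2_alt) and np.isclose(s2_direct, s2_numpy)
```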
The Sample Standard Deviation
\[s = \sqrt{s^2} = \sqrt{\dfrac{1}{n - 1} \sum_{i = 1}^n (x_i - \bar{x})^2}\]
Interquartile Range
\[iqr = q_3 - q_1\]
Elements Of Probability
A random experiment (or chance experiment) is any experiment whose exact outcome cannot be predicted with certainty.
The set of all possible outcomes of a random experiment is called the sample space of the experiment. It is usually denoted $S$.
An event $E$ is a subset of the sample space of a random experiment.
Events
- Union: $E_1 \cup E_2$, event either $E_1$ or $E_2$ occurs
- Intersection: $E_1 \cap E_2$, event both $E_1$ and $E_2$ occur
- Complement: $E^c$, event $E$ does not occur ($= E^{\prime}$)
- Implication: $E_1 \subseteq E_2$, $E_1$ implies $E_2$
- Mutual Exclusion: $E_1 \cap E_2 = \emptyset$, $E_1$ and $E_2$ cannot occur together
Probabilities
$P(E)$ denotes how likely $E$ is to occur.
- $0 \le P(E) \le 1$ for any event $E$
- $P(S) = 1$
- For any (finite or countably infinite) sequence of mutually exclusive events $E_1, E_2, \dots$: $P(\bigcup_{i = 1}^\infty E_i) = \sum_{i = 1}^\infty P(E_i)$
- $P(E^c) = 1 - P(E)$
- $P(\emptyset) = 0$
- $E_1 \subseteq E_2 \implies P(E_1) \le P(E_2)$
- $P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$
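These rules can be verified by brute-force enumeration on a small sample space; the two-dice example below is an arbitrary illustration:

```python
from itertools import product
from fractions import Fraction

# Sample space: all ordered outcomes of two fair dice
S = list(product(range(1, 7), repeat=2))
P = lambda E: Fraction(len(E), len(S))   # equally likely outcomes

E1 = {w for w in S if w[0] + w[1] == 7}  # event: the sum is 7
E2 = {w for w in S if w[0] == 6}         # event: the first die shows 6

assert P(set(S) - E1) == 1 - P(E1)                 # complement rule
assert P(E1 | E2) == P(E1) + P(E2) - P(E1 & E2)    # inclusion-exclusion
```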
Combinatorics Rules
- Multiplication Rule: $n_1 \times n_2 \times \cdots \times n_k$
- Permutations: $P_n = n \times (n - 1) \times (n - 2) \times \cdots \times 2 \times 1 = n!$
- Combinations: $C_r^n = \dfrac{n!}{r!(n - r)!}$
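Python 3.8+ exposes these counts directly via `math.perm` and `math.comb`; a quick sketch:

```python
import math

n, r = 5, 2

assert math.factorial(n) == 120                                         # P_n = n!
assert math.perm(n, r) == math.factorial(n) // math.factorial(n - r)    # ordered selections of r from n
assert math.comb(n, r) == math.factorial(n) // (math.factorial(r) * math.factorial(n - r))  # C^n_r
```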
Conditional Probabilities
The conditional probability of $E_1$, conditional on $E_2$, is defined as
\[P(E_1 | E_2) = \dfrac{P(E_1 \cap E_2)}{P(E_2)}.\]
- $P(E_1 | S) = P(E_1)$
- $P(E_1 | E_1) = 1$
- $P(E_1 | E_2) = 1$ if $E_2 \subseteq E_1$
- $P(E_1 | E_2) \times P(E_2) = P(E_1 \cap E_2)$
Bayes’ First Rule: If $P(E_1) > 0$ and $P(E_2) > 0$ then $P(E_1 | E_2) = P(E_2 | E_1) \times \dfrac{P(E_1)}{P(E_2)}$.
Multiplicative Law Of Probability: $P(\bigcap_{i = 1}^n E_i) = P(E_1) \times P(E_2 | E_1) \times P(E_3 | E_1 \cap E_2) \times \cdots \times P(E_n | \bigcap_{i = 1}^{n - 1} E_i)$.
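A numeric sketch of Bayes' first rule with made-up probabilities (all three input values below are purely illustrative):

```python
# Hypothetical numbers, for illustration only
p_e1 = 0.01               # P(E1)
p_e2_given_e1 = 0.95      # P(E2 | E1)
p_e2_given_not_e1 = 0.05  # P(E2 | E1^c)

# P(E2) via the split over E1 and its complement
p_e2 = p_e2_given_e1 * p_e1 + p_e2_given_not_e1 * (1 - p_e1)

# Bayes' first rule: P(E1 | E2) = P(E2 | E1) * P(E1) / P(E2)
p_e1_given_e2 = p_e2_given_e1 * p_e1 / p_e2
print(round(p_e1_given_e2, 4))   # ~0.1610 with these inputs
```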
Independent Events
Two events $E_1$ and $E_2$ are said to be independent if and only if
\[P(E_1 \cap E_2) = P(E_1) \times P(E_2).\]
The events $E_1, E_2, \dots, E_n$ are said to be independent iff for every subset $\{ i_1, i_2, \dots, i_r \}$ (with $r \le n$) of $\{ 1, 2, \dots, n \}$,
\[P(\bigcap_{j = 1}^r E_{i_j}) = \prod_{j = 1}^r P(E_{i_j})\]
Bayes’ Formula
A sequence of events $E_1, E_2, \dots, E_n$ such that
- $S = \bigcup_{i = 1}^n E_i$ and
- $E_i \cap E_j = \emptyset$ for all $i \ne j$
is called a partition of $S$.
Law Of Total Probability
Given a partition $\{ E_1, E_2, \dots, E_n \}$ of $S$ such that $P(E_i) > 0$ for all $i$, the probability of any event $A$ can be written
\[P(A) = \sum_{i = 1}^n P(A | E_i) \times P(E_i).\]
Bayes’ Second Rule
Given a partition $\{ E_1, E_2, \dots, E_n \}$ of $S$ such that $P(E_i) > 0$ for all $i$, we have, for any event $A$ such that $P(A) > 0$,
\[P(E_i | A) = \dfrac{P(A | E_i) P(E_i)}{\sum_{j = 1}^n P(A | E_j) P(E_j)}\]
Random Variables
A random variable is a real-valued function defined over the sample space
\[\begin{array}{rcl} X : S & \to & \mathbb{R} \\ \omega & \mapsto & X(\omega) \end{array}\]
$S_x$ is the domain of variation of $X$, that is the set of possible values taken by $X$.
Cumulative Distribution Function
A random variable is often described by its cumulative distribution function (cdf) (or just distribution).
The cdf of the random variable $X$ is defined for any real number $x$ by $F(x) = P(X \le x)$.
Discrete Random Variables
A random variable is said to be discrete if it can only assume a finite (or at most countably infinite) number of values.
The probability mass function (pmf) of a discrete random variable $X$ is defined for any real number $x$, by $p(x) = P(X = x)$.
Bernoulli Random Variable
- Only assumes two values, $S_x = \{ 0, 1 \}$
- Its pmf is given by \(\begin{array}{rcl} p(1) & = & \pi \\ p(0) & = & 1 - \pi \end{array}\) for some value $\pi$
Continuous Random Variables
A random variable $X$ is said to be continuous if there exists a nonnegative function $f(x)$ defined for all real $x \in \mathbb{R}$ such that for any set $B$ of real numbers, $P(X \in B) = \int_B f(x) dx$.
Discrete Vs Continuous Random Variables
| | Discrete | Continuous |
|---|---|---|
| Domain | $S_x = \{ x_1, x_2, \dots \}$ (finite or countably infinite) | $S_x$ is an interval of the real line |
| PMF | $\forall x \in \mathbb{R}, p(x) = P(X = x) \ge 0$ | $p(x) \equiv 0$ |
| PDF | Doesn’t exist | $\forall x \in \mathbb{R}, f(x) = F^\prime(x) \ge 0$ |
Expectation Of A Random Variable
The expectation or the mean of a random variable $X$, denoted $E(X)$, or $\mu$, is given by
Discrete Random Variable: $\mu = E(X) = \sum_{x \in S_x} x p(x)$
Continuous Random Variable: $\mu = E(X) = \int_{S_x} x f(x) \ dx$
Expectation Of A Function Of A Random Variable
Discrete Random Variable: $E(g(X)) = \sum_{x \in S_x} g(x) p(x)$
Continuous Random Variable: $E(g(X)) = \int_{S_x} g(x) f(x) \ dx$
Variance Of A Random Variable
The variance of a random variable $X$, usually denoted by $Var(X)$, or $\sigma^2$, is defined by $Var(X) = E((X - \mu)^2)$.
Discrete Random Variable: $\sigma^2 = Var(X) = \sum_{x \in S_x} (x - \mu)^2 p(x)$
Continuous Random Variable: $\sigma^2 = Var(X) = \int_{S_x} (x - \mu)^2 f(x) \ dx$
An alternative formula for $Var(X)$ is $\sigma^2 = Var(X) = E(X^2) - (E(X))^2 = E(X^2) - \mu^2$, where $E(X^2) = \sum_{x \in S_x} x^2 p(x)$.
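A small sketch computing $E(X)$ and both forms of $Var(X)$ from a made-up pmf:

```python
# Hypothetical pmf over S_x = {0, 1, 2, 3}
pmf = {0: 0.1, 1: 0.4, 2: 0.3, 3: 0.2}

mu = sum(x * p for x, p in pmf.items())                    # E(X)
ex2 = sum(x ** 2 * p for x, p in pmf.items())              # E(X^2)
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())   # E((X - mu)^2)
var_alt = ex2 - mu ** 2                                    # E(X^2) - mu^2

assert abs(var_def - var_alt) < 1e-12
print(mu, var_def)   # 1.6 and 0.84 with this pmf
```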
Standardisation
Suppose you have a random variable $X$ with mean $\mu$ and variance $\sigma^2$. Then, the associated standardised random variable, often denoted $Z$, is given by
\[Z = \dfrac{X - \mu}{\sigma}\]that is, $Z = \dfrac{1}{\sigma} X - \dfrac{\mu}{\sigma}$.
A standardised random variable always has mean 0 and variance 1.
Joint Distribution Function
The joint cumulative distribution function of $X$ and $Y$ is given by
\[F_{XY}(x, y) = P(X \le x, Y \le y) \quad \forall (x, y) \in \mathbb{R} \times \mathbb{R}\]
where $\{ X \le x, Y \le y \}$ denotes the event $\{ X \le x \} \cap \{ Y \le y \}$.
Discrete
If $X$ and $Y$ are both discrete, the joint probability mass function is defined by
\[p_{XY}(x, y) = P(X = x, Y = y).\]
Continuous
$X$ and $Y$ are said to be jointly continuous if there exists a function $f_{XY}(x, y) : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+$ such that for any sets $A$ and $B$ of real numbers
\[P(X \in A, Y \in B) = \int_A \int_B f_{XY}(x, y) \ dy \ dx.\]
Expectation Of A Function Of Two Random Variables
For any function $g : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, the expectation of $g(X, Y)$ is given by
Discrete Random Variable: $E(g(X, Y)) = \sum_{x \in S_x} \sum_{y \in S_y} g(x, y) p_{XY}(x, y)$
Continuous Random Variable: $E(g(X, Y)) = \int_{S_x} \int_{S_y} g(x, y) f_{XY}(x, y) \ dy \ dx$
Independent Random Variables
The random variables $X$ and $Y$ are said to be independent if, for all $(x, y) \in \mathbb{R} \times \mathbb{R}$,
\[P(X \le x, Y \le y) = P(X \le x) \times P(Y \le y).\]
If $X$ and $Y$ are independent, then for any functions $h$ and $g$,
\[E(h(X) g(Y)) = E(h(X)) \times E(g(Y)).\]
Covariance Of Two Random Variables
The covariance of two random variables $X$ and $Y$ is defined by
\[Cov(X, Y) = E((X - E(X))(Y - E(Y))).\]
Special Random Variables
Binomial Random Variables
The Binomial Distribution
- $S = \{ \text{Success}, \text{Failure} \}$
- We observe a Success with probability $\pi$
- $n$ independent repetitions of this experiment are performed
Define $X =$ number of successes observed over the $n$ repetitions. We say that $X$ is a binomial random variable with parameters $n$ and $\pi$:
\[X \sim Bin(n, \pi)\]
The binomial probability mass function is given by $p(x) = C^n_x \pi^x (1 - \pi)^{n - x}$ for $x \in S_x = \{ 0, 1, \dots, n \}$.
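A quick check of the binomial pmf against `scipy.stats.binom` (the values of $n$ and $\pi$ are arbitrary):

```python
from math import comb
from scipy.stats import binom

n, pi = 10, 0.3   # arbitrary parameters
x = 4

manual = comb(n, x) * pi ** x * (1 - pi) ** (n - x)   # C^n_x * pi^x * (1 - pi)^(n - x)
assert abs(manual - binom.pmf(x, n, pi)) < 1e-12

# the pmf sums to 1 over S_x = {0, 1, ..., n}
assert abs(sum(binom.pmf(k, n, pi) for k in range(n + 1)) - 1) < 1e-12
```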
The Bernoulli Distribution
If $n = 1$, the binomial distribution reduces to the Bernoulli distribution.
\[X \sim Bern(\pi)\]
The Bernoulli probability mass function is given by
\[p(x) = \begin{cases} 1 - \pi & x = 0 \\ \pi & x = 1 \\ 0 & \text{otherwise} \end{cases}\]
Mean & Variance Of The Binomial Distribution
If $X \sim Bin(n, \pi)$,
\[E(X) = n \pi, \qquad Var(X) = n \pi (1 - \pi).\]
Poisson Random Variables
The Poisson Distribution
Define $X =$ number of occurrences. We say that $X$ is a Poisson random variable with parameter $\lambda$, i.e. $X \sim \mathcal{P}(\lambda)$, if
\[p(x) = e^{-\lambda} \dfrac{\lambda^x}{x!}\]for $x \in S_x = \{ 0, 1, 2, \dots \}$.
The Poisson distribution is suitable for modelling the number of occurrences of a random phenomenon satisfying some assumptions of continuity, stationarity, independence and non-simultaneity.
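A quick check of the Poisson pmf against `scipy.stats.poisson` (the value of $\lambda$ is arbitrary):

```python
from math import exp, factorial
from scipy.stats import poisson

lam = 3.5   # arbitrary rate parameter
x = 2

manual = exp(-lam) * lam ** x / factorial(x)   # e^{-lambda} * lambda^x / x!
assert abs(manual - poisson.pmf(x, lam)) < 1e-12
```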
Mean & Variance Of The Poisson Distribution
If $X \sim \mathcal{P}(\lambda)$,
\[E(X) = \lambda, \qquad Var(X) = \lambda.\]
Uniform Random Variables
The Uniform Distribution
A random variable is said to be uniformly distributed over an interval $[\alpha, \beta]$, i.e. $X \sim U_{[\alpha, \beta]}$ if its probability density function is given by
\[f(x) = \begin{cases} \dfrac{1}{\beta - \alpha} & x \in [\alpha, \beta] \\ 0 & \text{otherwise} \end{cases}\]
Mean & Variance Of The Uniform Distribution
If $X \sim U_{[\alpha, \beta]}$,
\[E(X) = \dfrac{\alpha + \beta}{2}, \qquad Var(X) = \dfrac{(\beta - \alpha)^2}{12}.\]
Exponential Random Variables
The Exponential Distribution
A random variable is said to be an Exponential random variable with parameter $\mu (\mu > 0)$, i.e. $X \sim Exp(\mu)$ if its probability density function is given by
\[f(x) = \begin{cases} \frac{1}{\mu} e^{-\frac{x}{\mu}} & x \ge 0 \\ 0 & \text{otherwise} \end{cases}\]
This distribution is often useful for representing random amounts of time.
Mean & Variance Of The Exponential Distribution
If $X \sim Exp(\mu)$,
\[E(X) = \mu, \qquad Var(X) = \mu^2.\]
Normal Random Variables
The Normal Distribution
A random variable is said to be normally distributed with parameters $\mu$ and $\sigma (\sigma > 0)$, i.e. $X \sim \mathcal{N}(\mu, \sigma)$ if its probability density function is given by
\[f(x) = \dfrac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}\]
Mean & Variance Of The Normal Distribution
If $X \sim \mathcal{N}(\mu, \sigma)$,
\[E(X) = \mu, \qquad Var(X) = \sigma^2.\]
The Standard Normal Distribution
The Standard Normal distribution is the Normal distribution with $\mu = 0$ and $\sigma = 1$.
Standardisation
If $X \sim \mathcal{N}(\mu, \sigma)$, then
\[Z = \dfrac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1).\]
Quantiles
The value $z_\alpha$ such that $P(Z \le z_\alpha) = \alpha$ for $Z \sim \mathcal{N}(0, 1)$ is called the quantile of level $\alpha$ of the standard normal distribution.
Commonly used levels are $\alpha = 0.95$, $0.975$ and $0.995$.
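These quantiles can be read from tables or computed with `scipy.stats.norm.ppf` (the inverse cdf); a quick sketch:

```python
from scipy.stats import norm

for alpha in (0.95, 0.975, 0.995):
    z = norm.ppf(alpha)             # z_alpha such that P(Z <= z_alpha) = alpha
    print(alpha, round(z, 3))       # ~1.645, ~1.960, ~2.576
    assert abs(norm.cdf(z) - alpha) < 1e-9
```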
Sampling Distributions
Random Sampling
The set of observations $X_1, X_2, \dots, X_n$ constitutes a random sample if
- The $X_i$’s are independent random variables
- Every $X_i$ has the same probability distribution
Estimation & Sampling Distribution
- A numerical measure calculated from the sample is called a statistic
An estimator of the unknown parameter of interest $\theta$ must be a statistic, i.e. a function of the sample $\hat{\Theta} = h(X_1, X_2, \dots, X_n)$.
Inference Concerning A Mean
Point Estimation
An estimator is a random variable, which has its mean, its variance and its probability distribution, known as the sampling distribution.
Properties
Unbiasedness
An estimator $\hat{\Theta}$ of $\theta$ is said to be unbiased iff its mean is equal to $\theta$, whatever the value of $\theta$, i.e. $E(\hat{\Theta}) = \theta$.
If an estimator is not unbiased, then the difference $E(\hat{\Theta}) - \theta$ is called the bias of the estimator.
The property of unbiasedness is one of the most desirable properties of an estimator.
Efficiency
A logical principle of estimation is to choose the unbiased estimator that has minimum variance. Such an estimator is said to be efficient among the unbiased estimators.
Consistency
An estimator is consistent if the probability that the estimator is ‘close’ to $\theta$ increases to one as the sample size increases.
Standard Error
The standard error of an estimator $\hat{\Theta}$ is its standard deviation $sd(\hat{\Theta})$.
Interval Estimation
Instead of giving a point estimate $\hat{\theta}$, that is, a single value that will almost certainly not be exactly equal to $\theta$, we give an interval in which we are very confident to find the true value.
Confidence Intervals
A confidence interval is an interval for which we can assert with a reasonable degree of certainty (or confidence) that it will contain the true value of the population parameter under consideration.
A confidence interval of level $100 \times (1 - \alpha)$% means that we are $100 \times (1 - \alpha)$% confident that the true value of the parameter is included into the interval ($\alpha$ is a real number in $[0, 1]$).
The most frequently used confidence levels are 90%, 95% and 99%.
If $\bar{x}$ is the sample mean of an observed random sample of size $n$ from a normal distribution with known variance $\sigma^2$, a confidence interval of level $100 \times (1 - \alpha)$% for $\mu$ is given by
\[\left[ \bar{x} - z_{1 - \alpha/2} \dfrac{\sigma}{\sqrt{n}}, \bar{x} + z_{1 - \alpha/2} \dfrac{\sigma}{\sqrt{n}} \right].\]
The Central Limit Theorem
The sum of a large number of independent random variables has a distribution that is approximately normal.
If $X_1, X_2, \dots, X_n$ is a random sample taken from a population with mean $\mu$ and finite variance $\sigma^2$, and if $\bar{X}$ is the sample mean, then the limiting distribution of $\sqrt{n} \frac{\bar{X} - \mu}{\sigma}$ as $n \to \infty$ is the standard normal distribution.
The standard normal distribution provides a reasonable approximation to the distribution of $\sqrt{n} \frac{\bar{X} - \mu}{\sigma}$ when $n$ is large. This is usually denoted $\sqrt{n} \frac{\bar{X} - \mu}{\sigma} \overset{a}{\sim} \mathcal{N} (0, 1)$.
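A small simulation sketch of the CLT; the exponential population, sample size and number of repetitions below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 2.0, 2.0            # an exponential population with mean 2 has sd 2
n, reps = 50, 20_000

samples = rng.exponential(scale=mu, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma   # sqrt(n) * (Xbar - mu) / sigma

print(z.mean(), z.std())            # approximately 0 and 1
print(np.mean(np.abs(z) <= 1.96))   # approximately 0.95, as for a standard normal
```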
Confidence Interval On The Mean Of A Normal Distribution, Variance Unknown
Sample Variance
From the random sample $X_1, X_2, \dots, X_n$ we have a natural estimator of the unknown $\sigma^2$, the sample variance:
\[S^2 = \dfrac{1}{n - 1} \sum_{i = 1}^n (X_i - \bar{X})^2.\]
A natural procedure is thus to replace $\sigma$ with the sample standard deviation $S$, and to work with the random variable
\[T = \sqrt{n} \dfrac{\bar{X} - \mu}{S}.\]
The Student’s $t$-distribution
In a normal population, the exact distribution of $T$ is the so-called $t$-distribution with $n - 1$ degrees of freedom: $T \sim t_{n - 1}$.
A random variable, say $T$, is said to follow the Student’s $t$-distribution with $v$ degrees of freedom, i.e. $T \sim t_v$.
Mean & Variance Of The Student’s $t$-distribution
If $T \sim t_v$,
\[E(T) = 0, \qquad Var(T) = \dfrac{v}{v - 2}\]for $v > 2$.
Quantiles
Let $t_{v; \alpha}$ be the value such that $P(T > t_{v; \alpha}) = 1 - \alpha$ for $T \sim t_v$. Like the standard normal distribution the symmetry of any $t_v$-distribution implies that $t_{v; 1 - \alpha} = -t_{v; \alpha}$. $t_{v; \alpha}$ is also referred to as $t$ critical value.
$t$-confidence Interval
If $\bar{x}$ and $s$ are the sample mean and sample standard deviation of an observed random sample of size $n$ from a normal distribution, a confidence interval of level $100 \times (1 - \alpha)$% for $\mu$ is given by
\[\left[ \bar{x} - t_{n - 1; 1 - \alpha / 2} \dfrac{s}{\sqrt{n}}, \bar{x} + t_{n - 1; 1 - \alpha / 2} \dfrac{s}{\sqrt{n}} \right].\]
This confidence interval is sometimes called a $t$-confidence interval.
Large Sample Confidence Interval
If $\bar{x}$ and $s$ are the sample mean and standard deviation of an observed random sample of large size $n$ from any distribution, an approximate confidence interval of level $100 \times (1 - \alpha)$% for $\mu$ is
\[\left[ \bar{x} - z_{1 - \alpha / 2} \dfrac{s}{\sqrt{n}}, \bar{x} + z_{1 - \alpha / 2} \dfrac{s}{\sqrt{n}} \right]\]
Summary
- Is the population normal?
  - Yes, and $\sigma$ is known
    - Use an exact $z$-confidence interval: \(\left[ \bar{x} - z_{1 - \alpha / 2} \dfrac{\sigma}{\sqrt{n}}, \bar{x} + z_{1 - \alpha / 2} \dfrac{\sigma}{\sqrt{n}} \right]\)
  - Yes, but $\sigma$ is unknown
    - Use an exact $t$-confidence interval: \(\left[ \bar{x} - t_{n - 1; 1 - \alpha / 2} \dfrac{s}{\sqrt{n}}, \bar{x} + t_{n - 1; 1 - \alpha / 2} \dfrac{s}{\sqrt{n}} \right]\)
  - No, but $n$ is large
    - Use an approximate large sample confidence interval: \(\left[ \bar{x} - z_{1 - \alpha / 2} \dfrac{s}{\sqrt{n}}, \bar{x} + z_{1 - \alpha / 2} \dfrac{s}{\sqrt{n}} \right]\)
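A sketch implementing the three cases above with `scipy.stats` (the data and the "known" $\sigma$ are made up; `norm.ppf` and `t.ppf` supply the critical values):

```python
import numpy as np
from scipy.stats import norm, t

x = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9])   # hypothetical sample
n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)

# Normal population, sigma known (hypothetical sigma = 0.3): exact z-interval
sigma = 0.3
z = norm.ppf(1 - alpha / 2)
ci_z = (xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n))

# Normal population, sigma unknown: exact t-interval with n - 1 degrees of freedom
tc = t.ppf(1 - alpha / 2, df=n - 1)
ci_t = (xbar - tc * s / np.sqrt(n), xbar + tc * s / np.sqrt(n))

# Non-normal population but large n: approximate z-interval using s
ci_large = (xbar - z * s / np.sqrt(n), xbar + z * s / np.sqrt(n))

print(ci_z, ci_t, ci_large)
```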
Prediction Intervals
The predictor of $X_{n + 1}$, say $X^\ast$, should be a statistic.
We desire the predictor to have expected prediction error equal to 0:
\[E(X_{n + 1} - X^\ast) = 0 \quad \iff \quad E(X^\ast) = \mu.\]
Confidence Interval On A Proportion
We would like to learn about $\pi$, the proportion of the population that has a characteristic of interest. In this situation, the random variable to study is
\[X = \begin{cases} 1 & \text{if the individual has the characteristic of interest} \\ 0 & \text{if not} \end{cases}\]
The number $Y$ of individuals of the sample with the characteristic is
\[Y = \sum_{i = 1}^n X_i \sim Bin(n, \pi)\]and the sample proportion is
\[\hat{P} = \dfrac{Y}{n}.\]
If $\hat{p}$ is the sample proportion in an observed random sample of size $n$, an approximate two-sided confidence interval of level $100 \times (1 - \alpha)$% for $\pi$ is given by
\[\left[ \hat{p} - z_{1 - \alpha / 2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}, \hat{p} + z_{1 - \alpha / 2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \right]\]
Tests Of Hypotheses
Many problems in engineering require that we decide which of two competing statements about some parameter is true. The statements are called hypotheses.
The null hypothesis is usually denoted $H_0$. It is the default hypothesis we will assume is true unless we have enough evidence to compel us to change our minds.
The alternative hypothesis is usually denoted $H_a$.
Null Hypothesis
The value $\mu_0$ of the population parameter specified in the null hypothesis $H_0 : \mu = \mu_0$ is usually determined in one of three ways:
- It may result from past experience or knowledge of the process
- It may be determined from some theory or model
- It may result from external considerations
Alternative Hypothesis
The alternative hypothesis can essentially be of two types.
A two-sided alternative is when $H_a$ is of the form $H_a : \mu \ne \mu_0$. This is the exact negation of the null hypothesis.
A one-sided alternative is when we may wish to favour a given direction for the alternative and is of the form $H_a : \mu < \mu_0$ or $H_a : \mu > \mu_0$.
Hypothesis Testing
The numerical value which is computed from a sample and used to decide between $H_0$ and $H_a$ is called the test statistic.
The values for which we reject $H_0$ are called the rejection regions for the test, the limiting values are called the critical values.
Errors
- Rejecting $H_0$ when it is true: type I error
- Failing to reject $H_0$ when it is false: type II error
The probability of a type I error is usually denoted $\alpha$, and the probability of a type II error is usually denoted $\beta$. The power of the test is $1 - \beta$.
$p$-value
The $p$-value is the smallest level of significance that would lead to rejection of $H_0$ with the observed sample.
Procedures In Hypothesis Testing
- State the null and alternative hypotheses $H_0$ and $H_a$
- Determine the rejection criterion
- Compute the appropriate test statistic and determine its distribution
- Calculate the $p$-value using the computed test statistic
- Reject/do not reject $H_0$, relate back to the research question
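A minimal sketch of these steps for a one-sample $t$-test of $H_0 : \mu = \mu_0$ against a two-sided alternative (the data and $\mu_0$ are made up); the hand computation is cross-checked against `scipy.stats.ttest_1samp`:

```python
import numpy as np
from scipy.stats import t, ttest_1samp

x = np.array([5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9])   # hypothetical sample
mu0 = 5.0                                                 # H0: mu = mu0

n = len(x)
t0 = np.sqrt(n) * (x.mean() - mu0) / x.std(ddof=1)        # test statistic
p = 2 * t.sf(abs(t0), df=n - 1)                           # two-sided p-value

stat, pval = ttest_1samp(x, mu0)
assert np.isclose(t0, stat) and np.isclose(p, pval)
print(t0, p)   # reject H0 at level alpha if p < alpha
```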
Confidence Intervals Vs Hypothesis Tests
Generally speaking, there is always a close relationship between the test of hypothesis about any parameter, say $\theta$, and the confidence interval for $\theta$.
If $[\ell, u]$ is a $100 \times (1 - \alpha)$% confidence interval for a parameter $\theta$ then the hypothesis test for $H_0 : \theta = \theta_0$ against $H_a : \theta \ne \theta_0$ will reject $H_0$ at significance level $\alpha$ iff $\theta_0$ is not in $[\ell, u]$.
Inferences Concerning A Difference Of Means
Independent Samples
Hypothesis Test For The Difference In Means
The general situation is as follows:
- Population 1 has mean $\mu_1$ and standard deviation $\sigma_1$
- Population 2 has mean $\mu_2$ and standard deviation $\sigma_2$
Inferences will be based on two random samples of sizes $n_1$ and $n_2$, with sample $i$ drawn from population $i$ ($i = 1, 2$). We will first assume that the samples are independent. We would like to know whether $\mu_1 = \mu_2$ or not.
Hypothesis Test For $\mu_1 = \mu_2$
We can formalise this by stating the null hypothesis as $H_0 : \mu_1 = \mu_2$.
We observe two samples for which we compute the sample means $\bar{x_1}$ and $\bar{x_2}$.
- If $\bar{x_1} \simeq \bar{x_2}$, then $H_0$ appears acceptable
- If $\bar{x_1}$ is considerably different from $\bar{x_2}$, we reject $H_0$
We deduce the sampling distribution of $\bar{X_1} - \bar{X_2}$:
\[\bar{X_1} - \bar{X_2} \overset{(a)}{\sim} \mathcal{N} \left( \mu_1 - \mu_2, \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right).\]
Confidence Interval For $\mu_1 - \mu_2$
A $100 \times (1 - \alpha)$% two-sided confidence interval for $\mu_1 - \mu_2$ is
\[\left[ (\bar{x_1} - \bar{x_2}) - z_{1 - \alpha / 2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, (\bar{x_1} - \bar{x_2}) + z_{1 - \alpha / 2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right].\]
Similarly, $100 \times (1 - \alpha)$% one-sided confidence intervals for $\mu_1 - \mu_2$ are
\[\left( -\infty, (\bar{x_1} - \bar{x_2}) + z_{1 - \alpha} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right] \quad \text{and} \quad \left[ (\bar{x_1} - \bar{x_2}) - z_{1 - \alpha} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, +\infty \right)\]
Inferences Concerning A Variance
The $\chi^2$-Distribution
A random variable, say $X$, is said to follow the $\chi^2$-distribution with $v$ degrees of freedom, i.e. $X \sim \chi_v^2$, if its probability density function is given by
\[f(x) = \dfrac{1}{2^{v / 2} \Gamma \left( \frac{v}{2} \right)} x^{v / 2 - 1} e^{-x / 2}\]where
- $x > 0$
- $v \in \mathbb{Z}^+$
- $\Gamma(y) = (y - 1)!$ for positive integer $y$
Mean & Variance Of The $\chi^2$-distribution
If $X \sim \chi_v^2$,
\[E(X) = v, \qquad Var(X) = 2v.\]
Quantiles
Let $\chi_{v; \alpha}^2$ be the value such that $P(X \le \chi_{v; \alpha}^2) = \alpha$ for $X \sim \chi_v^2$.
If $s^2$ is the observed sample variance in a random sample of size $n$ drawn from a normal population, then a two-sided $100 \times (1 - \alpha)$% confidence interval for $\sigma^2$ is
\[\left[ \dfrac{(n - 1)s^2}{\chi_{n - 1; 1 - \alpha / 2}^2}, \dfrac{(n - 1)s^2}{\chi_{n - 1; \alpha / 2}^2} \right].\]
Regression Analysis
The collection of statistical tools that are used to model and explore relationships between variables that are related is called regression analysis, and is one of the most widely used statistical techniques.
Simple Linear Regression
The model $Y = \beta_0 + \beta_1 X + \epsilon$ is called the simple linear regression model, where
- $X, Y$ are random variables
- $\beta_1$ (the slope) and $\beta_0$ (the intercept) are regression coefficients
- $\epsilon$ is the random error
The linear function $\beta_0 + \beta_1 x$ is the regression function, and will be denoted $\mu_{Y|X = x} = \beta_0 + \beta_1 x$.
Least Squares Estimators
Introducing
\[\begin{array}{lcl} S_{XX} = \sum_{i = 1}^n (X_i - \bar{X})^2 & & \left( = \sum_{i = 1}^n X_i^2 - \frac{(\sum_i X_i)^2}{n} \right) \\ S_{XY} = \sum_{i = 1}^n (X_i - \bar{X})(Y_i - \bar{Y}) & & \left( = \sum_{i = 1}^n X_i Y_i - \frac{(\sum_i X_i)(\sum_i Y_i)}{n} \right) \end{array}\]we have
\[\hat{\beta_1} = \frac{S_{XY}}{S_{XX}} \quad \text{and} \quad \hat{\beta_0} = \bar{Y} - \frac{S_{XY}}{S_{XX}} \bar{X}.\]
The estimated or fitted regression line is therefore $\hat{\beta_0} + \hat{\beta_1} x$ which is an estimate of $\mu_{Y|X = x}$. This is also used for predicting the future observation of $Y$ when $X$ is set to $x$, and is often denoted $\hat{y}(x) = \hat{\beta_0} + \hat{\beta_1} x$.
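A small sketch of these formulas on made-up data, cross-checked against `scipy.stats.linregress`:

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 2.9, 3.8, 5.2, 5.9, 7.1])

xbar, ybar = x.mean(), y.mean()
sxx = ((x - xbar) ** 2).sum()           # S_XX
sxy = ((x - xbar) * (y - ybar)).sum()   # S_XY

b1 = sxy / sxx            # slope estimate
b0 = ybar - b1 * xbar     # intercept estimate

res = linregress(x, y)
assert np.isclose(b1, res.slope) and np.isclose(b0, res.intercept)
print(b0, b1)             # fitted line: y_hat(x) = b0 + b1 * x
```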
Estimating $\sigma^2$
An unbiased estimate of $\sigma^2$ is
\[s^2 = \frac{1}{n - 2} \sum_{i = 1}^n \hat{e}_i^2\]
where $\hat{e}_i = y_i - \hat{y}(x_i), i = 1, 2, \dots, n$ are the residuals ($n - 2$ is the number of degrees of freedom left after estimating $\beta_0$ and $\beta_1$).
Inferences In Simple Linear Regression
Inferences Concerning $\beta_1$
An important hypothesis to consider regarding the simple linear regression model is $H_0 : \beta_1 = 0$, which is equivalent to stating that the response does not depend on the predictor $X$ (as then $Y = \beta_0 + \epsilon$).
At significance level $\alpha$, the rejection criterion for $H_0 : \beta_1 = 0$ against $H_a : \beta_1 \ne 0$ is
\[\text{reject} \ H_0 \ \text{if} \ \hat{b_1} \not\in \left[ -t_{n - 2, 1 - \alpha / 2} \frac{s}{\sqrt{s_{XX}}}, t_{n - 2, 1 - \alpha / 2} \frac{s}{\sqrt{s_{XX}}} \right]\]
with the estimated standard deviation $s = \sqrt{\frac{1}{n - 2} \sum_{i = 1}^n \hat{e}_i^2}$.
From the observed value $t_0 = \sqrt{s_{XX}} \frac{\hat{b}_1}{s}$ of the test statistic under $H_0$, we can compute the $p$-value
\[p = 1 - P(T \in [-|t_0|, |t_0|]) = 2 \times P(T > |t_0|)\]where $T$ is a random variable with distribution $t_{n - 2}$.
Confidence Interval On The Mean Response
A confidence interval may be constructed on the mean response at a specified value of $X$, say, $x$.
\[E(\hat{\mu}_{Y|X = x}) = \mu_{Y|X = x}, \qquad Var(\hat{\mu}_{Y|X = x}) = \sigma^2 \left( \frac{1}{n} + \frac{(x - \bar{x})^2}{s_{XX}} \right)\]
The sampling distribution of the estimator $\hat{\mu}_{Y|X = x}$ is
\[\hat{\mu}_{Y|X = x} \sim \mathcal{N} \left( \mu_{Y|X = x}, \sigma \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{s_{XX}}} \right).\]
From an observed sample for which we find $s$ and $\hat{y}(x)$ from the fitted model $\hat{y}(x) = \hat{b}_0 + \hat{b}_1 x$, a two-sided $100 \times (1 - \alpha)$% confidence interval for the parameter $\mu_{Y|X = x}$ is
\[\left[ \hat{y}(x) - t_{n - 2; 1 - \alpha / 2} s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{s_{XX}}}, \hat{y}(x) + t_{n - 2; 1 - \alpha / 2} s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{s_{XX}}} \right]\]
Analysis Of Variance (ANOVA)
Comparing the between-group variability and the within-group variability is the purpose of the Analysis of Variance.
Suppose that we have $k$ different groups that we wish to compare. Often, each group is called a treatment or treatment level.
The response for each of the $k$ treatments is the random variable of interest, say $X$. Denote $X_{ij}$ the $j$th observation ($j = 1, \dots, n_i$) taken under treatment $i$.
We have $k$ independent samples where $\bar{X}_i$ and $S_i$ are the sample mean and standard deviation of the $i$th sample. The total number of observations is $n = n_1 + n_2 + \dots + n_k$, and the grand mean of all the observations, denoted $\bar{\bar{X}}$, is
\[\bar{\bar{X}} = \frac{1}{n} \sum_{i = 1}^k \sum_{j = 1}^{n_i} X_{ij} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \dots + n_k \bar{X}_k}{n}.\]
The ANOVA model is the following
\[X_{ij} = \mu_i + \epsilon_{ij}\]where
- $\mu_i$ is the mean response for the $i$th treatment ($i = 1, 2, \dots, k$)
- $\epsilon_{ij}$ is an individual random error component ($j = 1, 2, \dots, n_i$)
The null hypothesis to be tested is $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$ versus the general alternative $H_a : \text{not all the means are equal}$.
Variability Decomposition
The ANOVA partitions the total variability in the sample data, described by the total sum of squares
\[SS_{Tot} = \sum_{i = 1}^k \sum_{j = 1}^{n_i} (X_{ij} - \bar{\bar{X}})^2 \quad (df = n - 1)\]into the treatment sum of squares (= variability between groups)
\[SS_{Tr} = \sum_{i = 1}^k n_i (\bar{X}_i - \bar{\bar{X}})^2 \quad (df = k - 1)\]and the error sum of squares (= variability within groups)
\[SS_{Er} = \sum_{i = 1}^k \sum_{j = 1}^{n_i} (X_{ij} - \bar{X}_i)^2 \quad (df = n - k).\]
The sum of squares identity is
\[SS_{Tot} = SS_{Tr} + SS_{Er}.\]
Mean Squared Error
\[MS_{Er} = \frac{SS_{Er}}{n - k}\]
Treatment Mean Square
\[MS_{Tr} = \frac{SS_{Tr}}{k - 1}\]
Fisher’s $F$-distribution
If $H_0$ is true, the ratio
\[F = \frac{MS_{Tr}}{MS_{Er}} = \frac{\frac{SS_{Tr}}{k - 1}}{\frac{SS_{Er}}{n - k}}\]
follows a particular distribution known as Fisher’s $F$-distribution with $k - 1$ and $n - k$ degrees of freedom.
It is denoted $F \sim \mathbf{F}_{k - 1, n - k}$.
A random variable, say $X$ is said to follow Fisher’s $F$-distribution with $d_1$ and $d_2$ degrees of freedom, i.e. $F \sim \mathbf{F}_{d_1, d_2}$ if its probability density function is given by
\[f(x) = \frac{\Gamma ((d_1 + d_2) / 2) (d_1 / d_2)^{d_1 / 2} x^{d_1 / 2 - 1}}{\Gamma (d_1 / 2) \Gamma (d_2 / 2) ((d_1 / d_2) x + 1)^{(d_1 + d_2) / 2}}\]where
- $x > 0$
- $d_1, d_2 \in \mathbb{Z}^+$
- $\Gamma(n) = (n - 1)!$ for positive integer $n$
Mean & Variance Of The Fisher’s $F$-distribution
If $X \sim \mathbf{F}_{d_1, d_2}$,
\[E(X) = \frac{d_2}{d_2 - 2} \ (d_2 > 2), \qquad Var(X) = \frac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)} \ (d_2 > 4).\]
$F$ Critical Value
Let $f_{d_1, d_2; \alpha}$ be the value such that $P(X > f_{d_1, d_2; \alpha}) = 1 - \alpha$ for $X \sim \mathbf{F}_{d_1, d_2}$. Then it can be shown that
\[f_{d_1, d_2; \alpha} = \frac{1}{f_{d_2, d_1; 1 - \alpha}}\]where $f_{d_1, d_2; \alpha}$ is also referred to as the $F$ critical value.
ANOVA Test
$p$-value
The observed value of the test statistic is $f_0 = \dfrac{MS_{Tr}}{MS_{Er}}$ and thus the $p$-value is given by $p = P(X > f_0)$, where $X \sim \mathbf{F}_{k - 1, n - k}$.
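A sketch of the full computation on made-up groups, cross-checked against `scipy.stats.f_oneway`:

```python
import numpy as np
from scipy.stats import f, f_oneway

# Hypothetical treatment groups
groups = [np.array([5.1, 4.9, 5.6, 5.3]),
          np.array([4.2, 4.5, 4.1, 4.8, 4.4]),
          np.array([5.8, 6.1, 5.5, 5.9])]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

ss_tr = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)   # between groups
ss_er = sum(((g - g.mean()) ** 2).sum() for g in groups)             # within groups

f0 = (ss_tr / (k - 1)) / (ss_er / (n - k))   # MS_Tr / MS_Er
p = f.sf(f0, k - 1, n - k)                   # p = P(X > f0), X ~ F_{k-1, n-k}

stat, pval = f_oneway(*groups)
assert np.isclose(f0, stat) and np.isclose(p, pval)
print(f0, p)
```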
Confidence Intervals On Treatment Means
A $100 \times (1 - \alpha)$% two-sided confidence interval for $\mu_i$, from the observed values $\bar{x}_i$ and $MS_{Er}$ is
\[\left[ \bar{x}_i - t_{n - k, 1 - \alpha / 2} \sqrt{\frac{MS_{Er}}{n_i}}, \bar{x}_i + t_{n - k, 1 - \alpha / 2} \sqrt{\frac{MS_{Er}}{n_i}} \right].\]
Pairwise Comparisons
It is also possible to build confidence intervals for the differences between two means $\mu_i$ and $\mu_j$. From observed values $\bar{x}_i, \bar{x}_j$ and $MS_{Er}$, a $100 \times (1 - \alpha)$% confidence interval for $\mu_i - \mu_j$ is
\[\left[ (\bar{x}_i - \bar{x}_j) - t_{n - k; 1 - \alpha / 2} \sqrt{MS_{Er} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}, (\bar{x}_i - \bar{x}_j) + t_{n - k; 1 - \alpha / 2} \sqrt{MS_{Er} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} \right]\]for any pair of groups $(i, j)$.