Written by Luka Kerr on August 2, 2019


Statistics is the science of the collection, processing, analysis, and interpretation of data.



Descriptive Statistics

Data should be presented in ways that facilitate their interpretation and subsequent analysis.

Types Of Variables

Descriptive Measures

Measures Of Centre

The Sample Mean
\[\bar{x} = \dfrac{1}{n} \sum_{i = 1}^n x_i\]
The Sample Median

The median, usually denoted $m$ (or $\overset{\sim}{x}$), is the value which divides the data into two equal parts, half below the median and half above.

\[\begin{cases} \text{even}(n) & \dfrac{1}{2} (x_{\frac{n}{2}} + x_{\frac{n}{2} + 1}) \\ \text{odd}(n) & x_{\frac{n + 1}{2}} \end{cases}\]

Quartiles & Percentiles

The sample $(100 \times p)$th percentile (or quantile) is the value such that $100 \times p$% of the observations are below this value (or equal to it), and the other $100 \times (1 - p)$% are above this value (or equal to it).

Five Number Summary
\[\{ x_1, q_1, m, q_3, x_n \}\]


Measure Of Variability

The Sample Variance
\[\begin{array}{rcl} s^2 & = & \dfrac{1}{n - 1} \sum_{i = 1}^n (x_i - \bar{x})^2 \\ & \equiv & \dfrac{1}{n - 1} (\sum_{i = 1}^n x_i^2 - n \bar{x}^2) \end{array}\]

The number $n - 1$ is the number of degrees of freedom for $s^2$.

The Sample Standard Deviation
\[s = \sqrt{s^2} = \sqrt{\dfrac{1}{n - 1} \sum_{i = 1}^n (x_i - \bar{x})^2}\]
Interquartile Range
\[iqr = q_3 - q_1\]
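As a rough illustration of the measures of centre and variability above, the following sketch computes them in plain Python. The function names and the data set are made up for the example, and the median assumes the observations are sorted first so that the indices in the formula refer to ordered values.

```python
import math

def sample_mean(xs):
    # x-bar = (1/n) * sum of the observations
    return sum(xs) / len(xs)

def sample_median(xs):
    # Sort first, then take the middle value (odd n)
    # or the average of the two middle values (even n).
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return 0.5 * (s[mid - 1] + s[mid])

def sample_variance(xs):
    # Definition: squared deviations from the mean, divided by n - 1.
    n = len(xs)
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    # Equivalent form: (sum of squares - n * mean^2) / (n - 1).
    n = len(xs)
    m = sample_mean(xs)
    return (sum(x * x for x in xs) - n * m * m) / (n - 1)

def sample_sd(xs):
    return math.sqrt(sample_variance(xs))

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical observations
print(sample_mean(data), sample_median(data), sample_variance(data), sample_sd(data))
```

Both variance functions should agree up to floating-point rounding, which is a quick way to check the equivalence of the two forms of $s^2$.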

Elements Of Probability

A random experiment (or chance experiment) is any experiment whose exact outcome cannot be predicted with certainty.

The set of all possible outcomes of a random experiment is called the sample space of the experiment. It is usually denoted $S$.

An event $E$ is a subset of the sample space of a random experiment.



$P(E)$ denotes how likely $E$ is to occur.

\[P(E) = \lim_{n \to \infty} \dfrac{\#occurrences(E)}{n}\]
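The limiting relative frequency can be approximated by simulation. The sketch below is only an illustration: the helper names, the choice of a fair six-sided die and the fixed seed are all assumptions made for the example.

```python
import random

def relative_frequency(event, experiment, n, seed=0):
    # Repeat the experiment n times and return #occurrences(E) / n.
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if event(experiment(rng)))
    return hits / n

def roll_die(rng):
    # One outcome of the random experiment "roll a fair die".
    return rng.randint(1, 6)

# Event E = "the die shows a six"; the estimate should be near 1/6 for large n.
estimate = relative_frequency(lambda outcome: outcome == 6, roll_die, 100_000)
print(estimate)
```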

Combinatorics Rules

Conditional Probabilities

The conditional probability of $E_1$, conditional on $E_2$, is defined as

\[P(E_1 | E_2) = \dfrac{P(E_1 \cap E_2)}{P(E_2)}.\]

Bayes’ First Rule: If $P(E_1) > 0$ and $P(E_2) > 0$ then $P(E_1 | E_2) = P(E_2 | E_1) \times \dfrac{P(E_1)}{P(E_2)}$.

Multiplicative Law Of Probability: $P(\bigcap_{i = 1}^n E_i) = P(E_1) \times P(E_2 | E_1) \times P(E_3 | E_1 \cap E_2) \times \cdots \times P(E_n | \bigcap_{i = 1}^{n - 1} E_i)$.

Independent Events

Two events $E_1$ and $E_2$ are said to be independent if and only if

\[P(E_1 \cap E_2) = P(E_1) \times P(E_2).\]

The events $E_1, E_2, \dots, E_n$ are said to be independent iff for every subset $\{ i_1, i_2, \dots, i_r : r \le n \}$ of $\{ 1, 2, \dots, n \}$,

\[P(\bigcap_{j = 1}^r E_{i_j}) = \prod_{j = 1}^r P(E_{i_j})\]

Bayes’ Formula

A sequence of events $E_1, E_2, \dots, E_n$ such that

  1. $S = \bigcup_{i = 1}^n E_i$ and
  2. $E_i \cap E_j = \emptyset$ for all $i \ne j$

is called a partition of $S$.

Law Of Total Probability

Given a partition $\{ E_1, E_2, \dots, E_n \}$ of $S$ such that $P(E_i) > 0$ for all $i$, the probability of any event $A$ can be written

\[P(A) = \sum_{i = 1}^n P(A | E_i) \times P(E_i).\]

Bayes’ Second Rule

Given a partition $\{ E_1, E_2, \dots, E_n \}$ of $S$ such that $P(E_i) > 0$ for all $i$, we have, for any event $A$ such that $P(A) > 0$,

\[P(E_i | A) = \dfrac{P(A | E_i) P(E_i)}{\sum_{j = 1}^n P(A | E_j) P(E_j)}\]
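A small numerical sketch may help here. It applies the law of total probability and Bayes' second rule to a two-event partition; the function names and the numbers (a 1% prior and a test that fires with probability 0.95 or 0.05) are invented for the illustration.

```python
def total_probability(priors, likelihoods):
    # P(A) = sum over the partition of P(A | E_i) * P(E_i)
    return sum(l * p for l, p in zip(likelihoods, priors))

def posterior(priors, likelihoods):
    # P(E_i | A) = P(A | E_i) * P(E_i) / P(A), for every E_i in the partition
    p_a = total_probability(priors, likelihoods)
    return [l * p / p_a for l, p in zip(likelihoods, priors)]

priors = [0.01, 0.99]        # hypothetical P(E_1), P(E_2); they sum to 1
likelihoods = [0.95, 0.05]   # hypothetical P(A | E_1), P(A | E_2)

p_a = total_probability(priors, likelihoods)
post = posterior(priors, likelihoods)
print(p_a, post)
```

The posterior probabilities sum to one because the events $E_i$ form a partition of $S$.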

Random Variables

A random variable is a real-valued function defined over the sample space

\[\begin{array}{rcl} X : S & \to & \mathbb{R} \\ \omega & \to & X(\omega) \end{array}\]

$S_x$ is the domain of variation of $X$, that is the set of possible values taken by $X$.

Cumulative Distribution Function

A random variable is often described by its cumulative distribution function (cdf) (or just distribution).

The cdf of the random variable $X$ is defined for any real number $x$ by $F(x) = P(X \le x)$.

Discrete Random Variables

A random variable is said to be discrete if it can only assume a finite (or at most countably infinite) number of values.

The probability mass function (pmf) of a discrete random variable $X$ is defined for any real number $x$, by $p(x) = P(X = x)$.

Bernoulli Random Variable

Continuous Random Variables

A random variable $X$ is said to be continuous if there exists a nonnegative function $f(x)$ defined for all real $x \in \mathbb{R}$ such that for any set $B$ of real numbers, $P(X \in B) = \int_B f(x) dx$.

Discrete Vs Continuous Random Variables

| | Discrete | Continuous |
|---|---|---|
| Domain | $S_x = \{ x_1, x_2, \dots \}$ | |
| PMF | $\forall x \in \mathbb{R}, p(x) = P(X = x) \ge 0$ | $p(x) \equiv 0$ |
| PDF | Doesn’t exist | $\forall x \in \mathbb{R}, f(x) = F^\prime(x) \ge 0$ |

Expectation Of A Random Variable

The expectation or the mean of a random variable $X$, denoted $E(X)$, or $\mu$, is given by

Discrete Random Variable: $\mu = E(X) = \sum_{x \in S_x} x p(x)$

Continuous Random Variable: $\mu = E(X) = \int_{S_x} x f(x) \ dx$

Expectation Of A Function Of A Random Variable

Discrete Random Variable: $E(g(X)) = \sum_{x \in S_x} g(x) p(x)$

Continuous Random Variable: $E(g(X)) = \int_{S_x} g(x) f(x) \ dx$

Variance Of A Random Variable

The variance of a random variable $X$, usually denoted by $Var(X)$, or $\sigma^2$, is defined by $Var(X) = E((X - \mu)^2)$.

Discrete Random Variable: $\sigma^2 = Var(X) = \sum_{x \in S_x} (x - \mu)^2 p(x)$

Continuous Random Variable: $\sigma^2 = Var(X) = \int_{S_x} (x - \mu)^2 f(x) \ dx$

An alternative formula for $Var(X)$ is $\sigma^2 = Var(X) = E(X^2) - (E(X))^2 = E(X^2) - \mu^2$, where $E(X^2) = \sum_{x \in S_x} x^2 p(x)$.
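For a discrete random variable these sums are easy to evaluate directly. The sketch below uses a fair six-sided die as an assumed example and exact fractions to avoid rounding; the helper names are hypothetical.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    # E(g(X)) = sum over S_x of g(x) * p(x); g defaults to the identity.
    return sum(g(x) * p for x, p in pmf.items())

def variance(pmf):
    # Var(X) = E((X - mu)^2)
    mu = expectation(pmf)
    return expectation(pmf, lambda x: (x - mu) ** 2)

# pmf of a fair die: p(x) = 1/6 for x in {1, ..., 6}
die = {x: Fraction(1, 6) for x in range(1, 7)}

mu = expectation(die)
var = variance(die)
var_alt = expectation(die, lambda x: x ** 2) - mu ** 2  # E(X^2) - mu^2
print(mu, var, var_alt)
```

The definition and the alternative formula give the same variance, as expected.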


Suppose you have a random variable $X$ with mean $\mu$ and variance $\sigma^2$. Then, the associated standardised random variable, often denoted $Z$, is given by

\[Z = \dfrac{X - \mu}{\sigma}\]

that is, $Z = \dfrac{1}{\sigma} X - \dfrac{\mu}{\sigma}$.

A standardised random variable always has mean 0 and variance 1.

Joint Distribution Function

The joint cumulative distribution function of $X$ and $Y$ is given by

\[F_{XY}(x, y) = P(X \le x, Y \le y) \quad \forall (x, y) \in \mathbb{R} \times \mathbb{R}\]

where $P(X \le x, Y \le y) \equiv P((X \le x) \cap (Y \le y))$.


If $X$ and $Y$ are both discrete, the joint probability mass function is defined by

\[p_{XY}(x, y) = P(X = x, Y = y).\]


$X$ and $Y$ are said to be jointly continuous if there exists a function $f_{XY}(x, y) : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+$ such that for any sets $A$ and $B$ of real numbers

\[P(X \in A, Y \in B) = \int_A \int_B f_{XY}(x, y) \ dy \ dx.\]

Expectation Of A Function Of Two Random Variables

For any function $g : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, the expectation of $g(X, Y)$ is given by

Discrete Random Variable: $E(g(X, Y)) = \sum_{x \in S_x} \sum_{y \in S_y} g(x, y) p_{XY}(x, y)$

Continuous Random Variable: $E(g(X, Y)) = \int_{S_x} \int_{S_y} g(x, y) f_{XY}(x, y) \ dy \ dx$

Independent Random Variables

The random variables $X$ and $Y$ are said to be independent if, for all $(x, y) \in \mathbb{R} \times \mathbb{R}$,

\[P(X \le x, Y \le y) = P(X \le x) \times P(Y \le y).\]

If $X$ and $Y$ are independent, then for any functions $h$ and $g$,

\[E(h(X) g(Y)) = E(h(X)) \times E(g(Y)).\]

Covariance Of Two Random Variables

The covariance of two random variables $X$ and $Y$ is defined by

\[Cov(X, Y) = E((X - E(X))(Y - E(Y))).\]

Special Random Variables

Binomial Random Variables

The Binomial Distribution

Define $X =$ number of successes observed over the $n$ repetitions. We say that $X$ is a binomial random variable with parameters $n$ and $\pi$:

\[X \sim Bin(n, \pi)\]

The binomial probability mass function is given by $p(x) = C^n_x \pi^x (1 - \pi)^{n - x}$ for $x \in S_x = \{ 0, 1, \dots, n \}$.

The Bernoulli Distribution

If $n = 1$ then there is a Bernoulli distribution.

\[X \sim Bern(\pi)\]

The Bernoulli probability mass function is given by

\[p(x) = \begin{cases} x = 0 & 1 - \pi \\ x = 1 & \pi \\ \text{otherwise} & 0 \end{cases}\]
Mean & Variance Of The Binomial Distribution

If $X \sim Bin(n, \pi)$,

\[E(X) = n \pi, \qquad Var(X) = n \pi (1 - \pi).\]
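As a sanity check, the mean and variance can be recovered numerically from the probability mass function. This is only a sketch with assumed parameter values ($n = 10$, $\pi = 0.3$); `binomial_pmf` is a name chosen for the example.

```python
import math

def binomial_pmf(x, n, pi):
    # p(x) = C(n, x) * pi^x * (1 - pi)^(n - x)
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)

n, pi = 10, 0.3  # hypothetical parameters
pmf = [binomial_pmf(x, n, pi) for x in range(n + 1)]

total = sum(pmf)                                            # should be 1
mean = sum(x * p for x, p in enumerate(pmf))                # should be n * pi
var = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))   # n * pi * (1 - pi)
print(total, mean, var)
```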

Poisson Random Variables

The Poisson Distribution

Define $X =$ number of occurrences. We say that $X$ is a Poisson random variable with parameter $\lambda$, i.e. $X \sim \mathcal{P}(\lambda)$, if

\[p(x) = e^{-\lambda} \dfrac{\lambda^x}{x!}\]

for $x \in S_x = \{ 0, 1, 2, \dots \}$.

The Poisson distribution is suitable for modelling the number of occurrences of a random phenomenon satisfying some assumptions of continuity, stationarity, independence and non-simultaneity.

Mean & Variance Of The Poisson Distribution

If $X \sim \mathcal{P}(\lambda)$,

\[E(X) = \lambda, \qquad Var(X) = \lambda.\]
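The same kind of numerical check works for the Poisson distribution if the infinite sum is truncated. In this sketch the value $\lambda = 2.5$ and the cut-off at 60 terms are arbitrary choices; the neglected tail is tiny for a $\lambda$ of this size.

```python
import math

def poisson_pmf(x, lam):
    # p(x) = exp(-lambda) * lambda^x / x!
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.5        # hypothetical rate
xs = range(60)   # truncation of S_x = {0, 1, 2, ...}

total = sum(poisson_pmf(x, lam) for x in xs)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
print(total, mean, var)  # all close to 1, lambda, lambda respectively
```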

Uniform Random Variables

The Uniform Distribution

A random variable is said to be uniformly distributed over an interval $[\alpha, \beta]$, i.e. $X \sim U_{[\alpha, \beta]}$ if its probability density function is given by

\[f(x) = \begin{cases} x \in [\alpha, \beta] & \dfrac{1}{\beta - \alpha} \\ \text{otherwise} & 0 \end{cases}\]
Mean & Variance Of The Uniform Distribution

If $X \sim U_{[\alpha, \beta]}$,

\[E(X) = \dfrac{\alpha + \beta}{2}, \qquad Var(X) = \dfrac{(\beta - \alpha)^2}{12}.\]

Exponential Random Variables

The Exponential Distribution

A random variable is said to be an Exponential random variable with parameter $\mu (\mu > 0)$, i.e. $X \sim Exp(\mu)$ if its probability density function is given by

\[f(x) = \begin{cases} x \ge 0 & \frac{1}{\mu} e^{-\frac{x}{\mu}} \\ \text{otherwise} & 0 \end{cases}\]

This distribution is often useful for representing random amounts of time.

Mean & Variance Of The Exponential Distribution

If $X \sim Exp(\mu)$,

\[E(X) = \mu, \qquad Var(X) = \mu^2.\]

Normal Random Variables

The Normal Distribution

A random variable is said to be normally distributed with parameters $\mu$ and $\sigma (\sigma > 0)$, i.e. $X \sim \mathcal{N}(\mu, \sigma)$ if its probability density function is given by

\[f(x) = \dfrac{1}{\sqrt{2 \pi} \sigma} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}\]
Mean & Variance Of The Normal Distribution

If $X \sim \mathcal{N}(\mu, \sigma)$,

\[E(X) = \mu, \qquad Var(X) = \sigma^2\]

The Standard Normal Distribution

The Standard Normal distribution is the Normal distribution with $\mu = 0$ and $\sigma = 1$.


If $X \sim \mathcal{N}(\mu, \sigma)$, then

\[Z = \dfrac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1).\]


The value $z_\alpha$ such that $P(Z \le z_\alpha) = \alpha$, for $Z \sim \mathcal{N}(0, 1)$, is called the quantile of level $\alpha$ of the standard normal distribution.

Commonly used quantile levels include $0.95, 0.975$ and $0.995$.
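These quantiles can be computed with `statistics.NormalDist` from the Python standard library. The sketch assumes the convention $P(Z \le z_\alpha) = \alpha$ and also shows that probabilities for any $\mathcal{N}(\mu, \sigma)$ can be obtained by standardising; the helper names are made up.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def z_quantile(level):
    # z_level such that P(Z <= z_level) = level
    return standard_normal.inv_cdf(level)

def normal_cdf_via_z(x, mu, sigma):
    # P(X <= x) for X ~ N(mu, sigma), computed through Z = (X - mu) / sigma
    z = (x - mu) / sigma
    return standard_normal.cdf(z)

quantiles = {level: z_quantile(level) for level in (0.95, 0.975, 0.995)}
print(quantiles)  # roughly 1.645, 1.960 and 2.576
```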

Sampling Distributions

Random Sampling

The set of observations $X_1, X_2, \dots, X_n$ constitutes a random sample if

Estimation & Sampling Distribution

An estimator of the unknown parameter of interest $\theta$ must be a statistic, i.e. a function of the sample $\hat{\Theta} = h(X_1, X_2, \dots, X_n)$.

Inference Concerning A Mean

Point Estimation

An estimator is a random variable, which has its mean, its variance and its probability distribution, known as the sampling distribution.



An estimator $\hat{\Theta}$ of $\theta$ is said to be unbiased iff its mean is equal to $\theta$, whatever the value of $\theta$, i.e. $E(\hat{\Theta}) = \theta$.

If an estimator is not unbiased, then the difference $E(\hat{\Theta}) - \theta$ is called the bias of the estimator.

The property of unbiasedness is one of the most desirable properties of an estimator.


A logical principle of estimation is to choose the unbiased estimator that has minimum variance. Such an estimator is said to be efficient among the unbiased estimators.


An estimator is consistent if the probability that the estimator is ‘close’ to $\theta$ increases to one as the sample size increases.

Standard Error

The standard error of an estimator $\hat{\Theta}$ is its standard deviation $sd(\hat{\Theta})$.

Interval Estimation

Instead of giving a point estimate $\hat{\theta}$, that is a single value that we know not to be equal to $\theta$ anyway, we give an interval in which we are very confident to find the true value.

Confidence Intervals

A confidence interval is an interval for which we can assert with a reasonable degree of certainty (or confidence) that it will contain the true value of the population parameter under consideration.

A confidence interval of level $100 \times (1 - \alpha)$% means that we are $100 \times (1 - \alpha)$% confident that the true value of the parameter is included in the interval ($\alpha$ is a real number in $[0, 1]$).

The most frequently used confidence levels are 90%, 95% and 99%.

If $\hat{x}$ is the sample mean of an observed random sample of size $n$ from a normal distribution with known variance $\sigma^2$, a confidence interval of level $100 \times (1 - \alpha)$% for $\mu$ is given by

\[\left[ \hat{x} - z_{1 - \alpha/2} \dfrac{\sigma}{\sqrt{n}}, \hat{x} + z_{1 - \alpha/2} \dfrac{\sigma}{\sqrt{n}} \right].\]
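A minimal sketch of this interval in Python, assuming the population standard deviation is known. The function name and the numbers in the call are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    # Interval xbar -/+ z_{1 - alpha/2} * sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample: mean 50 from n = 100 observations, known sigma = 10
lower, upper = z_confidence_interval(xbar=50.0, sigma=10.0, n=100, alpha=0.05)
print(lower, upper)  # roughly 48.04 and 51.96
```

Note that the half-width shrinks like $1 / \sqrt{n}$, so quadrupling the sample size halves the length of the interval.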

The Central Limit Theorem

The sum of a large number of independent random variables has a distribution that is approximately normal.

If $X_1, X_2, \dots, X_n$ is a random sample taken from a population with mean $\mu$ and finite variance $\sigma^2$, and if $\hat{X}$ is the sample mean, then the limiting distribution of $\sqrt{n} \frac{\hat{X} - \mu}{\sigma}$ as $n \to \infty$ is the standard normal distribution.

The standard normal distribution provides a reasonable approximation to the distribution of $\sqrt{n} \frac{\hat{X} - \mu}{\sigma}$ when $n$ is large. This is usually denoted $\sqrt{n} \frac{\hat{X} - \mu}{\sigma} \overset{a}{\sim} \mathcal{N} (0, 1)$.
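The approximation can be checked empirically. The simulation below is a sketch under assumed settings (a $U_{[0, 1]}$ population, $n = 30$, 20000 replications, fixed seed); it standardises each sample mean and checks how often the result falls within $\pm 1.96$.

```python
import math
import random
import statistics

def standardised_means(n, reps, seed=0):
    # Draw `reps` samples of size n from U[0, 1] and standardise each mean.
    rng = random.Random(seed)
    mu = 0.5                   # mean of U[0, 1]
    sigma = math.sqrt(1 / 12)  # standard deviation of U[0, 1]
    out = []
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        out.append(math.sqrt(n) * (xbar - mu) / sigma)
    return out

z = standardised_means(n=30, reps=20_000)
coverage = sum(1 for v in z if abs(v) <= 1.96) / len(z)
print(statistics.mean(z), statistics.variance(z), coverage)
```

If the normal approximation is reasonable, the mean should be near 0, the variance near 1 and the coverage near 0.95.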

Confidence Interval On The Mean Of A Normal Distribution, Variance Unknown

Sample Variance

From the random sample $X_1, X_2, \dots, X_n$ we have a natural estimator of the unknown $\sigma^2$, the sample variance:

\[S^2 = \dfrac{1}{n - 1} \sum_{i = 1}^n (X_i - \hat{X})^2.\]

A natural procedure is thus to replace $\sigma$ with the sample standard deviation $S$, and to work with the random variable

\[T = \sqrt{n} \dfrac{\hat{X} - \mu}{S}.\]

The Student’s $t$-distribution

In a normal population, the exact distribution of $T$ is the so-called $t$-distribution with $n - 1$ degrees of freedom: $T \sim t_{n - 1}$.

A random variable, say $T$ , is said to follow the Student’s $t$-distribution with $v$ degrees of freedom, i.e. $T \sim t_v$.

Mean & Variance Of The Student’s $t$-distribution

If $T \sim t_v$,

\[E(T) = 0, \qquad Var(T) = \dfrac{v}{v - 2}\]

for $v > 2$.


Let $t_{v; \alpha}$ be the value such that $P(T > t_{v; \alpha}) = 1 - \alpha$ for $T \sim t_v$. As for the standard normal distribution, the symmetry of any $t_v$-distribution implies that $t_{v; 1 - \alpha} = -t_{v; \alpha}$. $t_{v; \alpha}$ is also referred to as the $t$ critical value.

$t$-confidence Interval

If $\hat{x}$ and $s$ are the sample mean and sample standard deviation of an observed random sample of size $n$ from a normal distribution, a confidence interval of level $100 \times (1 - \alpha)$% for $\mu$ is given by

\[\left[ \hat{x} - t_{n - 1; 1 - \alpha / 2} \dfrac{s}{\sqrt{n}}, \hat{x} + t_{n - 1; 1 - \alpha / 2} \dfrac{s}{\sqrt{n}} \right].\]

This confidence interval is sometimes called $t$-confidence interval.

Large Sample Confidence Interval

If $\hat{x}$ and $s$ are the sample mean and standard deviation of an observed random sample of large size $n$ from any distribution, an approximate confidence interval of level $100 \times (1 - \alpha)$% for $\mu$ is

\[\left[ \hat{x} - z_{1 - \alpha / 2} \dfrac{s}{\sqrt{n}}, \hat{x} + z_{1 - \alpha / 2} \dfrac{s}{\sqrt{n}} \right]\]


Prediction Intervals

The predictor of $X_{n + 1}$, say $X^\ast$, should be a statistic.

We desire the predictor to have expected prediction error equal to 0:

\[E(X_{n + 1} - X^\ast) = 0 \quad \iff \quad E(X^\ast) = \mu.\]

Confidence Interval On A Proportion

We would like to learn about $\pi$, the proportion of the population that has a characteristic of interest. In this situation, the random variable to study is

\[X = \begin{cases} 1 & \text{if the individual has the characteristic of interest} \\ 0 & \text{if not} \end{cases}\]

The number $Y$ of individuals of the sample with the characteristic is

\[Y = \sum_{i = 1}^n X_i \sim Bin(n, \pi)\]

and the sample proportion is

\[\hat{P} = \dfrac{Y}{n}.\]

If $\hat{p}$ is the sample proportion in an observed random sample of size $n$, an approximate two-sided confidence interval of level $100 \times (1 - \alpha)$% for $\pi$ is given by

\[\left[ \hat{p} - z_{1 - \alpha / 2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}, \hat{p} + z_{1 - \alpha / 2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \right]\]
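The interval for a proportion can be sketched in the same way. The function name and the counts used in the call (40 individuals with the characteristic out of 100) are assumptions for the example.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(y, n, alpha=0.05):
    # p_hat -/+ z_{1 - alpha/2} * sqrt(p_hat * (1 - p_hat) / n)
    p_hat = y / n
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lower, upper = proportion_confidence_interval(y=40, n=100)
print(lower, upper)  # roughly 0.304 and 0.496
```

Since the interval relies on a normal approximation, it should only be trusted for reasonably large $n$.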

Tests Of Hypotheses

Many problems in engineering require that we decide which of two competing statements about some parameter is true. The statements are called hypotheses.

The null hypothesis is usually denoted $H_0$. It is the default hypothesis we will assume is true unless we have enough evidence to compel us to change our minds.

The alternative hypothesis is usually denoted $H_a$.

Null Hypothesis

The value $\mu_0$ of the population parameter specified in the null hypothesis $H_0 : \mu = \mu_0$ is usually determined in one of three ways:

Alternative Hypothesis

The alternative hypothesis can essentially be of two types.

A two-sided alternative is when $H_a$ is of the form $H_a : \mu \ne \mu_0$. This is the exact negation of the null hypothesis.

A one-sided alternative is when we may wish to favour a given direction for the alternative and is of the form $H_a : \mu < \mu_0$ or $H_a : \mu > \mu_0$.

Hypothesis Testing

The numerical value which is computed from a sample and used to decide between $H_0$ and $H_a$ is called the test statistic.

The values for which we reject $H_0$ are called the rejection regions for the test; the limiting values are called the critical values.


The probability of type I error is usually denoted $\alpha$, and the probability of type II error is usually denoted $\beta$. The power of the test is $1 - \beta$.


The $p$-value is the smallest level of significance that would lead to rejection of $H_0$ with the observed sample.

Procedures In Hypothesis Testing

  1. State the null and alternative hypotheses $H_0$ and $H_a$
  2. Determine the rejection criterion
  3. Compute the appropriate test statistic and determine its distribution
  4. Calculate the $p$-value using the computed test statistic
  5. Reject/do not reject $H_0$, relate back to the research question

Confidence Intervals Vs Hypothesis Tests

Generally speaking, there is always a close relationship between the test of hypothesis about any parameter, say $\theta$, and the confidence interval for $\theta$.

If $[\ell, u]$ is a $100 \times (1 - \alpha)$% confidence interval for a parameter $\theta$ then the hypothesis test for $H_0 : \theta = \theta_0$ against $H_a : \theta \ne \theta_0$ will reject $H_0$ at significance level $\alpha$ iff $\theta_0$ is not in $[\ell, u]$.

Inferences Concerning A Difference Of Means

Independent Samples

Hypothesis Test For The Difference In Means

The general situation is as follows:

Inferences will be based on two random samples of sizes $n_1$ and $n_2$, where sample $i$ is drawn from population $i$ ($i = 1, 2$). We will first assume that the samples are independent. We would like to know whether $\mu_1 = \mu_2$ or not.

Hypothesis Test For $\mu_1 = \mu_2$

We can formalise this by stating the null hypothesis as $H_0 : \mu_1 = \mu_2$.

We observe two samples for which we compute the sample means $\bar{x_1}$ and $\bar{x_2}$.

We deduce the sampling distribution of $\bar{X_1} - \bar{X_2}$:

\[\bar{X_1} - \bar{X_2} \overset{(a)}{\sim} \mathcal{N} \left( \mu_1 - \mu_2, \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right).\]

Confidence Interval For $\mu_1 - \mu_2$

A $100 \times (1 - \alpha)$% two-sided confidence interval for $\mu_1 - \mu_2$ is

\[\left[ (\bar{x_1} - \bar{x_2}) - z_{1 - \alpha / 2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, (\bar{x_1} - \bar{x_2}) + z_{1 - \alpha / 2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right].\]

Similarly $100 \times (1 - \alpha)$% one-sided confidence intervals for $\mu_1 - \mu_2$ are

\[\left( -\infty, (\bar{x_1} - \bar{x_2}) + z_{1 - \alpha} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right] \quad \text{and} \quad \left[ (\bar{x_1} - \bar{x_2}) - z_{1 - \alpha} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}, +\infty \right)\]
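The two-sample procedure can be sketched as follows, assuming both population standard deviations are known. The standardised statistic $z_0$ and its two-sided $p$-value are derived here from the sampling distribution of $\bar{X_1} - \bar{X_2}$ under $H_0 : \mu_1 = \mu_2$; the function name and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_z(xbar1, xbar2, sigma1, sigma2, n1, n2, alpha=0.05):
    # Standard deviation of the difference of the two sample means
    se = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    diff = xbar1 - xbar2
    # Under H0: mu1 = mu2, diff / se is approximately N(0, 1)
    z0 = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
    # Two-sided confidence interval for mu1 - mu2
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z0, p_value, (diff - z * se, diff + z * se)

z0, p_value, ci = two_sample_z(10.5, 10.0, 1.0, 1.0, 50, 50)
print(z0, p_value, ci)
```

In this example the interval excludes 0 and the $p$-value is below 0.05, so both approaches lead to rejecting $H_0$ at the 5% level.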

Inferences Concerning A Variance

The $\chi^2$-Distribution

A random variable, say $X$, is said to follow the chi-square-distribution with $v$ degrees of freedom, i.e. $X \sim \chi_v^2$ if its probability density function is given by

\[f(x) = \dfrac{1}{2^{v / 2} \Gamma \left( \frac{v}{2} \right)} x^{v / 2 - 1} e^{-x / 2}\]


Mean & Variance Of The $\chi^2$-distribution

If $X \sim \chi_v^2$,

\[E(X) = v, \qquad Var(X) = 2v.\]

If $s^2$ is the observed sample variance in a random sample of size $n$ drawn from a normal population, then a two-sided $100 \times (1 - \alpha)$% confidence interval for $\sigma^2$ is

\[\left[ \dfrac{(n - 1)s^2}{\chi_{n - 1; 1 - \alpha / 2}^2}, \dfrac{(n - 1)s^2}{\chi_{n - 1; \alpha / 2}^2} \right].\]

Regression Analysis

The collection of statistical tools that are used to model and explore relationships between variables that are related is called regression analysis, and is one of the most widely used statistical techniques.

Simple Linear Regression

The model $Y = \beta_0 + \beta_1 X + \epsilon$ is called the simple linear regression model, where

The linear function $\beta_0 + \beta_1 x$ is the regression function, and will be denoted $\mu_{Y|X = x} = \beta_0 + \beta_1 x$.

Least Squares Estimators


\[\begin{array}{lcl} S_{XX} = \sum_{i = 1}^n (X_i - \bar{X})^2 & & \left( = \sum_{i = 1}^n X_i^2 - \frac{(\sum_i X_i)^2}{n} \right) \\ S_{XY} = \sum_{i = 1}^n (X_i - \bar{X})(Y_i - \bar{Y}) & & \left( = \sum_{i = 1}^n X_i Y_i - \frac{(\sum_i X_i)(\sum_i Y_i)}{n} \right) \end{array}\]

we have

\[\hat{\beta_1} = \frac{S_{XY}}{S_{XX}} \quad \text{and} \quad \hat{\beta_0} = \bar{Y} - \frac{S_{XY}}{S_{XX}} \bar{X}.\]

The estimated or fitted regression line is therefore $\hat{\beta_0} + \hat{\beta_1} x$ which is an estimate of $\mu_{Y|X = x}$. This is also used for predicting the future observation of $Y$ when $X$ is set to $x$, and is often denoted $\hat{y}(x) = \hat{\beta_0} + \hat{\beta_1} x$.
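The least squares estimates can be computed directly from $S_{XX}$ and $S_{XY}$. The sketch below uses a small invented data set; `least_squares` and `predict` are names chosen for the example.

```python
def least_squares(xs, ys):
    # Returns (b0, b1) with b1 = S_XY / S_XX and b0 = y_bar - b1 * x_bar.
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

xs = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical predictor values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical responses
b0, b1 = least_squares(xs, ys)

def predict(x):
    # Fitted regression line y_hat(x) = b0 + b1 * x
    return b0 + b1 * x

print(b0, b1, predict(6.0))
```

The fitted line always passes through the point $(\bar{x}, \bar{y})$, which follows from the formula for $\hat{\beta_0}$.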

Estimating $\sigma^2$

An unbiased estimate of $\sigma^2$ is

\[s^2 = \frac{1}{n - 2} \sum_{i = 1}^n \hat{e}_i^2\]

where $\hat{e_i} = y_i - \hat{y}(x_i), i = 1, 2, \dots, n$.

Inferences In Simple Linear Regression

Inferences Concerning $\beta_1$

An important hypothesis to consider regarding the simple linear regression model is $\beta_1 = 0$, which is equivalent to stating that the response $Y$ does not depend on the predictor $X$ (as $Y = \beta_0 + \epsilon$).

At significance level $\alpha$, the rejection criterion for $H_0 : \beta_1 = 0$ against $H_a : \beta_1 \ne 0$ is

\[\text{reject} \ H_0 \ \text{if} \ \hat{b_1} \not\in \left[ -t_{n - 2, 1 - \alpha / 2} \frac{s}{\sqrt{s_{XX}}}, t_{n - 2, 1 - \alpha / 2} \frac{s}{\sqrt{s_{XX}}} \right]\]

with the estimated standard deviation $s = \sqrt{\frac{1}{n - 2} \sum_{i = 1}^n \hat{e}_i^2}$.

From the observed value of the test statistic under $H_0$, $t_0 = \sqrt{s_{XX}} \frac{\hat{b}_1}{s}$, we can compute the $p$-value

\[p = 1 - P(T \in [-|t_0|, |t_0|]) = 2 \times P(T > |t_0|)\]

where $T$ is a random variable with distribution $t_{n - 2}$.

Confidence Interval On The Mean Response

A confidence interval may be constructed on the mean response at a specified value of $X$, say, $x$.

\[E(\hat{\mu}_{Y|X = x}) = \mu_{Y|X = x}, \qquad Var(\hat{\mu}_{Y|X = x}) = \sigma^2 \left( \frac{1}{n} + \frac{(x - \bar{x})^2}{s_{XX}} \right)\]

The sampling distribution of the estimator $\hat{\mu}_{Y|X = x}$ is

\[\hat{\mu}_{Y|X = x} \sim \mathcal{N} \left( \mu_{Y|X = x}, \sigma \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{s_{XX}}} \right).\]

From an observed sample for which we find $s$ and $\hat{y}(x)$ from the fitted model $\hat{y}(x) = \hat{b}_0 + \hat{b}_1 x$, a two-sided $100 \times (1 - \alpha)$% confidence interval for the parameter $\mu_{Y|X = x}$ is

\[\left[ \hat{y}(x) - t_{n - 2; 1 - \alpha / 2} s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{s_{XX}}}, \hat{y}(x) + t_{n - 2; 1 - \alpha / 2} s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{s_{XX}}} \right]\]

Analysis Of Variance (ANOVA)

Comparing the between-group variability and the within-group variability is the purpose of the Analysis of Variance.

Suppose that we have $k$ different groups that we wish to compare. Often, each group is called a treatment or treatment level.

The response for each of the $k$ treatments is the random variable of interest, say $X$. Denote by $X_{ij}$ the $j$th observation ($j = 1, \dots, n_i$) taken under treatment $i$.

We have $k$ independent samples where $\bar{X}_i$ and $S_i$ are the sample mean and standard deviation of the $i$th sample. The total number of observations is $n = n_1 + n_2 + \dots + n_k$, and the grand mean of all the observations, denoted $\bar{\bar{X}}$, is

\[\bar{\bar{X}} = \frac{1}{n} \sum_{i = 1}^k \sum_{j = 1}^{n_i} X_{ij} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2 + \dots + n_k \bar{X}_k}{n}.\]

The ANOVA model is the following

\[X_{ij} = \mu_i + \epsilon_{ij}\]


The null hypothesis to be tested is $H_0 : \mu_1 = \mu_2 = \cdots = \mu_k$ versus the general alternative $H_a : \text{not all the means are equal}$.

Variability Decomposition

The ANOVA partitions the total variability in the sample data, described by the total sum of squares

\[SS_{Tot} = \sum_{i = 1}^k \sum_{j = 1}^{n_i} (X_{ij} - \bar{\bar{X}})^2 \quad (df = n - 1)\]

into the treatment sum of squares (= variability between groups)

\[SS_{Tr} = \sum_{i = 1}^k n_i (\bar{X}_i - \bar{\bar{X}})^2 \quad (df = k - 1)\]

and the error sum of squares (= variability within groups)

\[SS_{Er} = \sum_{i = 1}^k \sum_{j = 1}^{n_i} (X_{ij} - \bar{X}_i)^2 \quad (df = n - k).\]

The sum of squares identity is

\[SS_{Tot} = SS_{Tr} + SS_{Er}.\]

Mean Squared Error

\[MS_{Er} = \frac{SS_{Er}}{n - k}\]

Treatment Mean Square

\[MS_{Tr} = \frac{SS_{Tr}}{k - 1}\]

Fisher’s $F$-distribution

If $H_0$ is true, the ratio

\[F = \frac{MS_{Tr}}{MS_{Er}} = \frac{\frac{SS_{Tr}}{k - 1}}{\frac{SS_{Er}}{n - k}}\]

follows a particular distribution known as the Fisher’s $F$-distribution with $k - 1$ and $n - k$ degrees of freedom.

It is denoted $F \sim \mathbf{F}_{k - 1, n - k}$.

A random variable, say $X$, is said to follow Fisher’s $F$-distribution with $d_1$ and $d_2$ degrees of freedom, i.e. $X \sim \mathbf{F}_{d_1, d_2}$, if its probability density function is given by

\[f(x) = \frac{\Gamma ((d_1 + d_2) / 2) (d_1 / d_2)^{d_1 / 2} x^{d_1 / 2 - 1}}{\Gamma (d_1 / 2) \Gamma (d_2 / 2) ((d_1 / d_2) x + 1)^{(d_1 + d_2) / 2}}\]


Mean & Variance Of The Fisher’s $F$-distribution

If $X \sim \mathbf{F}_{d_1, d_2}$,

\[E(X) = \frac{d_2}{d_2 - 2}, d_2 > 2, \qquad Var(X) = \frac{2 d_2^2 (d_1 + d_2 - 2)}{d_1 (d_2 - 2)^2 (d_2 - 4)}, d_2 > 4.\]

$F$ Critical Value

Let $f_{d_1, d_2; \alpha}$ be the value such that $P(X > f_{d_1, d_2; \alpha}) = 1 - \alpha$ for $X \sim \mathbf{F}_{d_1, d_2}$. Then it can be shown that

\[f_{d_1, d_2; \alpha} = \frac{1}{f_{d_2, d_1; 1 - \alpha}}\]

where $f_{d_1, d_2; \alpha}$ is also referred to as the $F$ critical value.



The observed value of the test statistic is $f_0 = \dfrac{MS_{Tr}}{MS_{Er}}$ and thus the $p$-value is given by $p = P(X > f_0)$.

Confidence Intervals On Treatment Means

A $100 \times (1 - \alpha)$% two-sided confidence interval for $\mu_i$, from the observed values $\bar{x}_i$ and $MS_{Er}$ is

\[\left [ \bar{x}_i - t_{n - k, 1 - \alpha / 2} \sqrt{\frac{MS_{Er}}{n_i}}, \bar{x}_i + t_{n - k, 1 - \alpha / 2} \sqrt{\frac{MS_{Er}}{n_i}} \right].\]

Pairwise Comparisons

It is also possible to build confidence intervals for the differences between two means $\mu_i$ and $\mu_j$. From observed values $\bar{x}_i, \bar{x}_j$ and $MS_{Er}$, a $100 \times (1 - \alpha)$% confidence interval for $\mu_i - \mu_j$ is

\[\left[ (\bar{x}_i - \bar{x}_j) - t_{n - k; 1 - \alpha / 2} \sqrt{MS_{Er} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}, (\bar{x}_i - \bar{x}_j) + t_{n - k; 1 - \alpha / 2} \sqrt{MS_{Er} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)} \right]\]

for any pair of groups $(i, j)$.