Roll a die with 10 faces - What are $\Omega$, $\mathcal{F}$, and $P$?
Axioms:
Non-negativity: $P(A)\geq 0$ for any event $A$
Normalization: $P(\Omega)=1$
Countable additivity: for disjoint events $A_1, A_2, \ldots$, $P\left(\bigcup_i A_i\right)=\sum_i P(A_i)$
Other properties:
If $A\subset B$, $P(A)\leq P(B)$
$P(A\cap B)\leq \min(P(A),P(B))$
Union bound: $P(A\cup B)\leq P(A)+P(B)$
Complement set: $P(\Omega\backslash A)=1-P(A)$
Independence $A\perp B$: $P(A\cap B)=P(A)\,P(B)$
Conditional independence $A\perp B|C$: $P(A\cap B|C)=P(A|C)\,P(B|C)$
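To make these concrete, here is a small Python check (our own illustration; the two events are arbitrary choices) using $\Omega=\{1,\ldots,10\}$ for a fair 10-sided die and $P(A)=|A|/|\Omega|$:
from fractions import Fraction

# Sample space of a fair 10-sided die; P(A) = |A| / |Omega|
omega = set(range(1, 11))

def P(A):
    return Fraction(len(A & omega), len(omega))

A = {2, 4, 6, 8, 10}  # "roll is even"
B = {1, 2, 3, 4, 5}   # "roll is at most 5"

assert P(A & B) <= min(P(A), P(B))  # intersection bound
assert P(A | B) <= P(A) + P(B)      # union bound
assert P(omega - A) == 1 - P(A)     # complement rule
print(P(A & B), P(A) * P(B))        # 1/5 vs 1/4, so A and B are NOT independent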
Roll a die with 10 faces
(Statistics flavor)
Consider a die with 4 faces, and suppose you are told that its numbers are either all even or all odd.
For a continuous RV, $\mathbb{E}[f(X)]=\int f(x)\,p(x)\,dx$
Properties:
"Spread-out in outcome": $\sigma^2=Var[X]=\mathbb{E}[X-\mathbb{E}[X]]^2$
Alternative form: $\mathbb{E}\left[(X-\mathbb{E}[X])^2\right] = \mathbb{E}[X^2]-\mathbb{E}[X]^2$ (expand the square and use linearity of expectation; checked numerically below)
$Var[af(X)]=a^2Var[f(X)]$
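These identities are easy to verify by Monte Carlo. The snippet below (an added illustration, not from the original notes) estimates both sides of the alternative form and of the scaling rule from samples of $X \sim \mathcal{N}(2, 9)$:
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=200_000)  # X ~ N(2, 9)

# Var[X] = E[(X - E[X])^2] vs the alternative form E[X^2] - E[X]^2
print(np.mean((x - x.mean())**2), np.mean(x**2) - x.mean()**2)  # both ~ 9

# Var[a X] = a^2 Var[X], here with a = 5 and f = identity
a = 5.0
print(np.var(a * x), a**2 * np.var(x))  # ~ equal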
For two continuous RV's, $Cov[X,Y]=\mathbb{E}\big[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])\big]$
More properties:
$Var[X+Y]=Var[X]+Var[Y]+2Cov[X,Y]$
If $X$ and $Y$ are independent, $Cov[X,Y]=0$ and $\mathbb{E}[f(X)g(Y)]=\mathbb{E}[f(X)]\,\mathbb{E}[g(Y)]$ (see the numerical check below)
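Both facts can be checked from samples. The sketch below (our own illustration) draws a correlated Gaussian pair to verify the variance-sum identity, then an independent pair to verify that the covariance and the product-of-expectations gap vanish:
import numpy as np

rng = np.random.default_rng(1)

# Correlated pair: Var[X+Y] = Var[X] + Var[Y] + 2 Cov[X,Y]
cov = np.array([[1.0, 0.6], [0.6, 2.0]])
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=500_000).T
print(np.var(x + y), np.var(x) + np.var(y) + 2 * np.cov(x, y)[0, 1])

# Independent pair: Cov[X,Y] ~ 0 and E[f(X)g(Y)] ~ E[f(X)] E[g(Y)]
u, v = rng.normal(size=(2, 500_000))
print(np.cov(u, v)[0, 1])  # ~ 0
print(np.mean(u**2 * np.sin(v)), np.mean(u**2) * np.mean(np.sin(v)))  # ~ equal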
Maximum Likelihood
Reference: [PRML] §2.3, [MLAPP] §4.3, 4.4
The Multivariate Normal / Gaussian distribution with mean $\mu \in \mathbb{R}^D$ and covariance matrix $\Sigma$:
$$ \mathcal{N}(x \,|\, \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}} \frac{1}{|\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x - \mu) \right] $$
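As a quick sanity check of the formula (our own sketch; assumes scipy is available), evaluate the density by hand and compare against scipy.stats.multivariate_normal:
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0, 0.5])
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + np.eye(3)  # random positive-definite covariance
x = np.array([0.2, 0.0, 1.0])

D = len(mu)
diff = x - mu
by_hand = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) \
          / ((2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma)))
print(by_hand, multivariate_normal(mu, Sigma).pdf(x))  # should agree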
Partition $x \sim \mathcal{N}(\mu, \Sigma)$ as $x = [x_a, x_b]^T$, and partition the parameters accordingly:
$$ \mu = \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix} \quad \Sigma = \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix} \quad \Lambda = \Sigma^{-1} = \begin{bmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{bmatrix} $$
where $\Lambda$ is the precision matrix.
Exercise: Marginals are obtained by taking a subset of rows and columns:
$$ \begin{align} P(x_a) &= \int P(x_a, x_b) \,dx_b \\ &= \mathcal{N}(x_a | \mu_a, \Sigma_{aa}) \end{align} $$
Marginals are Gaussian!
Exercise: Conditionals are given by
$$ \begin{align} P(x_a | x_b) &= \mathcal{N}(x_a | \mu_{a|b}, \Sigma_{a|b}) \\ \Sigma_{a|b} &= \Lambda_{aa}^{-1} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1} \Sigma_{ba} \\ \mu_{a|b} &= \Sigma_{a|b} \left[ \Lambda_{aa}\mu_a - \Lambda_{ab}(x_b-\mu_b) \right] \\ &= \mu_a - \Lambda_{aa}^{-1}\Lambda_{ab}(x_b - \mu_b) \\ &= \mu_a + \Sigma_{ab} \Sigma_{bb}^{-1} (x_b - \mu_b) \end{align} $$
Conditionals are Gaussian as well.
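The precision-matrix and covariance-matrix forms above should agree; the sketch below (our own check, not from the notes) compares them numerically on a random positive-definite $\Sigma$:
import numpy as np

rng = np.random.default_rng(2)

# Random 4-D Gaussian, partitioned into a = first 2 dims, b = last 2 dims
M = rng.normal(size=(4, 4))
Sigma = M @ M.T + 4 * np.eye(4)  # positive-definite covariance
mu = rng.normal(size=4)
Lam = np.linalg.inv(Sigma)       # precision matrix

a, b = slice(0, 2), slice(2, 4)
x_b = rng.normal(size=2)         # value of x_b we condition on

# Covariance form
Sig_cond = Sigma[a, a] - Sigma[a, b] @ np.linalg.inv(Sigma[b, b]) @ Sigma[b, a]
mu_cond = mu[a] + Sigma[a, b] @ np.linalg.inv(Sigma[b, b]) @ (x_b - mu[b])

# Precision form
Sig_cond2 = np.linalg.inv(Lam[a, a])
mu_cond2 = mu[a] - Sig_cond2 @ Lam[a, b] @ (x_b - mu[b])

print(np.allclose(Sig_cond, Sig_cond2), np.allclose(mu_cond, mu_cond2))  # True True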
Let's consider $D=2$:
$$ v = \begin{bmatrix} x \\ y \end{bmatrix} \sim \frac{1}{2\pi} \frac{1}{|\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (v-\mu)^T \Sigma^{-1} (v - \mu) \right] $$
$$ \mu = \begin{bmatrix} \mu_x \\ \mu_y \end{bmatrix} \quad \Sigma = \begin{bmatrix} \sigma_{xx} & \sigma_{xy} \\ \sigma_{yx} & \sigma_{yy} \end{bmatrix} \quad \sigma_{xy} = \sigma_{yx} = \rho\sigma_x\sigma_y $$
Let's look at (1) $p(x,y)$, (2) $p(x)$ and $p(y)$, (3) $p(y|x=x_0)$
# This function plots a bi-variate Gaussian, the marginalized x and y distributions,
# and the distribution conditioned on a certain value of x
# It also returns 20000 samples drawn from the given distribution, so that
# one can compare statistical mean/variance/covariance to the parameters in Gaussian.
rv = plot_mvn(0.0,       # mu_x
              0.0,       # mu_y
              1.0,       # var_x
              1.0,       # var_y
              0.0,       # rho = cov_xy / (sigma_x*sigma_y)
              loc=-1.0)  # x to be conditioned on
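plot_mvn is a helper defined elsewhere in the notebook and not reproduced here. Under the parameterization shown in the call above, its sampling and conditioning steps might look like the following sketch (function names and details are our assumptions):
import numpy as np

def sample_mvn(mu_x, mu_y, var_x, var_y, rho, n=20000, seed=0):
    # Hypothetical stand-in for the sampling step inside plot_mvn:
    # draw n samples from the 2-D Gaussian with the given parameters.
    cov_xy = rho * np.sqrt(var_x * var_y)
    mu = [mu_x, mu_y]
    Sigma = [[var_x, cov_xy], [cov_xy, var_y]]
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mu, Sigma, size=n)

def conditional_y_given_x(mu_x, mu_y, var_x, var_y, rho, x0):
    # Parameters of p(y | x = x0), from the conditional formulas above
    mu_cond = mu_y + rho * np.sqrt(var_y / var_x) * (x0 - mu_x)
    var_cond = (1.0 - rho**2) * var_y
    return mu_cond, var_cond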
# Computing a few statistical quantities
x = rv[:,0] # X-coordinates of the samples
y = rv[:,1] # Y-coordinates of the samples
mean_x = np.mean(x)
mean_y = np.mean(y)
var_x = np.mean((x-mean_x)**2)
var_y = np.mean((y-mean_y)**2)
cov_xy = np.mean((x-mean_x)*(y-mean_y))
print(np.array([mean_x, mean_y, var_x, var_y, cov_xy]))
[-0.01097573 -0.00727679 1.0021389 0.98853092 0.0036498 ]
Suppose we want to estimate a parameter $\mu$ from data.
Initially, before new data comes in, $$ \mu \sim \mathcal{N}(\mu_0,\sigma_0^2) $$
Now we observe a new noisy data point $x_0$; viewed as a function of $\mu$, its likelihood is $$ p(x_0 \mid \mu) = \mathcal{N}(x_0 \mid \mu, \sigma^2) $$
Given the new data, what is the updated estimate of the parameter? $$ \mu \sim \mathcal{N}(\mu_1,\sigma_1^2) $$ By the standard conjugate-Gaussian update, precisions add and means combine weighted by precision: $$ \frac{1}{\sigma_1^2} = \frac{1}{\sigma_0^2} + \frac{1}{\sigma^2} \qquad \mu_1 = \sigma_1^2 \left( \frac{\mu_0}{\sigma_0^2} + \frac{x_0}{\sigma^2} \right) $$
bayes_update(1.0,   # Mean of the prior
             4.0,   # Variance of the prior, i.e. confidence
             2.0,   # Mean of the likelihood
             16.0)  # Variance of the likelihood
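bayes_update is defined elsewhere in the notebook; assuming it implements the conjugate update above (the real helper presumably also plots the prior, likelihood, and posterior), a minimal sketch is:
def bayes_update(mu0, var0, x0, var):
    # Posterior N(mu1, var1) for a Gaussian prior N(mu0, var0) and one
    # Gaussian observation x0 with noise variance var (sketch, assumed API).
    var1 = 1.0 / (1.0 / var0 + 1.0 / var)
    mu1 = var1 * (mu0 / var0 + x0 / var)
    return mu1, var1

print(bayes_update(1.0, 4.0, 2.0, 16.0))  # mu1 = 1.2, var1 = 3.2
With the numbers above, the posterior mean 1.2 sits between the prior mean 1.0 and the observation 2.0, pulled more toward the prior because the prior variance (4.0) is smaller than the likelihood variance (16.0).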