1.24¶

Logistics¶

OH: M 14:00-16:00

Clustering¶

Input: $x_1,x_2,...,x_n\in R^p$

Goal: identify groups (clusters) of data points

In [ ]:
import numpy as np
u = np.array([1., 3., 5. ,7.])
print(u)
[1. 3. 5. 7.]
In [ ]:
from numpy import linalg as LA
LA.norm(u)
# check against the definition of the Euclidean norm
np.sqrt(np.sum(u ** 2))
Out[ ]:
9.16515138991168
In [ ]:
# creating a matrix by stacking vectors as rows with np.stack:
u = np.array([1., 3., 5., 7.])
v = np.array([2., 4., 6., 8.])
w = np.array([9., 8., 7., 6.])
X = np.stack((u,v,w))
print(X)
[[1. 3. 5. 7.]
 [2. 4. 6. 8.]
 [9. 8. 7. 6.]]
In [ ]:
import matplotlib.pyplot as plt
x = np.linspace(-2,2,100)
y = x ** 2
plt.plot(x,y)
plt.show()

Probability¶

$Var(X)=E[(X-E[X])^2]=E[X^2]-E[X]^2$
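A quick numerical check of this identity (my addition, not from the notes): on a large sample, the empirical versions of the two sides agree up to floating-point error.

In [ ]:
# sanity check: Var(X) = E[(X - E[X])^2] = E[X^2] - E[X]^2
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # Exp(scale=2): Var = 4

lhs = np.mean((x - x.mean()) ** 2)     # E[(X - E[X])^2]
rhs = np.mean(x ** 2) - x.mean() ** 2  # E[X^2] - E[X]^2
print(lhs, rhs)                        # identical, both near 4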

Markov's inequality¶

For a nonnegative random variable $X$: $P(X\ge a)\le \frac{E[X]}{a},\forall a>0$

Chebyshev's inequality¶

$P(|X-\mu|\ge a)\le \frac{Var(X)}{a^2}$, where $\mu=E[X]$
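To make both bounds concrete, here is a small simulation (my addition) for $X\sim Exp(1)$, where $E[X]=Var(X)=1$ and $\mu=1$:

In [ ]:
# empirical tail probabilities vs. the Markov and Chebyshev bounds
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(size=1_000_000)  # Exp(1): E[X] = 1, Var(X) = 1

for a in [2.0, 3.0, 5.0]:
    markov = 1.0 / a                     # E[X]/a
    chebyshev = 1.0 / a ** 2             # Var(X)/a^2
    p_tail = np.mean(x >= a)             # P(X >= a)
    p_dev = np.mean(np.abs(x - 1) >= a)  # P(|X - mu| >= a)
    print(f"a={a}: {p_tail:.4f} <= {markov:.4f} (Markov), "
          f"{p_dev:.4f} <= {chebyshev:.4f} (Chebyshev)")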

1.26¶

$Var(aX+Y)=Var(aX)+Var(Y)=a^2Var(X)+Var(Y)$, where the first equality holds only if $X$ and $Y$ are independent

Weak law of large numbers¶

$X_1,X_2,...,X_n$ are i.i.d. RVs and $\frac{s_n}{n}=\frac{X_1+...+X_n}{n}$; then $\frac{s_n}{n}\rightarrow E[X]$ in probability, i.e. the sample mean converges to the population mean. That is, for every $\epsilon>0$:

$\lim_{n\rightarrow \infty}P(|\frac{s_n}{n}-E[X]|>\epsilon)=0$

Proof¶

$E[\frac{s_n}{n}]=E[\frac{X_1+...+X_n}{n}]=\frac{E[X_1]+...+E[X_n]}{n}=E[X_1]$

$Var(\frac{s_n}{n})=\frac{1}{n^2}Var(X_1+...+X_n)=\frac{1}{n^2}\cdot n\,Var(X_1)=\frac{Var(X_1)}{n}$ by independence. So, as $n\rightarrow \infty$, $Var(\frac{s_n}{n})\rightarrow 0$.

And by Chebyshev, $P(|\frac{s_n}{n}-E[X]|>\epsilon) \le \frac{Var(\frac{s_n}{n})}{\epsilon ^2} = \frac{Var(X_1)}{n \epsilon ^2}$, which tends to $0$ as $n\rightarrow\infty$.
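As an illustration (a sketch I am adding, not part of the lecture), plot the running sample mean of i.i.d. Uniform$[0,1]$ draws and watch it settle near $E[X]=0.5$:

In [ ]:
# weak law of large numbers: the running sample mean s_n/n of i.i.d.
# Uniform[0,1] draws concentrates around the population mean 0.5
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
x = rng.uniform(size=10_000)
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)  # s_n / n

plt.plot(running_mean, label=r"$s_n/n$")
plt.axhline(0.5, color="k", linestyle="--", label=r"$E[X]=0.5$")
plt.xlabel("n")
plt.legend()
plt.show()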

K-means Clustering¶

Objective¶

$\min_{C_1,...,C_k} \min_{\mu_1,...,\mu_k \in \mathbf{R}^d} \sum_{i=1}^k \sum_{j\in C_i} ||x_j-\mu_i||^2$

Lemma:¶

$\mu_i^*=\frac{1}{|C_i|}\sum_{j \in C_i}x_j$

K-means algorithm (Alternating minimization)¶

  • Optimal representatives: update each $\mu_i$ to the centroid of cluster $C_i$ (the mean of its members)
  • Optimal clustering: update $C_1,...,C_k$ by assigning each point to its nearest representative (see the sketch below)
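Below is a minimal numpy sketch of this alternating scheme (my own illustration; the two-blob data, k=2, and the random initialization are arbitrary choices, and empty clusters are not handled):

In [ ]:
# minimal k-means via alternating minimization (illustration only)
import numpy as np

def kmeans(X, k, n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    # initialize the representatives as k distinct random data points
    mu = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # optimal clustering: assign each point to its nearest representative
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # optimal representatives: centroid (mean) of each cluster's members
        mu = np.stack([X[labels == i].mean(axis=0) for i in range(k)])
    return labels, mu

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5])
labels, mu = kmeans(X, k=2)
print(mu)  # two centroids, near (0, 0) and (5, 5)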

K-means by matrix:¶

$Z$ is the cluster-membership (assignment) matrix. For example, if $C_1=\{1,4,6,8\}, C_2=\{2,3,7\}, C_3=\{5\}$, then $Z=\begin{bmatrix} 1&0&0\\ 0&1&0\\ 0&1&0\\ 1&0&0\\ 0&0&1\\ 1&0&0\\ 0&1&0\\ 1&0&0\\ \end{bmatrix}$
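As a side note (my addition), $Z$ can be built in numpy by one-hot encoding the cluster labels; $Z^TZ$ is then the diagonal matrix of cluster sizes $|C_i|$:

In [ ]:
# build the matrix Z from 0-indexed cluster labels by one-hot encoding
import numpy as np

labels = np.array([0, 1, 1, 0, 2, 0, 1, 0])  # C_1={1,4,6,8}, C_2={2,3,7}, C_3={5}
Z = np.eye(3)[labels]
print(Z)
print(Z.T @ Z)  # diagonal matrix with the cluster sizes 4, 3, 1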

1.31¶

K-means clustering in high dimensional data¶

K-means clustering performs poorly on high-dimensional data.

New phenomenon¶

Imagine a $d$-dimensional cube and a ball inscribed in it. If we pick a point uniformly at random in the cube, what is the probability that it falls inside the ball?

Let $B=\{x \in R^d:||x|| \leq \frac{1}{2}\}$ and $C=[-\frac{1}{2},\frac{1}{2}]^d$. Pick $\overrightarrow{X}\sim U(C)$, where $\overrightarrow{X}=(X_1,X_2,...,X_d)$. Then as $d \rightarrow + \infty$:

$P[\overrightarrow{X}\in B]\rightarrow 0$

Why? The PDF of each $X_i$ is $1$ on $[-\frac{1}{2},\frac{1}{2}]$, and by independence the joint density of $\overrightarrow{X}$ is the product of the marginals, so it equals $1$ on $C$. Hence:

$P[\overrightarrow{X} \in B]=\frac{Vol(B)}{Vol(C)}=Vol(B)=\frac{\pi^{d/2}}{2^d\,\Gamma(\frac{d}{2}+1)}$,

which goes to $0$ as $d \rightarrow \infty$.
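A quick Monte Carlo experiment (my addition) shows how fast this probability collapses as $d$ grows:

In [ ]:
# fraction of uniform points in the cube [-1/2, 1/2]^d that land
# inside the inscribed ball of radius 1/2
import numpy as np

rng = np.random.default_rng(4)
n_samples = 100_000
for d in [2, 5, 10, 20]:
    X = rng.uniform(-0.5, 0.5, size=(n_samples, d))
    inside = np.linalg.norm(X, axis=1) <= 0.5
    print(f"d={d}: P[X in B] ~ {inside.mean():.5f}")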

2.7¶

Linear algebra review¶

If $V\subseteq R^n$ is a subspace, then $\exists$ a basis $u_1,...,u_r\in V$ s.t.:

  1. $V \subseteq span(u_1,...,u_r)$

  2. $u_1,...,u_r$ are linearly independent ($r=dim(V)$)

Orthogonality¶

Let $u,v\in R^n$. We say $u$ and $v$ are orthogonal, written $u\perp v$, if $<u,v>=u^Tv=0$. Because $<u,v>=||u||\cdot ||v||\cdot cos\theta$, we have $<u,v>=0 \iff cos \theta =0 \iff \theta=\frac{\pi}{2}$.

Properties:¶

  1. Symmetry: $<u,v>=<v,u>$
  2. Linearity: $<au+v,w>=a<u,w>+<v,w>$
  3. Norm: $<u,u>=||u||\cdot ||u||\cdot cos(0)=||u||^2$

Lemma. (Pythagoras)

  • If $u,v\in R^n, u\perp v$, then $||u+v||^2=||u||^2+||v||^2$
  • Proof: $||u+v||^2=<u+v,u+v>=||u||^2+2<u,v>+||v||^2$ by linearity and symmetry, and $<u,v>=0$

Lemma. (Cauchy-Schwarz)

  • If $u,v\in R^n$, then $|<u,v>|\le||u||\cdot||v||$
  • Proof: decompose $v$ into components parallel and orthogonal to $u$, then apply linearity and Pythagoras

A list of vectors $[u_1,...,u_m]$ is orthonormal if $<u_i,u_j>=0$ for $i\neq j$ and $<u_i,u_i>=1$.

Lemma.

  1. If $[u_1,...,u_m]$ is orthonormal, then they are linearly independent
  2. $||\sum_{i=1}^m \alpha_i u_i||^2=\sum_{i=1}^m \alpha_i^2$

Thm. (Orthonormal expansion)

$\{q_1,...,q_m\}=$ orthonormal basis of $U\subseteq R^n$. For any $w\in U$, $w=\sum_{j=1}^m<w,q_j>\cdot q_j$

Thm. (Orthogonal projection)

The projection of $v$ onto $U$ is the point $p\in U$ closest to $v$; the residual $v-p$ forms a right angle with every vector in $U$.

Def.

$U \subseteq R^n, v\in R^n$. Then:

  1. $\exists$ a unique solution $p$ to $\min_{p\in U}||p-v||$; denote it $proj_{U}(v)$
  2. $p$ satisfies: $<v-p,u>=0, \forall u\in U$ (the residual $v-p$ is orthogonal to every $u\in U$)
  3. If $\{q_1,...,q_m\}$ is an orthonormal basis of $U$, then $p^*=proj_{U}(v)=\sum_{j=1}^m<v,q_j>q_j$
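A short numerical illustration of properties 2 and 3 (mine, not from the notes): build an orthonormal basis of a random 2-dimensional subspace of $R^4$ via QR, project, and check that the residual is orthogonal to the subspace.

In [ ]:
# orthogonal projection of v onto U = span of the columns of A
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(4, 2))  # columns span U
Q, _ = np.linalg.qr(A)       # columns of Q: orthonormal basis q_1, q_2 of U
v = rng.normal(size=4)

p = Q @ (Q.T @ v)            # p = sum_j <v, q_j> q_j
print(Q.T @ (v - p))         # residual v - p is orthogonal to U (~ zero)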

Orthogonality in high dimension¶

2.28¶

PCA¶

Goal: dimension reduction (find $\phi$ s.t. the sample variance after projection is maximized)¶
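A minimal sketch of this goal (my addition; a real pipeline would use e.g. sklearn's PCA): center the data, take the top right singular vector as $\phi$, and compare the sample variance of the projections along $\phi$ with that along another direction.

In [ ]:
# PCA sketch: the top right singular vector of the centered data matrix
# is the direction phi that maximizes the sample variance of the projections
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [0.0, 0.5]])

Xc = X - X.mean(axis=0)  # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
phi = Vt[0]              # first principal direction

print((Xc @ phi).var())                   # largest achievable sample variance
print((Xc @ np.array([0.0, 1.0])).var())  # much smaller along e_2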

Condition Numbers¶

$Ax=b$: how sensitive is $x$ to changes in $b$?

$k(A)=||A||_2 ||A^{-1}||_2=\frac{\sigma_1}{\sigma_n}$, where $\sigma_1$ and $\sigma_n$ are the largest and smallest singular values of $A$

Thm.¶

$$\max_{d\neq0} \frac{\frac{||M(z+d)-Mz||}{||Mz||}}{\frac{||d||}{||z||}}\leq k_2(M)$$

Apply this with $M=A^{-1}$, $z=b$, $d=\delta b$. If $A(x+\delta x)=b+\delta b$, then:

$$x+\delta x=A^{-1}(b+\delta b)=A^{-1}b+A^{-1}\delta b$$

where $A^{-1}b=x$, so:

$$x+\delta x=x+A^{-1}\delta b \Rightarrow \delta x = A^{-1}\delta b$$

and the theorem gives:

$$\frac{||\delta x||}{||x||} \leq k_2(A) \frac{||\delta b||}{||b||}$$
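A small demo of this bound (my addition): for a nearly singular $A$, a tiny relative perturbation of $b$ produces a much larger relative change in $x$, but never more than the factor $k_2(A)$.

In [ ]:
# condition number demo: relative error amplification when solving Ax = b
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])  # nearly singular, hence ill-conditioned
print("k_2(A) =", np.linalg.cond(A, 2))

b = np.array([2.0, 2.0001])
x = np.linalg.solve(A, b)

db = np.array([0.0, 1e-5])  # small perturbation of b
dx = np.linalg.solve(A, b + db) - x

lhs = np.linalg.norm(dx) / np.linalg.norm(x)
rhs = np.linalg.cond(A, 2) * np.linalg.norm(db) / np.linalg.norm(b)
print(lhs, "<=", rhs)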
In [ ]:
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([5, 8, 9, 2])

# outer product: entry (i, j) equals a[i] * b[j]
np.outer(a, b)
Out[ ]:
array([[ 5,  8,  9,  2],
       [10, 16, 18,  4],
       [15, 24, 27,  6],
       [20, 32, 36,  8]])