Machine Learning Mathematics Study Notes: Chapter 2. Linear Algebra
Chapter 2. Linear Algebra
Resources
Gilbert Strang’s Linear Algebra course
Linear Algebra Series by 3Blue1Brown
Algebra
Construct a set of objects (symbols) and a set of rules to manipulate these objects.
Vector objects:
Geometric vectors
Polynomials
Audio signals are vectors
Elements of $\R^n$ are vectors
2.1 Systems of Linear Equations
$$a_{11}x_1 + \cdots + a_{1n}x_n = b_1 \\ \vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n = b_m$$
where $a_{ij} \in \R$ and $b_i \in \R$.
The above is the general form of a system of linear equations. A tuple $(x_1, \ldots, x_n) \in \R^n$ that satisfies all of the above equations is a solution of the linear equation system.
In a system of two variables, every linear equation defines a line in the $x_1x_2$-plane.
The solution set of the system is the intersection of these lines.
$$\left[ \begin{matrix} a_{11} \\ \vdots \\ a_{m1} \end{matrix} \right] x_1 + \left[ \begin{matrix} a_{12} \\ \vdots \\ a_{m2} \end{matrix} \right] x_2 + \cdots + \left[ \begin{matrix} a_{1n} \\ \vdots \\ a_{mn} \end{matrix} \right] x_n = \left[ \begin{matrix} b_{1} \\ \vdots \\ b_{m} \end{matrix} \right]$$
$$\left[ \begin{matrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{matrix} \right] \left[\begin{matrix} x_1 \\ \vdots \\ x_n \end{matrix} \right] = \left[ \begin{matrix} b_{1} \\ \vdots \\ b_{m} \end{matrix} \right]$$
2.2 Matrices
$$A = \left[\begin{matrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{matrix}\right], \quad a_{ij} \in \R$$
(1, n)-matrices are called rows (row vectors)
(m, 1)-matrices are called columns (column vectors)
2.2.1 Addition and Multiplication
- Addition: element-wise, $(A + B)_{ij} = a_{ij} + b_{ij}$ for $A, B \in \R^{m \times n}$
- Multiplication: for $A \in \R^{m \times n}$ and $B \in \R^{n \times k}$, the product $C = AB \in \R^{m \times k}$ has entries $c_{ij} = \displaystyle \sum^{n}_{l=1}{a_{il}b_{lj}}, \ i = 1, \cdots, m, \ j = 1, \cdots, k$
- For dimensions: $(m \times n) \cdot (n \times k) = m \times k$; the inner dimensions must match
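A minimal NumPy sketch of the dimension rule and the entry formula (the matrices and values are illustrative):

```python
import numpy as np

# A is (2, 3), B is (3, 4): inner dimensions match, so C = AB is (2, 4).
A = np.arange(6).reshape(2, 3)
B = np.arange(12).reshape(3, 4)
C = A @ B
print(C.shape)  # (2, 4)

# c_ij = sum_l a_il * b_lj -- verify one entry against the definition.
assert C[1, 2] == sum(A[1, l] * B[l, 2] for l in range(3))
```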
2.2.2 Inverse and Transpose
- Inverse
  - If $AB = I_n = BA$, then $B$ is called the inverse of $A$ and denoted by $A^{-1}$
  - $A = \left[ \begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{matrix}\right] \Rightarrow A^{-1} = \dfrac{1}{a_{11}a_{22} - a_{12}a_{21}} \left[ \begin{matrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{matrix} \right]$, provided $a_{11}a_{22} - a_{12}a_{21} \neq 0$
- Transpose
  - For $A \in \R^{m\times n}$, the matrix $B \in \R^{n\times m}$ with $b_{ij} = a_{ji}$ is called the transpose of $A$; we write $B = A^T$
- Important properties
  - $AA^{-1} = I = A^{-1}A$
  - $(AB)^{-1} = B^{-1}A^{-1}$
  - $(A+B)^{-1} \neq A^{-1} + B^{-1}$ (in general)
  - $(A^T)^T = A$
  - $(A+B)^T = A^T + B^T$
  - $(AB)^T = B^T A^T$
- Symmetric matrix
  - $A \in \R^{n \times n}$ is symmetric if $A^T = A$
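A quick NumPy check of these identities (random matrices serve as stand-ins and are almost surely invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
I = np.eye(3)

assert np.allclose(A @ np.linalg.inv(A), I)               # A A^{-1} = I
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))   # (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose((A @ B).T, B.T @ A.T)                  # (AB)^T = B^T A^T
assert np.allclose((A + B).T, A.T + B.T)                  # (A+B)^T = A^T + B^T
```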
2.2.3 Multiplication by Scalar
- For $A\in \R^{m\times n}$ and $\lambda \in \R$: $\lambda A = K$ with $K_{ij} = \lambda a_{ij}$
- Associativity
  - $(\lambda \psi)C = \lambda (\psi C)$, $C\in \R^{m\times n}$
  - $\lambda(BC) = (\lambda B)C = B(\lambda C) = (BC)\lambda$, $B \in \R^{m\times n}$, $C\in \R^{n\times k}$
  - $(\lambda C)^T = C^T\lambda^T = C^T\lambda = \lambda C^T$, since $\lambda = \lambda^T$ for all $\lambda \in \R$
- Distributivity
  - $(\lambda + \psi)C = \lambda C + \psi C$, $C \in \R^{m\times n}$
  - $\lambda(B+C) = \lambda B + \lambda C$, $B, C \in \R^{m\times n}$
2.3 Solving Systems of Linear Equations
2.3.1 Particular and General Solution
- The general approach consists of three steps (a sketch follows at the end of this subsection):
  - Find a particular solution to $Ax = b$
  - Find all solutions to $Ax = 0$
  - Combine the two steps above into the general solution
- Gaussian elimination
  - Transforms the system of linear equations into a particularly simple form; only in this simple form can the three steps above be applied directly
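A NumPy/SciPy sketch of the three-step recipe on a small underdetermined example system (the matrix and right-hand side are illustrative):

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 0., 8., -4.],
              [0., 1., 2., 12.]])
b = np.array([42., 8.])

x_p = np.linalg.lstsq(A, b, rcond=None)[0]  # step 1: a particular solution
N = null_space(A)                           # step 2: basis of solutions to Ax = 0

# Step 3: every x_p + N c, for arbitrary coefficients c, solves Ax = b.
c = np.array([3., -7.])
assert np.allclose(A @ (x_p + N @ c), b)
```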
2.3.2 Elementary Transformations
- Keep the solution set the same, but transform the equation system into a simpler form (equation = row):
  - Exchange of two equations
  - Multiplication of an equation with a constant $\lambda \in \R \setminus \{0\}$
  - Addition of two equations
- Pivot
  - The leading coefficient of a row (the first nonzero number from the left) is called the pivot
- REF – Row Echelon Form
  - The variables corresponding to the pivots in the REF are called basic variables; the other variables are free variables
- Gaussian elimination
  - An algorithm that performs elementary transformations to bring a system of linear equations into reduced REF
- Reduced REF
  - Example: $$A = \left[\begin{matrix} 1 & 3 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & 9 \\ 0 & 0 & 0 & 1 & -4 \end{matrix} \right]$$
- The pivots are in column 1 of the first row, column 3 of the second row, and column 4 of the third row
- The key idea for finding the solutions of $Ax = 0$ is to look at the non-pivot columns, which we need to express as linear combinations of the pivot columns
- Solutions: $\left[ \begin{matrix} 3 \\ -1 \\ 0 \\ 0 \\ 0 \end{matrix} \right]$ and $\left[ \begin{matrix} 3 \\ 0 \\ 9 \\ -4 \\ -1 \end{matrix} \right]$ (and all their linear combinations)
2.3.3 The Minus-1 Trick
- The columns of $\widetilde{A}$ that contain the $-1$ pivots are solutions of the homogeneous equation system $Ax = 0$. These columns form a basis of the solution space of $Ax = 0$, which we call the kernel or null space
- Example
  - $A = \left[ \begin{matrix} 1 & 3 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & 9 \\ 0 & 0 & 0 & 1 & -4 \end{matrix} \right]$
  - Augment this matrix to a $5 \times 5$ matrix by adding rows with a $-1$ at the places where the pivots on the diagonal are missing
  - Obtain: $\widetilde{A} = \left[ \begin{matrix} 1 & 3 & 0 & 0 & 3 \\ 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 9 \\ 0 & 0 & 0 & 1 & -4 \\ 0 & 0 & 0 & 0 & -1 \end{matrix} \right]$
  - Solutions (the columns containing the $-1$ pivots): $\left[ \begin{matrix} 3 \\ -1 \\ 0 \\ 0 \\ 0 \end{matrix} \right]$ and $\left[ \begin{matrix} 3 \\ 0 \\ 9 \\ -4 \\ -1 \end{matrix} \right]$
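A SymPy cross-check of this kernel basis (SymPy's basis vectors may differ from the Minus-1 trick's by sign or scaling):

```python
import sympy as sp

A = sp.Matrix([[1, 3, 0, 0,  3],
               [0, 0, 1, 0,  9],
               [0, 0, 0, 1, -4]])

# nullspace() returns a basis of ker(A); up to scaling it matches the
# Minus-1 trick columns [3, -1, 0, 0, 0]^T and [3, 0, 9, -4, -1]^T.
for v in A.nullspace():
    assert A * v == sp.zeros(3, 1)
    print(v.T)
```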
- Calculating the Inverse
  - The matrix is augmented with the identity matrix on the right. Then Gaussian elimination is used to bring it into reduced REF; the right-hand side of the augmented matrix is then the inverse
  - $AX = I_n \Rightarrow X = A^{-1}$
- Example:
  - Original matrix: $\left[ \begin{matrix} 1 & 0 & 2 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{matrix} \right]$
  - Augmented: $\left[ \begin{array}{cccc|cccc} 1 & 0 & 2 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{array} \right]$
  - After Gaussian elimination (reduced REF): $\left[ \begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & -1 & 2 & -2 & 2 \\ 0 & 1 & 0 & 0 & 1 & -1 & 2 & -2 \\ 0 & 0 & 1 & 0 & 1 & -1 & 1 & -1 \\ 0 & 0 & 0 & 1 & -1 & 0 & -1 & 2 \end{array} \right]$
  - The inverse is $A^{-1} = \left[ \begin{matrix} -1 & 2 & -2 & 2 \\ 1 & -1 & 2 & -2 \\ 1 & -1 & 1 & -1 \\ -1 & 0 & -1 & 2 \end{matrix} \right]$
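The same Gauss-Jordan procedure, sketched with SymPy's `rref`:

```python
import sympy as sp

A = sp.Matrix([[1, 0, 2, 0],
               [1, 1, 0, 0],
               [1, 2, 0, 1],
               [1, 1, 1, 1]])

# Augment with the identity and reduce: [A | I] -> [I | A^{-1}].
rref, _ = A.row_join(sp.eye(4)).rref()
A_inv = rref[:, 4:]
assert A * A_inv == sp.eye(4)
print(A_inv)  # matches the inverse above
```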
2.4 Vector Spaces
A set of elements and an operation defined on these elements that keeps some structure of the set intact.
2.4.1 Groups
- Consider a set $\mathcal{G}$ and an operation $\otimes :\mathcal{G} \times \mathcal{G} \rightarrow \mathcal{G}$ defined on $\mathcal{G}$. Then $G := (\mathcal{G}, \otimes)$ is called a group if the following hold:
  - Closure of $\mathcal{G}$ under $\otimes$: $\forall x,y\in \mathcal{G}:(x \otimes y) \in \mathcal{G}$
  - Associativity: $\forall x,y,z\in \mathcal{G}:(x \otimes y) \otimes z = x \otimes (y \otimes z)$
  - Neutral element: $\exists\, e \in \mathcal{G} \ \forall x \in \mathcal{G} : x\otimes e = x \text{ and } e \otimes x = x$
  - Inverse element: $\forall x \in \mathcal{G} \ \exists\, y \in \mathcal{G} : x\otimes y = e \text{ and } y \otimes x = e$, where $e$ is the neutral element
- If additionally $\forall x, y \in \mathcal{G} : x \otimes y = y \otimes x$, then $G = (\mathcal{G},\otimes)$ is an Abelian group (commutative)
- Examples:
  - $(\Z, +)$ is an Abelian group
  - $(\N_0, +)$ is not a group, because the inverse elements are missing
2.4.2 Vector Spaces
- A real-valued vector space $V = (\mathcal{V}, +, \cdot)$ is a set $\mathcal{V}$ with two operations $$+ : \mathcal{V} \times \mathcal{V} \rightarrow \mathcal{V} \\ \cdot\ : \R \times \mathcal{V} \rightarrow \mathcal{V}$$ where
  1. $(\mathcal{V}, +)$ is an Abelian group
  2. Distributivity: $\forall \lambda \in \R,\ x,y \in \mathcal{V}: \lambda \cdot (x + y) = \lambda\cdot x + \lambda\cdot y$ and $\forall \lambda, \psi \in \R,\ x \in \mathcal{V}: (\lambda + \psi) \cdot x = \lambda\cdot x + \psi\cdot x$
  3. Associativity (outer operation): $\forall \lambda, \psi \in \R,\ x \in \mathcal{V} : \lambda \cdot (\psi \cdot x) = (\lambda\psi) \cdot x$
  4. Neutral element with respect to the outer operation: $\forall x\in \mathcal{V} : 1 \cdot x = x$
- The elements $x\in \mathcal{V}$ are called vectors
- The neutral element of $(\mathcal{V},+)$ is the zero vector $0 = [0,\cdots,0]^T$; the inner operation $+$ is called vector addition
- The $\lambda \in \R$ are called scalars; the outer operation $\cdot$ is multiplication by scalars
- A "vector multiplication" $ab$ for $a, b \in \R^{n}$ is not defined
- Defined multiplications: $ab^T \in \R^{n\times n}$ (outer product) and $a^Tb \in \R$ (inner/scalar/dot product)
2.4.3 Vector Subspaces
- Let $V = (\mathcal{V},+,\cdot)$ be a vector space and $\mathcal{U} \subseteq \mathcal{V}$, $\mathcal{U} \neq \emptyset$. Then $U = (\mathcal{U},+,\cdot)$ is called a vector subspace of $V$ (or linear subspace) if $U$ is a vector space with the vector space operations $+$ and $\cdot$ restricted to $\mathcal{U} \times \mathcal{U}$ and $\R \times \mathcal{U}$. We write $U \subseteq V$ to denote a subspace $U$ of $V$
- Every subspace $U \subseteq (\R^n,+,\cdot)$ is the solution space of a homogeneous system of linear equations $Ax=0$ for $x\in\R^n$
2.5 Linear Independence
Linear combination
- Consider a vector space $V$ and a finite number of vectors $x_1,\cdots,x_k \in V$. Then every $v\in V$ of the form $$v = \lambda_1x_1 + \cdots + \lambda_kx_k = \sum^{k}_{i=1} \lambda_ix_i \in V$$ with $\lambda_1,\cdots,\lambda_k \in \R$ is a linear combination of the vectors $x_1,\cdots,x_k$
Linear (in)dependence
- Consider a vector space $V$ with $k\in\N$ and $x_1,\cdots,x_k \in V$. If there is a non-trivial linear combination such that $0 = \sum_{i=1}^{k} \lambda_ix_i$ with at least one $\lambda_i \neq 0$, the vectors $x_1,\cdots,x_k$ are linearly dependent. If only the trivial solution exists, i.e., $\lambda_1 = \cdots = \lambda_k = 0$, the vectors $x_1,\cdots,x_k$ are linearly independent
- Vectors are either linearly independent or linearly dependent; there is no third option
- The vectors $\{x_1,\cdots,x_k : x_i \neq 0,\ i = 1,\cdots,k\}$, $k \geqslant 2$, are linearly dependent if and only if at least one of them is a linear combination of the others. In particular, if one vector is a multiple of another vector, the set is linearly dependent
Linear independence can be determined by Gaussian elimination (bring the matrix of column vectors into REF):
- All column vectors are linearly independent if and only if all columns are pivot columns. If there is at least one non-pivot column, the columns are linearly dependent
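A NumPy sketch of this test via the matrix rank (the vectors are illustrative):

```python
import numpy as np

# Stack the vectors as columns; they are linearly independent
# iff the rank equals the number of columns (all columns are pivots).
x1, x2, x3 = [1, 2, 0], [0, 1, 1], [1, 3, 1]   # x3 = x1 + x2
V = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(V) == V.shape[1])  # False: linearly dependent
```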
2.6 Basis and Rank
2.6.1 Generating Set and Basis
- Generating set and span
  - Consider a vector space $V = (\mathcal{V},+,\cdot)$ and a set of vectors $\mathcal{A} = \{ x_1,\cdots,x_k\} \subseteq \mathcal{V}$. If every vector $v \in \mathcal{V}$ can be expressed as a linear combination of $x_1,\cdots,x_k$, then $\mathcal{A}$ is called a generating set of $V$. The set of all linear combinations of vectors in $\mathcal{A}$ is called the span of $\mathcal{A}$. If $\mathcal{A}$ spans the vector space $V$, we write $V = \operatorname{span}[\mathcal{A}]$ or $V = \operatorname{span}[x_1,\cdots,x_k]$
  - Generating sets are sets of vectors that span vector (sub)spaces
- Basis
  - A basis is a minimal generating set: there exists no smaller set that spans $V$. Every linearly independent generating set of $V$ is minimal and is called a basis of $V$
  - In $\R^3$, the canonical/standard basis is $$\mathcal{B} = \left\{ \left[ \begin{matrix} 1 \\ 0 \\ 0 \end{matrix} \right] , \left[ \begin{matrix} 0 \\ 1 \\ 0 \end{matrix} \right] , \left[ \begin{matrix} 0 \\ 0 \\ 1 \end{matrix} \right] \right\}$$
- Determining a basis
  - REF: $$\left[ \begin{matrix} 1 & 2 & 3 & -1 \\ 0 & 1 & 2 & -2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix} \right]$$
  - The pivot columns indicate which vectors are linearly independent: from the REF we see that $x_1, x_2, x_4$ are linearly independent, because $\lambda_1 x_1 + \lambda_2 x_2 + \lambda_4 x_4 = 0$ can only be solved with $\lambda_1 = \lambda_2 = \lambda_4 = 0$. Therefore, $\{x_1,x_2,x_4\}$ is a basis of $U$
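A SymPy sketch that reads the pivot columns off the reduced REF (here applied directly to the REF matrix from the note):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 3, -1],
               [0, 1, 2, -2],
               [0, 0, 0,  1],
               [0, 0, 0,  0],
               [0, 0, 0,  0]])

_, pivots = A.rref()
print(pivots)  # (0, 1, 3): columns 1, 2, and 4 form a basis
basis = [A[:, j] for j in pivots]
```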
2.6.2 Rank
- The number of linearly independent columns of a matrix $A \in \R^{m\times n}$ equals the number of linearly independent rows and is called the rank of $A$, denoted by $rk(A)$
- Important properties
  - $rk(A) = rk(A^T)$
  - The columns of $A\in \R^{m\times n}$ span a subspace $U\subseteq\R^{m}$ with $dim(U) = rk(A)$. We call this subspace the image or range. A basis of $U$ can be found by applying Gaussian elimination to $A$ to identify the pivot columns
  - The rows of $A \in \R^{m\times n}$ span a subspace $W\subseteq \R^n$ with $dim(W) = rk(A)$. A basis of $W$ can be found by applying Gaussian elimination to $A^T$
  - For $A \in \R^{n\times n}$: $A$ is regular (invertible) if and only if $rk(A) = n$
  - $Ax = b$ can be solved if and only if $rk(A) = rk(A|b)$, where $(A|b)$ denotes the augmented system
  - The solution space of $Ax = 0$ has dimension $n - rk(A)$. We call this subspace the kernel or null space
  - A matrix has full rank if its rank equals the largest possible rank for a matrix of the same dimensions; that is, the rank of a full-rank matrix is the lesser of its number of rows and columns
  - A matrix that does not have full rank is said to be rank deficient
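A quick NumPy illustration of rank and the full-rank test (the example matrix is illustrative):

```python
import numpy as np

A = np.array([[ 1.,  2., 1.],
              [-2., -3., 1.],
              [ 3.,  5., 0.]])  # row 3 = row 1 - row 2, so rk(A) = 2

assert np.linalg.matrix_rank(A) == np.linalg.matrix_rank(A.T)  # rk(A) = rk(A^T)
print(np.linalg.matrix_rank(A))                  # 2
print(np.linalg.matrix_rank(A) == min(A.shape))  # False: rank deficient, not invertible
```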
2.7 Linear Mappings (Linear Transformations)
Definition: For vector spaces $V, W$, a mapping $\Phi: V \rightarrow W$ is called a linear mapping (or vector space homomorphism / linear transformation) if $$\forall x,y \in V \ \forall \lambda,\psi \in \R : \Phi(\lambda x+\psi y) = \lambda \Phi(x) + \psi\Phi(y)$$
For $\Phi :\mathcal{V} \rightarrow \mathcal{W}$, where $\mathcal{V}, \mathcal{W}$ can be arbitrary sets, $\Phi$ is called:
- Injective if $\forall x,y\in \mathcal{V}: \Phi(x) = \Phi(y) \Rightarrow x= y$
- Surjective if $\Phi(\mathcal{V}) = \mathcal{W}$
- Bijective if it is both injective and surjective
Finite-dimensional vector spaces $V$ and $W$ are isomorphic if and only if $dim(V) = dim(W)$.
2.7.1 Matrix Representation of Linear Mapping
- Given ordered bases $B = (b_1, \cdots, b_n)$ of $V$ and $C = (c_1, \cdots, c_m)$ of $W$, the transformation matrix of $\Phi$ collects, column by column, the coordinates of the images of the basis vectors. Example: $$A_{\Phi} = \left[a_1, a_2, a_3\right] = \left[\begin{matrix} 1 & 2 & 0 \\ -1 & 1 & 3 \\ 3 & 7 & 1 \\ -1 & 2 & 4 \end{matrix}\right]$$ where the $a_j$, $j = 1,2,3$, are the coordinate vectors of $\Phi(b_j)$ with respect to $C$
2.7.2 Basis Change
- Equivalence
  - Two matrices $A,\tilde{A}\in \R^{m\times n}$ are equivalent if there exist regular matrices $S\in \R^{n\times n}$ and $T \in \R^{m\times m}$ such that $\tilde{A} = T^{-1}AS$
- Similarity
  - Two matrices $A,\tilde{A}\in \R^{n\times n}$ are similar if there exists a regular matrix $S\in \R^{n\times n}$ with $\tilde{A} = S^{-1}AS$
- Transformation formula: $$\tilde{A}_{\Phi} = T^{-1}A_{\Phi}S$$ where $A_{\Phi} : B \rightarrow C$, $\tilde{A}_{\Phi} : \tilde{B} \rightarrow \tilde{C}$, $S : \tilde{B} \rightarrow B$, $T:\tilde{C} \rightarrow C$, and $T^{-1} : C \rightarrow \tilde{C}$, so that $$\tilde{B} \rightarrow \tilde{C} \ = \ \tilde{B} \rightarrow B \rightarrow C \rightarrow \tilde{C}$$
2.7.3 Image and Kernel
- For $\Phi : V \rightarrow W$, we define the kernel/null space $$\ker(\Phi) := \Phi^{-1}(0_W) = \{v \in V : \Phi(v) = 0_W\}$$ and the image/range $$\operatorname{Im}(\Phi) := \Phi(V) = \{w\in W \mid \exists\, v\in V : \Phi(v) = w\}$$ We call $V$ and $W$ the domain and codomain of $\Phi$, respectively
- $rk(A) = dim(Im(\Phi))$
- Example: $$\left[ \begin{matrix} 1 & 0 & 0 & 1 \\ 0 & 1 & -\frac{1}{2} & -\frac{1}{2} \end{matrix} \right]$$ This matrix is in reduced REF, and we can use the Minus-1 trick to compute a basis of the kernel. Alternatively, we can express the non-pivot columns (3 and 4) as linear combinations of the pivot columns (1 and 2). The third column $a_3$ equals $-\frac{1}{2}$ times the second column $a_2$; therefore, $0 = \frac{1}{2}a_2 + a_3$. In the same way, we see that $a_4 = a_1 - \frac{1}{2}a_2$ and, therefore, $0 = a_1 - \frac{1}{2}a_2 - a_4$. Overall, the kernel (null space) is $$\ker(\Phi) = \operatorname{span}\left[ \left[ \begin{matrix} 0 \\ \frac{1}{2} \\ 1 \\ 0 \end{matrix} \right], \left[ \begin{matrix} -1\\ \frac{1}{2} \\ 0 \\ 1 \end{matrix} \right]\right]$$
- Rank-nullity theorem
  - For vector spaces $V, W$ and a linear mapping $\Phi : V \rightarrow W$ it holds that $$dim(\ker(\Phi)) + dim(Im(\Phi)) = dim(V)$$
  - Also known as the fundamental theorem of linear mappings
  - If $dim(Im(\Phi)) < dim(V)$, then $\ker(\Phi)$ is non-trivial, i.e., the kernel contains more than just $0_V$ and $dim(\ker(\Phi)) \geqslant 1$
  - If $A_{\Phi}$ is the transformation matrix of $\Phi$ with respect to an ordered basis and $dim(Im(\Phi)) < dim(V)$, then the system of linear equations $A_{\Phi}x = 0$ has infinitely many solutions
  - If $dim(V) = dim(W)$, then $\Phi$ is injective if and only if it is surjective if and only if it is bijective, since $Im(\Phi) \subseteq W$
2.8 Affine Spaces
Affine subspaces are offset from the origin; affine mappings resemble linear mappings.
Affine subspace
- Let $V$ be a vector space, $x_0 \in V$ and $U \subseteq V$ a subspace. Then the subset $$L = x_0 + U := \{x_0 + u : u \in U\} = \{v\in V \mid \exists\, u\in U : v=x_0 + u\} \subseteq V$$ is called an affine subspace or linear manifold of $V$. $U$ is called the direction or direction space, and $x_0$ is called the support point
- If $x_0 \notin U$, an affine subspace is not a linear subspace (vector subspace) of $V$
- Examples of affine subspaces are points, lines, and planes in $\R^3$, which do not (necessarily) go through the origin
Affine mappings
- For two vector spaces $V, W$, a linear mapping $\Phi : V \rightarrow W$, and $a\in W$, the mapping $$\phi: V \rightarrow W, \quad x \mapsto a + \Phi(x)$$ is an affine mapping from $V$ to $W$. The vector $a$ is called the translation vector of $\phi$
- Affine mappings keep the geometric structure invariant. They also preserve the dimension and parallelism.
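A minimal NumPy sketch of an affine mapping and its parallelism-preserving behavior (the matrix and vectors are illustrative):

```python
import numpy as np

# phi(x) = a + Phi(x): a linear map followed by a translation.
Phi = np.array([[1., 2.],
                [0., 1.]])
a = np.array([3., -1.])

def phi(x: np.ndarray) -> np.ndarray:
    return a + Phi @ x

# Parallel segments stay parallel: a direction u maps the same way
# at every support point, since phi(p + u) - phi(p) = Phi @ u.
p, q, u = np.array([0., 0.]), np.array([5., 5.]), np.array([1., 1.])
assert np.allclose(phi(p + u) - phi(p), phi(q + u) - phi(q))
```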