Machine Learning Mathematics Study Notes: Chapter 2. Linear Algebra

Chapter 2. Linear Algebra

Resources

Gilbert Strang’s Linear Algebra course

Linear Algebra Series by 3Blue1Brown

Algebra

Construct a set of objects (symbols) and a set of rules to manipulate these objects.

Vector objects:

Geometric vectors

Polynomials

Audio signals

Elements of $\R^n$ (tuples of $n$ real numbers)

2.1 Systems of Linear Equations

$$a_{11}x_1 + \cdots + a_{1n}x_n = b_1$$
$$\vdots$$
$$a_{m1}x_1 + \cdots + a_{mn}x_n = b_m$$

where $a_{ij} \in \R$ and $b_i \in \R$

The above is the general form of a system of linear equations. A tuple $(x_1, \cdots, x_n) \in \R^n$ that satisfies all of the above equations simultaneously is a solution of the linear equation system.

In a system with two variables, every linear equation defines a line in the $x_1x_2$-plane

The solution set is the intersection of these lines

$$\left[\begin{matrix} a_{11} \\ \vdots \\ a_{m1} \end{matrix}\right] x_1 + \left[\begin{matrix} a_{12} \\ \vdots \\ a_{m2} \end{matrix}\right] x_2 + \cdots + \left[\begin{matrix} a_{1n} \\ \vdots \\ a_{mn} \end{matrix}\right] x_n = \left[\begin{matrix} b_1 \\ \vdots \\ b_m \end{matrix}\right]$$

$$\left[\begin{matrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{matrix}\right] \left[\begin{matrix} x_1 \\ \vdots \\ x_n \end{matrix}\right] = \left[\begin{matrix} b_1 \\ \vdots \\ b_m \end{matrix}\right]$$
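The matrix form $Ax = b$ can be checked numerically; a minimal NumPy sketch (the specific system is an arbitrary illustrative choice):

```python
import numpy as np

# An invertible 2x2 system, chosen only for illustration:
# 2*x1 + 3*x2 = 8
# 1*x1 - 1*x2 = -1
A = np.array([[2.0, 3.0],
              [1.0, -1.0]])
b = np.array([8.0, -1.0])

x = np.linalg.solve(A, b)             # solves Ax = b for square, invertible A
print(x)                              # [1. 2.]
np.testing.assert_allclose(A @ x, b)  # verify the solution satisfies the system
```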

2.2 Matrices

$$A = \left[\begin{matrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{matrix}\right], \quad a_{ij} \in \R$$

$(1, n)$-matrices are called rows (row vectors)

$(m, 1)$-matrices are called columns (column vectors)

2.2.1 Addition and Multiplication

  • Addition

    • Element-wise: $(A + B)_{ij} = a_{ij} + b_{ij}$ for matrices of the same size
  • Multiplication

    • $c_{ij} = \displaystyle \sum^{n}_{l=1} a_{il} b_{lj}$, $i = 1, \cdots, m$, $j = 1, \cdots, k$
  • For dimensions: an $(m \times n)$ matrix times an $(n \times k)$ matrix gives an $(m \times k)$ matrix
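A minimal NumPy sketch of the product rule above, with arbitrary small matrices to show how the dimensions combine:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # shape (2, 3)
B = np.array([[1, 0],
              [0, 1],
              [1, 1]])           # shape (3, 2)

C = A @ B                        # (2, 3) @ (3, 2) -> (2, 2)
print(C.shape)                   # (2, 2)

# c_ij = sum_l a_il * b_lj, spelled out explicitly:
C_manual = np.array([[sum(A[i, l] * B[l, j] for l in range(3))
                      for j in range(2)]
                     for i in range(2)])
assert (C == C_manual).all()
```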

2.2.2 Inverse and Transpose

  • Inverse

    • $AB = I_n = BA$: $B$ is called the inverse of $A$ and is denoted by $A^{-1}$
    • $A = \left[\begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{matrix}\right] \Rightarrow A^{-1} = \dfrac{1}{a_{11}a_{22} - a_{12}a_{21}} \left[\begin{matrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{matrix}\right]$
  • Transpose

    • For $A \in \R^{m\times n}$, the matrix $B \in \R^{n\times m}$ with $b_{ij} = a_{ji}$ is called the transpose of $A$; we write $B = A^T$
  • Important Properties

    • $AA^{-1} = I = A^{-1}A$
    • $(AB)^{-1} = B^{-1}A^{-1}$
    • $(A + B)^{-1} \neq A^{-1} + B^{-1}$
    • $(A^T)^T = A$
    • $(A + B)^T = A^T + B^T$
    • $(AB)^T = B^T A^T$
  • Symmetric Matrix

    • $A^T = A$, which requires $A \in \R^{n\times n}$
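These inverse and transpose properties are easy to spot-check numerically; a sketch with random matrices (a random Gaussian matrix is almost surely invertible, so this is illustrative, not a proof):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
I = np.eye(3)

assert np.allclose(A @ np.linalg.inv(A), I)              # A A^{-1} = I
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))  # (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose((A + B).T, A.T + B.T)                 # (A+B)^T = A^T + B^T
assert np.allclose((A @ B).T, B.T @ A.T)                 # (AB)^T = B^T A^T
```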

2.2.3 Multiplication by Scalar

  • Let $A \in \R^{m\times n}$ and $\lambda \in \R$. Then $\lambda A = K$ with $K_{ij} = \lambda a_{ij}$

  • Associativity

    • $(\lambda \psi)C = \lambda(\psi C)$, $C \in \R^{m\times n}$
    • $\lambda(BC) = (\lambda B)C = B(\lambda C) = (BC)\lambda$, $B \in \R^{m\times n}$, $C \in \R^{n\times k}$
    • $(\lambda C)^T = C^T\lambda^T = C^T\lambda = \lambda C^T$, since $\lambda = \lambda^T$ for all $\lambda \in \R$
  • Distributivity

    • $(\lambda + \psi)C = \lambda C + \psi C$, $C \in \R^{m\times n}$
    • $\lambda(B + C) = \lambda B + \lambda C$, $B, C \in \R^{m\times n}$
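A quick numerical check of these scalar-multiplication laws (all values are arbitrary):

```python
import numpy as np

lam, psi = 2.0, -3.0
C = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

assert np.allclose((lam * psi) * C, lam * (psi * C))    # associativity
assert np.allclose((lam + psi) * C, lam * C + psi * C)  # distributivity (scalars)
assert np.allclose(lam * (B + C), lam * B + lam * C)    # distributivity (matrices)
assert np.allclose((lam * C).T, lam * C.T)              # (lambda C)^T = lambda C^T
```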

2.3 Solving Systems of Linear Equations

2.3.1 Particular and General Solution

  • The general approach consists of three steps:

    1. Find a particular solution to $Ax = b$
    2. Find all solutions to $Ax = 0$
    3. Combine the solutions from steps 1 and 2 to obtain the general solution
  • Gaussian Elimination

    • Transforms the system of linear equations into a simple form to which the three steps above can be applied
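A sketch of the three steps with NumPy/SciPy on an underdetermined system (the numbers are an illustrative choice); `scipy.linalg.null_space` returns an orthonormal basis of the solution space of $Ax = 0$:

```python
import numpy as np
from scipy.linalg import null_space

# Underdetermined system (2 equations, 4 unknowns), chosen for illustration.
A = np.array([[1.0, 0.0, 8.0, -4.0],
              [0.0, 1.0, 2.0, 12.0]])
b = np.array([42.0, 8.0])

# Step 1: a particular solution of Ax = b (least squares picks one exactly here).
x_p = np.linalg.lstsq(A, b, rcond=None)[0]

# Step 2: all solutions of Ax = 0 (orthonormal basis of the null space).
N = null_space(A)             # shape (4, 2): two free directions

# Step 3: general solution x = x_p + N @ lambda for any coefficients lambda.
lam = np.array([1.0, -2.0])   # arbitrary coefficients
x = x_p + N @ lam
np.testing.assert_allclose(A @ x, b)
```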

2.3.2 Elementary Transformation

  • Keep the solution set the same, but transform the equation system into a simpler form (equation = row):

    • Exchange of two equations (rows)
    • Multiplication of an equation (row) with a constant $\lambda \in \R \setminus \{0\}$
    • Addition of two equations (rows)
  • Pivot

    • The leading coefficient of a row (the first nonzero number from the left) is called the pivot.
  • REF - Row Echelon Form

    • The variables corresponding to the pivots in the REF are called basic variables; the other variables are free variables.
  • Gaussian Elimination

    • An algorithm that performs elementary transformations to bring a system of linear equations into reduced REF
  • Reduced REF

    • $A = \left[\begin{matrix} 1 & 3 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & 9 \\ 0 & 0 & 0 & 1 & -4 \end{matrix}\right]$

    • The pivots are in column 1 (row 1), column 3 (row 2), and column 4 (row 3).

    • The key idea for finding the solutions of $Ax = 0$ is to look at the non-pivot columns, which we need to express as linear combinations of the pivot columns.

    • Solutions:

      • $\left[\begin{matrix} 3 \\ -1 \\ 0 \\ 0 \\ 0 \end{matrix}\right]$ and $\left[\begin{matrix} 3 \\ 0 \\ 9 \\ -4 \\ -1 \end{matrix}\right]$
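Both solution vectors can be verified directly; a minimal NumPy check:

```python
import numpy as np

A = np.array([[1, 3, 0, 0,  3],
              [0, 0, 1, 0,  9],
              [0, 0, 0, 1, -4]])

v1 = np.array([3, -1, 0,  0,  0])   # built from non-pivot column 2
v2 = np.array([3,  0, 9, -4, -1])   # built from non-pivot column 5

assert (A @ v1 == 0).all() and (A @ v2 == 0).all()  # both solve Ax = 0
```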

2.3.3 The Minus-1 Trick

  • The columns of $\widetilde{A}$ that contain the $-1$ pivots are solutions of the homogeneous equation system $Ax = 0$. These columns form a basis of the solution space of $Ax = 0$, which we call the kernel or null space

  • Example

    • $A = \left[\begin{matrix} 1 & 3 & 0 & 0 & 3 \\ 0 & 0 & 1 & 0 & 9 \\ 0 & 0 & 0 & 1 & -4 \end{matrix}\right]$
    • Augment this matrix to a $5 \times 5$ matrix by adding rows of the form $[0 \cdots 0\ {-1}\ 0 \cdots 0]$ at the places where the pivots on the diagonal are missing.
    • Obtain: $\widetilde{A} = \left[\begin{matrix} 1 & 3 & 0 & 0 & 3 \\ 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 9 \\ 0 & 0 & 0 & 1 & -4 \\ 0 & 0 & 0 & 0 & -1 \end{matrix}\right]$
    • Solutions: $\left[\begin{matrix} 3 \\ -1 \\ 0 \\ 0 \\ 0 \end{matrix}\right]$ and $\left[\begin{matrix} 3 \\ 0 \\ 9 \\ -4 \\ -1 \end{matrix}\right]$
  • Calculating the Inverse

    • The matrix is augmented by inserting the identity matrix on the right. Then Gaussian elimination is used to bring the left side into reduced REF; the right side is then the inverse of the matrix

    • $AX = I_n$, hence $X = A^{-1}$

    • Example:

      • Original matrix:
      • $A = \left[\begin{matrix} 1 & 0 & 2 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{matrix}\right]$
      • Augmented:
      • $\left[\begin{array}{cccc|cccc} 1 & 0 & 2 & 0 & 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\ 1 & 2 & 0 & 1 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 1 \end{array}\right]$
      • After Gaussian elimination:
      • $\left[\begin{array}{cccc|cccc} 1 & 0 & 0 & 0 & -1 & 2 & -2 & 2 \\ 0 & 1 & 0 & 0 & 1 & -1 & 2 & -2 \\ 0 & 0 & 1 & 0 & 1 & -1 & 1 & -1 \\ 0 & 0 & 0 & 1 & -1 & 0 & -1 & 2 \end{array}\right]$
      • The inverse is $A^{-1} = \left[\begin{matrix} -1 & 2 & -2 & 2 \\ 1 & -1 & 2 & -2 \\ 1 & -1 & 1 & -1 \\ -1 & 0 & -1 & 2 \end{matrix}\right]$
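The hand computation can be checked with `np.linalg.inv`:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 1.0],
              [1.0, 1.0, 1.0, 1.0]])

A_inv = np.linalg.inv(A)
print(np.round(A_inv))   # matches the matrix obtained by Gaussian elimination
np.testing.assert_allclose(A @ A_inv, np.eye(4), atol=1e-12)
```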

2.4 Vector Spaces

A set of elements and an operation defined on these elements that keeps some structure of the set intact.

2.4.1 Groups

  • Consider a set $\mathcal{G}$ and an operation $\otimes : \mathcal{G} \times \mathcal{G} \rightarrow \mathcal{G}$ defined on $\mathcal{G}$. Then $G := (\mathcal{G}, \otimes)$ is called a group if the following hold:

    • Closure of $\mathcal{G}$ under $\otimes$: $\forall x, y \in \mathcal{G} : (x \otimes y) \in \mathcal{G}$
    • Associativity: $\forall x, y, z \in \mathcal{G} : (x \otimes y) \otimes z = x \otimes (y \otimes z)$
    • Neutral element: $\exists\, e \in \mathcal{G}\ \forall x \in \mathcal{G} : x \otimes e = x$ and $e \otimes x = x$
    • Inverse element: $\forall x \in \mathcal{G}\ \exists\, y \in \mathcal{G} : x \otimes y = e$ and $y \otimes x = e$, where $e$ is the neutral element.
  • If additionally $\forall x, y \in \mathcal{G} : x \otimes y = y \otimes x$, then $G = (\mathcal{G}, \otimes)$ is an Abelian group (commutative)

  • Examples:

    • $(\Z, +)$ is an Abelian group
    • $(\N_0, +)$ is not a group, because the inverse elements are missing.
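A brute-force sketch that checks the four group axioms above for $(\Z_5, +_5)$, i.e., the integers modulo 5 under addition, a finite stand-in for $(\Z, +)$:

```python
from itertools import product

G = range(5)
op = lambda x, y: (x + y) % 5          # addition modulo 5

closure = all(op(x, y) in G for x, y in product(G, G))
assoc   = all(op(op(x, y), z) == op(x, op(y, z)) for x, y, z in product(G, G, G))
e = 0                                   # neutral element
neutral = all(op(x, e) == x and op(e, x) == x for x in G)
inverse = all(any(op(x, y) == e and op(y, x) == e for y in G) for x in G)
abelian = all(op(x, y) == op(y, x) for x, y in product(G, G))

print(closure, assoc, neutral, inverse, abelian)   # all True -> Abelian group
```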

2.4.2 Vector Spaces

  • A real-valued vector space $V = (\mathcal{V}, +, \cdot)$ is a set $\mathcal{V}$ with two operations $$+ : \mathcal{V} \times \mathcal{V} \rightarrow \mathcal{V}$$ $$\cdot\ : \R \times \mathcal{V} \rightarrow \mathcal{V}$$ where
    1. $(\mathcal{V}, +)$ is an Abelian group
    2. Distributivity: $\forall \lambda \in \R,\ x, y \in \mathcal{V} : \lambda \cdot (x + y) = \lambda \cdot x + \lambda \cdot y$ and $\forall \lambda, \psi \in \R,\ x \in \mathcal{V} : (\lambda + \psi) \cdot x = \lambda \cdot x + \psi \cdot x$
    3. Associativity (outer operation): $\forall \lambda, \psi \in \R,\ x \in \mathcal{V} : \lambda \cdot (\psi \cdot x) = (\lambda\psi) \cdot x$
    4. Neutral element with respect to the outer operation: $\forall x \in \mathcal{V} : 1 \cdot x = x$
  • The elements $x \in \mathcal{V}$ are called vectors.
  • The neutral element of $(\mathcal{V}, +)$ is the zero vector $0 = [0, \cdots, 0]^T$; the inner operation $+$ is called vector addition.
  • The $\lambda \in \R$ are called scalars; the outer operation $\cdot$ is multiplication by scalars.
  • A "vector multiplication" $ab$, $a, b \in \R^n$, is not defined.
  • Defined multiplications: $ab^T \in \R^{n\times n}$ (outer product) and $a^Tb \in \R$ (inner/scalar/dot product)
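The two defined multiplications in NumPy, with arbitrary vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

inner = a @ b              # a^T b, a scalar in R
outer = np.outer(a, b)     # a b^T, an (n, n) matrix

print(inner)               # 32.0
print(outer.shape)         # (3, 3)
```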

2.4.3 Vector Subspaces

  • Let $V = (\mathcal{V}, +, \cdot)$ be a vector space and $\mathcal{U} \subseteq \mathcal{V}$, $\mathcal{U} \neq \emptyset$. Then $U = (\mathcal{U}, +, \cdot)$ is called a vector subspace of $V$ (or linear subspace) if $U$ is a vector space with the vector space operations $+$ and $\cdot$ restricted to $\mathcal{U} \times \mathcal{U}$ and $\R \times \mathcal{U}$. We write $U \subseteq V$ to denote a subspace $U$ of $V$
  • Every subspace $U \subseteq (\R^n, +, \cdot)$ is the solution space of a homogeneous system of linear equations $Ax = 0$ for $x \in \R^n$

2.5 Linear Independence

Linear Combination

  • Consider a vector space $V$ and a finite number of vectors $x_1, \cdots, x_k \in V$. Then every $v \in V$ of the form $$v = \lambda_1 x_1 + \cdots + \lambda_k x_k = \sum^{k}_{i=1} \lambda_i x_i \in V$$ with $\lambda_1, \cdots, \lambda_k \in \R$ is a linear combination of the vectors $x_1, \cdots, x_k$.

Linear (In)dependence

  • Let us consider a vector space $V$ with $k \in \N$ and $x_1, \cdots, x_k \in V$. If there is a non-trivial linear combination such that $0 = \sum_{i=1}^{k} \lambda_i x_i$ with at least one $\lambda_i \neq 0$, the vectors $x_1, \cdots, x_k$ are linearly dependent. If only the trivial solution exists, i.e., $\lambda_1 = \cdots = \lambda_k = 0$, the vectors $x_1, \cdots, x_k$ are linearly independent.

Vectors are either linearly independent or linearly dependent; there is no third option.

The vectors $\{x_1, \cdots, x_k : x_i \neq 0,\ i = 1, \cdots, k\}$, $k \geqslant 2$, are linearly dependent if and only if at least one of them is a linear combination of the others. In particular, if one vector is a multiple of another vector, then the set is linearly dependent.

In practice, linear (in)dependence is determined by writing the vectors as the columns of a matrix and bringing it into REF via Gaussian elimination:

  • All column vectors are linearly independent if and only if all columns are pivot columns. If there is at least one non-pivot column, the columns are linearly dependent.
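This test is easy to run numerically: stack the vectors as columns and compare the rank with the number of columns (a standard, if numerically approximate, criterion):

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([0.0, 1.0, 1.0])
x3 = x1 + 2 * x2            # deliberately dependent on x1 and x2

A = np.column_stack([x1, x2, x3])
rank = np.linalg.matrix_rank(A)
print(rank == A.shape[1])   # False -> the columns are linearly dependent
```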

2.6 Basis and Rank

2.6.1 Generating Set and Basis

  • Generating Set and Span

    • Consider a vector space $V = (\mathcal{V}, +, \cdot)$ and a set of vectors $\mathcal{A} = \{x_1, \cdots, x_k\} \subseteq \mathcal{V}$. If every vector $v \in \mathcal{V}$ can be expressed as a linear combination of $x_1, \cdots, x_k$, then $\mathcal{A}$ is called a generating set of $V$. The set of all linear combinations of vectors in $\mathcal{A}$ is called the span of $\mathcal{A}$. If $\mathcal{A}$ spans the vector space $V$, we write $V = \mathrm{span}[\mathcal{A}]$ or $V = \mathrm{span}[x_1, \cdots, x_k]$.
    • Generating sets are sets of vectors that span vector (sub)spaces.
  • Basis

    • A basis is a minimal generating set: there exists no smaller set that spans $V$. Every linearly independent generating set of $V$ is minimal and is called a basis of $V$.
  • In $\R^3$, the canonical/standard basis is $$\mathcal{B} = \left\{ \left[\begin{matrix} 1 \\ 0 \\ 0 \end{matrix}\right], \left[\begin{matrix} 0 \\ 1 \\ 0 \end{matrix}\right], \left[\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}\right] \right\}$$

  • Determining a Basis

    • REF: $\left[\begin{matrix} 1 & 2 & 3 & -1 \\ 0 & 1 & 2 & -2 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}\right]$
    • The pivot columns indicate which vectors are linearly independent: from the REF we see that $x_1, x_2, x_4$ are linearly independent, because $\lambda_1 x_1 + \lambda_2 x_2 + \lambda_4 x_4 = 0$ can only be solved with $\lambda_1 = \lambda_2 = \lambda_4 = 0$. Therefore, $\{x_1, x_2, x_4\}$ is a basis of $U$.
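SymPy's `rref` returns both the reduced REF and the indices of the pivot columns, which directly identifies a basis among the original columns; a sketch on an arbitrary matrix (not the one above):

```python
from sympy import Matrix

A = Matrix([[1, 2, 3, -1],
            [2, 5, 8, -4],
            [0, 0, 0,  1]])

R, pivot_cols = A.rref()
print(pivot_cols)                     # (0, 1, 3): columns 1, 2, 4 (0-indexed)
basis = [A.col(j) for j in pivot_cols]  # the corresponding original columns
```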

2.6.2 Rank

  • The number of linearly independent columns of a matrix $A \in \R^{m\times n}$ equals the number of linearly independent rows and is called the rank of $A$, denoted by $rk(A)$

  • Important properties

    • $rk(A) = rk(A^T)$
    • The columns of $A \in \R^{m\times n}$ span a subspace $U \subseteq \R^m$ with $\dim(U) = rk(A)$. We call this subspace the image or range. A basis of $U$ can be found by applying Gaussian elimination to $A$ to identify the pivot columns.
    • The rows of $A \in \R^{m\times n}$ span a subspace $W \subseteq \R^n$ with $\dim(W) = rk(A)$. A basis of $W$ can be found by applying Gaussian elimination to $A^T$
    • For $A \in \R^{n\times n}$, it holds that $A$ is regular (invertible) if and only if $rk(A) = n$
    • $Ax = b$ can be solved if and only if $rk(A) = rk(A|b)$, where $A|b$ denotes the augmented system
    • The solution space of $Ax = 0$ possesses dimension $n - rk(A)$. We call this subspace the kernel or the null space
    • A matrix has full rank if its rank equals the largest possible rank for a matrix of the same dimensions; that is, the rank of a full-rank matrix is the lesser of the number of rows and the number of columns
    • If a matrix does not have full rank, it is said to be rank deficient
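`np.linalg.matrix_rank` computes the rank numerically; a sketch checking a full-rank and a rank-deficient case (arbitrary matrices):

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])    # rank 2 = min(2, 3): full rank
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])         # second row = 2 * first row: rank 1

print(np.linalg.matrix_rank(A))    # 2
print(np.linalg.matrix_rank(A.T))  # 2, since rk(A) = rk(A^T)
print(np.linalg.matrix_rank(B))    # 1 -> rank deficient, not invertible
```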

2.7 Linear Mappings (Linear Transformations)

Definition: For vector spaces $V, W$, a mapping $\Phi : V \rightarrow W$ is called a linear mapping (or vector space homomorphism / linear transformation) if $$\forall x, y \in V\ \forall \lambda, \psi \in \R : \Phi(\lambda x + \psi y) = \lambda \Phi(x) + \psi \Phi(y)$$

Consider a mapping $\Phi : \mathcal{V} \rightarrow \mathcal{W}$, where $\mathcal{V}, \mathcal{W}$ can be arbitrary sets. Then $\Phi$ is called

  • Injective if $\forall x, y \in \mathcal{V} : \Phi(x) = \Phi(y) \Rightarrow x = y$
  • Surjective if $\Phi(\mathcal{V}) = \mathcal{W}$
  • Bijective if it is injective and surjective.

Finite-dimensional vector spaces $V$ and $W$ are isomorphic if and only if $\dim(V) = \dim(W)$.

2.7.1 Matrix Representation of Linear Mapping

  • $A_{\Phi} = [a_1, a_2, a_3] = \left[\begin{matrix} 1 & 2 & 0 \\ -1 & 1 & 3 \\ 3 & 7 & 1 \\ -1 & 2 & 4 \end{matrix}\right]$, where the $a_j$, $j = 1, 2, 3$, are the coordinate vectors of $\Phi(b_j)$ with respect to $C$
  • Basis vectors: the $j$-th column of $A_{\Phi}$ stores the coordinates, with respect to the codomain basis $C$, of the image of the $j$-th basis vector $b_j$ of the domain
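Given the transformation matrix, the image of a vector is computed from its coordinates; a sketch using the $A_{\Phi}$ above (the input coordinates are an arbitrary choice):

```python
import numpy as np

# Transformation matrix of Phi w.r.t. ordered bases B (domain) and C (codomain).
A_Phi = np.array([[ 1, 2, 0],
                  [-1, 1, 3],
                  [ 3, 7, 1],
                  [-1, 2, 4]])

x_B = np.array([1, 0, 2])   # coordinates of x w.r.t. B (arbitrary example)
y_C = A_Phi @ x_B           # coordinates of Phi(x) w.r.t. C
print(y_C)                  # [1 5 5 7]
```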

2.7.2 Basis Change

  • Equivalence

    • Two matrices $A, \tilde{A} \in \R^{m\times n}$ are equivalent if there exist regular matrices $S \in \R^{n\times n}$ and $T \in \R^{m\times m}$ such that $\tilde{A} = T^{-1}AS$
  • Similarity

    • Two matrices $A, \tilde{A} \in \R^{n\times n}$ are similar if there exists a regular matrix $S \in \R^{n\times n}$ with $\tilde{A} = S^{-1}AS$
  • Transformation formula: $$\tilde{A}_{\Phi} = T^{-1} A_{\Phi} S$$ where $A_{\Phi} : B \rightarrow C$, $\tilde{A}_{\Phi} : \tilde{B} \rightarrow \tilde{C}$, $S : \tilde{B} \rightarrow B$, $T : \tilde{C} \rightarrow C$, and $T^{-1} : C \rightarrow \tilde{C}$; the composition is $\tilde{B} \rightarrow \tilde{C} = \tilde{B} \rightarrow B \rightarrow C \rightarrow \tilde{C}$
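A numerical sketch of the transformation formula, with arbitrary matrices standing in for the basis changes ($S$ and $T$ are random and assumed regular, which holds almost surely here):

```python
import numpy as np

rng = np.random.default_rng(1)
A_Phi = rng.standard_normal((4, 3))   # mapping w.r.t. bases B, C
S = rng.standard_normal((3, 3))       # basis change B~ -> B (assumed regular)
T = rng.standard_normal((4, 4))       # basis change C~ -> C (assumed regular)

A_tilde = np.linalg.inv(T) @ A_Phi @ S   # mapping w.r.t. bases B~, C~

# Consistency check: mapping coordinates through either route agrees.
x_Btilde = rng.standard_normal(3)
lhs = A_tilde @ x_Btilde                           # directly in the new bases
rhs = np.linalg.inv(T) @ (A_Phi @ (S @ x_Btilde))  # via the old bases
np.testing.assert_allclose(lhs, rhs)
```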

2.7.3 Image and Kernel

  • For $\Phi : V \rightarrow W$, we define the kernel/null space $$\ker(\Phi) := \Phi^{-1}(0_W) = \{v \in V : \Phi(v) = 0_W\}$$ and the image/range $$\mathrm{Im}(\Phi) := \Phi(V) = \{w \in W \mid \exists\, v \in V : \Phi(v) = w\}$$ We call $V$ and $W$ the domain and codomain of $\Phi$, respectively.

  • $rk(A) = \dim(\mathrm{Im}(\Phi))$

  • $\left[\begin{matrix} 1 & 0 & 0 & 1 \\ 0 & 1 & -\frac{1}{2} & -\frac{1}{2} \end{matrix}\right]$ This matrix is in REF, and we can use the Minus-1 Trick to compute a basis of the kernel. Alternatively, we can express the non-pivot columns (3 and 4) as linear combinations of the pivot columns (1 and 2). The third column $a_3$ equals $-\frac{1}{2}$ times the second column $a_2$; therefore, $0 = a_3 + \frac{1}{2}a_2$. In the same way, we see that $a_4 = a_1 - \frac{1}{2}a_2$ and, therefore, $0 = a_1 - \frac{1}{2}a_2 - a_4$. Overall, this gives the kernel (null space) $$\ker(\Phi) = \mathrm{span}\left[ \left[\begin{matrix} 0 \\ \frac{1}{2} \\ 1 \\ 0 \end{matrix}\right], \left[\begin{matrix} -1 \\ \frac{1}{2} \\ 0 \\ 1 \end{matrix}\right] \right]$$

  • Rank-Nullity Theorem

    • For vector spaces $V, W$ and a linear mapping $\Phi : V \rightarrow W$, it holds that $$\dim(\ker(\Phi)) + \dim(\mathrm{Im}(\Phi)) = \dim(V)$$
    • Also called the fundamental theorem of linear mappings
    • If $\dim(\mathrm{Im}(\Phi)) < \dim(V)$, then $\ker(\Phi)$ is non-trivial, i.e., the kernel contains more than just $0_V$ and $\dim(\ker(\Phi)) \geqslant 1$
    • If $A_{\Phi}$ is the transformation matrix of $\Phi$ with respect to an ordered basis and $\dim(\mathrm{Im}(\Phi)) < \dim(V)$, then the system of linear equations $A_{\Phi}x = 0$ has infinitely many solutions
    • If $\dim(V) = \dim(W)$, then $\Phi$ is injective if and only if it is surjective if and only if it is bijective, since $\mathrm{Im}(\Phi) \subseteq W$
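The theorem can be checked numerically for the matrix from the kernel example above: `np.linalg.matrix_rank` gives $\dim(\mathrm{Im}(\Phi))$ and `scipy.linalg.null_space` gives $\dim(\ker(\Phi))$:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0,  0.0,  1.0],
              [0.0, 1.0, -0.5, -0.5]])

dim_V   = A.shape[1]                 # dimension of the domain
dim_im  = np.linalg.matrix_rank(A)   # dim(Im(Phi)) = rk(A)
dim_ker = null_space(A).shape[1]     # dim(ker(Phi))

assert dim_ker + dim_im == dim_V     # rank-nullity: 2 + 2 == 4
```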

2.8 Affine Spaces

Affine spaces and mappings resemble vector spaces and linear mappings, but are offset from the origin.

Affine Subspace

  • Let $V$ be a vector space, $x_0 \in V$, and $U \subseteq V$ a subspace. Then the subset $$L = x_0 + U := \{x_0 + u : u \in U\} = \{v \in V \mid \exists\, u \in U : v = x_0 + u\} \subseteq V$$ is called an affine subspace or linear manifold of $V$. $U$ is called the direction or direction space, and $x_0$ is called the support point.
  • If $x_0 \notin U$, an affine subspace is not a linear subspace (vector subspace) of $V$.
  • Examples are points, lines, and planes in $\R^3$ that do not (necessarily) go through the origin.

Affine Mappings

  • For two vector spaces $V, W$, a linear mapping $\Phi : V \rightarrow W$, and $a \in W$, the mapping $$\phi : V \rightarrow W, \quad x \mapsto a + \Phi(x)$$ is an affine mapping from $V$ to $W$. The vector $a$ is called the translation vector of $\phi$.
  • Affine mappings keep the geometric structure invariant. They also preserve the dimension and parallelism.
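A minimal sketch of an affine mapping $x \mapsto a + \Phi(x)$ in the plane, with an arbitrary choice of linear part (a rotation) and translation vector:

```python
import numpy as np

theta = np.pi / 2
Phi = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])   # linear part: 90-degree rotation
a = np.array([1.0, 2.0])                            # translation vector

def phi(x):
    """Affine mapping x -> a + Phi(x)."""
    return a + Phi @ x

print(phi(np.array([1.0, 0.0])))   # [1. 3.]: rotate (1,0) to (0,1), then translate
```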