Projection (linear algebra)
In linear algebra and functional analysis, a projection is a linear transformation P from a vector space to itself such that P 2 = P. That is, whenever P is applied twice to any value, it gives the same result as if it were applied once (idempotent). It leaves its image unchanged.[1] Though abstract, this definition of "projection" formalizes and generalizes the idea of graphical projection. One can also consider the effect of a projection on a geometrical object by examining the effect of the projection on points in the object.
Contents
1 Simple example
1.1 Orthogonal projection
1.1.1 Finding projection with an inner product
1.2 Oblique projection
2 Properties and classification
2.1 Orthogonal projections
2.1.1 Properties and special cases
2.1.1.1 Formulas
2.2 Oblique projections
3 Canonical forms
4 Projections on normed vector spaces
5 Applications and further considerations
6 Generalizations
7 See also
8 Notes
9 References
10 External links
Simple example
Orthogonal projection
For example, the function which maps the point (x, y, z) in three-dimensional space R3 to the point (x, y, 0) is an orthogonal projection onto the x–y plane. This function is represented by the matrix
- P=[100010000].displaystyle P=beginbmatrix1&0&0\0&1&0\0&0&0endbmatrix.
The action of this matrix on an arbitrary vector is
- P(xyz)=(xy0).displaystyle Pbeginpmatrixx\y\zendpmatrix=beginpmatrixx\y\0endpmatrix.
To see that P is indeed a projection, i.e., P = P2, we compute
- P2(xyz)=P(xy0)=(xy0)=P(xyz).displaystyle P^2beginpmatrixx\y\zendpmatrix=Pbeginpmatrixx\y\0endpmatrix=beginpmatrixx\y\0endpmatrix=Pbeginpmatrixx\y\zendpmatrix.
Finding projection with an inner product
Let V be a vector space (in this case a plane) spanned by orthogonal vectors u1,u2,⋯,updisplaystyle u_1,u_2,cdots ,u_p. Let y be a vector. One can define a projection of y onto V as
- projVy=y⋅uiuj⋅ujuidisplaystyle operatorname proj _Vy=frac ycdot u^iu^jcdot u^ju^i
where the i 's imply Einstein sum notation. y can be written as an orthogonal sum such that y=projVydisplaystyle y=operatorname proj _Vy + z. projVydisplaystyle operatorname proj _Vy is sometimes denoted as y^displaystyle hat y. There is a theorem in Linear Algebra that states that this z is the shortest distance from y to V and is commonly used in areas such as machine learning.
Oblique projection
A simple example of a non-orthogonal (oblique) projection (for definition see below) is
- P=[00α1].displaystyle P=beginbmatrix0&0\alpha &1endbmatrix.
Via matrix multiplication, one sees that
- P2=[00α1][00α1]=[00α1]=P.displaystyle P^2=beginbmatrix0&0\alpha &1endbmatrixbeginbmatrix0&0\alpha &1endbmatrix=beginbmatrix0&0\alpha &1endbmatrix=P.
proving that P is indeed a projection.
The projection P is orthogonal if and only if α = 0.
Properties and classification
Let W be a finite dimensional vector space and P be a projection on W. Suppose the subspaces U and V are the range and kernel of P respectively.
Then P has the following properties:
- By definition, Pdisplaystyle P is idempotent (i.e. P2=Pdisplaystyle P^2=P).
Pdisplaystyle P is the identity operator Idisplaystyle I on Udisplaystyle U
∀x∈U:Px=xdisplaystyle forall xin U:Px=x.
- We have a direct sum W=U⊕Vdisplaystyle W=Uoplus V. Every vector x∈Wdisplaystyle xin W may be decomposed uniquely as x=u+vdisplaystyle x=u+v with u=Pxdisplaystyle u=Px and v=x−Px=(I−P)xdisplaystyle v=x-Px=(I-P)x, and where u∈U,v∈Vdisplaystyle uin U,vin V.
The range and kernel of a projection are complementary, as are Pdisplaystyle P and Q=I−Pdisplaystyle Q=I-P. The operator Qdisplaystyle Q is also a projection as the range and kernel of Pdisplaystyle P become the kernel and range of Qdisplaystyle Q and vice versa. We say Pdisplaystyle P is a projection along V onto U (kernel/range) and Qdisplaystyle Q is a projection along U onto V.
In infinite dimensional vector spaces, the
spectrum of a projection is contained in 0, 1 as
(λI−P)−1=1λI+1λ(λ−1)Pdisplaystyle (lambda I-P)^-1=frac 1lambda I+frac 1lambda (lambda -1)P.
Only 0 or 1 can be an eigenvalue of a projection, implying that Pdisplaystyle P is always a positive semi-definite matrix. The corresponding eigenspaces are (respectively) the kernel and range of the projection. Decomposition of a vector space into direct sums is not unique in general. Therefore, given a subspace Vdisplaystyle V, there may be many projections whose range (or kernel) is Vdisplaystyle V.
If a projection is nontrivial it has minimal polynomial x2−x=x(x−1)displaystyle x^2-x=x(x-1), which factors into distinct roots, and thus Pdisplaystyle P is diagonalizable.
The product of projections is not, in general, a projection, even if they are orthogonal. If projections commute, then their product is a projection.
Orthogonal projections
When the vector space W has an inner product and is complete (is a Hilbert space) the concept of orthogonality can be used. An orthogonal projection is a projection for which the range U and the null space V are orthogonal subspaces. Thus, for every x and y in W, ⟨Px,(y−Py)⟩=⟨(x−Px),Py⟩=0displaystyle langle Px,(y-Py)rangle =langle (x-Px),Pyrangle =0. Equivalently:
⟨x,Py⟩=⟨Px,Py⟩=⟨Px,y⟩displaystyle langle x,Pyrangle =langle Px,Pyrangle =langle Px,yrangle .
A projection is orthogonal if and only if it is self-adjoint. Using the self-adjoint and idempotent properties of P, for any x and y in W we have Px ∈ U, y − Py ∈ V, and
- ⟨Px,y−Py⟩=⟨P2x,y−Py⟩=⟨Px,P(I−P)y⟩=⟨Px,(P−P2)y⟩=0displaystyle langle Px,y-Pyrangle =langle P^2x,y-Pyrangle =langle Px,P(I-P)yrangle =langle Px,(P-P^2)yrangle =0,
where ⟨⋅,⋅⟩displaystyle langle cdot ,cdot rangle is the inner product associated with W. Therefore, Px and y − Py are orthogonal.[2]
The other direction, namely that if P is orthogonal then it is self-adjoint, follows from
- ⟨x,Py⟩=⟨Px,y⟩=⟨x,P∗y⟩displaystyle langle x,Pyrangle =langle Px,yrangle =langle x,P^*yrangle
for every x and y in W; thus P = P*.
Proof of existence Let H be a complete metric space with an inner product, and let U be a closed linear subspace of H (and hence complete as well).
For every x the following set of non-negative norms ‖x−u‖displaystyle has an infimum, and due to the completeness of U it is a minimum. We define Px as the point in U were this minimum is obtained.
Obviously Px is in U. It remains to show that Px satisfies <x − Px, Px> = 0 and that it is linear.
Let us define a=x−Pxdisplaystyle a=x-Px. For every non-zero v in U, the following holds:
- ‖a−⟨a,v⟩‖v‖2v‖2=‖a‖2−⟨a,v⟩2‖v‖2^2=
By defining w=Px+⟨a,v⟩‖v‖2vdisplaystyle w=Px+frac langle a,vrangle v we see that ‖x−w‖<‖x−Px‖x-Px unless ⟨a,v⟩displaystyle langle a,vrangle vanishes. Since Px was chosen as the minimum of the abovementioned set, it follows that ⟨a,v⟩displaystyle langle a,vrangle indeed vanishes. In particular, (for v = Px): ⟨x−Px,Px⟩=0displaystyle langle x-Px,Pxrangle =0.
Linearity follows from the vanishing of ⟨x−Px,v⟩displaystyle langle x-Px,vrangle for every v in U:
- ⟨(x+y)−P(x+y),v⟩=0displaystyle langle left(x+yright)-Pleft(x+yright),vrangle =0
- ⟨(x−Px)+(y−Py),v⟩=0displaystyle langle left(x-Pxright)+left(y-Pyright),vrangle =0
By taking the difference between the equations we have
- ⟨Px+Py−P(x+y),v⟩=0displaystyle langle Px+Py-Pleft(x+yright),vrangle =0
But since we may choose v = Px + Py − P(x + y) (as it is itself in U) it follows that Px + Py = P(x + y). Similarly we have λPx = P(λx) for every scalar λ.
Properties and special cases
An orthogonal projection is a bounded operator. This is because for every v in the vector space we have, by Cauchy–Schwarz inequality:
- ‖Pv‖2=⟨Pv,Pv⟩=⟨Pv,v⟩⩽‖Pv‖⋅‖v‖Pv
Thus ‖Pv‖⩽‖v‖.
For finite dimensional complex or real vector spaces, the standard inner product can be substituted for ⟨⋅,⋅⟩displaystyle langle cdot ,cdot rangle .
Formulas
A simple case occurs when the orthogonal projection is onto a line. If u is a unit vector on the line, then the projection is given by the outer product
- Pu=uuT.displaystyle P_u=uu^mathrm T .
(If u is complex-valued, the transpose in the above equation is replaced by a Hermitian transpose). This operator leaves u invariant, and it annihilates all vectors orthogonal to u, proving that it is indeed the orthogonal projection onto the line containing u.[3] A simple way to see this is to consider an arbitrary vector xdisplaystyle x as the sum of a component on the line (i.e. the projected vector we seek) and another perpendicular to it, x=x∥+x⊥displaystyle x=x_parallel +x_perp . Applying projection, we get
- Pux=uuTx∥+uuTx⊥=u(sign(uTx∥)‖x∥‖)+u⋅0=x∥right)+ucdot 0=x_parallel
by the properties of the dot product of parallel and perpendicular vectors.
This formula can be generalized to orthogonal projections on a subspace of arbitrary dimension. Let u1, ..., uk be an orthonormal basis of the subspace U, and let A denote the n-by-k matrix whose columns are u1, ..., uk. Then the projection is given by:[4]
- PA=AATdisplaystyle P_A=AA^mathrm T
which can be rewritten as
- PA=∑i⟨ui,⋅⟩ui.displaystyle P_A=sum _ilangle u_i,cdot rangle u_i.
The matrix AT is the partial isometry that vanishes on the orthogonal complement of U and A is the isometry that embeds U into the underlying vector space. The range of PA is therefore the final space of A. It is also clear that A·AT is the identity operator on U.
The orthonormality condition can also be dropped. If u1, ..., uk is a (not necessarily orthonormal) basis, and A is the matrix with these vectors as columns, then the projection is:[5][6]
- PA=A(ATA)−1AT.displaystyle P_A=A(A^mathrm T A)^-1A^mathrm T .
The matrix A still embeds U into the underlying vector space but is no longer an isometry in general. The matrix (ATA)−1 is a "normalizing factor" that recovers the norm. For example, the rank-1 operator uuT is not a projection if ‖u‖≠1.neq 1. After dividing by uTu=‖u‖2,u we obtain the projection u(uTu)−1uT onto the subspace spanned by u.
In the general case, we can have an arbitrary positive definite matrix D defining an inner product ⟨x,y⟩Ddisplaystyle langle x,yrangle _D, and the projection PAdisplaystyle P_A is given by PAx=argminy∈range(A)‖x−y‖D2_D^2. Then
- PA=A(ATDA)−1ATD.displaystyle P_A=A(A^mathrm T DA)^-1A^mathrm T D.
When the range space of the projection is generated by a frame (i.e. the number of generators is greater than its dimension), the formula for the projection takes the form:
PA=AA+displaystyle P_A=AA^+. Here A+displaystyle A^+ stands for the Moore–Penrose pseudoinverse. This is just one of many ways to construct the projection operator.
If [AB]displaystyle beginbmatrixA&Bendbmatrix is a non-singular matrix and ATB=0displaystyle A^mathrm T B=0 (i.e., B is the null space matrix of A),[7] the following holds:
- I=[AB][AB]−1[ATBT]−1[ATBT]=[AB]([ATBT][AB])−1[ATBT]=[AB][ATAOOBTB]−1[ATBT]=A(ATA)−1AT+B(BTB)−1BTdisplaystyle beginalignedI&=beginbmatrixA&BendbmatrixbeginbmatrixA&Bendbmatrix^-1beginbmatrixA^mathrm T \B^mathrm T endbmatrix^-1beginbmatrixA^mathrm T \B^mathrm T endbmatrix\&=beginbmatrixA&Bendbmatrixleft(beginbmatrixA^mathrm T \B^mathrm T endbmatrixbeginbmatrixA&Bendbmatrixright)^-1beginbmatrixA^mathrm T \B^mathrm T endbmatrix\&=beginbmatrixA&BendbmatrixbeginbmatrixA^mathrm T A&O\O&B^mathrm T Bendbmatrix^-1beginbmatrixA^mathrm T \B^mathrm T endbmatrix\[4pt]&=A(A^mathrm T A)^-1A^mathrm T +B(B^mathrm T B)^-1B^mathrm T endaligned
If the orthogonal condition is enhanced to ATW B = ATWTB = 0 with W non-singular, the following holds:
- I=[AB][(ATWA)−1AT(BTWB)−1BT]W.displaystyle I=beginbmatrixA&Bendbmatrixbeginbmatrix(A^mathrm T WA)^-1A^mathrm T \(B^mathrm T WB)^-1B^mathrm T endbmatrixW.
All these formulas also hold for complex inner product spaces, provided that the conjugate transpose is used instead of the transpose. Further details on sums of projectors can be found in Banerjee and Roy (2014).[8] Also see Banerjee (2004)[9] for application of sums of projectors in basic spherical trigonometry.
Oblique projections
The term oblique projections is sometimes used to refer to non-orthogonal projections. These projections are also used to represent spatial figures in two-dimensional drawings (see oblique projection), though not as frequently as orthogonal projections. Whereas calculating the fitted value of an ordinary least squares regression requires an orthogonal projection, calculating the fitted value of an instrumental variables regression requires an oblique projection.
Projections are defined by their null space and the basis vectors used to characterize their range (which is the complement of the null space). When these basis vectors are orthogonal to the null space, then the projection is an orthogonal projection. When these basis vectors are not orthogonal to the null space, the projection is an oblique projection. Let the vectors u1, ..., uk form a basis for the range of the projection, and assemble these vectors in the n-by-k matrix A. The range and the null space are complementary spaces, so the null space has dimension n − k. It follows that the orthogonal complement of the null space has dimension k. Let v1, ..., vk form a basis for the orthogonal complement of the null space of the projection, and assemble these vectors in the matrix B. Then the projection is defined by
- P=A(BTA)−1BT.displaystyle P=A(B^mathrm T A)^-1B^mathrm T .
This expression generalizes the formula for orthogonal projections given above.[10][11]
Canonical forms
Any projection P = P2 on a vector space of dimension d over a field is a diagonalizable matrix, since its minimal polynomial divides x2 − x, which splits into distinct linear factors. Thus there exists a basis in which P has the form
- P=Ir⊕0d−rdisplaystyle P=I_roplus 0_d-r
where r is the rank of P. Here Ir is the identity matrix of size r, and 0d−r is the zero matrix of size d − r. If the vector space is complex and equipped with an inner product, then there is an orthonormal basis in which the matrix of P is[12]
P=[1σ100]⊕⋯⊕[1σk00]⊕Im⊕0sdisplaystyle P=beginbmatrix1&sigma _1\0&0endbmatrixoplus cdots oplus beginbmatrix1&sigma _k\0&0endbmatrixoplus I_moplus 0_s .
where σ1 ≥ σ2 ≥ ... ≥ σk > 0. The integers k, s, m and the real numbers σidisplaystyle sigma _i are uniquely determined. Note that 2k + s + m = d. The factor Im ⊕ 0s corresponds to the maximal invariant subspace on which P acts as an orthogonal projection (so that P itself is orthogonal if and only if k = 0) and the σi-blocks correspond to the oblique components.
Projections on normed vector spaces
When the underlying vector space Xdisplaystyle X is a (not necessarily finite-dimensional) normed vector space, analytic questions, irrelevant in the finite-dimensional case, need to be considered. Assume now Xdisplaystyle X is a Banach space.
Many of the algebraic notions discussed above survive the passage to this context. A given direct sum decomposition of Xdisplaystyle X into complementary subspaces still specifies a projection, and vice versa. If Xdisplaystyle X is the direct sum X=U⊕Vdisplaystyle X=Uoplus V, then the operator defined by P(u+v)=udisplaystyle P(u+v)=u is still a projection with range Udisplaystyle U and kernel Vdisplaystyle V. It is also clear that P2=Pdisplaystyle P^2=P. Conversely, if Pdisplaystyle P is projection on Xdisplaystyle X, i.e. P2=Pdisplaystyle P^2=P, then it is easily verified that (1−P)2=(1−P)displaystyle (1-P)^2=(1-P). In other words, 1−Pdisplaystyle 1-P is also a projection. The relation P2=Pdisplaystyle P^2=P implies 1=P+(1−P)displaystyle 1=P+(1-P) and Xdisplaystyle X is the direct sum ran(P)⊕ran(1−P)displaystyle mathrm ran (P)oplus mathrm ran (1-P).
However, in contrast to the finite-dimensional case, projections need not be continuous in general. If a subspace Udisplaystyle U of Xdisplaystyle X is not closed in the norm topology, then projection onto Udisplaystyle U is not continuous. In other words, the range of a continuous projection Pdisplaystyle P must be a closed subspace. Furthermore, the kernel of a continuous projection (in fact, a continuous linear operator in general) is closed. Thus a continuous projection Pdisplaystyle P gives a decomposition of Xdisplaystyle X into two complementary closed subspaces: X=ran(P)⊕ker(P)=ker(1−P)⊕ker(P)displaystyle X=mathrm ran (P)oplus mathrm ker (P)=mathrm ker (1-P)oplus mathrm ker (P).
The converse holds also, with an additional assumption. Suppose U is a closed subspace of X. If there exists a closed subspace V such that X = U ⊕ V, then the projection P with range U and kernel V is continuous. This follows from the closed graph theorem. Suppose xn → x and Pxn → y. One needs to show that Px = y. Since U is closed and Pxn ⊂ U, y lies in U, i.e. Py = y. Also, xn − Pxn = (I − P)xn → x − y. Because V is closed and (I − P)xn ⊂ V, we have x − y ∈ V, i.e. P(x − y) = Px − Py = Px − y = 0, which proves the claim.
The above argument makes use of the assumption that both U and V are closed. In general, given a closed subspace U, there need not exist a complementary closed subspace V, although for Hilbert spaces this can always be done by taking the orthogonal complement. For Banach spaces, a one-dimensional subspace always has a closed complementary subspace. This is an immediate consequence of Hahn–Banach theorem. Let U be the linear span of u. By Hahn–Banach, there exists a bounded linear functional φ such that φ(u) = 1. The operator P(x) = φ(x)u satisfies P2 = P, i.e. it is a projection. Boundedness of φ implies continuity of P and therefore ker(P) = ran(I − P) is a closed complementary subspace of U.
Applications and further considerations
Projections (orthogonal and otherwise) play a major role in algorithms for certain linear algebra problems:
QR decomposition (see Householder transformation and Gram–Schmidt decomposition);- Singular value decomposition
- Reduction to Hessenberg form (the first step in many eigenvalue algorithms)
- Linear regression
- Projective elements of matrix algebras are used in the construction of certain K-groups in Operator K-theory
As stated above, projections are a special case of idempotents. Analytically, orthogonal projections are non-commutative generalizations of characteristic functions. Idempotents are used in classifying, for instance, semisimple algebras, while measure theory begins with considering characteristic functions of measurable sets. Therefore, as one can imagine, projections are very often encountered in the context operator algebras. In particular, a von Neumann algebra is generated by its complete lattice of projections.
Generalizations
More generally, given a map between normed vector spaces T:V→W,displaystyle Tcolon Vto W, one can analogously ask for this map to be an isometry on the orthogonal complement of the kernel: that (kerT)⊥→Wdisplaystyle (ker T)^perp to W be an isometry (compare Partial isometry); in particular it must be onto. The case of an orthogonal projection is when W is a subspace of V. In Riemannian geometry, this is used in the definition of a Riemannian submersion.
See also
Centering matrix, which is an example of a projection matrix.- Orthogonalization
- Invariant subspace
- Properties of trace
Dykstra's projection algorithm to compute the projection onto an intersection of sets
Notes
^ Meyer, pp 386+387
^ Meyer, p. 433
^ Meyer, p. 431
^ Meyer, equation (5.13.4)
^ Banerjee, Sudipto; Roy, Anindya (2014), Linear Algebra and Matrix Analysis for Statistics, Texts in Statistical Science (1st ed.), Chapman and Hall/CRC, ISBN 978-1420095388.mw-parser-output cite.citationfont-style:inherit.mw-parser-output qquotes:"""""""'""'".mw-parser-output code.cs1-codecolor:inherit;background:inherit;border:inherit;padding:inherit.mw-parser-output .cs1-lock-free abackground:url("//upload.wikimedia.org/wikipedia/commons/thumb/6/65/Lock-green.svg/9px-Lock-green.svg.png")no-repeat;background-position:right .1em center.mw-parser-output .cs1-lock-limited a,.mw-parser-output .cs1-lock-registration abackground:url("//upload.wikimedia.org/wikipedia/commons/thumb/d/d6/Lock-gray-alt-2.svg/9px-Lock-gray-alt-2.svg.png")no-repeat;background-position:right .1em center.mw-parser-output .cs1-lock-subscription abackground:url("//upload.wikimedia.org/wikipedia/commons/thumb/a/aa/Lock-red-alt-2.svg/9px-Lock-red-alt-2.svg.png")no-repeat;background-position:right .1em center.mw-parser-output .cs1-subscription,.mw-parser-output .cs1-registrationcolor:#555.mw-parser-output .cs1-subscription span,.mw-parser-output .cs1-registration spanborder-bottom:1px dotted;cursor:help.mw-parser-output .cs1-hidden-errordisplay:none;font-size:100%.mw-parser-output .cs1-visible-errorfont-size:100%.mw-parser-output .cs1-subscription,.mw-parser-output .cs1-registration,.mw-parser-output .cs1-formatfont-size:95%.mw-parser-output .cs1-kern-left,.mw-parser-output .cs1-kern-wl-leftpadding-left:0.2em.mw-parser-output .cs1-kern-right,.mw-parser-output .cs1-kern-wl-rightpadding-right:0.2em
^ Meyer, equation (5.13.3)
^ See also Linear least squares (mathematics) § Properties of the least-squares estimators.
^ Banerjee, Sudipto; Roy, Anindya (2014), Linear Algebra and Matrix Analysis for Statistics, Texts in Statistical Science (1st ed.), Chapman and Hall/CRC, ISBN 978-1420095388
^ Banerjee, Sudipto (2004), "Revisiting Spherical Trigonometry with Orthogonal Projectors", The College Mathematics Journal, 35: 375–381, doi:10.1080/07468342.2004.11922099
^ Banerjee, Sudipto; Roy, Anindya (2014), Linear Algebra and Matrix Analysis for Statistics, Texts in Statistical Science (1st ed.), Chapman and Hall/CRC, ISBN 978-1420095388
^ Meyer, equation (7.10.39)
^ Doković, D. Ž. (August 1991). "Unitary similarity of projectors". Aequationes Mathematicae. 42 (1): 220–224. doi:10.1007/BF01818492.
References
Banerjee, Sudipto; Roy, Anindya (2014), Linear Algebra and Matrix Analysis for Statistics, Texts in Statistical Science (1st ed.), Chapman and Hall/CRC, ISBN 978-1420095388
Dunford, N.; Schwartz, J. T. (1958). Linear Operators, Part I: General Theory. Interscience.
Meyer, Carl D. (2000). Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics. ISBN 978-0-89871-454-8.
External links
MIT Linear Algebra Lecture on Projection Matrices on YouTube, from MIT OpenCourseWare
Linear Algebra 15d: The Projection Transformation on YouTube, by Pavel Grinfeld.
Planar Geometric Projections Tutorial – a simple-to-follow tutorial explaining the different types of planar geometric projections.