Useful Identities for Computing Gradients

Here $\operatorname{tr}(\cdot)$ denotes the trace, $\det(\cdot)$ the determinant, and $f(X)^{-1}$ the inverse of $f(X)$, which is assumed to exist.

$$\frac{\partial}{\partial X} f(X)^T = \left(\frac{\partial f(X)}{\partial X}\right)^T$$

$$\frac{\partial}{\partial X} \operatorname{tr}(f(X)) = \operatorname{tr}\left(\frac{\partial f(X)}{\partial X}\right)$$

$$\frac{\partial}{\partial X} \det(f(X)) = \det(f(X)) \operatorname{tr}\left(f(X)^{-1} \frac{\partial f(X)}{\partial X}\right)$$

$$\frac{\partial}{\partial X} f(X)^{-1} = -f(X)^{-1} \frac{\partial f(X)}{\partial X} f(X)^{-1}$$

$$\frac{\partial a^T X^{-1} b}{\partial X} = -(X^{-1})^T a b^T (X^{-1})^T$$

$$\frac{\partial x^T a}{\partial x} = a^T$$

$$\frac{\partial a^T x}{\partial x} = a^T$$

$$\frac{\partial a^T X b}{\partial X} = a b^T$$

$$\frac{\partial x^T B x}{\partial x} = x^T (B + B^T)$$

$$\frac{\partial}{\partial s} (x - As)^T W (x - As) = -2 (x - As)^T W A \quad \text{for symmetric } W$$
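Identities like these are easy to get wrong by a sign or a transpose, so a numerical spot check can help. The following is a minimal sketch, assuming NumPy is available. It compares four of the scalar-valued identities above against central finite differences on random inputs. The helper `numerical_gradient`, the variable names, the dimensions, and the tolerance are all illustrative choices rather than part of the original notes. The gradient array is laid out so that entry `[i, j]` holds the partial derivative with respect to `X[i, j]`, which is why the gradient of $a^T X b$ is compared against the outer product $a b^T$.

```python
import numpy as np


def numerical_gradient(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function f at X.

    Hypothetical helper: returns an array shaped like X whose entry at idx
    approximates the partial derivative of f with respect to X[idx].
    """
    X = np.asarray(X, dtype=float)
    grad = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
n, m = 4, 6
a, b, x = rng.normal(size=(3, n))
B = rng.normal(size=(n, n))
# A small perturbation of the identity keeps X safely invertible.
X = np.eye(n) + 0.1 * rng.normal(size=(n, n))

# d(a^T X b)/dX = a b^T (an outer product with the same shape as X)
check_linear = np.allclose(
    numerical_gradient(lambda M: a @ M @ b, X),
    np.outer(a, b),
    atol=1e-5,
)

# d(a^T X^{-1} b)/dX = -(X^{-1})^T a b^T (X^{-1})^T
X_inv_T = np.linalg.inv(X).T
check_inverse = np.allclose(
    numerical_gradient(lambda M: a @ np.linalg.inv(M) @ b, X),
    -X_inv_T @ np.outer(a, b) @ X_inv_T,
    atol=1e-5,
)

# d(x^T B x)/dx = x^T (B + B^T)
check_quadratic = np.allclose(
    numerical_gradient(lambda v: v @ B @ v, x),
    x @ (B + B.T),
    atol=1e-5,
)

# d/ds (x - A s)^T W (x - A s) = -2 (x - A s)^T W A for symmetric W.
# x_obs plays the role of x in this identity (it has length m, not n).
A = rng.normal(size=(m, n))
x_obs = rng.normal(size=m)
W_raw = rng.normal(size=(m, m))
W = W_raw + W_raw.T  # symmetrize, as the identity requires
s = rng.normal(size=n)
check_least_squares = np.allclose(
    numerical_gradient(lambda t: (x_obs - A @ t) @ W @ (x_obs - A @ t), s),
    -2 * (x_obs - A @ s) @ W @ A,
    atol=1e-5,
)

print(check_linear, check_inverse, check_quadratic, check_least_squares)
```

Replacing `W` with the unsymmetrized `W_raw` in both the function and the closed form should make the last check fail, which shows why that identity is stated only for symmetric $W$.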