Lasso (L1 penalty) vs. Ridge (L2 penalty)
Ridge and Lasso are both forms of regularized linear regression. The regularization can also be interpreted as a prior in maximum a posteriori (MAP) estimation: a Gaussian prior on the coefficients corresponds to Ridge, and a Laplace prior corresponds to Lasso. The two methods differ in their penalty functions. Ridge uses the L2 penalty, which is the sum of the squares of the coefficients. Lasso uses the L1 penalty, which is the sum of the absolute values of the coefficients. Ridge (L2) regression shrinks coefficients toward zero but does not set them exactly to zero, so every variable stays in the model. Lasso (L1) performs both parameter shrinkage and variable selection automatically, because it can set some coefficients exactly to zero; among a group of collinear variables it tends to keep one and zero out the others. Lasso can therefore help select a subset of the n given variables as part of fitting the regression.

Next, we look more closely at the difference between the L1 and L2 norms. In machine learning practice, you may have faced a choice between L1 and L2. Usually there are two such decisions: 1) L1-norm vs. L2-norm loss function; and 2) L1-regu...
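To make the shrinkage-versus-selection contrast concrete, here is a minimal sketch in plain Python. It assumes the simplified case of orthonormal predictors, where each coefficient can be solved for independently from its unpenalized least-squares estimate z. The function names, the example estimates, and the penalty strength `lam` are illustrative assumptions, not taken from the text above.

```python
def ridge_shrink(z, lam):
    """Minimizer of 0.5*(b - z)**2 + 0.5*lam*b**2: proportional shrinkage."""
    return z / (1.0 + lam)


def lasso_shrink(z, lam):
    """Minimizer of 0.5*(b - z)**2 + lam*abs(b): soft-thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # estimates with |z| <= lam are set exactly to zero


# Hypothetical least-squares estimates and penalty strength (made-up values).
ols_coefs = [3.0, -1.5, 0.4, -0.2]
lam = 0.5

ridge_coefs = [ridge_shrink(z, lam) for z in ols_coefs]
lasso_coefs = [lasso_shrink(z, lam) for z in ols_coefs]

print("ridge:", ridge_coefs)
print("lasso:", lasso_coefs)
```

In this toy setting, Ridge rescales every estimate by the same factor and leaves all of them nonzero, while Lasso subtracts a fixed amount from each and removes the two small ones entirely. With correlated predictors the closed forms above no longer apply, but the qualitative behavior is the same.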