Lasso (L1 penalty) vs Ridge (L2 penalty)
Ridge and Lasso are forms of regularized linear regression. The regularization can also be interpreted as a prior in maximum a posteriori (MAP) estimation: the L2 penalty corresponds to a Gaussian prior on the coefficients and the L1 penalty to a Laplace prior. Ridge and Lasso regression use two different penalty functions. Ridge uses the L2 penalty, which is the sum of the squares of the coefficients (the squared L2 norm). Lasso uses the L1 norm, which is the sum of the absolute values of the coefficients.
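To make the two penalties concrete, here is a minimal pure-Python sketch. The function names `l1_penalty`, `l2_penalty` and `regularized_loss` are hypothetical, chosen for illustration, and the sketch assumes the plain objective "residual sum of squares + λ × penalty"; libraries may scale these terms differently.

```python
def l1_penalty(w):
    """Lasso penalty: sum of absolute values of the coefficients (L1 norm)."""
    return sum(abs(wi) for wi in w)


def l2_penalty(w):
    """Ridge penalty: sum of squared coefficients (squared L2 norm)."""
    return sum(wi * wi for wi in w)


def regularized_loss(X, y, w, lam, penalty):
    """Residual sum of squares plus lam * penalty(w).

    X is a list of rows, y a list of targets, w a list of coefficients.
    """
    rss = 0.0
    for row, target in zip(X, y):
        prediction = sum(xj * wj for xj, wj in zip(row, w))
        rss += (target - prediction) ** 2
    return rss + lam * penalty(w)


print(l1_penalty([3.0, -4.0]))  # 7.0
print(l2_penalty([3.0, -4.0]))  # 25.0

# A perfect fit (RSS = 0), so only the penalty term remains.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
w = [1.0, 2.0]
print(regularized_loss(X, y, w, 0.5, l2_penalty))  # 2.5 (ridge)
print(regularized_loss(X, y, w, 0.5, l1_penalty))  # 1.5 (lasso)
```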
Ridge (L2) regression shrinks coefficients toward zero but does not set them exactly to zero, so it keeps all of the variables in the model rather than selecting among them. Lasso (L1) performs both parameter shrinkage and variable selection automatically: it can set some coefficients exactly to zero (for a group of highly correlated variables, it tends to keep one and zero out the others), which means it selects a subset of the n given variables while fitting the model.
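The difference in variable selection is easiest to see in the special case of an orthonormal design (X^T X = I), which is an assumption made here purely for illustration. In that case both problems have closed-form solutions in terms of the ordinary least-squares coefficients a = X^T y: ridge (objective ||y - Xw||² + λ||w||²) scales every coefficient by 1/(1 + λ), while lasso (objective ||y - Xw||² + λ||w||₁) soft-thresholds each coefficient at λ/2. The helper names below are hypothetical.

```python
import math


def ridge_orthonormal(ols_coefs, lam):
    """Ridge solution when X^T X = I: every coefficient shrinks by the same factor."""
    return [a / (1.0 + lam) for a in ols_coefs]


def lasso_orthonormal(ols_coefs, lam):
    """Lasso solution when X^T X = I: soft-thresholding at lam / 2.

    Coefficients whose magnitude is at most lam / 2 become exactly 0.
    """
    threshold = lam / 2.0
    result = []
    for a in ols_coefs:
        if abs(a) <= threshold:
            result.append(0.0)
        else:
            # Move the coefficient toward 0 by `threshold`, keeping its sign.
            result.append(a - math.copysign(threshold, a))
    return result


ols = [3.0, 0.4, -1.0]
print(ridge_orthonormal(ols, lam=1.0))  # [1.5, 0.2, -0.5]
print(lasso_orthonormal(ols, lam=1.0))  # [2.5, 0.0, -0.5]
```

Ridge only produces an exact zero here if the least-squares coefficient was already zero, whereas lasso drops the small coefficient 0.4 entirely. With a general, non-orthogonal X there is no such simple formula, and lasso is usually fitted iteratively (for example by coordinate descent).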
We will now continue with the difference between the L1 and L2 norms. While practicing machine learning, you may have come across a choice between L1 and L2. Usually the two decisions are: 1) L1-norm vs L2-norm loss function, and 2) L1-regularization vs L2-regularization.
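For the first decision, a small illustration: when fitting a single constant c to some data, the L2-norm loss (sum of squared residuals) is minimized by the mean, and the L1-norm loss (sum of absolute residuals) by the median, so the L2 loss is pulled much harder by an outlier. The data and the brute-force grid search below are made up for illustration only.

```python
import statistics


def l1_loss(c, data):
    """Sum of absolute residuals when predicting the constant c."""
    return sum(abs(x - c) for x in data)


def l2_loss(c, data):
    """Sum of squared residuals when predicting the constant c."""
    return sum((x - c) ** 2 for x in data)


data = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier

# Brute-force search over candidate constants 0.0, 0.1, ..., 100.0.
candidates = [i / 10 for i in range(1001)]
best_l1 = min(candidates, key=lambda c: l1_loss(c, data))
best_l2 = min(candidates, key=lambda c: l2_loss(c, data))

print(best_l1, statistics.median(data))  # 3.0 3.0
print(best_l2, statistics.mean(data))    # 22.0 22.0
```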
L1-regularization moves any weight towards 0 with the same step size, regardless of the weight's value, because the gradient of the penalty λ|w| is λ·sign(w), whose magnitude does not depend on w. In contrast, the gradient of the L2 penalty λw² is 2λw, which decreases linearly towards 0 as the weight goes towards 0. Therefore, L2-regularization also moves any weight towards 0, but it takes smaller and smaller steps as the weight approaches 0.
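The step-size behaviour can be simulated by applying only the penalty's gradient to a single weight, ignoring the data loss, with a learning rate `lr` and strength `lam` (hypothetical values below). For L1, this sketch uses the common clipped update that stops a weight at exactly 0 instead of stepping past it, since a plain subgradient step would oscillate around 0; that clipping is an assumption of the sketch.

```python
import math


def l1_step(w, lr, lam):
    """One step on lam * |w|: constant step size lr * lam, clipped at 0."""
    step = lr * lam
    if abs(w) <= step:
        return 0.0  # the step would overshoot 0, so stop exactly at 0
    return w - math.copysign(step, w)


def l2_step(w, lr, lam):
    """One gradient step on lam * w**2, whose gradient is 2 * lam * w."""
    return w - lr * 2.0 * lam * w


def trajectory(step_fn, w0, lr, lam, n_steps):
    """Return the weight after each of n_steps updates, starting from w0."""
    weights = [w0]
    for _ in range(n_steps):
        weights.append(step_fn(weights[-1], lr, lam))
    return weights


l1_path = trajectory(l1_step, 1.0, lr=0.1, lam=1.0, n_steps=15)
l2_path = trajectory(l2_step, 1.0, lr=0.1, lam=1.0, n_steps=15)

# Size of each step taken.
l1_sizes = [a - b for a, b in zip(l1_path, l1_path[1:])]
l2_sizes = [a - b for a, b in zip(l2_path, l2_path[1:])]
print([round(s, 3) for s in l1_sizes[:5]])  # [0.1, 0.1, 0.1, 0.1, 0.1]
print([round(s, 3) for s in l2_sizes[:5]])  # [0.2, 0.16, 0.128, 0.102, 0.082]
print(l1_path[-1], l2_path[-1] > 0)  # 0.0 True
```

With these settings each L2 update multiplies the weight by 1 - 2·lr·lam = 0.8, so the steps shrink geometrically and the weight never reaches exactly 0, while the L1 weight hits 0 after about ten equal-sized steps.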
For more explanation of L1 and L2, see:
http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/
https://stats.stackexchange.com/questions/45643/why-l1-norm-for-sparse-models