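MAP (maximum a posteriori) estimation maximizes the posterior probability of the parameters $w$ given the data $X, y$:

$$\hat{w}_{\text{MAP}} = \arg\max_w \, p(w \mid X, y)$$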

Given our data, which model is the most probable?

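This is connected to MLE through Bayes' rule:

$$p(w \mid X, y) = \frac{p(y \mid X, w)\, p(w)}{p(y \mid X)} \propto p(y \mid X, w)\, p(w)$$

The denominator $p(y \mid X)$ does not depend on $w$, so it does not affect the maximizer. The likelihood $p(y \mid X, w)$ is exactly what MLE maximizes; MAP additionally multiplies it by the prior $p(w)$.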
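To make the MLE/MAP relationship concrete, here is a minimal grid-search sketch on a hypothetical 1-D problem. It assumes a unit-variance Gaussian likelihood $y_i \sim \mathcal{N}(w x_i, 1)$ and a Gaussian prior $w \sim \mathcal{N}(0, 1/\lambda)$; the data, `lam`, and the grid are illustrative choices, not part of the notes. It shows that normalizing by $p(y \mid X)$ leaves the argmax unchanged and that the prior pulls the MAP estimate toward 0.

```python
import numpy as np

# Hypothetical toy 1-D data: y_i = 2 * x_i exactly, so the MLE is w = 2.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
lam = 1.0  # assumed prior precision: w ~ N(0, 1/lam)

grid = np.linspace(-5.0, 5.0, 10001)  # candidate values of w, spacing 0.001

# Log-likelihood under y_i ~ N(w * x_i, 1), dropping constants.
log_lik = -0.5 * ((grid[:, None] * x - y) ** 2).sum(axis=1)
# Log-prior under w ~ N(0, 1/lam), dropping constants.
log_prior = -0.5 * lam * grid**2

# Bayes' rule: the posterior is proportional to likelihood * prior.
# Normalizing (approximating p(y | X) by a sum over the grid) rescales
# every entry by the same factor, so the argmax cannot move.
log_post = log_lik + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum() * (grid[1] - grid[0])

w_mle = grid[np.argmax(log_lik)]
w_map = grid[np.argmax(post)]
print(w_mle, w_map)  # the MAP estimate is pulled toward the prior mean 0
```

For this model the MAP estimate also has the closed form $\sum_i x_i y_i / (\sum_i x_i^2 + \lambda) = 28/15$, which the grid recovers to within its spacing.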

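Intuitively, the prior $p(w)$ accounts for how 'likely' this model is before we see any data. We can also treat it as a regularizer: taking the negative logarithm turns the product into a sum, so

$$\hat{w}_{\text{MAP}} = \arg\max_w \, p(y \mid X, w)\, p(w) = \arg\min_w \big[ -\log p(y \mid X, w) - \log p(w) \big]$$

Here the negative log-likelihood plays the role of the data-fitting loss and the negative log-prior $-\log p(w)$ acts as the regularization term. In fact, many common regularizers are equivalent to negative log-priors.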
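As a quick numerical check of the regularizer/prior correspondence, the sketch below assumes independent Gaussian priors $w_j \sim \mathcal{N}(0, 1/\lambda)$ (an illustrative choice; the helper names are hypothetical). It compares $-\log p(w)$ with the usual L2 penalty $\frac{\lambda}{2}\|w\|^2$ at several weight vectors.

```python
import numpy as np

def neg_log_gaussian_prior(w, lam):
    # -log p(w) for independent w_j ~ N(0, 1/lam), constants included.
    var = 1.0 / lam
    return 0.5 * np.sum(w**2) / var + 0.5 * w.size * np.log(2 * np.pi * var)

def l2_penalty(w, lam):
    # The familiar L2 regularizer (lam / 2) * ||w||^2.
    return 0.5 * lam * np.sum(w**2)

lam = 2.0
candidates = [np.zeros(3), np.array([1.0, -2.0, 0.5]), np.array([3.0, 3.0, 3.0])]
gaps = [neg_log_gaussian_prior(w, lam) - l2_penalty(w, lam) for w in candidates]
print(gaps)  # all entries equal: the two differ only by a constant
```

The gap is the same constant $\frac{d}{2}\log(2\pi/\lambda)$ for every $w$, and an additive constant cannot change the minimizer, so the two objectives select the same $w$.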

Relation to regularized loss functions

L2-Regularized Least Squares

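If we assume a Gaussian likelihood $y_i \mid x_i, w \sim \mathcal{N}(w^\top x_i, 1)$ and a Gaussian prior $w_j \sim \mathcal{N}(0, 1/\lambda)$, then MAP estimation is equivalent to minimizing

$$f(w) = \frac{1}{2}\|Xw - y\|^2 + \frac{\lambda}{2}\|w\|^2$$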
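Because this objective is a quadratic, its minimizer has a closed form: setting the gradient $X^\top(Xw - y) + \lambda w$ to zero gives $(X^\top X + \lambda I)\,w = X^\top y$. A minimal sketch, assuming unit noise variance, prior variance $1/\lambda$, and synthetic data (the function names are hypothetical):

```python
import numpy as np

def map_gaussian_gaussian(X, y, lam):
    # MAP under y_i ~ N(w^T x_i, 1) and w_j ~ N(0, 1/lam), i.e. the minimizer
    # of 0.5*||Xw - y||^2 + 0.5*lam*||w||^2 (L2-regularized least squares).
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def neg_log_posterior(w, X, y, lam):
    # Negative log-posterior up to additive constants.
    r = X @ w - y
    return 0.5 * r @ r + 0.5 * lam * w @ w

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)
lam = 5.0

w_map = map_gaussian_gaussian(X, y, lam)
grad = X.T @ (X @ w_map - y) + lam * w_map  # should be numerically zero
print(w_map, np.linalg.norm(grad))
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse. Compared with the unregularized least-squares fit, the prior shrinks $\|w\|$ toward 0.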

L2-Regularized Robust Regression

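If we assume a Laplace likelihood $p(y_i \mid x_i, w) = \frac{1}{2}\exp(-|w^\top x_i - y_i|)$ and a Gaussian prior $w_j \sim \mathcal{N}(0, 1/\lambda)$, then MAP estimation is equivalent to minimizing

$$f(w) = \|Xw - y\|_1 + \frac{\lambda}{2}\|w\|^2$$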
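This objective has no closed form because the absolute value is non-differentiable at 0. One standard way to minimize it (a swapped-in technique, not necessarily the method intended here) is iteratively reweighted least squares (IRLS): using the bound $|r| \le \frac{r^2}{2c} + \frac{c}{2}$, which is tight at $c = |r|$, each step solves a weighted version of the L2-regularized least-squares problem. The sketch below assumes a small floor `eps` on the weights to avoid division by zero, and uses hypothetical data with two large outliers to contrast the Laplace and Gaussian likelihoods. Note that it penalizes the intercept too, as the formula above does.

```python
import numpy as np

def robust_ridge_irls(X, y, lam, n_iter=200, eps=1e-6):
    """Approximately minimize ||Xw - y||_1 + (lam/2)||w||^2 via IRLS.

    Each step minimizes sum_i r_i^2 / (2 c_i) + (lam/2)||w||^2 with
    c_i = max(|r_i|, eps) taken from the previous iterate.
    """
    d = X.shape[1]
    # Start from the L2-regularized least-squares (Gaussian-likelihood) fit.
    w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    for _ in range(n_iter):
        r = X @ w - y
        weights = 1.0 / np.maximum(np.abs(r), eps)
        XtD = X.T * weights  # equals X.T @ diag(weights)
        w = np.linalg.solve(XtD @ X + lam * np.eye(d), XtD @ y)
    return w

def robust_objective(w, X, y, lam):
    # The L2-regularized robust regression objective f(w).
    return np.abs(X @ w - y).sum() + 0.5 * lam * w @ w

# Hypothetical data: points on y = 1 + 2x, with two large outliers.
x = np.arange(20.0)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x
y[[5, 15]] += 50.0
lam = 1e-3

w_l2 = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
w_l1 = robust_ridge_irls(X, y, lam)
print(w_l2)  # intercept dragged upward by the outliers
print(w_l1)  # close to the uncorrupted line [1, 2]
```

The Gaussian-likelihood fit pays a squared cost on the outliers, so they dominate its solution; the Laplace likelihood pays only a linear cost, so most points stay on the fitted line.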