We want a model of for use in decision theory.
- Predictions generally map to labels for classes (for binary prediction, we used )
- Probabilities we want to map to the range
The most common choice is to use the sigmoid function:
Multi-class Probabilities
See also: multi-class classification
The softmax function allows us to map real numbers to probabilities