We want a model of the class probabilities $p(C_k \mid \mathbf{x})$ for use in decision theory.

  • Predictions generally map to class labels (for binary prediction, two labels, for example $\{0, 1\}$)
  • Probabilities, by contrast, must map to the range $[0, 1]$

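The most common choice is the sigmoid (logistic) function, which squashes any real number $a$ into the interval $(0, 1)$:

$$\sigma(a) = \frac{1}{1 + e^{-a}}$$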
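
As an illustration, the sigmoid can be sketched in a few lines of Python. This is a minimal sketch using only the standard library; the function name `sigmoid` and the example scores are assumptions chosen for illustration, not part of the original notes. It branches on the sign of the input so that the exponential is never evaluated at a large positive argument.

```python
import math


def sigmoid(a: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-a)), mapping a real score into [0, 1]."""
    # Branch on the sign so math.exp is only called with a non-positive
    # argument, which avoids OverflowError for large |a|.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    z = math.exp(a)
    return z / (1.0 + z)


# Hypothetical example scores: negative scores give probabilities below 0.5,
# a score of zero gives exactly 0.5, positive scores give values above 0.5.
for score in (-5.0, 0.0, 5.0):
    print(score, sigmoid(score))
```

The output can be read as the modelled probability of the positive class; a decision rule would then, for instance, threshold it at 0.5.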

Multi-class Probabilities

See also: multi-class classification

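The softmax function allows us to map a vector of real numbers $a_1, \dots, a_K$ to probabilities that are positive and sum to one:

$$\mathrm{softmax}(\mathbf{a})_k = \frac{e^{a_k}}{\sum_{j=1}^{K} e^{a_j}}$$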
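
The softmax mapping can likewise be sketched in plain Python. This is a minimal sketch, assuming the scores arrive as a non-empty list of floats; the function name `softmax` and the example scores are hypothetical. Subtracting the maximum score before exponentiating is a common stability trick and does not change the result, because the shift cancels between numerator and denominator.

```python
import math


def softmax(scores):
    """Map a non-empty list of real scores to probabilities that sum to one."""
    # Shift by the maximum so every exponent is <= 0 and exp cannot overflow.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes; the largest score receives
# the largest probability.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

With two classes and scores `[a, 0]`, the first output equals the sigmoid of `a`, which is why softmax is usually described as the multi-class counterpart of the sigmoid.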