6 September 2024
Visualizing softmax
The softmax function
The softmax function is defined as
It maps an -component vector to a vector with positive components and unit norm — all components positive, summing to one. Any collection of numbers becomes a probability distribution.
This property is central to classification neural networks, where softmax typically sits in the output layer paired with one-hot encoded labels.
The same formula has an older life in statistical mechanics: it is the Boltzmann–Gibbs distribution, giving the probability that a thermodynamic system occupies a state given the energies of all accessible states. What machine learning calls the “temperature” of a softmax is not a metaphor — it is literally the same parameter that physics puts in the denominator of the exponent.
Interactive visualization
Drag the temperature and watch a random dataset reshape into sharper or flatter distributions: