Derivative of softmax

15 Dec 2017

Softmax is a vector function: it takes a vector $a$ and produces a vector as output, $S(a): \mathbb{R}^N \rightarrow \mathbb{R}^N$, with components

$$S_i(a) = \frac{e^{a_i}}{\sum_{k=1}^N e^{a_k}}$$
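
For concreteness, here is a minimal NumPy sketch of this definition (the function name and the sample vector are just an illustration):

```python
import numpy as np

def softmax(a):
    """Map N real values to N values in (0, 1) that sum to 1."""
    exps = np.exp(a)
    return exps / np.sum(exps)

a = np.array([1.0, 2.0, 3.0])
print(softmax(a))        # [0.09003057 0.24472847 0.66524096]
print(softmax(a).sum())  # 1.0
```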

Unlike the sigmoid or tanh, there is no single scalar derivative; you have to specify:

  • Which output component $S_i$ you want the derivative of.
  • Which input $a_j$ you are differentiating with respect to.

The derivative is in fact an $N \times N$ Jacobian matrix:

$$D S = \begin{bmatrix} D_1 S_1 & \cdots & D_N S_1 \\ \vdots & \ddots & \vdots \\ D_1 S_N & \cdots & D_N S_N \end{bmatrix}, \qquad D_j S_i = \frac{\partial S_i}{\partial a_j}$$

Using the quotient rule for derivatives:

$$D_j S_i = \frac{\partial}{\partial a_j} \left( \frac{g_i}{h_i} \right) = \frac{(D_j g_i)\, h_i - g_i\, (D_j h_i)}{h_i^2}$$

where $g_i = e^{a_i}$ and $h_i = \sum_{k=1}^N e^{a_k}$

No matter which $a_j$ we differentiate with respect to, the derivative of $h_i$ is always $e^{a_j}$:

$$D_j h_i = \frac{\partial}{\partial a_j} \sum_{k=1}^N e^{a_k} = e^{a_j}$$

The derivative of $g_i$, in contrast, is $e^{a_j}$ only when $i = j$:

$$D_j g_i = \frac{\partial e^{a_i}}{\partial a_j} = \begin{cases} e^{a_j} & i = j \\ 0 & i \neq j \end{cases}$$

Let's calculate $D_j S_i$ when $i = j$:

$$D_j S_i = \frac{e^{a_i} h_i - e^{a_i} e^{a_j}}{h_i^2} = \frac{e^{a_i}}{h_i} \cdot \frac{h_i - e^{a_j}}{h_i} = S_i (1 - S_j) = S_i (1 - S_i)$$

And $D_j S_i$ when $i \neq j$ (now $D_j g_i = 0$):

$$D_j S_i = \frac{0 \cdot h_i - e^{a_i} e^{a_j}}{h_i^2} = -\frac{e^{a_i}}{h_i} \cdot \frac{e^{a_j}}{h_i} = -S_i S_j$$

To summarize:

$$D_j S_i = \begin{cases} S_i (1 - S_j) & i = j \\ -S_j S_i & i \neq j \end{cases}$$

or, using the Kronecker delta, $D_j S_i = S_i (\delta_{ij} - S_j)$.
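
In this form the Jacobian is $\mathrm{diag}(S) - S S^\top$, which is two lines of NumPy. A minimal sketch (function names are my own):

```python
import numpy as np

def softmax(a):
    exps = np.exp(a)
    return exps / np.sum(exps)

def softmax_jacobian(a):
    """N x N Jacobian of softmax: J[i, j] = S_i * (delta_ij - S_j)."""
    s = softmax(a)
    return np.diag(s) - np.outer(s, s)

J = softmax_jacobian(np.array([1.0, 2.0, 3.0]))
print(J)              # diagonal: S_i (1 - S_i), off-diagonal: -S_i S_j
print(J.sum(axis=1))  # each row sums to 0, since the outputs always sum to 1
```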

Softmax extends binary classification to $n$ classes, useful for problems such as:

  • Faces
  • Cars
  • MNIST digits 0-9

Softmax is a generalization of the logistic function: it compresses a $K$-dimensional vector $z$ of real values into a $K$-dimensional vector $\sigma(z)$ whose components lie in $(0, 1)$ and sum to $1$.

Softmax for $K = 2$ is the same as a sigmoid with $w = w_1 - w_0$:

$$\sigma(z)_1 = \frac{e^{w_1^\top x}}{e^{w_0^\top x} + e^{w_1^\top x}} = \frac{1}{1 + e^{-(w_1 - w_0)^\top x}}$$
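
A quick numerical check of this equivalence, with random weights $w_0$, $w_1$ and input $x$ (the linear logits $w_k^\top x$ are the usual setup, assumed here):

```python
import numpy as np

def softmax(a):
    exps = np.exp(a)
    return exps / np.sum(exps)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
x = np.random.randn(4)                           # arbitrary input
w0, w1 = np.random.randn(4), np.random.randn(4)  # per-class weights

p_softmax = softmax(np.array([w0 @ x, w1 @ x]))[1]  # P(class 1) via softmax
p_sigmoid = sigmoid((w1 - w0) @ x)                  # sigmoid with w = w1 - w0
print(np.isclose(p_softmax, p_sigmoid))  # True
```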

Softmax is a generalization of the sigmoid for $K > 2$.

Softmax for $K$ classes:

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}}, \qquad j = 1, \dots, K$$
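
In practice one subtracts $\max(z)$ before exponentiating: the common factor $e^{-\max(z)}$ cancels between numerator and denominator, so the result is unchanged, but it prevents overflow for large inputs. A minimal sketch:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over K classes."""
    exps = np.exp(z - np.max(z))  # shift by max(z); the common factor cancels
    return exps / np.sum(exps)

z = np.array([1000.0, 1000.5, 1001.0])  # naive np.exp(z) would overflow
print(softmax(z))  # valid probabilities that sum to 1
```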


