# Derivative of softmax

15 Dec 2017

Softmax is a vector function: it takes a vector $a$ and produces a vector as output, $S(a): \mathbb{R}^N \rightarrow \mathbb{R}^N$, with components

$$S_i(a) = \frac{e^{a_i}}{\sum_{k=1}^N e^{a_k}}$$

There is not exactly a single derivative as with sigmoid or tanh; you have to specify:

- Which output component $S_i$ you are seeking the derivative of.
- Which input $a_j$ you are differentiating with respect to.

The derivative is in fact an $N \times N$ Jacobian matrix, with entries

$$D_j S_i = \frac{\partial S_i}{\partial a_j}$$

Using the quotient rule for derivatives,

$$D_j S_i = \frac{(D_j g_i)\, h_i - g_i\, (D_j h_i)}{h_i^2}$$

where $g_i = e^{a_i}$ and $h_i = \sum_{k=1}^N e^{a_k}$.

No matter which $a_j$ we differentiate with respect to, the derivative of $h_i$ is always $e^{a_j}$:

$$D_j h_i = e^{a_j}$$

Derivative of $g_i$: it is $e^{a_j}$ only when $i = j$, and $0$ otherwise:

$$D_j g_i = \begin{cases} e^{a_j} & i = j \\ 0 & i \neq j \end{cases}$$

Let's calculate $D_j S_i$ when $i = j$:

$$D_j S_i = \frac{e^{a_i} h_i - e^{a_i} e^{a_j}}{h_i^2} = \frac{e^{a_i}}{h_i} \cdot \frac{h_i - e^{a_j}}{h_i} = S_i (1 - S_j)$$

And calculate $D_j S_i$ when $i \neq j$:

$$D_j S_i = \frac{0 \cdot h_i - e^{a_i} e^{a_j}}{h_i^2} = -\frac{e^{a_i}}{h_i} \cdot \frac{e^{a_j}}{h_i} = -S_i S_j$$

To summarize, both cases can be written in one line using the Kronecker delta $\delta_{ij}$:

$$D_j S_i = S_i (\delta_{ij} - S_j)$$
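The summary formula above is easy to verify numerically. Below is a minimal NumPy sketch (function names and the example input are my own, not from the original post): it builds the Jacobian as $S_i(\delta_{ij} - S_j)$ and compares it against a central finite-difference approximation.

```python
import numpy as np

def softmax(a):
    # shift by the max for numerical stability; the result is unchanged
    e = np.exp(a - np.max(a))
    return e / e.sum()

def softmax_jacobian(a):
    # D_j S_i = S_i * (delta_ij - S_j)
    s = softmax(a)
    return np.diag(s) - np.outer(s, s)

# sanity check against a central finite-difference approximation
a = np.array([1.0, 2.0, 0.5])  # arbitrary example input
J = softmax_jacobian(a)
eps = 1e-6
J_num = np.zeros((3, 3))
for j in range(3):
    d = np.zeros(3)
    d[j] = eps
    J_num[:, j] = (softmax(a + d) - softmax(a - d)) / (2 * eps)
print(np.allclose(J, J_num, atol=1e-6))  # True
```

Note that the Jacobian is symmetric ($\operatorname{diag}(S) - S S^T$), and each column sums to zero because the softmax outputs always sum to 1.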

Softmax extends binary classification to $n$ classes, useful for tasks such as:

- Faces
- Cars
- MNIST Digits 0-9

Softmax is a generalization of the logistic function. It compresses a $K$-dimensional vector $z$ of real values into a $K$-dimensional vector $\sigma(z)$ whose components lie in $(0, 1)$ and sum to $1$.

Softmax for $K = 2$ is the same as a sigmoid where $w = w_1 - w_0$:

$$\sigma(z)_1 = \frac{e^{z_1}}{e^{z_0} + e^{z_1}} = \frac{1}{1 + e^{-(z_1 - z_0)}}$$
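This equivalence can be checked numerically. The sketch below (example logits are arbitrary, chosen only for illustration) confirms that the second softmax component for $K = 2$ equals the sigmoid of the difference of the two inputs.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# for K = 2, softmax of the second class reduces to a sigmoid
# of the difference of the two inputs
z = np.array([0.3, 1.7])  # arbitrary example logits
print(np.isclose(softmax(z)[1], sigmoid(z[1] - z[0])))  # True
```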

Softmax is a generalization of the sigmoid for $K > 2$.

Softmax for $K$ classes:

$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}}, \quad j = 1, \dots, K$$
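As a final illustration, here is a minimal, numerically stable implementation of the $K$-class formula (the example scores are arbitrary); subtracting $\max(z)$ before exponentiating avoids overflow without changing the output.

```python
import numpy as np

def softmax(z):
    # subtracting max(z) avoids overflow in exp without changing the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1, -1.2])  # K = 4 arbitrary scores
s = softmax(z)
print(s)        # four values in (0, 1)
print(s.sum())  # sums to 1, up to floating point
```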