Derivative of softmax
15 Dec 2017
Softmax is a vector function: it takes a vector $a$ and produces a vector as output, $S(a): \mathbb{R}^N \rightarrow \mathbb{R}^N$, with components:
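$$S_i(a) = \frac{e^{a_i}}{\sum_{k=1}^N e^{a_k}}$$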
There is not a single scalar derivative as with sigmoid or tanh; you have to specify:
- which output component $S_i$ you want the derivative of, and
- which input $a_j$ you are differentiating with respect to.
The derivative is in fact an $N \times N$ Jacobian matrix:
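$$D S = \begin{bmatrix} D_1 S_1 & \cdots & D_N S_1 \\ \vdots & \ddots & \vdots \\ D_1 S_N & \cdots & D_N S_N \end{bmatrix} \quad \text{where } D_j S_i = \frac{\partial S_i}{\partial a_j}$$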
Writing $S_i = \frac{g_i}{h_i}$ where $g_i = e^{a_i}$ and $h_i = \sum_{k=1}^N e^{a_k}$, we can use the quotient rule for derivatives:
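$$D_j S_i = \frac{(D_j g_i)\, h_i - g_i\, (D_j h_i)}{h_i^2}$$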
No matter which $a_j$ we differentiate with respect to, the derivative of $h_i$ is always $e^{a_j}$:
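$$D_j h_i = \frac{\partial}{\partial a_j} \sum_{k=1}^N e^{a_k} = e^{a_j}$$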
The derivative of $g_i$, on the other hand, is nonzero only when $i = j$:
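$$D_j g_i = \frac{\partial e^{a_i}}{\partial a_j} = \begin{cases} e^{a_j} & i = j \\ 0 & i \neq j \end{cases}$$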
Let's calculate $D_j S_i$ when $i = j$:
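$$D_i S_i = \frac{e^{a_i} h_i - e^{a_i} e^{a_i}}{h_i^2} = \frac{e^{a_i}}{h_i} \cdot \frac{h_i - e^{a_i}}{h_i} = S_i (1 - S_i)$$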
And calculate $D_jS_i$ when $i \neq j$:
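$$D_j S_i = \frac{0 \cdot h_i - e^{a_i} e^{a_j}}{h_i^2} = -\frac{e^{a_i}}{h_i} \cdot \frac{e^{a_j}}{h_i} = -S_i S_j$$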
To summarize:
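$$D_j S_i = \begin{cases} S_i (1 - S_j) & i = j \\ -S_i S_j & i \neq j \end{cases}$$

or, more compactly with the Kronecker delta, $D_j S_i = S_i (\delta_{ij} - S_j)$.

As a sanity check, here is a minimal NumPy sketch (the helper names are just illustrative) that builds the Jacobian from this formula and compares it against a finite-difference approximation:

```python
import numpy as np

def softmax(a):
    # Shift by the max for numerical stability; the result is unchanged.
    e = np.exp(a - np.max(a))
    return e / e.sum()

def softmax_jacobian(a):
    # D_j S_i = S_i * (delta_ij - S_j)  ->  diag(S) - outer(S, S)
    S = softmax(a)
    return np.diag(S) - np.outer(S, S)

a = np.array([1.0, 2.0, 3.0])
J = softmax_jacobian(a)

# Finite-difference approximation of the same Jacobian.
eps = 1e-6
J_num = np.zeros((a.size, a.size))
for j in range(a.size):
    d = np.zeros_like(a)
    d[j] = eps
    J_num[:, j] = (softmax(a + d) - softmax(a - d)) / (2 * eps)

print(np.allclose(J, J_num, atol=1e-8))  # True
```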
Softmax extends binary classification to $n$ classes, useful for:
- Faces
- Cars
- MNIST digits 0-9
Softmax is a generalization of the logistic function: it compresses a $K$-dimensional vector $z$ of real values into a $K$-dimensional vector $\sigma(z)$ whose entries lie in $(0, 1)$ and sum to 1:
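$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^K e^{z_k}}, \qquad j = 1, \ldots, K$$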
Softmax for $K = 2$, with scores $z_k = w_k^\top x$, is the same as a sigmoid where $w = w_1 - w_0$: dividing the numerator and denominator by $e^{w_1^\top x}$ gives

$$\sigma(z)_1 = \frac{e^{w_1^\top x}}{e^{w_0^\top x} + e^{w_1^\top x}} = \frac{1}{1 + e^{-(w_1 - w_0)^\top x}} = \mathrm{sigmoid}(w^\top x)$$

Softmax is thus a generalization of the sigmoid to $K > 2$; the sketch below checks the equivalence numerically.
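A quick numeric check of the $K = 2$ equivalence, using arbitrary made-up weights and input:

```python
import numpy as np

rng = np.random.default_rng(0)
w0 = rng.normal(size=4)   # class-0 weights (arbitrary example values)
w1 = rng.normal(size=4)   # class-1 weights
x = rng.normal(size=4)    # one input example

# Two-class softmax probability of class 1.
z = np.array([w0 @ x, w1 @ x])
p_softmax = np.exp(z[1]) / np.exp(z).sum()

# Sigmoid with w = w1 - w0.
p_sigmoid = 1.0 / (1.0 + np.exp(-(w1 - w0) @ x))

print(np.isclose(p_softmax, p_sigmoid))  # True
```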
Softmax for K Classes:
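With the same linear scores $z_k = w_k^\top x$ (carrying over the setup from the $K = 2$ case above), the class probabilities are

$$P(y = j \mid x) = \sigma(z)_j = \frac{e^{w_j^\top x}}{\sum_{k=1}^K e^{w_k^\top x}}$$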