Gradient descent

26 Dec 2017

Demonstration of the standard (batch) gradient descent formulas.

$y$ is the training output (always 0 or 1)
$m$ is the number of examples in the input dataset $x$
$w$ are the parameters of the logistic regression model

The first assumption is that we can approximate the likelihood with the sigmoid function:
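
$$ h_w(x) = \sigma(w^T x) = \dfrac{1}{1 + e^{-w^T x}} $$

where $h_w(x)$ is interpreted as the estimated probability that $y = 1$ for the input $x$.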

The cost function $J(w)$ (the likelihood negated) is the error on every iteration (cross-entropy error) and depends on the likelihood:
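
$$ J(w) = -\dfrac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_w(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_w(x^{(i)})\right) \right] $$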

(Figure: graph of the cost function.)

This formula is derived from the calculation of the likelihood of the $m$ inputs as:
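
$$ L(w) = \prod_{i=1}^{m} P\left(y^{(i)} \mid x^{(i)}; w\right) = \prod_{i=1}^{m} h_w(x^{(i)})^{y^{(i)}} \left(1 - h_w(x^{(i)})\right)^{1 - y^{(i)}} $$

assuming the training examples are independent.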

Note: we assume that the output is a binary class ($k = 2$), so the two class probabilities are redundant and complementary (they sum to 1).

Returning to the cost ($J(w)$):
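
Taking the negative logarithm of the likelihood and averaging over the $m$ examples gives the cross-entropy cost shown above:

$$ J(w) = -\dfrac{1}{m} \log L(w) $$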

Since $J(w)$ is convex, there is a global minimum that can be reached using gradient descent.
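
On each iteration every parameter $w_j$ is moved against the gradient of the cost, scaled by the learning rate $\alpha$:

$$ w_j := w_j - \alpha \dfrac{\partial J(w)}{\partial w_j} = w_j - \dfrac{\alpha}{m} \sum_{i=1}^{m} \left( h_w(x^{(i)}) - y^{(i)} \right) x_j^{(i)} $$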

There are several types of gradient descent implementations:

  • Batch gradient descent: each iteration (new coefficients) uses the sum over all training examples.
  • Stochastic gradient descent: updates after each single example; it converges faster than batch gradient descent when the training dataset is large.
  • Mini-batch gradient descent: updates on a group of $b$ examples; in some cases it can converge faster than batch or stochastic.
  • Stochastic gradient descent with momentum: each gradient gets a momentum term carried over from the previous gradient (see the update formulas after this list).
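
For reference, the per-variant updates (the momentum form below, with a velocity term $v$ and coefficient $\gamma$, is one common formulation):

Stochastic (one example $i$): $w_j := w_j - \alpha \left( h_w(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

Mini-batch (a group $B$ of $b$ examples): $w_j := w_j - \dfrac{\alpha}{b} \sum_{i \in B} \left( h_w(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

Momentum: $v := \gamma v + \alpha \nabla_w J(w)$, then $w := w - v$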

Important implementation concepts:

  • Debug the results to adjust the learning rate.
  • Possibly set a dynamic learning rate as $\alpha = \dfrac{const_1}{iterationNumber + const_2}$, so that the learning rate decreases as it approaches the minimum.

(Figure: dynamic learning rate $\alpha$ decreasing with the iteration number.)
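
A minimal NumPy sketch of batch gradient descent with this kind of decaying learning rate (the function name, the constants and the toy data below are illustrative choices, not taken from the course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gradient_descent(x, y, iterations=1000, const1=1.0, const2=10.0):
    """Batch gradient descent for logistic regression.
    x: (m, n) matrix of inputs, y: (m,) vector of 0/1 labels."""
    m, n = x.shape
    w = np.zeros(n)
    for it in range(1, iterations + 1):
        alpha = const1 / (it + const2)   # dynamic learning rate, decays with the iteration number
        h = sigmoid(x @ w)               # predictions for all m examples
        gradient = x.T @ (h - y) / m     # gradient of the cross-entropy cost J(w)
        w -= alpha * gradient            # one batch update using the whole dataset
    return w

# Toy usage: the label is 1 when the second feature is positive.
x = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])  # first column acts as a bias term
y = np.array([0.0, 0.0, 1.0, 1.0])
w = batch_gradient_descent(x, y)
print(w, sigmoid(x @ w))
```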

A special case of stochastic gradient descent is online learning. Learning is done continuously from a source stream that generates training examples. Each example is used to train and is then discarded (not stored).
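
As an illustration, a minimal sketch of online learning on a stream of examples, reusing NumPy from the sketch above (`example_stream` is a hypothetical generator of examples and the fixed $\alpha$ is an arbitrary choice):

```python
def online_logistic_sgd(example_stream, n_features, alpha=0.1):
    """Consume a stream of (x, y) examples one at a time; nothing is stored."""
    w = np.zeros(n_features)
    for x_i, y_i in example_stream:              # x_i: feature vector, y_i: 0 or 1
        h = 1.0 / (1.0 + np.exp(-(x_i @ w)))     # sigmoid prediction for this single example
        w -= alpha * (h - y_i) * x_i             # stochastic update; the example is then discarded
    return w
```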

Source: Coursera Machine Learning course

Glossary

  • MLE: Maximum Likelihood Estimation
  • sigmoid: function that maps $(-\infty, \infty)$ to $(0, 1)$
  • Epoch: one full pass of Stochastic Gradient Descent over the whole training dataset

