XOR with neural network
- python : http://www.bogotobogo.com/python/python_Neural_Networks_Backpropagation_for_XOR_using_one_hidden_layer.php
- neuralpy: http://pythonhosted.org/neuralpy/gettingstarted.html
Gradient descent variations
- batch: computes the gradient over the entire training set per update. Slow, doesn't work when the dataset doesn't fit in memory, and can't incorporate new samples on the fly
- stochastic (SGD): updates the weights for each training example. The objective fluctuates, so shuffle the samples each epoch. Can be used to learn online, but the noise complicates convergence and it may overshoot
- mini-batch: updates the weights per mini-batch (size around 50 or so). Also needs shuffled samples. The usual choice for neural networks
- momentum: when successive updates point in the same direction, build up momentum (accelerate); when the direction reverses, slow down
- Nesterov accelerated gradient: like momentum, but evaluates the gradient at the look-ahead position (after the momentum step), which corrects the update earlier
- Adagrad: adapts the learning rate per parameter (larger updates for infrequent parameters), so it is well-suited to sparse data
- Adadelta: extension of Adagrad that restricts the accumulated gradient history to a window, so the learning rate doesn't shrink toward zero
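The update rules above can be compared on a toy quadratic loss f(w) = w², whose gradient is 2w. This is a sketch of my own; the learning rates, momentum coefficient, and step counts are assumptions:

```python
import numpy as np

def sgd_step(w, g, lr=0.1):
    # plain gradient descent: fixed step against the gradient
    return w - lr * g

def momentum_step(w, g, v, lr=0.1, mu=0.9):
    # velocity builds up while gradients keep the same sign,
    # and is damped when the direction reverses
    v = mu * v - lr * g
    return w + v, v

def adagrad_step(w, g, cache, lr=0.5, eps=1e-8):
    # per-parameter step shrinks with accumulated squared gradients,
    # so rarely-updated (sparse) parameters keep larger steps
    cache = cache + g ** 2
    return w - lr * g / (np.sqrt(cache) + eps), cache

ws = 5.0            # plain SGD
w, v = 5.0, 0.0     # momentum
wa, cache = 5.0, 0.0  # Adagrad

for _ in range(200):
    ws = sgd_step(ws, 2 * ws)
    w, v = momentum_step(w, 2 * w, v)
    wa, cache = adagrad_step(wa, 2 * wa, cache)

print(ws, w, wa)  # all three should end near the minimum at 0
```

On this one-dimensional problem momentum mostly illustrates the oscillate-then-settle behavior; Adagrad's advantage only really shows on problems where gradient magnitudes differ widely across parameters.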