VIII Neural Networks - Representation
- If you have two features x1 and x2, you can build hypotheses from many combinations of them, such as x1, x2, x1*x2, x1^2 * x2, x1 * x2^2, and so on. You might be able to build a hypothesis that works for the problem.
- But what if there are more features?
- It results in too many features; the model may overfit, and it may be computationally expensive to run.
- How about image detection? There are far too many features (pixel intensities, or RGB values) to handle with logistic regression; see the rough count below.
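A rough sense of scale, assuming a hypothetical 50x50 grayscale image (the image size is an illustrative assumption, not from these notes):

```python
# Count the features if we include all quadratic terms x_i * x_j
# (hypothetical 50x50 grayscale image => 2500 raw pixel-intensity features).
n = 50 * 50                     # 2500 pixel features
quadratic = n * (n + 1) // 2    # every product x_i * x_j, including squares
print(n, quadratic)             # 2500 3126250  -> roughly 3 million features
```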
Neurons and the Brain
- Neural networks work quite well compared to other machine learning techniques; they are very effective.
- origin: algorithms that try to mimic the brain.
- was popular in the 80s and early 90s; popularity diminished in the late 90s - maybe because too much computational power was needed.
- recent resurgence: state-of-the-art technique for many problems.
One learning algorithm hypothesis
- the auditory cortex, the part of the brain connected to the ears, knows how to hear.
- cut that connection and rewire it to the eyes, and the auditory cortex learns to see.
- the somatosensory cortex, which is responsible for the sense of touch, also learned to see when rewired.
- neural rewiring: maybe brain tissue can learn anything; it is not specialized per part, so vision can be handled by any part.
- connect any sensor to any part of the brain, and the brain learns how to use it.
Neurons in the brain
- nucleus : in the cell body
- dendrite : input wires
- axon : output wires
- the cell body acts as the computation unit.
- neurons communicate via pulses of electricity.
- axon => (pulse) => dendrite of another neuron => ...
- neuron as a logistic unit (see the sketch after this list)
- input wires: receive a number of inputs
- the neuron does the computation
- output wire: outputs the computed h_theta(x)
- x0: bias unit.
- sigmoid (logistic) activation function: g(z) = 1 / (1 + e^-z)
- theta: weights of the model, i.e. the parameters.
- a group of different neurons connected together.
- Layer 1: inputs x1, x2, x3 => input layer
- Layer 2: a1, a2, a3 => hidden layer
- Layer 3: a single neuron => output layer
- a_i^(j): activation of unit i in layer j
- Theta^(j): matrix of weights mapping from layer j to layer j + 1
- use the sigmoid activation function
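A minimal sketch of a single neuron as a logistic unit (NumPy assumed; the weights below are arbitrary illustrative values, not from the notes):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation g(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, theta):
    """One logistic unit: h_theta(x) = g(theta' x), with bias unit x0 = 1 prepended."""
    x = np.concatenate(([1.0], x))   # add the bias unit x0 = 1
    return sigmoid(theta @ x)

# Example with three input features and arbitrary weights (bias weight first).
theta = np.array([-1.0, 0.5, 0.5, 0.5])
print(neuron(np.array([1.0, 2.0, 3.0]), theta))
```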
Forward propagation: vectorized implementation
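- using the notation above: a^(1) = x, with bias unit a_0^(1) = 1
- z^(2) = Theta^(1) a^(1), a^(2) = g(z^(2)), then add the bias unit a_0^(2) = 1
- z^(3) = Theta^(2) a^(2), h_Theta(x) = a^(3) = g(z^(3))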
Neural network learning its own features
- the output layer works like logistic regression
- the hidden activations a1, a2, a3 become features, just like x1, x2, x3
- each neuron effectively learns its own logistic regression on the previous layer
- this lets the network implement complicated models; a sketch follows below
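A minimal sketch of vectorized forward propagation for a 3-4-1 network (the layer sizes and random weights are illustrative assumptions; NumPy assumed):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, Theta1, Theta2):
    """z(2) = Theta(1) a(1), a(2) = g(z(2)); z(3) = Theta(2) a(2), h = g(z(3))."""
    a1 = np.concatenate(([1.0], x))      # input layer plus bias unit
    a2 = sigmoid(Theta1 @ a1)            # hidden activations: the learned "features"
    a2 = np.concatenate(([1.0], a2))     # add the bias unit for the next layer
    h = sigmoid(Theta2 @ a2)             # output layer: logistic regression on a(2)
    return h, a2

# Arbitrary weights: Theta1 is 4x4 (layer 1 has 3 units + bias -> 4 hidden units),
# Theta2 is 1x5 (layer 2 has 4 units + bias -> 1 output unit).
rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(4, 4))
Theta2 = rng.normal(size=(1, 5))
h, hidden = forward(np.array([1.0, 0.5, -2.0]), Theta1, Theta2)
print(hidden)   # hidden-layer activations, used as features by the output unit
print(h)        # h_Theta(x)
```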
Neural Network Architecture
- How the neurons are connected.
- any layer that is not the input or the output layer is called a hidden layer.
Examples and Intuitions
- (x1, x2) are binary
- AND logic doesn't need a hidden layer.
- with a single logistic unit, you can implement AND and OR logic (checked in the sketch after this list).
- sigmoid: sigmoid(4.6) => 0.99, sigmoid(-4.6) => 0.01
- AND logic: theta = [ -30; 20; 20 ]
- OR logic: theta = [-10; 20; 20 ]
- NOT logic: theta = [ 10; -20 ]
- (NOT x1) AND (NOT x2) : output is 1 only for (x1, x2) = (0, 0)
- positive weight for bias
- negative weights for both x1 and x2, as in NOT logic
- the magnitude of each negative weight should be bigger than the positive bias weight, so any single 1 input drives the output to 0
- theta = [ 10; -20; -20 ]
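A quick check of these weight vectors (a minimal sketch assuming NumPy; the thetas are the ones listed above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def unit(theta, *inputs):
    """Single sigmoid unit with bias x0 = 1 prepended to the inputs."""
    return sigmoid(theta @ np.array([1.0, *inputs]))

AND      = np.array([-30.0, 20.0, 20.0])
OR       = np.array([-10.0, 20.0, 20.0])
NOT      = np.array([ 10.0, -20.0])
NOT_BOTH = np.array([ 10.0, -20.0, -20.0])   # (NOT x1) AND (NOT x2)

print(round(float(unit(NOT, 0))), round(float(unit(NOT, 1))))   # 1 0
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2,
          round(float(unit(AND, x1, x2))),        # AND column: 0 0 0 1
          round(float(unit(OR, x1, x2))),         # OR  column: 0 1 1 1
          round(float(unit(NOT_BOTH, x1, x2))))   # (NOT x1) AND (NOT x2): 1 0 0 0
```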
Non-linear classification example: XOR, XNOR
- for (x1 AND x2): theta = [ -30; 20; 20 ]
- for ((NOT x1) AND (NOT x2)): theta = [ 10; -20; -20 ]
- for (x1 OR x2): theta = [ -10; 20; 20 ]
- input layer to a1^(2): x1 AND x2 => (0 0 0 1) over inputs (00, 01, 10, 11)
- input layer to a2^(2): (NOT x1) AND (NOT x2) => (1 0 0 0)
- a1^(2), a2^(2) to a1^(3): OR => (1 0 0 1) => XNOR
- XOR: (0 1 1 0)
- XOR = (x1 OR x2) AND ((NOT x1) OR (NOT x2)): (0 1 1 1) AND (1 1 1 0) => (0 1 1 0)
- one hidden layer is enough; see the sketch below
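A minimal sketch of the XNOR network built from those gates with one hidden layer (NumPy assumed; the weights are the ones listed above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer 1 -> 2: hidden unit a1(2) = x1 AND x2, hidden unit a2(2) = (NOT x1) AND (NOT x2).
Theta1 = np.array([[-30.0,  20.0,  20.0],
                   [ 10.0, -20.0, -20.0]])
# Layer 2 -> 3: output unit a1(3) = a1(2) OR a2(2), which is XNOR(x1, x2).
Theta2 = np.array([[-10.0, 20.0, 20.0]])

def xnor(x1, x2):
    a1 = np.array([1.0, x1, x2])                          # input layer with bias unit
    a2 = np.concatenate(([1.0], sigmoid(Theta1 @ a1)))    # hidden layer with bias unit
    a3 = sigmoid(Theta2 @ a2)                             # output layer
    return float(a3[0])

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xnor(x1, x2)))   # outputs: 1, 0, 0, 1
```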
Handwritten digit classification (Yann LeCun)
- Yann LeCun: NYU. A founding father of the convolutional neural network (CNN).
- a CNN is a feed-forward network.
- used for tasks like digit recognition.
- multiple output units: one-vs-all.
- to classify an image as one of A, B, C, D => 4 output units, one per class (one-vs-all); see the sketch below
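A tiny sketch of reading the one-vs-all output (the class names and activation values are made-up illustrations):

```python
import numpy as np

# Hypothetical output activations h_Theta(x) for 4 classes, one output unit per class.
# The training target for class "B" would be the vector [0, 1, 0, 0].
classes = ["A", "B", "C", "D"]
h = np.array([0.02, 0.91, 0.05, 0.10])   # made-up activations for one example

print(classes[int(np.argmax(h))])        # predicted class: B
```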