24-787 Lecture 9
Learning Functions and Neural Networks II
Luoting Fu
Spring 2012

Previous lecture
• Physiological basis
• Perceptron
  [Figure: a perceptron with inputs X0, X1, bias weight Wb, weights W0, W1, a summing node, and hard-limiting activation fH]
  Y = u(W0 X0 + W1 X1 + Wb)
  ΔWi = η (Y0 − Y) Xi   (learning rule, where Y0 is the desired output)
• Applications
• Demos

In this lecture
• Multilayer perceptron (MLP)
  – Representation
  – Feed forward
  – Back-propagation
• Break
• Case studies
• Milestones & forefront

Perceptron
[Figure: a 400-26 perceptron mapping 400 input pixels to 26 letter outputs A, B, C, D, …, Z. © Springer]

XOR
Exclusive OR

Root cause
Consider a 2-1 perceptron, y = σ(w1 x1 + w2 x2 + w0).
Let y = 0.5. The decision boundary is then
  w1 x1 + w2 x2 = σ⁻¹(0.5) − w0 = const,
a straight line in the (x1, x2) plane, so the boundary is always linear.

A single perceptron is limited to learning linearly separable cases.
Minsky, M. L. and Papert, S. A. (1969). Perceptrons. Cambridge, MA: MIT Press.

An MLP can learn any continuous function.
Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals, and Systems, 2(4), 303-314.
A single perceptron is limited to learning linearly separable cases (linear functions).

How's that relevant?
Function approximation ∈ Intelligence

The road ahead
[Figure: example mappings — speed, bearing, and waveform as inputs; wheel turn, pedal depression, and words as outputs; regression and recognition]

[Figure slides: building up from the perceptron to the multilayer perceptron; hidden-unit activation h = tanh(a)]

Matrix representation
  x ∈ ℝ^D
  W^(1) ∈ ℝ^(M×D),  z = h(W^(1) x) ∈ ℝ^M
  W^(2) ∈ ℝ^(K×M),  y = σ(W^(2) z) ∈ ℝ^K

Knowledge learned by an MLP is encoded in its layers of weights.

What does it learn?
• Decision boundary perspective
• Highly non-linear decision boundaries
• Real-world decision boundaries

An MLP can learn any continuous function.
Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function", Mathematics of Control, Signals, and Systems, 2(4), 303-314.
Think Fourier.

What does it learn?
• Weight perspective
  A 64-M-3 MLP: x ∈ ℝ^D, W^(1) ∈ ℝ^(M×D), z = h(W^(1) x) ∈ ℝ^M, W^(2) ∈ ℝ^(K×M), y = σ(W^(2) z) ∈ ℝ^K

How does it learn?
• From labeled examples (e.g., the digits 0–9; "polar bear" vs. "not a polar bear")
• By back-propagation

Back propagation
[Figure slides: back-propagation]

Gradient descent
• One "epoch" = one pass through the training data.

Back propagation
• Steps
Think about this: what happens when you train a 10-layer MLP?

Learning curve
[Plot: error over training]
Overfitting and cross-validation

Break

Design considerations
• Learning task
• X – input
• Y – output
• D, M, K (input, hidden, output dimensions)
• Number of layers
• Training epochs
• Training data
  – Amount
  – Source

Case study 1: digit recognition
A 784-1000-10 MLP (28 × 28 = 784 input pixels)
[Figure slides: digit recognition examples]

Milestones: a race to 100% accuracy on MNIST

Classifier                                           Error rate (%)   Reported by
Perceptron                                           12.0             LeCun et al. 1998
2-layer NN, 1000 hidden units                        4.5              LeCun et al. 1998
5-layer convolutional net                            0.95             LeCun et al. 1998
5-layer convolutional net                            0.4              Simard et al. 2003
6-layer NN 784-2500-2000-1500-1000-500-10 (on GPU)   0.35             Ciresan et al. 2010

See the full list at http://yann.lecun.com/exdb/mnist/
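To make the representation and back-propagation slides concrete, here is a minimal sketch of a D-M-K MLP trained by gradient descent in the matrix form used above, z = h(W^(1) x), y = σ(W^(2) z). This is not code from the lecture or the case studies; the XOR training set, the tanh/sigmoid choices, the layer sizes, the learning rate, and the epoch count are illustrative assumptions.

```python
# Minimal D-M-K multilayer perceptron: feed-forward plus back-propagation.
# Illustrative sketch only; data, sizes, and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# XOR: the case a single perceptron cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # N x D inputs
T = np.array([[0], [1], [1], [0]], dtype=float)               # N x K targets

D, M, K = 2, 4, 1                 # input, hidden, output dimensions
W1 = rng.normal(0.0, 1.0, (M, D)) # first-layer weights
b1 = np.zeros(M)
W2 = rng.normal(0.0, 1.0, (K, M)) # second-layer weights
b2 = np.zeros(K)
eta = 1.0                         # learning rate

for epoch in range(20000):
    # Feed-forward
    Z = np.tanh(X @ W1.T + b1)            # hidden activations, N x M
    Y = sigmoid(Z @ W2.T + b2)            # outputs, N x K

    # Back-propagation of squared error E = 0.5 * sum((Y - T)^2)
    delta2 = (Y - T) * Y * (1 - Y)        # output-layer error signal, N x K
    delta1 = (delta2 @ W2) * (1 - Z**2)   # hidden-layer error signal, N x M

    # One gradient-descent step per epoch (mean gradient over the batch)
    W2 -= eta * delta2.T @ Z / len(X)
    b2 -= eta * delta2.mean(axis=0)
    W1 -= eta * delta1.T @ X / len(X)
    b1 -= eta * delta1.mean(axis=0)

# Fresh forward pass with the trained weights
Z = np.tanh(X @ W1.T + b1)
Y = sigmoid(Z @ W2.T + b2)
print(np.round(Y.ravel(), 2))   # approaches [0, 1, 1, 0]
```

The two delta arrays are the error signals that back-propagation sends from the output layer back toward the inputs; each weight update is one gradient-descent step, and each loop iteration is one epoch.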
Case study 2: sketch recognition
• Convolutional neural network (LeCun, 1998), alternating convolution and sub-sampling layers.
[Figure: a convolutional network in the style of LeCun (1998) recognizing hand-sketched symbols such as Scope, Transfer Function, Gain, Sum, Sine Wave, Or, and Product; the feature maps are matrices and the final output is an element of a vector]
(A minimal code sketch of the convolution and sub-sampling operations appears after the summary.)

Case study 3: autonomous driving
Pomerleau, 1995

Case study 4: sketch beautification
Orbay and Kara, 2011

Research forefront
• Deep belief network
  – Critique, or classify
  – Create, synthesize
Demo at: http://www.cs.toronto.edu/~hinton/adi/index.htm

In summary
1. Powerful machinery
2. Feed-forward
3. Back propagation
4. Design considerations
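As referenced in case study 2, the convolutional network alternates convolution and sub-sampling layers (LeCun, 1998). Below is a minimal sketch of those two operations on a single feature map; the input size, kernel values, and the use of plain average pooling for sub-sampling are illustrative assumptions, not the network used in the case study.

```python
# Minimal 2-D convolution and 2x2 sub-sampling (average pooling), the two
# building blocks named in the LeNet-style network of case study 2.
# Input size and kernel values are illustrative assumptions.
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D correlation of a single-channel image with one kernel."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def subsample2x2(fmap):
    """Average-pool a feature map over non-overlapping 2x2 blocks."""
    H, W = fmap.shape
    fmap = fmap[: H - H % 2, : W - W % 2]            # trim odd edges
    return fmap.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

image = np.random.default_rng(0).random((28, 28))    # e.g. one 28 x 28 input patch
kernel = np.array([[1.0, 0.0, -1.0]] * 3)            # simple vertical-edge detector
fmap = np.tanh(conv2d_valid(image, kernel))          # convolution + squashing nonlinearity
pooled = subsample2x2(fmap)                          # sub-sampling halves each dimension
print(fmap.shape, pooled.shape)                      # (26, 26) (13, 13)
```

In a LeNet-style network these two operations are stacked several times and followed by fully connected layers like the MLP sketched earlier.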