Training Neural Networks — CS24
Cost · Gradient Descent · Backpropagation
[Diagram: a 2-3-1 network. Input layer x₁ = 1.0, x₂ = 0.0; hidden layer h₁ = 0.61, h₂ = 0.37, h₃ = 0.14; output layer ŷ = 0.72 vs. the true label y = 1.0, giving err = ŷ − y = −0.28. Each weight w₁…w₉ has a corresponding gradient ∂C/∂w₁…∂C/∂w₉, and the error flows backward through the layers (backpropagation).]
[Diagram: the cost function C(W, b) takes the weights W, the biases b, and the training data (x, y), and outputs a single scalar, e.g. C = 0.47.]
C is a single number: one value for the entire network.
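A minimal sketch of this idea, assuming NumPy and the squared-error loss used later in these slides: whatever the batch size, the cost collapses to one scalar.

```python
import numpy as np

def cost(y_hat, y):
    """Mean squared error: one scalar for the entire network's predictions."""
    y_hat = np.asarray(y_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((y_hat - y) ** 2))

# Predictions for a small illustrative batch vs. the true labels:
C = cost([0.72, 0.1, 0.9], [1.0, 0.0, 1.0])
print(C)  # a single number
```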
[Diagram: the cost curve C(w) falls from high to low, reaching its minimum at w*. Where the slope ∂C/∂w > 0, decrease w.]
Update rule: w ← w − η·(∂C/∂w)
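The update rule on a toy 1-D cost. The quadratic C(w) = (w − w*)² is an assumption for illustration; its slope is ∂C/∂w = 2(w − w*), so repeated updates walk w downhill toward w*.

```python
w_star = 3.0   # the minimizer w* (illustrative value)
eta = 0.1      # learning rate η

def grad(w):
    """Analytic slope dC/dw = 2(w - w_star) of C(w) = (w - w_star)**2."""
    return 2.0 * (w - w_star)

w = 0.0        # start far from w*
for _ in range(100):
    w = w - eta * grad(w)   # w ← w − η·(∂C/∂w)

print(w)  # converges to w* = 3.0
```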
One Training Step
1. Forward pass: compute ŷ from the current W and b.
2. Compute the loss: C = (1/n) ∑ᵢ (ŷᵢ − yᵢ)²
3. Backpropagation: compute ∇C for every weight and bias.
4. Update: W ← W − η·∇W C,  b ← b − η·∇b C
Repeat.
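The four steps above, sketched end to end for the 2-3-1 network from the diagram. Sigmoid activations, the random initialization, the learning rate, and the step count are assumptions for illustration; the loop is forward pass, loss, backpropagation via the chain rule, then the update.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Parameters of a 2-3-1 network (W1: 3x2, W2: 1x3); values are illustrative.
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

x = np.array([1.0, 0.0])   # input (x₁, x₂) from the diagram
y = np.array([1.0])        # true label
eta = 0.5                  # learning rate η (assumed)

costs = []
for step in range(200):
    # 1. Forward pass: compute y_hat from the current W and b.
    h = sigmoid(W1 @ x + b1)
    y_hat = sigmoid(W2 @ h + b2)

    # 2. Compute the loss C = (1/n) * sum((y_hat - y)**2); here n = 1.
    costs.append(float(np.mean((y_hat - y) ** 2)))

    # 3. Backpropagation: chain rule layer by layer (sigmoid' = s * (1 - s)).
    d_out = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)  # dC/d(output pre-activation)
    dW2, db2 = np.outer(d_out, h), d_out
    d_hid = (W2.T @ d_out) * h * (1.0 - h)             # dC/d(hidden pre-activation)
    dW1, db1 = np.outer(d_hid, x), d_hid

    # 4. Update: W ← W − η·∇W C,  b ← b − η·∇b C.
    W1 -= eta * dW1; b1 -= eta * db1
    W2 -= eta * dW2; b2 -= eta * db2

print(costs[0], costs[-1])  # the cost shrinks as the steps repeat
```

Each pass through the loop is one training step; in practice the same loop runs over many examples (or minibatches), but the structure is identical.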