Logistic Regression: Cost Function
To train the parameters $w$ and $b$, we need to define a cost function.
Recap:
$\hat{y}^{(i)} = \sigma(w^T x^{(i)} + b)$, where $\sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}$
$x^{(i)}$: the i-th training example
Given $\{(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})\}$, we want $\hat{y}^{(i)} \approx y^{(i)}$.
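The recap can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names `sigmoid` and `predict` and the values of `w`, `b`, and `x` are assumptions chosen for the example, and the code is not hardened against overflow for very large negative `z`.

```python
import math

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Return y_hat = sigma(w^T x + b) for a single example x."""
    z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical parameters and one two-feature training example.
w = [0.5, -0.25]
b = 0.1
x = [2.0, 4.0]

# z = 0.5*2.0 + (-0.25)*4.0 + 0.1 = 0.1, so y_hat is slightly above 0.5.
y_hat = predict(w, b, x)
print(y_hat)
```

Because the sigmoid maps any real `z` into the interval (0, 1), `y_hat` can be read as an estimated probability that the label is 1.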
Loss (error) function:
The loss function measures the discrepancy between the prediction ($\hat{y}^{(i)}$) and the desired output ($y^{(i)}$).
In other words, the loss function computes the error for a single training example.
$$L(\hat{y}^{(i)}, y^{(i)}) = \frac{1}{2}\left(\hat{y}^{(i)} - y^{(i)}\right)^2$$
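With the sigmoid prediction, the squared error makes the optimization problem non-convex, so logistic regression uses the following loss instead:
$$L(\hat{y}^{(i)}, y^{(i)}) = -\left( y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right)$$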
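• If $y^{(i)} = 1$: $L(\hat{y}^{(i)}, y^{(i)}) = -\log(\hat{y}^{(i)})$. The loss is small only when $\log(\hat{y}^{(i)})$ is as large as possible (close to 0), so $\hat{y}^{(i)}$ should be close to 1.
• If $y^{(i)} = 0$: $L(\hat{y}^{(i)}, y^{(i)}) = -\log(1 - \hat{y}^{(i)})$. The loss is small only when $\log(1 - \hat{y}^{(i)})$ is as large as possible (close to 0), so $\hat{y}^{(i)}$ should be close to 0.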
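The two cases above can be checked numerically. The following is a small sketch using only the standard library; the function name `log_loss` and the sample predictions 0.9 and 0.1 are assumptions made for illustration.

```python
import math

def log_loss(y_hat, y):
    """Loss for one example: -(y*log(y_hat) + (1 - y)*log(1 - y_hat)).

    Assumes 0 < y_hat < 1 so that both logarithms are defined.
    """
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# Label y = 1: the loss shrinks as y_hat approaches 1.
loss_y1_near = log_loss(0.9, 1)   # prediction close to the label
loss_y1_far = log_loss(0.1, 1)    # prediction far from the label

# Label y = 0: the loss shrinks as y_hat approaches 0.
loss_y0_near = log_loss(0.1, 0)
loss_y0_far = log_loss(0.9, 0)

print(loss_y1_near, loss_y1_far, loss_y0_near, loss_y0_far)
```

In both cases the prediction near the label costs about 0.105, while the prediction on the wrong side costs about 2.303, so confident mistakes are penalized much more heavily.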
Cost function
The cost function is the average of the loss function over the entire training set. We are going to find the
parameters $w$ and $b$ that minimize the overall cost function.
$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)}) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(\hat{y}^{(i)}) + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)}) \right]$$
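As a rough sketch of how the cost could be evaluated, the code below averages the per-example loss over a toy training set. The function names, the four training examples, and both parameter settings are hypothetical and serve only to show that a better fit gives a lower cost.

```python
import math

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, X, Y):
    """J(w, b): average log loss over the m examples in (X, Y)."""
    m = len(X)
    total = 0.0
    for x, y in zip(X, Y):
        z = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
        y_hat = sigmoid(z)
        total += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
    return total / m

# Hypothetical toy training set: the label equals the first feature.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
Y = [0, 1, 1, 0]

# With all-zero parameters every prediction is 0.5, so J = log(2).
J_zero = cost([0.0, 0.0], 0.0, X, Y)

# Parameters that lean on the first feature fit the labels better.
J_better = cost([4.0, 0.0], -2.0, X, Y)

print(J_zero, J_better)
```

Training then amounts to searching for the `w` and `b` that make this value as small as possible.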