What it is: Squashes any real number into the open interval (0, 1), so the output can be read as a probability.
Why it's useful: Turns an unbounded linear score into a class probability, which is exactly what binary classification needs.
Note: When z=0, output is exactly 0.5 (decision boundary).
Note: Computes the same linear combination \(z = w \cdot x + b\) as linear regression, but feeds it through the sigmoid.
What it is: Loss for one prediction. Penalizes confident wrong predictions heavily.
Note: Uses \(\ln\) (natural log). Loss approaches infinity as prediction approaches wrong extreme.
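The per-example Log-Loss can be sketched as follows (the function name is mine; \(\ln\) is `math.log`):

```python
import math

def log_loss_single(y, p):
    """Loss for one prediction: y is the true label (0 or 1), p the predicted probability."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Confident and right: tiny loss
print(log_loss_single(1, 0.99))   # ~0.01
# Confident and wrong: loss blows up toward infinity as p -> 0
print(log_loss_single(1, 0.01))   # ~4.6
```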
What it is: Average Log-Loss across all training examples. The cost function to minimize.
Why it's useful: Convex for logistic regression, so gradient descent can reach the global minimum; one number summarizes fit over the whole training set.
Also known as: Log-Loss, Negative Log-Likelihood.
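Averaging the per-example losses gives the cost to minimize; a minimal sketch (labels and probabilities below are made up):

```python
import math

def cost(y_true, y_prob):
    """Average Log-Loss (Negative Log-Likelihood) over all training examples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels and predicted probabilities
print(cost([1, 0, 1], [0.9, 0.2, 0.8]))
```

Better-calibrated predictions yield a strictly lower cost, which is what gradient descent exploits.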
What it is: Mean Squared Error applied to sigmoid output.
Why it's BAD: Composed with the sigmoid, MSE is non-convex (gradient descent can stall or settle in poor minima), and its gradient vanishes for confidently wrong predictions.
Rule: Use MSE for regression, Log-Loss for classification.
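One way to see the problem numerically: for a confidently wrong prediction, the MSE gradient with respect to the score \(z\) vanishes, while the Log-Loss gradient stays large. A sketch (the label and score are made up):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Confidently wrong: true label is 1, but the score is very negative
y, z = 1.0, -8.0
p = sigmoid(z)                      # close to 0

# d(MSE)/dz = 2*(p - y)*p*(1 - p): vanishes when p saturates near 0 or 1
mse_grad = 2 * (p - y) * p * (1 - p)

# d(LogLoss)/dz = p - y: stays near -1, so learning keeps moving
logloss_grad = p - y

print(abs(mse_grad))      # tiny -> learning stalls
print(abs(logloss_grad))  # near 1 -> strong correction
```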
What it is: Direction to update weights to minimize loss.
Why it's useful: Tells gradient descent which way (and roughly how far) to nudge each weight to reduce the cost \(J\).
Update rule: \(w := w - \alpha \cdot \frac{\partial J}{\partial w}\)
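The update rule above, applied to logistic regression with Log-Loss; a sketch on a tiny made-up dataset (`alpha` is the learning rate \(\alpha\)):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(w, b, X, y, alpha=0.1):
    """One update: w := w - alpha * dJ/dw (and likewise for b)."""
    n = len(X)
    dw = [0.0] * len(w)
    db = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        err = p - yi                     # dJ/dz for sigmoid + Log-Loss
        for j, xj in enumerate(xi):
            dw[j] += err * xj / n
        db += err / n
    w = [wj - alpha * dwj for wj, dwj in zip(w, dw)]
    b = b - alpha * db
    return w, b

# Tiny hypothetical dataset: one feature, labels separable by sign
X, y = [[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1]
w, b = [0.0], 0.0
for _ in range(500):
    w, b = gradient_step(w, b, X, y)
print(w, b)  # the weight should end up positive
```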
What it is: Converting probability to class label using threshold.
Why it's useful: A probability alone is not a decision; the threshold yields a concrete class label, and moving it trades off false positives against false negatives.
Note: Decision boundary is where \(z = 0\).
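Thresholding the probability, with the equivalent check on \(z\) (a threshold of 0.5 on \(p\) corresponds to \(z = 0\), since \(\sigma(0) = 0.5\)):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(p, threshold=0.5):
    """Turn a probability into a class label."""
    return 1 if p >= threshold else 0

z = 1.3                       # hypothetical linear score
p = sigmoid(z)
# p >= 0.5 exactly when z >= 0, so both checks agree
print(predict(p), int(z >= 0))
```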