Why Not SSE/MSE for Logistic Regression?

Comparing SSE (Non-Convex, Vanishing Gradients) vs Log-Loss (Convex)

[Figure: Data Points & Current Sigmoids — the training data with the SSE-fit and log-loss-fit sigmoid curves overlaid]
SSE + Sigmoid
Non-Convex: Two Local Minima
$$ J = \sum(y - \sigma(mx+b))^2 $$
Two valleys: start gradient descent on the left and it settles in the left local minimum; start on the right and it settles in the right one. Which answer you get depends on initialization.
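The non-convexity is easy to verify numerically: even a single training point makes the SSE+sigmoid loss bend downward in one region and upward in another, so its curvature changes sign along m. A minimal sketch (the single point x = 1, y = 1 with b fixed at 0, and the finite-difference grid, are illustrative assumptions, not part of the original demo):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy setup: one positive point (x = 1, y = 1), b fixed at 0,
# so the SSE loss reduces to J(m) = (1 - sigmoid(m))^2.
def sse_loss(m):
    return (1.0 - sigmoid(m)) ** 2

def second_diff(f, m, h=1e-3):
    # Finite-difference estimate of J''(m); a negative value means the
    # curve bends downward there, so the loss cannot be convex.
    return (f(m - h) - 2.0 * f(m) + f(m + h)) / h**2

# Sample the curvature on a grid of m values in [-4, 4].
curvatures = [second_diff(sse_loss, -4 + 0.5 * k) for k in range(17)]
print(any(c < 0 for c in curvatures) and any(c > 0 for c in curvatures))
# curvature takes both signs along m, so the SSE+sigmoid loss is non-convex
```

Analytically, J''(m) = -2·σ(1-σ)²·(1-3σ) for this point, which is negative wherever σ(m) < 1/3, matching what the finite differences report.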
Log-Loss + Sigmoid
Convex: Single Global Minimum
$$ J = -\sum[y\log\hat{y} + (1-y)\log(1-\hat{y})] $$
Convex: a smooth bowl, so gradients stay informative everywhere and gradient descent reaches the single global minimum from any starting point.
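The "single global minimum" claim can be demonstrated directly: run gradient descent on the log-loss from far-left and far-right starting points and check that both land on the same m. A small sketch under assumed toy data (the four-point dataset, learning rate, and step count are illustrative choices; the labels are deliberately conflicting so the optimum stays finite):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 1-D data with conflicting labels (so m does not diverge);
# b is fixed at 0 to keep a single-parameter picture.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0.0, 1.0, 0.0, 1.0]

def grad(m):
    # dJ/dm for log-loss: sum_i (sigmoid(m * x_i) - y_i) * x_i
    return sum((sigmoid(m * x) - y) * x for x, y in zip(xs, ys))

def descend(m, lr=0.1, steps=5000):
    for _ in range(steps):
        m -= lr * grad(m)
    return m

m_left = descend(-5.0)   # start far to the left
m_right = descend(5.0)   # start far to the right
print(abs(m_left - m_right) < 1e-6)  # both runs reach the same minimum
```

With the non-convex SSE loss, the same experiment would end in different valleys depending on the start; with log-loss the bowl shape makes the outcome independent of initialization.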