## Logistic Regression

Instead of predicting exactly 0 or 1, logistic regression generates a probability: a value strictly between 0 and 1.

### Sigmoid Function

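The sigmoid function maps any real-valued input $ z $ to a value between 0 and 1. In logistic regression, $ z $ is the output of the model's linear part (the weighted sum of the features plus a bias), and $ y $ is the resulting probability. It’s defined as: $$ y = \frac{1}{1 + e^{-z}}$$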
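As a minimal sketch, the sigmoid can be computed in plain Python as follows. The weights, bias, and feature values are made up for illustration, and treating $ z $ as a weighted sum of features plus a bias is the usual logistic regression setup, assumed here.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued z to a value between 0 and 1."""
    # Branch on the sign of z so math.exp never gets a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

# Hypothetical linear model: z = bias + w1*x1 + w2*x2 (values are illustrative).
weights = [0.5, -1.2]
bias = 0.1
features = [2.0, 1.0]

z = bias + sum(w * x for w, x in zip(weights, features))
probability = sigmoid(z)
print(f"z = {z:.3f}, probability = {probability:.3f}")
```

Branching on the sign of `z` is a design choice to avoid overflow in `math.exp` for very negative inputs; both branches are algebraically the same formula.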

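Plotted against $ z $, the sigmoid function is an S-shaped curve: it approaches 0 as $ z $ becomes very negative, equals 0.5 at $ z = 0 $, and approaches 1 as $ z $ becomes very positive.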

### Loss Function for Logistic Regression

The loss function for linear regression is squared loss. The loss function for logistic regression is Log Loss, which is defined as follows:

$$ \text{Log Loss} = \sum_{(x,y)\in D} -y\log(y\prime) - (1 - y)\log(1 - y\prime) $$

where:

- $ (x,y) \in D $ is the data set containing many labeled examples, which are $ (x,y) $ pairs.
- $ y $ is the label in a labeled example. Since this is logistic regression, every value of $ y $ must be either 0 or 1.
- $ y \prime $ is the predicted value (somewhere between 0 and 1), given the set of features in $ x $.
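The Log Loss sum can be sketched directly from the formula. This is an illustrative implementation, not a reference one: the function name `log_loss`, the clipping constant `eps`, and the sample labels and predictions are all assumptions introduced for the example.

```python
import math

def log_loss(labels, predictions, eps=1e-15):
    """Sum of -y*log(y') - (1 - y)*log(1 - y') over all examples."""
    total = 0.0
    for y, y_pred in zip(labels, predictions):
        # Clip predictions away from exactly 0 and 1 so log() stays defined.
        # The eps value is an assumption, not part of the formula.
        p = min(max(y_pred, eps), 1.0 - eps)
        total += -y * math.log(p) - (1 - y) * math.log(1.0 - p)
    return total

# Made-up labels (0 or 1) and predicted probabilities.
labels = [1, 0, 1]
predictions = [0.9, 0.2, 0.6]
loss = log_loss(labels, predictions)
print(f"Log Loss: {loss:.4f}")
```

Because each label is 0 or 1, only one of the two terms is nonzero per example: a confident correct prediction contributes almost nothing, while a confident wrong prediction contributes a large penalty.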

**The equation for Log Loss is closely related to Shannon’s Entropy measure from Information Theory. It is also the negative logarithm of the likelihood function, assuming a Bernoulli distribution of $ y $. Indeed, minimizing the loss function yields a maximum likelihood estimate.**
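The link to maximum likelihood can be written out in a few lines. As a sketch, assume the examples are independent and that each label $ y $ follows a Bernoulli distribution with success probability $ y\prime $; the symbol $ \mathcal{L} $ for the likelihood is introduced here for illustration. The likelihood of the data set is then:

$$ \mathcal{L} = \prod_{(x,y)\in D} (y\prime)^{y}(1 - y\prime)^{1 - y} $$

Taking the negative logarithm turns the product into a sum:

$$ -\log \mathcal{L} = \sum_{(x,y)\in D} -y\log(y\prime) - (1 - y)\log(1 - y\prime) $$

The right-hand side is exactly Log Loss, and since $ -\log $ is strictly decreasing, the parameters that minimize Log Loss are the same ones that maximize $ \mathcal{L} $.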

### Regularization in Logistic Regression

Most logistic regression models use the following two strategies to dampen model complexity:

- L2 regularization.
- Early stopping, that is, limiting the number of training steps or the learning rate.
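Both strategies can be sketched in a small batch gradient descent loop. This is a minimal illustration under stated assumptions, not a production trainer: the function `train`, its parameters `l2_strength`, `learning_rate`, and `max_steps`, and the toy data are all hypothetical, the penalty is taken to be `(l2_strength / 2) * sum(w**2)`, and the bias is left unregularized (a common convention, assumed here).

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)

def train(examples, labels, l2_strength=0.1, learning_rate=0.1, max_steps=500):
    """Batch gradient descent on Log Loss + (l2_strength / 2) * sum(w**2)."""
    n_features = len(examples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(max_steps):  # early stopping: a hard cap on training steps
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(examples, labels):
            y_pred = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = y_pred - y  # derivative of Log Loss with respect to z
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        for j in range(n_features):
            # The L2 term adds l2_strength * w_j, pulling each weight toward 0.
            weights[j] -= learning_rate * (grad_w[j] + l2_strength * weights[j])
        bias -= learning_rate * grad_b  # bias is not regularized (assumption)
    return weights, bias

# Toy, linearly separable data, made up for illustration.
examples = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 0, 1, 1]

w_reg, b_reg = train(examples, labels, l2_strength=1.0)
w_free, b_free = train(examples, labels, l2_strength=0.0)
print("with L2:   ", w_reg, b_reg)
print("without L2:", w_free, b_free)
```

On separable data like this, the unregularized weight keeps growing as training continues, because larger weights keep pushing Log Loss toward 0. The L2 term and the step cap both limit that growth.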

## Summary

- Logistic regression models generate probabilities.
- Log Loss is the loss function for logistic regression.
- Logistic regression is widely used in practice.
