# Machine Learning: Logistic Regression

*Logistic regression* is a classification variant of linear regression in which
the dependent variable $y$ takes binary values.

Problem: Given a training set $\langle x^{(i)}, y^{(i)} \rangle$, $1 \le i \le m$,
$x^{(i)} \in \mathbb{R}^{n+1}$, $x^{(i)}_0 = 1$, $y^{(i)} \in \{0, 1\}$,
find a *classification function* that assigns a label $y \in \{0, 1\}$ to a new example $x$.

## Gradient Descent

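Let’s build the hypothesis $h_\theta(x)$ as the sigmoid function of $\theta \cdot x$:

$$h_\theta(x) = g(\theta \cdot x), \qquad g(z) = \frac{1}{1 + e^{-z}}.$$

The value $h_\theta(x) \in (0, 1)$ is interpreted as the estimated probability that $y = 1$ for the input $x$.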

The sigmoid is applied element-wise, so it operates on scalars, vectors and matrices alike.
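The notes' own Octave code is not reproduced here. As a minimal Python sketch (standard library only, with nested lists standing in for vectors and matrices), an element-wise sigmoid and the hypothesis $h_\theta(x)$ could look like this; the names `sigmoid` and `hypothesis` are illustrative, not taken from the original.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z)), applied element-wise.

    Accepts a number, a vector (list) or a matrix (list of lists).
    """
    if isinstance(z, (list, tuple)):
        return [sigmoid(v) for v in z]
    if z >= 0:                      # stable branch: exp(-z) cannot overflow
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)                 # stable branch for large negative z
    return e / (1.0 + e)

def hypothesis(theta, x):
    """h_theta(x) = g(theta . x); x[0] is expected to be the bias feature 1."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))

print(sigmoid(0))                   # 0.5
print(sigmoid([[0, 0], [0, 0]]))    # [[0.5, 0.5], [0.5, 0.5]]
```

The two-branch form is a common way to keep `math.exp` from overflowing when $|z|$ is large.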

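To find the optimal parameter $\theta \in \mathbb{R}^{n+1}$, we use an optimized gradient descent method that takes as arguments the cost function $J(\theta)$ and its gradient. For logistic regression they are

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right],$$

$$\nabla J(\theta) = \frac{1}{m} X^T \left( g(X\theta) - y \right),$$

where $X = (x^{(i)}_j)_{m \times (n+1)}$ is the matrix of training examples from the previous lecture and $g$ is applied element-wise.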
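A minimal Python sketch of $J(\theta)$ and $\nabla J(\theta)$, assuming $X$ is a list of rows whose first entry is the bias feature 1. The "optimized" routine the text relies on is not shown in these notes, so the sketch swaps in plain fixed-step batch gradient descent; the names, the step size `alpha` and the iteration count are hypothetical choices for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function for a single number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cost_and_gradient(theta, X, y):
    """Return J(theta) and its gradient (1/m) X^T (g(X theta) - y)."""
    m = len(y)
    eps = 1e-15  # clip h away from 0 and 1 so log() stays finite
    h = [sigmoid(sum(t * xj for t, xj in zip(theta, row))) for row in X]
    J = 0.0
    for yi, hi in zip(y, h):
        hi = min(max(hi, eps), 1 - eps)
        J -= yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
    J /= m
    grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
            for j in range(len(theta))]
    return J, grad

def gradient_descent(X, y, alpha=0.1, iterations=1000):
    """Plain fixed-step batch gradient descent starting from theta = 0.

    A simple stand-in for the optimized routine mentioned in the text.
    """
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        _, grad = cost_and_gradient(theta, X, y)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

X = [[1, -2], [1, -1], [1, 1], [1, 2]]  # toy data, first column is the bias
y = [0, 0, 1, 1]
theta = gradient_descent(X, y)
print(cost_and_gradient([0.0, 0.0], X, y)[0])  # log(2) at theta = 0
print(cost_and_gradient(theta, X, y)[0])       # lower after training
```

At $\theta = 0$ every $h_\theta(x^{(i)}) = 0.5$, so $J = \log 2$ regardless of the data, which makes a handy sanity check.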

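Analogous to linear regression, logistic regression can be regularized too. With a regularization parameter $\lambda \ge 0$ and the bias term $\theta_0$ left unpenalized,

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2,$$

$$\frac{\partial J}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}_j + \frac{\lambda}{m} \theta_j \quad (1 \le j \le n),$$

while $\partial J / \partial \theta_0$ keeps the unregularized form.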
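A minimal Python sketch of the regularized cost and gradient, under the same list-of-rows assumption for $X$; the function name and the parameter `lam` (for $\lambda$) are hypothetical. Note that `theta[0]` is skipped in both penalty terms.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function for a single number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cost_and_gradient_reg(theta, X, y, lam):
    """Regularized logistic-regression cost and gradient; theta[0] is not penalized."""
    m = len(y)
    eps = 1e-15  # clip h away from 0 and 1 so log() stays finite
    h = [sigmoid(sum(t * xj for t, xj in zip(theta, row))) for row in X]
    J = 0.0
    for yi, hi in zip(y, h):
        hi = min(max(hi, eps), 1 - eps)
        J -= yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
    J = J / m + lam / (2 * m) * sum(t * t for t in theta[1:])
    grad = []
    for j in range(len(theta)):
        g = sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
        if j > 0:                      # skip the bias term
            g += lam / m * theta[j]
        grad.append(g)
    return J, grad
```

Setting `lam=0` recovers the unregularized cost and gradient.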

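Having computed $\theta$, we can now implement the prediction function

$$p(x) = \begin{cases} 1, & h_\theta(x) \ge 0.5, \\ 0, & \text{otherwise}, \end{cases}$$

which can be used to classify new examples and to check the prediction accuracy on the training set, i.e. the fraction of training examples with $p(x^{(i)}) = y^{(i)}$.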
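A minimal Python sketch of the 0.5-threshold prediction and the training-set accuracy. The original `predict.m` and `accuracy.m` are not reproduced in these notes, so the function names and the parameters below are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function for a single number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(theta, X):
    """Label each row 1 when h_theta(x) >= 0.5, otherwise 0."""
    return [1 if sigmoid(sum(t * xj for t, xj in zip(theta, row))) >= 0.5 else 0
            for row in X]

def accuracy(theta, X, y):
    """Fraction of examples whose predicted label equals y."""
    p = predict(theta, X)
    return sum(pi == yi for pi, yi in zip(p, y)) / len(y)

X = [[1, -2], [1, -1], [1, 1], [1, 2]]
y = [0, 1, 1, 1]
theta = [0.0, 1.0]             # hypothetical parameters, not trained here
print(predict(theta, X))       # [0, 0, 1, 1]
print(accuracy(theta, X, y))   # 0.75
```

Since $g(z) \ge 0.5$ exactly when $z \ge 0$, the same rule can be written as $\theta \cdot x \ge 0$ without evaluating the sigmoid.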

## Multi-class Classification

Logistic regression works only for binary $y$.
Suppose now that $y^{(i)} \in \{1, \dots, K\}$, where $K > 2$.
In this case we can use the *One-vs-All* variant of the algorithm.

Step 1. Convert the vector $y$ into a binary matrix

$$Y = \left( y^{(i)}_k \right)_{m \times K},$$

where $y^{(i)}_k = \delta_{k y^{(i)}}$, i.e. $y^{(i)}_k = 1$ when $y^{(i)} = k$, and $y^{(i)}_k = 0$ otherwise.
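A minimal Python sketch of Step 1, assuming labels run from 1 to $K$ as in the text; because Python lists are 0-based, class $k$ lands in column `k - 1`. The function name is hypothetical.

```python
def to_binary_matrix(y, K):
    """Return the m x K matrix Y with Y[i][k-1] = 1 if y[i] == k, else 0."""
    return [[1 if label == k else 0 for k in range(1, K + 1)] for label in y]

print(to_binary_matrix([2, 1, 3], 3))  # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```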

Step 2. Train a logistic classifier on each column of $Y$. The result is a matrix $\Theta = (\theta_{jk})_{(n+1) \times K}$ whose $k$-th column holds the parameters of the classifier for class $k$.
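A minimal Python sketch of Step 2. It trains each column with plain fixed-step gradient descent on the unregularized cost, a stand-in for the optimized routine used in the notes; `alpha`, `iterations` and the function names are hypothetical. The $K$ trained vectors are then transposed so that $\Theta$ has shape $(n+1) \times K$.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function for a single number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_binary(X, y, alpha=0.5, iterations=2000):
    """Fixed-step batch gradient descent for one binary logistic classifier."""
    m, n1 = len(X), len(X[0])
    theta = [0.0] * n1
    for _ in range(iterations):
        h = [sigmoid(sum(t * xj for t, xj in zip(theta, row))) for row in X]
        grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
                for j in range(n1)]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta

def one_vs_all(X, y, K):
    """Return Theta as an (n+1) x K list of rows; column k-1 separates class k."""
    Y = [[1 if label == k else 0 for k in range(1, K + 1)] for label in y]
    columns = [train_binary(X, [row[k] for row in Y]) for k in range(K)]
    return [list(r) for r in zip(*columns)]  # transpose K x (n+1) -> (n+1) x K

# Toy data: bias column plus two features, three well-separated classes.
X = [[1, 2, 0], [1, 3, 0], [1, 0, 2], [1, 0, 3], [1, -2, -2], [1, -3, -3]]
y = [1, 1, 2, 2, 3, 3]
Theta = one_vs_all(X, y, 3)
print(len(Theta), len(Theta[0]))  # 3 3
```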

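Step 3. For a given vector $x$, compute the vector $h = x^T \Theta$ of length $K$. The predicted value of $y$ is then

$$y = \arg\max_{1 \le k \le K} h_k.$$

Because the sigmoid is increasing, maximizing $h_k$ selects the same class as maximizing $g(h_k)$.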
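A minimal Python sketch of Step 3 for a single example, assuming $\Theta$ is stored as an $(n+1) \times K$ list of rows and $x_0 = 1$. The parameter values below are made up for illustration and were not produced by training.

```python
def predict_one_vs_all(Theta, x):
    """Return the label k in 1..K that maximizes h_k, where h = x^T Theta."""
    K = len(Theta[0])
    h = [sum(xj * Theta[j][k] for j, xj in enumerate(x)) for k in range(K)]
    return max(range(K), key=h.__getitem__) + 1  # labels are 1-based

Theta = [[0, 0, 0],
         [1, -1, 0],
         [0, 0, 1]]   # hypothetical parameters for n = 2 features, K = 3
print(predict_one_vs_all(Theta, [1, 2, 0.5]))  # 1
print(predict_one_vs_all(Theta, [1, 0, 3]))    # 3
```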

To compute the accuracy of the one-vs-all classifier on the training set, use the `accuracy.m` script from above with a modified `predict.m`.
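In the same spirit, a minimal Python sketch: the binary prediction is replaced by the row-wise arg max, and the accuracy is still the fraction of matching labels. The function names and the $\Theta$ values are hypothetical and do not reproduce the original `.m` files.

```python
def predict_all(Theta, X):
    """Modified prediction: one label in 1..K per row of X (arg max of x^T Theta)."""
    K = len(Theta[0])
    labels = []
    for x in X:
        h = [sum(xj * Theta[j][k] for j, xj in enumerate(x)) for k in range(K)]
        labels.append(max(range(K), key=h.__getitem__) + 1)
    return labels

def accuracy(Theta, X, y):
    """Fraction of rows whose predicted label equals y."""
    p = predict_all(Theta, X)
    return sum(pi == yi for pi, yi in zip(p, y)) / len(y)

Theta = [[0, 0, 0],
         [1, -1, 0],
         [0, 0, 1]]   # hypothetical (n+1) x K parameters, n = 2, K = 3
X = [[1, 2, 0.5], [1, -2, 0.5], [1, 0, 3], [1, 0, 3]]
y = [1, 2, 3, 1]
print(predict_all(Theta, X))   # [1, 2, 3, 3]
print(accuracy(Theta, X, y))   # 0.75
```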