Machine Learning: Logistic Regression
Logistic regression is a classification counterpart of linear regression in which the dependent variable $y$ takes binary values.
Problem: Given a training set $\langle x^{(i)}, y^{(i)} \rangle$, $1 \le i \le m$, $x \in \mathbb{R}^{n+1}$, $x^{(i)}_0 = 1$, $y^{(i)} \in \{0, 1\}$, find a classification function $h_\theta(x)$ that predicts $y$.
Gradient Descent
Let’s build the function $h_\theta(x)$ as a sigmoid function of $\theta \cdot x$:
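$$h_\theta(x) = g(\theta \cdot x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$

where $g$ denotes the sigmoid (logistic) function.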
The sigmoid function has infinite rank, i.e. it operates element-wise on scalars, vectors, and matrices alike.
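In Octave this element-wise behavior comes for free. A minimal sketch (the file name sigmoid.m is an assumption; the text does not name it):

```matlab
% sigmoid.m -- element-wise logistic function; works on scalars,
% vectors and matrices alike because exp and ./ apply element-wise.
function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));
end
```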
To find the optimal parameter $\theta \in \mathbb{R}^{n+1}$ we are going to use an optimized gradient descent method which takes as arguments the cost function $J(\theta)$ and its gradient. For logistic regression they are:
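$$J(\theta) = \frac{1}{m} \Bigl( -y^T \log g(X\theta) - (1 - y)^T \log \bigl( 1 - g(X\theta) \bigr) \Bigr)$$

$$\nabla J(\theta) = \frac{1}{m} \, X^T \bigl( g(X\theta) - y \bigr)$$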
where $X = \bigl(x^{(i)}_j\bigr)_{m \times (n+1)}$ is the matrix of training examples from the previous lecture.
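A vectorized Octave implementation of this pair might look as follows; the name costFunction and its argument order are illustrative assumptions:

```matlab
% costFunction.m -- vectorized logistic regression cost and gradient.
function [J, grad] = costFunction(theta, X, y)
  m = length(y);
  h = sigmoid(X * theta);                          % m x 1 predictions
  J = (-y' * log(h) - (1 - y)' * log(1 - h)) / m;  % scalar cost
  grad = X' * (h - y) / m;                         % (n+1) x 1 gradient
end
```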
Analogously to linear regression, logistic regression can be regularized too:
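With regularization parameter $\lambda$, the convention that the bias term $\theta_0$ is not penalized, and writing $h = g(X\theta)$:

$$J(\theta) = \frac{1}{m} \Bigl( -y^T \log h - (1 - y)^T \log (1 - h) \Bigr) + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \bigl( X^T (h - y) \bigr)_0, \qquad \frac{\partial J}{\partial \theta_j} = \frac{1}{m} \bigl( X^T (h - y) \bigr)_j + \frac{\lambda}{m} \, \theta_j, \quad j \ge 1$$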
Having computed $\theta$, we can now implement the prediction function
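A minimal sketch of predict.m, thresholding the hypothesis at $0.5$:

```matlab
% predict.m -- classify as 1 when the estimated probability
% h_theta(x) is at least 0.5, otherwise as 0.
function p = predict(theta, X)
  p = sigmoid(X * theta) >= 0.5;
end
```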
which can be used to classify new examples and to check the prediction accuracy on the training set:
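A matching accuracy.m could be as simple as the following script (a sketch, assuming theta, X and y are already in the workspace):

```matlab
% accuracy.m -- fraction of training examples classified correctly.
p = predict(theta, X);
fprintf('Training accuracy: %.2f%%\n', 100 * mean(double(p == y)));
```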
Multi-class Classification
Logistic regression works for binary $y$. Suppose now that $y^{(i)} \in \{1, \dots, K\}$, where $K > 2$. In this case we can use the One-vs-All variation of the algorithm.
Step 1. Convert the vector $y$ into a binary matrix $Y$:
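$$Y = \bigl(y^{(i)}_k\bigr)_{m \times K}$$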
where $y^{(i)}_k = \delta_{k\,y^{(i)}}$, i.e. $y^{(i)}_k = 1$ when $y^{(i)} = k$, otherwise $y^{(i)}_k = 0$.
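In Octave the conversion is a one-liner thanks to automatic broadcasting (a sketch, assuming y is an $m \times 1$ vector of labels):

```matlab
% Build the m x K indicator matrix: Y(i,k) = 1 iff y(i) == k.
% Comparing an m x 1 vector with a 1 x K row broadcasts element-wise.
Y = double(y == 1:K);
```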
Step 2. Train a logistic classifier on every column of the matrix $Y$. The result will be a matrix $\Theta = (\theta_{jk})_{(n+1) \times K}$.
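Sketched in Octave, assuming the costFunction from above and fminunc as the optimizer (neither is mandated by the text):

```matlab
% One-vs-all training: one parameter column per class.
Theta = zeros(n + 1, K);
options = optimset('GradObj', 'on', 'MaxIter', 400);
for k = 1:K
  initial = zeros(n + 1, 1);
  % Fit theta for the binary problem "class k vs the rest".
  Theta(:, k) = fminunc(@(t) costFunction(t, X, Y(:, k)), initial, options);
end
```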
Step 3. For any given vector $x$ compute the vector $h = x^T \Theta$. Since the sigmoid is monotonic, applying it would not change the ranking of the classes, so the predicted value $y$ will be
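$$y = \underset{1 \le k \le K}{\operatorname{arg\,max}} \; h_k$$

where $h_k$ is the $k$-th component of $h$.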
To compute the accuracy of the one-vs-all classifier on the training set, use the accuracy.m script from above with a modified predict.m:
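A sketch of the modification, picking for each example the class whose classifier responds most strongly:

```matlab
% predict.m (one-vs-all variant) -- for each row of X pick the index
% of the column of Theta that gives the largest score.
function p = predict(Theta, X)
  [~, p] = max(X * Theta, [], 2);   % p(i) = argmax over the K columns
end
```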