Learning Machine Learning: Logistic Regression

This is part 2 of learning machine learning introductory concepts. Recall that supervised learning had two basic examples, regression and classification. We covered linear regression in part 1, and now in part 2 we look at classification. Although the name of the technique used here, logistic regression, includes the word "regression", this is in fact a classification algorithm. It builds on a gradient descent approach similar to the one we discussed in part 1 in the context of linear regression.

(In this post, once again I follow/summarize from Andrew Ng's machine learning course at Coursera. Here is Ng's course material for CS 229 at Stanford. There are also good course notes here, and I will summarize even more briefly than those notes to highlight only the big ideas.)

Hypothesis representation

The goal of the logistic regression algorithm is to determine which class a new input should fall into. Here is an example application. Line fitting does not make sense for this application; we need discrete classification into yes or no categories.


For linear regression, our hypothesis representation was of the form $h_\theta(x) = \theta^T x$. For classification, our hypothesis representation is of the form $h_\theta(x) = g(\theta^T x)$, where we define $g(z)= \frac{1}{1 + e^{-z}}$. This is known as the sigmoid function, or the logistic function. For a real value $z$, the logistic function has the following plot.


If $z$ is positive, $g(z)$ is greater than 0.5. In our logistic regression hypothesis, we have $z = \theta^T x$, so when $\theta^T x \geq 0$, then $h_\theta(x) \geq 0.5$ and the hypothesis predicts $y=1$. When $\theta^T x < 0$, the hypothesis predicts $y=0$.

In other words, $\theta^T x = 0$ is the decision boundary. When our hypothesis $h_\theta(x)$ outputs a number, we treat that value as the estimated probability that $y=1$ on input $x$.
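To make the hypothesis and the 0.5 threshold concrete, here is a minimal Python sketch (numpy only; the helper names `sigmoid`, `hypothesis`, and `predict`, and the example values, are mine rather than from the course notes):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x): estimated probability that y = 1."""
    return sigmoid(np.dot(theta, x))

def predict(theta, x):
    """Predict y = 1 when theta^T x >= 0, i.e., when h_theta(x) >= 0.5."""
    return 1 if hypothesis(theta, x) >= 0.5 else 0

theta = np.array([-3.0, 1.0, 1.0])   # example parameter values
x = np.array([1.0, 2.0, 2.0])        # x_0 = 1 is the intercept term
print(hypothesis(theta, x))          # ~0.73, estimated probability that y = 1
print(predict(theta, x))             # 1
```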

If our hypothesis is linear, of the form $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_2)$, the decision boundary would be a line. For example:


If our hypothesis is polynomial, $h_\theta(x) = g(\theta_0 + \theta_1 x_1 + \theta_2 x_1^2 + \theta_3 x_2^2)$, the decision boundary can be a circle. (By using higher order polynomial terms, you can get even more complex decision boundaries.) For example:
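As a rough illustration of the circular case, here is a sketch with made-up parameter values $\theta = (-1, 0, 1, 1)$, chosen so that the boundary $\theta^T \text{features} = 0$ is the unit circle $x_1^2 + x_2^2 = 1$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters: theta_0 = -1, theta_1 = 0, theta_2 = 1, theta_3 = 1.
theta = np.array([-1.0, 0.0, 1.0, 1.0])

def features(x1, x2):
    """Map (x1, x2) to the polynomial features [1, x1, x1^2, x2^2]."""
    return np.array([1.0, x1, x1**2, x2**2])

for (x1, x2) in [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (2.0, 0.0)]:
    h = sigmoid(theta @ features(x1, x2))
    print((x1, x2), "predict y=1" if h >= 0.5 else "predict y=0")
# Points inside the unit circle are classified 0, points outside are classified 1.
```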


OK, assuming we have decided on our hypothesis, how does the logistic regression algorithm learn values of $\theta$ that fit the data nicely with the decision boundary? We once again use gradient descent, but this time a little differently, as follows.

Cost function for logistic regression

Since $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ is a sigmoid/nonlinear function, when we plug it into the cost function we don't know whether the cost function will be convex or not. However, the cost function should be convex for gradient descent to work. So we use a trick: we define our cost function carefully, to make sure that when $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ is plugged into the cost function, the result is still a convex function.

We define our cost function for a single training example as:

$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$

Note that:
$\mathrm{Cost}(h_\theta(x)=1) = 0$ if $y=1$, but it is infinite if $y=0$
$\mathrm{Cost}(h_\theta(x)=0) = 0$ if $y=0$, but it is infinite if $y=1$

In other words, this cost function harshly penalizes, and hence aims to rule out, very confident mislabels; mislabels with lukewarm confidence (say 0.6) incur a much smaller penalty.

The above is the cost for a single example. For binary classification problems $y$ is always 0 or 1, and using this, we can write the cost function in a simpler way, compressing it into one equation as follows:

$J(\theta) = -\frac{1}{m} \sum_{i=1}^m \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$
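A small vectorized sketch of this compressed cost in Python (numpy; the function name `cost` and the toy data are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost J(theta), averaged over the m training examples.

    X is an (m, n) matrix whose first column is all ones (the intercept term),
    y is a length-m vector of 0/1 labels.
    """
    m = len(y)
    h = sigmoid(X @ theta)   # h_theta(x^(i)) for every example
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Tiny made-up example: x_0 = 1 plus one feature.
X = np.array([[1.0, 0.5], [1.0, 2.0], [1.0, 3.5]])
y = np.array([0.0, 1.0, 1.0])
print(cost(np.array([-2.0, 1.0]), X, y))   # cost of one candidate theta (~0.37)
```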


Gradient descent for logistic regression

We use gradient descent to minimize the logistic regression cost function. As described earlier, the gradient descent algorithm repeatedly performs the following update: $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$,

where $\frac{\partial}{\partial \theta_j} J(\theta)= \frac{1}{m} \sum_{i=1}^m \left(h_\theta (x^{(i)})-y^{(i)}\right) x_j^{(i)}$.
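Putting the update rule together, here is a minimal batch gradient descent sketch in Python (numpy; the learning rate `alpha`, the iteration count, and the toy data are arbitrary choices of mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Fit logistic regression parameters theta by batch gradient descent.

    X is an (m, n) design matrix whose first column is all ones,
    y is a length-m vector of 0/1 labels.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(num_iters):
        h = sigmoid(X @ theta)          # predictions for all m examples
        grad = (X.T @ (h - y)) / m      # dJ/dtheta_j = (1/m) * sum (h - y) x_j
        theta -= alpha * grad           # simultaneous update of all theta_j
    return theta

# Made-up 1-D example: larger x values tend to have label 1.
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = gradient_descent(X, y)
print(theta, sigmoid(X @ theta))        # fitted theta and per-example probabilities
```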

Multiclass classification problems

We can adapt this two-class logistic regression idea to solve a multiclass classification problem using the one-vs-all approach: to do k-way classification, turn the training set into k separate binary classification problems (class i vs. everything else), and predict the class whose classifier gives the highest probability, as sketched below.
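A sketch of the one-vs-all wrapper (it assumes a binary trainer such as the `gradient_descent` helper sketched above is passed in as `train_binary`; all names are mine):

```python
import numpy as np

def one_vs_all(X, y, num_classes, train_binary):
    """Train one binary logistic regression classifier per class.

    For each class c, relabel the training set as 1 (examples of class c)
    vs. 0 (everything else) and train a binary classifier on that relabeling.
    Returns a (num_classes, n) array of fitted parameter vectors.
    """
    return np.array([train_binary(X, (y == c).astype(float))
                     for c in range(num_classes)])

def predict_multiclass(all_theta, x):
    """Pick the class whose classifier is most confident that y = 1.

    The sigmoid is monotonic, so comparing theta^T x across classes is
    equivalent to comparing the probabilities h_theta(x).
    """
    return int(np.argmax(all_theta @ x))
```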
