In our previous chapters, we mainly discussed the Linear Regression model, where the target variable to be predicted is continuous and there is a linear relationship between the independent and target variables. But how do we predict a discrete variable from predictors that are linearly related to the target (more precisely, to its log-odds)? In this case, Logistic Regression comes to the rescue. In this article, we will focus on this predictive model and look at its inner workings.
So, what is Logistic Regression?
Logistic Regression is a predictive analysis technique used to predict a categorical dependent variable. It predicts the probability of occurrence of an event by fitting the data to a logistic curve. Like other regression techniques, it starts from a straight line (more generally, a linear combination of the inputs) fitted through the data. But since it is mainly used to solve classification problems, its output has to be a probability between 0 and 1. Hence we pass the linear output through the logistic (sigmoid) function, whose inverse is the logit, also called the link function; this scales the values and keeps them in the range 0 to 1.
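Now what is the logit function? Strictly speaking, the logit is log(p/(1 − p)), and the S-shaped curve that produces the probabilities is its inverse, the logistic function. The logistic function belongs to the sigmoid class of functions, which are defined as any “S”-shaped curve (another function in this class is tanh, which ranges from −1 to 1 through 0). Instead of the logit link, we can also use the probit or complementary log-log link. But logit results are the most interpretable, hence logit is the most widely used. The logistic function is given by:

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}$$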
This function is mainly responsible for squashing predictions ranging from −∞ to +∞ into the range 0 to 1. Similarly, if we calculate (1 − p), the equation looks like this:

$$1 - p = \frac{e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}$$
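As a rough illustration of this squashing behaviour, here is a minimal Python sketch of the logistic (sigmoid) function. The helper name `sigmoid` is our own, not a library API, and branching on the sign of z is just one common way to keep `math.exp` from overflowing for large inputs. The last line checks the link between tanh and the sigmoid: tanh(z) = 2·sigmoid(2z) − 1.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic (sigmoid) function: maps any real z into the interval [0, 1]."""
    # Branch on the sign of z so that math.exp never receives a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Very negative inputs end up near 0, very positive ones near 1, and z = 0 gives 0.5
for z in [-1000, -5, 0, 5, 1000]:
    p = sigmoid(z)
    print(f"z = {z:>6}  p = {p:.6f}  1 - p = {1 - p:.6f}")

# tanh is a rescaled sigmoid, which is why it ranges from -1 to 1 through 0
print(math.tanh(0.7), 2 * sigmoid(1.4) - 1)
```

The two values on the last line agree up to floating-point rounding.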
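Now if we divide p by (1 − p), we get the odds, i.e. the ratio of the probability of an event happening to the probability of it not happening (this is often loosely called the odds ratio, although strictly an odds ratio compares two odds). Note that the odds are not the same as the risk, which is simply p.

$$\frac{p}{1 - p} = e^{\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n}$$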
The logit is defined as the log of these odds. Working on this scale makes it easy to interpret by how many units a particular variable changes the log-odds of the target. To get the log-odds, we just take the natural log of both sides of the odds equation. Hence, it becomes:

$$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$$
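As we can see in the formula above, the log-odds are a linear function of the inputs: each coefficient βi is the change in log-odds for a one-unit increase in xi (equivalently, the odds are multiplied by e^βi). This makes it easy to explain which factor is impacting the business and by how much, which is the basic advantage of any regression model, i.e. explainability.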
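These relationships can be checked numerically. In this sketch the coefficients `b0` and `b1` are made-up values for a single-input model, and `sigmoid`, `logit` and `odds_at` are hypothetical helper names chosen for illustration.

```python
import math

def sigmoid(z: float) -> float:
    # logistic function: turns log-odds z into a probability p
    return 1.0 / (1.0 + math.exp(-z))

def logit(p: float) -> float:
    # log of the odds p / (1 - p): turns a probability back into log-odds
    return math.log(p / (1.0 - p))

# hypothetical coefficients, for illustration only
b0, b1 = -1.5, 0.8

def odds_at(x: float) -> float:
    # odds p / (1 - p) predicted by the model at input x
    p = sigmoid(b0 + b1 * x)
    return p / (1.0 - p)

z = b0 + b1 * 2.0
p = sigmoid(z)
print(p / (1.0 - p), math.exp(z))   # the odds equal e^z
print(logit(p), z)                  # the log-odds recover the linear predictor

# a one-unit increase in x adds b1 to the log-odds, i.e. multiplies the odds by e^b1
print(odds_at(3.0) / odds_at(2.0), math.exp(b1))
```

Each line prints two numbers that agree up to floating-point rounding, which is exactly the interpretability argument: a coefficient translates directly into a multiplicative change in the odds.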
Now, what is a probit function then? Logit and probit are two link functions that do the same task of bringing the predicted probabilities within 0 and 1, but the way they do it is different. Probit passes the linear predictor z = β0 + β1x1 + ... + βnxn through the cumulative distribution function of the standard normal distribution, i.e. it integrates the standard normal curve from −∞ up to z:

$$p = \Phi(z) = \int_{-\infty}^{z} \phi(t)\,dt, \qquad \phi(t) = \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}$$

Here, ϕ is the standard normal density (the bell curve). Integrating it from −∞ up to a large negative value (say −514) gives a result that is essentially 0, and integrating up to a large positive value (say 871) gives a result that is essentially 1; at z = 0 the result is exactly 0.5.
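A small sketch comparing the two links side by side. `logistic_cdf` and `probit_cdf` are hypothetical helper names; instead of numerical integration, the probit side uses the standard identity Φ(z) = (1 + erf(z/√2))/2 through Python's `math.erf`.

```python
import math

def logistic_cdf(z: float) -> float:
    # link used by logit: p = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z: float) -> float:
    # link used by probit: standard normal CDF, i.e. the integral of phi from -inf to z,
    # written with the error function: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in [-514, -2, 0, 2, 871]:
    print(f"z = {z:>5}  logit p = {logistic_cdf(z):.4f}  probit p = {probit_cdf(z):.4f}")
```

Both curves pass through 0.5 at z = 0 and flatten out at 0 and 1; the probit curve approaches its limits faster because the normal distribution has lighter tails than the logistic one.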
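However, probit rests on the assumption that the underlying (latent) variable follows a standard normal distribution, which does not always hold; logit assumes a logistic distribution instead, which has slightly heavier tails. In practice the two often give very similar fitted probabilities, but logit coefficients can be read directly as changes in log-odds, so logit results are easier to interpret. Hence logit is the most commonly used function.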
All of the above applies once we have the linear predictor, i.e. the straight line (or, with several inputs, the hyperplane) β0 + β1x1 + ... + βnxn that is used to classify the data; the logistic function then turns it into probabilities. From Linear Regression, we know that to get this line we need the correct coefficients. In Logistic Regression, the coefficients are estimated using a different method, popularly known as “Maximum Likelihood Estimation”.
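To make the “straight line” concrete, the sketch below computes the linear predictor, turns it into probabilities and classifies each point with a 0.5 threshold, which is equivalent to checking which side of the line z = 0 the point falls on. The coefficients and points are made up, and `predict_proba` / `predict_class` here are our own helper functions, not scikit-learn's methods.

```python
import numpy as np

def predict_proba(X: np.ndarray, beta0: float, beta: np.ndarray) -> np.ndarray:
    """P(y = 1 | x) for each row of X under a logistic model."""
    z = beta0 + X @ beta              # linear predictor: the "straight line"
    return 1.0 / (1.0 + np.exp(-z))   # squash into (0, 1)

def predict_class(X: np.ndarray, beta0: float, beta: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    # with threshold 0.5 this is the same as asking whether z >= 0
    return (predict_proba(X, beta0, beta) >= threshold).astype(int)

# hypothetical coefficients and points, for illustration only
beta0, beta = -1.0, np.array([2.0, -0.5])
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.2, 2.0]])

print(predict_proba(X, beta0, beta))
print(predict_class(X, beta0, beta))
```

Here the linear predictor is −1, 1 and −1.6 for the three points, so only the second point is assigned to class 1.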
Maximum Likelihood Estimation:
Maximum Likelihood Estimation is a method for finding the optimal values of a model's parameters. The parameters are chosen to maximize the likelihood, i.e. the probability that the model assigns to the outcomes actually observed, so that the predicted values agree as closely as possible with the actuals. They are obtained by differentiating the log of the joint probability (density) function of the data, which for the 0/1 outcomes of logistic regression is a product of Bernoulli probabilities. In other words, we look for the values of the coefficients at which the joint probability is maximum.
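For logistic regression, with labels y_i ∈ {0, 1} and predicted probabilities p_i, the log of the joint probability is ℓ(β) = Σ_i [y_i log p_i + (1 − y_i) log(1 − p_i)]. The minimal NumPy sketch below (the function name and toy data are hypothetical) evaluates it for two candidate coefficient vectors; MLE looks for the vector that makes this number as large as possible.

```python
import numpy as np

def log_likelihood(beta: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """Log of the joint probability of the observed 0/1 labels under a logistic model.

    X is assumed to contain a leading column of ones for the intercept.
    Each y_i is Bernoulli(p_i), so the joint probability is the product of
    p_i^y_i * (1 - p_i)^(1 - y_i); taking logs turns the product into a sum.
    """
    z = X @ beta
    # log p = -log(1 + e^-z) and log(1 - p) = -log(1 + e^z);
    # np.logaddexp(0, a) computes log(1 + e^a) without overflow
    return float(np.sum(-y * np.logaddexp(0, -z) - (1 - y) * np.logaddexp(0, z)))

# toy data: intercept column plus one feature (hypothetical values)
X = np.array([[1, -2.0], [1, -1.0], [1, 0.5], [1, 1.0], [1, 2.0]])
y = np.array([0, 0, 1, 0, 1])

ll_flat = log_likelihood(np.array([0.0, 0.0]), X, y)    # every p_i = 0.5
ll_sloped = log_likelihood(np.array([0.0, 1.0]), X, y)  # p_i increases with x
print(ll_flat, ll_sloped)
```

The second vector fits the labels better, so its log-likelihood is higher (closer to zero).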
The partial derivative of the log of the joint probability function is taken with respect to each coefficient and set equal to 0; solving these equations gives the coefficient values at which the log-likelihood is highest. If you have 2 parameters to determine, you take the partial derivative of the log-likelihood with respect to the first parameter and with respect to the second, set both to zero, and solve the two equations together (each equation generally involves both parameters). This is not the same as Gradient Descent: when a closed-form solution exists, setting the first-order derivatives to zero gives the estimates in a single step, whereas Gradient Descent only calculates the first-order derivative and updates the weights iteratively, a little at a time.
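When a model is simple enough to have a closed-form solution, setting the derivative to zero really does give the estimate in one step. As a hedged illustration (a single Bernoulli probability p, not logistic regression itself): for n labels with k ones, ℓ(p) = k log p + (n − k) log(1 − p), and setting dℓ/dp = k/p − (n − k)/(1 − p) = 0 gives p̂ = k/n, the sample mean. The sketch checks this against a brute-force grid search; the data are made up.

```python
import numpy as np

y = np.array([1, 0, 1, 1, 0, 1, 1, 0])   # hypothetical 0/1 outcomes
n, k = len(y), int(y.sum())

def bernoulli_log_likelihood(p: float) -> float:
    # log of the joint probability of the observed outcomes for a constant probability p
    return k * np.log(p) + (n - k) * np.log(1 - p)

# closed form obtained by setting the derivative k/p - (n - k)/(1 - p) to zero
p_closed_form = k / n

# brute-force check: evaluate the log-likelihood on a fine grid and keep the best point
grid = np.linspace(0.001, 0.999, 999)
p_grid = grid[np.argmax([bernoulli_log_likelihood(p) for p in grid])]

print(p_closed_form, p_grid)
```

Both approaches land on the same value (up to the grid spacing), but the closed form needs no search at all.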
In many real-world cases, including logistic regression itself, the equations obtained from the derivative of the log-likelihood cannot be solved analytically (i.e. there is no closed-form solution). Therefore, iterative numerical methods are used to find the parameter estimates: for logistic regression, typically Newton-Raphson (also called iteratively reweighted least squares) or gradient-based optimizers, and for models with hidden variables, Expectation-Maximization algorithms. The overall idea is still the same though: find the parameters that maximize the likelihood. We will discuss these methods in our upcoming articles.
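For logistic regression specifically, the likelihood equations have no closed-form solution, and a standard iterative approach is Newton-Raphson (also known as iteratively reweighted least squares, IRLS) rather than EM. Below is a minimal NumPy sketch under stated assumptions: `fit_logistic_newton` is a hypothetical name, the data are simulated from made-up "true" coefficients, and the data are assumed not to be perfectly separable (otherwise the MLE does not exist).

```python
import numpy as np

def fit_logistic_newton(X: np.ndarray, y: np.ndarray, n_iter: int = 25, tol: float = 1e-10) -> np.ndarray:
    """Maximum likelihood estimates for logistic regression via Newton-Raphson (IRLS).

    X must include a leading column of ones for the intercept.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (y - p)          # first derivative (score) of the log-likelihood
        W = p * (1.0 - p)                 # Bernoulli variance of each observation
        hessian = -(X.T * W) @ X          # second derivative of the log-likelihood
        step = np.linalg.solve(hessian, gradient)
        beta = beta - step                # Newton update: beta - H^-1 g
        if np.max(np.abs(step)) < tol:
            break
    return beta

# simulate data from hypothetical "true" coefficients
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_hat = fit_logistic_newton(X, y)
print(beta_hat)   # should be close to true_beta
```

At the solution the score X.T @ (y - p) is (numerically) zero, which is precisely the "set the derivative to zero" condition, reached here by iteration instead of algebra.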
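In short, the main difference between the Gradient Descent method and MLE, as described above, is that MLE sets the first-order derivative of the log-likelihood to zero and solves for the parameters (in a single step, when a closed-form solution exists), whereas Gradient Descent uses the first-order derivative to update the weights iteratively, a little at a time.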
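For the iterative side of this comparison, here is a minimal sketch of plain gradient descent on the average negative log-likelihood of logistic regression. The function name, learning rate, iteration count and simulated data are all illustrative assumptions, not tuned or recommended values.

```python
import numpy as np

def fit_logistic_gd(X: np.ndarray, y: np.ndarray, learning_rate: float = 0.5, n_iter: int = 5000) -> np.ndarray:
    """Minimize the average negative log-likelihood by plain gradient descent.

    X must include a leading column of ones for the intercept.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (p - y) / len(y)     # gradient of the average negative log-likelihood
        beta -= learning_rate * grad      # small step against the gradient, repeated many times
    return beta

# simulate data from hypothetical "true" coefficients
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([-0.5, 1.5])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta_gd = fit_logistic_gd(X, y)
print(beta_gd)
```

Minimizing the negative log-likelihood is the same as maximizing the likelihood, so with enough iterations gradient descent arrives at the maximum likelihood estimates; it just gets there through many small updates.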
Logistic Regression using Python:
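```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# train_data is assumed to be a preprocessed pandas DataFrame with numeric
# features and a 0/1 target column 'Survived'
train_x = train_data.drop(columns=['Survived'])
train_y = train_data['Survived']

# note: by default scikit-learn applies L2 regularization (C=1.0)
model = LogisticRegression()
# fit the model on the training data
model.fit(train_x, train_y)

# coefficients of the trained model
print('Coefficient of model :', model.coef_)
# intercept of the model
print('Intercept of model', model.intercept_)

# predict the target on the train dataset
predict_train = model.predict(train_x)
print('Target on train data', predict_train)

# accuracy score on train dataset
accuracy_train = accuracy_score(train_y, predict_train)
print('Accuracy score on train dataset :', accuracy_train)
```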
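To connect the scikit-learn model back to the formulas earlier in the article, the sketch below fits `LogisticRegression` on synthetic data (generated with `make_classification`, standing in for a real dataset) and checks that `predict_proba` is the logistic function applied to intercept + X·coef, and that `predict` is that probability thresholded at 0.5, i.e. the decision function compared with 0. Keep in mind that the default L2 penalty (C=1.0) means the fitted coefficients are not pure, unregularized maximum likelihood estimates.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# synthetic binary data, used only for illustration
X, y = make_classification(n_samples=300, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)

model = LogisticRegression()
model.fit(X, y)

# linear predictor z = intercept + X @ coef, i.e. the log-odds of class 1
z = model.intercept_[0] + X @ model.coef_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# the second column of predict_proba is P(y = 1), which should match the sigmoid of z
p_sklearn = model.predict_proba(X)[:, 1]
print(np.max(np.abs(p_manual - p_sklearn)))

# predict() assigns class 1 exactly when the decision function (the log-odds) is positive
print(np.array_equal(model.predict(X), (model.decision_function(X) > 0).astype(int)))
```

The first print should be a value on the order of floating-point rounding error, and the second should print `True`.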
I have also posted the code for a Logistic Regression model in R at the GitHub link below. The problem statement was to forecast the weather in Seattle using many independent features.
GitHub Link: https://github.com/satyadeepbehera/DS/blob/master/Seattle%20Weather%20Forecast/SeattleWeatherForecast.r
Assumptions/disadvantages of Logistic Regression:
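The assumptions of Logistic Regression overlap with those of the Linear Regression model, but they are not the same:
1. There must be a linear relationship between the input variables and the log-odds of the target (not the target itself).
2. There should not be any multicollinearity present between input variables.
3. Unlike in Linear Regression, the input and target variables do not need to be normally distributed (a 0/1 target cannot be); the observations should, however, be independent of one another.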
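One common way to check the multicollinearity assumption is the variance inflation factor (VIF): regress each input on all the others and compute VIF_j = 1 / (1 − R²_j). Here is a hedged NumPy sketch; the `vif` helper and the simulated features are hypothetical, and the thresholds people use (often 5 or 10) are rules of thumb rather than fixed cut-offs.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Variance inflation factor of each column of X (X must not contain an intercept column).

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing column j on all the
    other columns plus an intercept. Values far above 1 signal multicollinearity.
    """
    n, d = X.shape
    out = np.empty(d)
    for j in range(d):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ coef
        r2 = 1.0 - residuals @ residuals / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(1)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)     # hypothetical feature that nearly duplicates a
print(vif(np.column_stack([a, b, c])))  # a and c get very large VIFs, b stays close to 1
```

If a pair of features shows large VIFs like this, dropping or combining one of them is a typical remedy before fitting the logistic model.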
In this article, we mainly focused on the process behind the Logistic Regression model. In my next post, I will be covering the output analysis of the Logistic model, the accuracy measures of the model and the regularization of the cost function. Please feel free to leave a comment/suggestion in the comment box below. Thanks, everyone. Have a great day!