Logistic regression is a widely used supervised machine learning algorithm for binary classification. It models the probability of a binary outcome (1/0, Yes/No, True/False) based on one or more independent variables.
Logistic Regression Algorithm:
Logistic regression uses the logistic function (also known as the sigmoid function) to model the probability that a given input belongs to a particular class. For a single input variable X, the model is defined as:
$$P(Y=1 \mid X) = \frac{1}{1 + e^{-(b_0 + b_1 X)}}$$
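Where:
- $P(Y=1 \mid X)$ is the probability that the outcome Y equals 1 given the input X.
- $b_0$ is the intercept and $b_1$ is the coefficient (weight) of X.
- $e$ is the base of the natural logarithm.
The logistic regression algorithm estimates the values of $b_0$ and $b_1$ from the training data, and the fitted model is then used to make predictions on new, unseen data.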
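To make the formula concrete, here is a minimal sketch of the single-feature model using only the standard library. It assumes plain batch gradient descent on the average log loss as the estimation method, which is one common choice and not necessarily what any particular library does internally. The function names, the learning rate, the epoch count, and the toy "hours studied" data are all hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, b0, b1):
    """P(Y=1 | X=x) for a single-feature logistic model."""
    return sigmoid(b0 + b1 * x)


def fit(xs, ys, learning_rate=0.1, epochs=5000):
    """Estimate b0 and b1 by batch gradient descent on the average log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # For log loss, the gradient depends on (predicted probability - label)
        errors = [predict_probability(x, b0, b1) - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / n
        grad_b1 = sum(e * x for e, x in zip(errors, xs)) / n
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


# Hypothetical toy data: hours studied (x) and pass/fail outcome (y)
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit(hours, passed)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"P(pass | 1 hour)  = {predict_probability(1.0, b0, b1):.3f}")
print(f"P(pass | 4 hours) = {predict_probability(4.0, b0, b1):.3f}")
```

A positive b1 means the predicted probability rises as X increases, and an input where b0 + b1·X equals 0 gets a probability of exactly 0.5.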
Simple Python Project using Logistic Regression:
```python
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the dataset (assuming it's in a CSV file with 'exam1', 'exam2', and 'admitted' columns)
data = pd.read_csv('student_data.csv')

# Prepare the data
X = data[['exam1', 'exam2']]  # Features: exam1 and exam2 scores
y = data['admitted']  # Target variable: Admission status (1 for admitted, 0 for not admitted)

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy*100:.2f}%")
print("Confusion Matrix:")
print(conf_matrix)
```
In this project, we load a dataset with two exam scores and admission status, split it into training and testing sets, and then train a logistic regression model. We evaluate the model's accuracy and display a confusion matrix to assess its performance.
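To show what the two evaluation metrics actually count, the sketch below recomputes accuracy and the confusion matrix by hand for a small set of labels. The helper names and the label lists are hypothetical and exist only for illustration. The matrix layout follows scikit-learn's convention for labels 0 and 1: rows are actual classes, columns are predicted classes.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels encoded as 0 and 1."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # admitted, and predicted admitted
        elif actual == 0 and predicted == 0:
            tn += 1  # not admitted, and predicted not admitted
        elif actual == 0 and predicted == 1:
            fp += 1  # not admitted, but predicted admitted
        else:
            fn += 1  # admitted, but predicted not admitted
    return tn, fp, fn, tp


def manual_accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    return (tn + tp) / len(y_true)


# Hypothetical true labels and predictions
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
predicted_labels = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(true_labels, predicted_labels)
print([[tn, fp], [fn, tp]])  # prints [[3, 1], [1, 3]]
print(f"Accuracy: {manual_accuracy(true_labels, predicted_labels) * 100:.2f}%")  # prints Accuracy: 75.00%
```

Accuracy alone hides which kind of mistake the model makes. The off-diagonal entries of the confusion matrix separate false positives from false negatives.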
Ensure you have a CSV file with the dataset (student_data.csv) before running this code.
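If you don't have such a file, one option is to generate a synthetic one just to try the code. The sketch below is an assumption, not real admissions data: the function name, the score range, the noise level, and the rule that decides admission are all made up for illustration. Only the file name and the column names come from the project above.

```python
import csv
import random


def write_synthetic_student_data(path, n_rows=100, seed=0):
    """Write a made-up dataset with 'exam1', 'exam2', and 'admitted' columns."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["exam1", "exam2", "admitted"])
        for _ in range(n_rows):
            exam1 = round(rng.uniform(30, 100), 1)
            exam2 = round(rng.uniform(30, 100), 1)
            # Arbitrary rule for illustration only: a noisy average above 65 is admitted
            noisy_average = (exam1 + exam2) / 2 + rng.gauss(0, 5)
            admitted = 1 if noisy_average > 65 else 0
            writer.writerow([exam1, exam2, admitted])


write_synthetic_student_data("student_data.csv")
```

Results obtained on this file only reflect the made-up rule above, so treat them as a check that the code runs and not as a meaningful evaluation.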