Multiple Linear Regression Project
This project demonstrates how to create a Multiple Linear Regression model to predict car prices based on multiple features. You can further enhance this project by:
1. Collecting more diverse and extensive car data.
2. Feature engineering to create new relevant features.
3. Implementing feature scaling or normalization for better model performance.
4. Trying other regression techniques, such as Polynomial Regression or regularized models like Ridge and Lasso.
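As a quick sketch of points 3 and 4, scaling and regularization can be combined in a single scikit-learn pipeline. The data below is synthetic (generated just for illustration); the column meanings mirror the features used later in the project.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Synthetic data standing in for horsepower, engine size, weight, fuel economy
rng = np.random.default_rng(42)
X = rng.uniform([80, 1.0, 2000, 15], [300, 5.0, 5000, 45], size=(100, 4))
y = 50 * X[:, 0] + 3000 * X[:, 1] + 5 * X[:, 2] - 100 * X[:, 3] + rng.normal(0, 500, 100)

# StandardScaler puts all features on a comparable scale before fitting;
# Ridge adds an L2 penalty that shrinks coefficients and reduces overfitting
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)
print(f'R-squared on training data: {model.score(X, y):.3f}')
```

Because Ridge penalizes coefficient magnitude, scaling matters: without it, features measured in large units (like weight) would be penalized differently than small-unit features (like engine size).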
Project:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load the dataset (replace 'car_data.csv' with the path to your own data file)
data = pd.read_csv('car_data.csv')

# Explore the dataset
print(data.head())
print(data.describe())
# Check for missing values
print(data.isnull().sum())
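If the check above reveals missing values, they should be handled before fitting, since LinearRegression cannot accept NaNs. A minimal illustration with a hypothetical three-row frame:

```python
import pandas as pd
import numpy as np

# Hypothetical frame with one missing Horsepower value
data = pd.DataFrame({
    'Horsepower': [130, np.nan, 200],
    'Price': [20000, 15000, 35000],
})

# Option 1: drop any row containing a missing value
dropped = data.dropna()
print(dropped.shape)  # → (2, 2)

# Option 2: fill missing values with the column median
filled = data.fillna(data.median())
print(filled['Horsepower'].tolist())  # → [130.0, 165.0, 200.0]
```

Dropping is simplest when only a few rows are affected; median imputation keeps the full sample size at the cost of slightly distorting the feature's distribution.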
# Select relevant features (you can adjust this based on your dataset)
X = data[['Horsepower', 'EngineSize', 'Weight', 'FuelEconomy']].values
y = data['Price'].values
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a linear regression model
model = LinearRegression()
# Fit the model to the training data
model.fit(X_train, y_train)
# Predict car prices on the test set
y_pred = model.predict(X_test)
# Calculate and print metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse:.2f}')
print(f'R-squared (R2): {r2:.2f}')
# Visualize actual vs. predicted prices against one feature (e.g., Horsepower)
plt.scatter(X_test[:, 0], y_test, color='blue', label='Actual')
plt.scatter(X_test[:, 0], y_pred, color='red', alpha=0.5, label='Predicted')
plt.xlabel('Horsepower')
plt.ylabel('Price')
plt.title('Car Price Prediction')
plt.legend()
plt.show()
# Predict the price of a new car; feature order must match X:
# Horsepower, EngineSize, Weight, FuelEconomy
new_car_features = np.array([[200, 2.5, 3500, 25]])
predicted_price = model.predict(new_car_features)
print(f'Predicted Price: ${predicted_price[0]:.2f}')
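Beyond a single prediction, a multiple linear regression model is often interpreted through its coefficients: each one is the estimated price change per unit increase in that feature, holding the others fixed. The sketch below uses synthetic, noiseless data (with assumed column names matching this project) so the fitted coefficients recover the true ones exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed feature names, matching the columns used in this project
features = ['Horsepower', 'EngineSize', 'Weight', 'FuelEconomy']

# Synthetic, noiseless data with known true coefficients [50, 3000, 5, -100]
rng = np.random.default_rng(0)
X = rng.uniform([80, 1.0, 2000, 15], [300, 5.0, 5000, 45], size=(200, 4))
y = 50 * X[:, 0] + 3000 * X[:, 1] + 5 * X[:, 2] - 100 * X[:, 3]

model = LinearRegression().fit(X, y)
# Each coefficient: price change per unit increase in that feature,
# holding the other features constant
for name, coef in zip(features, model.coef_):
    print(f'{name}: {coef:.1f}')
print(f'Intercept: {model.intercept_:.1f}')
```

On real data the coefficients will be noisier, and correlated features (e.g. weight and engine size) can make individual coefficients unstable, which is one more motivation for the regularization mentioned in the enhancement list.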