Report
PROJECT: OPTIMIZING PRODUCTION EFFICIENCY
TEAM NAMES
Ahmed Mamdouh Khaled Saleh => 22012089
Mohammed Alaa Mohammed Abd-El Wahab => 2202123
Omar El-Said Mohamed El-Said => 2202122
Osama El-Said Abd-El Salam => 20221466241
Abdulrahman Ahmed Ali => 20191322564
PROJECT INTRODUCTION
The "House Price Analysis and Prediction" project aims to explore
the dynamics of the real estate market by analyzing housing data
to identify key factors influencing property prices. The dataset
contains detailed information about various property attributes,
including size, location, amenities, and detected defects.
The primary objectives of this project are to:
1. Understand the relationships between property features and
prices.
2. Develop predictive models to estimate house prices based
on specific features.
3. Provide insights into optimizing property valuations for
buyers, sellers, and real estate investors.
Through this analysis, the project seeks to uncover trends and
patterns in the housing market, offering data-driven
recommendations to stakeholders.
THE CODE
## IMPORT REQUIRED LIBRARIES
numpy: A fundamental library for numerical computations in Python. It
provides support for arrays, matrices, and mathematical operations.
STYLING
sns.set_style: Sets the theme for Seaborn visualizations. The
'ticks' style uses a white background and adds tick marks to the axes,
making the plots more readable and polished.
PURPOSE
This setup prepares the environment for:
1. Data Loading and Preprocessing: Using NumPy and
Pandas.
2. Visualization: With Matplotlib and Seaborn.
3. Model Training and Evaluation: Using scikit-learn for
splitting data, scaling, and applying regression models.
4. Error Metrics: To evaluate and compare model
performance using MAE and MSE.
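The PURPOSE list implies a setup cell along the following lines. This is a sketch, not the notebook's actual cell. The estimator classes (LinearRegression, KNeighborsRegressor, MinMaxScaler) are taken from later sections of this report, and r2_score is an assumed import for the R² comparison.

```python
# Data handling
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Modeling and evaluation (scikit-learn)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# 'ticks' style: white background with tick marks on the axes
sns.set_style("ticks")
```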
## LOADING DATA
pd.read_csv: A function from the Pandas library that reads a
CSV (Comma-Separated Values) file and loads it into a
DataFrame.
'D:Manufacturing Analytics\Final Project\Housing.csv': The
file path to the CSV file.
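The sketch below shows the loading step. The local file is not available here, so it reads a few hypothetical rows from an in-memory string with the same call. The column names follow those used elsewhere in this report. The path is kept exactly as given and shown as a raw string (r''), a common safeguard so that backslashes in Windows paths are never treated as escape sequences.

```python
import io

import pandas as pd

# In the report, the data is read from a local file:
#   df = pd.read_csv(r'D:Manufacturing Analytics\Final Project\Housing.csv')

# Hypothetical stand-in rows so the sketch runs without the file
CSV_TEXT = """price,area,bedrooms,bathrooms,stories,parking,prefarea
4000000,5000,3,1,2,1,no
6500000,7500,4,2,2,2,yes
3200000,3600,2,1,1,0,no
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df.shape)  # (3, 7): 3 rows, 7 columns
print(df.dtypes)
```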
PREPROCESSING
The code checks for missing (null) values in the dataset and
provides a summary of how many are present in each column.
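A minimal sketch of the preprocessing checks, run on a small hypothetical DataFrame with a couple of gaps. isnull().sum() counts missing values per column. duplicated() and describe() cover the duplicate and summary-statistics checks that the report also mentions.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing values, standing in for the housing data
df = pd.DataFrame({
    "price": [4000000, 6500000, np.nan, 3200000],
    "area": [5000, 7500, 4200, np.nan],
    "bedrooms": [3, 4, 2, 2],
})

missing_per_column = df.isnull().sum()   # price: 1, area: 1, bedrooms: 0
duplicate_rows = df.duplicated().sum()   # number of fully repeated rows
summary = df.describe()                  # count, mean, std, min, quartiles, max

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(summary)
```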
PERFORMING SCALING
The fit() method trains the model by estimating its parameters (coefficients)
from the training data. For linear regression, it chooses the coefficients that
minimize the sum of squared differences between the predicted and actual values.
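The scaling and fitting steps can be sketched as follows, on synthetic data where price is an assumed linear function of area and bedrooms plus noise. The names train_x and test_x follow the report. Fitting MinMaxScaler on the training split only and reusing it on the test split is common practice; the report does not state which split the scaler was fitted on.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
# Hypothetical data: price depends linearly on area and bedrooms, plus noise
area = rng.uniform(2000, 9000, size=200)
bedrooms = rng.integers(1, 6, size=200)
x = np.column_stack([area, bedrooms])
y = 500 * area + 300000 * bedrooms + rng.normal(0, 100000, size=200)

train_x, test_x, train_y, test_y = train_test_split(
    x, y, test_size=0.3, random_state=42
)

# Learn min/max from the training split, then apply the same mapping to the test split
scaler = MinMaxScaler()
train_x_scaled = scaler.fit_transform(train_x)
test_x_scaled = scaler.transform(test_x)

model = LinearRegression()
model.fit(train_x_scaled, train_y)  # least-squares estimate of the coefficients
print("coefficients:", model.coef_, "intercept:", model.intercept_)
print("test R²:", model.score(test_x_scaled, test_y))
```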
MAE (Mean Absolute Error) is another metric; it measures the average of the absolute differences between
the actual and predicted values. Unlike MSE, MAE does not square the differences, which makes it less
sensitive to large errors (outliers). The code calculates the MAE, which gives the average size of the
errors between the predicted and actual values without giving extra weight to larger errors.
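The difference between MAE and MSE shows up in a tiny hand-checkable case. The helpers mae and mse are hypothetical names. They compute the same quantities as sklearn.metrics.mean_absolute_error and mean_squared_error.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean(np.abs(actual - predicted))


def mse(actual, predicted):
    """Mean squared error: average of (actual - predicted) ** 2."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return np.mean((actual - predicted) ** 2)


actual = [3, 5, 7]
predicted = [2, 5, 10]             # errors: 1, 0, -3
print(mae(actual, predicted))      # 4/3, about 1.333
print(mse(actual, predicted))      # 10/3, about 3.333
```

The single error of 3 contributes 9 of the 10 units in the MSE sum, but only 3 of the 4 units in the MAE sum.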
This cell runs the KNN model on the test data (test_x) and stores the predicted house prices in the
variable knn_predictions. These predictions can be further evaluated (e.g., using Mean Squared Error or
Mean Absolute Error) to assess the performance of the model.
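A minimal KNN sketch on a few hypothetical, already-scaled rows. n_neighbors=2 is an assumed value, since the report does not state k. KNeighborsRegressor predicts the mean target of the k nearest training rows.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical scaled features (e.g. area, bedrooms) and house prices
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.5, 0.5], [0.9, 0.8], [1.0, 1.0]])
train_y = np.array([3.0e6, 3.4e6, 5.0e6, 7.6e6, 8.0e6])
test_x = np.array([[0.05, 0.1], [0.95, 0.9]])
test_y = np.array([3.2e6, 7.9e6])

knn = KNeighborsRegressor(n_neighbors=2)  # k = 2 is an assumption
knn.fit(train_x, train_y)
knn_predictions = knn.predict(test_x)     # mean price of the 2 nearest rows

print(knn_predictions)                    # [3200000. 7800000.]
print("MAE:", mean_absolute_error(test_y, knn_predictions))
print("MSE:", mean_squared_error(test_y, knn_predictions))
```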
This cell visualizes the performance of the KNN model by plotting a scatter
plot of actual vs. predicted house prices. The blue dots represent the
predicted vs. actual values, and the red dashed line represents the ideal fit
where predicted values match the actual values exactly.
If the points lie close to the red line, the model is making accurate
predictions. The plot offers a quick visual check of the model's
prediction performance.
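A sketch of the actual-vs-predicted plot described above, using hypothetical prices. The Agg backend lets it run without a display; in a notebook that line can be dropped and plt.show() used instead.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical actual and predicted prices
test_y = np.array([3.2e6, 4.5e6, 6.1e6, 7.9e6])
knn_predictions = np.array([3.0e6, 4.8e6, 5.9e6, 7.5e6])

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(test_y, knn_predictions, color="blue", alpha=0.6,
           label="Predicted vs. actual")

# Ideal fit (predicted == actual), spanning the range of both arrays
low = min(test_y.min(), knn_predictions.min())
high = max(test_y.max(), knn_predictions.max())
ax.plot([low, high], [low, high], color="red", linestyle="--", label="Ideal fit")

ax.set_xlabel("Actual price")
ax.set_ylabel("Predicted price")
ax.set_title("KNN: actual vs. predicted house prices")
ax.legend()
```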
COMPARING THE MODELS' EFFICIENCY
The R-squared (R²) score is used to assess how well the models
predict the target variable (house prices). It measures the proportion
of the variance in the target variable that is explained by the model.
A bar plot is generated to visually compare the R² scores of both
models, helping to identify which model explains more of the variance
in the data.
The model with the higher R² score is the better performer at
predicting house prices.
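The R² comparison can be sketched with the definition R² = 1 - SS_res / SS_tot, which sklearn.metrics.r2_score also computes. The test targets and both prediction arrays are made-up values, not the report's results.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def r2(actual, predicted):
    """R² = 1 - (sum of squared residuals) / (total sum of squares)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical test targets and predictions from the two models
test_y = np.array([3.0, 5.0, 7.0, 9.0])
lr_predictions = np.array([3.2, 4.9, 7.1, 8.8])
knn_predictions = np.array([4.0, 5.0, 6.0, 8.0])

scores = {"Linear Regression": r2(test_y, lr_predictions),
          "KNN": r2(test_y, knn_predictions)}

fig, ax = plt.subplots()
ax.bar(list(scores), list(scores.values()), color=["steelblue", "orange"])
ax.set_ylabel("R² score")
ax.set_title("Model comparison")

best = max(scores, key=scores.get)  # higher R² explains more variance
print(scores, "best:", best)
```

Here SS_tot = 20. Linear regression leaves SS_res = 0.1 (R² = 0.995) and KNN leaves SS_res = 3 (R² = 0.85).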
OBJECTIVES
DETECT BOTTLENECKS IN PRODUCTION PROCESSES
Definition of Bottlenecks: Bottlenecks in production processes refer to
steps or stages where the flow of operations slows down or gets delayed,
thus limiting the overall throughput of the process. In data processing,
bottlenecks can occur in various stages, such as data loading,
preprocessing, model training, and prediction, which can significantly
affect performance.
This function measures the execution time of any function (func) passed to
it. It records the time before and after the function is executed, calculates
the duration, and returns the result of the function along with the time taken.
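A minimal version of such a timing helper. measure_time is a hypothetical name. The sketch uses time.perf_counter, a monotonic high-resolution clock suited to measuring short durations; the report's helper may use a different clock.

```python
import time


def measure_time(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


total, seconds = measure_time(sum, range(1_000_000))
print(total, f"{seconds:.4f} s")  # total is 499999500000
```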
This section measures the time taken to load the data from the specified
CSV file using pd.read_csv(). Data loading can sometimes be a bottleneck,
especially with large datasets.
This section measures the time spent on preprocessing the data, including
checking for missing values, duplicates, and generating descriptive
statistics. Data preprocessing can take considerable time, depending on
the dataset's size and complexity.
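The loading and preprocessing measurements can be sketched with the same kind of timing helper (hypothetical name measure_time). The CSV text is generated in memory from random, hypothetical values in place of Housing.csv. preprocess is also a hypothetical name; it bundles the missing-value, duplicate and summary checks that the report lists.

```python
import io
import time

import numpy as np
import pandas as pd


def measure_time(func, *args, **kwargs):
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical data serialized to CSV text, standing in for the real file
rng = np.random.default_rng(1)
frame = pd.DataFrame({"price": rng.integers(2_000_000, 10_000_000, 5_000),
                      "area": rng.integers(1_500, 12_000, 5_000),
                      "bedrooms": rng.integers(1, 6, 5_000)})
csv_text = frame.to_csv(index=False)

# Stage 1: loading
df, load_time = measure_time(pd.read_csv, io.StringIO(csv_text))


def preprocess(data):
    """Missing values per column, duplicate row count, descriptive statistics."""
    return data.isnull().sum(), data.duplicated().sum(), data.describe()


# Stage 2: preprocessing
(missing, duplicates, summary), preprocess_time = measure_time(preprocess, df)
print(f"load: {load_time:.4f} s, preprocessing: {preprocess_time:.4f} s")
```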
This section measures the time taken to scale the features of the dataset
using the MinMaxScaler. Scaling is important in machine learning,
especially for models sensitive to feature scaling (like KNN or gradient-
based algorithms), but it can be time-consuming for large datasets.
This section measures the time taken to train a linear regression model on
the training data. Model training is often one of the most time-consuming
steps, especially with complex models and large datasets.
This section measures the time taken to make predictions using the trained
model on the test data. Prediction times can also be an issue, especially in
real-time applications.
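Scaling, training and prediction can be timed the same way. The sketch uses synthetic features with a 3,500/1,500 train/test split (hypothetical sizes). The stage names are labels chosen for illustration.

```python
import time

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler


def measure_time(func, *args, **kwargs):
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


rng = np.random.default_rng(2)
train_x = rng.uniform(0, 10_000, size=(3_500, 5))   # hypothetical features
weights = np.array([300.0, 50.0, 20.0, 10.0, 5.0])
train_y = train_x @ weights + rng.normal(0, 1_000, 3_500)
test_x = rng.uniform(0, 10_000, size=(1_500, 5))

scaler = MinMaxScaler()
train_x_scaled, scaling_time = measure_time(scaler.fit_transform, train_x)
test_x_scaled = scaler.transform(test_x)

model = LinearRegression()
_, training_time = measure_time(model.fit, train_x_scaled, train_y)
predictions, prediction_time = measure_time(model.predict, test_x_scaled)

times = {"Scaling": scaling_time, "Training": training_time,
         "Prediction": prediction_time}
print({stage: f"{t:.4f} s" for stage, t in times.items()})
```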
The bar plot compares the time taken for each stage of the machine
learning pipeline. The longer the time, the more likely it is a bottleneck.
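A sketch of the stage-time bar plot that also picks out the slowest stage and highlights it. plot_stage_times is a hypothetical helper. The example timings are made-up values, not measurements from the report.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt


def plot_stage_times(times):
    """Bar chart of seconds per pipeline stage; returns (figure, slowest stage)."""
    fig, ax = plt.subplots(figsize=(8, 4))
    bars = ax.bar(list(times), list(times.values()), color="teal")
    ax.set_ylabel("Execution time (s)")
    ax.set_title("Time per pipeline stage")
    slowest = max(times, key=times.get)
    bars[list(times).index(slowest)].set_color("red")  # likely bottleneck
    return fig, slowest


# Made-up values for illustration only
example_times = {"Loading": 0.012, "Preprocessing": 0.030, "Scaling": 0.002,
                 "Training": 0.004, "Prediction": 0.001}
fig, slowest = plot_stage_times(example_times)
print("Likely bottleneck:", slowest)
```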
After visualizing the execution times for each step in the machine learning
pipeline, it appears that preprocessing is the stage that takes the most
time.
The relevant features for prediction are chosen (area, bedrooms,
bathrooms, stories, parking, and price), and the target variable is set to
defects.
o The data is split into training and testing sets (70% for training,
30% for testing). This allows the model to be trained on one
subset of the data and evaluated on another to assess its
performance.
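The feature selection and 70/30 split can be sketched as follows. The report does not show how its defects column is derived. The random values here, including defects, are hypothetical placeholders used only to show the mechanics.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 100
# Hypothetical data; 'defects' stands in for the report's defect column
df = pd.DataFrame({
    "area": rng.integers(1_500, 12_000, n),
    "bedrooms": rng.integers(1, 6, n),
    "bathrooms": rng.integers(1, 4, n),
    "stories": rng.integers(1, 5, n),
    "parking": rng.integers(0, 4, n),
    "price": rng.integers(2_000_000, 10_000_000, n),
    "defects": rng.integers(0, 5, n),
})

features = ["area", "bedrooms", "bathrooms", "stories", "parking", "price"]
X = df[features]
y = df["defects"]

# 70% training / 30% testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
print(X_train.shape, X_test.shape)  # (70, 6) (30, 6)
```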
Step 4: Train the Linear Regression Model
1. Model Training:
2. Predicting Defects:
o The trained model is used to predict the defects on the test data
(X_test).
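Training the linear regression model and predicting defects on X_test might look like the sketch below. The synthetic target assumes a noisy linear relationship with two of the features; this is an assumption made only so the example has a learnable pattern. Linear regression yields continuous predictions, so integer defect counts would need rounding afterwards.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 200
features = ["area", "bedrooms", "bathrooms", "stories", "parking", "price"]
X = pd.DataFrame({
    "area": rng.uniform(1_500, 12_000, n),
    "bedrooms": rng.integers(1, 6, n),
    "bathrooms": rng.integers(1, 4, n),
    "stories": rng.integers(1, 5, n),
    "parking": rng.integers(0, 4, n),
    "price": rng.uniform(2e6, 1e7, n),
})[features]
# Synthetic defect score: assumed linear relationship plus noise
y = 5 - 0.0002 * X["area"] + 0.8 * X["stories"] + rng.normal(0, 0.2, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)      # model training
y_pred = model.predict(X_test)   # predicting defects on the test data
print("MAE:", mean_absolute_error(y_test, y_pred))
```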
Steps Taken:
Key Insights:
The code visualizes the number of defects detected in each of the three
defect categories: Low Price, Inefficient Layout, and Missing Features.
This is done using a bar chart, which gives a clear and intuitive
representation of the defects across different categories.
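The report does not show how each property is assigned to a defect category. This sketch therefore assumes one hypothetical boolean flag column per category, counts the flagged properties, and draws the bar chart.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-property flags, one column per defect category
flags = pd.DataFrame({
    "Low Price": [True, False, True, False, False],
    "Inefficient Layout": [False, False, True, True, False],
    "Missing Features": [True, True, True, False, True],
})

defect_counts = flags.sum()  # True counts as 1
fig, ax = plt.subplots()
ax.bar(defect_counts.index, defect_counts.values,
       color=["tomato", "gold", "slateblue"])
ax.set_ylabel("Number of properties flagged")
ax.set_title("Defects detected per category")
print(defect_counts)
```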
Actionable Insight:
Parking Analysis:
Actionable Insights:
Machine Downtime: This metric reflects how much of the machine's time is spent idle. In the
given example, the downtime is simply the time remaining after the active tasks are completed:
Downtime (%) = 100% - Machine Utilization (%)
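The utilization and downtime arithmetic can be sketched as follows. The report does not say how total time is defined, so this assumes a fixed observation window. The 7 s active / 20 s total inputs are hypothetical, chosen to reproduce the 35% / 65% split discussed in this section.

```python
def utilization_and_downtime(active_seconds, total_seconds):
    """Return (utilization %, downtime %), where downtime = 100 - utilization."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    utilization = 100.0 * active_seconds / total_seconds
    return utilization, 100.0 - utilization


# Hypothetical: 7 s of training + prediction within a 20 s window
print(utilization_and_downtime(7.0, 20.0))  # (35.0, 65.0)
```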
Final Insights:
Machine Utilization: The machine was actively used for training and
prediction for 35% of the time. Utilization is a key performance
indicator: a low percentage suggests potential inefficiencies, meaning
the machine could be used more productively.
Machine Downtime: The machine was idle for 65% of the time,
indicating room for optimization. This downtime could be due to
factors such as unoptimized processes or delays in the training and
prediction phases.
Actionable Recommendations:
Explanation of Simulation:
2. Simulation Steps:
3. Investment Decisions: Real estate investors can use this information to make
more informed decisions. It allows them to assess which types of properties
(based on bedroom count) offer the best value or return on investment.
4. Price Forecasting: The chart provides a reference for typical prices by number
of bedrooms, which can support price estimates by appraisers and market
analysts.
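The bedroom chart rests on a group-by average. A sketch with hypothetical rows shows how it is built: groupby("bedrooms")["price"].mean() gives one average price per bedroom count, which is then drawn as a bar chart.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows standing in for the housing data
df = pd.DataFrame({
    "bedrooms": [2, 2, 3, 3, 3, 4, 4],
    "price": [3.0e6, 3.4e6, 4.0e6, 4.6e6, 5.2e6, 6.0e6, 7.0e6],
})

avg_price = df.groupby("bedrooms")["price"].mean()  # one average per bedroom count

fig, ax = plt.subplots()
ax.bar(avg_price.index.astype(str), avg_price.values, color="seagreen")
ax.set_xlabel("Bedrooms")
ax.set_ylabel("Average price")
ax.set_title("Average price by number of bedrooms")
print(avg_price)
```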
2. Market Insights: By displaying the average prices for each category, you gain
insights into how location (whether the property is in a preferred area, prefarea)
affects the price.
3. Decision Making: For both buyers and sellers, understanding the impact of
location on pricing can help in making informed decisions about purchasing or
listing properties.
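A sketch of the location comparison on hypothetical rows. The average price is computed for each prefarea value and plotted. The relative difference between the two averages is one illustrative way to quantify the preferred-area effect; the report does not state that it computes this figure.

```python
import matplotlib

matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; prefarea is "yes" when the property is in a preferred area
df = pd.DataFrame({
    "prefarea": ["yes", "no", "no", "yes", "no"],
    "price": [6.0e6, 3.5e6, 4.0e6, 7.0e6, 4.5e6],
})

avg_by_area = df.groupby("prefarea")["price"].mean()  # index sorted: "no", "yes"
premium = avg_by_area["yes"] / avg_by_area["no"] - 1   # relative price difference

fig, ax = plt.subplots()
ax.bar(avg_by_area.index, avg_by_area.values, color=["gray", "darkorange"])
ax.set_xlabel("Preferred area (prefarea)")
ax.set_ylabel("Average price")
ax.set_title("Average price by location")
print(avg_by_area)
print("relative difference:", premium)  # 0.625 for these rows
```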