Final Project Report
Introduction
1.1 Overview
The Heart Disease Prediction project leverages machine learning to assess an individual’s risk of
heart disease based on medical and lifestyle factors. It uses a dataset with key attributes like age,
cholesterol levels, blood pressure, and more to train predictive models such as logistic regression,
decision trees, or neural networks. The system provides a user-friendly interface for input and
analysis, helping medical professionals and individuals make informed decisions. This project aims
to enhance early detection, reduce risks, and support healthcare by improving diagnostic accuracy
and preventive care.
Heart disease remains one of the leading causes of mortality worldwide. Early detection and timely
medical intervention are crucial in reducing the risk and severity of heart-related complications. In
this project, we aim to develop a Heart Disease Prediction System using Machine Learning
techniques to assist in the early identification of patients who are at risk.
This project leverages a dataset containing medical parameters such as age, gender, blood pressure,
cholesterol levels, maximum heart rate, chest pain type, and other clinical attributes. By applying
various machine learning algorithms on this data, we build a model that can accurately predict the
presence of heart disease based on input health metrics.
To analyze the correlation between various medical parameters and heart disease.
To evaluate model accuracy and performance using a confusion matrix and accuracy score, with
heatmaps to visualize feature correlations.
To provide a user-friendly interface that takes patient data as input and displays prediction
results visually through probability charts and graphs.
Page | 1
Department of Information Technology, NSCOEM, Korti-Pandharpur
Heart Disease Prediction
The objective of the Heart Disease Prediction project is to develop a reliable system that utilizes
machine learning algorithms to predict the likelihood of heart disease in individuals. By analysing
medical data such as age, blood pressure, cholesterol levels, and lifestyle factors, the system aims
to provide early detection and risk assessment. This project helps healthcare professionals make
informed decisions, enabling timely intervention and prevention. Additionally, it enhances
awareness among individuals about their heart health, ultimately reducing mortality rates through
proactive measures and improved healthcare management.
a) Operating System:
Because Python and its machine learning libraries are cross-platform, the project can be
developed and run on Windows, Linux, or macOS.
Heart disease is one of the leading causes of death worldwide. Early detection can help in taking
preventive measures and providing timely treatment. This project aims to predict the probability of
heart disease using Machine Learning (ML) by analyzing patient data stored in a CSV file.
1. Feasibility Study
The feasibility study evaluates whether the project is practical, achievable, and beneficial.
This project aims to predict the probability of heart disease using machine learning models
applied to patient data stored in a CSV file. The study considers multiple factors such as cost,
technology, and resources.
Technical Viability: Uses widely available Python libraries like Scikit-learn, Pandas, and
Flask/Streamlit for visualization.
Operational Viability: Can be implemented in hospitals, clinics, or as a personal health
assessment tool.
Economic Viability: Requires only a computer with Python installed, making it cost-
effective.
2. Economic Feasibility
Economic feasibility ensures that the project is financially viable and does not require
excessive investment.
Hardware Requirements: A basic computer (Windows/Linux/Mac) with at least 8GB RAM
is sufficient.
Software Requirements: Free and open-source tools such as Python, Jupyter Notebook, and
Scikit-learn reduce costs.
Implementation Costs: Minimal, as Python libraries are free. If deployed on a cloud server,
a small hosting fee may apply.
Return on Investment (ROI): High, as it helps in early disease detection, reducing
healthcare costs for patients and hospitals.
3. Technical Feasibility
Technical feasibility examines whether the technology used in the project is sufficient for
implementation.
Programming Language: Python (widely supported and has rich ML libraries).
Machine Learning Algorithms:
o Random Forest (for high accuracy)
o Logistic Regression (for probability estimation)
o Neural Networks (for advanced predictions)
Dataset: Uses CSV files with patient medical records.
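As a minimal sketch of how the three candidate model families listed above could be compared, the following uses scikit-learn with 5-fold cross-validation. The synthetic dataset from make_classification, the 13-feature count, and all hyperparameters are illustrative assumptions, not the project's actual data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient CSV (assumption): 300 rows, 13 features.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=0
)

# Logistic regression and the neural network are scaled inside a pipeline,
# so the scaler is re-fitted on the training folds only.
candidates = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Neural Network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    ),
}

scores = {}
for name, model in candidates.items():
    # Mean accuracy over 5 cross-validation folds.
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: mean accuracy = {scores[name]:.3f}")
```

Cross-validation is used here rather than a single split so that the comparison depends less on one particular partition of the data.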
4. Operational Feasibility
Operational feasibility determines whether the project can function effectively in real-world
scenarios.
User-Friendly Interface: A web-based interface using Flask/Streamlit makes it easy to use.
Accuracy & Performance: Achieves over 80% accuracy with proper ML model tuning.
Integration with Healthcare Systems: Can be connected to hospital databases for real-time
predictions.
Scalability: Can handle large patient datasets and be expanded with cloud deployment for
remote access.
5. Resource Availability
This section ensures that all necessary resources (hardware, software, and human expertise)
are available.
Hardware Requirements:
o Computer with at least 8GB RAM, i5/i7 processor (for faster model training).
o Optional: GPU (if using deep learning models).
Software & Tools:
o Python (Programming language)
o Scikit-Learn, Pandas, NumPy, Matplotlib (For ML and data analysis)
o Flask/Streamlit (For web-based UI)
6. Requirement Analysis
Functional Requirements:
o Import and process patient data from a CSV file
o Train machine learning models
o Predict the probability of heart disease
o Display results in a readable format (text/graph)
Non-Functional Requirements:
o Accuracy and reliability of predictions
o Fast processing time
o Easy-to-use interface (CLI or Web UI)
o Secure handling of patient data (if sensitive data is used)
7. Design Modules
Cost Estimation
Software cost comprises a relatively small portion of the overall cost of a computer-based heart
disease prediction system. However, it is a crucial component that needs accurate estimation for
efficient resource allocation and project planning.
A number of factors were taken into account while estimating the cost of this project, including:
o Human Resources (developers, data scientists, testers),
o Technical Requirements (skills, tools, libraries),
o Hardware and Software Availability (laptops, internet, required software),
o Project Complexity and Model Evaluation Time.
1) Effort Estimation
Effort estimation refers to the total person-hours required for the successful completion of the
Heart Disease Prediction project. This includes activities such as:
o Requirement gathering and analysis
o Dataset collection and preprocessing
o Model selection and training
o Front-end and back-end development
o Integration of ML model with the application
o Testing and debugging
o Preparation of documentation and user manual
The hardware requirement estimation includes the cost of all physical components used during
the development of the Heart Disease Prediction system. This consists of the following:
Development Machines:
o Personal Computers with minimum configuration:
Intel i5 Processor
8 GB RAM
512 GB SSD
GPU (optional for model training acceleration)
Internet Connectivity:
o Required for dataset download, library installations, and deployment.
DFD Level 0
DFD Level 1
ER-Diagram
+--------------+ +-------------+
| Patient |◄────────────┐ | Visualization|
+--------------+ │ +-------------+
/ Attributes \ │ ▲
/ ID, Age, BP... \ │ │
/ \ ▼ │
+----------------+ +------------+ "Generated From"
| CSV Database |──────▶| Prediction |◄──────────────────┐
+----------------+ +------------+ │
▲ │
│ "Has" │
▼ │
+---------+ │
| Model |◄─────────────────────┘
+---------+
(Algorithm, Accuracy, etc.)
❖ Activity Diagram:
System Implementation
Field Name | Data Type | Description
age        | Integer   | Age of the patient in years
sex        | Integer   | Gender (1 = male; 0 = female)
cp         | Integer   | Chest pain type (0 = typical angina, 1 = atypical angina, 2 = non-anginal pain, 3 = asymptomatic)
trestbps   | Integer   | Resting blood pressure (in mm Hg)
chol       | Integer   | Serum cholesterol in mg/dl
fbs        | Integer   | Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
restecg    | Integer   | Resting electrocardiographic results (0 = normal, 1 = ST-T abnormality, 2 = left ventricular hypertrophy)
thalach    | Integer   | Maximum heart rate achieved
exang      | Integer   | Exercise-induced angina (1 = yes; 0 = no)
oldpeak    | Float     | ST depression induced by exercise relative to rest
slope      | Integer   | Slope of the peak exercise ST segment (0 = upsloping, 1 = flat, 2 = downsloping)
ca         | Integer   | Number of major vessels colored by fluoroscopy (0–3)
thal       | Integer   | Thalassemia (1 = normal, 2 = fixed defect, 3 = reversible defect)
target     | Integer   | Diagnosis of heart disease (0 = no disease; 1 = presence of disease)
b. Data Preprocessing
Missing Value Handling: Fill/drop missing or null values.
Encoding: Convert categorical data to numerical form using label encoding.
Feature Scaling: Normalize data using StandardScaler or MinMaxScaler.
Splitting: Dataset is split into training and test sets using train_test_split().
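The preprocessing steps above can be sketched with pandas and scikit-learn as follows. The small in-memory table, the text-valued sex column, and the 25% test size are assumptions made for illustration; the project itself reads its data from a CSV file.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny illustrative frame (assumption); the real project reads a CSV file.
df = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 45, None, 52],
    "sex": ["male", "male", "female", "male", "female", "male", "female", "male"],
    "chol": [233, 250, 204, 236, 354, 192, 294, 199],
    "target": [1, 1, 1, 1, 0, 0, 0, 0],
})

# 1. Missing values: drop incomplete rows (filling with a median is an alternative).
df = df.dropna().reset_index(drop=True)

# 2. Encoding: map the text category to integers (female -> 0, male -> 1).
df["sex"] = LabelEncoder().fit_transform(df["sex"])

# 3. Splitting: hold out 25% of the rows, stratified to keep the class balance.
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# 4. Scaling: fit on the training rows only, then apply to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

In this sketch the scaler is fitted after the split, so that statistics from the test rows do not leak into training; this ordering is a design choice of the example rather than something the report specifies.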
d. Prediction Module
Takes the processed user input.
Loads the trained model.
Predicts whether the person is likely to have heart disease.
Returns result as:
o "Positive" (Likely to have heart disease)
o "Negative" (Unlikely to have heart disease)
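A minimal sketch of the prediction flow described above is given below. The function name predict_label, the 0.5 threshold, the in-memory pickle round trip (standing in for a saved model file), and the synthetic training data are all hypothetical choices for illustration.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def predict_label(model, features, threshold=0.5):
    """Return ("Positive" or "Negative", probability of heart disease)."""
    # predict_proba returns one row per sample; column 1 is the probability of class 1.
    probability = float(model.predict_proba([features])[0][1])
    label = "Positive" if probability >= threshold else "Negative"
    return label, probability


# Stand-in for the trained model (assumption): fitted here on synthetic data.
X, y = make_classification(n_samples=200, n_features=13, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# "Load the trained model": a real system would read these bytes from disk.
saved = pickle.dumps(model)
loaded = pickle.loads(saved)

label, probability = predict_label(loaded, X[0])
print(label, round(probability, 3))
```

Returning the probability alongside the label lets the interface show both the "Positive"/"Negative" result and the underlying risk estimate.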
e. Visualization Module
Bar Chart: Displays prediction probability.
Heatmap: Shows correlation between different medical features (using seaborn or
matplotlib).
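The correlation table behind such a heatmap can be computed with pandas, as in the sketch below. The six rows of values are invented for illustration, and the seaborn call is shown only as a comment because plotting is optional here.

```python
import pandas as pd

# Illustrative values (assumption); the real project reads these columns from its CSV file.
df = pd.DataFrame({
    "age": [63, 37, 41, 56, 57, 45],
    "thalach": [150, 187, 172, 178, 163, 148],
    "chol": [233, 250, 204, 236, 354, 192],
    "target": [1, 1, 1, 1, 0, 0],
})

# Pairwise Pearson correlation; the result is a square, symmetric table
# with 1.0 on the diagonal.
corr = df.corr()
print(corr.round(2))

# With seaborn installed, the table could be drawn as a heatmap:
#   import seaborn as sns
#   sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
```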
o Accuracy, Precision, Recall, F1-score
Confusion matrix visualized to show true positives (TP), false positives (FP), true negatives
(TN), and false negatives (FN).
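For reference, the four metrics can be derived directly from the confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from this project.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of all predicted positives, how many were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all actual positives, how many were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In a screening setting, recall is often watched most closely, since a false negative means a patient with heart disease is missed.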
2. Correlation Module
3. Accuracy Module
3.3 Design
2. Correlation Module
3. Accuracy Module
Testing
4. Testing report
4.1 Importance of Testing
Testing is a critical phase in software development, ensuring that the system performs as
expected and is free of bugs. In the Heart Disease Prediction project, testing is vital to:
o Validate the accuracy of predictions made by the machine learning model.
o Ensure data is correctly read and processed from the CSV file.
o Verify that visual components (bar chart, heat map) work correctly.
o Confirm that the application runs without errors and handles edge cases.
o Build confidence in the system’s reliability and accuracy in real-world scenarios.
o Black Box Testing: Testing without knowledge of the internal code, used to validate
predictions and output visuals.
o White Box Testing: Testing with an understanding of the internal logic, used to verify the
machine learning model logic and data flow.
o Unit Testing: Individual components (e.g., data loading, model prediction, accuracy function)
are tested separately.
o Integration Testing: Ensures that the modules (data, model, and visualization) work together
seamlessly.
TC ID | Test Case Description       | Input                 | Expected Output                                 | Status
TC_01 | Load CSV file successfully  | Valid heart.csv file  | Data frame loads with correct number of records | Pass
TC_02 | Handle file not found error | Invalid file path     | Display error message                           | Pass
TC_03 | Handle missing/null values  | CSV with missing data | Nulls handled or rows dropped                   | Pass
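The three test cases above could be automated along the following lines. This sketch swaps in Python's standard csv module for the data frame loader so that it runs without extra dependencies; the function load_patient_rows, the file names, and the sample rows are hypothetical.

```python
import csv
import os
import tempfile


def load_patient_rows(path):
    """Load a CSV file and drop rows that contain empty cells.

    Returns (rows, error_message); exactly one of the two is None.
    """
    try:
        with open(path, newline="") as handle:
            rows = list(csv.DictReader(handle))
    except FileNotFoundError:
        return None, f"File not found: {path}"
    # Keep only rows where every cell has a non-empty value.
    complete = [
        row for row in rows
        if all(value not in (None, "") for value in row.values())
    ]
    return complete, None


# Build a small temporary CSV with one incomplete row (hypothetical data).
with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "heart.csv")
    with open(csv_path, "w", newline="") as handle:
        handle.write("age,chol,target\n63,233,1\n37,,1\n57,354,0\n")

    # TC_01 and TC_03: the file loads and the incomplete row is dropped.
    rows, error = load_patient_rows(csv_path)
    # TC_02: a missing file yields an error message instead of an exception.
    missing_rows, missing_error = load_patient_rows(os.path.join(folder, "nope.csv"))

print(len(rows), error, missing_rows, missing_error is not None)
```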
Module 3: Visualization
Future Scope
The Heart Disease Prediction system using Python and Machine Learning presents great potential for
growth and improvement. Below are some of the key directions for enhancing the project:
Conclusion
The Heart Disease Prediction project effectively demonstrates how Python and Machine Learning
algorithms can be utilized to analyse patient health data and predict the probability of heart disease.
The system leverages a CSV dataset containing various patient health measurements such as age,
cholesterol level, blood pressure, chest pain type, and more.
By preprocessing the data and applying a suitable Machine Learning model, the system predicts
whether a patient is likely to have heart disease. A bar chart is used to visualize the number of
patients with and without heart disease, providing a clear graphical insight into the dataset.
Additionally, a heatmap is generated to show the correlation between different medical features,
helping to understand which factors most influence heart disease prediction.
Finally, the system evaluates the performance of the model by calculating and displaying its accuracy
score, which gives an indication of how well the model is performing on unseen data.
Overall, this project highlights the importance and effectiveness of machine learning in the healthcare
domain, offering a supportive tool for early diagnosis and better decision-making in the prevention and
treatment of heart disease.