Data Science
Data Science
RESEARCH METHODOLOGY
COM417
LECTURER:
INTRODUCTION
In today's fast-paced and data-rich world, the field of data science has emerged as a powerful means to
extract valuable insights from the sea of information. Data science combines various techniques from
statistics, machine learning, and domain expertise to decipher patterns, make predictions, and drive
informed decision-making. As the volume of data generated continues to skyrocket, data science plays a
pivotal role in uncovering hidden trends and transforming raw data into actionable knowledge.
Data science is not just about working with data; it's about understanding the nuances within it. By
employing a systematic approach, data scientists can navigate the complexities of vast datasets, distilling
meaningful information that was previously hidden beneath the surface. This multidisciplinary field
involves collecting, cleaning, and transforming data, followed by exploratory analysis to identify trends,
anomalies, and correlations.
The heart of data science lies in its ability to construct predictive and prescriptive models. Leveraging
techniques like machine learning, data scientists can build models that learn from historical data to predict
future outcomes or recommend optimal courses of action. These models are adaptable and constantly
improving as new data becomes available.
The applications of data science are diverse and far-reaching. From business and finance to healthcare and
social sciences, virtually every industry benefits from the insights generated by data analysis. For
businesses, data science shapes marketing strategies, optimizes operations, and enhances customer
experiences. In healthcare, it aids in diagnosing diseases, predicting outbreaks, and improving patient care.
In finance, it drives investment decisions, risk assessment, and fraud detection. Moreover, data science fuels
scientific research and societal understanding by analyzing complex patterns in social behavior and natural
phenomena.
However, with great power comes great responsibility. Ethical considerations are paramount in the practice
of data science. Issues related to privacy, bias, and transparency must be addressed to ensure the ethical use
of data and algorithms. Striking the right balance between innovation and accountability is essential to
harness the full potential of data science without compromising societal trust.
DEFINITION
Data science is an interdisciplinary field that
combines scientific methods, algorithms,
processes, and systems to extract valuable
insights, knowledge, and actionable information
from large and complex datasets. It involves the
integration of techniques from statistics,
mathematics, computer science, domain
expertise, and data engineering to address real-
world challenges, make informed decisions, and
uncover hidden patterns within the vast sea of
digital information. As an evolving discipline,
data science plays a pivotal role in transforming
raw data into meaningful narratives, propelling innovation, and driving evidence-based advancements
across various industries in the digital age.
Where,
y is the dependent variable whose value we want to predict.
x is the independent variable whose values are used for predicting the dependent variable.
b0 and b1 are constants in which b0 is the Y-intercept and b1 is the slope.
The main aim of this method is to find the value of b0 and b1 to find the best fit line that will be covering
or will be nearest to most of the data points.
2. Logistic Regression
Linear Regression is always used for representing the relationship between some continuous values.
However, contrary to this Logistic Regression works on discrete values. Logistic regression finds the most
common application in solving binary classification problems, that is, when there are only two possibilities
of an event, either the event will occur or it will not occur (0 or 1). Thus, in Logistic Regression, we convert
the predicted values into such values that lie in the range of 0 to 1 by using a non-linear transform function
which is called a logistic function.
The logistic function results in an S-shaped curve
and is therefore also called a Sigmoid function given
by the equation,
?(x) = 1/1+e^-x
data science algorithm - logistic regressions
The equation of Logistic Regression is,
P(x) = e^(b0+b1x)/1 + e^(b0+b1x)
Where b0 and b1 are coefficients and the goal of
Logistic Regression is to find the value of these
coefficients.
3. Decision Trees
Decision trees help in solving both classification and prediction problems. It makes it easy to understand
the data for better accuracy of the predictions. Each node of the Decision tree represents a feature or an
attribute, each link represents a decision and each leaf node holds a class label, that is, the outcome.
5. KNN
KNN stands for K-Nearest Neighbors. This Data Science algorithm employs both classification and
regression problems.
The KNN algorithm considers the complete
dataset as the training dataset. After training
the model using the KNN algorithm, we try
to predict the outcome of a new data point.
7. Random Forests
Random Forests overcomes the overfitting problem of decision trees and helps in solving both classification
and regression problems. It works on the principle of Ensemble learning.
The Ensemble learning methods believe that a large
number of weak learners can work together for giving
high accuracy predictions.
Random Forests work in a much similar way. It
considers the prediction of a large number of
individual decision trees for giving the final outcome.
It calculates the number of votes of predictions of
different decision trees and the prediction with the
largest number of votes becomes the prediction of the
model.
Let us understand this by an example. random forests data science algorithms. In the above image, there
are two classes labeled as A and B. In this random forest consisting of 7 decision trees, 3 have voted for
class A and 4 voted for class B. As class B has received the maximum votes thus the model’s prediction
will be class B.
REFREENCE
1. Techvidvan. Top 10 Data Science Algorithms You Must Know About:
https://techvidvan.com/tutorials/data-science-algorithms/
2. International Journal of Advanced Research in Science, Communication and Technology (IJARSCT):
https://www.researchgate.net/publication/368879805_Big_Data_Analytics_Unveiling_Insights_and_Opp
ortunities
3. Unveiling Insights in the Digital Age: The Power and Promise of Data Science, by By Diya Singla:
https://vocal.media/education/unveiling-insights-in-the-digital-age-the-power-and-promise-of-data-
science
4. From Data to Insights: The Role of Data Science in Today's Digital Age, by MS Concept:
https://www.linkedin.com/pulse/from-data-insights-role-science-todays-digital-age-ml-concepts-com/
5. From Data to Insights: The Role of Data Science in Today's Digital Age:
https://www.simplilearn.com/tutorials/data-science-tutorial/what-is-data-science