Regression Analysis in Machine Learning
We can understand the concept of regression analysis using the below example: suppose a
company runs advertisements every year and records the corresponding sales. Now, the
company wants to spend $200 on advertisement in the year 2019 and wants to know the
prediction about the sales for this year. To solve such prediction problems in machine
learning, we need regression analysis.
o Regression estimates the relationship between the target and the independent
variable.
o It is used to find the trends in data.
o It helps to predict real/continuous values.
o By performing the regression, we can confidently determine the most important
factor, the least important factor, and how each factor affects the other factors.
Types of Regression
There are various types of regression which are used in data science and machine
learning. Each type has its own importance in different scenarios, but at the core, all the
regression methods analyze the effect of the independent variables on the dependent
variable. Here we are discussing some important types of regression, which are given
below:
o Linear Regression
o Logistic Regression
o Polynomial Regression
o Support Vector Regression
o Decision Tree Regression
o Random Forest Regression
o Ridge Regression
o Lasso Regression
Linear Regression:
o Linear regression is a statistical regression method which is used for predictive
analysis.
o It is one of the simplest and easiest algorithms; it works on regression and
shows the relationship between continuous variables.
o It is used for solving the regression problem in machine learning.
o Linear regression shows the linear relationship between the independent variable
(X-axis) and the dependent variable (Y-axis), hence called linear regression.
o If there is only one input variable (x), then such linear regression is called simple
linear regression. And if there is more than one input variable, then such linear
regression is called multiple linear regression.
o The relationship between variables in the linear regression model can be expressed
mathematically. For example, we can predict the salary of an employee on the
basis of years of experience using:
Y = aX + b
Here, Y = dependent variable (target variable), X = independent variable
(predictor variable), and a and b are the linear coefficients.
Logistic Regression:
Logistic regression is used for solving classification problems, where the target variable is
categorical or discrete (for example, 0 or 1). It uses the sigmoid (logistic) function, and
when we provide the input values (data) to this function, it produces an S-shaped curve.
o It uses the concept of threshold levels: values above the threshold level are
rounded up to 1, and values below the threshold level are rounded down to 0 (see
the sketch after the list below).
Based on the categories of the target variable, there are three types of logistic regression:
o Binary (0/1, pass/fail)
o Multi (cats, dogs, lions)
o Ordinal (low, medium, high)
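As a minimal, hedged sketch of the sigmoid and thresholding idea described above (the sample scores and the 0.5 threshold are illustrative assumptions, not part of the original example):

import numpy as np

def sigmoid(z):
    # Standard logistic (sigmoid) function, which produces the S-shaped curve
    return 1 / (1 + np.exp(-z))

# Illustrative linear scores for a few observations (assumed values)
scores = np.array([-2.0, -0.3, 0.4, 3.1])
probabilities = sigmoid(scores)                    # values between 0 and 1
predictions = (probabilities >= 0.5).astype(int)   # above the threshold -> 1, below -> 0

print(probabilities)   # approximately [0.12 0.43 0.60 0.96]
print(predictions)     # [0 0 1 1]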
Polynomial Regression:
Note: This is different from Multiple Linear Regression in that, in Polynomial Regression,
a single variable is raised to different degrees (powers), instead of using multiple variables
each with the same degree.
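A minimal, hedged sketch of polynomial regression using scikit-learn is given below; the toy data and the choice of degree 2 are assumptions made only for illustration.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Illustrative non-linear data: y roughly follows the square of x
x = np.array([[1], [2], [3], [4], [5]])
y = np.array([1.2, 4.1, 9.3, 15.8, 25.4])

# Expand the single feature x into [x, x^2]: one variable with different degrees
poly = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly.fit_transform(x)

# Fit an ordinary linear model on the expanded polynomial features
model = LinearRegression()
model.fit(x_poly, y)

print(model.predict(poly.transform([[6]])))   # prediction for x = 6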
Support Vector Regression is a regression algorithm which works for continuous variables.
Below are some keywords which are used in Support Vector Regression:
o Hyperplane: the best-fit line (typically drawn in blue in illustrations) that covers the
maximum number of data points.
o Boundary lines: the two lines drawn on either side of the hyperplane, which define a
margin around it.
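A minimal, hedged sketch of Support Vector Regression with scikit-learn follows; the kernel, C, epsilon, and toy data are assumptions for illustration only.

import numpy as np
from sklearn.svm import SVR

# Toy continuous data for illustration
x = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([3.0, 4.5, 6.1, 7.9, 10.2, 12.1])

# epsilon sets the width of the margin between the hyperplane and the
# boundary lines; points outside that margin drive the fit
svr = SVR(kernel="rbf", C=100, epsilon=0.5)
svr.fit(x, y)

print(svr.predict([[7]]))   # prediction for a new observation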
o Decision Tree is a supervised learning algorithm which can be used for solving both
classification and regression problems.
o It can solve problems for both categorical and numerical data.
o Decision Tree regression builds a tree-like structure in which each internal node
represents a "test" on an attribute, each branch represents the result of the test,
and each leaf node represents the final decision or result.
o A decision tree is constructed starting from the root node/parent node (the dataset),
which splits into left and right child nodes (subsets of the dataset). These child nodes
are further divided into their own child nodes, and themselves become the parent
nodes of those nodes. A typical example of Decision Tree regression is a model that
predicts a person's choice between a sports car and a luxury car.
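The car-choice example above is categorical; as a hedged numeric illustration of decision tree regression, the sketch below uses scikit-learn's DecisionTreeRegressor on made-up data (the data and max_depth are assumptions).

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Illustrative data: predicting a price from a single numeric attribute
x = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([10, 12, 15, 20, 28, 35, 40, 44])

# Each internal node of the fitted tree tests the attribute against a
# threshold; each leaf stores the average target value of its subset
tree = DecisionTreeRegressor(max_depth=2, random_state=0)
tree.fit(x, y)

print(tree.predict([[5.5]]))   # follows the splits down to a leaf value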
o Random forest is one of the most powerful supervised learning algorithms which
is capable of performing regression as well as classification tasks.
o Random Forest regression is an ensemble learning method which combines
multiple decision trees and predicts the final output based on the average of each
tree's output. The combined decision trees are called base models, and the
combination can be represented more formally as:
g(x) = f0(x) + f1(x) + f2(x) + ....
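As a hedged sketch of this averaging idea, the example below uses scikit-learn's RandomForestRegressor on toy data; the number of trees and the data itself are assumptions.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data for illustration
x = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([10, 12, 15, 20, 28, 35, 40, 44])

# n_estimators decision trees (the base models) are trained on random
# subsets of the data; the forest prediction averages their outputs
forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(x, y)

print(forest.predict([[5.5]]))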
Ridge Regression:
o Ridge regression is one of the most robust versions of linear regression, in which a
small amount of bias is introduced so that we can get better long-term predictions.
o The amount of bias added to the model is known as the Ridge Regression penalty.
We can compute this penalty term by multiplying lambda with the squared weight
of each individual feature.
o The cost function for ridge regression therefore becomes:
Cost = sum of squared residuals + λ × (sum of the squared weights)
o A general linear or polynomial regression will fail if there is high collinearity
between the independent variables, so to solve such problems, Ridge regression
can be used.
o Ridge regression is a regularization technique, which is used to reduce the
complexity of the model. It is also called L2 regularization.
o It helps to solve problems where we have more parameters than samples. A short
code sketch is given after this list.
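A minimal, hedged sketch of ridge regression with scikit-learn; here alpha plays the role of lambda, and the collinear toy data is an assumption for illustration.

import numpy as np
from sklearn.linear_model import Ridge

# Two highly collinear features, a case where plain linear regression struggles
x = np.array([[1, 2.0], [2, 4.1], [3, 5.9], [4, 8.1], [5, 10.0]])
y = np.array([3.1, 5.9, 9.2, 11.8, 15.1])

# alpha scales the sum of squared weights added to the cost
# (the ridge penalty, i.e. L2 regularization)
ridge = Ridge(alpha=1.0)
ridge.fit(x, y)

print(ridge.coef_, ridge.intercept_)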
Lasso Regression:
o Lasso regression is another regularization technique used to reduce the complexity of the model.
o It is similar to Ridge regression, except that the penalty term contains only the absolute weights
instead of the squared weights; it is therefore also called L1 regularization.

Linear Regression in Machine Learning
Linear regression algorithm shows a linear relationship between a dependent (y) variable and
one or more independent (x) variables, hence it is called linear regression. Since linear
regression shows a linear relationship, it finds how the value of the dependent variable
changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship
between the variables. Mathematically, we can represent a linear regression as:
y= a0+a1x+ ε
Here,
The values for x and y variables are training datasets for Linear Regression model
representation.
Different values for the weights or coefficients of the line (a0, a1) give different lines of
regression, so we need to calculate the best values for a0 and a1 to find the best-fit line.
To calculate these, we use a cost function.
Cost function-
o The different values for weights or coefficient of lines (a0, a1) gives the different line
of regression, and the cost function is used to estimate the values of the coefficient
for the best fit line.
o Cost function optimizes the regression coefficients or weights. It measures how a
linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which
maps the input variable to the output variable. This mapping function is also known
as Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the
average of the squared errors between the predicted values and the actual values. For the
above linear equation, MSE can be calculated as:
MSE = (1/N) Σ (yi − (a1xi + a0))²
Where,
N = total number of observations, yi = actual value, and (a1xi + a0) = predicted value.
Residuals: The distance between an actual value and the corresponding predicted value is
called a residual. If the observed points are far from the regression line, the residuals will
be high, and so the cost function will be high. If the scatter points are close to the
regression line, the residuals will be small and hence the cost function will be small.
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the gradient of the
cost function.
o A regression model uses gradient descent to update the coefficients of the line by
reducing the cost function.
o It is done by randomly selecting initial values for the coefficients and then iteratively
updating them to reach the minimum of the cost function, as shown in the sketch below.
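The sketch below illustrates gradient descent on toy data by minimizing the MSE of y = a0 + a1x; the data, learning rate, and number of iterations are assumptions chosen only for demonstration.

import numpy as np

# Toy data roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

a0, a1 = 0.0, 0.0        # initial coefficient values
learning_rate = 0.01
n = len(x)

for _ in range(5000):
    y_pred = a0 + a1 * x
    error = y_pred - y
    # Gradients of MSE = (1/n) * sum((a0 + a1*x - y)^2) with respect to a0 and a1
    grad_a0 = (2 / n) * np.sum(error)
    grad_a1 = (2 / n) * np.sum(error * x)
    a0 -= learning_rate * grad_a0
    a1 -= learning_rate * grad_a1

print(a0, a1)   # should approach the intercept and slope of the best-fit line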
Model Performance:
The goodness of fit determines how well the line of regression fits the set of observations.
The process of finding the best model out of various models is called optimization. It can
be achieved by the below method:
1. R-squared method: R-squared is a statistical method that determines the goodness of
fit on a scale of 0 to 100%; it is also called the coefficient of determination.

Simple Linear Regression
The key point in Simple Linear Regression is that the dependent variable must be a
continuous/real value. However, the independent variable can be measured on
continuous or categorical values.
Simple Linear Regression has mainly two objectives:
o Model the relationship between the two variables, such as the relationship between
income and expenditure, or experience and salary.
o Forecast new observations, such as forecasting the weather according to temperature,
or the revenue of a company according to the investments in a year.
y= a0+a1x+ ε
Where,
a0= It is the intercept of the Regression line (it can be obtained by putting x = 0)
a1= It is the slope of the regression line, which tells whether the line is increasing or
decreasing.
ε = The error term. (For a good model it will be negligible)
Implementation of Simple Linear Regression Algorithm
using Python
Problem Statement example for Simple Linear Regression:
Here we are taking a dataset that has two variables: salary (dependent variable) and
experience (independent variable). The goals of this problem are:
o To find out if there is any correlation between these two variables.
o To find the best-fit line for the dataset.
o To see how the dependent variable changes as the independent variable changes.
In this section, we will create a Simple Linear Regression model to find out the best fitting
line for representing the relationship between these two variables.
To implement the Simple Linear regression model in machine learning using Python, we
need to follow the below steps:
The first step for creating the Simple Linear Regression model is data pre-processing. We
have already done it earlier in this tutorial. But there will be some changes, which are
given in the below steps:
o First, we will import the three important libraries, which will help us for loading the dataset,
plotting the graphs, and creating the Simple Linear Regression model.
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
o Next, we will load the dataset into our code:
data_set= pd.read_csv('Salary_Data.csv')
By executing the above line of code (ctrl+ENTER), we can read the dataset on our Spyder
IDE screen by clicking on the variable explorer option.
The above output shows the dataset, which has two variables: Salary and Experience.
Note: In Spyder IDE, the folder containing the code file must be saved as a working directory,
and the dataset or csv file should be in the same folder.
o After that, we need to extract the dependent and independent variables from the given
dataset. The independent variable is years of experience, and the dependent variable is
salary. Below is code for it:
x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 1].values
In the above lines of code, for the x variable we have used -1, since we want to remove
the last column from the dataset. For the y variable we have used 1 as the index, since we
want to extract the second column and indexing starts from zero.
By executing the above lines of code, we will get the output for the x and y variables. In
the output, we can see that the X (independent) and Y (dependent) variables have been
extracted from the given dataset.
o Next, we will split both variables into the test set and training set. We have 30 observations,
so we will take 20 observations for the training set and 10 observations for the test set.
We are splitting our dataset so that we can train our model using a training dataset and
then test the model using a test dataset. The code for this is given below:
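A minimal sketch of this split, assuming scikit-learn's train_test_split with a one-third test size (10 of the 30 observations) and a fixed random_state for reproducibility:

from sklearn.model_selection import train_test_split

# Split 30 observations into 20 for training and 10 for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)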
Output: the test dataset and the training dataset will appear in the variable explorer.
o For Simple Linear Regression, we will not use feature scaling, because the Python libraries
take care of it in some cases, so we don't need to perform it here. Now our dataset is well
prepared, and we are ready to start building the Simple Linear Regression model for the
given problem.
Now the second step is to fit our model to the training dataset. To do so, we will import
the LinearRegression class of the linear_model library from scikit-learn. After importing
the class, we will create an object of the class named regressor.
The code for this is given below:
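A sketch of this step, following the description above (importing LinearRegression from sklearn.linear_model, creating the regressor object, and fitting it to the training set):

from sklearn.linear_model import LinearRegression

# Create the regressor object and fit it to the training data
regressor = LinearRegression()
regressor.fit(x_train, y_train)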
In the above code, we have used the fit() method to fit our Simple Linear Regression object
to the training set. In the fit() function, we have passed x_train and y_train, which are our
training data for the independent and dependent variables. We have fitted our regressor
object to the training set so that the model can easily learn the correlations between the
predictor and target variables. After executing the above lines of code, we will get the
below output.
Output:
In the above output, the regressor object has been fitted to the training set, establishing
the relationship between the dependent variable (Salary) and the independent variable
(Experience). So, now, our model is ready to predict the output for new observations. In
this step, we will provide the test dataset (new observations) to the model to check
whether it can predict the correct output or not.
We will create two prediction vectors, y_pred and x_pred, which will contain the
predictions for the test dataset and the training set, respectively.
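A sketch of the prediction step described above, using the fitted regressor to predict salaries for the test set and the training set:

# Prediction of the test set and training set results
y_pred = regressor.predict(x_test)
x_pred = regressor.predict(x_train)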
On executing the above lines of code, two variables named y_pred and x_pred will be
generated in the variable explorer; they contain the salary predictions for the test set and
the training set.
Output:
You can check the variable by clicking on the variable explorer option in the IDE, and also
compare the result by comparing values from y_pred and y_test. By comparing these
values, we can check how good our model is performing.
Now in this step, we will visualize the training set result. To do so, we will use the scatter()
function of the pyplot library, which we have already imported in the pre-processing step.
The scatter() function will create a scatter plot of observations.
On the x-axis we will plot the years of experience of employees, and on the y-axis the
salary of employees. In the function, we will pass the real values of the training set, i.e.,
the years of experience (x_train), the training-set salaries (y_train), and the color of the
observations. Here we are taking a green color for the observations, but it can be any
color as per choice.
Now, we need to plot the regression line, so for this, we will use the plot() function of
the pyplot library. In this function, we will pass the years of experience for training set,
predicted salary for training set x_pred, and color of the line.
Next, we will give the title for the plot. So here, we will use the title() function of
the pyplot library and pass the name "Salary vs Experience (Training Dataset)".
After that, we will assign labels for x-axis and y-axis using xlabel() and ylabel() function.
Finally, we will represent all above things in a graph using show(). The code is given below:
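A sketch of the plotting code described above; the plot title follows the text, while the exact axis label strings are assumptions.

# Visualizing the Training set results
mtp.scatter(x_train, y_train, color="green")   # actual observations
mtp.plot(x_train, x_pred, color="red")         # regression line (predicted training salaries)
mtp.title("Salary vs Experience (Training Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary")
mtp.show()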
Output:
By executing the above lines of code, we will get the below graph plot as an output.
In the above plot, we can see the real observations as green dots and the predicted values
covered by the red regression line. The regression line shows a correlation between the
dependent and independent variables.
The goodness of fit of the line can be assessed by calculating the difference between the
actual values and the predicted values. As we can see in the above plot, most of the
observations are close to the regression line, hence our model is good for the training set.
In the previous step, we have visualized the performance of our model on the training set.
Now, we will do the same for the Test set. The complete code will remain the same as the
above code, except in this, we will use x_test, and y_test instead of x_train and y_train.
Here we are also changing the color of observations and regression line to differentiate
between the two plots, but it is optional.
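A sketch of the test set plot, reusing the code above with x_test and y_test and with different colors as described (blue observations, red regression line); the title string is an assumption.

# Visualizing the Test set results
mtp.scatter(x_test, y_test, color="blue")      # actual test observations
mtp.plot(x_train, x_pred, color="red")         # regression line from the training fit
mtp.title("Salary vs Experience (Test Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary")
mtp.show()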
Output:
By executing the above line of code, we will get the output as:
In the above plot, the observations are shown in blue, and the prediction is given by the
red regression line. As we can see, most of the observations are close to the regression
line; hence we can say that our Simple Linear Regression model is a good model and is
able to make good predictions.