AL Notes

This document provides an overview of various Python programming concepts covered across multiple tutorials and classes. Key topics discussed include data types, operators, strings, lists, dictionaries, NumPy arrays, Pandas DataFrames, Matplotlib and Seaborn for visualization, functions, Object Oriented Programming concepts like classes and inheritance. More advanced topics covered include exception handling, custom exceptions, parameterization, and Tableau dashboarding concepts like actions, blending, pivoting, splitting columns.


Udemy https://cashify.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/learn/lecture/5733434#overview

Classes to revise: 37, 38, 39

Udemy https://cashify.udemy.com/course/complete-python-scripting-for-automation/learn/lecture/15328078#overview

Class 10 Escape sequences
\n >> new line
\b >> backspace
\t >> tab space
\' >> escaped apostrophe, eg: 'python\'s script'
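A quick runnable sketch of these escape sequences (the string contents are only illustrative):

# Escape sequences in Python string literals
print("line 1\nline 2")      # \n -> new line
print("col1\tcol2")          # \t -> tab space
print("abc\b")               # \b -> backspace (removes the previous character on most terminals)
print('python\'s script')    # \' -> escaped apostrophe inside single quotes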

Class 12 del x to delete a variable

Class 17 my_string.lower() >> converts to lowercase
my_string.swapcase() >> lowercase letters become uppercase & uppercase become lowercase
my_string.title() >> every word's 1st letter will be in caps
my_string.capitalize() >> only the 1st letter of the sentence will be in caps
my_string = "Python"
"*".join(my_string) >> Output: P*y*t*h*o*n

Class 19 print(my_str.zfill(10)) >> total length will be 10, and blanks will be padded with 0 on the left
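A small sketch tying these string methods together (my_string / my_str are placeholder names from the notes):

my_string = "python is Fun"
print(my_string.lower())       # 'python is fun'
print(my_string.swapcase())    # 'PYTHON IS fUN'
print(my_string.title())       # 'Python Is Fun'
print(my_string.capitalize())  # 'Python is fun'
print("*".join("Python"))      # 'P*y*t*h*o*n'

my_str = "42"
print(my_str.zfill(10))        # '0000000042' -> padded with zeros to length 10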

Class 36

LinkedIn https://www.linkedin.com/learning/python-for-data-science-essential-training-part-1/what-you-should-know?autoSkip=true&autoplay=true&resume=false&u=138505057
Krish Naik https://www.youtube.com/watch?v=bPrmA1SEN2k&list=PLZoTAELRMXVNUL99R4bDlVYsncUNvwUBB

Tutorial 1 Operators
type(1)
Data Types
Format Paste
len('Pranay')

Tutorial 2 Data Types


Shift + Tab
Logical Operators
List
List Functions

Tutorial 3 Sets
Dictionaries

Tuple

Tutorial 4 Array
Numpy
arr.shape
Creating Array
(Rows, Columns)
Indexing
np.linspace(1,10,50)
copy function
np.arange(1,10)
np.ones(4) || np.ones((2,5))
np.random.rand(3,3)
np.random.randn(4,4)
np.random.randint(0,100,8).reshape(4,2)
Tutorial 5 & 6 Pandas
pd.DataFrame()
pd.to_csv('path/link//
file_name.csv')
.loc | .iloc
df.isnull().sum()
pd DF functions
String to csv

Tutorial 7 pd.read_json()
html (page link) data read
Read Excel
Pickling

Tutorial 8 Matplotlib
For multiple graphs
More graphs code
Pie charts
Tutorial 9 Univariate Vs Bivariate Vs
Multivariate Analysis
Distribution of Plot in Seaborn
Seaborn
Tutorial 10 Seaborn
Tutorial 11 EDA (Codes)

Tutorial 12 Functions
Print Vs Return
Add Function
Default value in function
Even Odd Sum function

Tutorial 13 Lambda Function

Tutorial 15 Map Function

Tutorial 16 Filter Function


Filter with Lambda
Tutorial 17 List Comprehension
Tutorial 18 String Formatting
Tutorial 19 Iterables vs Iterators
Tutorial 20 OOPS
Class

Class 24 Advanced Python


Exception Handling

Class 25 Custom Exception Handling

Class 26 Inheritance In Python



1+1 = 2, 2*5 = 10, 5**2 = 25, 10/2 = 5.0, 10%2 = 0


Check Datatype
Integer, Float, String, Bool
print("My Name is {First} & last Name is {Last}".format(first = first_name, last = last_name))
Gives Length >> 6
x.isalnum() >> alpha numeric >> Num + Alphabet
x.istitle(): 1st letter caps
To show detail
and
or
List: mutable, changeable, ordered sequence
- Can contain any value; multiple values in a list are comma separated
- Can add elements
lst*5 - elements will be repeated 5 times (not multiplied by 5)
set_2.intersection_update(set_1) - common records in both sets; the output is saved in set_2
for i in dict_1.items():
    print(i)  # will print both key & value
dict_1["Car1"] = "Renault"  # will replace the "Car1" value
Tuple: cannot be changed; defined with ()

Contain same Datatypes


import numpy as np
arr.reshape(3,5)
- If there are total 15 elements, then it can be reshaped only with multiple of 15. Eg: (3,5) , (5,3), (1,15)
- Array is too fast

arr[0:2, 0:2]
Give value between 1 to 10 in 50 parts. Eg: [1, 1.18, 1.36 .... 9.81, 10]

will give value in array from 1 to 9


np.ones() - create array with value 1
np.random.rand(3,3) - array with random values, 9 elements
np.random.randn(4,4) - array with random values from a normal distribution
np.random.randint(0,100,8).reshape(4,2) - create random integer values from 0 to 100 with 8 elements, 4 rows & 2 columns

import pandas as pd
To create DF
To export data in csv
pd.read_csv('path', sep = ',') # It can be changed from "," to any other if required
df.iloc[:,:] # will show all columns & rows

df["Col_1"].value_counts()df["Col_1"].unique()df[[Col1, Col2]]df.head()df.info() # shape, memory us


from io import StringIO, BytesIO

pd.read_html('url', match = "Any word to match, if there are multiple tables on the page", header=0)
pd.read_excel("path//fileNm.xlsx", sheet_name = 1)
Pickling: excel file converted/saved into pickles
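A hedged sketch of the pandas I/O calls above; the file names and URL are placeholders, not the course's files:

import pandas as pd

df = pd.DataFrame({"name": ["A", "B"], "sales": [100, 200]})   # create DF
df.to_csv("file_name.csv", index=False)                        # export data to csv

df2 = pd.read_csv("file_name.csv", sep=",")    # separator can be changed if required
print(df2.isnull().sum())                      # null count per column
print(df2.loc[:, ["name"]])                    # label-based selection
print(df2.iloc[:, :])                          # position-based selection (all rows & columns)

# tables = pd.read_html("https://example.com/page", match="Any word", header=0)  # read HTML tables
# xls = pd.read_excel("file_name.xlsx", sheet_name=1)                            # read 2nd sheet
df2.to_pickle("file_name.pkl")                 # save DF as a pickle
df3 = pd.read_pickle("file_name.pkl")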

plt.plot()
plt.subplot(2,2,2)  # multiple graphs; graph is shown on the 2nd index; options: color, line format, line width, etc.
plt.plot(x,y)
plt.hist(y)
plt.boxplot(Num_Col)
plt.axis('equal')
plt.show()
2 Features (F1, F2) >> Bivariate Analysis
>2 Features (F1, F2, F3, .... Fn) >> Multivariate Analysis
jointplot, pairplot
sns.pairplot(df, hue = 'Gender')  # Will show M/F separate graph in every chart
sns.distplot(df['Y_var'])  # Creates histogram with distribution line
sns.violinplot("Gender", "Age", data=DF)  # Gives violin-shaped graph
https://github.com/krishnaik06/EDA1/blob/master/EDA.ipynb

starts with "def" keyword


But if inplace of print, "return" was there, then it show the value while printing x
val # Output will be 10
hello(*lst, **dict_args)
odd_sum += i # output >> ('Pranay', 'Singh') {'age': 26, 'dob': 1997}
return even_sum, odd_sum
- Works faster from other function
-addition
Only Single Operator
= lambda a,b: at
a+ba time
addition(12,24) # Output >> 36

- Without using a loop, it runs over all the records of a list


list(map(odd_even,lst))
list(filter(even,lst)) # Show only Even records
list(filter(lambda num: num%2 == 0, lst))
[i*i for i in lst if i%2==0]
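A small sketch combining lambda, map, filter and list comprehension (lst is a placeholder list):

lst = [1, 2, 3, 4, 5, 6]

addition = lambda a, b: a + b
print(addition(12, 24))                              # 36

odd_even = lambda n: "Even" if n % 2 == 0 else "Odd"
print(list(map(odd_even, lst)))                      # label every record without an explicit loop
print(list(filter(lambda num: num % 2 == 0, lst)))   # keep only even records
print([i * i for i in lst if i % 2 == 0])            # squares of even records: [4, 16, 36]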
string("Pranay", 25)
next(itr)

A class can contain different functions (methods) & attributes >> Window, Door, Mirror, etc.

Eg:
class Car:
    def drive(self):
        return("This Car is {} car".format(self.enginetype))

finally:  # It will always run, whether the code is correct or an exception is raised
    print("Code ran successfully!")

Py File
w?autoSkip=true&autoplay=true&resume=false&u=138505057

Eg: (3,5) , (5,3), (1,15)

# shape, memory usagedf.describe() # count, mean, std, min, max, percentil # Only int & float col

e", header=0)
t

h
centil # Only int & float columns takendf.Col1[df.Col1 > 100]
Percentile
get_dummies(): If we remove the 1st col (drop_first), then how will we identify whether it was significant or not?
Pass Vs Break command
Udemy Tableau Classes https://cashify.udemy.com/course/tableau10/learn/lecture/5618178#overview
Hover on 1st map & it show filter on another she
Class 31 Action Filter "Dashboard" >> "Actions"
Highliter

Class 37 Data Blending


Joining Data Vs Data Blending

Class 49 Creating Bins

Class 50 Parameters with Top 10,20,30... Values

Class 58 Data Interpreter In Data Source Sheet, check box after importing data; removes null rows & columns

Class 59 Pivot In Data Source Sheet, mark columns in Data and select pivot

Class 60 Splitting Column into Multiple columns In Data Source Sheet, right-click on column & Split

Class 67 Analytics Tab in Sheets Cluster

Advanced Tableau Classes https://cashify.udemy.com/course/tableau10-advanced/learn/lecture/5843098#overview

Class 10 Grouping the Data

Class 11 Sets

Class 13 Combining Sets

Class 14 Controlling sets with Parameters

Class 48 Increase size of small bubbles in chart

Class 51 Tableau Animation


Class 57 LOD Calculation (Level of Detail) - Fixed, Include, Exclude
LOD Syntax: { INCLUDE [Customer Name] : SUM([Sales]) }
Python Packages
Apache Spark
RDD

Logging

apportion

Databricks

JSON

By Jatin
https://towardsdatascience.com/a-complete-data-science-roadmap-in-2021-77a15d6be1d9
https://medium.com/coriers/data-engineering-roadmap-for-2021-eac7898f0641

For Python Practice


https://pynative.com/python-if-else-and-for-loop-exercise-with-solutions/#h-exercise-1-print-first-10-natural-numbers-using-while-loop

For SQL Practice


sqlzoo
leetcode
Used to connect with very huge datasets
For Python we use library: Pyspark
RDD: Resilient Distributed Dataset
Import pyspark
from pyspark.sql import SparkSession
Create a session using SparkSession

Basically used to capture errors & warnings and show it in graphical way
Helps to create text files

Apportioning extra values to previous values in the same ratio

AWS, cloud, azure

JavaScript Object Notation

EG >> A: 50, B: 30, C: 20, D: 10 || Callaboration of D in others
A: 50+5, B: 30+3, C: 20+2
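A small sketch of this apportioning idea, redistributing D's value to A, B, C in proportion to their own values (numbers taken from the example above):

values = {"A": 50, "B": 30, "C": 20, "D": 10}
extra = values.pop("D")                      # value to be apportioned
total = sum(values.values())                 # 100

apportioned = {k: v + extra * v / total for k, v in values.items()}
print(apportioned)                           # {'A': 55.0, 'B': 33.0, 'C': 22.0}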
Constraints

Trigger

Index

Window function
Rank() vs Dense_Rank()
CTE (Common Table Expression)
Create Views
NOT NULL
UNIQUE
PRIMARY KEY
FOREIGN KEY
CHECK: This constraint helps to validate the values of a column to meet a particular condition. That is, it helps to ensure that the value stored in a column meets a specific condition. || Like a dropdown in Excel
DEFAULT: This constraint specifies a default value for the column when no value is specified by the user.

- UPDATE TRIGGER
- DELETE TRIGGER
- BEFORE UPDATE
- BEFORE DELETE

- Indexes are used to retrieve data from the database very fast

- with NTILE()
- with lag() & lead()

- Enable users to maintain complex queries via increased readability & simplification
- Can be accessed in SELECT, INSERT, DELETE, UPDATE, MERGE statement
- To create views, for making Tableau reports
https://cashify.udemy.com/course/sql-and-postgresql/learn/lecture/22800007#overview

Class 3 Add Data


Read Data
Update Data
Delete Data

Class 5 Creating Table

Class 7 Insert Data

Class 9 Math Operators


Alias name

12 String Operators

13 Where condition

14 Comparision Operators

22 UPDATE

23 DELETE

28 Relationships

30 Primary Keys
Foreign Keys

32 SERIAL

40 DROP

On DELETE Option what happens



INSERT INTO cities (name, country, population)
VALUES ('Lucknow', 'India', 15000);

SELECT * FROM cities;

UPDATE cities
SET population = 20000
WHERE name = 'Lucknow';

DELETE FROM cities
WHERE name = 'Lucknow';
CREATE TABLE cities (
    name VARCHAR(50),
    country VARCHAR(50),
    population INTEGER,
    area INTEGER
);
INSERT INTO cities (name, country, population, area)
VALUES ('Lucknow', 'India', 5600, 2400),
('DC', 'US', 3400, 300);
+ Add, - Subtract, * Multiply, / Divide, ^ Exponent, |/ Square Root, @ Absolute Value, % Remainder

|| Join 2 strings, concat(), lower(), length(), Upper()


Eg: concat(name, ' ', country) = name || ' ' || country

> < = <> != >= <= IN NOTIN BETWEEN


UPDATE cities
SET population = 200000
where name = 'Tokyo'
DELETE FROM cities
where name = 'Tokyo'
One to Many
Many to Many

Primary Key:
- Unique
- Not Null
- 1 table can have 1 Primary Key
Foreign Key:
- Primary Key of the other table
- 1 table can have multiple Foreign Keys

SERIAL example:
id SERIAL PRIMARY KEY,
username VARCHAR(50)

Delete complete table


DROP TABLE photos;

When a tbl 1 PRIMARY KEY is used as a FOREIGN KEY in another table, & we try to delete a record from tbl 1, the ON DELETE option decides what happens (eg: RESTRICT, CASCADE, SET NULL, SET DEFAULT)
Classes Topics
Class 12 (1) Why Statistics
Structure
Population
Sample
Parameter
What is Statistics
How Samples are
created
Types of Stats
Descriptive Stats

Class 12 (2) Variability


Standard Deviation
Squared Deviation
Variance Formula
Data Shape
Type of Shape
Normal Distribution
Asymmetrical
Distribution
Negative Skewness
Positive Skewness
Kurtosis

Class 13 (1) Sample Types


Traditional Way
Data Science Way
Central Limit
Theorem
3/6 Sigma Rule
Inferential Stats
Probability
z score
AUC

Hypothesis Testing
Significance Level

Z Test

Class 14 (1) ND & SND


Significance Level
Confidence Level
Hypothesis Testing
AUC
z value (Hypo Test)
Standard Error
Common z scores
used
Class 14 (2) Error Types
T Test
Z Test used
Degree of Freedom
t Test types
Dependent t Test
Independent t Test

Class 15 (1) ANOVA


Consider groups to be
different from each
other
F-value
Parametric vs Non
Parametric Test
Chi Square Test
Correlation
Test we have to
perform in Py
Py Implementation
Scatter Plot Py

Stats by Krish Naik


Class 2 Stats

Class 4 Random Variable


Description
Why Statistics:
1. Describe features/vars/cols
2. Relation b/w 2 cols/features & vars
What is Statistics: Descriptive Stats, Sampling, Probabilities, Distributions
Population: Total no.
Sample: Subset of the Population
Parameter: When a mathematical calculation is performed on the population, the outcome is a Parameter
Statistic: When a mathematical calculation is performed on sample data, the outcome is a Statistic
How samples are created: Random Sampling
Descriptive Stats: To better understand the data; mostly it is Univariate
Measure of Frequency: Count of distinct (Hist chart)
Std Deviation: SQRT of Variance || Variance: Avg Squared Deviation
Sample Std Dev: SQRT[ SIGMA(Sample - Sample Mean)^2 / (No. of Samples - 1) ] || SQRT( sigma(X - Xbar)^2 / (n-1) )
Squared Deviation: (X - Xbar)^2
Population Variance (sigma^2): sigma(X - u)^2 / N || u = Population mean, N = No. in Population
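A quick numeric sketch of these formulas (the sample list is made up):

import numpy as np

x = np.array([4, 8, 6, 5, 3, 7])
xbar = x.mean()

squared_dev = (x - xbar) ** 2                    # (X - Xbar)^2
sample_var = squared_dev.sum() / (len(x) - 1)    # sigma(X - Xbar)^2 / (n - 1)
sample_std = np.sqrt(sample_var)                 # SQRT of variance

print(sample_var, sample_std)
print(x.var(ddof=1), x.std(ddof=1))              # same values via NumPy (ddof=1 -> sample formula)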
Each dataset has a shape. Eg: Histogram
Symmetrical: When divided from the center, gives a mirror opposite of 50% of the whole data || Eg: Rectangular, Uniform, Gaussian, etc.
Normal Distribution: Also known as Gaussian Distribution -- bell shaped
Asymmetrical Distribution: Skewed curve -- Mean != Median != Mode
Negative Skewness (Left Skewed) -- Mean < Median < Mode
Positive Skewness (Right Skewed) -- Mode < Median < Mean
Kurtosis - Leptokurtic: Abnormally high peak || Eg: Income data collected from Govt quarters only

Sample Types: Traditional Way, Data Science Way
Traditional Way - Eg: Simply asking any 6 people from the population will be a sample
Data Science Way - Eg: 6 people collect different samples of any 10 (sub-samples), then the average of these sub-samples is used (Central Limit Theorem)
3/6 Sigma Rule - For Normally Distributed Data
- Rule 1: +- 1 STD will cover 68.2% of the data
Inferential Stats: Probability, Hypothesis Testing
z score: (x - u) / STD || u = Mean; use the z score table || Only for Normally Distributed data; gives the percentile, AKA P-value
AUC: Area under the curve
Hypothesis Testing - Proving something statistically
- Null Hypothesis (u = xbar), Alternative Hypothesis (u != xbar)
Z Test:
- No. of samples should be > 30
- Significance level: normally 5% or 1%

- ND: Normal Distribution
- SND: Standard Normal Distribution, same as the Normal Distribution, but here u = 0 & STD = 1
Significance Level: It is decided by the client or by the Data Scientist; usually 5% or 1% are considered
Confidence Level: 1 - Significance Level || Eg: Significance Level = 0.05 or 5%, Confidence Level = 0.95 or 95%
- One Tailed Test: When the significance level is at 1 side of the distribution
- Two Tailed Test: When the significance level is at both sides of the distribution
P-value tells the AUC
z value (Hypo Test): (xbar - u) / (STD / SQRT(n)) || u = Population Mean, xbar = Sample Mean, n = Total no. of samples
Standard Error: STD / SQRT(n)
Common z scores used:
- 90% : 1.645
- 95% : 1.96
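A small sketch of the z value and standard error formulas above (the numbers are illustrative):

import math

xbar, u, std, n = 52.0, 50.0, 8.0, 64     # sample mean, population mean, population STD, sample size

standard_error = std / math.sqrt(n)        # STD / SQRT(n)
z = (xbar - u) / standard_error            # (xbar - u) / (STD / SQRT(n))

print(standard_error, z)                   # 1.0, 2.0 -> |z| > 1.96, so significant at the 5% level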
Error Types:
- Type 1 Error: Rejecting the Null while it should be accepted
- Type 2 Error: Accepting the Null while it should be rejected
T Test - Used when: n < 30
Z Test - Used when: n > 30, u & STD are known
Degree of Freedom - Formula: n - 1
t Test types:
- Dependent (Paired): Subjects b/w the 2 samples are the same; equal no. of samples
- Independent (2 Sample): No. of samples should be almost the same; STD should be the same
ANOVA: Analysis of Variance
- Uses the F distribution
- Variance within a group is low >> better
- Variance b/w the groups is high >> better
- F-value: Variance b/w the groups / Variance within the group
- Consider groups to be different from each other
Parametric vs Non-Parametric Test:
- Parametric Test: Traditional method, a proper probability distribution is involved, concepts of mean are there
Chi Square Test: Used only for categorical vars; diff b/w observed & expected
Correlation: Helps to understand the nature of the relationship b/w 2 num samples; check +ve, -ve or no relationship
Test we have to perform in Py: T test - One Sample T test >> 1 sample vs 1 value
Py Implementation:
- import scipy.stats as stats
- import matplotlib
- from matplotlib import pyplot as plt
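A minimal one-sample t test sketch with scipy.stats, matching the "1 sample vs 1 value" note (the data and the hypothesised mean are made up):

import scipy.stats as stats

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2]
t_stat, p_value = stats.ttest_1samp(sample, popmean=12.0)  # H0: population mean = 12.0

print(t_stat, p_value)
if p_value < 0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")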

https://www.youtube.com/watch?v=Vtvj6fPZ1Ww&list=PLZoTAELRMXVMhVyr3Ri9IQ-t5QPBtxzJO&index=2
Descriptive Stats: Include central tendency, summarizing in the form of numbers & graphs, focus on describing the visible characteristics of a dataset
Discrete Random Variable
LAMBDA Function >> Code >> To call out come calculations, functions, etc
Amazon EventBridge
EC2 Bucket, which contains several folders, inside that CSV file is there
S3 Files Can be pushed in S3 by: Boto 3, Py Code, etc
CloudWatch Monitoring >> Uses Consumption >> How much load on data/server, etc (Like Google Analytics)
IAM Users >> Login Credentials, No of users, etc

Interview Q
Entropy
Deflection
Classes Topics
Class 16 X & Y Variable
Linear Regression
Logistic Regression
B/s Problem
Learning Setup
Discipline
Parametric & Non
Parametric
y = mx + c
Linear Regression
equation
Statistical way (b & c
Formula)
ML Process
SSE (Sum of Squared
Error)
OLS
OLS py
R-squared
SSR
R^2 (Accuracy)
SST
Adjusted R-Squared
Bita
Generalization
Model Validation
Underfitting &
Overfitting

Class 17 Linear Regression


pandas_profiling
Coeff of Variance
Pre-Modeling / EDA

Y_Var Assumptions
Data Preparation Level
2: Assumptions
Data Preparation Level
3: Feature Reduction
RFE
F - Regression
K - best
VIF
Corr b/w X & X
Data Preparation Level 4
Model Implementation
Post Modeling

Class 18 MAPE
RMSE / MSE
Correlation
Decile Analysis
Logistic Regression /
Classification
GLM
Logistic
Confusion Metrix
Threshold Value
Concordance
Discordance
Decile Analysis
KS Value

Class 19 Bankloans Data

Feature Reduction
Techniques
WOE
Sommer's D (Ginni)
Sommer's D Py
VIF
Model Implementation
Model Evaluation
Threshold Value
Model Evaluation

IQR
Outlier
Quartile

fuzzywuzzy
fuzz.ratio
fuzz.partial_ratio
fuzz.token_sort_ratio
process.extract

Label Encoding
BM25
NMSLIB
Tocken Vectorizer
Description

B/s Problem - Eg: factors affecting sales
Learning Setup:
- Optimisation >> Used to maximise or minimise || Eg: A company wants to maximise its sales
- Reinforcement Learning >> Includes "sticks & carrots" or "rewards & punishment" || Deep Learning setup (ANN, etc.)
- Bayesian >> Naïve Bayes
- Ensemble >> Random Forest, Bagging, Boosting
Parametric & Non-Parametric:
- Parametric >> Linear & Logistic
- Non-Parametric >> DT, KNN
y = mx + c
For multiple X vars: y = m1x1 + m2x2 + m3x3 + ….. + c
- b: slope, beta or coefficient; c: constant, alpha or intercept
Statistical way (b & c formula):
- Beta (b): For every unit increase in an X var, Y will change by b || b = corr * (stdev_y / stdev_x)
- Constant (c): Mean_y - (b * Mean_x)
ML Process:
- 5: Identify the objective >> Eg: Minimise error [Sum of Squared Error (SSE)]
- 6: Convert the whole problem into an optimization problem
SSE: (Y - YBar)^2 || Y: Actual Y, YBar: Predicted Y
OLS - Ordinary Least Squares Regression >> Come up with the best fit line which has minimum SSE
- Remove X vars which have > 0.05 P-value
- Pred_Y: model.predict(df)
R-squared: Accuracy || It should be high while building the model
SSR: (Predicted_Y - Avg_Y)^2 || Best fit line vs Base model || Base Model: Straight line at the Avg
R^2 (Accuracy): SSR / SST
SST (Sum of Squares Total): SSE + SSR
Adjusted R-squared
Beta: In a stats model, for every unit increase in X, Y will change by beta. But comparing raw betas is not the correct way; 1st we need to convert the beta values into a standard format.
Generalization: How the model fits in a real scenario
Model Validation: Train & Test, or Model Development & Validation
- Underfitting: Model performing bad on the Training Data
- Overfitting: Model performed well on Training but failed on Testing

pandas_profiling: Used to see an HTML summary of a DF
- Go to the anaconda prompt >> pip install pandas_profiling OR conda install pandas_profiling
- import pandas_profiling
- report = pandas_profiling.ProfileReport(df)
- report.to_file('report.html')

Pre-Modeling / EDA:
- Understand the data >> df.describe()
- Boxplot to identify outliers
- Histogram to check whether the Y var is normally distributed or not
- Drop rows where the Y var is null >> df.dropna(axis = 0, subset = ['Y_Var'])
- Check for special characters
Y_Var Assumptions: If the Y var is not normally distributed, then perform a log transformation to bring it to normal
Data Preparation Level 2:
- Dividing data into Y var, Num & Cat datasets
- Col name correction >> df.columns.str.replace(".", "_")
- Outlier & missing value treatment for Num_Vars
- Replace missing values with the mode for Cat_Vars
- Changing data types >> df.ColNm.astype("float64")
- df.ColNm.str.split("-", expand = True)
- Create dummies for Cat Vars >> pd.get_dummies(df_cat, drop_first = True)
- Combine datasets >> pd.concat([df_Num, df_cat_dummies, df_Y], axis=1)
Coeff of Variance: Variance / Avg(x) || Variance: (x - Avg(x))^2
Assumptions:
- X_Vars should be correlated with Y_Var
- X_Vars should not be correlated with each other
- No missing values, no outliers
- Not suffering from heteroscedasticity (variance is increasing over a period of time)
RFE >> Recursive Feature Elimination (Hypothesis Tests)
Data Preparation Level 3: Feature Reduction
- Check whether the Y Var is normal or not: If np.log() can't normalise it, then there are several other ways to do it (can search on Google, eg: squared, cube, etc.)
- RFE >> Recursive Feature Elimination >> It internally runs several iterations (Hypothesis Tests) >> Y ~ X1 + X2 + X3…
  - It drops 1 var in each iteration; eg: if there are 10 X vars and we want 3 vars, then it will run 7 iterations
  - Here we can provide the number of X_Vars we want (n = ?)
- F-Regression >> aka Univariate Regression >> It runs an iteration of Y_Var vs every X_Var and gives the P-Value & F-value, dropping a single X var in each iteration: Y ~ X1, Y ~ X2, Y ~ X3, ….
  - Sorting on min P-Value or max F-Value, we choose the number of X_Vars we want
  - from sklearn.feature_selection import f_regression
- K-Best >> Select the top X vars with min P-Val or max F-Value || K means the number of X_Vars
  - And F-classifier or Chi-Square test, whichever we want to perform
  - from sklearn.feature_selection import SelectKBest, f_classif, chi2
- VIF >> Variance Inflation Factor >> Check collinearity b/w X & X
  - If the VIF value is > 2 then we drop the X_Var with max VIF & re-run, and again run the iteration
  - from statsmodels.stats.outliers_influence import variance_inflation_factor
  - from patsy import dmatrices
  - VIF = [variance_inflation_factor(df.values, i) for i in range(df.shape[1])]  # df.values will give arrays of all columns
  - VIF = pd.Series(VIF)
  - col = pd.Series(df.columns)
  - VIF = pd.concat([col, VIF], axis = 1)
- Corr b/w X & X >> Will be taking only those variables whose correlation falls under -0.5 to 0.5 || Its range is -1 to 1
Data Preparation Level 4:
- Data should be on the same scale (used in KNN, K-Means, PCA, etc.)
- Split data into Training & Testing
  - from sklearn.model_selection import train_test_split
  - Train & Test = train_test_split(df, test_size = 0.3, random_state = 123)
Model Implementation:
- import statsmodels.formula.api as smf
- model = smf.ols('Y ~ X1 + X2 + X3….', data = train).fit()
- print(model.summary2())
- Drop the X var which has the highest P-value & re-run the iteration
Post Modeling:
- Implement the prediction model on Train & Test || If we have taken log of Y, then we have to take its exponential
- Train_Pred = np.exp(model.predict(train))
- Test_Pred = np.exp(model.predict(test))
- Check accuracy, MAPE, SSE, RMSE, Correlation; it should be similar for both Train & Test / positive & high
MAPE: Mean Absolute Percent Error (0 to 100)
- Should be as low as possible
- np.mean(np.abs((Actual_Y - Pred_Y) / Actual_Y)) * 100
RMSE / MSE:
- Gives the overall error
- from sklearn import metrics
- metrics.mean_squared_error(Y, Y_Pred); take the root of it to get RMSE
- Train & Test RMSE / MSE should be similar
Correlation:
- Corr b/w Y & Pred_Y should be positive & very high
- stats.stats.pearsonr(Y, Pred_Y)
Decile Analysis:
- Sort Y & Pred_Y data || group them in 10 parts
- pd.qcut(train['Decile_no'], 10, labels = False) || will give numbering which helps to sort the df
- Group by it on deciles with the Avg of Y & Pred_Y
- Take the Avg of each part & make a line chart || the lines should overlap each other; if they show a difference at any level then it is easy to identify
- Training & Testing should have similar values; it should be similar for Train & Test
Logistic Regression / Classification:
- It uses a function to normalize Y values
- It has 2 functions to do so: Probit, Logit
- Sigmoid Curve >> S-shaped curve
- The Logit function internally converts Y (0 & 1) to normal values (0.29, 0.78, etc.) as probability: exp(mx + c) / (1 + exp(mx + c)) = Probability
- Here, the Y value is 0 or 1, aka a Bernoulli Distribution, so we cannot come up with a best fit line directly
- But there are functions which bring it into a Normal form >> GLM; then we can come up with a best fit line
- When we use GLM with the Logit function, it is known as Logistic Regression
- Solves Binary / Binomial Problems & Multiclass Problems

Confusion Matrix: True Positive, False Positive, False Negative, True Negative
- Sensitivity (Recall) >> TP / (TP + FN) || Should be as high as possible
- Specificity >> TN / (FP + TN) || Should be as high as possible
- Precision >> TP / (TP + FP)
- Accuracy >> (TP + TN) / (TP + TN + FP + FN)
- F1 Score >> 2 * (Recall * Precision) / (Precision + Recall) || Higher is better

Threshold Value:
- A cutoff point which decides the value to be 0 or 1
- Total 1's / (Total 1's + Total 0's)
- ROC (Receiver Operating Characteristic Curve) >> Gives the optimal threshold value

Concordance / Discordance:
- Concordance >> Sum(Concordance) / Sum(Discordance) || If > 1 then good, else bad
- Discordance >> 1 - Concordance

Decile Analysis / KS Value:
- In the 1st 4 to 5 deciles, we should have 70 to 80% of the customers
- KS Value: Max[Abs(Bad - Good)] >> It should be in the top 5 deciles
- Lift chart

Class 19 Bankloans Data (Classification):
- EDA
- Sometimes we get 2 different datasets >> Prev & New
- Sometimes we get a single dataset as Actuals & New
- Correlation b/w X & Y should be partially fulfilled >> df.corrwith(df.Y)

Feature Reduction Techniques:
- WOE (Weight of Evidence) >> If X_Vars include a categorical-like var, eg: income, we will bin it in 4 parts
- Sommer's D (Gini) >> Y ~ X1, Y ~ X2, …. || Just like F-Regression, but run with X_Vars individually || Code in image >>
  - AUC_score = metrics.roc_auc_score(Y_var, p)
  - SommersD = 2 * AUC_score - 1
  - Sommer's D should be high, so it's a good model
  - This is for all X_Vars, but we need it for each X_Var individually; take the vars which have a higher SommersD value
- VIF
- RFE, OR with a Random Forest Classifier
- Correlation b/w X & X

Model Implementation:
- import statsmodels.formula.api as sm
- model = sm.logit('Y_Var ~ X1 + X2 + ...', df).fit()
- print(model.summary2()) || Remove X vars which have a high P-value and re-run the model
- p = model.predict(df)

Model Evaluation:
- Calculate Sommer's D for Train & Test || it should be high & similar
- AUC score
- Accuracy
- Concordance: (Concordance - Discordance) / (Concordance + Discordance + Tie) (have to explore from Google)
- Py >> print(metrics.classification_report(Actual_Y, Pred_Y))

Threshold Value:
- Thres = Total 1's / (Total 1's + Total 0's)
- < Thres = 0 || > Thres = 1
- OR >> Max(Sensitivity + Specificity)
- Decile Analysis (KS Value)
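A hedged sketch of turning predicted probabilities into classes with a threshold and reading the confusion-matrix metrics (the data is made up):

import numpy as np
from sklearn import metrics

actual_y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
pred_prob = np.array([0.8, 0.3, 0.6, 0.4, 0.2, 0.5, 0.9, 0.1])

threshold = actual_y.mean()                   # Total 1's / (Total 1's + Total 0's)
pred_y = (pred_prob > threshold).astype(int)  # < threshold -> 0, > threshold -> 1

print(metrics.confusion_matrix(actual_y, pred_y))
print(metrics.classification_report(actual_y, pred_y))
auc = metrics.roc_auc_score(actual_y, pred_prob)
print("AUC:", auc, "Somers' D:", 2 * auc - 1)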

IQR (Interquartile Range): Q3 - Q1
Outlier:
- Upper: Q3 + 1.5*IQR
- Lower: Q1 - 1.5*IQR
Quartile:
- Q1: 1/4 * (n+1)
- Q3: 3/4 * (n+1)

fuzzywuzzy:
pip install fuzzywuzzy
pip install python-Levenshtein
from fuzzywuzzy import fuzz
from fuzzywuzzy import process

fuzz.ratio('geeksforgeeks', 'geeksgeeks')                         # 87
fuzz.ratio('GeeksforGeeks', 'GeeksforGeeks')                      # 100
fuzz.ratio('geeks for geeks', 'Geeks For Geeks ')                 # 80
fuzz.partial_ratio("geeks for geeks", "geeks for geeks!")         # 100
fuzz.partial_ratio("geeks for geeks", "geeks geeks")              # 64
fuzz.token_sort_ratio("geeks for geeks", "for geeks geeks")       # 100
fuzz.token_sort_ratio("geeks for geeks", "geeks for for geeks")   # 88

query = 'geeks for geeks'
choices = ['geek for geek', 'geek geek', 'g. for geeks']
# Get a list of matches ordered by score, default limit to 5
process.extract(query, choices)
# [('geeks geeks', 95), ('g. for geeks', 95), ('geek geek', 93)]
# If we want only the top one
process.extractOne(query, choices)
# ('geeks geeks', 95)
- Nominal Encoding: for categorical vars with no inherent order (eg: dummy / one-hot vars)
- Ordinal Encoding: for categorical vars with an inherent order (eg: low < medium < high), numbered accordingly
Classes Topics
Class 21 Customer Segmentation
Types of Techniques
Regression Problems
Classification Problems
Segmentation Problems
Forecasting Problem

Gradient Descent Algorithm


Gradient Ascent Algorithm

Class 22 Bias
Regularization
Cross Validation - Kfold
Validation

KNN
Similarity Metrics
Scaling of Data

Distance
Correlation
Cosine Similarity
Z_transformation (Standard
Scaler)
Min-Max Scaler

Weightage
Uniform & Distance
Parameters for KNN
GridsearchCV
Best Fit Line
KNN Imputation

Class 23 Packages Used in Python


Feature Selection
KNN for Classification Problem
Standardise Data
GridsearchCV
KNN py
KNN for regression

Naïve Bayes
Probability Understanding in
detail
Limitations of NB
Types of NB in py
NB py

Class 24 Decision Tree


DT Vs Others
Decision Boundry
Decision Tree Classifier
Nodes
Splitting Criteria
Stopping Criteria
Tunning Parameters
Gini
Entropy
Algorithms (Types of DT)

DT Regressor
DT Segmentation

DT as Feature Reduction
Technique
DT in Grouping Vars

Class 25 Advantages of DT
Disadvantages of DT
Py Implementation
pydotplus
Graphviz
DT Tunning parameters
DT Tunning Parameters
DT Feature importance

Ensemble Learning
Classification of ensemble models
Homogenious Ensembling
Hetrogenious Ensembling
Bagging
Bag vs Out of Bag
Tunning Parameters of EL
Bagging & Random Forest
Py Implementation

Class 26 Boosting Algorithm


Adaboost
Gradient Boosting
Description
Customer Segmentation: Dividing customers into groups such that within a group there is similarity & b/w groups there is dissimilarity
Regression Problems: Bagging Regressor, Random Forest Regressor, XGBoost Regressor, K-Nearest Neighbors Regressor, Support Vector Regressor, Artificial Neural Network Regressor
Classification Problems: K-Nearest Neighbors Classifier, Support Vector Classifier, Artificial Neural Network Classifier, Naive Bayes Classifier
Segmentation Problems: Scientific Segmentation - K-Means/Medians Clustering, Hierarchical Clustering, DBScan Clustering (Density Based Clustering)
Forecasting Problems: Using Regression - all Regressor techniques can be used

Gradient Descent Algorithm:
- Used to minimise SSE by changing Beta (X) values
- It helps to adjust Betas such that in the next iteration, the value of the objective function will be decreased

Gradient Ascent Algorithm: Helps to solve maximization problems

Bias: Bias means Errors
Regularization: Helps to reduce the problem of overfitting by giving less importance to insignificant variables & high importance to significant vars
Cross Validation - Kfold:
- Helps to reduce the problem of overfitting by validating the model while building it
- Eg: Train: 700 rows & 20 vars, Test: 300 rows & 20 vars, take K=5
- It will divide 700 into 5 parts (i.e., 140 each)
- Refer Fig. M4 as the final model; it can give max accuracy on Test
- If the data size is less, then the K value should be high, & vice versa

KNN:
- KNN Classifier: Classification Problem
- KNN Regressor: Regression Problem
- KNN Imputation: Missing Value Imputation
Similarity Metrics: Distance, Correlation, Cosine Similarity
Scaling of Data: Z_transformation (Standard Scaler), Min-Max Scaler

Distance:
- SQRT[(x1-y1)^2 + (x2-y2)^2 + (x3-y3)^2 + ….]
- Here X & Y are both X_Vars of different rows
- Rank observations based on similarity for a given observation
- Low means similar
Correlation:
- Corr of Row 1 data vs other row data
- High means similar
Cosine Similarity:
- Finding similarity by angle; the smaller the angle, the higher the similarity
- High means similar
Z_transformation (Standard Scaler): Z_Age = (Age - mean(age)) / std(age)
Min-Max Scaler: Transformed_Age = (Age - min(Age)) / range(Age) || Range = Max - Min

Weightage:
- Distance is less >> weight is more, and vice versa
- Weight = 1 / Distance
- Weight_final = Weight / sum(All Weights)
- Sigma_Weight_Final = Actual_Y1 * Weight_final_X1 + .....
- Weighted Average = if Sigma_Weight_Final > 0.5 then 1 else 0
- If we put K=5, it will look for 5 neighbours, and predict by taking their avg
- K: 3,4,5,6,7,...
Uniform & Distance:
- Simple Avg: weights are uniform
- Weighted Avg: weights depend on distance
- Whichever combination gives higher accuracy, that is best

Parameters for KNN:
- K is low >> more fluctuating line
- K is high >> smoother line
- So normally the K value is taken b/w 5 to 15
GridsearchCV: To find the best parameters
KNN Imputation:
- If any value is missing, then try to find out which rows are similar to it for the rest of the columns
- Find the nearest 3 or 5 neighbours and take their avg

Packages Used in Python:
- numpy & pandas
- pandas_profiling
- scipy
- matplotlib
- seaborn
- statsmodel >> Linear models, Time Series
- sklearn >> DT, Bagging (RF, Adaboost, Gradient, etc.), KNN, SVM, ANN, NB
- xgboost
- keras >> ANN
Feature Selection (RFE with Random Forest):
- from sklearn.ensemble import RandomForestClassifier
- rfe = RFE(RandomForestClassifier(), n_features_to_select = 6)
- RandomForestClassifier is not influenced by multicollinearity

Standardise Data:
- from sklearn.preprocessing import StandardScaler
- std_data = StandardScaler()
- std_data = std_data.fit(train_X)
- train_X_std_data = std_data.transform(train_X)
- train_X_std_data = pd.DataFrame(train_X_std_data, columns = train_X.columns)

GridsearchCV (helps to identify the best parameters):
- from sklearn.model_selection import GridSearchCV
- para_grid = {'n_neighbors' : [3,4,5,…,10], 'weights' : ['uniform', 'distance']}
- GS = GridSearchCV(KNeighborsClassifier(), para_grid, scoring = 'accuracy')
- GS.fit(train_X_std_data, train_y)
- GS.best_params_

KNN py:
- from sklearn.neighbors import KNeighborsClassifier
- Knn_model = KNeighborsClassifier(n_neighbors = 5, weights = 'distance')
- Knn_model.fit(train_X_std_data, train_Y)
- Knn_model.predict_proba(train_X)  # to predict probabilities
- Knn_model.predict(train_X)  # to predict 0 & 1
- Knn_model.predict(test_X)

KNN for regression:
- from sklearn.neighbors import KNeighborsRegressor
- for scoring >> 'r2'

Naïve Bayes - Probability Understanding in detail:
- Conditional Probability (when there are several conditions)
- Bayes Theorem (Conditional Probability) >> P(Y/X) = P(X/Y)*P(Y)/P(X) || P: Probability
- 1. P(Cust=Bad) + P(Cust=Good) = 1
- 2. If X & Y are 2 independent vars: P(X and Y) = P(X)*P(Y)
- 3. X = [X1, X2, X3], where X1, X2, X3 are independent vars: P(X/Y) = P([X1,X2,X3]/Y) = P(X1/Y)*P(X2/Y)*P(X3/Y)
- Try yourself: for solving classification problems

Limitations of NB:
- Can only predict if the new data has the same classes as the previous data
- If new data contains any other class which is not present in the X_Vars data, then it can't predict

Naïve Bayes Classifier:
- Naïve Assumptions (there is no multicollinearity: all vars are independent)
- All X_Vars should be categorical; Eg: if Income is there, then we have to bin it into 4 or 5 parts
- Or a numerical variable should be normally distributed
- Calculates P of Good and P of Bad, then the higher one is selected
- Bayes: Conditional Probability

Types of NB in py:
- Bernoulli NB: If X_Vars are categorical & binary (0,1)
- Multinomial NB: If X_Vars are categorical & multinomial
- Gaussian NB: If X_Vars are num & normally distributed
- NB works well if the data is categorical/discrete
- If X_Vars are mixed (Num, Cat): convert num vars into categorical & use Multinomial or Binomial depending on the X_Vars
- If X_Vars are num, then take the log of all vars to bring them to a normal distribution

NB py:
- from sklearn.naive_bayes import MultinomialNB, GaussianNB, BernoulliNB
- nb_model = GaussianNB()
- nb_model.fit(train_X, train_Y)
- nb_model.predict(train_X)
Decision Tree:
- A tree structure which helps to take decisions
- Conditional Decisions / Rule Based Decisions
- NB Classifier doesn't have any parameters
- Classification Problem: Decision Tree Classifier
- Regression Problem: Decision Tree Regressor
- Segmentation Problem: Objective >> Customer Segmentation
- Eg: A call center has 3 depts >> Prepaid, Broadband, DTH. A customer facing an issue calls the service center & gets the following options to select >> Services (Postpaid, Prepaid), Network, Numbers / Billing >> a support guy is assigned. This helps to divide the data.

DT Vs Others:
- Takes care of non-linear relationships | while a linear model can't
- Takes care of interaction b/w vars | linear models can't handle interaction b/w vars
- It doesn't have any assumptions | linear models have assumptions (correlation, etc.)
- Doesn't require much data preparation | linear models require a lot of data preparation

Decision Boundary:
- Linear Decision Boundary
- Non-Linear Decision Boundary

Nodes: Root Node (top) >> Child Node (b/w top & last) >> Leaf Node (last)
Splitting Criteria: Chi Square >> by significance level >> P-value < 0.05; Gini; Entropy; Information Gain
Stopping Criteria: max_depth; min no. of nodes; no. of leaf nodes; no. of vars we want to perform (that means x vars will be taken, eg: gender, income, age); min no. of obs in each node
Tuning Parameters: To stop the criteria, we can put any of them, or a combination of them; GridSearchCV() to get the best combination >> whichever combination gives high Train & Test accuracy
Gini >> 1 - (Probability(1's %))^2 - (Probability(0's %))^2 || On node level || Min Gini = 0 & Max Gini = 0.5 || Best splitting criteria >> splitting criteria giving the least Gini
Entropy Formula >> -(1's%) * log(1's%, base 2) - (0's%) * log(0's%, base 2) || Best fit = Min Entropy
Information Gain >> Difference from the previously used Entropy

Algorithms (Types of DT):
- CHAID >> Chi-square Automatic Interaction Detection Tree | Splitting Criteria: Chi-square (Multi Split)
- CART >> Classification & Regression Tree | Splitting: Gini/Entropy (Binary Split)
- C5.0 / ID3 >> Classification Trees | Splitting: Information Gain (Binary Split)
- Multi Split >> A node can distribute into multiple child nodes
- Binary Split >> A node can distribute into 2 child nodes only
- Mainly CART is used; it is faster

DT Regressor:
- Split on the basis of the Y_Var (num)
- Splitting Criteria >> F-Value >> split on highest; MSE >> split on lowest
- Stopping criteria is the same

DT Segmentation:
- DT Classification can be used as segmentation
- For eg: a particular group is giving the most bad customers

DT as Feature Reduction Technique:
- Root Node >> Can be said to be the most imp var
- Child nodes >> Can be said to be the 2nd, 3rd,.. imp vars
- Leaf Nodes >> Can be said to be the least imp vars

DT in Grouping Vars:
- DT, while creating nodes, divides vars into 2/3/.. groups
- So it can also be used to bin num vars (non-linear models)
Advantages of DT:
- Easy to implement (no multicollinearity handling required, no feature reduction required)
- Easy to understand & explain DT rules
- Model building is very quick

Disadvantages of DT:
- Doesn't use all the vars in the DT
- High chances of overfitting
- If the tree is big, it is hard to explain in the form of a tree

Py Implementation:
- from sklearn.tree import DecisionTreeRegressor, DecisionTreeClassifier
- pydotplus: package in python to view the decision tree graph
- Graphviz: package to export the graphviz object

DT Tuning Parameters:
- Splitting Criteria: for Regressor: f-value, MSE; for Classifier: Gini, Entropy, Information Gain
- Stopping Criteria is the same for both: max depth, #nodes, min objects in each node, min objects to split further
- param_grid = {'criterion': ['gini', 'entropy'], 'max_depth': [3,4,5,6], 'max_features': [3,4,5,6], 'max_leaf_nodes': [4,5,6,7,8,9]}

DT Feature importance:
- clf_tree.feature_importances_

Ensemble Learning - Collective Model:
- Build multiple models separately
- Consolidate output into a single output
- Final output will be compared with the original data to calculate accuracies
- Automate the process
- Parallel Processing Algo >> Bagging Algo (helps to handle the problem of Overfitting) >> Bagging, Random Forest
- Sequential Processing Algo >> Boosting Algo (helps to handle the problem of Underfitting) >> Gradient Boost, XGBoost

Classification of ensemble models:
- Homogeneous Ensembling >> All individual models use the same algos
- Heterogeneous Ensembling >> Individual models may use different algos

Bagging (Bootstrap Aggregating Algo):
- Within a sample, data will not have duplicate records
- Samples may have overlapping records
- Getting a single output by aggregating all outputs
- Manual process: use the Decision Tree method & Decision Tree tuning parameters for the ensemble

Bag vs Out of Bag:
- Eg: 700 samples were there; multiple times 500 records are picked & 200 are left
- 500 are Bag
- 200 are Out of Bag (OOB)
- Bag Size >> 2/3rd of the data

Tuning Parameters of EL:
- Number of models
- Random Trees (trees are different) >> when using few vars instead of all in every model
- No. of vars to consider in each sample

Bagging & Random Forest:
- Bagging builds models from Train Data using all columns/vars; Random Forest builds models on few vars
- These are a collection of trees = forest; it is known as Random Forest

Py Implementation:
- sklearn.ensemble
- Bagging_classifier
- Bagging_Regressor
- Random_Forest_Classifier
- Random_Forest_Regressor

Boosting Algorithm:
- Weak learners (incorrect predictions) will be boosted
- Build multiple models & decrease the weight of correctly predicted elements & increase the weight of incorrectly predicted elements
- Gives accuracy at the end; whichever model gives high accuracy will be selected
- Weight1 * Error1 + W2*E2 + ....

Adaboost >> Adaptive Boosting
Gradient Boosting:
- GradientBoostClassifier
- GradientBoostRegressor
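A minimal bagging / random forest / boosting sketch with sklearn (the dataset is synthetic, used only to make the snippet self-contained):

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=123)
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.3, random_state=123)

for model in (BaggingClassifier(n_estimators=50, random_state=123),            # bagging: all vars per model
              RandomForestClassifier(n_estimators=50, random_state=123),       # random forest: few vars per split
              GradientBoostingClassifier(n_estimators=50, random_state=123)):  # boosting: sequential weak learners
    model.fit(train_X, train_y)
    print(type(model).__name__, model.score(test_X, test_y))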
NLP https://www.youtube.com/watch?v=zlUpTlaxAKI&list=PLKnIA16_RmvZo7fp5kkIth6nRTeQQsjfX

Class 1 NLP
Real World NLP Apps
Common NLP Task
Approaches to NLP
Heuristic Approaches
Machine learning
Approaches
Deep Learning
Approaches
Challenges in NLP

Class 2
NLP - It is a subfield of CS, AI & Human Language
Real World NLP Apps:
- Search Engines
- Chatbots
Common NLP Tasks:
- Text Parsing
- Speech to text: voice typing
Approaches to NLP:
- Heuristic Approaches: Wordnet, Open Mind Common Sense
- Machine Learning Methods: LDA, Hidden Markov Models
- Deep Learning Methods: Transformers (heavily used, boosted NLP), Autoencoders
Challenges in NLP:
- Creativity
- Diversity
EDA Process:
- Import important libraries
- If X vars are many, then remove a few of them manually
- If a column contains >25% null, then drop that column
- If the Y Var is missing, remove that row
- Separate Num_Vars, Cat_Vars & Y_Var
- Outlier & missing value capping on Num_Vars
- Capping by mode on Cat_Vars
  * Creating Dummy Vars (0,1 flag) >> when independent vars
  * Creating Label encoding (numbering the vars) >> when dependent vars
- Concat both data (Num vars, Cat Num vars)
- (Regression Problem) Check whether Y_Var is normally distributed or not; if not, then take log of it
- Feature Reduction:
  5. VIF
  * Sommer's D (Gini) >> Classification Problem
- Taking unique col names after the above process
- Train Test Split
- Rebuild the final model
- (Regression Problem) Convert the Y value back to normal (exp) if it was converted into log
- (Regression Problem) Compare it with Test Data
- (Classification Problem)
  * AUC ROC Score >> metrics.roc_auc_score()
  * Gini
  * Decide the cutoff point & make data into boolean form
  * Compare it with Test Data
https://www.youtube.com/watch?v=qCR2Weh64h4&list=PLzMcBGfZo4-lUA8uGjeXhBUUzPYc6vZRn

Tutorial 1
Variability Std Deviation: SQRT of Variance || Variance: Avg Squared Deviation
Standard Deviation SQRT[ SIGMA(Sample - Sample Mean)^2 / (No. of Samples - 1) ] || SQRT( sigma(X - Xbar)^2 / (n-1) )
Squared Deviation (X - Xbar)^2
Variance Formula Population Variance (sigma^2): sigma(X - u)^2 / N || u = Population mean, N = No. in Population
3/6 Sigma Rule For Normally Distributed Data; Rule 1: +- 1 STD will cover 68.2% of the data
Standard Error STD / SQRT(n)
Error Types - Type 1 Error: Rejecting the Null while it should be accepted
- Type 2 Error: Accepting the Null while it should be rejected
Test we have to perform in Py - T test - One Sample T test >> 1 sample vs 1 value
Py Implementation - import scipy.stats as stats

SSE (Sum of Squared Error): (Y - YBar)^2 || Y: Actual Y, YBar: Predicted Y
SSR: (Predicted_Y - Avg_Y)^2 || Best fit line vs Base model || Base Model: Straight line at the Avg
R^2 (Accuracy): SSR / SST
SST (Sum of Squares Total): SSE + SSR

Data Preparation Level - Correlation b/w X & X


3: Feature Reduction -- Sort
VIF >>over
Gives Variation Inflation
all error
Y & Pred_Y dataFactor
||inGroup them in 10 parts
MAPE --True
np.mean(np.abs((Actual_Y
TakePositive
Avg
root of
ofeach
it to part
measure - Pred_Y)
&Positive
make
RMSE / Actual_Y))*100
a line(Precision)
chart || Line should overlap each other
-- Corr b/w Y & Pred_YFalse should be positive & very high to identify
RMSE / MSE False If it shows
Train & Test
Negativeany difference
RMSE True/ MSE at any level
should
Negative be than
similar it is easy
-- stats.stats.pearsonr(Y,
pd.qcut(train['Decile_no'], Pred_Y)
10, labels = False) || will give numbering which help to sort df
Correlation -- Training (Senstivity) (Specificity)
Group by&itTesting
on decilesshouldwithhave
Avg similar values
of Y & Pred_Y
Decile Analysis -- It should be
Senstivity >>similar
TP / (TP for+ Train
FN) & Test || Should be as high as possible
-- Specificity >> TN / (FP
A cutoff point which decided + TN) value to be || 0 Should
or 1 be as high as possible
-- Precision >> TP / (TP + FP)
ROC (Receiving Operator Curve) >> Gives optimal Threshold value
-- Accuracy
Totsl 1's / >>Total(TP1's+ +TN) / (TP
Total 0's+ TN + FP + FN)
Confusion Metrix -- F1 Score >> (2*Recall) + Precision / (Precision + Recall) || Higher is better
Concordance
Threshold Value -Max[Abs(Bad
AUC
- Weight of Evidence - Good)](WOE) >> It should be in top 5 deciles
KS Value -- Lift chart
Feature Reduction Sommer's D (Ginni)
Techniques - VIF
Interquartile Ratio
IQR
- -Upper:
Q3 - Q1
Q3 + 1.5*IQR
Outlier -- Lower:
Q1: 1/4Q1 - 1.5*IQR
* (n+1)
Quartile - Q3: 3/4 * (n+1)
Feature reduction method for Classification problem
KNN for Regression parameters
DT for Regression
Naïve Bayes
How we can find Skewness from Boxplot?
Which part of an SQL query runs first, then 2nd, and so on
Function to find out the count of vowels in a string (a small sketch follows this list)
Why are collinear vars removed?
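A minimal sketch for the vowel-count interview question above:

def count_vowels(text):
    # Count how many characters of the string are vowels (case-insensitive)
    return sum(1 for ch in text.lower() if ch in "aeiou")

print(count_vowels("Pranay Singh"))   # 3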

SQL
Remove Duplicate Records
Lead / Lag Date Function
Replacing Mbl No with 'xxx'
3rd Highest Salary
To check data is not present in both tbls
Only Columns of table
Split KG & Gram
Generalization (Bottom to Up)
Specialization (Up to bottom)

Tableau
LOD Syntax
Sets

Python
Iterators Vs Generators
SQL answers:
- Remove Duplicate Records: number rows in a subquery (row_number() over a window), then WHERE row_num = 1
- Lead / Lag Date Function: date_part('days', lag(date,1) over(partition by customer_id order by date desc) - date) as order_Date_diff from order_table ) a
- Replacing Mbl No with 'xxx': CONCAT(SUBSTR(phone, 1, LENGTH(phone) - 5), 'xxxxx') as update_mbl from tbl_nm
- 3rd Highest Salary: rank salaries in a subquery, then from tbl_nm) a where rn = 3; or use order by + limit
- To check data is not present in both tbls: LEFT JOIN table1 ON table2.id = table1.id WHERE table1.id IS NULL;
- Only Columns of table: select * from Employee limit 0
- Split KG & Gram: split_part(weight::text, '.', 2) AS second_part from employee
Generalization (Bottom to Up): Creating a new (higher level) entity by combining lower level entities
- Eg: Tbl1: Employee, Tbl2: Customer >> Tbl3: Name, Add, Ph no.
Specialization (Up to Bottom): Vice versa of Generalization

Tableau:
- LOD Syntax: { FIXED [Customer Name] : SUM([Sales]) }
- Sets: Helped to filter out Top, by condition, etc. along with functions

Python - Iterators Vs Generators:
- To create iterators we use the iter() keyword || iter processes values 1 by 1
- To create a generator we use the yield keyword || the yield keyword saves local variables
- It can be run using next() or a for loop
- Generators help to write fast & compact code compared to iterators
- An iterator is more memory efficient
- A generator is also a type of iterator
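A short sketch contrasting iter() and yield as described above (the list and generator are illustrative):

# Iterator: iter() processes values one by one via next()
itr = iter([1, 2, 3])
print(next(itr), next(itr))       # 1 2

# Generator: yield saves the local state between calls
def squares(n):
    for i in range(1, n + 1):
        yield i * i               # resumes here on the next call

gen = squares(3)
print(next(gen))                  # 1
for value in gen:                 # can also be consumed with a for loop
    print(value)                  # 4, 9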
https://cashify.udemy.com/course/python-coding/learn/lecture/5488068#overview
For Practice https://pynative.com/python-if-else-and-for-loop-exercise-with-solutions/#h-exercise-1-print-first-10-natural-numbers-using-while-loop
Section 2: Core Programming Principles
type(x) shows the var type
Dtypes: int, float, string, bool
--- It will give a line

import math
