AL Notes
com/course/python-for-data-science-and-machine-learning-bootcamp/learn/lecture/5733434
Class 19 print(my_str.zfill(10)) >> total width will be 10, and the leading blanks will be filled with 0
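A quick hedged check of the zfill behavior (my_str here is just a placeholder string):
my_str = '42'
print(my_str.zfill(10))  # '0000000042' -> padded to a total width of 10 with leading zeros
print('-7'.zfill(5))     # '-0007' -> the sign stays in front of the padding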
Class 36
or-automation/learn/lecture/15328078#overview
LinkedIn https://www.linkedin.com/learning/python-for-data-science-essential-training-part-1/what-you-sho
Krish Naik https://www.youtube.com/watch?v=bPrmA1SEN2k&list=PLZoTAELRMXVNUL99R4bDlVYsncUNvwU
Tutorial 1 Operators
type(1)
Data Types
Format Paste
len('Pranay')
Tutorial 3 Sets
Dictionaries
Tuple
Tutorial 4 Array
Numpy
arr.shape
Creating Array
(Rows, Columns)
Indexing
np.linspace(1,10,50)
copy function
np.arange(1,10)
np.ones(4) || np.ones((2,5)) # a 2D shape must be passed as a tuple
np.random.rand(3,3)
np.random.randn(4,4)
np.random.randint(0,100,8).reshape(4,2)
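A minimal runnable sketch tying the NumPy calls above together (array names are illustrative):
import numpy as np
arr = np.arange(1, 10).reshape(3, 3)      # values 1..9 as a 3x3 array
print(arr.shape)                          # (3, 3) -> (Rows, Columns)
print(arr[0:2, 0:2])                      # indexing: top-left 2x2 block
print(np.linspace(1, 10, 50))             # 50 evenly spaced values from 1 to 10
print(np.ones((2, 5)))                    # shape goes in as a tuple
print(np.random.rand(3, 3))               # uniform values in [0, 1)
print(np.random.randn(4, 4))              # standard normal values
print(np.random.randint(0, 100, 8).reshape(4, 2))
arr_copy = arr.copy()                     # copy() returns an independent array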
Tutorial 5 & 6 Pandas
pd.DataFrame()
df.to_csv('path/file_name.csv') # to_csv is a DataFrame method
.loc | .iloc
df.isnull().sum()
pd DF functions
String to csv
Tutorial 7 pd.read_json()
html (page link) data read
Read Excel
Pickling
Tutorial 8 Matplotlib
For multiple graphs
More graphs code
Pie charts
Tutorial 9 Univariate vs Bivariate vs Multivariate Analysis
Distribution of Plot in Seaborn
Seaborn
Tutorial 10 Seaborn
Tutorial 11 EDA (Codes)
Tutorial 12 Functions
Print Vs Return
Add Function
Default value in function
Even Odd Sum function
arr[0:2, 0:2]
Gives 50 evenly spaced values between 1 and 10. Eg: [1, 1.18, 1.36 .... 9.81, 10]
import pandas as pd
To create DF
To export data in csv
pd.read_csv('path', sep = ',') # It can be changed from "," to any other if required
df.iloc[:,:] # will show all columns & rows
pd.read_html('url', match = "Any word to match, if there are multiple tables on the page", header=0)
pd.read_excel("path//fileNm.xlsx", sheet_name = 1) # the parameter is sheet_name, not sheet
Excel file converted/saved into pickles (df.to_pickle / pd.read_pickle)
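A hedged sketch of the pandas I/O calls above (file paths are placeholders; read_html also needs lxml installed):
import pandas as pd
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})     # create a DF
df.to_csv('file_name.csv', index=False)           # export data to csv
df = pd.read_csv('file_name.csv', sep=',')        # sep can be changed from ',' if required
# df2 = pd.read_excel('path/fileNm.xlsx', sheet_name=1)
# tables = pd.read_html('url', match='word', header=0)
df.to_pickle('file_name.pkl')                     # pickling
df = pd.read_pickle('file_name.pkl')
print(df.iloc[:, :])                              # all rows & columns
print(df.isnull().sum())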
plt.plot()
plt.subplot(2,2,2) ## multiple graphs; the graph is drawn on the 2nd index
plt.plot() options: color, 2nd arg is the line format, line width, etc.
plt.plot(x,y)
plt.hist(y)
plt.boxplot(Num_Col)
plt.axis('equal')
plt.show()
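A small sketch of the subplot and format options above (data made up for illustration):
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.subplot(2, 2, 1)                  # 2x2 grid, draw in the 1st slot
plt.plot(x, y, 'r--', linewidth=2)    # 2nd arg is a format string (color + line style)
plt.subplot(2, 2, 2)                  # 2nd slot of the same figure
plt.hist(y)
plt.subplot(2, 2, 3)
plt.boxplot(y)
plt.subplot(2, 2, 4)
plt.pie(y, labels=['a', 'b', 'c', 'd'])
plt.axis('equal')                     # keeps the pie circular
plt.show()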
2 Features (F1, F2) >> Bivariate Analysis >> jointplot
>2 Features (F1, F2, F3, .... Fn) >> Multivariate Analysis >> pairplot
sns.pairplot(df, hue = 'Gender') # Will show M/F as separate colors in every chart
sns.distplot(df['Y_var']) # Creates histogram with distribution line
sns.violinplot(x = "Gender", y = "Age", data = DF) # Gives a violin-shaped graph
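Sketch of the seaborn calls above on a built-in sample dataset ('tips' and its columns stand in for df / 'Gender' / 'Age'; distplot is deprecated in newer seaborn, histplot is the current equivalent):
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('tips')
sns.pairplot(df, hue='sex')                        # separate color per class in every chart
sns.distplot(df['total_bill'])                     # histogram + distribution line (older API)
sns.violinplot(x='sex', y='total_bill', data=df)   # violin-shaped graph
plt.show()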
https://github.com/krishnaik06/EDA1/blob/master/EDA.ipynb
Py File
df.info() # shape, memory usage
df.describe() # count, mean, std, min, max, percentiles # Only int & float columns taken
df.Col1[df.Col1 > 100]
Percentile
get_dummies() || If we remove the 1st col (drop_first = True), then how will we identify whether it was significant or not?
Pass Vs Break command
Udemy Tableau Classes https://cashify.udemy.com/course/tableau10/learn/lecture/5618178#overview
Hover on 1st map & it shows a filter on another sheet
Class 31 Action Filter "Dashboard" >> "Actions"
Highlighter
Class 58 Data Interpreter In Data Source Sheet, check the box after importing data; removes null rows & columns
Class 60 Splitting Column into Multiple Columns In Data Source Sheet, Right Click on column & Split
Class 11 Sets
/lecture/5843098#overview
Python Packages
Apache Spark
RDD
Logging
apportion
Databricks
JSON
By Jatin
https://towardsdatascience.com/a-complete-data-science-roadmap-in-2021-77a15d6be1d9
https://medium.com/coriers/data-engineering-roadmap-for-2021-eac7898f0641
Basically used to capture errors & warnings and show them in a graphical way
Helps to create text files
Eg >> A: 50, B: 30, C: 20, D: 10 || Apportioning D's share into the others in proportion:
A: 50+5, B: 30+3, C: 20+2
Constraints
Trigger
Index
Window function
RANK() vs DENSE_RANK()
CTE (Common Table Expression)
Create Views
NOT NULL
UNIQUE
PRIMARY KEY
FOREIGN KEY
CHECK: This constraint helps to validate the values of a column to meet a particular condition. That is, it helps to ensure that the value stored in a column meets a specific condition. || Like a dropdown in Excel
DEFAULT: This constraint specifies a default value for the column when no value is specified by the user.
- UPDATE TRIGGER
- DELETE TRIGGER
- BEFORE UPDATE
- BEFORE DELETE
- Indexes are used to retrieve data from the database very quickly
- with NTILE()
- with lag() & lead()
- Enable users to maintain complex queries via increased readability & simplification
- Can be accessed in SELECT, INSERT, DELETE, UPDATE, MERGE statement
- To create views, for making Tableau reports
https://cashify.udemy.com/course/sql-and-postgresql/learn/lecture/22800007#overview
12 String Operators
13 Where condition
14 Comparison Operators
22 UPDATE
23 DELETE
28 Relationships
30 Primary Keys
Foreign Keys
32 SERIAL
40 DROP
When a tbl 1 PRIMARY KEY is used as a FOREIGN KEY in another table, & we try to delete a record from tbl 1, the delete is rejected unless an ON DELETE rule (eg: CASCADE or SET NULL) is defined on the foreign key
% Remainder
Hypothesis Testing
Significance Level
Z Test
- ND : Normal Distribution
- SND : Standard Normal Distribution, same as Normal Distribution, but here u = 0 & STD = 1
Significance Level : decided by the client or by the Data Scientist; usually the following are considered:
- Confidence Level = 1 - Significance Level || Eg: Significance Level = 0.05 or 5%, Confidence Level = 0.95 or 95%
- One Tailed Test : when the significance level is at 1 side of the distribution
- Two Tailed Test : when the significance level is at both sides of the distribution
- P-value : tells the AUC (area under the curve)
Z = (xbar - u) / (STD / SQRT(n)) || u = Population Mean, xbar = Sample Mean, n = Total no. of samples
Standard Error = STD / SQRT(n)
- 90% : 1.645
- 95% : 1.96
- Type 1 Error : Rejecting Null while it should be accepted
- Type 2 Error : Accepting Null while it should be rejected
T Test
- Used when: n < 30, u & STD are known
- Formula: uses degrees of freedom = n - 1
- Dependent (Paired) : subjects b/w the 2 samples are the same
- Independent (2 Sample) :
-- no. of samples should be almost equal
-- STD should be the same
- ANOVA : Analysis of Variance
- Uses the F distribution : F = Variance b/w the groups / Variance within the group
- Variance within a group is low >> better
- Variance b/w the groups is high >> better
- Parametric Test : Traditional Method, Proper probability distribution involved, Concepts of mean are there.
Chi-Square Test
- Used only for categorical Vars
- Diff b/w observed & expected
Correlation
- Helps to understand the nature of the relationship b/w 2 num samples
- Check +ve, -ve or no relationship
T test
- One Sample T test >> 1 Sample vs 1 value
Py Implementation
- import scipy.stats as stats
- import matplotlib
- from matplotlib import pyplot as plt
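A runnable sketch of the one-sample T test in Py (sample values are made up):
import numpy as np
import scipy.stats as stats
sample = np.array([5.1, 4.8, 5.3, 5.0, 4.9, 5.2])
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)  # 1 sample vs 1 value
print(t_stat, p_value)
if p_value < 0.05:            # compare against the significance level
    print('Reject Null')
else:
    print('Fail to reject Null')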
https://www.youtube.com/watch?v=Vtvj6fPZ1Ww&list=PLZoTAELRMXVMhVyr3Ri9IQ-t5QPBtxzJO&index=2
Descriptive Stats: include central tendency, summarizing in the form of numbers & graphs; focus on describing the visible characteristics of the dataset
Discrete Random Variable
LAMBDA Function >> Code >> to run calculations, functions, etc.
Amazon EventBridge
S3 Bucket, which contains several folders; inside that, the CSV file is stored
S3 Files Can be pushed in S3 by: Boto 3, Py Code, etc
CloudWatch Monitoring >> usage/consumption >> how much load is on the data/server, etc. (like Google Analytics)
IAM Users >> Login Credentials, No of users, etc
Interview Q
Entropy
Deflection
Classes Topics
Class 16 X & Y Variable
Linear Regression
Logistic Regression
Business Problem
Learning Setup
Discipline
Parametric & Non-Parametric
y = mx + c
Linear Regression
equation
Statistical way (b & c Formula)
ML Process
SSE (Sum of Squared Error)
OLS
OLS py
R-squared
SSR
R^2 (Accuracy)
SST
Adjusted R-Squared
Beta
Generalization
Model Validation
Underfitting & Overfitting
Y_Var Assumptions
Data Preparation Level 2: Assumptions
Data Preparation Level 3: Feature Reduction
RFE
F - Regression
K - best
VIF
Corr b/w X & X
Data Preparation Level 4
Model Implementation
Post Modeling
Class 18 MAPE
RMSE / MSE
Correlation
Decile Analysis
Logistic Regression / Classification
GLM
Logistic
Confusion Matrix
Threshold Value
Concordance
Discordance
Decile Analysis
KS Value
Feature Reduction Techniques
WOE
Somers' D (Gini)
Somers' D Py
VIF
Model Implementation
Model Evaluation
Threshold Value
Model Evaluation
IQR
Outlier
Quartile
fuzzywuzzy
fuzz.ratio
fuzz.partial_ratio
fuzz.token_sort_ratio
process.extract
Label Encoding
BM25
NMSLIB
Token Vectorizer
Description
- Optimisation >> used to maximise or minimise || Eg: A company wants to maximise its sales (factors affecting sales)
- Reinforcement Learning >> includes "sticks or carrots" / "rewards & punishment" || Deep Learning setup (ANN, etc.)
- Bayesian >> Naïve Bayes
- Ensemble >> Random Forest, Bagging, Boosting
- Parametric >> Linear & Logistic
- Non-Parametric >> DT, KNN
- b : slope, beta or coefficient || c : constant or alpha (Intercept)
- For multiple X vars : y = m1x1 + m2x2 + m3x3….. + c
- Beta (b) : for every unit increase in the X Var, Y will change by b || b = corr * (stdev_y / stdev_x)
- Constant (c) : Mean_y - (b * Mean_x)
5 : Identify Objective >> Eg: Minimise Error [Sum of Squared Error (SSE)]
6 : Convert the whole problem into an Optimization Problem
- SSE = sigma(Y - Ybar)^2 || Y : Actual Y, Ybar : Predicted Y
- OLS : Ordinary Least Squares
- OLS Regression >> comes up with the best fit line, which has the Minimum SSE
- Remove X vars which have a P-value > 0.05
- Pred_Y : model.predict(df)
- R-squared : Accuracy || It should be high while building the model
- SSR : sigma(Predicted_Y - Avg_Y)^2 || Best fit line vs Base model || Base Model : straight line at the Avg
- R^2 = SSR / SST
- SST : Sum of Squared Total = SSE + SSR
- Adj R^2 : how the model fits in a real scenario
- In the stats model, for every unit increase in X, Y will change by Beta. But that's not the correct way to read it; 1st we need to convert the Beta value into a standard format
- Train & Test : Model Development & Validation
- Underfitting : model performing badly on Training Data
- Overfitting : model performed well on Training but failed on Testing
- Understand Data
- pandas_profiling : used to see an HTML summary of a DF
- Go to Anaconda prompt >> pip install pandas_profiling OR conda install pandas_profiling
- df.describe()
- Boxplot to identify Outliers
import pandas_profiling
- report = pandas_profiling.ProfileReport(df)
- report.to_file('report.html')
- Histogram to check whether the Y var is normally distributed or not
- If not normally distributed, then perform a log transformation to bring it to normal
- Drop rows where the Y var is null >> df.dropna(axis = 0, subset = ['Y_Var'])
- Check for special chars in column names
- Col name correction >> df.columns.str.replace(".", "_")
- Dividing Data into Y var, Num & Cat datasets
- X_Vars should be correlated with Y_Var
- Outlier & missing value Treatment >> for Num_Vars
- X_Vars should not be correlated with each other
- Variance >> Avg((x - Avg(x))^2)
- Changing Data types >> df.ColNm.astype("float64")
- Replace missing values with the mode for Cat_Vars
- No missing values || No outliers
- df.ColNm.str.split("-", expand = True)
- Create dummies for Cat Vars >> pd.get_dummies(df_cat, drop_first = True)
- Combine Datasets >> pd.concat([df_Num, df_cat_dummies, df_Y], axis=1)
- Not suffering from heteroscedasticity (variance increasing over a period of time)
- RFE >> Recursive Feature Elimination
- Check whether the Y Var is normal or not
- Eg: We want 3 X vars >> it internally runs several iterations (Hypothesis Tests) >> Y ~ X1 + X2 + X3…
- Data is on the same scale (used in KNN, K-Means, PCA, etc.)
- F - Regression >> aka Univariate Regression
- If np.log() can't normalise it, then there are several other ways to do it
- K - Best
- Check Collinearity b/w X & X
- Can search on Google, eg: squared, cube, etc.
- Select top X vars with min P-Val
- VIF : Corr b/w X & X || value should be < 2
- If the VIF value is > 2, then we drop X_Vars 1 by 1 with the Max VIF, re-run & run the iteration again
- F - Regression runs an iteration of Y_Var vs every X_Var and gives the P-Value & F-value, dropping a single X var in each iteration >> Y ~ X1, Y ~ X2, Y ~ X3, ….
- Eg: If there are 10 X vars and we want 3 vars, then it will run 7 iterations
- VIF >> Variance Inflation Factor
- from statsmodels.stats.outliers_influence import variance_inflation_factor
- from patsy import dmatrices
- RFE >> from sklearn.feature_selection import RFE || Here we can provide the number of X_Vars we want (n = ?)
- F - Regression >> from sklearn.feature_selection import f_regression
- K - Best >> from sklearn.feature_selection import SelectKBest, f_classif, chi2
- Sorting on Min P-Value or Max F-Value, we choose the number of X_Vars we want || K means the number of X_Vars
- F-classifier or Chi-Square test, whichever we want to perform
- VIF = [variance_inflation_factor(df.values, i) for i in range(df.shape[1])] || df.values will give the arrays of all columns
- VIF = pd.Series(VIF)
- col = pd.Series(df.columns)
- VIF = pd.concat([col, VIF], axis = 1)
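The VIF snippet above assembled into one runnable sketch (assumes df holds only numeric X columns):
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(df):
    # df.values gives the arrays of all columns; one VIF per column
    vif = [variance_inflation_factor(df.values, i) for i in range(df.shape[1])]
    col = pd.Series(df.columns, name='var')
    return pd.concat([col, pd.Series(vif, name='VIF')], axis=1)
# per the note: drop X_Vars 1 by 1 with the max VIF (> 2) & re-run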
- Will be taking only those variables which fall under -0.5 to 0.5 || Its range is -1 to 1
- Split Data into Training & Testing >> from sklearn.model_selection import train_test_split
- Train, Test = train_test_split(df, test_size = 0.3, random_state = 123)
- import statsmodels.formula.api as smf
- model = smf.ols('Y ~ X1 + X2 + X3….', data = train).fit()
- print(model.summary2())
- Drop the X var which has the highest P-value & re-run the iteration
- Implement the prediction model on Train & Test || If we took log(Y), then we have to take its exponential
- Train_Pred = np.exp(model.predict(train))
- Test_Pred = np.exp(model.predict(test))
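The regression workflow above as one hedged sketch (assumes a df with columns Y, X1..X3 is already loaded; np.exp applies only if Y was log-transformed):
import numpy as np
import statsmodels.formula.api as smf
from sklearn.model_selection import train_test_split

train, test = train_test_split(df, test_size=0.3, random_state=123)
model = smf.ols('Y ~ X1 + X2 + X3', data=train).fit()
print(model.summary2())                     # drop the X var with the highest P-value & re-run
train_pred = np.exp(model.predict(train))   # exp() undoes the log taken on Y
test_pred = np.exp(model.predict(test))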
- MAPE : Mean Absolute Percent Error (0 to 100) || should be as low as possible
- MAPE = np.mean(np.abs((Actual_Y - Pred_Y) / Actual_Y)) * 100
- Check accuracy, MAPE, SSE, RMSE, Correlation || should be similar for both Train & Test, positive & high
- RMSE / MSE >> from sklearn import metrics || metrics.mean_squared_error(Y, Y_Pred) gives the overall error in the data; take its root to measure RMSE
- Train & Test RMSE / MSE should be similar
- Corr b/w Y & Pred_Y should be positive & very high || stats.stats.pearsonr(Y, Pred_Y) || should be similar for Train & Test
- Decile Analysis >> Sort Y & Pred_Y || group them in 10 parts
- Take the Avg of each part & make a line chart || the lines should overlap each other; if they show a difference at any level, it is easy to identify
- pd.qcut(train['Decile_no'], 10, labels = False) || will give numbering which helps to sort the df
- Group by on deciles with Avg of Y & Pred_Y || Training & Testing should have similar values
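A sketch of the MAPE / RMSE / decile checks (assumes actual_y & pred_y numpy arrays from the model above):
import numpy as np
import pandas as pd
from sklearn import metrics

mape = np.mean(np.abs((actual_y - pred_y) / actual_y)) * 100   # lower is better
rmse = np.sqrt(metrics.mean_squared_error(actual_y, pred_y))   # root of the overall MSE
check = pd.DataFrame({'Y': actual_y, 'Pred_Y': pred_y})
check['decile'] = pd.qcut(check['Pred_Y'], 10, labels=False)   # group into 10 parts
print(check.groupby('decile')[['Y', 'Pred_Y']].mean())         # averages should track each other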
- It uses a function to normalize Y values
- It has 2 functions to do so:
- Here, the Y value is 0 or 1, aka Bernoulli Distribution, so we cannot come up with a best fit line directly; but there are functions which bring it into a Normal function >> GLM
- Probit
- Logit >> Sigmoid Curve >> S-shaped curve
- Logit : exp(mx + c) / (1 + exp(mx + c)) = Probability
- The Logit function internally converts Y (0 & 1) into normal values (0.29, 0.78, etc.) as probabilities; then we can come up with the best fit line
- When we use GLM with the Logit function, it is known as Logistic Regression
- Confusion Matrix cells : True Positive, False Positive, False Negative, True Negative
- Sensitivity >> TP / (TP + FN) || should be as high as possible
- Specificity >> TN / (FP + TN) || should be as high as possible
- Threshold : a cutoff point which decides whether the value is to be 0 or 1
- Solves Binary / Binomial & Multiclass Problems
- Precision >> TP / (TP + FP)
- ROC (Receiver Operating Characteristic Curve) >> gives the optimal Threshold value
- Accuracy >> (TP + TN) / (TP + TN + FP + FN)
- F1 Score >> 2 * (Recall * Precision) / (Precision + Recall) || higher is better
- Concordance >> Sum(Concordance) / Sum(Discordance) || if > 1 then good, else bad
- Discordance >> 1 - Concordance
- KS >> Max[Abs(Bad - Good)] || it should be in the top 5 deciles
- In the 1st 4 to 5 deciles, we should have 70 to 80% of customers
- Lift chart
- Classification EDA
- Sometimes we get 2 different datasets >> Prev & New
- Sometimes we get it in a single dataset as Actuals & New
- Correlation b/w X & Y should be partially fulfilled >> df.corrwith(df.Y)
- WOE : Weight of Evidence
- import statsmodels.formula.api as sm
- Correlation b/w X & X
- Feature reduction : VIF, RFE
- model = sm.logit('Y_Var ~ X1 + X2 + ...', df).fit()
- p = model.predict(df)
- Somers' D (Gini) = (Concordance - Discordance) / (Concordance + Discordance + Tie)
- X var should be categorical || Eg: if X_Vars include income, we will bin it into 4 parts
- OR with a Random Forest Classifier
- AUC_score = metrics.roc_auc_score(Y_var, p)
- SommersD = 2 * AUC_score - 1 || Somers' D should be high; then it's a good model
- This is for all X_Vars, but we need Somers' D for the X_Vars individually >> Y ~ X1, Y ~ X2, …. || just like F Regression || Code in image >> take the vars which have a higher SommersD value
- Confusion Matrix
- import statsmodels.formula.api as sm
- model = sm.logit('Y_Var ~ X1 + X2 + ...', df).fit()
- print(model.summary2())
- Remove X vars which have a high P-value and re-run the model
- Calculate Somers' D for Train & Test || it should be high & similar
- Model Evaluation : Sensitivity, Specificity, Precision, Accuracy, AUC score, Concordance (have to explore via Google)
- Py >> print(metrics.classification_report(Actual_Y, Pred_Y))
- Threshold >> Thres = Total 1's / (Total 1's + Total 0's) || < Thres = 0, > Thres = 1
- OR >> Max(Sensitivity + Specificity)
- Decile Analysis (KS Value)
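The logistic flow above as a hedged sketch (df, Y_Var, X1, X2 are placeholders; the Total-1's threshold follows the note):
import statsmodels.formula.api as sm
from sklearn import metrics

model = sm.logit('Y_Var ~ X1 + X2', data=df).fit()
print(model.summary2())                        # remove high P-value X vars & re-run
p = model.predict(df)                          # predicted probabilities
auc = metrics.roc_auc_score(df['Y_Var'], p)
somers_d = 2 * auc - 1                         # should be high for a good model
thres = df['Y_Var'].mean()                     # Total 1's / (Total 1's + Total 0's)
pred = (p > thres).astype(int)                 # < Thres -> 0, > Thres -> 1
print(metrics.classification_report(df['Y_Var'], pred))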
IQR : Interquartile Range
- IQR = Q3 - Q1
- Upper : Q3 + 1.5*IQR
- Lower : Q1 - 1.5*IQR
- Q1 : 1/4 * (n+1)
- Q3 : 3/4 * (n+1)
fuzzywuzzy
pip install fuzzywuzzy
pip install python-Levenshtein
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
fuzz.ratio('geeksforgeeks', 'geeksgeeks') # 87
fuzz.ratio('GeeksforGeeks', 'GeeksforGeeks') # 100
fuzz.ratio('geeks for geeks', 'Geeks For Geeks ') # 80
fuzz.partial_ratio("geeks for geeks", "geeks for geeks!") # 100
fuzz.partial_ratio("geeks for geeks", "geeks geeks") # 64
fuzz.token_sort_ratio("geeks for geeks", "for geeks geeks") # 100
fuzz.token_sort_ratio("geeks for geeks", "geeks for for geeks") # 88
query = 'geeks for geeks'
choices = ['geek for geek', 'geek geek', 'g. for geeks']
# Get a list of matches ordered by score, default limit to 5
process.extract(query, choices) # [('geeks geeks', 95), ('g. for geeks', 95), ('geek geek', 93)]
# If we want only the top one
process.extractOne(query, choices) # ('geeks geeks', 95)
- Nominal Encoding: for categories with no inherent order (eg: one-hot / dummy vars)
- Ordinal Encoding: for categories with an order (eg: label values that follow the rank)
Classes Topics
Class 21 Customer Segmentation
Types of Techniques
Regression Problems
Classification Problems
Segmentation Problems
Forecasting Problem
Class 22 Bias
Regularization
Cross Validation - Kfold
Validation
KNN
Similarity Metrics
Scaling of Data
Distance
Correlation
Cosine Similarity
Z_transformation (Standard Scaler)
Min-Max Scaler
Weightage
Uniform & Distance
Parameters for KNN
GridsearchCV
Best Fit Line
KNN Imputation
Naïve Bayes
Probability Understanding in detail
Limitations of NB
Types of NB in py
NB py
DT Regressor
DT Segmentation
DT as Feature Reduction
Technique
DT in Grouping Vars
Class 25 Advantages of DT
Disadvantages of DT
Py Implementation
pydotplus
Graphviz
DT Tuning Parameters
DT Feature importance
Ensemble Learning
Classification of ensemble models
Homogeneous Ensembling
Heterogeneous Ensembling
Bagging
Bag vs Out of Bag
Tuning Parameters of EL
Bagging & Random Forest
Py Implementation
Class 1 NLP
Real World NLP Apps
Common NLP Task
Approaches to NLP
Heuristic Approaches
Machine learning
Approaches
Deep Learning
Approaches
Challenges in NLP
Class 2
ww.youtube.com/watch?v=zlUpTlaxAKI&list=PLKnIA16_RmvZo7fp5kkIth6nRTeQQsjfX
- It is a subfield of CS, AI & Human Language
- Search Engines
- Chatbots
- Text Parsing
- Speech to text : Voice Typing
- Heuristic Approaches >> Wordnet, Open Mind Common Sense
- Machine Learning Methods >> Hidden Markov Models, LDA
- Deep Learning Methods >> Transformers (used heavily, boosted NLP), Autoencoders
- Challenges >> Creativity, Diversity
EDA Process - Import important libraries
- If X vars are more, then remove few of them manually
- If Column contains >25% null, then drop that column
- If Y Var is missing, remove that row
- Separate Num_Vars, Cat_Vars & Y_Var
- Outlier & missing value capping on Num_Vars
- Capping by mode on Cat_Vars
* Creating Dummy Vars (0,1 flag) >> When Independent Vars
* Creating Label encoding (Numbering the vars) >> When dependent vars
- Concat both datasets (Num Vars, Cat Num Vars)
(Regression Problem) - Check whether Y_Var is normally distributed or not; if not, then take the log of it
5. VIF
* Somers' D (Gini) >> Classification Problem
- Taking Unique Col names after above process
- Train Test Split
- Rebuild final Model
(Regression Problem) - Convert the Y value back to normal (exp) if it was converted into log
(Regression Problem) - Compare it with Test Data
* AUC ROC Score >> metrics.roc_auc_score()
(Classification Problem) * Gini
- Decide cutoff point & Make data into booliean form
- Compare it with Test Data
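A compressed sketch of the EDA process above (a df with a numeric Y_Var column is assumed; thresholds follow the notes):
import pandas as pd

df = df.dropna(axis=0, subset=['Y_Var'])               # Y var missing -> remove that row
df = df.loc[:, df.isnull().mean() <= 0.25]             # column >25% null -> drop it
num = df.select_dtypes(include='number').drop(columns='Y_Var')
cat = df.select_dtypes(exclude='number')
num = num.fillna(num.median())                         # capping/imputation on Num_Vars
cat = cat.fillna(cat.mode().iloc[0])                   # mode on Cat_Vars
dummies = pd.get_dummies(cat, drop_first=True)         # 0/1 flags for independent vars
full = pd.concat([num, dummies, df['Y_Var']], axis=1)  # concat both datasets
# Regression problem: if Y is not normal, take np.log(Y); convert back with np.exp after prediction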
https://www.youtube.com/watch?v=qCR2Weh64h4&list=PLzMcBGfZo4-lUA8uGjeXhBUUzPYc6vZRn
Tutorial 1
Variability Std Deviation: SQRT of Variance || Variance : Avg Squared Dev
Standard Deviation SQRT[SIGMA( Sample - Sample Mean )^2 / No. of Samples -1] || SQRT(sigma(X-Xbar)^2/n-1)
Squared Deviation (Mean - X)^2
Variance Formula (Parameter) / sigma^2 : sigma(X-u)^2 / N || u = Population mean, N = No. of Population
3/6 Sigma Rule For Normal Distributed Data
Standard Error STD / SQRT(n)
Rule 1 : +- 1 STD will cover 68.2% of the Data
Error Types - Type 1 Error : Rejecting Null while it should be accepted
- Type 2 Error : Accepting Null while it should be rejected
Test we have to perform in Py - One Sample T test >> 1 Sample vs 1 value
Py Implementation - import scipy.stats as stats
SQL
Remove Duplicate Records
Lead / Lag Date Function
Replacing Mbl No with 'xxx'
3rd Highest Salary
To check data is not present in both tbls
Only Columns of table
Split KG & Gram
Generalization (Bottom to top)
Specialization (Top to bottom)
Tableau
LOD Syntax
Sets
Python
Iterators Vs Generators
- Remove Duplicate Records >> ... ) WHERE row_num = 1
- Lead / Lag Date Function >> date_part('days', lag(date,1) over(partition by customer_id order by date desc) - date) as order_date_diff from order_table) a
- Replacing Mbl No >> CONCAT(SUBSTR(phone, 1, LENGTH(phone) - 5), 'xxxxx') as update_mbl from tbl_nm
- 3rd Highest Salary >> order by, limit || or: from tbl_nm) a where rn = 3
- Data not present in both tbls >> LEFT JOIN table1 ON table2.id = table1.id WHERE table1.id IS NULL;
- Only columns of a table >> select * from Employee limit 0
- Split KG & Gram >> split_part(weight::text, '.', 2) AS second_part from employee
Creating new (Higher Level) entity by combining lower level entities
- Eg: Tbl1: Employee , Tbl2: Customer >> Tbl3: Name, Add, Ph no.
Vice versa of Generalization
- LOD Syntax >> { FIXED [Customer Name] : SUM([Sales]) }
- Sets >> help to filter out Top N, by condition, etc., along with functions
- To create Iterators we use the iter() keyword || iter processes values 1 by 1
- To create a Generator we use the yield keyword || yield saves the local variables
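A minimal sketch of the iterator vs generator note:
it = iter([1, 2, 3])   # iter() builds an iterator; values are processed 1 by 1
print(next(it))        # 1
print(next(it))        # 2

def gen():
    for i in range(3):
        yield i        # yield saves the local variables between calls

for value in gen():
    print(value)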
import math
-and-for-loop-exercise-with-solutions/#h-exercise-1-print-first-10-natural-numbers-using-while-loop