
Data Preprocessing

• Data preprocessing refers to the steps taken to clean, transform, and organize raw data into a structured format before analysis.

• It ensures that the data is ready for accurate and effective analysis.
1. Data Cleaning:

• Raw data can have many irrelevant and missing parts. Data cleaning is done to handle these.

• It involves handling missing data, noisy data, etc.


(a) Missing Data:
• This situation arises when some values are missing in the dataset. It can be handled in various ways.
• Some of them are:
1. Ignore the tuples:
• This approach is suitable only when the dataset is quite large and multiple values are missing within a tuple.
2. Fill the missing values:
• There are various ways to do this. You can fill the missing values manually, with the attribute mean, or with the most probable value.
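A minimal pandas sketch of both strategies; the column names and values below are illustrative, not from the slides:

```python
import pandas as pd

# Two strategies from the slides: drop tuples with missing values,
# or fill them with the attribute mean / most probable value.
df = pd.DataFrame({
    "age": [25, None, 31, 47, None],                   # numeric attribute
    "city": ["Pune", "Delhi", None, "Pune", "Delhi"],  # categorical attribute
})

dropped = df.dropna()                          # 1. ignore the tuples

filled = df.copy()                             # 2. fill the missing values
filled["age"] = filled["age"].fillna(filled["age"].mean())    # attribute mean
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])  # most probable
print(dropped, filled, sep="\n\n")
```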
(b) Noisy Data:
• Noisy data is meaningless data that machines cannot interpret. It can be generated by faulty data collection, data entry errors, etc. It can be handled in the following ways:
1. Binning Method:
Binning is the process of dividing continuous data into smaller, defined ranges or "bins."
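For instance, a sketch that smooths values by the mean of their bin; the prices and the bin count are illustrative:

```python
import pandas as pd

# Binning sketch: split values into three equal-width bins and
# replace each value with its bin mean (smoothing by bin means).
prices = pd.Series([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])
bins = pd.cut(prices, bins=3)                     # equal-width bins
smoothed = prices.groupby(bins).transform("mean")
print(pd.DataFrame({"price": prices, "bin": bins, "smoothed": smoothed}))
```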
2. Regression:
Here, data can be smoothed by fitting it to a regression function. The regression used may be linear (one independent variable) or multiple (several independent variables).
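A sketch of smoothing with a linear fit in NumPy; the noisy observations are generated for illustration:

```python
import numpy as np

# Smoothing-by-regression sketch: fit a straight line to noisy
# observations and replace them with the fitted values.
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=10)  # noisy line

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares fit
y_smooth = slope * x + intercept            # smoothed values
print(np.round(y_smooth, 2))
```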
3. Clustering:

• This approach groups similar data into clusters. Outliers either fall outside the clusters or may go undetected.
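A k-means sketch with scikit-learn that flags points landing in singleton clusters as potential outliers; the values, the choice of k, and the singleton rule are illustrative, not from the slides:

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster 1-D values; a point alone in its own cluster is treated
# as a potential outlier.
values = np.array([[1.0], [1.2], [0.9], [5.0], [5.1], [25.0]])
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(values)
sizes = np.bincount(km.labels_)          # members per cluster
outliers = values[sizes[km.labels_] == 1]
print(outliers)                          # [[25.]]
```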
2. Data Transformation:
This step is taken to transform the data into forms appropriate for the mining process.
It involves the following ways:
1. Normalization:
• It is done to scale the data values into a specified range, such as -1.0 to 1.0 or 0.0 to 1.0.
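A min-max normalization sketch that rescales illustrative values into the 0.0 to 1.0 range:

```python
import numpy as np

# Min-max normalization: (x - min) / (max - min) maps the values
# onto [0.0, 1.0].
x = np.array([200.0, 300.0, 400.0, 600.0, 1000.0])
x_norm = (x - x.min()) / (x.max() - x.min())
print(np.round(x_norm, 3))   # [0.    0.125 0.25  0.5   1.   ]
```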
2. Attribute Selection:
• In this strategy, new attributes are constructed from the given set of attributes to help the mining process.
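A sketch of constructing a new attribute from the given ones; the column names are hypothetical:

```python
import pandas as pd

# Attribute construction sketch: derive "area_m2" from the two
# given attributes so the mining step can use it directly.
df = pd.DataFrame({"height_m": [2.0, 3.0], "width_m": [4.0, 5.0]})
df["area_m2"] = df["height_m"] * df["width_m"]   # constructed attribute
print(df)
```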
3. Discretization:
This is done to replace the raw values of a numeric attribute with interval labels or conceptual labels.
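A sketch that replaces raw ages with conceptual labels; the cut points and label names are illustrative:

```python
import pandas as pd

# Discretization sketch: map raw numeric ages onto conceptual labels.
ages = pd.Series([5, 13, 25, 40, 67])
labels = pd.cut(ages, bins=[0, 12, 19, 59, 120],
                labels=["child", "teen", "adult", "senior"])
print(labels.tolist())   # ['child', 'teen', 'adult', 'adult', 'senior']
```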

4. Concept Hierarchy Generation:

Here, attributes are converted from a lower level to a higher level in the hierarchy. For example, the attribute "city" can be converted to "country".
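A sketch that generalizes "city" values up to "country" using a hypothetical lookup table:

```python
# Concept hierarchy sketch: climb one level, city -> country
# (the lookup table is illustrative).
city_to_country = {"Pune": "India", "Delhi": "India", "Paris": "France"}
cities = ["Pune", "Paris", "Delhi"]
countries = [city_to_country[c] for c in cities]
print(countries)   # ['India', 'France', 'India']
```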
3. Data Reduction:
• Data mining is used to handle huge amounts of data, and analysis becomes harder as the volume grows.
• Data reduction is used to deal with this. It aims to increase storage efficiency and reduce data storage and analysis costs.
The various steps of data reduction are:

1. Data Cube Aggregation:

Aggregation operations are applied to the data to construct a data cube.
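A sketch that aggregates raw sales records into a small year-by-branch cube; the data is illustrative:

```python
import pandas as pd

# Data cube aggregation sketch: roll individual sales up into a
# year x branch cube with summed amounts.
sales = pd.DataFrame({
    "year":   [2023, 2023, 2024, 2024],
    "branch": ["A", "B", "A", "B"],
    "amount": [100, 150, 120, 180],
})
cube = sales.pivot_table(values="amount", index="year",
                         columns="branch", aggfunc="sum")
print(cube)
```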

2. Attribute Subset Selection:

Only the highly relevant attributes should be used; the rest can be discarded. For attribute selection, one can use a significance level and the p-value of each attribute: an attribute whose p-value is greater than the significance level can be discarded.
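A sketch of p-value based selection with scikit-learn's ANOVA F-test; the dataset and the 0.05 significance level are illustrative choices (on iris, all four attributes pass):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import f_classif

# Keep attributes whose p-value is below the significance level.
X, y = load_iris(return_X_y=True)
_, p_values = f_classif(X, y)       # ANOVA F-test per attribute
keep = p_values < 0.05              # True for significant attributes
X_reduced = X[:, keep]
print(p_values, X_reduced.shape)
```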
3. Numerosity Reduction:
This enables storing a model of the data instead of the whole data, for example regression models.
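A sketch of the regression-model case: the two fitted parameters are stored in place of all the raw points (the data is synthetic):

```python
import numpy as np

# Numerosity reduction sketch: keep only (slope, intercept) of a
# fitted line instead of all 1000 raw (x, y) pairs.
rng = np.random.default_rng(0)
x = np.arange(1000, dtype=float)
y = 3.0 * x + 7.0 + rng.normal(scale=1.0, size=1000)
slope, intercept = np.polyfit(x, y, deg=1)
print(round(slope, 2), round(intercept, 2))   # ~3.0 ~7.0
```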

4. Dimensionality Reduction:
This reduces the size of the data through encoding mechanisms, which can be lossy or lossless.
If the original data can be retrieved after reconstruction from the compressed data, the reduction is called lossless; otherwise it is called lossy.
Two effective methods of dimensionality reduction are wavelet transforms and PCA (Principal Component Analysis).
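A PCA sketch with scikit-learn that projects 4-dimensional data onto 2 principal components, a lossy reduction; the dataset choice is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Project the 4 iris attributes onto the 2 components that retain
# the most variance.
X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)           # (150, 4) -> (150, 2)
print(pca.explained_variance_ratio_.round(3))   # variance kept per component
```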
