Data Mining
Overview
Technologies
Introduction
Applications
By
Dr. Nora Shoaip
Lecture 1
Damanhour University
Faculty of Computers & Information Sciences
Department of Information Systems
2023 - 2024
Outline
Course overview
• Textbook and Course Coverage
• Lab work, Assignments, and Exams
• Course Project and Evaluation
Introduction
• Why Data Mining?
• What is Data Mining?
• Knowledge Discovery Process
• What Kinds of Data Can Be Mined?
• What Kinds of Patterns Can Be Mined?
• What Kinds of Technologies Are Used?
• What Kinds of Applications Are Targeted?
• Major Issues in Data Mining
Objectives
Textbook and Course Coverage
Schedule
• Instructor: Dr. Nora Shoaip
Associate Professor, FCI, Damanhour University
Department of Information Systems
• Teaching Assistant: Dr. Hasnaa Sayed Ahmed
Assistant Teacher, FCI, Damanhour University
Department of Information Systems
Course Coverage
1 Introduction
3 Data Preprocessing
8 Outlier Analysis
Lab Coverage
1 Python Fundamentals
2 Pandas & Files
3 Data Manipulation with pandas
4 Frequent itemset algorithm implementations in Python
Apriori algorithm, FP-Growth
5 Classification algorithm implementations in Python
Decision Tree, K-Nearest Neighbor, Naïve Bayesian, linear regression
6 Clustering algorithm implementations in Python
7 Data visualization with different charts in Python
8 Outlier
Assessment
No.  Description      Week No.                   Marks
1    Quizzes          Weekly                     10
2    Midterm Exam     Week 7                     15
3    Mini-Project     From week 3 to week 10     10
4    Practical Exam   From week 14 to week 15    15
     Total Term-Work                             50
     Final Exam                                  50
     Total Mark                                  100
Introduction
• Why Data Mining?
• What Is Data Mining?
• What Kinds of Data Can Be Mined?
• What Kinds of Patterns Can Be Mined?
• What Kinds of Technologies Are Used?
• What Kinds of Applications Are Targeted?
• Major Issues in Data Mining
Why Data Mining?
The era of Explosive Growth of Data: in the petabytes!
Automated data collection and availability: tools, database systems, Web,
computerized society
Major sources of abundant data
Business: Web, transactions, stocks, …
Science: Remote sensing, bioinformatics, …
Society and everyone: news, digital cameras, social feeds
The ability to economically store and manage petabytes of data online
The Internet and computing Grid that make all these archives universally accessible
Linear growth of data management tasks with data volumes
Massive data volumes, but still little insight!
Solution: data mining, the automated analysis of massive data sets
What is Data Mining?
Data mining (knowledge discovery from data)
o Extraction of interesting (non-trivial, implicit,
previously unknown, and potentially useful) patterns
or knowledge from huge amounts of data
o Data mining: a misnomer?
• Alternative names
o Knowledge discovery (mining) in databases (KDD),
knowledge extraction, data/pattern analysis,
information harvesting, business intelligence, etc.
• Is everything “data mining”? No; the following are not:
o Simple search and query processing
o (Deductive) expert systems
Knowledge Discovery Process
What Kinds of Data Can Be Mined?
Database-oriented data sets and applications
Relational database, data warehouse, transactional database
Advanced data sets and advanced applications
Data streams and sensor data
Time-series data, temporal data, sequence data (incl. bio-sequences)
Structured data, graphs, social networks, and multi-linked data
Object-relational databases
Heterogeneous databases and legacy databases
Spatial data and spatiotemporal data
Multimedia databases
Text databases
The World-Wide Web
What Kinds of Patterns Can Be Mined?
• Generalization
• Association/Correlation analysis
• Classification
• Cluster analysis
• Outlier analysis
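As a small illustration of outlier analysis, the last pattern kind listed above, the sketch below flags values that fall outside the common 1.5 × IQR fences. The choice of this rule, the function name `iqr_outliers`, and the sample amounts are assumptions made for illustration; the lecture does not prescribe a specific method.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily purchase amounts; 950 is far from the rest.
amounts = [20, 22, 19, 25, 21, 23, 950, 24, 20, 22]
print(iqr_outliers(amounts))
```

The multiplier `k` controls how strict the fences are; larger values flag fewer points.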
Generalization
• Information integration and data warehouse
construction
o Data cleaning, transformation, integration, and
multidimensional data model
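The multidimensional data model can be pictured as a small data cube. The sketch below cleans a few sales records by dropping rows with a missing amount, then aggregates them along chosen dimensions (a roll-up). The record layout, the dimension names, and the helpers `clean` and `roll_up` are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

# Hypothetical fact records: (city, quarter, item, sales amount).
records = [
    ("Cairo", "Q1", "camera", 100.0),
    ("Cairo", "Q1", "phone", 250.0),
    ("Cairo", "Q2", "camera", None),   # missing value, removed by cleaning
    ("Alexandria", "Q1", "camera", 80.0),
    ("Alexandria", "Q2", "phone", 120.0),
]

# Position of each dimension inside a record.
DIMENSIONS = {"city": 0, "quarter": 1, "item": 2}

def clean(rows):
    """Data cleaning step: drop rows whose measure is missing."""
    return [r for r in rows if r[3] is not None]

def roll_up(rows, dims):
    """Sum the sales measure over every dimension not listed in dims."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[DIMENSIONS[d]] for d in dims)
        totals[key] += row[3]
    return dict(totals)

cleaned = clean(records)
print(roll_up(cleaned, ["city"]))             # generalized to city level
print(roll_up(cleaned, ["city", "quarter"]))  # finer view: city by quarter
```

Listing fewer dimensions gives a more general summary; listing more gives a more detailed one.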
Time and Ordering
• Sequence, trend and evolution analysis
Trend, time-series, and deviation analysis: e.g., regression
and value prediction
Sequential pattern mining
e.g., first buy a digital camera, then buy large SD memory cards
Periodicity analysis
Similarity-based analysis
• Mining data streams
Ordered, time-varying, potentially infinite data streams
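The sequential pattern example above (a digital camera first, an SD memory card later) can be sketched as a support count over time-ordered purchase histories. The functions `contains_in_order` and `sequence_support`, the customer histories, and the item names are hypothetical. A real sequential pattern miner would enumerate candidate sequences rather than test a single given one.

```python
def contains_in_order(history, pattern):
    """True if the items of pattern occur in history in the same order,
    not necessarily adjacent (a subsequence test)."""
    it = iter(history)
    # 'item in it' consumes the iterator, so earlier items cannot be reused.
    return all(item in it for item in pattern)

def sequence_support(histories, pattern):
    """Fraction of customers whose history contains the pattern."""
    hits = sum(contains_in_order(h, pattern) for h in histories)
    return hits / len(histories)

# Hypothetical purchase histories, each ordered by time.
histories = [
    ["digital camera", "tripod", "SD card"],
    ["SD card", "digital camera"],          # wrong order: no match
    ["digital camera", "SD card"],
    ["laptop", "mouse"],
]

print(sequence_support(histories, ["digital camera", "SD card"]))
```

Order matters here: the second customer bought both items but in the reverse order, so that history does not support the pattern.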
Evaluation of Knowledge
Is all mined knowledge interesting?
One can mine a tremendous amount of “patterns” and knowledge
Some may fit only a certain dimension space (time, location, …)
Some may not be representative, may be transient, …
Evaluation of mined knowledge → directly mine only interesting
knowledge?
Descriptive vs. predictive
Coverage
Typicality vs. novelty
Accuracy
Timeliness
…
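Two of the measures listed above, coverage and accuracy, can be made concrete for a single IF-THEN rule. The sketch below assumes the usual definitions: coverage is the fraction of records the rule applies to, and accuracy is the fraction of those covered records that the rule labels correctly. The function `rule_coverage_accuracy`, the field names, and the customer records are hypothetical.

```python
def rule_coverage_accuracy(rows, condition, predicted_label):
    """Evaluate the rule IF condition(row) THEN label == predicted_label.

    coverage = covered rows / all rows
    accuracy = correctly labelled covered rows / covered rows
    """
    covered = [r for r in rows if condition(r)]
    if not covered:
        return 0.0, 0.0
    correct = sum(1 for r in covered if r["label"] == predicted_label)
    return len(covered) / len(rows), correct / len(covered)

# Hypothetical customer records.
customers = [
    {"age": 25, "student": True,  "label": "buys"},
    {"age": 22, "student": True,  "label": "buys"},
    {"age": 30, "student": True,  "label": "skips"},
    {"age": 45, "student": False, "label": "skips"},
    {"age": 52, "student": False, "label": "buys"},
]

# Rule under test: IF student THEN buys.
coverage, accuracy = rule_coverage_accuracy(
    customers, lambda r: r["student"], "buys"
)
print(f"coverage={coverage:.2f}, accuracy={accuracy:.2f}")
```

A rule with high accuracy but very low coverage may describe only a handful of records, which is one reason several measures are considered together.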
What Kinds of Technologies Are Used?
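[Figure: Data Mining at the center, surrounded by related technologies]
• Technologies: Machine Learning, Pattern Recognition, Statistics, Visualization, Applications, Algorithms, DB Systems, High-Performance Computing
• Other labels in the figure: Novelty, Scalability, Dimensionality, Heterogeneity
• Data kinds noted in the figure: streams, real-time, spatio-temporal, multimedia, text, web, …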
What Kinds of Applications Are Targeted?
• Web page analysis: from web page classification, clustering to
PageRank & HITS algorithms
• Collaborative analysis & recommender systems
• Basket data analysis to targeted marketing
• Biological and medical data analysis: classification, cluster
analysis (microarray data analysis), biological sequence
analysis, biological network analysis
• Data mining and software engineering
• From major dedicated data mining systems/tools (e.g., SAS, MS
SQL-Server Analysis Manager, Oracle Data Mining Tools) to
invisible data mining
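Of the applications listed above, PageRank is compact enough to sketch. The code below is a minimal power-iteration version under simplifying assumptions: the graph is a small hypothetical dictionary, every linked page also appears as a key, and the damping factor 0.85 and the fixed iteration count are conventional illustrative choices rather than values from the lecture.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict: page -> list of linked pages.

    Assumes every page that is linked to also appears as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the same small "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page with no out-links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical three-page web graph.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

The ranks always sum to 1, so they can be read as the share of attention each page receives from a random surfer.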
Major Issues in Data Mining
Mining Methodology
Mining various and new kinds of knowledge
Mining knowledge in multi-dimensional space
Boosting the power of discovery in a networked
environment
Handling noise, uncertainty, and incompleteness of data
Pattern evaluation and pattern- or constraint-guided mining
User Interaction
Interactive mining
Incorporation of background knowledge
Presentation and visualization of data mining results
Efficiency and Scalability
Efficiency and scalability of data mining algorithms
Parallel, distributed, stream, and incremental mining
methods
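Incremental and stream mining methods update a summary as each element arrives instead of rescanning the data. As one hedged illustration, the sketch below maintains a running mean and variance in a single pass using Welford's method. The class name `RunningStats` and the sensor readings are hypothetical.

```python
class RunningStats:
    """One-pass (incremental) mean and variance using Welford's method."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new stream element into the summary in O(1) time."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; 0.0 until at least two values are seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0

# Hypothetical sensor stream, processed one reading at a time.
stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)
print(stats.mean, stats.variance)
```

Only three numbers are stored regardless of stream length, which is what makes the approach suitable for potentially unbounded data.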
Diversity of data types
Handling complex types of data
Mining dynamic, networked, and global data repositories
Data mining and society
Social impacts of data mining
Privacy-preserving data mining
Invisible data mining
Thank you