PGDDS Syllabus Final (2025)
PERIYAKULAM
PG DEPARTMENT OF COMPUTER SCIENCE
POST GRADUATE DIPLOMA IN DATA SCIENCE (PGDDS)
For each course, there will be Continuous Internal Assessment (CIA) and a Semester
Examination (External). The weightage ratio is:
Paper Internal External Total
Theory 25 75 100
Practical 40 60 100
The Internal Components are:
Practical
Internal Test (2) 15
Lab Work 10
Record 10
Attendance 05
Total 40
Passing Minimum (Semester Examination)
Theory 50% out of 75 Marks (i.e. 37.5 Marks)
Practical 50% out of 60 Marks (i.e. 30 Marks)
Part - B
2 Questions × 5 Marks = 10 Marks
(Internal Choice and One Question from Each Unit)
Part - C
2 Questions × 10 Marks = 20 Marks
(Open Choice, Two Questions out of Three)
P.G. DIPLOMA PROGRAMME OUTCOMES
PO. NO. UPON COMPLETION OF THIS PROGRAMME THE STUDENTS WILL BE ABLE TO
PO-1. Acquire in-depth knowledge of their discipline, and analyze and apply that
understanding for the betterment of self and society.
PO-2. Synthesize ideas from various disciplines, enhance the interdisciplinary knowledge
and extend it for research.
PO-3. Gain confidence and skills to communicate orally/verbally on research platforms and
state research findings clearly.
PO-4. Develop problem solving and computational skills and gain confidence to appear for
competitive examinations.
PO-5. Enhance knowledge regarding research by accumulating practical knowledge in
specific areas of research.
PO-6. Achieve idealistic goals and enrich the values to tackle the societal challenges.
PGDDS COURSE PATTERN
(Affiliated to Mother Teresa University, Kodaikanal)
30 18
25PGDDS04 Machine Learning Algorithms 6 4
25PGDDS05 Data Analytics Using R 6 4
25PGDDS06 Internet of Things 6 4
30 18
TOTAL 60 36
FUNDAMENTALS OF DATA SCIENCE
Semester : I Hours: 6
Code : 25PGDDS01 Credits: 4
UNIT I
Introduction: Basics of Data Science. Decision Theory: Descriptive Analytics -
Diagnostic Analytics - Predictive Analytics - Prescriptive Analytics. Estimation
Theory: Basic Terminologies - Good Estimator. Coordinate Systems:
Geographic Coordinate System - Projected Coordinate System - Local Coordinate
System- Projections. Linear Transformations: Steps in Linear Transformation -
Common Transformations. Graph Theory: Fundamentals of Graphs - Types of
Graphs. Algorithms: Regression Algorithms - K-Nearest Neighbors (KNN)
Algorithm - Clustering Algorithms - Artificial Neural Networks. Machine
Learning: Categories of Machine Learning - Common Machine Learning
Frameworks. (18 Hours)
UNIT II
Data Collection, Modelling, and Compilation: Data Collection - Cleaning and
Organizing Data - Modeling Machine Learning. Data Analysis: Data Analysis
Methods. Data Presentation and Visualization: Types of Data Presentation -
Frequency Distribution - Data Visualization in R. Data Science Software
Tools: RapidMiner. Programming Languages for Data Science: R for Data
Science - Python for Data Science. Applications of Data Science. (18 Hours)
UNIT III
Understanding Big Data: Concepts and Terminology - Big Data Characteristics
- Different Types of Data. Business Motivations and Drivers for Big Data
Adoption: Marketplace Dynamics - Business Architecture - Business Process
Management - Information and Communications Technology - Internet of
Everything (IoE). (18 Hours)
UNIT IV
Big Data Adoption and Planning Considerations: Organization Prerequisites -
Data Procurement – Privacy – Security – Provenance - Limited Realtime Support
- Distinct Performance Challenges - Distinct Governance Requirements -
Distinct Methodology – Clouds - Big Data Analytics Lifecycle. Enterprise
Technologies and Big Data Business Intelligence: Online Transaction
Processing (OLTP) - Online Analytical Processing (OLAP) – Extract
Transform Load (ETL) - Data Warehouses – Data Marts – Traditional BI –
Big Data BI. (18 Hours)
UNIT V
Big Data Storage Concepts: Clusters – File Systems and Distributed File Systems
– NoSQL – Sharding – Replication - Sharding and Replication – CAP Theorem –
ACID – BASE. Big Data Processing Concepts: Parallel Data Processing -
Distributed Data Processing – Hadoop – Processing Workloads – Cluster –
Processing in Batch Mode - Processing in Realtime Mode. (18 Hours)
UNIT IV : Chapters : 3, 4
UNIT V : Chapters : 5, 6
DESCRIPTIVE STATISTICS AND PROBABILITY
Semester : I Hours: 6
Code : 25PGDDS02 Credits: 4
Upon completion of this course students will be able to
Acquire basic knowledge of the history of statistics and analyze statistical data
graphically using frequency distributions.
Analyse the concepts of correlation and regression for relating two or more
variables, and develop skills in the Association of Attributes.
UNIT I
UNIT II
UNIT III
Theory of Probability: Introduction - Basic Terminology - Mathematical or Classical
Probability - Statistical or Empirical Probability – Subjective Probability -
Mathematical Tools - Axiomatic Approach to Probability – Some Theorems on
Probability - Conditional Probability – Multiplication Theorem of Probability -
Independent Events - Bayes' Theorem. (18 Hours)
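Illustration: the multiplication theorem and Bayes' Theorem listed in this unit can be sketched in a few lines of Python. This is a minimal sketch only. The function name bayes_posterior and the two-machine defect figures are hypothetical, chosen for the example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Return P(H_i | E) for each hypothesis H_i.

    priors[i]      = P(H_i), assumed mutually exclusive and exhaustive
    likelihoods[i] = P(E | H_i)
    """
    # Multiplication theorem: P(H_i and E) = P(H_i) * P(E | H_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Total probability of the evidence E
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("the evidence has zero probability")
    # Bayes' Theorem: P(H_i | E) = P(H_i and E) / P(E)
    return [j / evidence for j in joint]


# Hypothetical data: machine A makes 60% of items with 2% defective,
# machine B makes 40% of items with 5% defective.
priors = [Fraction(3, 5), Fraction(2, 5)]
likelihoods = [Fraction(1, 50), Fraction(1, 20)]

# Probability that a defective item came from A or from B
posterior = bayes_posterior(priors, likelihoods)
print(posterior)
```

Under these assumed figures, a defective item is more likely to have come from machine B even though machine A produces more items, because B's defect rate is higher.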
UNIT IV
DATA ANALYSIS USING PYTHON
Semester : I Hours: 6
Subject Code : 25PGDDS03 Credits: 4
Interpret Python syntax and semantics and be fluent in the use of Python
programming statements.
UNIT I
Introduction: Software Development - History of Python Programming Language -
Thrust Areas of Python - Installing Anaconda Python Distribution - Installing
PyCharm IDE to Set Up a Python Development Environment - Creating and
Running Your First Python Project - Installing and Using Jupyter Notebook. Parts
of Python Programming Language: Identifiers - Keywords – Statements and
Expressions - Variables – Operators – Precedence and Associativity - Data Types
- Indentation - Comments - Reading Input - Print Output - Type Conversions.
(18 Hours)
UNIT II
Control Flow Statements: The if Decision Control Flow Statement - The if…else
Decision Control Flow Statement - The if…elif…else Decision Control Statement -
Nested if Statement - The while Loop - The for Loop - The continue and break
Statements - Catching Exceptions Using try and except Statement. Functions:
Built-In Functions - Commonly Used Modules - Function Definition and Calling the
Function - The return Statement and void Function - Scope and Lifetime of
Variables - Default Parameters - Keyword Arguments - *args and **kwargs -
Command Line Arguments. (18 Hours)
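Illustration: several topics of this unit (default parameters, keyword arguments, *args and **kwargs, the return statement, and try/except) can be seen together in one short sketch. The function names summarize and safe_mean and the sample marks are hypothetical and serve only as an example.

```python
def summarize(label, *values, precision=2, **options):
    """Return a one-line summary of the numeric values passed in.

    *values    collects any number of positional arguments into a tuple
    precision  is a keyword argument with a default value
    **options  collects any extra keyword arguments into a dictionary
    """
    if not values:
        return f"{label}: no data"
    mean = sum(values) / len(values)
    unit = options.get("unit", "")  # optional extra keyword argument
    return f"{label}: mean={mean:.{precision}f}{unit}"


def safe_mean(values):
    """Return the mean of values, or None when the sequence is empty."""
    try:
        return sum(values) / len(values)
    except ZeroDivisionError:
        # len(values) == 0, so the division above failed
        return None


default_call = summarize("marks", 70, 80, 90)
keyword_call = summarize("marks", 70, 80, 90, precision=1, unit=" pts")
empty_call = summarize("marks")
print(default_call)
print(keyword_call)
print(empty_call, safe_mean([]))
```

The first call relies on the default precision, while the second overrides it by keyword and passes an extra option that is picked up through **kwargs.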
UNIT III
Strings: Creating and Storing Strings - Basic String Operations - Accessing
Characters in String by Index Number - String Slicing and Joining - String
Methods - Formatting Strings. Lists: Creating Lists - Basic List Operations -
Indexing and Slicing in Lists - Built-In Functions Used on Lists - List Methods - The
del Statement. Dictionaries: Creating Dictionary - Accessing and Modifying
key:value Pairs in Dictionaries - Built-In Functions Used on Dictionaries -
Dictionary Methods - The del Statement. (18 Hours)
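Illustration: a small word-count sketch touches most topics of this unit, namely string methods, slicing and joining, list operations, dictionary access and modification, and the del statement. The function name word_counts and the sample sentence are hypothetical.

```python
def word_counts(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    counts = {}
    for word in text.lower().split():      # string methods: lower(), split()
        word = word.strip(".,!?")          # remove surrounding punctuation
        if word:
            # dictionary access with a default, then modification
            counts[word] = counts.get(word, 0) + 1
    return counts


sentence = "Data science uses data, and data needs cleaning."
counts = word_counts(sentence)

first_word = sentence[:4]                  # string slicing by index
# list of words ordered by descending count; take the most frequent one
most_common = sorted(counts, key=counts.get, reverse=True)[0]
joined = "-".join(["data", "science"])     # joining strings

del counts["and"]                          # the del statement on a dictionary
print(first_word, most_common, joined, counts)
```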
UNIT IV
Tuples and Sets: Creating Tuples - Basic Tuple Operations - Indexing and Slicing in
Tuples - Built-In Functions Used on Tuples - Relation between Tuples and Lists -
Relation between Tuples and Dictionaries - Tuple Methods - Using zip() Function -
Sets - Set Methods – Frozenset. (18 Hours)
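Illustration: the sketch below pairs two tuples with zip(), converts the pairs to a dictionary, and applies basic set operations. The student names, marks, and course groupings are hypothetical sample data.

```python
# Two parallel tuples (tuples are immutable sequences)
names = ("Asha", "Bala", "Chitra")
marks = (82, 74, 91)

# zip() pairs the items position by position
pairs = list(zip(names, marks))

# Relation between tuples and dictionaries: (key, value) tuples build a dict
lookup = dict(pairs)

# Sets hold unique items and support union and intersection
python_students = {"Asha", "Bala"}
r_students = {"Bala", "Chitra"}
both = python_students & r_students        # intersection
either = python_students | r_students      # union

# A frozenset is immutable and hashable, so it can be a dictionary key
groups = {frozenset(both): "takes both courses"}

print(pairs)
print(lookup["Chitra"], both, sorted(either))
print(groups[frozenset({"Bala"})])
```

An ordinary set cannot be used as a dictionary key because it is mutable, which is the main practical reason to reach for frozenset.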
UNIT V
Data Manipulation with Pandas: Installing and Using Pandas – Data Indexing
and Selection – Operations on Data in Pandas – Handling Missing Data.
Visualization with Matplotlib: General Matplotlib Tips - Two Interfaces for
the Price of One - Simple Line Plots - Visualizing Errors - Density and Contour
Plots - Histograms, Binnings, and Density - Customizing Matplotlib:
Configurations and Stylesheets - Three-Dimensional Plotting in Matplotlib -
Geographic Data with Basemap - Visualization with Seaborn. (18 Hours)
:8
2. “Python Data Science Handbook - Essential Tools for Working with
Data”, Jake VanderPlas, O’Reilly Media, Inc., First Edition, 2017.
UNIT V : Chapters : 3, 4
BOOKS OF REFERENCE
1. “Introduction to Computing & Problem Solving with Python”, Jeeva Jose, P.
Sojan Lal, Khanna Book Publishing Co. (P) LTD., 2020.
2. “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and
IPython”, William McKinney, O'Reilly, Second Edition, 2017.
DATA ANALYSIS USING PYTHON - LAB
Semester: I Hours: 6
Code : 25PGDDSP1 Credits: 3
Implement fundamental data structures like lists, tuples, and dictionaries, and
use their associated methods for manipulation.
LIST OF PRACTICALS
DATA ANALYSIS USING SPREADSHEET – LAB
Semester : I Hours: 6
Code : 25PGDDSP2 Credits: 3
Visualize data using charts, graphs, and pivot tables to derive insights and present
findings effectively.
LIST OF PRACTICALS
Introduction to Spreadsheets
1. Reading data into Excel using various formats
2. Basic functions in Excel (Arithmetic, Logical functions)
3. Using formulas in Excel, and copying and pasting them using absolute and relative
referencing
Spreadsheet Functions to Organize Data
4. IF and the nested IF functions
5. VLOOKUP and HLOOKUP
6. The RANDBETWEEN function
MACHINE LEARNING ALGORITHMS
Semester : II Hours: 6
Code : 25PGDDS04 Credits: 4
Upon completion of this course students will be able to
UNIT I
Introduction to Machine Learning: Types of Machine Learning - Application of
Machine Learning - Hypothesis Space - Inductive Bias - Evaluation and Cross
Validation. (18 Hours)
UNIT II
Basic Machine Learning Algorithms: Linear Regression - Decision Tree -
Basic Decision Tree Learning Algorithm - K-nearest Neighbour - Collaborative
Filtering - Overfitting.
(18 Hours)
UNIT III
Dimensionality Reduction: Introduction of Dimensionality Reduction -
Feature and Feature Engineering - Feature Transformation - Feature Subset
Selection. Bayesian Concept of Learning: Importance of Bayesian
Methods - Bayes’ Theorem – Bayes’ Theorem and Concept Learning -
Bayesian Belief Network. (18 Hours)
UNIT IV
Logistic Regression and Support Vector Machine: Logistic Regression
- Introduction to Support Vector Machine - Kernel Methods for
Non-linearity. Basics of Neural Network: Introduction to Neural Network -
Biological Neurons – Architecture of Neural Network - Implementation of
ANN - Backpropagation Algorithm - Deep Learning. (18 Hours)
UNIT V
Computation and Ensemble Learning: Introduction to Computation Learning
- Sample Complexity: Finite Hypothesis Space - Introduction to Ensembles.
Basic Concepts of Clustering: Introduction to Clustering - Hierarchical
Clustering - Agglomerative Hierarchical Clustering. (18 Hours)
UNIT I : Chapter : 1
UNIT II : Chapter : 2
UNIT III : Chapters : 3, 4
UNIT IV : Chapters : 5, 6
UNIT V : Chapters : 7, 8
DATA ANALYTICS USING R
Semester : II Hours: 6
Code : 25PGDDS05 Credits: 4
Apply analytical knowledge with the R interface and language for different fields.
Cultivate cognitive skills on existing data and perform conventional
statistical analysis tests using knowledge of data management in R.
UNIT I
UNIT II
(18 Hours)
UNIT IV
Dimensionality Reduction Techniques: Dimensionality Reduction –
Independent and Dependent Variables. Relationship between Variables:
Correlation: Application of Factor Analysis using R Programming –
Multicollinearity. Factor Analysis: Eigen Value – Scree Plot – Unrotated Factor
Matrix – Rotated Factor Matrix. Unsupervised Learning Algorithms:
Introduction – Association Rule Mining: Transaction Dataset – Support –
Confidence – Lift – Apriori Algorithm – Association Rule – Plotting of Rules.
Conjoint Analysis: Full and Fractional Factorial Design – Choice Cards –
Attribute Importance. (18 Hours)
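Illustration: the three association rule measures named in this unit (Support, Confidence, Lift) reduce to simple counting over a transaction dataset. The course itself uses R; the sketch below is written in Python only to show the arithmetic. The four-transaction dataset and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical transaction dataset: each transaction is a set of items
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(A and C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by the support of the consequent."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))


# Rule under study: {bread} -> {milk}
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
print(rule_support, rule_confidence, rule_lift)
```

A lift below 1, as in this assumed dataset, indicates that buying bread makes milk slightly less likely than its baseline frequency. A lift above 1 would indicate a positive association.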
UNIT V
Supervised Learning Algorithms: Decision Tree and Random Forest:
Decision Tree – Tree Structure – Criteria for Splitting Decision Node.
Classification and Regression Technique: Control Parameters – Pruning
the Tree – Model Performance Measures – Insights from Decision Rules.
Random Forest: Control Parameters – Out of Bag Error Rate – Tuning the
Random Forest – Variable Importance Plot – Model Performance Measures.
Supervised Learning Algorithm: K-Nearest Neighbors: Similarity Based on
Distance Function – Select Appropriate K Value – KNN Model
Building – Model Performance Measures. Naïve Bayes Algorithm: Types
of Naïve Bayes Theorem – Building Naïve Bayes Classifier – Model
Performance Measures. (18 Hours)
INTERNET OF THINGS
Semester: II Hours: 6
Code : 25PGDDS06 Credits: 4
UNIT III
Cloud for IoT: Introduction – IoT with Cloud – Challenges – Selection of Cloud
Service Provider: An Overview – Introduction to Fog Computing – Cloud
Computing: Security Aspects. (18 Hours)
UNIT IV
DATA ANALYTICS - Visualising the power of Data from IoT: Introduction – Data
Analysis – Machine Learning – Types of Machine Learning Models – Model
Building Process – Modelling Algorithms – Model Performance - Big Data Platform
– Big Data Pipeline – Real Life Projects – Recommendations in IoT Gadgets.
(18 Hours)
UNIT V
Application Building with IoT: Introduction – Smart Perishable Tracking with IoT
and Sensors – Smart Healthcare-Elderly Fall Detection with IoT and Sensors – Smart
Inflight Lavatory Maintenance with IoT – IoT-Based Application to Monitor Water
Quality – Smart Warehouse Monitoring - Smart Retail – IoT Possibilities in the Retail
Sector – Prevention of Drowsiness of Drivers by IoT-Based Smart Driver Assistant
Systems – System to Measure Collision Impact in an Accident with IoT – Integrated
Vehicle Health Management. (18 Hours)
Unit I : Chapters : 1, 2
Unit II : Chapters : 3, 4
Unit III : Chapter : 5
Unit IV : Chapter : 6
Unit V : Chapter : 7
DATA ANALYTICS USING R - LAB
Semester : II Hours: 6
Code : 25PGDDSP3 Credits: 3
Upon completion of this course students will be able to
Understand the basic syntax and programming concepts in R, including data
types, variables, and operators.
Visualize data using R libraries such as ggplot2, creating bar charts, scatter plots,
histograms, and line graphs.
Solve real-world problems by working with structured and unstructured
datasets in R.
LIST OF PRACTICALS
INTERNET OF THINGS - LAB
Semester : II Hours: 6
Code : 25PGDDSP4 Credits: 3
Upon completion of this course students will be able to
Understand the basic concepts and architecture of IoT, including hardware
components, sensors, and actuators.
Establish communication between IoT devices and cloud services using protocols
such as MQTT and HTTP.
Develop end-to-end IoT applications that integrate sensing, processing, and
communication.
LIST OF PRACTICALS