Grade 10 Unit 4 - Data Science

Uploaded by

suhanidevgan2009
Copyright
© All Rights Reserved

Unit 4: Data Science

• Introduction to Data Science
• Applications of Data Science
• Data Collection
  Sources of Data
  Types of Data
• Data Access
• Activities: Game: Rock, Paper & Scissors
Introduction to Data Science
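AI can be classified into three broad domains: Data Science, Computer Vision and Natural Language Processing (NLP). This unit focuses on Data Science.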

• Data science is the study of data to extract meaningful insights for business.
• Data science mainly revolves around analysing data. In AI, this analysis helps make a machine intelligent enough to perform tasks by itself.
• Data science combines statistics, data analysis, machine learning and related methods in order to understand and analyse actual phenomena with data.
• Data science employs techniques and theories drawn from many fields, including Mathematics, Statistics, Computer Science and Information Science, to analyse large amounts of data.
Applications of Data Science
Data Science is a branch of computer science where we study how to store, use and analyze data for deriving information from it.
There exist various applications of Data Science in today’s world. Some of them are:
1. Fraud and Risk Detection:
• Fraud and risk detection is crucial for protecting businesses, customers, and individuals from financial losses and other negative impacts.
• By using data analysis and intelligent algorithms, organizations can identify and respond to fraudulent activities and potential risks more effectively, enhancing security and trust.
• Example: when a customer applies for a bank loan, data science is used to analyse the customer's data, such as their profile, their past debts, and whether they settled those debts properly or failed to do so.
• The earliest applications of data science were in finance. Companies were fed up with bad debts and losses every year. However, they had a lot of data that was collected during the initial paperwork while sanctioning loans, so they brought in data scientists to rescue them from these losses.

2. Genetics & Genomics:
• Genetics is the branch of biology that studies heredity, which involves the passing of traits from parents to offspring.
• Genomics is a broader field that includes the study of how a complete set of genes (the genome) behaves.
• Data science empowers geneticists and genomic researchers to handle, analyze, and interpret large-scale genetic datasets, leading to insights into the structure, function, and evolution of genomes.
Genomic data scientists develop comprehensive models that can, for example, predict the risk of common diseases based on an individual's genetic makeup.
Genomic data is used to diagnose and monitor conditions such as cancer, genetic disorders, and inherited diseases. Specific genetic markers are identified and monitored to track the progression of a disease and its treatment.
Preventive health care also uses genomics research to treat issues early and improve outcomes.
Scientists use human genomic data to investigate diseases or medical conditions, identify and assess drug targets, and develop new treatments. Genomic data helps them develop effective drugs and personalized treatments as well as screen and test potential drugs.
As reliable personal genome data becomes available, we will gain a deeper understanding of human DNA. Advanced genetic risk prediction will be a major step towards more individualised care.
3. Internet Search:
Search engines such as Google, Yahoo, Bing, Ask and AOL use data science algorithms to deliver the best results for a search query in a fraction of a second. Google processes more than 20 petabytes of data every day. Had there been no data science, Google wouldn't have been the 'Google' we know today.
4. Targeted Advertising
From the display banners on various websites to the digital billboards at airports, almost all ads are chosen using data science algorithms. This is why digital ads achieve a much higher CTR (Click-Through Rate) than traditional advertisements: they can be targeted based on a user's past behaviour. Example: one person may receive ads for fashion while another receives ads for electronics, based on their past activities.
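As a rough illustration of the CTR metric, the rate is the number of clicks divided by the number of times an ad was shown (impressions), usually written as a percentage. The function name and all figures below are made up for illustration only.

```python
def click_through_rate(clicks, impressions):
    """Return CTR as a percentage; 0.0 when the ad was never shown."""
    if impressions == 0:
        return 0.0
    return 100.0 * clicks / impressions

# Hypothetical numbers, not real campaign data.
targeted_ctr = click_through_rate(clicks=50, impressions=1000)
untargeted_ctr = click_through_rate(clicks=5, impressions=1000)
print(targeted_ctr, untargeted_ctr)
```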
5. Website Recommendations
Recommendation engines such as Amazon's not only help us find relevant products from the billions of products available but also add a lot to the user experience.
Many companies use such engines extensively to promote their products in accordance with the user's interests and the relevance of the information. Internet giants like Amazon, Twitter, Google Play, Netflix, LinkedIn, IMDB and many more use such systems to improve the user experience. The recommendations are made based on a user's previous search results.
6. Airline Route Planning
Airline companies started using data science to identify strategic areas of improvement and to overcome heavy losses. Using data science, airline companies can now:
• Predict flight delays
• Decide which class of airplanes to buy
• Decide whether to fly directly to the destination or to halt in between (for example, a flight can take a direct route from New Delhi to New York, or it can halt in another country on the way)
• Effectively drive customer loyalty programs
System Maps
• A system map is a visual representation that shows all the components and boundaries of a system, along with the components of its environment, at a specific point in time.
• Systems mapping is an effective tool for understanding and redesigning systems.
• It shows the relationships between various factors and their impact on the project goal.
Use of System Map
• A system map helps us find relationships between different elements of the problem we have scoped.
• A system map helps in strategizing the solution for achieving the goal of our project.
• A system map is used to understand complex issues with multiple factors that affect each other.
• The main use of a system map is to help structure a system and communicate the result to others.
Components of System Map:
S.No  Component          Represents
1.    Circle             Elements of the problem
2.    Arrow              Relationship between elements
3.    Longer arrow       Longer time for a change to happen (also called time delay)
4.    Arrow with + sign  The two elements are directly related to each other
5.    Arrow with - sign  The two elements are inversely related to each other
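One way to picture these components in code is as a small signed graph: each arrow is stored as a (from, to) pair with a "+" or "-" sign. This is only a sketch; the element names and the helper function are hypothetical and are not taken from the figures in this unit.

```python
# Each arrow: (from_element, to_element) -> sign
# "+" means directly related, "-" means inversely related.
arrows = {
    ("vehicles on road", "traffic congestion"): "+",
    ("traffic congestion", "average speed"): "-",
    ("vehicles on road", "air pollution"): "+",
}

def effects_of(element, arrows):
    """List the elements directly affected by `element`, with the sign."""
    return [(target, sign) for (source, target), sign in arrows.items()
            if source == element]

for target, sign in effects_of("vehicles on road", arrows):
    relation = "rises" if sign == "+" else "falls"
    print(f"When 'vehicles on road' rises, '{target}' {relation}.")
```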
Figure: System map showing stress management.
Figure: System map showing the effect of an increase in the number of vehicles on the road.
Data Collection
• Data collection is an exercise that does not require even a tiny bit of technological knowledge.
• Analysing the collected data, however, is a tedious process for humans, as it is all about numbers and alphanumeric data. That is where data science comes into the picture.
• Data science not only gives us a clearer idea of the dataset, but also adds value to it by providing deeper and clearer analyses of it.
Some examples of datasets which you must already be aware of are:

Sources of Data
Various sources of data exist from which we can collect the type of data required. The data collection process can be categorised in two ways:
• Offline Data Collection - Sensors, Surveys, Interviews, Observations
• Online Data Collection - Open-sourced Government Portals, Reliable Websites (Kaggle), World Organisations' open-sourced statistical websites
While accessing data from any data source, the following points should be kept in mind:
1. Only data that is available for public use should be taken.
2. Personal datasets should only be used with the consent of the owner.
3. One should never breach someone's privacy to collect data.
4. Data should only be taken from reliable sources, as data collected from random sources can be wrong or unusable.
5. Reliable sources of data ensure the authenticity of data, which helps in proper training of the AI model.
Types of Data
For Data Science, data is usually collected in the form of tables. These tabular datasets can be stored in different formats. Some commonly used formats are:
1. CSV: CSV stands for comma-separated values. It is a simple file format used to store tabular data. Each line of the file is a data record, and each record consists of one or more fields separated by commas, which is how the format gets its name.
2. Spreadsheet: A spreadsheet is a piece of paper or a computer program used for accounting and recording data in rows and columns into which information can be entered. Microsoft Excel is a program that helps in creating spreadsheets.
3. SQL: SQL stands for Structured Query Language. It is a domain-specific programming language designed for managing data held in different kinds of DBMS (Database Management Systems). It is particularly useful for handling structured data.
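To make the CSV and SQL descriptions more concrete, here is a minimal sketch using Python's built-in csv and sqlite3 modules. The table name, column names and marks are invented for illustration.

```python
import csv
import io
import sqlite3

# A tiny CSV "file" held in memory: one record per line, fields separated by commas.
csv_text = "name,marks\nAsha,82\nRavi,74\nMeena,91\n"

# csv.DictReader uses the first line as the field names.
records = list(csv.DictReader(io.StringIO(csv_text)))
print(records[0]["name"], records[0]["marks"])

# Load the same records into an in-memory SQL database and query it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO students (name, marks) VALUES (?, ?)",
    [(row["name"], int(row["marks"])) for row in records],
)
top = conn.execute(
    "SELECT name FROM students WHERE marks > 80 ORDER BY marks DESC"
).fetchall()
print(top)
conn.close()
```

Note that the CSV reader returns every field as text, which is why the marks are converted with int() before being stored in the SQL table.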
Data Access
After collecting the data, we need to be able to use it for programming purposes, so we should know how to access it in Python code. To make our lives easier, various Python packages help us access structured (tabular) data inside the code. Some of these packages are:
1. NumPy 2. Matplotlib 3. Pandas 4. Statistics

1. NumPy
• NumPy, which stands for Numerical Python, is the fundamental package for mathematical and logical operations on arrays in Python.
• NumPy works with numbers and offers a wide range of arithmetic operations, giving us an easier way of working with them.
• NumPy also works with arrays. An array is a set of multiple values of the same datatype, that is, a homogeneous collection of data.
• In NumPy, arrays are known as ND-arrays (N-Dimensional Arrays), because NumPy can create n-dimensional arrays in Python.
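The points above can be tried out with a short sketch like the one below. The values are invented; np is the conventional short alias for the numpy package.

```python
import numpy as np

marks = np.array([72, 85, 90, 64])       # 1-D array, all integers
print(marks + 5)                         # arithmetic applies to every element
print(marks.mean())                      # average of the values

grid = np.array([[1, 2, 3], [4, 5, 6]])  # 2-D array (2 rows, 3 columns)
print(grid.ndim, grid.shape)
```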
Difference Between Arrays and Lists
NumPy Arrays                                            Lists
Homogeneous: contain only one type of data.             Heterogeneous: can contain multiple types of data.
Cannot be initialized directly; the NumPy package       Can be initialized directly, as they are part
is needed to create and operate on them.                of Python syntax.
Widely used for arithmetic operations.                  Widely used for data management.
Take less memory space.                                 Take more memory space.
Fixed size: numpy.concatenate and numpy.append          Concatenation and appending work directly;
return a new array instead of growing the old one.      a list can grow in place.
Can be accessed only through package support.           Can be used directly in Python without any
                                                        package support.
Example:                                                Example:
import numpy                                            A = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
A = numpy.array([1, 2, 3, 4])
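A short sketch can make the comparison easier to see. Note that NumPy does provide numpy.append and numpy.concatenate, but they build a new array rather than growing the existing one. The variable names below are chosen only for illustration.

```python
import numpy as np

a_list = [1, 2, 3, 4]
a_array = np.array([1, 2, 3, 4])

list_sum = a_list + a_list      # lists: + joins the two lists
array_sum = a_array + a_array   # arrays: + adds element by element

mixed_list = [1, "two", 3.0]    # a list may mix data types
a_list.append(5)                # a list grows in place
bigger = np.append(a_array, 5)  # NumPy returns a new array instead

print(list_sum)
print(array_sum)
print(bigger)
```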
Matplotlib
• Matplotlib is a visualization library in Python for 2D plots of arrays (including NumPy arrays).
• One of the greatest benefits of visualization is that it gives us visual access to huge amounts of data in an easily digestible form.
• Matplotlib comes with a wide variety of plots, which help us understand trends and patterns and find correlations.
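As a minimal sketch of such a plot, the code below draws a simple line chart from two NumPy arrays and saves it as an image. The data values and the file name visitors.png are invented; the "Agg" backend is selected only so that the script can run without opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # draw to a file; no display window needed
import matplotlib.pyplot as plt
import numpy as np

days = np.array([1, 2, 3, 4, 5])
visitors = np.array([120, 135, 128, 150, 170])  # invented values

fig, ax = plt.subplots()
ax.plot(days, visitors, marker="o")  # line plot with a dot at each point
ax.set_xlabel("Day")
ax.set_ylabel("Visitors")
ax.set_title("Visitors per day")
fig.savefig("visitors.png")
```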
Pandas [panel data]
Pandas is a software library written for the Python programming language for data manipulation and analysis. It is used for working with data sets.
Pandas offers data structures and operations for manipulating numerical tables and time series.
The name comes from "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals.
Pandas can also delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data.
Pandas is well suited for many different kinds of data:
• Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
• Ordered and unordered (not necessarily fixed-frequency) time series data
The data need not be labelled at all to be placed into a Pandas data structure.
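Cleaning can be sketched in a few lines. The example below builds a small table with one missing value and drops the incomplete row using DataFrame.dropna. The names and marks are invented.

```python
import pandas as pd

# Invented records; None marks a missing value.
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Kabir"],
    "marks": [82, None, 91, 74],
})

cleaned = df.dropna()            # drop rows that contain empty (NaN) values
print(len(df), len(cleaned))
print(cleaned["marks"].mean())
```

Here dropna returns a new table and leaves the original df unchanged, so the raw data is still available if needed.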

What is data mining?
In simple words, data mining is the process of extracting usable data from a larger set of raw data. It involves analysing data patterns in large batches of data using one or more software tools. Data mining has applications in multiple fields, such as science and research.
1. What is Data Science? List and explain the applications of Data Science.
2. What is a system map? List its components and uses.
3. List all possible sources of data collection, with examples.
4. What essential points should we consider while accessing data from different data sources?
5. What data formats are commonly used in the field of data science?
6. List a few Python packages which help us in accessing structured data.
