Evaluation scheme (extract):
5. Lab – I: L-T-P 0-0-2, 25 + 25 = 50 marks, 1 credit
6. Internship Assessment: L-T-P 0-0-2, 50 marks, 1 credit
Total: 700 marks, 14 credits
12 December 2023 3
CONTENT
B. TECH. (Data Science)

Course objective:
The objective of this course is to understand the fundamental concepts of Data Science and to learn about various types of data formats and their manipulation. It helps students learn exploratory data analysis and visualization techniques, in addition to the R programming language.
CO 1 Understand the fundamental concepts of data analytics in the areas that play a major role within the realm of data science. K1
CO 2 Explain and exemplify the most common forms of data and their representations. K2
CO 5 Illustrate various visualization methods for different types of data sets and application scenarios. K3
Text books:
1) Glenn J. Myatt, Making sense of Data: A practical Guide to Exploratory Data Analysis
and Data Mining, John Wiley Publishers, 2007.
2) Data Analysis and Data Mining, 2nd Edition, John Wiley & Sons Publication, 2014.
Applications of Data Science:
• Security
• Transportation
• Risk detection
• Risk management
• Delivery
• Fast internet allocation
• Reasonable expenditure
• Interaction with customers
• Planning of cities
Course Outcomes
Course outcome: After completion of this course students will be able to:
CO5 Understand and analyze the I/O management and File systems K2, K4
Program Outcomes
1. Engineering knowledge
2. Problem analysis
3. Design/development of solutions
4. Conduct investigations of complex problems
5. Modern tool usage
6. The engineer and society
7. Environment and sustainability
8. Ethics
9. Individual and team work
10. Communication
11. Project management and finance
12. Life-long learning
CO–PO mapping:

Course Outcome  PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
CO1             3    2    2    -    -    -    -    -    -    -     -     1
CO2             3    3    3    -    -    -    -    -    -    -     -     1
CO3             3    3    3    -    -    -    -    -    -    -     -     1
CO4             3    2    1    -    -    -    -    -    -    -     -     1
CO5             3    2    2    -    -    -    -    -    -    -     -     1
Average         3    2.4  2.2  -    -    -    -    -    -    -     -     1
CO–PSO mapping:

Course Outcome  PSO1  PSO2  PSO3
CO1             3     -     -
CO2             3     2     -
CO3             3     2     -
CO4             3     2     2
CO5             3     2     -
Average         3     2     2
•Solve real-time complex problems and adapt to technological changes with the ability of
lifelong learning.
•Work as data scientists, entrepreneurs, and bureaucrats for the goodwill of the society
and pursue higher education.
•Exhibit professional ethics and moral values with good leadership qualities and effective
interpersonal skills.
Recap
• To understand the types of data.
• To understand data classification and analyze the various file formats.
• To import and export data in R/Python.
Data analytics (DA) is the process of examining data sets in order to find trends and draw conclusions about the information they contain. Increasingly, data analytics is done with the aid of specialized systems and software.
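As a minimal illustration of "finding trends" in a data set, the sketch below compares the average of recent observations with earlier ones. The weekly figures and variable names are made up for illustration, not taken from the text.

```python
# Minimal sketch of a trend check: compare the average of the most recent
# observations with the earlier ones (made-up weekly figures).
weekly = [200, 210, 205, 230, 240, 250]
first_half, second_half = weekly[:3], weekly[3:]
trend = "rising" if sum(second_half) / 3 > sum(first_half) / 3 else "falling"
```

Real analytics systems automate exactly this kind of comparison at scale, across many metrics at once.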
Objective:
▪ In this topic we learn about how data science came into existence and what the industry need for Big Data was. This shows the importance of the technology and its innovation as an open-source framework.
Recap:
• As you can see from the above image, a Data Analyst usually explains what is going on by processing the history of the data. A Data Scientist, on the other hand, not only does exploratory analysis to discover insights from the data, but also uses various advanced machine learning algorithms to identify the occurrence of a particular event in the future. A Data Scientist will look at the data from many angles, sometimes angles not known earlier.
• So, Data Science is primarily used to make decisions and predictions making use
of predictive causal analytics, prescriptive analytics (predictive plus decision
science) and machine learning.
• Predictive causal analytics – If you want a model which can predict the
possibilities of a particular event in the future, you need to apply predictive
causal analytics. Say, if you are providing money on credit, then the probability
of customers making future credit payments on time is a matter of concern for
you. Here, you can build a model which can perform predictive analytics on the
payment history of the customer to predict if the future payments will be on
time or not.
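A very crude sketch of the credit-payment idea above: estimate the probability of on-time payment from the frequency in past history. The history values are made up, and a real predictive model would use many more features than this.

```python
# Hedged sketch: naive frequency-based estimate of on-time payment probability
# from a customer's payment history (1 = paid on time, 0 = late; made-up data).
payment_history = [1, 1, 0, 1, 1, 1, 0, 1]
p_on_time = sum(payment_history) / len(payment_history)
prediction = "on time" if p_on_time >= 0.5 else "late"
```

A production model would replace the simple frequency with a trained classifier, but the input (payment history) and output (a probability plus a decision) are the same.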
● IBM predicts that demand for data scientists will soar by 28% by 2022.
● Data scientist roles have grown over 650% since 2012, yet only about 35,000 people in the US have data science skills, while hundreds of companies are hiring for those roles.
Objective:
▪ This topic introduces Big Data as an open-source framework with its ecosystems, and also how processing of huge amounts of data takes place in cloud infrastructure.
Recap:
Objective:
▪ This topic deals with the concept of different types of big data requirements on digital platforms and how they are occupying space in our day-to-day environment.
Recap:
Objective:
▪ This unit basically deals with data science and its evolution. It also focuses on how we can manage data science with open-source frameworks.
Recap:
• Traditionally, the data that we had was mostly structured and small in size, which
could be analyzed by using the simple BI tools. Unlike data in the traditional
systems which was mostly structured, today most of the data is unstructured or
semi-structured.
• This data is generated from different sources like financial logs, text files,
multimedia forms, sensors, and instruments. Simple BI tools are not capable of
processing this huge volume and variety of data. This is why we need more
complex and advanced analytical tools and algorithms for processing, analyzing
and drawing meaningful insights out of it.
• This is not the only reason why Data Science has become so popular.
Let’s dig deeper and see how Data Science is being used in various domains.
• How about if you could understand the precise requirements of your customers
from the existing data like the customer’s past browsing history, purchase history,
age and income. No doubt you had all this data earlier too, but now with the vast
amount and variety of data, you can train models more effectively and
recommend the product to your customers with more precision. Wouldn’t it be
amazing as it will bring more business to your organization?
• Let’s take a different scenario to understand the role of Data Science in decision making. How about if your car had the intelligence to drive you home? Self-driving cars collect live data from sensors, including radars, cameras and lasers, to create a map of their surroundings. Based on this data, they decide when to speed up, when to slow down, when to overtake and where to take a turn, making use of advanced machine learning algorithms.
Objective:
▪ This unit objective is to specify the Datafication which refers to the fact that
daily interactions of living things can be rendered into a data format and put
to social use.
Recap:
Examples:
• There can be many examples of datafication.
• Social platforms such as Facebook or Instagram, for example, collect and monitor data about our friendships in order to market products and services to us, and to provide surveillance services to agencies, which in turn changes our behavior; the promotions that we see daily on social media are also the result of this monitored data. In this model, datafication is used to inform how content is created, rather than only feeding recommendation systems.
• However, there are other industries where datafication process is actively used:
• Insurance: Data used to update risk profile development and business models.
• Banking: Data used to establish trustworthiness and likelihood of a person
paying back a loan.
• Human resources: Data used to identify, e.g., employees' risk-taking profiles.
• Hiring and recruitment: Data used to replace personality tests.
• Social science research: Datafication replaces sampling techniques and
restructures the manner in which social science research is performed.
Objective:
▪ This unit objective is to specify Programming Skills for Data Science
which brings together all the fundamental skills needed to transform
raw data into actionable insights. While there is no specific rule
about the selection of programming language, Python and R are the
most favored ones.
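The objective above mentions transforming raw data into actionable insights. A minimal Python sketch of one such step, totalling revenue per product from raw sales records (the product names and amounts are made up):

```python
# Hedged sketch: turn raw sales records into a simple insight -
# revenue per product and the top seller (illustrative data only).
sales = [("laptop", 900), ("phone", 500), ("laptop", 1100), ("phone", 450)]
revenue = {}
for product, amount in sales:
    revenue[product] = revenue.get(product, 0) + amount
top_product = max(revenue, key=revenue.get)
```

The same aggregation is a one-liner in R (`aggregate`) or pandas (`groupby`), which is why these languages dominate data science work.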
Recap:
Datafication
Objective:
▪ This unit objective is to specify the data science life cycle, which is an iterative set of steps you take to deliver a data science project or product. Because every data science project and team is different, every specific data science life cycle is different. However, most data science projects tend to flow through the same general life cycle.
Recap:
• BI basically analyzes the previous data to find hindsight and insight to describe the
business trends. BI enables you to take data from external and internal sources,
prepare it, run queries on it and create dashboards to answer the questions
like quarterly revenue analysis or business problems. BI can evaluate the impact of
certain events in the near future.
• Data Science is a more forward-looking approach, an exploratory way with the
focus on analyzing the past or current data and predicting the future outcomes
with the aim of making informed decisions. It answers the open-ended questions
as to “what” and “how” events occur.
• Phase 1—Discovery: Before you begin the project, it is important to understand the various
specifications, requirements, priorities and required budget. You must possess the ability to
ask the right questions. Here, you assess if you have the required resources present in terms
of people, technology, time and data to support the project. In this phase, you also need to
frame the business problem and formulate initial hypotheses (IH) to test.
• Phase 2—Data preparation: In this phase, you require an analytical sandbox in which you can perform analytics for the entire duration of the project. You need to explore, preprocess and condition data prior to modeling. Further, you will perform ETLT (extract, transform, load and transform) to get data into the sandbox. Let’s have a look at the statistical analysis flow below.
• You can use R for data cleaning, transformation, and visualization. This will help you to spot
the outliers and establish a relationship between the variables. Once you have cleaned and
prepared the data, it’s time to do exploratory analytics on it. Let’s see how you can achieve
that.
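The text uses R for this step; the same outlier-spotting idea, sketched in Python with made-up measurements, flags any value more than two standard deviations from the mean:

```python
# Hedged sketch of data preparation: spot outliers via z-scores
# (values more than 2 standard deviations from the mean; made-up data).
import statistics

values = [10.1, 9.8, 10.3, 10.0, 55.0, 9.9]
mu = statistics.mean(values)
sigma = statistics.stdev(values)
outliers = [v for v in values if abs(v - mu) / sigma > 2]
```

In practice you would decide per case whether to drop, cap, or investigate each flagged value before modeling.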
• Phase 3—Model planning: Here, you will determine the methods and techniques to draw the
relationships between variables. These relationships will set the base for the algorithms
which you will implement in the next phase. You will apply Exploratory Data Analytics (EDA)
using various statistical formulas and visualization tools.
Data Science importance and applications; Big Data tools and technologies
Phase 4—Model building: In this phase, you will develop datasets for training and testing purposes. You will consider whether your existing tools will suffice for running the models or whether you will need a more robust environment (like fast and parallel processing). You will analyze various learning techniques, like classification, association and clustering, to build the model.
Objective:
▪ This unit objective is to specify the most-used data science tools (as of 2021):
❖ Apache Spark.
❖ BigML.
❖ D3.js.
❖ MATLAB.
❖ SAS.
❖ Tableau.
❖ Matplotlib.
❖ Scikit-learn.
Recap:
▪ Revision of need of Data science in Industry.
R has a complete set of modeling capabilities and provides a good environment for
building interpretive models.
SQL Analysis services can perform in-database analytics using common data mining
functions and basic predictive models.
SAS/ACCESS can be used to access data from Hadoop and is used for creating
repeatable and reusable model flow diagrams.
Although many tools are present in the market, R is the most commonly used tool.
Now that you have insights into the nature of your data and have decided on the algorithms to be used, in the next stage you will apply the algorithms and build a model.
Objective:
▪ This unit objective is to specify that data analysis can be separated and organized into 6 types, arranged in increasing order of difficulty:
❖ Descriptive Analysis
❖ Exploratory Analysis
❖ Inferential Analysis
❖ Predictive Analysis
❖ Causal Analysis
❖ Mechanistic Analysis
Recap:
▪ Revision of need of Data science in Industry.
1. Descriptive Analysis
Goal — Describe or Summarize a set of Data
Description:
• The very first analysis performed
• Generates simple summaries about samples and measurements
• Uses common descriptive statistics (measures of central tendency, variability, frequency, position, etc.)
Example:
Take the COVID-19 statistics page on Google, for example: the line graph is just a pure summary of the cases/deaths, a presentation and description of the population of a particular country infected by the virus.
Summary:
Descriptive Analysis is the first step in analysis where you summarize and
describe the data you have using descriptive statistics, and its result is a simple
presentation of your data.
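The summary above can be sketched in a few lines: computing common descriptive statistics for a small sample. The daily case counts are made up, echoing the COVID-19 example:

```python
# Hedged sketch of descriptive analysis: simple summaries of one sample
# (made-up daily case counts).
import statistics

daily_cases = [120, 135, 150, 160, 155, 148, 170]
summary = {
    "mean": round(statistics.mean(daily_cases), 2),
    "median": statistics.median(daily_cases),
    "min": min(daily_cases),
    "max": max(daily_cases),
}
```

These numbers describe the sample; they make no claim beyond it, which is exactly what separates descriptive analysis from the inferential and predictive types that follow.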
6. Mechanistic Analysis
Goal — Understand exact changes in variables that lead to other changes in other variables
Description:
• Applied in physical or engineering sciences, in situations that require high precision and leave little room for error (the only noise in the data is measurement error)
• Designed to understand a biological or behavioral process, the pathophysiology of a disease, or the mechanism of action of an intervention (per NIH)
Example:
Many graduate-level research and complex topics are suitable examples, but to put it in a
simple manner, let’s say an experiment is done to simulate safe and effective nuclear fusion
to power the world, a mechanistic analysis of the study would entail precise balance of
controlling and manipulating variables with highly accurate measures of both variables and
the desired outcomes. It’s this intricate and meticulous modus operandi (strategy) towards
these big topics that allows for scientific breakthroughs and advancement of society.
Summary:
Mechanistic analysis is in some ways a predictive analysis, but modified to tackle studies that require high precision and meticulous methodologies in the physical or engineering sciences.
Objective:
▪ This unit objective is to specify the reason why we need data science
is the ability to process and interpret data. This enables companies
to make informed decisions around growth, optimization, and
performance. For example, machine learning is now being used to
make sense of every kind of data – big or small.
Recap:
• The same is the case with challenges and problems. The problems and
concerns of the past for a specific theme, illness, or shortfall may not be the
same today as they have advanced in terms of complexity.
• Data is the key component for every business, as businesses need it to analyze
their current scenario based on past facts and performance and make decisions
for future challenges. They need data to survive in today’s competitive market
and mature their decision-making power, which would enhance their
productivity and profitability. Today, data science is the requirement of every
business to make business forecasts and predictions based on facts and figures,
which are collected in the form of data and processed through data science.
12 December 2023 OS Unit-1 79
THE
Big CONCEPT
Need
Data importanceLEARNING
for Data Science
and TASK
applications
Data Science
• There are also two ways of looking at data: with the intent to explain
behavior that has already occurred, and you have gathered data for it; or
to use the data you already have in order to predict future behavior that
has not yet happened.
• Linear regression
• In data science, the linear regression model is used for quantifying causal
relationships among the different variables included in the analysis. Like the
relationship between house prices, the size of the house, the neighborhood,
and the year built. The model calculates coefficients with which you can predict
the price of a new house, if you have the relevant information available.
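The house-price idea above can be sketched with one-variable ordinary least squares. The sizes and prices are made up (and exactly linear, so the fit is perfect); a real model would use several variables and noisy data:

```python
# Hedged sketch: one-variable linear regression (price vs. house size)
# fit by ordinary least squares, standard library only, made-up data.
sizes  = [50, 70, 90, 110, 130]      # square metres (hypothetical)
prices = [150, 190, 230, 270, 310]   # thousands (hypothetical, exactly linear)

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
         / sum((x - mean_x) ** 2 for x in sizes))
intercept = mean_y - slope * mean_x
predicted = slope * 100 + intercept  # price of a hypothetical 100 m² house
```

The coefficients (slope and intercept) are exactly the quantities the text says the model "calculates" to predict the price of a new house.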
• Logistic regression
• Since it’s not possible to express all relationships between variables as linear,
data science makes use of methods like the logistic regression to create non-
linear models. Logistic regression operates with 0s and 1s. Companies apply
logistic regression algorithms to filter job candidates during their screening
process. If the algorithm estimates that the probability that a prospective
candidate will perform well in the company within a year is above 50%, it
would predict 1, or a successful application. Otherwise, it will predict 0.
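The 0/1 decision described above comes from the logistic (sigmoid) function, which squashes a score into a probability that is then thresholded at 0.5. The candidate features and weights below are made up for illustration:

```python
# Hedged sketch: the logistic (sigmoid) function turning a weighted score
# into a probability, thresholded at 0.5 (hypothetical weights and features).
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

score = 0.8 * 1 + 0.5 * 0 - 0.2   # made-up weighted candidate features
p = sigmoid(score)                # estimated probability of performing well
prediction = 1 if p > 0.5 else 0  # 1 = successful application, 0 = otherwise
```

Training a real logistic regression means learning those weights from labelled historical data rather than choosing them by hand.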
• Cluster analysis
• This exploratory data science technique is applied when the observations in the
data form groups according to some criteria. Cluster analysis takes into account
that some observations exhibit similarities, and facilitates the discovery of new
significant predictors, ones that were not part of the original conceptualization
of the data.
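A very crude stand-in for cluster analysis: assigning one-dimensional observations to the nearest of two fixed centres. Real clustering (e.g. k-means) iterates this assignment and re-estimates the centres; the observations here are made up:

```python
# Hedged sketch: one assignment pass of a tiny 1-D clustering -
# each observation goes to the nearest of two fixed centres (made-up data).
observations = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
centres = [1.0, 8.0]
clusters = {0: [], 1: []}
for x in observations:
    nearest = min(range(len(centres)), key=lambda i: abs(x - centres[i]))
    clusters[nearest].append(x)
```

The two recovered groups are exactly the kind of "observations exhibiting similarities" the text describes, and each group can then be examined for new predictors.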
Factor analysis
If clustering is about grouping observations together, factor analysis is about
grouping features together. Data science resorts to using factor analysis to
reduce the dimensionality of a problem. For example, if in a 100-item
questionnaire each 10 questions pertain to a single general attitude, factor
analysis will identify these 10 factors, which can then be used for a regression
that will deliver a more interpretable prediction. A lot of the techniques in data
science are integrated like this.
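A deliberately crude stand-in for the questionnaire example: if each block of 10 items measures one attitude, averaging each block collapses 100 features into 10 "factor" scores. Real factor analysis estimates the groupings and loadings from correlations rather than assuming them; the responses here are made up:

```python
# Hedged, crude stand-in for factor analysis: collapse 100 questionnaire items
# into 10 scores by averaging each block of 10 related items (made-up answers).
responses = list(range(1, 101))   # hypothetical answers to 100 items
factors = [sum(responses[i:i + 10]) / 10 for i in range(0, 100, 10)]
```

The 10 resulting scores are far easier to feed into a regression, which is the dimensionality-reduction payoff the text describes.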
Time series analysis
Time series is a popular method for following the development of specific
values over time. Experts in economics and finance use it because their subject
matter is stock prices and sales volume – variables that are typically plotted
against time.
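A minimal time-series operation is smoothing a value plotted against time. The sketch below computes a 3-period moving average over made-up monthly sales figures:

```python
# Hedged sketch: a 3-period moving average for following a value over time
# (made-up monthly sales figures).
sales = [100, 102, 101, 105, 110, 108]
window = 3
moving_avg = [sum(sales[i:i + window]) / window
              for i in range(len(sales) - window + 1)]
```

Smoothing like this is typically the first step before trend and seasonality analysis on stock prices or sales volumes.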
Objective:
▪ This unit objective is to specify the reason for analysis: the process of exploring data and reports in order to extract meaningful insights, which can be used to better understand and improve business performance. Reporting translates raw data into information; analysis transforms data and information into insights.
Recap:
Data Analytics
• Data analytics is the often complex process of examining big data to uncover
information -- such as hidden patterns, correlations, market trends and
customer preferences -- that can help organizations make informed business
decisions.
• On a broad scale, data analytics technologies and techniques give organizations
a way to analyze data sets and gather new information. Business intelligence
(BI) queries answer basic questions about business operations and
performance.
• Big data analytics is a form of advanced analytics, which involves complex applications with elements such as predictive models, statistical algorithms and what-if analysis powered by analytics systems.
• Why is big data analytics important?
• Organizations can use big data analytics systems and software to make data-
driven decisions that can improve business-related outcomes. The benefits may
include more effective marketing, new revenue opportunities, customer
personalization and improved operational efficiency. With an effective strategy, these benefits can provide competitive advantages over rivals.
Analysis vs Reporting
1. Purpose: Reporting helps companies monitor their data even before digital technology
boomed. Various organizations have been dependent on the information it brings to their
business, as reporting extracts that and makes it easier to understand.
Analysis interprets data at a deeper level. While reporting can link between cross-channels of data, provide comparisons, and make information easier to understand (think of a dashboard, charts, and graphs, which are reporting tools and not analysis reports), analysis interprets this information and provides recommendations on actions.
3. Outputs: Reporting has a push approach, as it pushes information to users and outputs
come in the forms of canned reports, dashboards, and alerts.
Analysis has a pull approach, where a data analyst draws information to further probe and
to answer business questions. Outputs from such can be in the form of ad hoc responses and
analysis presentations.
5. Value: The Path to Value illustrates how data converts into value through reporting and analysis, such that neither is achievable without the other.
Objective:
▪ This unit basically deals with the big data ecosystem and frameworks. It also focuses on how we can manage big data with open-source frameworks.
Recap:
Big Data Ecosystem
1. Data Management
2. Data Mining
3. Hadoop
4. In-Memory Analytics
5. Predictive Analytics
6. Text Mining
Why is big data analytics important?
1. Reduced cost
2. Quick decision making
3. New products and features
Objective:
▪ This unit objective is to consider the data increase from IoT and from social data at the edge. Looking a little further ahead, the US Bureau of Labor Statistics predicts that by 2026 there will be 11.5 million jobs in data science and analytics.
Recap:
• By all means, advancement in Machine Learning is the key contributor towards the
future of data science.
• Data Integration.
• Distributed Architecture.
• Automating Machine learning.
• Data Visualization.
• Dashboards and BI.
• Data Engineering.
• Deployment in production mode
• Automated, data-driven decisions.
• i. Data Science currently does not have a fixed definition due to its vast number of data
operations. These data operations will only increase in the future. However, the
definition of data science will become more specific and constrained as it will only
incorporate essential areas that define the core data science.
• ii. In the near future, Data Scientists will have the ability to take on areas that are
business-critical as well as several complex challenges. This will facilitate the businesses
to make exponential leaps in the future. Companies in the present are facing a huge
shortage of data scientists. However, this is set to change in the future.
• In India alone, there will be an acute shortage of data science professionals until 2020. The main reason for this shortage in India is the varied set of skills required for data science operations.
• There are very few existing curricula that address the requirements of data scientists
and train them. However, this is gradually changing with the introduction of Data
Science degrees and bootcamps that can transform a professional from a quantitative
background or a software background into a fully-fledged data scientist.
• We can summarize the trends leading to the future of data science in the following
three points –
Objective:
▪ This unit objective is to specify the disciplinary areas that make up
the data science field include mining, statistics, machine learning,
analytics, and programming. Statistical measures or predictive
analytics use this extracted data to gauge events that are likely to
happen in the future based on what the data shows happened in the
past.
Recap:
Objective:
▪ This unit objective is to specify that crowdsourcing data is an effective way to seek the help of a large audience, usually through the internet, to gather information on how to solve the company's problems and to generate new ideas and innovations.
Recap:
Objective:
▪ This unit objective is to specify the integrity and privacy of data are
at risk from unauthorized users, external sources listening in on the
network, and internal users giving away the store. This section
explains the risky situations and potential attacks that could
compromise your data.
Recap:
• There are several technologies and practices that can improve data
security. No one technique can solve the problem, but by combining
several of the techniques below, organizations can significantly improve
their security posture.
• Data Discovery and Classification
• Modern IT environments store data on servers, endpoints, and cloud
systems. Visibility over data flows is an important first step in
understanding what data is at risk of being stolen or misused. To properly
protect your data, you need to know the type of data, where it is, and
what it is used for. Data discovery and classification tools can help.
• Data detection is the basis for knowing what data you have. Data
classification allows you to create scalable security solutions, by
identifying which data is sensitive and needs to be secured. Data
detection and classification solutions enable tagging files on endpoints,
file servers, and cloud storage systems, letting you visualize data across
the enterprise, to apply the appropriate security policies.
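The tagging step described above can be sketched as a toy rule: label a record "sensitive" when it contains fields commonly treated as personal data. The record layout, field names, and rules are all illustrative assumptions, not a real classification tool's API:

```python
# Hedged sketch of data classification: tag records as "sensitive" when they
# contain fields commonly treated as personal data (illustrative rules only).
records = [
    {"name": "order-1001", "fields": ["item", "quantity"]},
    {"name": "customer-7", "fields": ["email", "credit_card"]},
]
SENSITIVE = {"email", "credit_card", "ssn"}
tags = {r["name"]: ("sensitive" if SENSITIVE & set(r["fields"]) else "public")
        for r in records}
```

Commercial discovery tools do essentially this at scale, combining pattern matching with context, and then drive security policies off the resulting tags.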
Objective:
▪ This unit objective is to specify Fundamentally, a data analytics use
case is the manner in which the business user leverages data and
the analytics system to derive insights to answer tangible business
questions for decision making.
Recap:
Netflix Case
• Netflix, an internet streaming media provider, is a bright example of datafication
process. It provides services in more than 40 countries and 33 million streaming
members. Originally, operations were more physical in nature with its core business in
mail order-based disc rental (DVD and Blu-ray). Simply said, the operating model was
that the subscriber creates and maintains the queue (an ordered list) of media content
that they want to rent (for example, a movie). If you limit the total number of disks, the
contents can be stored for a long time, as the subscriber wishes. However, to rent a
new disk, the subscriber sends the previous one back to Netflix, which then forwards
the next available disk in the subscriber's queue. Thus, the business goal of the disk rental model is to help people fill their queue. The model has changed and now Netflix is
actively transforming their service into a smart one, actively using datafication
processes.
• It’s noticeable that in all aspects of the streamlined implementation of the Netflix
business, a gradual change occurs where the IT infrastructure and artifacts completely
free media content from its physical manifestation; for example, a disk and its mail
delivery. While streaming, subscribers can select videos before making a reservation,
they can consume multiple videos in one session and observe viewing statistics to a
much finer degree; and in real time, to a greater extent.
• Netflix initially started as a DVD rental service in 1998. It mostly relied on third-party postal services to deliver its DVDs to users. This resulted in heavy losses, which they soon mitigated with the introduction of their online streaming service in 2007.
• In order to make this happen, Netflix invested in a lot of algorithms to provide a
flawless movie experience to its users. One of such algorithms is the recommendation
system that is used by Netflix to provide suggestions to the users.
• A recommendation system understands the needs of the users and provides
suggestions of the various cinematographic products.
• A recommendation system is a platform that provides its users with various contents
based on their preferences and likings. A recommendation system takes the
information about the user as an input.
• This information can be in the form of the past usage of product or the ratings that
were provided to the product. It then processes this information to predict how much
the user would rate or prefer the product. A recommendation system makes use of a
variety of machine learning algorithms.
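A toy recommender in the spirit described above: suggest the highest-rated title the user has not yet watched. The titles, ratings, and logic are made up; Netflix's real system is far more sophisticated (collaborative filtering over millions of users):

```python
# Hedged toy recommender: pick the highest-rated unwatched title
# (illustrative data; not Netflix's actual algorithm).
avg_ratings = {"Movie A": 4.6, "Movie B": 3.9, "Movie C": 4.8}
watched = {"Movie C"}
recommendation = max((t for t in avg_ratings if t not in watched),
                     key=avg_ratings.get)
```

The inputs match the text's description: past usage (what was watched) and ratings, processed into a prediction of what the user would prefer next.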
• Next in data science use cases is Uber. Uber is a popular smartphone application that
allows you to book a cab. Uber makes extensive use of Big Data. After all, Uber has to
maintain a large database of drivers, customers, and several other records.
• It is therefore, rooted in Big Data and makes use of it to derive insights and provide
the best services to its users. Uber shares the big data principle with crowdsourcing.
That is, registered drivers in the area can help anyone who wants to go somewhere.
• As mentioned above, Uber maintains a database of drivers. Therefore, whenever you hail a cab, Uber matches your profile with the most suitable driver. What differentiates Uber from other cab companies is that Uber charges you based on the time it takes to cover the distance and not the distance itself.
• It calculates the time taken through various algorithms that also make use of data
related to traffic density and weather conditions.
• Uber makes the best use of data science to calculate its surge pricing. When there are
less drivers available to more riders, the price of the ride goes up. This happens only
during the scarcity of drivers in any given area.
• However, if the demand for Uber rides is less, then Uber charges a lower rate. This
dynamic pricing is rooted in Big Data and makes excellent usage of data science to
calculate the fares based on the parameters.
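The supply-and-demand logic above can be sketched as a toy fare multiplier that grows when rider demand outstrips available drivers. The formula, cap, and numbers are illustrative assumptions, not Uber's actual pricing model:

```python
# Hedged toy surge-pricing sketch: the multiplier rises with the
# riders-to-drivers ratio, capped at 3.0 (illustrative formula only).
def surge_multiplier(riders, drivers):
    if drivers == 0:
        return 3.0                       # arbitrary cap when no drivers remain
    return min(3.0, max(1.0, riders / drivers))

fare = 100 * surge_multiplier(riders=150, drivers=100)
```

Uber's real system would also fold in traffic density, weather, and historical demand, as the text notes.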
• Facebook is a social-media leader of the world today. With millions of users around the
world, Facebook utilizes a large scale quantitative research through data science to
gain insights about the social interactions of the people.
• Facebook has become a hub of innovation where it has been using advanced
techniques in data science to study user behavior and gain insights to improve their
product. Facebook makes use of advanced technology in data science called deep
learning.
• Using deep learning, Facebook makes use of facial recognition and text analysis. In
facial recognition, Facebook uses powerful neural networks to classify faces in the
photographs. It uses its own text understanding engine called “DeepText” to
understand user sentences.
• It also uses DeepText to understand people's interests and to align photographs with text.
• However, more than being a social media platform, Facebook is more of an
advertisement corporation. It uses deep learning for targeted advertising. Using this, it
decides what kind of advertisements the users should view.
• It uses the insights gained from the data to cluster users based on their preferences
and provides them with the advertisements that appeal to them.
• This also comes through the suggestions that are drawn from the other users who use
similar products or provide similar ratings.
• Amazon has an anticipatory shipping model that uses big data for predicting the
products that are most likely to be purchased by its users. It analyzes the pattern of
your purchases and sends products to your nearest warehouse which you may utilize
in the future.
• Amazon also optimizes the prices on its websites by keeping in mind various
parameters like the user activity, order history, prices offered by the competitors,
product availability, etc. Using this method, Amazon provides discounts on popular
items and earns profits on less popular items.
• Another area where every e-commerce platform is addressing is Fraud Detection.
Amazon has its own novel ways and algorithms to detect fraud sellers and fraudulent
purchases.
• Other than online platforms, Amazon has been optimizing the packaging of products in
warehouses and increasing the efficiency of packaging lines through the data collected
from the workers.
Use cases of Data Science: Facebook, Netflix, Amazon, Uber, AirBnB
https://www.youtube.com/watch?v=KxryzSO1Fjs
https://www.youtube.com/watch?v=-ETQ97mXXF0
https://www.youtube.com/watch?v=H4YcqULY1-Q
https://www.youtube.com/watch?v=fn1rKKNLuzk&list=PL15FRvx6P0OWTlNBS_93NHG2hIn9cynVT
https://www.youtube.com/watch?v=XohgKT13FKY&list=PLqICp9VkFcbEWeZ0Q-_6gs-HCRaqe5eyf
Assignment 1
1. Explain Big Data: its characteristics and applications.
2. Explain the building blocks of Hadoop.
3. Explain why Big Data is important.
4. What is data analysis? Why is Python used for data analysis?
5. What are the applications of machine learning in data science?
6. What problems are faced when handling large data?
7. What do you understand by crowdsourcing analytics?
8. What do you mean by the 5 V's of Big Data?
9. What are the security challenges of Data Science?
10. How can data science be used in the medical industry? Explain briefly.
⮚ This unit provides the fundamentals of the Big Data domain and its latest trends in industry.
⮚ In this unit we also gain knowledge of the different types of data.
⮚ A very important topic is the 5 V's of Big Data; we also go through the concept of reporting vs analysis, which is used from an industry perspective.
⮚ This unit imparts knowledge of business analytics and the tools used in data science.