Chirag Modi Data Science Report
“Data Science”
Submitted to the University College Of Engineering Banswara
In Partial Fulfilment of the requirement for the degree of
Bachelor of Technology
Affiliated to
Govind Guru Tribal University, Banswara
Submitted To: Mrs. Kamna Agarwal Shakyavanshi (Assistant Professor, CSE Dept., GEC Banswara)
Submitted By: Mr. Chirag Modi (Roll No.: 500009)
(December, 2024)
B. Tech. (2021-25)
ACKNOWLEDGEMENT
I would like to acknowledge the contributions of the following people, without whose help and guidance this report would not have been completed. I am thankful to Mrs. Kamna Agarwal Shakyavanshi, University College of Engineering, Banswara, Rajasthan, for her constant encouragement, valuable suggestions, moral support and blessings. Although it is not possible to name everyone individually, I shall ever remain indebted to the faculty members of University College of Engineering, Banswara, Rajasthan for their persistent support and cooperation extended during this work. This acknowledgement would remain incomplete if I failed to express my deep sense of obligation to my parents and God for their consistent blessings and encouragement.
Contents
S. No. Contents Page no.
1 Chapter 1 (Introduction to Data Science) 1-7
2 Chapter 2 (Technologies implemented) 8-12
3 Chapter 3 (Implementation) 12-21
4 Chapter 4 (Conclusion and results) 22
5 Chapter 5 (References) 23
List of Figures
S. No. Figure Page no.
1 Data Science. 6
2 Pre-processing the datasets. 13
3 Total match wins by teams 13
4 Total finals played and won 13
5 CSK vs MI head to head 14
6 Most Man of the Match winners 14
7 Highest run getters in the league 15
8 Excluding irrelevant wickets for bowlers 15
9 Excluding irrelevant wickets for bowlers 16
10 Top wicket taking bowlers in the league. 16
11 Top all-rounders in the league 17
Chapter 1
Introduction to Data Science
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from noisy, structured and unstructured data, and to apply knowledge and actionable insights from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.
Data science is a "concept to unify statistics, data analysis, informatics, and their related methods" in order to "understand and analyse actual phenomena" with data.[3] It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge. However, data science is different from computer science and information science. Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational, and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.
Early Usage
The term "data science" has been traced back to 1974, when Peter Naur proposed it as an alternative name for computer science. In 1996, the International Federation of Classification Societies became the first conference to specifically feature data science as a topic. However, the definition was still in flux. In his 1985 lecture at the Chinese Academy of Sciences in Beijing, and again in 1997, C. F. Jeff Wu suggested that statistics should be renamed data science. He reasoned that a new name would help statistics shed inaccurate stereotypes, such as being synonymous with accounting, or limited to describing data. In 1998, Hayashi Chikio argued for data science as a new, interdisciplinary concept, with three aspects: data design, collection, and analysis.
During the 1990s, popular terms for the process of finding patterns in datasets (which were
increasingly large) included "knowledge discovery" and "data mining".
Modern Usage
In 2001, William S. Cleveland proposed establishing data science as an independent discipline, extending the field of statistics beyond theory into technical areas; because this would significantly change the field, it warranted a new name. "Data science" became more widely used in the next few years: in 2002, the Committee on Data for Science and Technology launched Data Science Journal. In 2003, Columbia University launched The Journal of Data Science. In 2014, the American Statistical Association's Section on Statistical Learning and Data Mining changed its name to the Section on Statistical Learning and Data Science, reflecting the ascendant popularity of data science.
The professional title of "data scientist" has been attributed to DJ Patil and Jeff Hammerbacher in
2008. Though it was used by the National Science Board in their 2005 report, "Long-Lived Digital
Data Collections: Enabling Research and Education in the 21st Century," it referred broadly to any
key role in managing a digital data collection.
There is still no consensus on the definition of data science and it is considered by some to be a
buzzword.
Impact
Big data is very quickly becoming a vital tool for businesses and companies of all sizes. The availability and interpretation of big data have altered the business models of old industries and enabled the creation of new ones. Data-driven businesses were worth $1.2 trillion collectively in 2020, an increase from $333 billion in 2015. Data scientists are responsible for breaking down big data into usable information and creating software and algorithms that help companies and organizations determine optimal operations. As big data continues to have a major impact on the world, so does data science, due to the close relationship between the two.
Techniques
• Dimensionality reduction is used to reduce the complexity of data computation so that it can be performed more quickly.
• Machine learning is a technique used to perform tasks by inferring patterns from data.
• Naive Bayes classifiers are used to classify by applying Bayes' theorem. They are mainly used on datasets with large amounts of data, and can aptly generate accurate results.
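As a rough illustration of the Naive Bayes idea above, here is a small from-scratch classifier with add-one smoothing; the weather data and labels are made up for illustration and are not tied to this report's datasets:

```python
import math
from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Estimate class counts and per-feature value counts for each class."""
    class_counts = Counter(labels)
    # feature_counts[cls][i][value] = how often feature i took `value` in class cls
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            feature_counts[y][i][v] += 1
    return class_counts, feature_counts

def predict_nb(x, class_counts, feature_counts):
    """Pick the class maximizing log P(class) + sum of log P(feature|class)."""
    total = sum(class_counts.values())
    best_cls, best_score = None, float("-inf")
    for cls, c_count in class_counts.items():
        score = math.log(c_count / total)
        for i, v in enumerate(x):
            counts = feature_counts[cls][i]
            # add-one (Laplace) smoothing so unseen values get non-zero probability
            score += math.log((counts[v] + 1) / (sum(counts.values()) + len(counts) + 1))
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels  = ["play", "play", "stay", "stay"]
model = train_nb(samples, labels)
print(predict_nb(("sunny", "cool"), *model))  # -> play
```

This is only a sketch of the technique; production code would typically use a library implementation instead.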
Challenges Faced by Data Scientists
1. Data Preparation
Data scientists spend nearly 80% of their time cleaning and preparing data to improve its quality, i.e., to make it accurate and consistent, before using it for analysis. However, 57% of them consider it the worst part of their job, labelling it time-consuming and highly mundane. They are required to go through terabytes of data, across multiple formats, sources, functions, and platforms, on a day-to-day basis, whilst keeping a log of their activities to prevent duplication.
One way to solve this challenge is by adopting emerging AI-enabled data science technologies like
Augmented Analytics and Auto feature engineering. Augmented Analytics automates manual data
cleansing and preparation tasks and enables data scientists to be more productive.
2. Multiple Data Sources
As organizations continue to utilize different types of apps and tools and to generate different formats of data, there will be more data sources that data scientists need to access to produce meaningful decisions. This process requires manual entry of data and time-consuming data searching, which leads to errors and repetitions, and eventually to poor decisions.
Organizations need a centralized platform integrated with multiple data sources to instantly access information from all of them. Data in this centralized platform can be aggregated and controlled effectively and in real time, improving its utilization and saving the data scientists a huge amount of time and effort.
3. Data Security
As organizations transition to cloud data management, cyberattacks have become increasingly common. This has caused two major problems:
a) Confidential data stored in the cloud has become vulnerable to attacks and leaks.
b) In response to repeated cyberattacks, regulatory standards have evolved that extend the data consent and utilization processes, adding to the frustration of data scientists.
Organizations should utilize advanced machine learning enabled security platforms and instill
additional security checks to safeguard their data. At the same time, they must ensure strict adherence
to the data protection norms to avoid time-consuming audits and expensive fines.
4. Understanding the Business Problem
Before performing data analysis and building solutions, data scientists must first thoroughly understand the business problem. Most data scientists follow a mechanical approach and start analysing data sets without clearly defining the business problem and objective. Therefore, data scientists must follow a proper workflow before starting any analysis. The workflow must be built in collaboration with the business stakeholders and consist of well-defined checklists to improve understanding and problem identification.
5. Communicating Results to Non-technical Stakeholders
It is imperative for data scientists to communicate effectively with business executives who may not understand the complexities and technical jargon of their work. If the executive, stakeholder, or client cannot understand their models, then their solutions will most likely not be executed. This is something that data scientists can practise: they can adopt concepts like "data storytelling" to give a structured approach to their communication and a powerful narrative to their analysis and visualizations.
6. Collaboration Between Data Scientists and Data Engineers
Organizations usually have data scientists and data engineers working on the same projects, so there must be effective communication between them to ensure the best output. However, the two usually have different priorities and workflows, which causes misunderstanding and stifles knowledge sharing.
Management should take active steps to enhance collaboration between data scientists and data engineers. It can foster open communication by setting up a common coding language and a real-time collaboration tool. Moreover, appointing a Chief Data Officer to oversee both departments has also been shown to improve collaboration between the two teams.
7. Misconceptions About the Role
In big organizations, a data scientist is expected to be a jack of all trades: they are required to clean data, retrieve data, build models, and conduct analysis. However, this is a big ask for any data scientist. For a data science team to function effectively, tasks need to be distributed among individuals specializing in data visualization, data preparation, model building and so on.
It is critical for data scientists to have a clear understanding of their roles and responsibilities before
they start working with any organization.
8. Unrealistic Expectations
The lack of understanding of data science among management teams leads to unrealistic expectations of data scientists, which affects their performance. Data scientists are expected to produce a silver bullet and solve all the business problems; this is very counterproductive. To set realistic expectations, organizations should establish:
A) Well-defined metrics to measure the accuracy of the analysis generated by the data scientists
B) Proper business KPIs to analyse the business impact generated by the analysis
Fig. 1: Data Science
Future Scope
Health Care Sector
There is a huge requirement for data scientists in the healthcare sector because hospitals generate a large amount of data on a daily basis. Tackling such massive amounts of data is not possible for untrained staff. Hospitals need to keep records of patients' medical histories, bills, staff personal history, and much other information. Data scientists are being hired in the medical sector to enhance the quality and safety of this data.
Transport Sector
The transport sector requires data scientists to analyse the data collected through passenger counting systems, asset management, location systems, fare collection, and ticketing.
E-commerce
The e-commerce industry is booming partly because of data scientists, who analyse the data and create customized recommendation lists that provide great results to end-users.
2. Python programming.
3. Data science concepts.
4. Analysing the data.
5. Plotting charts.
6. Visualising the models.
7. ML libraries: Scikit-learn, NumPy, Matplotlib, Pandas.
Methodologies
The trainer used several facilitation techniques, including question and answer, brainstorming, group discussions, case study discussions, and practical implementation of some of the topics by the trainees. This mix of training methodologies was used to make sure all the participants grasped the concepts fully and practised what they learned: what is only heard from the trainers can be forgotten, but what the trainees do themselves is retained. After the post-tests were administered and the final course evaluation forms were filled in by the participants, the trainer delivered his closing remarks and reiterated the importance of the training for the trainees in their daily activities and their readiness to apply the learnt concepts in their assigned tasks. Certificates of completion were distributed among the participants at the end.
Chapter 2
Technology Implemented
Python is a widely used general-purpose, high-level programming language. It was initially designed by Guido van Rossum, first released in 1991, and is developed by the Python Software Foundation. It was designed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python is often described as a "batteries included" language due to its comprehensive standard library.
Features
• Interpreted
In Python there are no separate compilation and execution steps as in C/C++. Programs run directly from the source code. Internally, Python converts the source code into an intermediate form called bytecode, which is then executed by the Python virtual machine.
• Platform Independent
Python programs can be developed and executed on multiple operating system platforms. Python can be used on Linux, Windows, macOS, Solaris and many more.
• Multi-Paradigm
Python is a multi-paradigm programming language. Object-oriented programming and
structured programming are fully supported, and many of its features support functional
programming and aspect-oriented programming.
• Simple
Python is a very simple language. It is very easy to learn as its syntax is close to the English language. In Python, more emphasis is on the solution to the problem rather than on the syntax.
• Rich Library Support
The Python standard library is very vast. It can help with various tasks involving regular expressions, documentation generation, unit testing, threading, databases, web browsers, CGI, email, XML, HTML, WAV files, cryptography, GUIs and many more.
• Free and Open Source
Firstly, Python is freely available. Secondly, it is open source. This means that its source code is available to the public. We can download it, change it, use it, and distribute it. This is called FLOSS (Free/Libre and Open-Source Software). As the Python community, we are all headed toward one goal: an ever-better Python.
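To make the "Interpreted" point above concrete, the standard-library dis module can show the intermediate bytecode that Python compiles source code to before the virtual machine executes it; a minimal sketch:

```python
import dis

def add(a, b):
    return a + b

# compile() turns source text into a code object: the bytecode form.
code = compile("x = 1 + 2", "<example>", "exec")
print(type(code).__name__)  # -> code

# dis.dis() disassembles a function into a readable instruction listing
# (exact opcode names vary between Python versions).
dis.dis(add)
```

Nothing here is specific to this report; it simply demonstrates the compilation step the text describes.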
Libraries used:
1. Pandas
Pandas is a Python library used for working with data sets. It has functions for analysing, cleaning, exploring, and manipulating data. The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008. Pandas allows us to analyse big data and draw conclusions based on statistical theories. Pandas can clean messy data sets and make them readable and relevant. Relevant data is very important in data science.
Syntax:
import pandas as pd
Example:
import pandas as pd
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
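Since the text notes that Pandas can clean messy data sets, here is a minimal sketch of dropping and filling missing values; the data frame below is made up purely for illustration:

```python
import pandas as pd

# A toy "messy" dataset with missing values (illustrative only).
df = pd.DataFrame({
    "player": ["Rohit", "Dhoni", None, "Kohli"],
    "runs":   [45, None, 12, 73],
})

cleaned = df.dropna(subset=["player"])  # drop rows with no player name
cleaned = cleaned.fillna({"runs": 0})   # treat missing runs as 0
print(cleaned)
```

dropna and fillna are the two basic Pandas cleaning operations; real cleaning pipelines combine many such steps.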
2. Matplotlib
Matplotlib is a low-level graph plotting library in Python that serves as a visualization utility. Matplotlib was created by John D. Hunter. Matplotlib is open source and we can use it freely. Matplotlib is mostly written in Python; a few segments are written in C, Objective-C and JavaScript for platform compatibility.
Syntax:
import matplotlib.pyplot as plt
Example:
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
plt.plot(xpoints, ypoints)
plt.show()
3. NumPy
NumPy is a Python library. NumPy is used for working with arrays. NumPy is short for
"Numerical Python".
Syntax:
import numpy as np
Example:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
4. Plotly
Plotly is a Python library for building interactive charts, such as line plots, scatter plots and pie charts, which can be viewed in a web browser or embedded in notebooks.
Chapter 3
Implementation
The next task, as in any data-related task, was pre-processing the datasets and performing data cleaning in order to make the analysis easier. For instance, a few franchises have changed the names of their teams; these needed to be updated. There were also some errors with the names of stadiums: there were two different names for the same stadium, and so on. I also replaced the team names with their initials to make their representation less complex.
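The renaming step described above can be sketched with DataFrame.replace; the column names, the rename mappings and the toy rows below are illustrative assumptions, since the actual dataset is not shown in the report:

```python
import pandas as pd

# Toy matches table (column names assumed).
matches = pd.DataFrame({"team1": ["Delhi Daredevils", "Chennai Super Kings"],
                        "team2": ["Mumbai Indians", "Deccan Chargers"]})

# Franchises that changed names are mapped to the current name,
# then every team name is shortened to its initials.
renames  = {"Delhi Daredevils": "Delhi Capitals",
            "Deccan Chargers": "Sunrisers Hyderabad"}
initials = {"Delhi Capitals": "DC", "Chennai Super Kings": "CSK",
            "Mumbai Indians": "MI", "Sunrisers Hyderabad": "SRH"}

matches = matches.replace(renames).replace(initials)
print(matches)
```

Chaining the two replace calls keeps the "fix old names first, then abbreviate" order explicit.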
On finding that the two most successful teams in the league have been Chennai Super Kings (CSK) and Mumbai Indians (MI), a head-to-head analysis of the teams was also done to identify who actually dominated the battle. MI turned out to be on top of CSK.
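The head-to-head count described above can be sketched in pandas as follows; the column names (team1, team2, winner) and the toy rows are assumptions, not the report's actual data:

```python
import pandas as pd

# Toy matches table; real IPL datasets have similar columns.
matches = pd.DataFrame({
    "team1":  ["CSK", "MI", "CSK", "MI", "CSK"],
    "team2":  ["MI", "CSK", "MI", "CSK", "MI"],
    "winner": ["MI", "MI", "CSK", "MI", "CSK"],
})

# Keep only CSK-vs-MI games, then count wins per side.
h2h = matches[matches[["team1", "team2"]].isin(["CSK", "MI"]).all(axis=1)]
wins = h2h["winner"].value_counts()
print(wins)
```

The isin/all filter keeps rows where both sides are one of the two teams, so the same pattern works on a full season table containing other fixtures.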
Player analysis:
The process of identifying the most successful player turned out to be the most complex and time-consuming. There were several grounds on which the most successful player could be identified: for example, most Man of the Match awards, highest run scorers, highest wicket takers, best all-rounders, best death bowlers, best death batsmen, most economical bowlers, etc.
Most Man of the Match winners:
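A hedged sketch of counting Man of the Match awards with value_counts; the column name player_of_match and the toy values are assumptions standing in for the real dataset:

```python
import pandas as pd

# Toy data; one row per match, with the award winner for that match.
matches = pd.DataFrame({"player_of_match":
    ["AB de Villiers", "CH Gayle", "AB de Villiers",
     "RG Sharma", "AB de Villiers"]})

# value_counts() tallies awards per player, sorted most-to-least.
top_mom = matches["player_of_match"].value_counts().head(3)
print(top_mom)
```

On a full dataset the same one-liner yields the ranking plotted in the corresponding figure.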
Fig. 7: Highest run getters in the league
Fig. 9: Excluding irrelevant wickets for bowlers
Fig. 11: Top all-rounders in the league
Death over analysis of players:
Overs 16-20 are considered the death overs of the innings. This is generally considered one of the most challenging phases of the game for any team, batsman or bowler. Hence, we have performed a separate death-over analysis for batsmen and bowlers, to find out which batsman has scored the most runs (or runs at the highest strike rate) and which bowler has taken the maximum wickets in the death overs.
Fig. 12: Highest death runs scorers in both innings of the game.
Fig. 13: Highest wicket takers of the league in death overs.
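The death-over aggregation behind the two figures above can be sketched as follows, assuming a ball-by-ball table with over, batsman and batsman_runs columns (the values shown are toy data, not the real deliveries dataset):

```python
import pandas as pd

# Toy ball-by-ball data (columns assumed to mirror a deliveries dataset).
deliveries = pd.DataFrame({
    "over":         [3, 16, 18, 19, 20, 17],
    "batsman":      ["MS Dhoni", "MS Dhoni", "KA Pollard",
                     "MS Dhoni", "KA Pollard", "MS Dhoni"],
    "batsman_runs": [1, 6, 4, 6, 2, 1],
})

# Keep only overs 16-20, then total runs per batsman, highest first.
death = deliveries[deliveries["over"].between(16, 20)]
death_runs = (death.groupby("batsman")["batsman_runs"]
                   .sum().sort_values(ascending=False))
print(death_runs)
```

The same filter-then-groupby pattern, with a wicket column instead of runs, gives the death-over bowling ranking.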
Toss decision:
Fig. 17: Toss winning decision Fig. 18: Win % batting second
It is quite visible from the pie chart that the majority of toss-winning teams have decided to field first and chase the score on the board. The reason for this decision can be inferred from the adjoining pie chart.
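A pie chart like the one described can be produced with a sketch along these lines; the column name toss_decision and the toy values are assumptions rather than the actual dataset:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Toy data: one toss decision per match.
matches = pd.DataFrame({"toss_decision":
    ["field", "field", "bat", "field", "bat", "field"]})

# Count decisions and draw them as a pie chart with percentage labels.
counts = matches["toss_decision"].value_counts()
counts.plot.pie(autopct="%1.0f%%")
plt.ylabel("")
plt.title("Toss decision")
plt.savefig("toss_decision.png")
```

value_counts feeds directly into Series.plot.pie, so the figure stays in sync with the underlying tally.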
Fig. 20: Team wise toss decision analysis.
Chapter 4
Conclusion
The given datasets needed to be pre-processed for several reasons. We were then able to analyse the data properly and draw proper conclusions. Top teams were identified and their head-to-head clashes were counted. Top run scorers, highest wicket takers, best all-rounders, best death batsmen and best death bowlers were also listed. With the help of another dataset, we identified which factors led to winning matches, i.e., venue and toss decision.
Learning Experience
With the completion of this project, I take away innumerable learnings. This project motivated me to explore and learn new things, and to push myself beyond my own limits and expectations. It also helped me improve my presentation and communication skills.
Chapter 5
References
Python - https://developers.google.com/edu/python/
Pandas - https://www.youtube.com/watch?v=ZyhVh-qRZPA&list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS
Data visualization - https://matplotlib.org/gallery/index.html
Matplotlib - https://www.youtube.com/watch?v=UO98lJQ3QGI&list=PLosiE80TeTvipOqomVEeZ1HRcEvtZB_