Class Intermediate Data Science

The document provides information about an intermediate data science course taught by Dr. S. Shariq Husain. It outlines the instructor's research interests and teaching areas which include complex systems dynamics, sociophysics, nonlinear dynamics, network science, and machine learning. It also provides a tentative sketch of topics that will be covered in the course, including data science, complex networks, nonlinear dynamics, computational social science, and machine learning approaches.

Uploaded by

NANDINI JAIN
Copyright
© © All Rights Reserved

Intermediate Data Science

Instructor: Dr. S. Shariq Husain


Assistant Professor, JSGP, JGU,
Sonipat, Haryana, India
Email: syeds.husain@jgu.edu.in
shariq.iitk@gmail.com
My research interests include, but are not limited to, Complex Systems Dynamics,
Sociophysics/Computational Social Science, Nonlinear Dynamics, Innovation Diffusion,
Statistical Physics, Network Science, Data Analysis and Machine Learning.
I am also extending some work into the domains of Fractional Calculus and System Sciences,
and into the understanding of Malliavin Calculus, and have a wide spectrum of teaching interests.
Teaching: Complex Systems, Sociophysics/Computational Social Science, Network Science,
Digital Epidemiology, Data Science and Machine Learning.
Current Ongoing Work:
❏ Data, Complexity and Cities,
❏ Sand: In the making of Civilisation
Tentative Sketch of Overall Course
❏ Data Science and its Implications
❏ Complex Systems
❏ Dynamical Systems
❏ Complex Networks
❏ Non-linear Dynamics/ Differential Equations
❏ Computational Social Science
❏ Digital Epidemiology
❏ Computational Linguistics
❏ Computational Anthropology
❏ Machine Learning Approaches
❏ Deep Learning: Introduction
❏ Reservoir Computing: Introduction
Overview
Introduction
a) Motivational Introduction
b) Importance of data: data during ages
c) Montreal Protocol
d) Tragedy of commons
e) Types of data
f) Case of Japan Whaling, East Coast Fisheries Collapse: Canada
Shell, Ipython, Jupyter, Anaconda: Concepts and Installation
a) Variable types and data
b) Introductory Mathematics
c) Introduction to numpy
d) Linear algebra and numpy
e) Calculus: concepts
f) Calculus: operations
g) Statistics: concepts
h) Statistics: operations
Visualizations
a) Introduction to matplotlib
b) Complex network of global terrorism and resilience studies*
c) Network implementation on real data*/ Spatial data
Introduction to pandas
a) Origin
b) Series
c) DataFrames…
Predator-prey model / Lotka-Volterra / Population dynamics modelling, Machine Learning, Innovation diffusion and so on…*
Different data during different ages

Source: Various available sources, including the web. Several resources and references from different academic sources are also acknowledged for many of the slides.
Sense of data and information in nature /survival
Another data form
Montreal protocol
The Montreal Protocol is an international treaty designed to protect the
ozone layer.

It does so by phasing out the production of numerous substances that are responsible
for ozone depletion.

The protocol was agreed on 16 September 1987 and entered into force in January
1989. It has since undergone nine revisions, agreed in London,
Nairobi, Copenhagen, Bangkok, Vienna, Montreal, Australia, Beijing and Kigali.

As a result of the international agreement, the ozone hole over Antarctica is
slowly recovering.*

*Ref: "Ozone Layer on Track to Recovery: Success Story Should Encourage Action on Climate". UNEP. 10 September 2014.
Tragedy of Commons
The tragedy of the commons is a situation in which individual users who have free access to a shared resource,
with no well-defined rules governing its access and use, act independently according to
their own self-interest rather than the common good of all users, and so deplete the resource through
their uncoordinated action [Lloyd].

"Commons" may refer to any unregulated, open-access resource, such as the atmosphere, oceans, rivers or ocean fish
stocks.

The "tragedy of the commons" is often discussed in connection with sustainable development,
at the intersection of economic growth and environmental protection, as well as
in debates on global warming. It is also used in the analysis of behavior
in the fields of anthropology, economics, socioeconomics, politics
and sociology.

Ref: Garrett Hardin ; Lloyd


Tragedy of Commons
Journey from data to policy

Data → Information → Insights/understanding/policy making
Data science
Data science is a multidisciplinary approach to extracting actionable insights from the large
and ever-increasing volumes of data collected and created by today's organizations.

Data science encompasses preparing data for analysis and processing, performing advanced
data analysis, and presenting the results to reveal patterns and enable stakeholders to
draw informed conclusions.

[Figure: Venn diagram of three overlapping circles labelled Computing/Machine learning, Maths/Statistics and Domain Knowledge/Relevance, with Data Science where they overlap]

Fig. inspired by Drew Conway's Venn diagram. Ref: What is Data Science | IBM
Over the last few decades, Python has emerged as an excellent tool for scientific computing tasks,
including the analysis and visualization of large datasets.

The efficacy of Python for data science originates primarily from the vast and active ecosystem of third-party packages:

NumPy for manipulation of homogeneous array-based data,

Pandas for manipulation of heterogeneous and labeled data,

SciPy for common scientific computing tasks,

Matplotlib for publication-quality visualizations,

IPython for interactive execution and sharing of code,

Scikit-Learn for machine learning


Installation
Installing on Windows — Anaconda documentation

Installing on macOS — Anaconda documentation

Ref: anaconda installation guide


Steps
Visuals
continued…
For mac
Scripts for installation
1. Verify the downloaded installer with its SHA-256 checksum:

shasum -a 256 /PATH/FILENAME
# Replace /PATH/FILENAME with your installation's path and filename.

2. Run the installer:

# Include the bash command regardless of whether or not you are using the Bash shell
bash ~/Downloads/Anaconda3-2020.05-MacOSX-x86_64.sh
# Replace ~/Downloads with your actual path
# Replace the .sh file name with the name of the file you downloaded

or, for the older Anaconda2 installer:

# Include the bash command regardless of whether or not you are using the Bash shell
bash ~/Downloads/Anaconda2-2019.10-MacOSX-x86_64.sh
# Replace ~/Downloads with your actual path
# Replace the .sh file name with the name of the file you downloaded
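The checksum step (shasum -a 256) can also be reproduced from Python, which is handy on systems without the shasum tool. The following is a minimal sketch using only the standard library; the function name sha256_of_file and the installer path are illustrative assumptions, not part of the Anaconda instructions.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large installers do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical location; replace with the file you actually downloaded.
installer = Path.home() / "Downloads" / "Anaconda3-2020.05-MacOSX-x86_64.sh"
if installer.exists():
    print(sha256_of_file(installer))
else:
    print("Installer not found at", installer)
```

The printed value should match the hash published for the installer you downloaded; if it does not, download the file again before running it.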
❏ Press Enter to review the license agreement. Then press and hold
Enter to scroll.
❏ Enter “yes” to agree to the license agreement.
❏ Press Enter to accept the default install location.
❏ Installation may take a few minutes to complete.
❏ The installer prompts you to choose whether to initialize Anaconda
Distribution by running conda init. Entering “yes” is recommended.
Launching the Jupyter Notebook

The Jupyter notebook is a browser-based graphical interface to the IPython shell.

In addition to executing Python/IPython statements, the notebook allows the user to include formatted text, static and dynamic
visualizations, mathematical equations, etc.

Though the Jupyter notebook is viewed and edited through your web browser window, it must connect to a running
Python process in order to execute code. This process (known as a "kernel") can be started by running the following
command in your system shell:

$ jupyter notebook

This command will launch a local web server that will be visible to your browser. It immediately prints a log
showing what it is doing; that log will look something like this:

$ jupyter notebook
[NotebookApp] Serving notebooks from local directory: /Users/Downloads/Socialcomquant
[NotebookApp] 0 active kernels
[NotebookApp] The IPython Notebook is running at: http://localhost:8888/
[NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

Upon issuing the command, your default browser should automatically open and navigate to the listed local URL;
the exact address will depend on your system. If the browser does not open automatically, you can open a window
and manually open this address (http://localhost:8888/ in this example).
Introduction to python programming

Programs are made up of commands that a computer can understand.


These commands are called statements, which the computer executes.

The wider picture


The computer itself is assembled from pieces of hardware, including a
processor that can execute instructions and do arithmetic.
It also needs a place to store data, such as a hard drive, and other parts
such as a monitor, a keyboard and networking hardware.
But how does a program communicate with all of these?
For this, an operating system is needed.

A few examples are Microsoft Windows, Linux and Mac OS X.

An operating system, or OS, is a “special” program on the computer that has direct access
to the hardware.
Python expressions
If we open a Python terminal, the >>> part is called a prompt, because it prompts us to type
something.

In IPython, numbered input prompts (In [1]:, In [2]:, and so on) are displayed instead.

In a Jupyter notebook, code is typed into a cell.

Python commands are called statements. One kind of statement is the expression statement, or
‘expression’ for short.

Let’s make use of mathematical operators
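A short sketch of the arithmetic operators, with each expression evaluated and printed. The operand values 7 and 3 are arbitrary choices for illustration.

```python
a, b = 7, 3  # arbitrary example operands

# Map a readable label to the value Python computes for each operator.
results = {
    "a + b": a + b,    # addition
    "a - b": a - b,    # subtraction
    "a * b": a * b,    # multiplication
    "a / b": a / b,    # true division, always returns a float
    "a // b": a // b,  # floor division, drops the fractional part
    "a % b": a % b,    # remainder after floor division
    "a ** b": a ** b,  # exponentiation
}

for expression, value in results.items():
    print(expression, "=", value)
```

Typing any of these expressions directly at the >>> prompt shows the same value without needing print.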


Built-in Functions
>>> abs(-9)
9
>>> round(3.8)
4
>>> round(3.3)
3
>>> round(3.5)
4
>>> int(34.6)
34
>>> float(21)
21.0
>>> pow(2, 4)
16
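One detail of round() is worth a quick check. In Python 3, calling it with a single argument returns an integer, and values exactly halfway between two integers are rounded to the nearest even integer. The sample values below are chosen only to show that behaviour.

```python
# Exact halves go to the nearest even integer in Python 3.
halves = [round(value) for value in (0.5, 1.5, 2.5, 3.5)]
print(halves)  # [0, 2, 2, 4]

# A second argument asks for a number of decimal places instead.
two_decimals = round(3.14159, 2)
print(two_decimals)  # 3.14

# int() truncates towards zero rather than rounding.
print(int(34.6), int(-34.6))  # 34 -34
```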
Strings
Numbers are fundamental to computing; crunching numbers is what computers were originally
built to do.
But
there are other kinds of data in the real world, including addresses, pictures,
voice and music.
Each of these can be represented as a data type, and knowing how to manipulate those data types
is a big part of programming.
We now turn to a non-numeric data type that represents text, such as the words in
sentences, strands of DNA or the text of Twitter messages.
Text data
From arithmetic to text processing

Different chat program “Hi, I’m Nia……”

In Python, a piece of text is represented as a string, which is a sequence of characters
(letters, numbers, and symbols).

In Python 3, a string (type str) holds Unicode text, so it can store not only the Latin alphabet found on most North American
keyboards but any characters at all, including Chinese ideograms and chemical symbols.

x = "yourfirstname"

y = "yourlastname"

x + y or print(x)
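A minimal sketch of basic string operations, following the x and y variables above. The names stored in the variables are placeholders, not values from the course.

```python
x = "Nia"     # placeholder first name
y = "Sharma"  # placeholder last name

# The + operator concatenates strings; add the space yourself.
full_name = x + " " + y
print(full_name)

# len() counts characters, and indexing starts at 0.
print(len(full_name))
print(full_name[0])

# Strings can hold any Unicode characters, not only Latin letters.
mixed = "H₂O 数据"
print(mixed, len(mixed))
```

Note that x + y without the " " in between would give "NiaSharma", because concatenation never inserts separators on its own.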
List and its modification
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]

nobles = ['helium', 'none', 'argon', 'krypton', 'xenon', 'radon']

nobles[1] = 'neon'

nobles

Q: Create a list of financial terms and change its second element to “currency”.

Courtesy:JC
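One possible way to approach the list exercise is sketched below. The list contents are invented for illustration; any financial terms would do.

```python
# Hypothetical list of financial terms.
finance = ["stock", "bond", "loan", "dividend"]

# Indexing starts at 0, so the second element is at index 1.
finance[1] = "currency"
print(finance)

# Lists are mutable: append() adds to the end, len() counts the items.
finance.append("interest")
print(len(finance))
```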
Note: The range() function returns a sequence of numbers, starting
from 0 by default, incrementing by 1 by default, and stopping before a specified number.

range(start, stop, step)

It generates a sequence of integers start, start+step, start+2*step, and so on, up
to, but not including, stop.
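The three forms of range() can be checked by converting each result to a list. The specific start, stop and step values here are arbitrary examples.

```python
# Only stop given: starts at 0 and counts up by 1.
only_stop = list(range(5))
print(only_stop)  # [0, 1, 2, 3, 4]

# start, stop and step: 2, 5, 8; the next value 11 is not before stop.
with_step = list(range(2, 11, 3))
print(with_step)  # [2, 5, 8]

# A negative step counts down; stop is still excluded.
counting_down = list(range(10, 0, -2))
print(counting_down)  # [10, 8, 6, 4, 2]
```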
Applying our learning to new settings
Cities = ['New Delhi', 'Lucknow', 'Mumbai', 'Chennai', 'Hyderabad']

Region = ['India', 'Uttar Pradesh', 'Maharashtra', 'Tamil Nadu', 'Telangana']
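As a sketch of how the two lists can be used together, the code below pairs each city with the region at the same position. The lowercase variable names and the lookup dictionary region_of are this example's own choices.

```python
cities = ["New Delhi", "Lucknow", "Mumbai", "Chennai", "Hyderabad"]
regions = ["India", "Uttar Pradesh", "Maharashtra", "Tamil Nadu", "Telangana"]

# zip() walks both lists in step, pairing items by position.
for city, region in zip(cities, regions):
    print(city, "->", region)

# The same pairs can be stored in a dictionary for lookup by city name.
region_of = dict(zip(cities, regions))
print(region_of["Chennai"])
```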


System of Linear Equation and Python
20x+10y=350

17x+22y=500

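import numpy as np

# Coefficient matrix and right-hand side of the system above
A = np.array([[20, 10], [17, 22]])
B = np.array([350, 500])

# Solve A X = B directly
X = np.linalg.solve(A, B)
# X = np.linalg.inv(A).dot(B)  # alternative via the explicit inverse

print(X)  # approximately x = 10, y = 15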
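The numerical answer can be cross-checked by hand with Cramer's rule, which for a 2x2 system needs only three determinants. The sketch below is not the method used in the slide (that is np.linalg.solve); it uses exact fractions from the standard library so that no rounding is involved.

```python
from fractions import Fraction

# Coefficients of 20x + 10y = 350 and 17x + 22y = 500.
a11, a12, b1 = 20, 10, 350
a21, a22, b2 = 17, 22, 500

# Determinant of the coefficient matrix; it must be non-zero.
det = a11 * a22 - a12 * a21

# Cramer's rule: replace one column by the right-hand side each time.
x = Fraction(b1 * a22 - a12 * b2, det)
y = Fraction(a11 * b2 - b1 * a21, det)
print(det, x, y)  # 270 10 15

# Substitute back into both equations as a final check.
print(a11 * x + a12 * y == b1, a21 * x + a22 * y == b2)
```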
Illustration
Product of matrices
Python output of product matrices
Squaring/Powering
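Since the slide images for these topics are not reproduced here, the following sketch shows a matrix product and a matrix power written in plain Python so that each step is visible. The matrices A and B and the helper names matmul and matrix_power are illustrative assumptions. With numpy arrays the same results come from A @ B and np.linalg.matrix_power(A, 2), while A ** 2 squares element by element.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matrix_power(m, n):
    """Raise a square matrix to a non-negative integer power."""
    size = len(m)
    # Start from the identity matrix, then multiply n times.
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, m)
    return result


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

product = matmul(A, B)                              # matrix product
A_squared = matrix_power(A, 2)                      # A multiplied by A
elementwise = [[v * v for v in row] for row in A]   # each entry squared

print(product)      # [[2, 1], [4, 3]]
print(A_squared)    # [[7, 10], [15, 22]]
print(elementwise)  # [[1, 4], [9, 16]]
```

The last two results differ, which is the point to remember: the matrix square is not the same as squaring every entry.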
Demo, Hands on with Codes
Introducing matplotlib and pandas with applications:

❏ Commands: import matplotlib.pyplot as plt and import pandas as pd
❏ matplotlib is a library for visualising data and the results of numerical calculations,
whereas pandas is for handling data of different dimensions
❏ Statistical plots
❏ Differential equation
❏ Population dynamics together with numpy and scipy
❏ Variation and variability among different variables
❏ Statistical Computation
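To make the population dynamics item concrete, here is a minimal sketch of the Lotka-Volterra predator-prey model, dx/dt = alpha*x - beta*x*y and dy/dt = delta*x*y - gamma*y, integrated with the simple explicit Euler method in plain Python. All parameter values, initial populations and the step size are assumptions chosen for illustration. The course itself points to numpy and scipy, where scipy.integrate.solve_ivp would be the more accurate tool.

```python
def lotka_volterra(prey0, predator0, alpha, beta, delta, gamma, dt, steps):
    """Integrate the predator-prey equations with explicit Euler steps."""
    prey, predator = [prey0], [predator0]
    for _ in range(steps):
        x, y = prey[-1], predator[-1]
        # Prey grow at rate alpha and are eaten at rate beta * y.
        prey.append(x + dt * (alpha * x - beta * x * y))
        # Predators grow by eating prey and die at rate gamma.
        predator.append(y + dt * (delta * x * y - gamma * y))
    return prey, predator


# Hypothetical parameters and initial populations.
prey, predator = lotka_volterra(
    prey0=10.0, predator0=5.0,
    alpha=1.0, beta=0.1, delta=0.075, gamma=1.5,
    dt=0.001, steps=20000,
)

print("prey range:", min(prey), max(prey))
print("predator range:", min(predator), max(predator))
```

The two lists can be passed straight to plt.plot to see the characteristic out-of-phase oscillations of the two populations.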

Numpy with matplotlib example


Quick Demo, Synthetic Data Generation and Storing in pandas
3D Plot, Pandas and Handling High Dimensional Data
An example
With equiangular palette using plotly
What is a network?
• Networks are collections of points joined by lines.

“Network” ≡ “Graph”

Points      Lines              Domain
vertices    edges, arcs        math
nodes       links              computer science
sites       bonds              physics
actors      ties, relations    sociology

[Figure labels: node, edge]
Types of networks
There are several classifications of networks:
❑ According to direction of links: directed or undirected
❑ According to kind of interaction: weighted or unweighted
❑ According to differences between nodes: bipartite or not
Directed and undirected networks
• The relationship between nodes may be symmetric (undirected
networks) or asymmetric (directed networks).

• The direction of the links is crucial in dynamical processes occurring in
the network, such as information spreading, synchronization or network
robustness.
Weighted and unweighted networks
The capacity or intensity of the relationship between nodes may be heterogeneous
(weighted networks).

Again, the weight of the links is crucial in dynamical processes occurring in the network,
such as information spreading, synchronization or network robustness.
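The directed/undirected and weighted/unweighted distinctions can be sketched with a small adjacency dictionary. The node names, weights and the helper build_adjacency are hypothetical, chosen only to show how the same edge list gives different networks.

```python
# Hypothetical weighted edges as (source, target, weight).
edges = [("A", "B", 3.0), ("B", "C", 1.5), ("C", "A", 0.5)]


def build_adjacency(edge_list, directed=False):
    """Return {node: {neighbour: weight}} for the given edge list."""
    adjacency = {}
    for source, target, weight in edge_list:
        adjacency.setdefault(source, {})[target] = weight
        adjacency.setdefault(target, {})
        if not directed:
            # An undirected link is stored in both directions.
            adjacency[target][source] = weight
    return adjacency


directed = build_adjacency(edges, directed=True)
undirected = build_adjacency(edges, directed=False)

print(directed["B"])    # only the link B -> C
print(undirected["B"])  # links to both A and C
```

An unweighted network is the special case in which every weight is the same, for example 1.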
Bipartite networks
• Networks with two (or more) kinds of nodes, with links joining
ONLY nodes of unlike type.
• For example, we may have individuals and events:
• directors and boards of directors; movies and actors;
students and affiliations; customers and the items they purchase
❑ Despite the different types of networks, which in turn are obtained
from completely different interacting systems (people, neurons,
proteins, routers, ...), we will see that they share some universal
properties
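The bipartite definition above (links only between nodes of unlike type) can be tested by trying to colour the nodes with two colours so that no link joins two nodes of the same colour. The sketch below does this with a breadth-first search; the small actor/movie network and the triangle are invented examples.

```python
from collections import deque


def is_bipartite(adjacency):
    """Return True if the undirected graph can be two-coloured."""
    colour = {}
    for start in adjacency:
        if start in colour:
            continue  # already reached from an earlier start node
        colour[start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in colour:
                    colour[neighbour] = 1 - colour[node]
                    queue.append(neighbour)
                elif colour[neighbour] == colour[node]:
                    return False  # a link joins two like nodes
    return True


# Hypothetical actors-and-movies network: links only join unlike types.
actor_movie = {
    "actor_1": ["movie_a", "movie_b"],
    "actor_2": ["movie_b"],
    "movie_a": ["actor_1"],
    "movie_b": ["actor_1", "actor_2"],
}
# A triangle cannot be split into two groups, so it is not bipartite.
triangle = {"x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"]}

print(is_bipartite(actor_movie), is_bipartite(triangle))
```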
Examples: Network Everywhere

▪ Human disease networks
▪ Power grids
▪ Gas networks
▪ Internet router network
▪ Client server network
▪ World Wide Web
▪ Finance networks
▪ Airline networks
▪ Call networks
▪ Social networks
▪ Protein-protein interaction network

[Figure: human disease network] Each node corresponds to a distinct disorder, colored based on the disorder class to
which it belongs. A link between disorders in the same disorder class is colored with
the corresponding dimmer color and links connecting different disorder classes are
gray. The size of each node is proportional to the number of genes participating in
the corresponding disorder.
Gist: Networks

A network results from connections between various nodes through edges.

Social network: how people interact
Biological network: how biological units (proteins, DNA) interact
Transportation networks: how cities are connected via rail, road and air

[Figure labels: Node, Edge]
Example: Indegree & Outdegree
Here, node B is associated with two types of edges. One type comes into node B
(edges CB and DB, so the number of edges coming into node B is 2), whereas the other
type goes away from node B (edge BA, so the number of edges going away from
node B is 1) in this directed network.
So for node B, the outdegree is 1 and the indegree is 2.
Similarly, node D has outdegree 2 and no indegree.

[Figure: directed network with nodes A, B, C and D; labels: Node, Edge]
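The degree counts in this example can be reproduced from an edge list. The edges C to B, D to B and B to A are taken from the text; the figure is not available here, so the second edge leaving D (written below as D to C) is an assumption made only so that D has outdegree 2.

```python
from collections import Counter

# Directed edges as (source, target). The first three come from the text;
# ("D", "C") is an ASSUMED second edge out of D, as the figure is missing.
edges = [("C", "B"), ("D", "B"), ("B", "A"), ("D", "C")]

# Outdegree counts edges leaving a node, indegree counts edges arriving.
outdegree = Counter(source for source, _ in edges)
indegree = Counter(target for _, target in edges)

for node in sorted({name for edge in edges for name in edge}):
    print(node, "indegree:", indegree[node], "outdegree:", outdegree[node])
```

A Counter returns 0 for nodes it has never seen, which is why D's indegree prints as 0 without any special handling.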
Konigsberg 7 Bridges Problem
❏ A walk in a graph that traverses every edge (bridge) exactly once and has the same
starting and ending point is known as an Eulerian
circuit.
❏ Such a circuit exists if, and only if, the graph is connected and all nodes
have even degree.
❏ If exactly two nodes have odd degree, then any Eulerian path will start at one of
them and end at the other.
❏ Since the graph associated with historical Königsberg has four nodes of
odd degree, it cannot have an Eulerian path.
Source: Online and academic material
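The degree argument for Königsberg can be verified in a few lines. In this sketch the four land masses are labelled A (the central island), B, C and D; the labels are this example's own, and the seven bridges are listed as edges of a multigraph. The degree test is enough here because the graph is connected.

```python
from collections import Counter

# The seven bridges as undirected edges between four land masses.
bridges = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

# Each bridge adds one to the degree of both land masses it joins.
degree = Counter()
for first, second in bridges:
    degree[first] += 1
    degree[second] += 1

odd_nodes = sorted(node for node, d in degree.items() if d % 2 == 1)

# For a connected graph: a circuit needs no odd-degree nodes,
# and a path needs either none or exactly two.
has_eulerian_circuit = len(odd_nodes) == 0
has_eulerian_path = len(odd_nodes) in (0, 2)

print(dict(degree))
print(odd_nodes, has_eulerian_circuit, has_eulerian_path)
```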
Egyptian Revolution and Social Networks
