Data Modeling and Data Engineering
Content Overview:
• Data Modeling
o Conceptual Data Modeling
o Logical Data Modeling
o Physical Data Modeling
o Identifying and non-identifying relationships
o Relationship Cardinalities (one-to-one, one-to-many, many-to-many)
o How to resolve the many-to-many relationship problem
• Data Engineering
o Data Engineering with SQL
o SQL Basics
o Understanding OLTP and OLAP
o Understanding Joins
o Aggregate Functions
o Analytical Functions
Data Modeling
Data Model
A data model is a diagram that displays a set of tables and the relationships between
them. We can understand far more by looking at a data model diagram than by looking at
a plain list of tables: the diagram conveys the purpose of each table as well as its
dependencies. A data model is applicable to any software development effort that involves
creating database objects to store and manipulate data, which includes transactional
systems as well as data warehouse systems. When a data model is designed, we progress
through three main stages:
➢ Conceptual data model
➢ Logical data model
➢ Physical data model
Looking at this diagram, it is easy to see that there are four main entities: time,
product, sales, and store. The three entities time, product, and store each have a direct
relationship with the sales entity. So, a lot of information can be obtained simply by
looking at the conceptual data model, and since it is not a digital document it can be
easily enhanced. Notice that only the entities are visible here; there is something else
called attributes, which are not visible, but we will talk about them in just a bit. Even
the relationships are quite abstract: we know only that product is connected to sales,
but the column on which the relationship is established is not yet clear. This is a way
of hiding complexity at the very initial stages, and since a conceptual model can be
written on a piece of paper or a whiteboard, you do not need a software tool to create
one. That makes it a whole lot easier. Once the conceptual data model is finalized, we
can elaborate it into a logical data model. So, let's look at a logical data model.
All the attributes displayed above the line form the key attributes, and all the
attributes below the line are called non-key attributes, meaning they do not help in
uniquely identifying a record. An example is the category in the product entity: a
category value can repeat across a number of records, hence it is a non-key attribute,
and that is why it is listed below the line in this entity. We also have the primary
key/foreign key relationships clearly defined. The key attributes mentioned for each
entity can be used as primary keys, and these primary keys are referenced as foreign
keys in the sales entity, as is apparent from the abbreviation FK enclosed in
parentheses. This is a detail that was not available in the conceptual data model. The
other thing to notice is the user-friendly attribute names: any technical or
non-technical person can easily understand what each of these entities means, and
readability improves because the column names are self-explanatory. All these additions
make the logical model more detailed than the conceptual model. At this stage, the
logical model is not dependent on any specific database, meaning you can take it and
implement it in any database: Oracle, SQL Server, or even an OLAP tool such as SQL
Server Analysis Services. All these additional properties also make a logical data model
slightly more difficult to update than a conceptual model. Once the logical data model
is finalized, we move to the last step of data model design, which is the physical data
model.
One other thing we do is keep column names as short as possible. As is evident here, the
short form of product is PROD, so the product description column is now named PROD_DESC.
These names are database compatible, which makes the life of a DBA a lot easier, both
for the database objects themselves and for any queries we are going to write. The same
applies to table names as well as column names. We have also introduced the concept of a
data type: data types specify what type of data will be stored in each column. Here we
have VARCHAR, INTEGER, and FLOAT. These data types are specific to a database.
In this example, the physical data model is created for a Microsoft SQL Server database,
so these data types are specific to SQL Server. If you were creating a physical data
model for a different database, such as Oracle or MySQL, these data types would be
different. Hence, a physical data model is specific to a particular database, and this
makes it difficult for users to understand. Non-technical users will have a hard time
understanding what each of these tables means, what the columns mean, and what the data
types are for. So, it is usually not recommended to share the physical data model with
users; you share only the logical data model. Since the physical model has more detail
than the logical model, it is also more difficult to enhance.
So, let's assume that you got sign-off on the logical data model and went ahead and
created a physical data model for a specific database. If there are any changes, you
first need to apply them to the logical data model and then to the physical data model;
that is one kind of change that takes time. Another kind of change is the database
itself changing: suppose you now plan to implement this design in a database other than
SQL Server, which means significant effort must go into converting these data types to
something specific to the new database. These are the objects required to implement a
physical data model.
Identifying Relationship
In a data model, there are parent tables and child tables, connected by a relationship
line. Here, VEHICLE is the parent table and VEHICLE OWNER is the child table. If the
referenced column in the child table is part of the child table's primary key, the
relationship is drawn as a solid (thick) line connecting the parent and child tables.
Here, Vehicle_ID is the column referenced in the child table VEHICLE OWNER from the
parent table VEHICLE, and Vehicle_ID is part of the primary key of VEHICLE OWNER. So,
the VEHICLE and VEHICLE OWNER tables are connected by a thick relationship line, which
is called an identifying relationship.
Non-identifying Relationship
Here, the parent table is VEHICLE MANUFACTURER and the child table is VEHICLE. If the
referenced column in the child table is not part of the primary key but a standalone
column, the relationship is drawn as a dotted line. Here, Vehicle_ManufacturerID is the
primary key of the VEHICLE MANUFACTURER table, but in the VEHICLE table it is a foreign
key that is not part of the primary key. So these two tables are connected by a dotted
line, which is called a non-identifying relationship.
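To make the two relationship types concrete, here is a minimal sketch in DDL, executed with SQLite through Python's built-in sqlite3 module. The table and column names follow the VEHICLE examples above, but the exact schemas are illustrative assumptions, not a real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VEHICLE_MANUFACTURER (
    Vehicle_ManufacturerID INTEGER PRIMARY KEY
);

-- Non-identifying relationship: the foreign key is a standalone column,
-- NOT part of the child's primary key (drawn as a dotted line).
CREATE TABLE VEHICLE (
    Vehicle_ID INTEGER PRIMARY KEY,
    Vehicle_ManufacturerID INTEGER
        REFERENCES VEHICLE_MANUFACTURER(Vehicle_ManufacturerID)
);

-- Identifying relationship: the foreign key Vehicle_ID IS part of the
-- child's primary key (drawn as a solid/thick line).
CREATE TABLE VEHICLE_OWNER (
    Vehicle_ID INTEGER REFERENCES VEHICLE(Vehicle_ID),
    Owner_ID   INTEGER,
    PRIMARY KEY (Vehicle_ID, Owner_ID)
);
""")

# PRAGMA table_info: field 5 is the column's 1-based position within the
# primary key, or 0 if the column is not part of the key.
def pk_columns(table):
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")
            if row[5] > 0]

owner_pk   = pk_columns("VEHICLE_OWNER")   # FK is inside the PK (identifying)
vehicle_pk = pk_columns("VEHICLE")         # FK stands alone (non-identifying)
```

Inspecting the primary keys this way shows the structural difference directly: the identifying child's key contains the parent's key, while the non-identifying child's key does not.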
Relationship Cardinality
Cardinality is a mathematical term that refers to the number of elements in a given set.
Database administrators may use cardinality to count tables and values. In a database,
cardinality usually represents the relationship between the data in two different tables by
highlighting how many times a specific entity occurs compared to another. For example, the
database of an auto repair shop may show that a mechanic works with multiple customers
every day. This means that the relationship between the mechanic entity and the customer
entity is one mechanic to many customers.
However, each customer has exactly one vehicle that they bring to the auto repair shop
during their visit. This means the relationship between the customer entity and the car
entity is a one-to-one relationship. Using cardinality can help database administrators
automatically establish these relationships in a software program or database. This can
make it easy for users to see the correlation between mechanics, customers and cars when
searching for specific data or files.
Importance of Cardinality
Cardinality is important because it creates links from one table or entity to another in a
structured manner. This has a significant impact on the query execution plan. A query
execution plan is a sequence of steps users can take to search for and access data stored in
a database system. Having a well-structured query execution plan can make it easier for
users to locate the data they need quickly. Cardinality can be applied to databases for a
variety of reasons, but businesses typically use the cardinality model to analyze information
about their customers or their inventory.
For example, an online retailer may have a database table that lists each one of its unique
customers. They may also have another database table that lists all the purchases
customers have made from their store. Since it's likely that each customer purchased
multiple items from the store, the database administrator may represent this by using a
one-to-many cardinality relationship that links each customer in the first table to all the
purchases they made in the second table.
The 1 to 1 relationship is notated in an ER diagram with a single line connecting the two
entities. In our scenario, the line connects the Student entity to the Student Contact Details
entity. The two perpendicular lines (|) indicate a mandatory relationship between the two
entities. In other words, the student must have contact details, and the contact details must
have a related student.
As with the one-to-many relationship described above, the relationship between two
entities is indicated by a line between them. The connectors on each end describe the
nature of this relationship.
The single vertical line (|) on the Students entity side indicates that the connector only has
one row affected by this relationship. And the crow’s foot on the other side of the line
shows that this relationship influences multiple rows.
The middle table (Class Student) consists of two primary/foreign keys, one of which is the
primary key for the Students table and the other the primary key for the Classes table.
Therefore, there must be a StudentID and a ClassID for each row in the Class Student table.
Because these elements of the Class Student table are also primary keys of the entity on
each side of it, each element has to exist in the Students and Classes tables, respectively.
How to Resolve the Many-to-Many Relationship Problem
Many-to-many (M:N) relationships add complexity and confusion to your model and to the
application development process. The key to resolving an M:N relationship is to separate
the two entities and create two one-to-many (1:N) relationships between them through a
third intersect entity. The intersect entity usually contains attributes from both
connecting entities.
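As a sketch of the intersect-entity idea, the following uses SQLite through Python's sqlite3 module. The Students/Classes/Class_Student names follow the earlier example, but the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Classes  (ClassID   INTEGER PRIMARY KEY, Title TEXT);

-- Intersect (junction) entity: the M:N link between Students and Classes
-- becomes two 1:N relationships through Class_Student.
CREATE TABLE Class_Student (
    StudentID INTEGER REFERENCES Students(StudentID),
    ClassID   INTEGER REFERENCES Classes(ClassID),
    PRIMARY KEY (StudentID, ClassID)
);
""")
conn.executemany("INSERT INTO Students VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO Classes VALUES (?, ?)", [(10, "Math"), (20, "Art")])
conn.executemany("INSERT INTO Class_Student VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 10)])  # both students take Math

# Count students per class, resolved through the intersect table
rows = conn.execute("""
    SELECT c.Title, COUNT(*) AS n_students
    FROM Classes c
    JOIN Class_Student cs ON cs.ClassID = c.ClassID
    GROUP BY c.Title
    ORDER BY c.Title
""").fetchall()
```

A student can appear in many Class_Student rows and a class can appear in many Class_Student rows, yet each individual row links exactly one student to exactly one class, which is what resolves the M:N relationship.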
Importance
Companies of all sizes have huge amounts of disparate data to comb through to answer
critical business questions. Data engineering is designed to support the process, making it
possible for consumers of data, such as analysts, data scientists and executives, to inspect
all the data available reliably, quickly and securely.
Data analysis is challenging because the data is managed by different technologies and
stored in various structures. Yet, the tools used for analysis assume the data is managed by
the same technology and stored in the same structure. This rift can cause headaches for
anybody trying to answer questions about business performance.
For example, consider all the data a brand collects about its customers: it may be spread
across many systems and formats. Data engineers rely on a range of tools to make such
data usable:
• Query Engines: engines run queries against data to return answers. Data engineers
may work with engines like Spark, Flink, and others.
Table
A table is an organized arrangement of data and information in tabular form, containing
rows and columns, making it easier to understand and compare data. Each record holds the
complete data for a specific student, and in this way we can make as many tables as
needed with different combinations of data.
Database
A database is a collection of multiple tables in a single container. We have a type of
database known as relational database. So, when a database contains multiple tables which
are related to each other in a specific manner then that database is known as a relational
database.
A database is stored on a computer and so can be easily modified. It is a large
collection of data or information, specifically a large number of tables, organized so
that it can be easily updated or accessed with a computer system.
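As a minimal illustration of a table inside a database, the following sketch uses Python's built-in sqlite3 module. The student table and its sample rows are invented for the example.

```python
import sqlite3

# An in-memory database: a container that can hold multiple tables
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Ana"), (2, "Ben")])

# Each row (record) holds the data for one student
names = [r[0] for r in conn.execute("SELECT name FROM student ORDER BY roll_no")]
```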
Advantages of OLAP
• OLAP creates a single platform for all types of business analytical needs.
• The main benefit of OLAP is the consistency of information and calculations.
• Easily apply security restrictions on users and objects to comply with regulations
and protect sensitive data.
Disadvantages of OLAP
• Implementation and maintenance are dependent on IT professionals because the
traditional OLAP tools require a complicated modeling procedure.
• OLAP needs cooperation between people in various departments to be effective,
which might not always be possible.
OLTP
OLTP, or online transaction processing, supports transaction-oriented applications in a
three-tier architecture. OLTP administers the day-to-day transactions of an organization.
One major point to consider here is that the primary objective of OLTP systems is data
processing, not data analysis.
An example of an OLTP system is an ATM network. Assume that a couple has a joint account
with a bank. One day both reach different ATMs at precisely the same time and want to
withdraw the total amount present in their bank account. However, only the person who
completes the authentication process first will be able to get the money. In this case,
the OLTP system makes sure that the withdrawn amount is never more than the amount
present in the bank account.
The key to note here is that OLTP systems are optimized for transactional superiority instead
of data analysis.
Advantages of OLTP
• OLTP method administers daily transactions for an organization
• OLTP widens the customer base of an organization by simplifying individual
processes.
Disadvantages of OLTP
• If OLTP system faces hardware failures, then online transactions get severely
affected.
• OLTP systems allow multiple users to access and change the same data at the
same time, which can sometimes lead to conflicting updates.
SQL Joins
The information you want to retrieve is often stored in multiple tables. In such
scenarios you need to join these tables to view the data in a meaningful way. This is
where the SQL join comes into the picture. The SQL join is a widely used clause, used
essentially to combine and retrieve data from two or more tables based on related
columns, or common fields, between them.
Now consider two tables. Table 1 has three columns, A, B, and C, and three records,
which for reference we will call one, two, and three. Similarly, Table 2 also has three
columns, B, C, and D, and three records: three, four, and five. Here, a different color
combination is used to represent the values present in the various columns. Now, instead
of querying each table every time to retrieve data, we simply join the two tables, and
the result is Table 3.
Also, make sure that when you are joining two tables they have a common column. Here C
is the common field, which forms the basis for joining the two tables.
Inner Join
SQL inner join joins two tables based on a common column and selects the records that
have matching values in these columns.
Now when the condition is applied for these columns the query checks all the rows of table
1 and table 2. Only the rows that satisfy the join predicate are included in the resultant
table.
Syntax
SELECT
Table1.column1, Table1.column2, Table2.column1, Table2.column2 and so on
From Table1
INNER JOIN Table2
ON Table1.column = Table2.column
The inner join syntax compares the rows of Table1 with Table2 to check whether anything
matches based on the condition provided in the ON clause; when the condition is met, it
returns the matched rows from both tables with the columns listed in the SELECT clause.
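The behavior can be sketched with SQLite through Python's sqlite3 module. The two small tables below (sharing the common column C) are invented for illustration; only the rows whose C value exists in both tables survive the inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (A INTEGER, C INTEGER);
    CREATE TABLE Table2 (C INTEGER, D INTEGER);
    INSERT INTO Table1 VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO Table2 VALUES (200, 20), (300, 30), (400, 40);
""")
# Only rows whose C value appears in BOTH tables are returned
rows = conn.execute("""
    SELECT Table1.A, Table1.C, Table2.D
    FROM Table1
    INNER JOIN Table2 ON Table1.C = Table2.C
    ORDER BY Table1.A
""").fetchall()
```

Row (1, 100) from Table1 and row (400, 40) from Table2 have no match in the other table, so neither appears in the result.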
Outer Join
A SQL outer join, also called a full join or full outer join, is used to get all the
rows present in both tables. That means it returns all the records present in either the
left table (Table 1) or the right table (Table 2), even if there are no matching records
in the other table.
Here you must mention the same or a similar column name after the ON clause.
Left Join
Left join or left outer join results in a table containing all the rows from the table on the left
side of the join, that is the first table and only the rows that satisfy the join condition, from
the table on the right side of the join, that is the second table. Any missing values for the
rows from the right table in the result of the join tables are represented by null values.
Syntax
SELECT column_lists
FROM Table1
LEFT JOIN Table2
ON Table1.column = Table2.column
So, in this way you can use the left join to display the records.
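A sketch of the left join, again using SQLite via Python's sqlite3 module with invented tables sharing the common column C. Note how the unmatched row from the left table is kept, with NULL (Python None) filling in the missing right-table value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (A INTEGER, C INTEGER);
    CREATE TABLE Table2 (C INTEGER, D INTEGER);
    INSERT INTO Table1 VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO Table2 VALUES (200, 20), (300, 30), (400, 40);
""")
# All rows from the left table; NULL where the right table has no match
rows = conn.execute("""
    SELECT Table1.A, Table1.C, Table2.D
    FROM Table1
    LEFT JOIN Table2 ON Table1.C = Table2.C
    ORDER BY Table1.A
""").fetchall()
```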
Right Join
A right join, or right outer join, is the opposite of the left outer join. It follows
the same rules as the left join; the only difference is that all the rows from the right
table, and only the rows satisfying the join condition from the left table, are present
in the resultant table. That means it returns all the rows from the right table plus all
the matching records present in the left table.
Syntax
SELECT column_lists
FROM Table1
RIGHT JOIN Table2
ON Table1.column = Table2.column
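A sketch of the right join, using the same invented tables as before. One caveat: SQLite only added a native RIGHT JOIN in version 3.39, so to stay portable this example emulates it by swapping the operands of a LEFT JOIN, which produces the same result. The unmatched row from the right table is kept, with NULL (Python None) filling in the missing left-table value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (A INTEGER, C INTEGER);
    CREATE TABLE Table2 (C INTEGER, D INTEGER);
    INSERT INTO Table1 VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO Table2 VALUES (200, 20), (300, 30), (400, 40);
""")
# "Table1 RIGHT JOIN Table2" is equivalent to "Table2 LEFT JOIN Table1":
# all rows from the right table, NULL where the left table has no match
rows = conn.execute("""
    SELECT Table1.A, Table2.C, Table2.D
    FROM Table2
    LEFT JOIN Table1 ON Table1.C = Table2.C
    ORDER BY Table2.C
""").fetchall()
```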
Aggregate Function
• Aggregate function is a function where values of multiple rows are grouped
together as input on certain criteria to form a single value of more significant
meaning.
• It returns a single value.
• Aggregate functions are also used to summarize the data.
Some Important Aggregate Functions
COUNT
We use the COUNT function to count the total number of rows of a particular column of a
table. COUNT works on both numeric and non-numeric data types: for example, you can
count a salary column, which is numeric, and you can also count a name column, which is
non-numeric.
SYNTAX:
SELECT COUNT(Column_Name)
FROM Table_Name;
SUM
SUM is used to calculate the sum of the non-null values in the selected column. We
cannot sum non-numeric values; we can sum only numeric values. The query is the same as
for the COUNT function; the only difference is that we use SUM in place of COUNT.
SYNTAX:
SELECT SUM(Column_Name)
FROM Table_Name;
AVG
We can use this function to calculate the average of a particular column of numeric
type. Like the SUM function, AVG considers only non-null values. The AVG function is
essentially the SUM function divided by the COUNT function.
SYNTAX:
SELECT AVG(Column_Name)
FROM Table_Name;
MIN
• The MIN function is used to find the minimum value of a certain column.
• It determines the smallest of all the selected values of a column.
• It works on both numeric and non-numeric data types.
SYNTAX:
SELECT MIN(Column_Name)
FROM Table_Name;
MAX
• The MAX function is used to find the maximum value of a certain column.
• It determines the largest of all the selected values of a column.
• It works on both numeric and non-numeric data types.
SYNTAX:
SELECT MAX(Column_Name)
FROM Table_Name;
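The five aggregate functions above can be exercised together in one query. The sketch below runs them with SQLite via Python's sqlite3 module; the employee table and salary values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [("Ana", 3000.0), ("Ben", 5000.0), ("Cal", 4000.0)])
# Each aggregate collapses the three input rows into a single value
n, total, average, lowest, highest = conn.execute("""
    SELECT COUNT(salary), SUM(salary), AVG(salary),
           MIN(salary), MAX(salary)
    FROM employee
""").fetchone()
```

Note that AVG equals SUM divided by COUNT, as stated above: 12000 / 3 = 4000.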
As the diagram shows, when you apply an analytical function to a group of rows in a
table, it returns a single result for each row. On the other hand, when you apply an
aggregate function to a group of rows, it returns a single row for each group. This is
the key difference between analytical functions and aggregate functions. Some analytical
functions are described below.
RANK
The RANK function in SQL Server is a kind of ranking function. It assigns a number to
each row within the partition of a result set: each row's rank is one plus the previous
row's rank. When the RANK function finds two values that are identical within the same
partition, it assigns them the same rank number, and the next number in the ranking is
the previous rank plus the number of duplicates. Therefore, this function does not
always assign ranks in consecutive order.
Here we have a Demo table with a Name column. Let us use the RANK function to assign
ranks to the rows in the Demo table. The query for the desired output is:
SYNTAX:
SELECT Name,
RANK () OVER (ORDER BY Name) AS Name_rank
FROM Demo;
In the output you can see that identical names receive the same rank, and the next
distinct name receives a rank equal to its row-number position, leaving a gap after the
ties.
DENSE RANK
The DENSE_RANK function assigns a rank to each row within a partition according to the
specified column value, without any gaps: it always assigns ranks in consecutive order.
If there is a duplicate value, this function assigns it the same rank, and the next rank
is the next sequential number. This characteristic distinguishes the DENSE_RANK()
function from the RANK() function.
Consider the following employee table. For example, we have to calculate the row number,
rank, and dense rank of employees in the employee table according to salary within each
department. The query for this calculation is:
SYNTAX:
SELECT
ROW_NUMBER () OVER (PARTITION BY Dept ORDER BY Salary DESC) AS emp_row_no,
Name, Dept, Salary,
RANK () OVER (PARTITION BY Dept ORDER BY Salary DESC) AS emp_rank,
DENSE_RANK () OVER (PARTITION BY Dept ORDER BY Salary DESC) AS emp_dense_rank
FROM employee
The output table is the result of the above query. We can see that the row numbers are
consecutive integers within each partition. We can also see the difference between rank
and dense rank: with dense rank there is no gap between rank values, while with rank
there is a gap after a repeated rank.
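The query above can be run end to end with SQLite (window functions require SQLite 3.25+, which ships with modern Python) via the sqlite3 module. The employee data is invented, with Ben and Cal deliberately tied on salary; Name is added as a tiebreaker in the ROW_NUMBER ordering only, so the output is deterministic without affecting the tie behavior of RANK and DENSE_RANK.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (Name TEXT, Dept TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("Ana", "IT", 900), ("Ben", "IT", 800),
    ("Cal", "IT", 800), ("Dee", "IT", 700),  # Ben and Cal tie on salary
])
rows = conn.execute("""
    SELECT Name,
           ROW_NUMBER() OVER (PARTITION BY Dept ORDER BY Salary DESC, Name)
               AS emp_row_no,
           RANK()       OVER (PARTITION BY Dept ORDER BY Salary DESC)
               AS emp_rank,
           DENSE_RANK() OVER (PARTITION BY Dept ORDER BY Salary DESC)
               AS emp_dense_rank
    FROM employee
    ORDER BY Salary DESC, Name
""").fetchall()
# Ben and Cal share rank 2; RANK then jumps to 4, DENSE_RANK continues at 3
```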
Row Number
The ROW_NUMBER function returns a unique sequential number for each row within its
partition. The numbering begins at one and increases by one until the partition's total
number of rows is reached. It returns different numbers for rows with identical values,
which distinguishes it from the RANK () function.
Consider this employee table. To display the employees with the top five highest
salaries, the query is as follows:
SYNTAX:
SELECT Emp_No, Name, Salary
FROM (SELECT Emp_No, Name, Salary,
ROW_NUMBER () OVER (ORDER BY Salary DESC) AS Row_number
FROM employee)
WHERE Row_number <= 5;
In the output table you can see that row numbers are assigned in sequence even for rows
with the same salary value.
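The same top-N pattern can be sketched with SQLite through Python's sqlite3 module. The employee data is invented (only four rows, so the filter keeps the top two instead of five), and Name is added as a tiebreaker in the ordering so the result is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (Emp_No INTEGER, Name TEXT, Salary INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    (1, "Ana", 900), (2, "Ben", 800), (3, "Cal", 800), (4, "Dee", 700),
])
# Number the rows by descending salary in a subquery, then keep the top N
top2 = conn.execute("""
    SELECT Emp_No, Name, Salary
    FROM (SELECT Emp_No, Name, Salary,
                 ROW_NUMBER() OVER (ORDER BY Salary DESC, Name) AS rn
          FROM employee)
    WHERE rn <= 2
    ORDER BY Salary DESC, Name
""").fetchall()
```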
LAG
The LAG function returns the previous row's data alongside the current row. If no
previous row exists, it displays NULL with the current row. The LAG () function allows
access to a value stored in a row above the current row. The row above may be adjacent,
or some number of rows above, as sorted by a specified column or set of columns.
Let’s consider the sale table for example and following query with a LAG () function:
SYNTAX:
SELECT Seller_name, Sale_value,
LAG(Sale_value) OVER (ORDER BY Sale_value) as previous_sale_value
FROM sale;
The result of this query is the output table. This simplest use of LAG () displays the value
from the adjacent row above. For example, the second record displays Alice’s sale amount
($12,000) with Stef’s ($7,000) from the row above, in
columns Sale_value and previous_sale_value, respectively. Notice that the first row does
not have an adjacent row above, and consequently the previous_sale_value field is empty
(NULL) since the row from which the value of Sale_value should be obtained does not exist.
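The sale-table example above can be reproduced with SQLite via Python's sqlite3 module (window functions need SQLite 3.25+); the seller names and amounts follow the text, and the rows are inserted in arbitrary order since ORDER BY Sale_value determines which row is "above".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (Seller_name TEXT, Sale_value INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?)",
                 [("Stef", 7000), ("Alice", 12000), ("Bob", 15000)])
rows = conn.execute("""
    SELECT Seller_name, Sale_value,
           LAG(Sale_value) OVER (ORDER BY Sale_value) AS previous_sale_value
    FROM sale
    ORDER BY Sale_value
""").fetchall()
# The first row has no row above it, so previous_sale_value is NULL (None)
```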
LEAD
This function displays the next row's data alongside the current row. If no next row is
available, the LEAD () function displays NULL with the current row by default.
LEAD () is similar to LAG (): whereas LAG () accesses a value stored in a row above,
LEAD () accesses a value stored in a row below.
SYNTAX:
SELECT Seller_name, Sale_value,
LEAD(Sale_value) OVER (ORDER BY Sale_value) as next_sale_value
FROM sale;
The rows are sorted by the column specified in ORDER BY (Sale_value). The LEAD () function
grabs the sale amount from the row below. For example, Stef’s own sale amount is $7,000
in the column Sale_value, and the column next_sale_value in the same record contains
$12,000. The latter comes from the Sale_value column for Alice, the seller in the next row.
Note that the last row does not have a next row, so the next_sale_value field is empty
(NULL) for the last row.
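The LEAD () behavior described above can be verified the same way with SQLite via Python's sqlite3 module (SQLite 3.25+), using the same invented sale data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (Seller_name TEXT, Sale_value INTEGER)")
conn.executemany("INSERT INTO sale VALUES (?, ?)",
                 [("Stef", 7000), ("Alice", 12000), ("Bob", 15000)])
rows = conn.execute("""
    SELECT Seller_name, Sale_value,
           LEAD(Sale_value) OVER (ORDER BY Sale_value) AS next_sale_value
    FROM sale
    ORDER BY Sale_value
""").fetchall()
# The last row has no row below it, so next_sale_value is NULL (None)
```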