DBMS Merged
• Developed by Michael Widenius in 1995. It is named after his daughter, My.
• Sun Microsystems acquired MySQL in 2008.
• Oracle acquired Sun Microsystems in 2010.
• MySQL is a free and open-source database under the GPL. However, some enterprise
modules are closed-source and available only in the commercial edition of MySQL.
• MariaDB is a completely open-source fork of MySQL.
• MySQL supports multiple database storage and processing engines.
• MySQL versions:
• < 5.5: MyISAM is the default storage engine
• 5.5: InnoDB becomes the default storage engine
• 5.6: SQL query optimizer improved, memcached-style NoSQL access
• 5.7: JSON data type added for flexible schema
• 8.0: CTEs, window functions, NoSQL document store.
• MySQL was named DBMS of the Year 2019 in the DB-Engines ranking.
Agenda
DBMS VS RDBMS
SQL
SQL Categories
MySQL
Getting Started with MySQL
Database- Logical & Physical Layout
Data Types
Char vs Varchar vs TEXT
SQL Scripts
Documentation Link
https://dev.mysql.com/doc/refman/8.0/en/introduction.html
DBMS
Excel
Getting Started
- cmd> mysql -u root -p
- -u : username
- -p : password
- -h : hostname
Day01.MD 3/28/2022
Program Files
Program data
Steps to Follow
SHOW TABLES;
SQL Categories
1. DDL -> Data Definition Language
CREATE, ALTER, DROP, TRUNCATE, RENAME
2. DQL -> Data Query Language
SELECT
3. DML -> Data Manipulation Language
INSERT, UPDATE, DELETE
4. DCL -> Data Control Language
CREATE USER, GRANT, REVOKE
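As a rough illustration of the four categories, the sketch below uses the `stud` table and the `classwork` database from these notes. The column names (`rollno`, `name`, `marks`) and the user `student@localhost` with its password are assumptions made only for this example.

```sql
-- DDL: define the structure (column names are assumed for illustration)
CREATE TABLE stud (
    rollno INT,
    name   VARCHAR(20),
    marks  INT
);

-- DML: change the data
INSERT INTO stud VALUES (1, 'Pradnya', 80);
UPDATE stud SET marks = 85 WHERE rollno = 1;

-- DQL: read the data
SELECT * FROM stud;

-- DCL: manage users and privileges (hypothetical user and password)
CREATE USER 'student'@'localhost' IDENTIFIED BY 'change_me';
GRANT SELECT ON classwork.stud TO 'student'@'localhost';
REVOKE SELECT ON classwork.stud FROM 'student'@'localhost';
```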
-- To add data to the table, use the DML queries below
INSERT INTO stud VALUES(1,'Pradnya',80);
INSERT INTO stud VALUES(2,'Girish',70);
-- To view all the data in the table, use the DQL query below
SELECT * FROM stud; -- * means all columns
Datatypes
Numeric Datatypes
tinyint (1 byte)
smallint (2 bytes)
mediumint (3 bytes)
int (4 bytes)
bigint (8 bytes)
float (4 bytes)
double (8 bytes)
decimal(m,n)
m -> total number of digits
n -> number of digits after the decimal point, e.g. DECIMAL(4,2) -> 12.34
String Datatypes
char(n)
varchar(n)
tinytext (255 bytes)
text (64 KB)
mediumtext (16 MB)
longtext (4 GB)
Binary Datatypes
TinyBlob, Blob, MediumBlob, LongBlob
Date Datatypes
Misc Datatypes
ENUM
SET
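The data types listed above can be tried out in one scratch table. This is only a sketch: the table `datatype_demo` and all of its column names are hypothetical and chosen for illustration.

```sql
CREATE TABLE datatype_demo (
    id      INT,
    code    CHAR(5),         -- fixed width: always stored as 5 characters
    name    VARCHAR(50),     -- variable width: stores only what is needed, up to 50
    price   DECIMAL(6,2),    -- up to 6 digits in total, 2 after the decimal point
    notes   TEXT,            -- up to 64 KB of text
    grade   ENUM('A','B','C'),            -- exactly one value from the list
    hobbies SET('music','sports','art')   -- zero or more values from the list
);

INSERT INTO datatype_demo
VALUES (1, 'AB', 'Pradnya', 1234.56, 'first row', 'A', 'music,art');

SELECT * FROM datatype_demo;
```

CHAR suits short values of a known, fixed length, while VARCHAR is usually the better choice when lengths vary a lot.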
Sql Scripts
MySQL - RDBMS
Trainer: Mr. Rohan Paramane
• It is always a good idea to fetch only the required rows (to reduce network traffic).
• The WHERE clause specifies the condition that decides which records are fetched.
• Relational operators
• <, >, <=, >=, =, != or <>
• NULL related operators
• NULL is a special value and cannot be compared using relational operators.
• Use IS NULL (or the NULL-safe <=> operator) and IS NOT NULL instead.
• Logical operators
• AND, OR, NOT
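A short sketch of how NULL behaves in a WHERE clause, assuming the `emp` table used throughout these notes (columns `ename` and `sal`) contains some rows whose `sal` was never set:

```sql
-- Rows inserted with only ename have sal = NULL.

-- Returns no rows: "sal = NULL" evaluates to NULL, never to true.
SELECT ename, sal FROM emp WHERE sal = NULL;

-- Correct ways to find rows with a missing salary.
SELECT ename, sal FROM emp WHERE sal IS NULL;
SELECT ename, sal FROM emp WHERE sal <=> NULL;   -- NULL-safe equality (MySQL)

-- Combining NULL checks with relational and logical operators.
SELECT ename, sal FROM emp
WHERE sal IS NOT NULL AND (sal < 1000 OR sal > 2000);
```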
Agenda - Day 02
- DQL
- Computed Columns
- Distinct
- LIMIT
- Order By
- WHERE clause
- Relational Operators
- IN,BETWEEN Operator
- LIKE
- DML-UPDATE
DQL
SELECT DATABASE();
USE classwork;
Distinct
Limit
OrderBy
Day02_Help.MD 3/29/2022
Where Clause
-- Practice Examples
INSERT INTO emp(ename) VALUES('B'),('J'),('K');
-- display all the emps whose salary is not in the range 1000 to 2000
SELECT * FROM emp WHERE NOT sal BETWEEN 1000 AND 2000;
LIKE OPERATOR
-- display enames having a 4-letter name with 'R' as the 3rd letter
SELECT ename FROM emp WHERE ename LIKE '__R_';
DML- UPDATE
• DELETE
• To delete one or more rows in a table.
• Delete row(s)
• DELETE FROM table WHERE c1=value;
• Delete all rows
• DELETE FROM table;
• TRUNCATE
• Delete all rows.
• TRUNCATE TABLE table;
• Truncate is faster than DELETE.
• DROP
• Delete all rows as well as table structure.
• DROP TABLE table;
• DROP TABLE IF EXISTS table;
• Delete database/schema.
• DROP DATABASE db;
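The difference between the three commands can be seen on a scratch copy of a table. This is a minimal sketch; `emp_copy` is a hypothetical table name, and `emp` with a `deptno` column is assumed from the rest of these notes.

```sql
-- Work on a scratch copy so the original emp table is untouched.
CREATE TABLE emp_copy AS SELECT * FROM emp;

DELETE FROM emp_copy WHERE deptno = 10;  -- removes only the matching rows
DELETE FROM emp_copy;                    -- removes all remaining rows
TRUNCATE TABLE emp_copy;                 -- removes all rows; the structure stays
DESC emp_copy;                           -- the table still exists, but is empty

DROP TABLE IF EXISTS emp_copy;           -- removes rows and structure
DROP TABLE IF EXISTS emp_copy;           -- no error this time, only a warning
```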
• ABS()
• POWER()
• ROUND(), FLOOR(), CEIL()
• ASCII(), CHAR()
• CONCAT()
• SUBSTRING()
• LOWER(), UPPER()
• TRIM(), LTRIM(), RTRIM()
• LPAD(), RPAD()
• REGEXP_LIKE()
• VERSION()
• USER(), DATABASE()
• NULL is a special value in RDBMS that represents the absence of a value in that column.
• NULL values do not work with relational operators; special operators are needed.
• Most functions return NULL if NULL is passed as one of their arguments.
• ISNULL()
• IFNULL()
• NULLIF()
• COALESCE()
• GREATEST(), LEAST()
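The NULL-handling functions listed above can be tried directly with constant values. The results in the comments are what MySQL returns for these literal inputs.

```sql
SELECT ISNULL(NULL), ISNULL(10);           -- 1, 0
SELECT IFNULL(NULL, 0), IFNULL(5, 0);      -- 0, 5
SELECT NULLIF(10, 10), NULLIF(10, 20);     -- NULL, 10
SELECT COALESCE(NULL, NULL, 'x', 'y');     -- x (first non-NULL argument)
SELECT GREATEST(3, 9, 4), LEAST(3, 9, 4);  -- 9, 3
SELECT GREATEST(3, NULL, 4);               -- NULL
SELECT CONCAT('abc', NULL);                -- NULL
```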
Agenda
DML-DELETE
DDL- DROP,TRUNCATE
DUAL
SQL FUNCTIONS
String Functions
Numeric Functions
Date and Time Functions
Flow Control Functions
Group Functions
DELETE-
DDL- DROP,TRUNCATE
DELETE,DROP,TRUNCATE
- DELETE
- It is a DML query
- It deletes data based on the WHERE clause
- Deleted data can be rolled back.
- TRUNCATE
- It is a DDL query
DAY03_Help.MD 3/30/2022
DUAL
HELP
It will help to get the documentation part for any SQL category/Functions etc.
FUNCTIONS
HELP FUNCTIONS
HELP String Functions
String Functions
SELECT UPPER('sunbeam');
SELECT LOWER('SUNBEAM');
SELECT LOWER(ename) AS ename,sal FROM emp;
SELECT SUBSTRING('sunbeam',2);
SELECT SUBSTRING('sunbeam',-2);
-- A negative position is counted from the right
SELECT SUBSTRING('sunbeam',-5,3);
SELECT LENGTH('sunbeam');
SELECT TRIM('Sunbeam ');
-- Homework
-- Display a 16-digit credit card number as 1234 XXXX XXXX 5678
Numeric Functions
SELECT ROUND(1.25); -- 1
SELECT ROUND(10.48); -- 10
SELECT ROUND(10.52); -- 11
SELECT ROUND(10.1256,2); -- 10.13
SELECT ROUND(178.1256,-2); -- 200
-- If you want the difference between two dates in terms of days
SELECT DATEDIFF(NOW(),'2020-03-22');
SELECT DAY(NOW()),MONTH(NOW()),YEAR(NOW());
SELECT HOUR(NOW()),MINUTE(NOW()),SECOND(NOW());
-- display ename, sal, category
-- Category sal>=2500 RICH
-- Category sal<2500 POOR
SELECT ename,sal,CASE
WHEN sal>=2500 THEN 'RICH'
ELSE 'POOR'
END AS Category
FROM emp;
List Functions
Group Functions
• Join statements are used to SELECT data from multiple tables using single query.
• Typical RDBMS supports following types of joins:
• Cross Join
• Inner Join
• Left Outer Join
• Right Outer Join
• Full Outer Join
• Self join
• The inner JOIN is used to return rows from both tables that satisfy the join condition.
• Non-matching rows from both tables are skipped.
• If join condition contains equality check, it is referred as equi-join; otherwise it is non-equi-join.
• Left outer join is used to return matching rows from both tables along with additional rows in left
table.
• Corresponding to additional rows in left table, right table values are taken as NULL.
• OUTER keyword is optional.
Sunbeam Infotech www.sunbeaminfo.com
Right Outer Join
• Right outer join is used to return matching rows from both tables along with additional rows in right table.
• Corresponding to additional rows in right table, left table values are taken as NULL.
• OUTER keyword is optional.
• Full join is used to return matching rows from both tables along with additional rows in both tables.
• Corresponding to additional rows in left or right table, opposite table values are taken as NULL.
• Full outer join is not supported in MySQL, but can be simulated using set operators.
• UNION operator is used to combine results of two queries. The common data is taken only
once. It can be used to simulate full outer join.
• UNION ALL operator is used to combine results of two queries. Common data is repeated.
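A full outer join can be approximated in MySQL by combining a left and a right outer join with UNION. The sketch below assumes the `emp` and `dept` tables used in these notes, joined on `deptno`.

```sql
-- Matching rows + extra rows from emp (left side)
SELECT e.ename, d.dname
FROM emp e LEFT JOIN dept d ON e.deptno = d.deptno
UNION
-- Matching rows + extra rows from dept (right side)
SELECT e.ename, d.dname
FROM emp e RIGHT JOIN dept d ON e.deptno = d.deptno;
```

Because UNION removes duplicates, the matching rows returned by both halves appear only once. The trade-off is that genuinely identical rows in the result are also collapsed into one.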
Self Join
• When a table is joined with itself, it is known as a "Self Join". Both columns in the
join condition belong to the same table.
• Self join may be an inner join or outer join.
empno ename deptno mgr empno ename deptno mgr
1 Amit 10 4 1 Amit 10 4
2 Rahul 10 3 2 Rahul 10 3
3 Nilesh 20 4 3 Nilesh 20 4
4 Nitin 50 5 4 Nitin 50 5
5 Sarang 50 NULL 5 Sarang 50 NULL
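Using the sample data above, a self join can pair each employee with the manager whose `empno` matches the employee's `mgr`. This sketch assumes the table is named `emp`; the aliases `e` (employee) and `m` (manager) are chosen for illustration.

```sql
-- Inner self join: employees that have a manager
SELECT e.ename AS employee, m.ename AS manager
FROM emp e
INNER JOIN emp m ON e.mgr = m.empno;

-- Outer self join: also lists employees without a manager
SELECT e.ename AS employee, m.ename AS manager
FROM emp e
LEFT JOIN emp m ON e.mgr = m.empno;
```

With the five rows shown, the inner join returns four pairs (Amit/Nitin, Rahul/Nilesh, Nilesh/Nitin, Nitin/Sarang). The left join additionally returns Sarang with a NULL manager.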
Agenda
Assignment Submission
Group By
Having Clause
Joins
Assignment Submission
- To start the submission, use a new Word document for noting down your solution.
- Write down the question and the solution for it.
- Take a screenshot of the result that you got in mysql and paste it below
your solution in the Word document.
- After completing, commit the same to your GitLab account.
- NOTE :- After your assignment is completed on any specific database, logically it
should be committed.
- A meaningful message should be given while committing
- git commit -m "assignment 5 on Sales database is completed"
- git commit -m "assignment 5 q3 deleted"
DAY04_HELP.MD 3/31/2022
Group By
Having Clause
-- display max sal for each job only if max sal is more than 2500, for emps in deptno 10 and 20.
SELECT job,MAX(sal) FROM emp WHERE deptno IN(10,20) GROUP BY job;
SELECT job,MAX(sal) FROM emp WHERE deptno IN(10,20) GROUP BY job HAVING
MAX(sal)>2500;
JOINS
1. CROSS JOIN
2. INNER JOIN
FULL OUTER JOIN = Intersection + EXTRA ROWS FROM LEFT TABLE + EXTRA ROWS FROM RIGHT
TABLE
FULL OUTER JOIN does not exist in MySQL but can be simulated using set operators
5. SELF JOIN
Products
pid  Name  cid  Price
1    p1    1    5000
2    p2    2    2500
3    p3    1
4    p4    2
5    p5    1
6    p6    2
Category
cid  Category
1    Home appliance
2    Watch
Employee
eno  ename  sal   deptno
1    e1     1000  10
2    e2     2000  10
3    e3     3000  20
Dept
deptno  dname
10      dept1
20      dept2
30      dept3
meeting
meetno  meetname  Venue
100     m1        v1
200     m2        v2
300     m3        v3
emp_meeting
empno  meetno
1      100
2      100
3      200
4      200
5      200
1      300
2      300
3      300
4      300
5      300
Cross join result
Amit DEV
Amit QA
Amit OPS
Amit ACC
Rahul DEV
Rahul QA
Rahul OPS
Rahul ACC
Nilesh DEV
Nilesh QA
Nilesh OPS
Nilesh ACC
Sarang
• Transaction management
• START TRANSACTION;
• …
• COMMIT WORK;
• START TRANSACTION;
• …
• ROLLBACK WORK;
• In MySQL autocommit variable is by default 1. So each DML command is auto-
committed into database.
• SELECT @@autocommit;
• Changing autocommit to 0, will create new transaction immediately after current
transaction is completed. This setting can be made permanent in config file.
• SET autocommit=0;
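The transaction commands above can be sketched as a money transfer. The `accounts` table with an `id` column appears later in these notes; the `balance` column and the savepoint name `s1` are assumptions made for this example, and the table is assumed to use a transactional engine such as InnoDB.

```sql
SELECT @@autocommit;          -- 1 by default

START TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
SAVEPOINT s1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
ROLLBACK TO s1;               -- undoes only the second UPDATE
ROLLBACK;                     -- undoes everything since START TRANSACTION

START TRANSACTION;
UPDATE accounts SET balance = balance - 500 WHERE id = 1;
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT;                       -- both changes become permanent together
```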
Agenda
Joins Practice
Non Standard Joins
Security
- DCL
Transactions
- Locking
-- Display job ID of jobs that were done by more than 1 employee for more than 1500 days.
SELECT job_id FROM job_history GROUP BY job_id HAVING COUNT(employee_id)>1 AND
SUM(DATEDIFF(end_date,start_date))>1500;
Joins Practice
DAY05_Help.MD 4/1/2022
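-- display emp name and their meeting count in descending order of meeting count
SELECT e.ename, COUNT(em.meetno) AS meet_count FROM emp e
INNER JOIN emp_meeting em ON e.empno = em.empno
GROUP BY e.empno, e.ename ORDER BY meet_count DESC;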
Security
mgr>SHOW GRANTS;
teamlead>SHOW GRANTS;
teamlead>SHOW GRANTS;
teamlead>USE classwork;
teamlead>SHOW TABLES;
teamlead>SELECT * FROM emp;
dev1>SHOW GRANTS;
dev1>USE classwork;
dev1>SHOW TABLES;
dev1>DELETE FROM dept WHERE deptno=40; -- cannot do.
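The session above relies on users that were created and granted privileges beforehand. A possible setup is sketched below; the user names come from the notes, but the host, the passwords and the exact privileges given to each user are assumptions for illustration.

```sql
-- Run as an administrative user such as root.
CREATE USER 'mgr'@'localhost' IDENTIFIED BY 'mgr_pwd';
CREATE USER 'teamlead'@'localhost' IDENTIFIED BY 'lead_pwd';
CREATE USER 'dev1'@'localhost' IDENTIFIED BY 'dev_pwd';

-- mgr: full access and may pass privileges on to others
GRANT ALL PRIVILEGES ON classwork.* TO 'mgr'@'localhost' WITH GRANT OPTION;
-- teamlead: can read and modify data
GRANT SELECT, INSERT, UPDATE, DELETE ON classwork.* TO 'teamlead'@'localhost';
-- dev1: read-only, so DELETE is rejected
GRANT SELECT ON classwork.* TO 'dev1'@'localhost';

SHOW GRANTS FOR 'dev1'@'localhost';

-- Take a privilege back
REVOKE DELETE ON classwork.* FROM 'teamlead'@'localhost';
```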
Transactions
** TCL COMMANDS
START TRANSACTION
COMMIT
ROLLBACK
SAVEPOINT
START TRANSACTION;
ROLLBACK;
COMMIT;
START TRANSACTION;
ROLLBACK TO s4;
ROLLBACK TO s2;
ROLLBACK;
Savepoints
Transaction properties/characteristics
When a user is in a transaction, the changes made by that user are saved in a temp
table. These changes are visible to that user.
However, this temp table is not accessible/visible to other users, and hence changes
in progress in a transaction are not visible to other users.
When a user is in a transaction, changes committed by other users are not visible
to that user, because the user is working with temp data.
ROW LOCKING
- Pessimistic Locking
SELECT * FROM accounts WHERE id = 2 FOR UPDATE;
- TABLE LOCKING
When rows are deleted or updated, the whole table gets locked if a primary key does not exist
-- TABLE LOCKING
--ROW LOCKING
• Multi-row sub-query
• Sub-query returns multiple rows.
• Usually it is compared in outer query using operators like IN, ANY or ALL.
• IN operator compares for equality with results from sub-queries (at least one result should match).
• ANY operator compares with the results from sub-queries (at least one result should match).
• ALL operator compares with the results from sub-queries (all results should match).
• Correlated sub-query
• If number of results from sub-query are reduced, query performance will increase.
• This can be done by adding criteria (WHERE clause) in sub-query based on outer query row.
• Typically correlated sub-query use IN, ALL, ANY and EXISTS operators.
• Sub-queries with UPDATE and DELETE are not supported in all RDBMS.
• In MySQL, sub-queries in UPDATE/DELETE are allowed, but the sub-query should not
SELECT from the same table on which the UPDATE/DELETE operation is in progress.
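The same-table restriction and a commonly used workaround can be sketched as follows, assuming the `emp` table with a `sal` column from these notes. The alias `t` and the column name `max_sal` are hypothetical.

```sql
-- Rejected by MySQL (error 1093): the sub-query reads the table being changed.
-- DELETE FROM emp WHERE sal = (SELECT MAX(sal) FROM emp);

-- Usually accepted: the inner result is first materialised as a derived table,
-- so the DELETE no longer reads emp directly.
DELETE FROM emp
WHERE sal = (SELECT max_sal
             FROM (SELECT MAX(sal) AS max_sal FROM emp) AS t);
```

Try this on a backup copy of the table first, since the DELETE really removes the highest-paid row(s).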
• Applications of views
• Security: Providing limited access to the data.
• Hide source code of the table.
• Simplifies complex queries.
Agenda
SubQuery
Views
SubQuery
SELECT * FROM emp WHERE sal = (SELECT DISTINCT sal FROM emp ORDER BY sal DESC
LIMIT 1,1);
SELECT * FROM emp where deptno=(SELECT deptno FROM emp WHERE ename='KING');
Day06_Help.MD 4/4/2022
SELECT * FROM emp WHERE sal > ALL(SELECT sal FROM emp WHERE job="Salesman");
-- (sal>1600 AND sal>1250 AND sal>1500)
SELECT * FROM emp WHERE sal >(SELECT MAX(sal) FROM emp WHERE job="Salesman");
-- display all emps having sal less than 'any' emp in dept = 20;
SELECT sal FROM emp WHERE deptno=20;
SELECT * FROM emp WHERE sal < ANY(SELECT sal FROM emp WHERE deptno=20);
-- (sal<800 OR sal<2975 OR sal<3000 OR sal<1100)
A B - AND OR
0 0 - 0 0
0 1 - 0 1
1 0 - 0 1
1 1 - 1 1
SELECT * FROM emp WHERE sal <(SELECT MAX(sal) FROM emp WHERE deptno=20);
ANY vs IN operator
- ANY can be used in sub-queries only, while IN can be used with/without sub-queries.
- ANY can be used for comparison (<, >, <=, >=, =, or !=), while IN can be used
only for equality comparison (=).
- Both operators are logically similar to OR operator.
ANY vs ALL operator
- Both can be used for comparison (<, >, <=, >=, =, or !=).
- Both are usable only with sub-queries.
- ANY is similar to logical OR, while ALL is similar to logical AND.
Co-related Subquery
SELECT dname FROM dept WHERE deptno = ANY (SELECT deptno FROM emp);
-- 10, ACCOUNTING --> SELECT deptno FROM emp - 14 rows
-- 20, RESEARCH --> SELECT deptno FROM emp - 14 rows
-- 30, SALES --> SELECT deptno FROM emp - 14 rows
-- 40, OPERATIONS --> SELECT deptno FROM emp - 14 rows
SELECT * FROM dept d WHERE d.deptno IN(SELECT e.deptno FROM emp e WHERE
e.deptno=d.deptno);
-- 10, ACCOUNTING --> 10,10,10 - 3 rows
-- 20, RESEARCH --> 20,20,20,20,20 - 5 rows
-- 30, SALES --> 30,30,30,30,30,30 - 6 rows
-- 40, OPERATIONS --> 0 rows
SELECT * FROM dept d WHERE d.deptno IN(SELECT DISTINCT e.deptno FROM emp e WHERE
e.deptno=d.deptno);
SELECT * FROM dept d WHERE d.deptno = (SELECT DISTINCT e.deptno FROM emp e WHERE
e.deptno=d.deptno);
-- 10, ACCOUNTING --> 10 - 1 row
-- 20, RESEARCH --> 20 - 1 row
-- 30, SALES --> 30 - 1 row
-- 40, OPERATIONS --> 0 rows
Subquery in projection
- An inner query can be written in the FROM clause of a SELECT statement. The output of the
inner query is treated as a table (it MUST be given a table alias) and the outer query executes
on that table.
- This is called a "Derived table" or "Inline view".
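A derived table can be sketched with the RICH/POOR categorisation used in these notes, assuming the `emp` table with `ename` and `sal`. The alias `t` and the column name `emp_count` are chosen for illustration.

```sql
-- The inner query builds a temporary result with a computed category column;
-- the outer query then groups on that column as if it were a real table.
SELECT t.category, COUNT(*) AS emp_count
FROM (SELECT ename, sal, IF(sal > 2000, 'RICH', 'POOR') AS category
      FROM emp) AS t
GROUP BY t.category;
```

Leaving out the alias `AS t` makes MySQL reject the query, which is why the alias is mandatory.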
-- display ename, sal, category (sal > 2000 'RICH', else 'POOR')
SELECT ename, sal, IF(sal > 2000, 'RICH', 'POOR') AS category FROM emp;
SQL Performance
SELECT @@optimizer_switch;
EXPLAIN FORMAT = JSON SELECT dname FROM dept WHERE deptno = ANY (SELECT deptno
FROM emp);
EXPLAIN FORMAT = JSON SELECT * FROM dept d WHERE d.deptno IN(SELECT e.deptno FROM
emp e WHERE e.deptno=d.deptno);
EXPLAIN FORMAT = JSON SELECT * FROM dept d WHERE d.deptno = (SELECT DISTINCT
e.deptno FROM emp e WHERE e.deptno=d.deptno);
EXPLAIN FORMAT = JSON SELECT * FROM dept d WHERE d.deptno IN (SELECT DISTINCT
deptno FROM emp);
Views
SELECT ename, sal, IF(sal > 2000, 'RICH', 'POOR') AS category FROM emp;
CREATE VIEW v_emp_category AS SELECT ename, sal, IF(sal > 2000, 'RICH', 'POOR') AS
category FROM emp;
SHOW TABLES;
DESC v_emp_category;
ALTER VIEW v_emp_category AS SELECT ename, sal, IF(sal > 2500, 'RICH', 'POOR') AS
category FROM emp;
ALTER VIEW v_richemp AS SELECT empno,ename,sal FROM emp WHERE sal>2500 WITH CHECK
OPTION;
-- Q4. display all employees from the Accounting dept using the created view
SELECT * FROM v_emp_list WHERE dname='ACCOUNTING';
• Indexes should be created on shorter (INT, CHAR, …) columns to save disk space.
• Few RDBMS do not allow indexes on external columns i.e. TEXT, BLOB.
• MySQL support indexing on TEXT/BLOB up to n characters.
• CREATE TABLE test (blob_col BLOB, …, INDEX(blob_col(10)));
• To list all indexes on table:
• SHOW INDEXES FROM table;
• To drop an index:
• DROP INDEX idx_name ON table;
• When table is dropped, all indexes are automatically dropped.
• Indexes should not be created on columns that are not frequently used in search, ordering or
grouping operations.
• Columns in join operation should be indexed for better performance.
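The index commands above can be tried on the `books` table and its `subject` column, both of which appear later in these notes. The index name `idx_books_subject` is a hypothetical name chosen for this sketch.

```sql
-- Check the estimated cost before the index exists
EXPLAIN FORMAT = JSON SELECT * FROM books WHERE subject = 'C Programming';

-- Create the index on the column used in the WHERE clause
CREATE INDEX idx_books_subject ON books(subject);
SHOW INDEXES FROM books;

-- Run the same EXPLAIN again and compare the reported query cost
EXPLAIN FORMAT = JSON SELECT * FROM books WHERE subject = 'C Programming';

-- Remove the index when it is no longer needed
DROP INDEX idx_books_subject ON books;
```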
• NOT NULL
• NULL values are not allowed.
• Can be applied at column level only.
• CREATE TABLE table(c1 TYPE NOT NULL, …);
• UNIQUE
• Duplicate values are not allowed.
• NULL values are allowed.
• Not applicable for TEXT and BLOB.
• UNIQUE can be applied on one or more columns.
• Internally creates unique index on the column (fast searching).
• A table can have one or more unique keys.
• Can be applied at column level or table level.
• CREATE TABLE table(c1 TYPE UNIQUE, …);
• CREATE TABLE table(c1 TYPE, …, UNIQUE(c1));
• CREATE TABLE table(c1 TYPE, …, CONSTRAINT constraint_name UNIQUE(c1));
• PRIMARY KEY
• Column or set of columns that uniquely identifies a row.
• Only one primary key is allowed for a table.
• Primary key column cannot have duplicate or NULL values.
• Internally index is created on PK column.
• TEXT/BLOB cannot be primary key.
• If no obvious choice available for PK, composite or surrogate PK can be created.
• Creating PK for a table is a good practice.
• PK can be created at table level or column level.
• CREATE TABLE table(c1 TYPE PRIMARY KEY, …);
• CREATE TABLE table(c1 TYPE, …, PRIMARY KEY(c1));
• CREATE TABLE table(c1 TYPE, …, CONSTRAINT constraint_name PRIMARY KEY(c1));
• CREATE TABLE table(c1 TYPE, c2 TYPE, …, PRIMARY KEY(c1, c2));
Agenda
Indexes
Constraints
Not Null
Unique
Primary Key
Foreign Key
CHECK
ALTER
Indexes
EXPLAIN FORMAT = JSON SELECT * FROM books WHERE subject = 'C Programming';
-- 1.55
EXPLAIN FORMAT = JSON SELECT * FROM books WHERE subject = 'C Programming';
-- 0.90
EXPLAIN FORMAT = JSON SELECT e.ename,d.dname FROM emps e INNER JOIN depts d ON
e.deptno=d.deptno;
-- 2.50
EXPLAIN FORMAT = JSON SELECT e.ename,d.dname FROM emps e INNER JOIN depts d ON
e.deptno=d.deptno;
-- 2.05
DESC emps;
2. Unique Index
Day07_Help.MD 4/5/2022
EXPLAIN FORMAT = JSON SELECT * FROM emp WHERE deptno=20 and job='Analyst';
-- 0.70
4. Clustered Index
It is created automatically on the primary key.
If you do not define a primary key, a hidden column is added and an index is created
on this column.
These indexes are called clustered indexes.
DROP INDEX
Constraints
Level Of Constraint
a. Column Level
All the above constraints can be applied at column level
b. Table level
All the constraints except NOT NULL can be applied at table level
email CHAR(50),
mobile CHAR(12),
addr CHAR(30),
PRIMARY KEY(id),
UNIQUE (email),
UNIQUE(mobile)
);
1. NOT NULL
NULL values are not allowed in that column.
NOT NULL can be applied at column level only.
2. UNIQUE
It keeps only unique values in that column.
Multiple NULL values are allowed.
A UNIQUE constraint creates a unique index automatically.
If you create a UNIQUE constraint on two columns, it creates a composite index.
DESC students;
SHOW INDEXES FROM students;
DROP TABLE students;
3. Primary Key
This constraint is similar to that of Unique(Duplicate values are not allowed) + NOT NULL(Null
values are not allowed).
CREATE TABLE temp4(c1 INT PRIMARY KEY,C2 INT PRIMARY KEY,C3 INT);
-- Multiple primary keys are not allowed.
CREATE TABLE temp4(c1 INT UNIQUE,C2 INT UNIQUE,C3 INT);
-- Multiple uniques are allowed
DESC students;
SHOW INDEXES FROM students;
DROP TABLE students;
4. Foreign Key
DESC emps;
SHOW INDEXES FROM emps;
Homework
5. CHECK
--error
ALTER
-- HOMEWORK
--Add all the data of dept table in dept_backup
--Add all the data of emp table in emp_backup
UPDATE emp_backup e set e.job = (SELECT job FROM emp WHERE empno = e.empno);
Extra
DELIMITER ;
SOURCE /path/to/01_hello.sql
CALL sp_hello();
CALL sp_hello();
SELECT * FROM result;
• Triggers are supported by all standard RDBMS like Oracle, MySQL, etc.
• Triggers are not supported by weak RDBMS like MS-Access.
• Like SP/FN, triggers may contain SQL statements with
programming constructs. They may also call other SP or FN.
CREATE TRIGGER
CREATE TRIGGER trig_name
AFTER|BEFORE dml_op ON table
FOR EACH ROW
BEGIN
body;
END;
DROP TRIGGER
DROP TRIGGER trig_name;
Agenda
PL/SQL OR PSM
Triggers
PL/SQL OR PSM
Stored Procedure
- In MySQL, when ; is found, the client sends the complete statement to the server for
processing
- This ; is called the delimiter
- For writing stored procedures, you should change the delimiter temporarily
- You can change it by using the DELIMITER keyword.
-- PSM04 Write a procedure to calculate the area of a rectangle and insert the area in the
-- result table. Pass the length and breadth from the user.
-- SET v_area = p_len * p_bre;
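One possible shape for the PSM04 procedure is sketched below. The procedure name `sp_area` and the layout of the `result` table (a number and a short message) are assumptions; adjust the CREATE TABLE and INSERT to match the table actually used in class.

```sql
-- Assumed layout of the result table (hypothetical columns).
CREATE TABLE IF NOT EXISTS result (v INT, r CHAR(50));

DROP PROCEDURE IF EXISTS sp_area;

-- Change the delimiter so the ; inside the body does not end the statement.
DELIMITER $$
CREATE PROCEDURE sp_area(IN p_len INT, IN p_bre INT)
BEGIN
    DECLARE v_area INT;
    SET v_area = p_len * p_bre;
    INSERT INTO result VALUES (v_area, 'rectangle area');
END $$
DELIMITER ;

CALL sp_area(5, 4);
SELECT * FROM result;
```

DELIMITER is a command of the mysql client, so this script is meant to be run from the mysql prompt or through SOURCE.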
Day08_HELP.MD 4/6/2022
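#include <stdio.h>
/* writes the square of n into the variable that res points to */
void square(int n, int *res){
*res = n * n;
}
int main(){
int res;
square(5,&res);
printf("square = %d",res);
return 0;
}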
Triggers
If you perform a DML operation on multiple rows, the trigger is fired once for
each row.
INSERT->NEW
DELETE->OLD
UPDATE->NEW,OLD
Triggers are not called by the user. They are fired automatically on DML operations
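The use of OLD and NEW can be sketched with an UPDATE trigger on the `emp` table. The log table `emp_sal_log` and the trigger name `trg_emp_sal_update` are hypothetical, and `emp` is assumed to have `empno`, `sal` and `deptno` columns as elsewhere in these notes.

```sql
-- Hypothetical log table for this example.
CREATE TABLE emp_sal_log (
    empno      INT,
    old_sal    DECIMAL(10,2),
    new_sal    DECIMAL(10,2),
    changed_at DATETIME
);

DELIMITER $$
CREATE TRIGGER trg_emp_sal_update
AFTER UPDATE ON emp
FOR EACH ROW
BEGIN
    -- OLD holds the row before the UPDATE, NEW holds it after
    INSERT INTO emp_sal_log VALUES (OLD.empno, OLD.sal, NEW.sal, NOW());
END $$
DELIMITER ;

-- Fires the trigger once for every row that is updated
UPDATE emp SET sal = sal + 100 WHERE deptno = 10;
SELECT * FROM emp_sal_log;
```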
Extra - Functions
1. Deterministic
2. NOT Deterministic
--SMITH
--CONCAT(UPPER(LEFT(ename,1)),LOWER(SUBSTR(ename,2)))
--Smith
--Write your function as TITLE('SMITH') -> Smith
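One way to wrap the expression above in a stored function is sketched here. The function name `title` follows the note; the parameter name `p_str` and the length 100 are assumptions.

```sql
DROP FUNCTION IF EXISTS title;

DELIMITER $$
CREATE FUNCTION title(p_str VARCHAR(100))
RETURNS VARCHAR(100)
DETERMINISTIC   -- the same input always gives the same output
BEGIN
    RETURN CONCAT(UPPER(LEFT(p_str, 1)), LOWER(SUBSTR(p_str, 2)));
END $$
DELIMITER ;

SELECT title('SMITH');            -- Smith
SELECT title(ename) FROM emp;     -- applies the function to every row
```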
• The 0th rule is the main rule known as “The foundation rule”.
• For any system that is advertised as, or claimed to be, a relational data base management system,
that system must be able to manage data bases entirely through its relational capabilities.
• Concept of table design: Table, Structure, Data Types, Width, Constraints, Relations.
• Goals:
• Efficient table structure.
• Avoid data redundancy i.e. unnecessary duplication of data (to save disk space).
• Reduce problems of insert, update & delete.
• Done from input perspective.
• Based on user requirements.
• Part of software design phase.
• View the entire application on a per-transaction basis and then normalize each transaction
separately.
• Transaction Examples:
• Banking, Rail Reservation, Online Shopping.
§ What is JSON?
§ Introduction to MongoDB
§ JSON vs BSON
§ MongoDB Fundamentals
§ Basic CRUD operations
§ Performance optimization → indexing
§ Data Modeling → json / structure
§ Aggregation Pipeline
§ A database is an organized collection of data, generally stored and accessed electronically from a
computer system
§ A database refers to a set of related data and the way it is organized
§ Database Management System
§ Software that allows users to interact with one or more databases and provides access to all of the data
contained in the database
§ Types
§ RDBMS (SQL)
§ NoSQL
RDBMS
§ The idea of RDBMS was born in 1970 by E. F. Codd
§ The language used to query RDBMS systems is SQL (Structured Query Language)
§ RDBMS systems are well suited for structured data held in columns and rows, which can be queried
using SQL
§ Supports: DML, DQL, DDL, DTL, DCL
§ The RDBMS systems are based on the concept of ACID transactions
§ Atomic: implies either all changes of a transaction are applied completely or not applied at all
§ Consistent: data is in a consistent state after the transaction is applied
§ Isolated: transactions that are applied to the same set of data are independent of each other
§ Durable: the changes are permanent in the system and will not be lost in case of any failures
§ Scalability is the ability of a system to expand to meet your business needs
§ E.g. scaling a web app is to allow more people to use your application
§ Types
§ Vertical scaling
§ Add resources within the same logical unit to increase capacity → single point of failure
§ E.g. add CPUs to an existing server, increase memory in the system or expanding storage by adding
hard drives
§ Horizontal scaling
§ Add more nodes to a system
§ E.g. adding a new computer to a distributed software application
§ Based on principle of distributed computing
§ NoSQL databases are designed for Horizontal scaling
§ So they are reliable, fault tolerant, better performance (at lower cost)
CAP Theorem
§ Following are the types of NoSQL database based on how it stores the data
§ Key Value
§ Document DB
§ Column Based DB
§ Graph DB
Key Value
§ E.g.
§ DynamoDB
§ Redis
§ Couchbase
§ ZooKeeper
Document DB
§ E.g.
§ MongoDB
§ CouchDB
§ RethinkDB
Column Based DB
§ Data is stored as columns rather than rows
§ Fast queries for datasets
§ Slower when looking at individual records
§ E.g.
§ Redshift
§ Cassandra
§ HBase
§ Vertica
Graph DB
§ E.g.
§ Neo4J
§ Giraph
§ OrientDB
§ High scalability
§ This scaling up approach fails when the transaction rates and fast response requirements increase. In
contrast to this, the new generation of NoSQL databases is designed to scale out (i.e. to expand horizontally
using low-end commodity servers).
§ Manageability and administration
§ NoSQL databases are designed to mostly work with automated repairs, distributed data, and simpler data
models, leading to lower management and administration effort.
§ Low cost
§ NoSQL databases are typically designed to work with a cluster of cheap commodity servers, enabling the
users to store and process more data at a low cost.
§ Flexible data models
§ NoSQL databases have a very flexible data model, enabling them to work with any type of data; they don’t
comply with the rigid RDBMS data models. As a result, any application changes that involve updating the
database schema can be easily implemented.
§ Maturity
§ Most NoSQL databases are pre-production versions with key features that are still to be implemented. Thus, when
deciding on a NoSQL database, you should analyze the product properly to ensure the features are fully
implemented and not still on the To-do list.
§ Support
§ Support is one limitation that you need to consider. Most NoSQL databases are from start-ups which were open
sourced. As a result, support is very minimal as compared to the enterprise software companies and may not have
global reach or support resources.
§ Limited Query Capabilities
§ Since NoSQL databases are generally developed to meet the scaling requirement of the web-scale applications,
they provide limited querying capabilities. A simple querying requirement may involve significant programming
expertise.
§ Administration
§ Although NoSQL is designed to provide a no-admin solution, it still requires skill and effort for installing and
maintaining the solution.
§ Expertise
§ Since NoSQL is an evolving area, expertise on the technology is limited in the developer and administrator
community.
§ When to use NoSQL?
§ Large amount of data (TBs)
§ Many Read/Write operations
§ Economical scaling
§ Flexible Schema
§ Examples
§ Social Media
§ Recordings
§ Geospatial analysis
§ Information processing
§ When NOT to use NoSQL?
§ Need ACID transactions
§ Fixed multiple relations
§ Need joins
§ Need high consistency
§ Examples
§ Financial transactions
§ Business operations
SQL vs NoSQL
Types
SQL: All types support SQL standard
NoSQL: Multiple types exist, such as document stores, key value stores, column databases, etc.
History
SQL: Developed in 1970
NoSQL: Developed in 2000s
Examples
SQL: SQL Server, Oracle, MySQL
NoSQL: MongoDB, HBase, Cassandra
Data Storage Model
SQL: Data is stored in rows and columns in a table, where each column is of a specific type
NoSQL: The data model depends on the database type. It could be key-value pairs, documents, etc.
Schemas
SQL: Fixed structure and schema
NoSQL: Dynamic schema. Structures can be accommodated
Scalability
SQL: Scale up approach is used
NoSQL: Scale out approach is used
Transactions
SQL: Supports ACID and transactions
NoSQL: Supports partitioning and availability
Consistency
SQL: Strong consistency
NoSQL: Dependent on the product [Eventual Consistency]
Support
SQL: High level of enterprise support
NoSQL: Open source model
Maturity
SQL: Have been around for a long time
NoSQL: Some of them are mature; others are evolving
MongoDB
§ Publicly available in 2009
§ Open-source database which is controlled by 10gen
§ Document oriented database → stores JSON documents
§ Stores data in binary JSON
§ Design Philosophy
§ MongoDB wasn’t designed in a lab and is instead built from the experiences of building large scale, high
§ Indexing
§ MongoDB supports generic secondary indexes and provides unique, compound, geospatial, and full-text
indexing capabilities as well
§ Secondary indexes on hierarchical structures such as nested documents and arrays are also supported and
enable developers to take full advantage of the ability to model in ways that best suit their applications
§ Aggregation
§ MongoDB provides an aggregation framework based on the concept of data processing pipelines
§ Aggregation pipelines allow you to build complex analytics engines by processing data through a series of
relatively simple stages on the server side, taking full advantage of database optimizations
§ Special collection and index types
§ MongoDB supports time-to-live (TTL) collections for data that should expire at a certain time, such as
sessions and fixed-size (capped) collections, for holding recent data, such as logs
§ File storage
§ MongoDB supports an easy-to-use protocol for storing large files and file metadata
[Figure: MongoDB offerings: Self-Managed / Enterprise, Atlas (Cloud), Mobile, Serverless, Query API, Compass, Database Triggers, MongoDB Charts]
§ Create a directory somewhere in the disk to store the data [generally on Linux or Mac, create a
directory named data on the root (/): /data/]
§ mongoexport: Exports the document to JSON, CSV format
§ mongoimport: To import some data into the DB
§ mongorestore: to restore anything that you’ve exported
§ mongostat: Statistics of databases
§ Defined as a subset of the JavaScript language in the early 2000s by Douglas Crockford
§ It wasn’t until 2013 that the format was officially specified
§ JavaScript objects are simple associative containers, wherein a string key is mapped to a value
§ APIs
§ Configuration files
§ Log messages
§ Database storage
§ However, there are several issues that make JSON less than ideal for usage inside of a database
§ JSON is a text-based format, and text parsing is very slow
§ JSON’s readable format is far from space-efficient, another database concern
§ JSON only supports a limited number of basic data types
[Figure: annotated JSON document with an "_id" key, labelling keys and their string, number and object values; its array elements include:]
{ "award" : "W.W. McDowell Award", "year" : 1967, "by" : "IEEE Computer Society" },
{ "award" : "Draper Prize", "year" : 1993, "by" : "National Academy of Engineering" }
§ MongoDB stores data in BSON format both
internally, and over the network
§ Anything you can represent in JSON can be
natively stored in MongoDB
§ String − This is the most commonly used datatype to store the data. String in MongoDB must be UTF-8 valid
'
§ Integer − This type is used to store a numerical value. Integer can be 32 bit or 64 bit depending upon your server
§ Boolean − This type is used to store a boolean (true/ false) value
✗
§ Double − This type is used to store floating point values
§ Min/ Max keys − This type is used to compare a value against the lowest and highest BSON elements
§ Arrays − This type is used to store arrays or list or multiple values into one key
§ Timestamp − ctimestamp. This can be handy for recording when a document has been modified or added
§ Object − This datatype is used for embedded documents
§ Null − This type is used to store a Null value
§ Symbol − it's generally reserved for languages that use a specific symbol type
§ Date − This datatype is used to store the current date or time in UNIX time format
§ Object ID − This datatype is used to store the document’s ID
§ Binary data − This datatype is used to store binary data
§ Code − This datatype is used to store JavaScript code into the document
[Figure: a sample database holding documents, each with its own "_id" field]
Database = Database
table = collection
column = attribute / field
row = document
MongoDB Terminology
§ Database
§ This is a container for collections, just as an RDBMS database is a container for tables
§ Each database gets its own set of files on the file system
§ A MongoDB server can store multiple databases
§ Collection
§ This is a grouping of MongoDB documents
§ A collection is the equivalent of a table in an RDBMS such as Oracle or MS SQL
§ Collections don't enforce any sort of structure
§ Document
§ A record in a MongoDB collection is called a document
§ A document, in turn, consists of field names and values
§ Field
§ Restrictions
§ Top-level field names cannot start with the dollar sign ($) character
[Handwritten example: a document with a "name" field and an embedded "address" object that contains a "state" field]
§ Each document requires a unique _id field that acts as a primary key
§ If an inserted document omits the _id field, the MongoDB driver automatically generates an ObjectId
§ Behaviors
§ By default, MongoDB creates a unique index on the _id field during the creation of a collection
§ The _id field is always the first field in the documents. If the server receives a document that does not have
the _id field first, then the server will move the field to the beginning.
§ The _id field may contain values of any BSON data type, other than an array
§ An autogenerated _id (of type ObjectId) is 12 bytes long and contains
§ Timestamp: 4 bytes
§ Machine Id: 3 bytes
§ Process Id: 2 bytes
§ Counter: 3 bytes
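The 12-byte layout can be illustrated by slicing an ObjectId by hand. This is a minimal sketch, assuming the 4/3/2/3 byte split (timestamp, machine id, process id, counter); newer MongoDB versions are understood to replace the machine and process ids with a single random value, so treat the middle fields as illustrative. The function name `split_object_id` and the sample id are hypothetical.

```python
from datetime import datetime, timezone


def split_object_id(hex_id: str) -> dict:
    """Split a 24-character hex ObjectId into its assumed 4/3/2/3 byte parts."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")  # creation time, seconds since epoch
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine_id": raw[4:7].hex(),
        "process_id": raw[7:9].hex(),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


# Hypothetical sample id, used only to exercise the function.
parts = split_object_id("507f1f77bcf86cd799439011")
print(parts["timestamp"].isoformat(), parts["counter"])
```

Because the timestamp comes first, sorting by _id roughly sorts documents by creation time.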
CRUD operations
> show databases
§ Show the database statistics
> db.stats()
§ Create Collection
> db.createCollection('contacts')
§ Drop Collection
> db.contacts.drop()
§ Create many documents
> db.contacts.insertMany([
{ name: 'contact 1', address: 'pune' },
{ name: 'contact 2', address: 'mumbai' }
])
§ Create a single document
> db.persons.insert({name: "person1", address: "pune"})
§ Note: if you are passing the _id field, make sure that it is unique. If it is not unique, the document will not get inserted
§ Find documents
> db.contacts.find()
§ next(): returns the next document
§ skip(n): skips first n documents
§ limit(n): limit the result to n
§ count(): returns the count of result
§ toArray(): returns an array of document
§ forEach(fn): Iterates the cursor to apply a JavaScript function to each document from the cursor
§ pretty(): Configures the cursor to display results in an easy-to-read format
§ sort(): sorts documents
§ The shell returns 20 records by default. Type "it" for more results
> db.contacts.find({ name: 'amit' })
: select * from contacts where name = 'amit';
> db.contacts.find({ name: /amit/ })
: select * from contacts where name like '%amit%';
§ Relational operators
§ $eq, $ne, $gt, $lt, $gte, $lte, $in, $nin
§ Logical operators
§ $and, $or, $nor, $not
§ Element operators
§ $exists, $type
§ Evaluation operators
§ Syntax
§ db.<collection>.update(criteria, newObject)
§ E.g.
§ > db.contacts.update({ name: 'amit' }, { $set: { address: 'Pune' } })
§ Update operators
§ $set, $inc, $push, $each, $pull
§ In place updates are faster (e.g. $inc) than setting new object
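To make the operator lists concrete, here is a small in-memory sketch of how a criteria object with relational operators and an update object with $set / $inc could be evaluated. It is not MongoDB's implementation; `matches` and `apply_update` are hypothetical helpers, and missing-field handling is deliberately simplified.

```python
import operator

# Relational operators from the list above, mapped to Python comparisons.
COMPARATORS = {
    "$eq": operator.eq, "$ne": operator.ne,
    "$gt": operator.gt, "$lt": operator.lt,
    "$gte": operator.ge, "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(document: dict, criteria: dict) -> bool:
    """Return True when every field condition in criteria holds for document."""
    for field, condition in criteria.items():
        if not isinstance(condition, dict):
            condition = {"$eq": condition}  # { name: 'amit' } is shorthand for $eq
        for op, operand in condition.items():
            if not COMPARATORS[op](document.get(field), operand):
                return False
    return True


def apply_update(document: dict, update: dict) -> None:
    """Apply $set and $inc in place, leaving all other fields untouched."""
    for field, value in update.get("$set", {}).items():
        document[field] = value
    for field, amount in update.get("$inc", {}).items():
        document[field] = document.get(field, 0) + amount


contacts = [
    {"name": "amit", "address": "mumbai", "age": 30},
    {"name": "nilesh", "address": "pune", "age": 41},
]
adults_in_pune = [c for c in contacts if matches(c, {"address": "pune", "age": {"$gte": 18}})]
apply_update(contacts[0], {"$set": {"address": "Pune"}, "$inc": {"age": 1}})
```

The update touches only the named fields, which is the idea behind the in-place operators being cheaper than replacing the whole object.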
> db.contacts.remove(criteria)
> db.contacts.deleteOne(criteria);
> db.contacts.deleteMany(criteria);
> db.contacts.deleteMany({}); → delete all docs, but not collection
> db.contacts.drop(); → delete all docs & collection as well : efficient
Data Modeling
[Handwritten example: a user may have multiple addresses and may place multiple orders. The sketch shows a user document (id, name, email) and an order document that refers to the user and lists products (product id, price).]
Embedded data model
Normalized data model
§ Reduce data duplication
§ Aggregation operations group values from multiple documents together, and can perform a variety of
operations on the grouped data to return a single result
§ Documents enter a multi-stage pipeline that transforms the documents into aggregated results
§ Pipeline
§ The MongoDB aggregation pipeline consists of stages
§ Each stage transforms the documents as they pass through the pipeline
§ Pipeline stages do not need to produce one output document for every input document
§ e.g., some stages may generate new documents or filter out documents
Sunbeam Infotech www.sunbeaminfo.com
Operators
§ $match: where clause (criteria)
§ $group: group by
§ { $group: { _id: <expr>, <field1>: { <accum1> : <expr1> }, ... } }
§ The possible accumulators are: $sum, $avg, etc.
§ $unwind: extract array elements from array field
§ $lookup: left outer join
§ $out: put result of pipeline in another collection (last operation)
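The pipeline idea can be sketched in plain Python: each stage takes a list of documents and returns a new list. This is only an illustration of $match followed by $group with a $sum accumulator; the helper names and the sample `orders` data are hypothetical.

```python
from collections import defaultdict


def match(documents, criteria):
    """$match stage: keep documents whose fields equal the criteria values."""
    return [d for d in documents if all(d.get(k) == v for k, v in criteria.items())]


def group_sum(documents, key_field, sum_field):
    """$group stage with a $sum accumulator, keyed by key_field."""
    totals = defaultdict(int)
    for d in documents:
        totals[d[key_field]] += d[sum_field]
    return [{"_id": key, "total": total} for key, total in sorted(totals.items())]


orders = [
    {"city": "pune", "status": "paid", "amount": 100},
    {"city": "mumbai", "status": "paid", "amount": 50},
    {"city": "pune", "status": "pending", "amount": 70},
    {"city": "pune", "status": "paid", "amount": 25},
]

# Stage 1 filters, stage 2 groups; the output has fewer documents than the input.
result = group_sum(match(orders, {"status": "paid"}), "city", "amount")
print(result)
```

Four input documents produce two output documents, which matches the note that stages need not emit one document per input.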
Indexes
§ Get Indexes
> db.collection.getIndexes()
§ The default name for an index is the concatenation of the indexed keys and each key’s direction in the
index ( i.e. 1 or -1) using underscores as a separator
§ For example, an index created on { item : 1, quantity: -1 } has the name item_1_quantity_-1
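The naming rule is simple enough to express directly. A minimal sketch, with `default_index_name` as a hypothetical helper that mirrors the rule stated above:

```python
def default_index_name(keys: dict) -> str:
    """Join each indexed key and its direction (1 or -1) with underscores."""
    return "_".join(f"{field}_{direction}" for field, direction in keys.items())


print(default_index_name({"item": 1, "quantity": -1}))
```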
§ Unique index
§ TTL index
§ Geospatial indexes
§ Capped collections are fixed sized collections for high-throughput insert and retrieve operations
§ They maintain the order of insertion without any indexing overhead
§ The oldest documents are auto-removed to make room for new records. The size of the collection should be specified at creation time
§ Updates should be done using an index for better performance. If an update operation changes the document size, the operation fails
§ Records cannot be deleted from a capped collection; the collection itself can be dropped
§ E.g.
> db.createCollection("logs", { capped: true, size: 4096 });
§ If the size is below 4096 bytes, 4096 is used. Larger sizes are rounded up to a multiple of 256
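The size rule and the "oldest out first" behaviour can be sketched as follows. This is an illustration only: `effective_capped_size` encodes the rounding rule as stated in these notes, and `CappedLog` bounds the collection by document count rather than bytes to keep the example short; both names are hypothetical.

```python
from collections import deque


def effective_capped_size(requested: int) -> int:
    """Apply the rule from the notes: at least 4096 bytes, rounded up to a multiple of 256."""
    size = max(requested, 4096)
    return -(-size // 256) * 256  # ceiling division, then scale back up


class CappedLog:
    """Fixed-capacity collection that keeps insertion order and evicts the oldest entry."""

    def __init__(self, max_documents: int):
        self._docs = deque(maxlen=max_documents)

    def insert(self, document: dict) -> None:
        self._docs.append(document)  # deque drops the oldest item when full

    def find(self) -> list:
        return list(self._docs)


log = CappedLog(max_documents=2)
for n in (1, 2, 3):
    log.insert({"line": n})
print(effective_capped_size(5000), log.find())
```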
§ Benefits
§ Incredible read/write speed (>10000 operations per second)
§ Availability of cursor
[Handwritten notes: a document holds values of type string, number, boolean, null, object and array, and is limited to 16 MB; larger data such as text, video or audio files is stored with GridFS]
§ GridFS is a specification for storing/retrieving files exceeding 16 MB
§ GridFS stores a file by dividing it into chunks of 255 kB
§ When the file is queried back, the driver collects the chunks as requested
§ A query can be a range query. Because of chunking, a file can be accessed without loading the whole file into memory
§ It uses two collections for storing files, i.e. fs.chunks and fs.files
§ It is also useful to keep files and metadata synced and deployed automatically across geographically
distributed replica set
§ GridFS should not be used when there is need to update contents of entire file atomically
§ It can be accessed using mongofiles tool or compliant client driver
§ fs.files
§ _id, length, chunkSize, uploadDate, md5, filename, contentType
§ Files can be searched using
§ db.fs.files.find( { filename: myFileName } );
§ db.fs.chunks.find( { files_id: myFileID } ).sort( { n: 1 } )
§ GridFS automatically creates indexes for faster search
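The chunking and range-read idea can be sketched without a database. This is a simplified model, not the GridFS driver code: chunk documents carry `files_id`, `n` and `data` as in fs.chunks, while `split_into_chunks` and `read_range` are hypothetical helpers.

```python
CHUNK_SIZE = 255 * 1024  # 255 kB, the chunk size quoted above


def split_into_chunks(files_id, data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Cut the file into numbered chunk documents, like entries in fs.chunks."""
    return [
        {"files_id": files_id, "n": index, "data": data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


def read_range(chunks: list, start: int, end: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return bytes [start, end) by touching only the chunks that overlap the range."""
    first, last = start // chunk_size, (end - 1) // chunk_size
    needed = sorted((c for c in chunks if first <= c["n"] <= last), key=lambda c: c["n"])
    blob = b"".join(c["data"] for c in needed)
    base = first * chunk_size
    return blob[start - base:end - base]


payload = bytes(range(256)) * 3000  # 768000 bytes of sample data
chunks = split_into_chunks("file-1", payload)
piece = read_range(chunks, 261000, 261300)  # spans the boundary of chunks 0 and 1
```

Only the overlapping chunks are joined, which is why a large file can be read in parts without loading it whole.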
Replica Sets
§ The same data is copied onto multiple locations; if one of the servers is not working, the data can still be retrieved from another instance
§ Maintaining copies of data in different data centers can increase data locality and availability for
distributed application
§ You can also maintain additional copies for dedicated purposes, such as disaster recovery, reporting, or backup
§ A replica set in MongoDB is a group of mongod processes that maintain the same data set
§ A replica set contains several data bearing nodes and optionally one arbiter node
[Figure: a replica set with one primary and two secondary nodes]
§ The secondaries replicate the primary's oplog and apply the operations to their data sets such that the
secondaries' data sets reflect the primary's data set
§ If the primary is unavailable, an eligible secondary will hold an election to elect itself the new primary
§ If one of the nodes fails, another copy of the data is always available
§ Downtime is minimal, since a secondary can take over as primary
§ Along with the backup, you also have the option of load balancing
[Handwritten notes: setting up a replica set. 1) Start the mongod (daemon) processes, on the same or different machines. 2) Add the secondary nodes to the primary. The sketch shows a cluster named "my cluster" with a primary and two secondaries, each mongod listening on its own port.]
Sharding
§ Sharding is the process of storing data records across multiple machines and it is MongoDB's approach to meeting the demands of data growth
§ As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput
§ Sharding solves the problem with horizontal scaling
§ With sharding, you add more machines to support data growth and the demands of read and write operations
Why Sharding?
§ In replication, all writes go to the master node
§ Shards
§ Shards are used to store data
§ They provide high availability and data
consistency
§ In production environment, each shard is a
separate replica set
§ Config Servers
§ Config servers store the cluster's metadata
§ This data contains a mapping of the cluster's data
set to the shards
§ The query router uses this metadata to target
operations to specific shards
§ In production environment, sharded clusters have
exactly 3 config servers
Sharding in MongoDB
§ Query Routers
§ Query routers are mongos instances that interface with client applications and direct operations to the appropriate shard
§ The query router processes and targets the operations to shards and then returns results to the clients
§ A sharded cluster can contain more than one query router to divide the client request load
§ A client sends requests to one query router
§ Generally, a sharded cluster has many query routers
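How a query router might use the config servers' metadata can be sketched as a lookup from shard key to shard. This is a simplified model assuming range-based partitioning; the `QueryRouter` class, the boundary values and the shard names are hypothetical.

```python
import bisect


class QueryRouter:
    """Maps a shard key to a shard using sorted range boundaries (the cluster metadata)."""

    def __init__(self, boundaries: list, shards: list):
        # shards[i] holds keys below boundaries[i]; the last shard holds everything else.
        if len(shards) != len(boundaries) + 1:
            raise ValueError("need exactly one more shard than boundaries")
        self.boundaries = boundaries
        self.shards = shards

    def target(self, shard_key: str) -> str:
        """Pick the single shard whose key range contains shard_key."""
        return self.shards[bisect.bisect_right(self.boundaries, shard_key)]


router = QueryRouter(boundaries=["h", "q"], shards=["shardA", "shardB", "shardC"])
print(router.target("amit"), router.target("pune"), router.target("zed"))
```

A query that includes the shard key goes to one shard; a query without it would have to be sent to every shard, which this sketch does not model.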
[Handwritten recap: WiredTiger storage engine; fundamentals (database, collection, document, field); delete / deleteMany; aggregation pipeline; Python access through pymongo; sharding; optimization through indexes]
§ Google BigTable
§ High performance data storage system built on GFS (Google File System) and other Google technologies
§ Master-slave architecture
§ One key, multiple values
§ Columnar, SSTable (Sorted String Table) Storage, Append-only, Memtable, Compaction
§ Amazon DynamoDb
§ Highly available and scalable key-value storage system
§ Decentralized peer to peer architecture
§ Compromise on consistency for better availability - Eventual consistency
§ Consistent hashing, Gossip protocol, Replication, Read repair
§ Cassandra
§ Inherited from BigTable and DynamoDb
§ BigTable: Column families, Memtable, SSTable
§ DynamoDb: Consistent hashing, Partitioning, Replication
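Consistent hashing, named above for both DynamoDb and Cassandra, can be sketched with a sorted ring of node positions. This is a minimal illustration without virtual nodes or replication; `HashRing`, `ring_position` and the node names are hypothetical, and MD5 is used only as a convenient stable hash.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Hash a string to a stable position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def add(self, node: str) -> None:
        bisect.insort(self._ring, (ring_position(node), node))

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's position to the next node, wrapping around."""
        positions = [position for position, _ in self._ring]
        index = bisect.bisect_right(positions, ring_position(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Adding a node only relocates keys onto the new node; every other key keeps its owner, which is what makes scaling out cheap compared with `hash(key) % node_count`.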
"
Introduction
Cassandra
§ Developed at Facebook by
§ Avinash Lakshman (Co-inventor Amazon DynamoDb)
§ Prashant Malik (Technical Leader at Facebook)
§ Goals:
§ Distributed NoSQL database (on commodity hardware)
§ Large amount of structured data
§ High availability
§ No single point of failure
§ Basic data model is rows & columns (like MySQL table rows and columns)
§ Column-oriented, decentralized peer to peer, and follows eventual consistency
§ Developed in Java
§ 2007-2008 - Developed at Facebook
§ July 2008 - Open sourced by Facebook
§ March 2009 - Apache Incubator project
§ February 2010 - Apache Top-level project
§ 2011 - version 0.8 - Added CQL (Cassandra Query Language)
§ 2013 - version 2.0 - Added light-weight transactions, Triggers
§ 2015 - version 3.0 - Storage engine improved, Materialized views
§ 2020 - version 3.11 - Latest release
§ Prerequisite
§ Java 8 (Java 11 experimental)
§ Can be installed through apt or yum tool (Ubuntu/CentOS), or brew (macOS)
§ Manual installation
§ Download Cassandra 3.11.x (.tar.gz) and extract it
§ Set CASSANDRA_HOME to the Cassandra directory
§ Set JAVA_HOME to the JDK 8 directory
§ Install Python 2.7 (for cqlsh)
§ Add CASSANDRA_HOME/bin to the PATH variable
§ Start Cassandra
§ terminal1> cassandra → starts the Cassandra server
§ terminal2> cqlsh → starts the shell
§ Default ports: 1) MySQL: 3306, 2) MongoDB: 27017, 3) Redis: 6379, 4) Cassandra: 9042
§ Less expensive
§ Supports multiple programming languages (Java, C#, Python, ...)
§ Cloud Availability
§ Ability to deploy across data centers
§ Fault tolerant
§ Range queries on partition key are not supported
§ Not good for too many joins
§ Not suitable for transactional data
§ During compaction performance / throughput slows down
[Handwritten notes: responsibilities for a self-managed database (installation, configuring the service, maintenance, backup, updates) compared with a cloud service (create and configure a service instance, then start using it in the app). Services listed: RDS (MySQL, MariaDB, PostgreSQL, MS SQL Server, Oracle) and NoSQL (DynamoDB, Redis, Redshift).]
Performance
§ Performance measures
§ Throughput (operations per second)
§ Latency
§ Cassandra vs MySQL
§ MySQL (more than 50GB data)
§ Write speed: 300 ms
§ Read speed: 350 ms
§ Cassandra (more than 50GB data)
§ Write speed: 0.12 ms
§ Read speed: 15 ms
§ Applications
§ Product catalog/Playlist
§ Recommendation/Personalization engine
§ Sensor/IoT data
§ Messaging/Time-series data
§ Fraud detection
§ Customers
§ Facebook, Netflix, eBay, Apple, Walmart, GoDaddy
§ Application requirements
§ Store and handle time-series data
§ Store and handle large volume of data
§ Scale predictably (Linear Scaling)
§ High availability
§ Cassandra uses a gossip protocol to communicate with nodes in a cluster
§ Through gossip, each node learns the state information of other nodes in the cluster
§ The gossip process runs periodically on each
node and exchanges state information with three
other nodes in the cluster
§ Eventually, information is propagated to all
cluster nodes. Even if there are 1000s of nodes,
information is propagated to all the nodes within
a few seconds
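A rough simulation shows why gossip spreads quickly even in large clusters. This is a simplified push-only model, not Cassandra's actual protocol: in each round every node that already knows a piece of state tells three randomly chosen nodes. The function name, the fixed seed and the fanout default are assumptions for illustration.

```python
import random


def gossip_rounds(node_count: int, fanout: int = 3, seed: int = 0) -> int:
    """Count rounds until state known by node 0 has reached every node."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    informed = {0}
    rounds = 0
    while len(informed) < node_count:
        rounds += 1
        for _ in range(len(informed)):
            # Each informed node contacts `fanout` random peers this round.
            informed.update(rng.sample(range(node_count), fanout))
    return rounds


rounds_needed = gossip_rounds(1000)
print(rounds_needed)
```

The number of informed nodes grows roughly geometrically per round, so the round count grows with the logarithm of the cluster size rather than with the size itself.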
§ LSM Tree
§ Disk-based data structure that provides low-cost indexing for a file in which records are inserted at a very high rate
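The core LSM idea, buffering writes in memory and flushing them as sorted immutable runs, can be sketched in a few lines. This toy version keeps the "disk" tables as in-memory lists and omits the commit log and compaction; `TinyLSM` and its `memtable_limit` parameter are hypothetical.

```python
class TinyLSM:
    """Toy LSM store: writes go to a memtable, which is flushed to sorted tables."""

    def __init__(self, memtable_limit: int = 3):
        self.memtable = {}
        self.sstables = []  # oldest first; each table is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value) -> None:
        self.memtable[key] = value  # fast in-memory write
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as one sorted, immutable table."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            for stored_key, value in table:
                if stored_key == key:
                    return value
        return None


store = TinyLSM(memtable_limit=3)
for key, value in [("c", 1), ("a", 2), ("b", 3), ("a", 99)]:
    store.put(key, value)
```

Writes never modify an existing table, which is why inserts stay cheap; the cost moves to reads and to background compaction.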
Components
§ Node (peer)
§ A Cassandra node is a place where data is stored
§ Data center
§ Data center is a collection of related nodes
§ Cluster
§ A cluster is a component which contains one or more data centers
§ Commit log
§ In Cassandra, the commit log is a crash-recovery mechanism
§ Every write operation is written to the commit log
§ Mem-table
§ A mem-table is a memory-resident data structure
§ After commit log, the data will be written to the mem-table
§ Sometimes, for a single column family, there will be multiple mem-tables
§ SSTable
§ It is a disk file to which the data is flushed from the mem-table when its contents reach a threshold value
§ Bloom filter
§ These are quick, probabilistic data structures for testing whether an element is a member of a set
§ It is a special kind of cache
§ Bloom filters are accessed after every query
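A Bloom filter can be sketched with a bit field and a few hash positions per item. This is a generic illustration, not Cassandra's implementation; the class name, the sizes and the use of SHA-256 are assumptions.

```python
import hashlib


class BloomFilter:
    """Answers "definitely not present" or "possibly present" for set membership."""

    def __init__(self, size_bits: int = 1024, hash_count: int = 3):
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = 0  # an int used as a bit field

    def _positions(self, item: str):
        # Derive hash_count bit positions by salting the item with an index.
        for i in range(self.hash_count):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for position in self._positions(item):
            self.bits |= 1 << position

    def might_contain(self, item: str) -> bool:
        return all((self.bits >> position) & 1 for position in self._positions(item))


row_keys = BloomFilter()
row_keys.add("row-1")
print(row_keys.might_contain("row-1"))
```

A False answer is always correct, so a table that cannot contain the key is skipped without a disk read; a True answer can occasionally be a false positive and still needs the real lookup.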
§ Cassandra provides the Cassandra Query Language (CQL), an SQL-like language, to create and update
database schema and access data
§ CQL allows users to organize data within a cluster of Cassandra nodes using:
§ Keyspace
§ Defines how a dataset is replicated, per datacenter
§ Replication is the number of copies saved per cluster. Keyspaces contain tables
§ Table
§ Defines the typed schema for a collection of partitions. Tables contain partitions, which contain rows, which contain columns. Cassandra can flexibly add new columns to tables with zero downtime.
§ Partition
§ Defines the mandatory part of the primary key all rows in Cassandra must have to identify the node in a
cluster where the row is stored
§ All performant queries supply the partition key in the query.
§ Row
§ Contains a collection of columns identified by a unique primary key made up of the partition key and optionally
additional clustering keys
§ Column
§ A single datum with a type which belongs to a row
§ Users can access Cassandra through its nodes using Cassandra Query Language (CQL)
Introduction
§ Redis is an open source, advanced key-value store and an apt solution for building high-performance applications
§ Based on data structures: strings, hashes, sets, lists, sorted sets, geospatial indexes, hyperloglogs
§ Redis has three main peculiarities that set it apart
§ Redis holds its database entirely in the memory, using the disk only for persistence
§ Redis has a relatively rich set of data types when compared to many key-value data stores
§ Redis can replicate data to any number of slaves
§ Data Structure: Based on data structures like Strings, Hashes, Sets
§ Atomic operations: Data is manipulated atomically by multiple clients
§ Supported Languages: Drivers available for C/C++, Java, Python, R, PHP
§ Master/Slave replication: Easy config and fast execution
§ Sharding: Distributing data across a cluster. Based on client driver capability
§ Portable: Developed in C. Works on all UNIX variants. Not officially supported on Windows
§ Key-value DB, where values can store complex data types with atomic ops
§ Value types are basic data structures made available to programmers without layers of abstraction
§ It is in-memory but persistent store i.e. whole database is maintained in server RAM, only changes
are updated on disk for backup
§ The data storage in disk is in append-only data files
§ Maximum data size is limited to the RAM size
§ On modern systems if Redis is going out of memory, it will start swapping and slow down the system
§ Max memory limit can be configured to raise error on write or evict keys
§ Redis is a different evolution path in the key-value DBs, where values can contain more complex data
types, with atomic operations defined on those data types
§ Redis is an in-memory database but persistent on disk database, hence it represents a different trade
off where very high write and read speed is achieved with the limitation of data sets that can't be
larger than the memory
§ Another advantage of in-memory databases is that the memory representation of complex data
structures is much simpler to manipulate compared to the same data structure on disk. Thus, Redis
can do a lot with little internal complexity
§ To install Redis on Ubuntu, go to the terminal and type the following commands −
§ > sudo apt-get update
§ > sudo apt-get install redis-server
§ Start Redis
§ > redis-server
§ Run client
§ > redis-cli
§ Redis strings are binary safe: they have a known length that is not determined by any special terminating characters
§ Thus, you can store anything up to 512 megabytes in one string
§ E.g.
§ redis-cli> set name "amit"
§ redis-cli> get name
§ A Redis hash is a collection of key value pairs
§ Redis Hashes are maps between string fields and string values
§ Hence, they are used to represent objects
§ Every hash can store up to 2^32 - 1 field-value pairs (more than 4 billion)
§ E.g.
§ redis-cli> hset user:amit username amit password sunbeam address pune
§ redis-cli> hget user:amit username
§ redis-cli> hget user:amit address
§ If a key is missing, hget returns nil (similar to null: the value is not available)
§ E.g.
§ redis-cli> lpush colors red
§ redis-cli> lpush colors green
§ redis-cli> lpush colors blue
§ redis-cli> lrange colors 0 2
[Handwritten note: lpush adds to the head of the list and rpush adds to the tail, e.g. rpush colors black]
Sets
§ Redis Sorted Sets are similar to Redis Sets, non-repeating collections of Strings
§ The difference is that every member of a Sorted Set is associated with a score, which is used to keep the sorted set ordered from the smallest to the greatest score
§ While members are unique, the scores may be repeated
§ E.g.
§ redis-cli> zadd colors 0 red
§ redis-cli> zadd colors 1 green
§ redis-cli> zadd colors 2 blue
§ redis-cli> zrangebyscore colors 0 2
§ Redis Pub/Sub implements the messaging system where the senders (publishers) send messages while the receivers (subscribers) receive them
§ The link by which the messages are transferred is called a channel
§ PSUBSCRIBE channel-pattern
§ subscribe to all channels matching the given pattern.
§ PUNSUBSCRIBE channel-pattern
§ stop receiving notifications from channels matching the given pattern.
§ UNSUBSCRIBE channel
§ stop receiving notifications from given channel.
§ PUBSUB command
§ monitor pub-sub subsystem
§ e.g. PUBSUB channels
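The publisher / channel / subscriber relationship can be modelled with an in-memory broker. This is a sketch of the messaging pattern only, not of the Redis protocol or a Redis client; the `Broker` class and the channel name are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Delivers each published message to every callback subscribed to the channel."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel: str, callback) -> None:
        self._subscribers[channel].append(callback)

    def unsubscribe(self, channel: str, callback) -> None:
        self._subscribers[channel].remove(callback)

    def publish(self, channel: str, message: str) -> int:
        """Send the message and return how many subscribers received it."""
        receivers = list(self._subscribers[channel])
        for callback in receivers:
            callback(message)
        return len(receivers)


inbox = []
broker = Broker()
broker.subscribe("news", inbox.append)
delivered = broker.publish("news", "hello")
broker.unsubscribe("news", inbox.append)
missed = broker.publish("news", "nobody listening")
```

Messages are not stored: a subscriber that joins or rejoins later does not see what was published while it was away.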
§ Redis transactions allow the execution of a group of commands in a single step
§ All commands in a transaction are sequentially executed as a single isolated operation
§ It is not possible that a request issued by another client is served in the middle of the execution of a
Redis transaction
§ Redis transaction is also atomic. Atomic means either all of the commands or none are processed
§ E.g.
§ redis-cli> multi
§ redis-cli> set username amit
§ redis-cli> set address pune
§ redis-cli> exec
§ The server processes the command and sends the response back to the client
§ The basic meaning of pipelining is, the client can send multiple requests to the server without waiting
for the replies at all, and finally reads the replies in a single step
§ The benefit of this technique is a drastically improved protocol performance
§ The speedup gained by pipelining ranges from a factor of five for connections to localhost up to a
factor of at least one hundred over slower internet connections
§ Redis SAVE command is used to create a backup of the current Redis database
§ Backup will be taken in the redis directory set in the config object
§ e.g.
§ redis-cli> save
§ Alternatively, the bgsave command can be used to take a backup; it runs in the background
[Handwritten note: the KVStore is similar to Redis]
§ Terminologies:
§ KV Pair - Key (Major & Minor keys), Value: byte array
§ The KVStore is a collection of Storage Nodes which host a set of Replication Nodes
§ Data is spread across the Replication Nodes
§ Given a traditional three-tier web architecture, the KVStore either takes the place of your back-end
database, or runs alongside it
§ A Storage Node's hardware should be, but is not required to be, identical to that of all other Storage Nodes within the store
§ Every Storage Node hosts one or more Replication Nodes as determined by its capacity
§ The capacity of a Storage Node serves as a rough measure of the hardware resources associated with it
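§ At a very high level, a Replication Node can be thought of as a single database which contains key-value pairs
§ Replication Nodes are organized into shards; each shard has one master node, and its other members are the replicas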
§ Although there can be only one master node at any given time, any of the members of the shard are
capable of becoming a master node. In other words, each shard uses a single master/multiple replica
strategy to improve read throughput and availability.