TDA357-L10-NoSQL,JSON1
[Figure: course data drawn as a tree. The courses "Databases" (db) and "Programmerade system" (ps) each have a name and instances (Period 2, Period 3), and each instance has a teacher (Jonas Duregård, Ana Bove) and a number of students (230, 160, 140). Annotation: this could be a (directed) graph if these (e.g. the repeated teacher nodes) were connected/merged.]
Examples of document-based semi-structured data (SSD) standards
• XML
• Extensible Markup Language
• Created in the 1990s
• Syntax: <tag attribute="value"><other_tag/>some text</tag>
• JSON
• JavaScript Object Notation
• Created in the 2000s
• Collections of key/value pairs, very simple syntax
• Used to various extents in many modern DBMSs
• Both are document based: a data set is most naturally described by a text
document rather than a table
• Both are hierarchical, the documents have a tree-structure
XML
• Derived from pre-existing document markup languages
• Compare with HTML: HTML uses tags to format a web-page, XML
uses tags to describe data
• Documents are built from elements, attributes and text
One way to express this tree in XML:
<instance name="Period 2">
  <teacher>Jonas</teacher>
  <students>230</students>
</instance>
[Figure: the corresponding tree, an instance node (name: Period 2) with the children teacher (Jonas) and students (230); annotation: closing tags]
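To see how a program reads attributes and elements differently, here is a minimal sketch that parses the instance above with Python's standard xml.etree.ElementTree module. The choice of library is an assumption for illustration; any XML library offers similar lookups.

```python
import xml.etree.ElementTree as ET

# The instance from the slide: "name" is an attribute, teacher/students are child elements
doc = """<instance name="Period 2">
  <teacher>Jonas</teacher>
  <students>230</students>
</instance>"""

root = ET.fromstring(doc)

name = root.get("name")                     # attribute lookup
teacher = root.find("teacher").text         # text inside the <teacher> element
students = int(root.find("students").text)  # element text is always a string, so convert

print(root.tag, name, teacher, students)    # instance Period 2 Jonas 230
```

Note that XML itself has no number type: "230" is just text until the program decides to convert it.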
• Attributes and elements can be mixed however we want. What should we use?
• Having firstname as an attribute and lastname as an element seems odd
• Maybe firstname/lastname should be attributes, and courses elements (the
names feel "attributy" whereas a course feels more "entityish")
First version:
<Teacher>
  <Firstname>Jonas</Firstname>
  <Lastname>Duregård</Lastname>
  <Course>Databases</Course>
</Teacher>
Second version, with courses as elements (an extra Name element avoids mixing text and elements):
<Teacher>
  <Firstname>Jonas</Firstname>
  <Lastname>Duregård</Lastname>
  <Course>
    <Name>Databases</Name>
    <Code>TDA357</Code>
  </Course>
  <Course>
    <Name>Programmerade system</Name>
    <Code>TDA143</Code>
  </Course>
</Teacher>
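Representing courses as elements means a teacher can have any number of them, each with its own sub-elements. A minimal sketch, again assuming Python's xml.etree.ElementTree, that reads the second version of the document:

```python
import xml.etree.ElementTree as ET

# The second version: each course is an element with its own Name and Code
doc = """<Teacher>
  <Firstname>Jonas</Firstname>
  <Lastname>Duregård</Lastname>
  <Course>
    <Name>Databases</Name>
    <Code>TDA357</Code>
  </Course>
  <Course>
    <Name>Programmerade system</Name>
    <Code>TDA143</Code>
  </Course>
</Teacher>"""

teacher = ET.fromstring(doc)

# findall returns every child with the given tag, so repeated <Course> elements are fine
courses = [(c.findtext("Code"), c.findtext("Name")) for c in teacher.findall("Course")]

for code, name in courses:
    print(code, name)
```

This prints one line per course (TDA357 Databases, then TDA143 Programmerade system). With courses as attributes, the second course would have nowhere to go, since an element cannot repeat an attribute name.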
Summary: Attributes vs elements
• Advantages of attributes:
• Compact syntax
• Correspond naturally to attributes in relational databases
• Advantages of elements:
• Can represent complex objects (with attributes, subelements etc.)
• Can have arbitrarily many elements with the same tag
• Easily extensible (remember: we are using XML for flexibility!)
• Compare with ER-modelling: Anything that needs to have attributes
of its own can never be an attribute
• Often, elements are used to represent the actual data, while
attributes are used to describe "modifiers" of tags
Is an XML document a database?
• Yes, in a wide sense
• It contains data in a structured, (sort of) persistent manner
• It is very unlike a relational database:
• There is no "XML-server" corresponding to PostgreSQL
• There is no insert operation that adds data into a document
• Documents are either generated by a program or written by hand
• We typically do not write queries on our documents (but we can)
• Documents are processed by programs, using library functions etc.
• We do not have constraints on documents (but they can be validated)
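Since there is no server or insert operation, "adding data" means a program changes the document itself. A minimal sketch with Python's xml.etree.ElementTree; the helper add_course is hypothetical, written only for this illustration:

```python
import xml.etree.ElementTree as ET

# Build a small document in a program (there is no XML server with an INSERT statement)
teacher = ET.Element("Teacher")
ET.SubElement(teacher, "Firstname").text = "Jonas"
ET.SubElement(teacher, "Lastname").text = "Duregård"

def add_course(parent, name, code):
    """Hypothetical helper: 'inserting' data just changes the in-memory tree."""
    course = ET.SubElement(parent, "Course")
    ET.SubElement(course, "Name").text = name
    ET.SubElement(course, "Code").text = code
    return course

add_course(teacher, "Databases", "TDA357")

# A simple "query" using a library function: the names of all courses
names = [c.findtext("Name") for c in teacher.findall("Course")]

# Serialize the tree back into a text document
text = ET.tostring(teacher, encoding="unicode")
print(names)
print(text)
```

Nothing here enforces constraints: the program could just as well add a Course without a Code. Checking that would take a separate validation step.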
JSON
• Think of JSON documents as Java(Script) objects without any methods
• Objects that can have variables (that are objects or primitive types)
• The "variable names" are called keys in JSON (always strings)
• This document contains an object that has a single variable/key, "Teacher"
• The value of "Teacher" is an object containing three variables of type String
{
  "Teacher": {
    "Firstname": "Jonas",
    "Lastname": "Duregård",
    "Course": "Databases"
  }
}
• Each { ... } starts/ends an object, and each entry has the form key: value
• This (and all JSON) is actual JavaScript code!
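Most languages ship a JSON parser that turns objects into their own map/dictionary type. A minimal sketch using Python's standard json module: objects become dicts with string keys, and JSON strings become Python strings.

```python
import json

# The teacher document as text
doc = """
{
  "Teacher": {
    "Firstname": "Jonas",
    "Lastname": "Duregård",
    "Course": "Databases"
  }
}
"""

data = json.loads(doc)          # the outer object becomes a dict

teacher = data["Teacher"]       # the single top-level key
print(sorted(teacher.keys()))   # ['Course', 'Firstname', 'Lastname']
print(teacher["Course"])        # Databases
```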
XML or JSON
• Here is a tiny XML document, and a tiny JSON document
• Notice how they are doing pretty much the same thing?
<Teacher>
  <Firstname>Jonas</Firstname>
  <Lastname>Duregård</Lastname>
  <Course>Databases</Course>
</Teacher>
XML: four elements and three strings

{
  "Teacher": {
    "Firstname": "Jonas",
    "Lastname": "Duregård",
    "Course": "Databases"
  }
}
JSON: two objects, four keys, and three strings
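One way to convince yourself that the two documents carry the same data is to convert one into the other. The sketch below uses a naive mapping of our own (not a standard): an element with children becomes an object, and a leaf element becomes its text. It assumes no attributes and no repeated tags, both of which would need extra rules.

```python
import json
import xml.etree.ElementTree as ET

xml_doc = """<Teacher>
  <Firstname>Jonas</Firstname>
  <Lastname>Duregård</Lastname>
  <Course>Databases</Course>
</Teacher>"""

json_doc = """{"Teacher": {"Firstname": "Jonas",
                           "Lastname": "Duregård",
                           "Course": "Databases"}}"""

def element_to_json(elem):
    """Naive mapping: children -> object, leaf -> its text.
    Ignores attributes and assumes each child tag occurs once."""
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: element_to_json(child) for child in children}

root = ET.fromstring(xml_doc)
converted = {root.tag: element_to_json(root)}

print(converted == json.loads(json_doc))  # True
```

The four XML elements turn into the four JSON keys, and the three text strings stay three strings.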
XML or JSON
• Both XML and JSON can be used as semi-structured data formats
• E.g. to receive data from a web server, for data exchange etc.
• Both are used in practice and there are good arguments for using either
• Traditionally this course taught only XML; over the last few years we have been
switching focus towards JSON
• It has simpler syntax
• It is growing quickly into the standard data format of the web
• JSON is now used in the Assignment (Task 4)
So what will I need to know for the exam?
• Read and understand XML documents (already done)
• Read, write, validate and query JSON documents
(rest of this week's lectures, and the exercise on Friday)
• Most old exam questions (pre-2018 or so) about XML can be
translated into corresponding JSON questions (replacing DTD with
JSON Schema and XPath with JSONPath)
Full syntax of JSON
Every JSON document is built from a combination of six types:
Structures:
• Objects: { "key1" : JSON, "key2" : JSON, ...}
Can have 0 or more key:value pairs, values can be any JSON value
• Arrays/lists: [JSON, JSON, ...]
Can have 0 or more items, each can be any JSON value
Literals (usually the "leaves" in the document tree):
• Java-esque strings: "Hello world!\n"
• Numbers: 7, 5.3
• Booleans: true or false
• null
The definition is recursive: object values and array items can themselves be any JSON value
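The recursive definition translates directly into a recursive function. A minimal sketch, assuming Python's json module, which maps object to dict, array to list, string to str, number to int/float, true/false to bool and null to None. The function names json_type and count_types are our own:

```python
import json

def json_type(value):
    """Name the JSON type of a value produced by json.loads.
    bool is checked before numbers because bool is a subclass of int in Python."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

def count_types(value, counts=None):
    """Walk the document recursively, counting how often each of the six types occurs."""
    if counts is None:
        counts = {}
    t = json_type(value)
    counts[t] = counts.get(t, 0) + 1
    if t == "object":                 # recurse into every value
        for v in value.values():
            count_types(v, counts)
    elif t == "array":                # recurse into every item
        for v in value:
            count_types(v, counts)
    return counts

doc = json.loads('{"s": "Hello world!\\n", "n": [7, 5.3], "b": true, "z": null}')
print(count_types(doc))
```

Objects and arrays are the only cases that recurse; the four literal types are the leaves.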
An example document
[{"city": "Boston", "population": 700000}
,{"city": "New York",
  "boroughs": [
    "The Bronx",
    "Brooklyn",
    "Manhattan",
    "Queens",
    "Staten Island"
  ]
 }
]
• An array containing two objects
• First object has two keys: city (string) and population (number)
• Second object has two keys: city (string) and boroughs (array)
• The array in boroughs contains five strings
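The description of the document can be checked mechanically. A minimal sketch, assuming Python's json module, that lists each object's keys together with the Python type chosen for each value:

```python
import json

d = json.loads("""
[{"city": "Boston", "population": 700000}
,{"city": "New York",
  "boroughs": ["The Bronx", "Brooklyn", "Manhattan", "Queens", "Staten Island"]}
]""")

for obj in d:
    # Each key with the type of its value: str = JSON string, int = JSON number, list = JSON array
    print({key: type(value).__name__ for key, value in obj.items()})
```

This prints {'city': 'str', 'population': 'int'} for Boston and {'city': 'str', 'boroughs': 'list'} for New York.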
Simple paths in JSON
• We can use Java-like object syntax to address sub-values in documents
• If we call the example document above d, what is the value of d[1].boroughs[2]?
• Answer: "Manhattan"
• What about d[0]?
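In a program the same path is just a chain of lookups. In this Python sketch (json module assumed), each .key becomes ["key"], and array indices start at 0:

```python
import json

d = json.loads("""[{"city": "Boston", "population": 700000},
                   {"city": "New York",
                    "boroughs": ["The Bronx", "Brooklyn", "Manhattan",
                                 "Queens", "Staten Island"]}]""")

# d[1].boroughs[2]: second object, key "boroughs", third item
print(d[1]["boroughs"][2])   # Manhattan

# d[0] is the whole first object
print(d[0])                  # {'city': 'Boston', 'population': 700000}
```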
id | content
----+------------------------------------------------------
2 | {"prop": {"size": 15434}, "picture": "funnycat.gif"}
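Always remember the difference between e.g. a JSON number and an SQL number: -> returns a jsonb value, so to compare it with an SQL number, extract it as text with ->> and cast it, e.g. (content->'prop'->>'size')::int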
Nested access
• Select the size property of all posts
SELECT id, content, content->'prop'->'size' AS postsize
FROM Posts;
id | content | postsize
----+------------------------------------------------------+----------
1 | {"link": "https://xkcd.com/327/", "preview": true} |
2 | {"prop": {"size": 15434}, "picture": "funnycat.gif"} | 15434
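The empty postsize for post 1 is SQL NULL. Post 1 has no "prop" key, so content->'prop' is NULL, and applying ->'size' to NULL gives NULL again. The Python sketch below imitates this chaining. The function arrow and the in-memory posts list are hypothetical stand-ins for the Postgres operator and the Posts table, and the sketch only covers object keys (not array indices).

```python
import json

# Hypothetical stand-in for the Posts table: (id, content) pairs
posts = [
    (1, json.loads('{"link": "https://xkcd.com/327/", "preview": true}')),
    (2, json.loads('{"prop": {"size": 15434}, "picture": "funnycat.gif"}')),
]

def arrow(value, key):
    """Rough imitation of Postgres `value -> 'key'` on objects.
    A missing key, or a None/non-object input, gives None (standing in for SQL NULL).
    Unlike Postgres, this does not distinguish JSON null from SQL NULL."""
    if isinstance(value, dict):
        return value.get(key)
    return None

for post_id, content in posts:
    postsize = arrow(arrow(content, "prop"), "size")
    print(post_id, postsize)
```

This prints 1 None and 2 15434, matching the empty and filled postsize cells above.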
Plan:
• Group rows by their author
• for each group use jsonb_agg to create a JSON array item for every post
• use jsonb_build_object to create objects in the array
Example continued
Table: Posts
 id | author | …
----+--------+---
  1 | Jonas  | …
  2 | Jonas  | …
  3 | sanoJ  | …
SELECT author, jsonb_agg(jsonb_build_object(
'postid',id,
'user',author)) AS jsondata
FROM Posts
GROUP BY author;
author | jsondata
--------+-----------------------------------------------------------
sanoJ | [{"user":"sanoJ","postid":3}]
Jonas | [{"user":"Jonas","postid":1},{"user":"Jonas","postid": 2}]
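The three steps of the plan can be mimicked in ordinary code, which may help when reading the query. This Python sketch uses a hypothetical in-memory list of (id, author) rows instead of the real table; it illustrates what GROUP BY plus jsonb_agg computes, not how Postgres evaluates it.

```python
from collections import defaultdict
import json

# Hypothetical rows of the Posts table (only the columns the query uses)
rows = [(1, "Jonas"), (2, "Jonas"), (3, "sanoJ")]

# GROUP BY author: collect the rows of each group
groups = defaultdict(list)
for post_id, author in rows:
    # jsonb_build_object('postid', id, 'user', author) for each row in the group
    groups[author].append({"postid": post_id, "user": author})

# jsonb_agg: each group's objects become one JSON array
for author, items in groups.items():
    print(author, json.dumps(items))
```

The keys come out in a different order than in the Postgres result because jsonb does not keep keys in insertion order; as JSON objects the results are the same.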
Going crazy with correlated queries
• This beautiful query creates a JSON object for each user, containing
their basic information and an array of posts
SELECT json_build_object(
         'uid', uname,
         'email', email,
         'posts', (SELECT COALESCE(jsonb_agg(jsonb_build_object(
                                     'postid', id,
                                     'time', created)
                                   ), '[]')
                   FROM Posts WHERE author = U.uname)
       )
FROM Users U;
• A single item in the SELECT, building a massive JSON object for each row in Users
• U.uname refers to the outer query
• jsonb_agg gives (SQL) null for empty sets, so coalesce to []
• One row from the result of the query on the last slide:
{"uid" : "Jonas",
"email" : "jonas.duregard@chalmers.se",
"posts" :
[{"time": "2021-11-02T14:52:37.451796",
"postid": 1}
,{"time": "2021-11-02T14:52:37.453225",
"postid": 2}
]
}
• The value 2 is a number in an object in an array in an object
• Calling this object x, it is at position x.posts[1].postid
• It tells us that one of the posts for user Jonas has ID 2
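The correlated query has the same shape as a nested loop: for each user, pick out that user's posts. A Python sketch of the result it builds, using hypothetical in-memory tables (the second user "newuser", with no posts, is invented to show the empty-array case):

```python
import json

# Hypothetical in-memory versions of the Users and Posts tables
users = [("Jonas", "jonas.duregard@chalmers.se"), ("newuser", "newuser@example.com")]
posts = [  # (id, author, created)
    (1, "Jonas", "2021-11-02T14:52:37.451796"),
    (2, "Jonas", "2021-11-02T14:52:37.453225"),
]

def user_json(uname, email):
    # Correlated subquery: only the posts whose author is this user (author = U.uname)
    user_posts = [{"postid": pid, "time": created}
                  for pid, author, created in posts if author == uname]
    # In SQL, jsonb_agg over no rows gives NULL, hence COALESCE(..., '[]');
    # here the list comprehension already yields [] for a user without posts.
    return {"uid": uname, "email": email, "posts": user_posts}

result = [user_json(u, e) for u, e in users]
x = result[0]
print(json.dumps(x, indent=2))
print(x["posts"][1]["postid"])  # 2
```

The same pattern of building the object in SQL and then just reading one JSON value in Java/Python is what makes the one-query approach for the assignment attractive.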
Regarding the assignment
• In the assignment you are supposed to create a JSON object for a given
student containing lots of information, including lists of passed courses
and such
• Hey, that sounds a lot like the thing on the previous slide!
• It is possible to solve the whole information part of Task 4 using one
glorious query
• Experiment in a .sql file until it works, then move it into Java/Python
• You don’t have to use the JSON features of Postgres at all in the lab, but I
highly recommend it
Tomorrow
• JSON Schema
• Final piece of the puzzle for the assignment!
• JSONPath