Vector DB Guide

Uploaded by

Udaypreet Singh
Copyright
© All Rights Reserved
Beginner’s Guide to

Vector Databases
AI by Hand ✍
Prof. Tom Yeh
Hosted by:

Roadmap

Database
Vector Database
Retrieval
Dot Product
Word Embedding
Sentence Embedding
Search
Transformer
Q/A
AI by Hand ✍ 2024 © Tom Yeh 2
Database
Beginner’s Guide to Vector Databases - AI by Hand ✍
Fun fact

There are ___________________ million dogs in the world!



How to create a table?
SQL:
__________ TABLE _________
( id __________,
name __________________,
size _________,
pop _________)

id name size pop



How to insert a record?
SQL:

_____________ INTO animals


_____________ (1, 'dog', 2, 900)

id name size pop


1 dog 2 900
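
The two exercises above can be tried out in Python. This is a minimal sketch, assuming SQLite (through the standard `sqlite3` module) as a stand-in engine; the workshop's own database may differ. The column types are taken from the CREATE TABLE statement on the vector database slide.

```python
import sqlite3

# In-memory SQLite database, used only as a convenient stand-in engine.
conn = sqlite3.connect(":memory:")

# Create the table with the same columns as the slide.
conn.execute(
    "CREATE TABLE animals (id INT, name VARCHAR(10), size INT, pop INT)"
)

# Insert one record. String values such as 'dog' must be quoted in SQL.
conn.execute("INSERT INTO animals VALUES (1, 'dog', 2, 900)")

rows = conn.execute("SELECT id, name, size, pop FROM animals").fetchall()
print(rows)  # [(1, 'dog', 2, 900)]
```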
Vector Database
How to create a vector database?
SQL: CREATE TABLE animals
( id INT,
name VARCHAR(10),
size INT,
pop INT,
emb _________________ not null )

id name size pop



How to insert a record with a vector?
SQL:

INSERT INTO animals


VALUES (1, 'dog', 2, 900, ______________)

id name size pop emb


1 dog 2 900
Retrieval
Which record is relevant to the query “cat”?

Query: cat    emb: 1 2 0

id  name  size  pop    emb
1   dog   2     900    2 1 0
2   bat   1     10000  0 1 2



Draw distance vs similarity

distance similarity



Distance vs similarity on a scale of 1 to 5
asc or desc similarity

asc or desc distance


How to retrieve by similarity? (dot product)

________ name, emb<___>[__,__,__] AS score


FROM animals
________ BY ______ ASC | DESC ;
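
To see what this query computes, here is a minimal Python sketch of the same ranking, assuming the two records and the "cat" query vector from the retrieval slide. The helper name `dot` is illustrative. With a similarity score, the most relevant record has the largest value, so the sort is descending.

```python
# Records from the retrieval slide: name -> embedding.
animals = {"dog": [2, 1, 0], "bat": [0, 1, 2]}
query = [1, 2, 0]  # embedding of "cat"


def dot(u, v):
    """Dot product: multiply matching entries, then sum."""
    return sum(a * b for a, b in zip(u, v))


scores = {name: dot(emb, query) for name, emb in animals.items()}

# Higher similarity means more relevant, so sort in descending order.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # [('dog', 4), ('bat', 2)]
```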



How to retrieve by distance? (Euclidean)

SELECT name, emb<->[1, 2, 0] AS score


FROM animals
ORDER BY score ASC;
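
A minimal Python sketch of distance-based retrieval, assuming the same two records and the "cat" query vector from the retrieval slide. With a distance, the closest record has the smallest score, so the ranking runs in ascending order.

```python
import math

# Records from the retrieval slide: name -> embedding.
animals = {"dog": [2, 1, 0], "bat": [0, 1, 2]}
query = [1, 2, 0]  # embedding of "cat"

# math.dist returns the Euclidean (L2) distance between two points.
distances = {name: math.dist(emb, query) for name, emb in animals.items()}

# Smaller distance means more relevant, so sort in ascending order.
ranked = sorted(distances.items(), key=lambda kv: kv[1])
for name, d in ranked:
    print(name, round(d, 3))
```

Here dog is at distance sqrt(2) and bat at distance sqrt(6), so dog is retrieved first.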



Dot Product
How to compute dot product?
Example:

   1   2   3              dog   2   1   0
   *   *   *                    *   *   *
   2   2   0              cat   1   2   0
   =   =   =   ∑                =   =   =   ∑
   2   4   0   6
      Result                      Result
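
The same arithmetic as a short Python sketch; the function name `dot` is illustrative. It reproduces the worked example on the left and can be used to check the dog and cat exercise on the right.

```python
def dot(u, v):
    """Multiply matching entries, then sum the products."""
    return sum(a * b for a, b in zip(u, v))


# Worked example: 1*2 + 2*2 + 3*0
print(dot([1, 2, 3], [2, 2, 0]))  # 6

# Exercise: dog and cat vectors from the slide: 2*1 + 1*2 + 0*0
dog, cat = [2, 1, 0], [1, 2, 0]
print(dot(dog, cat))  # 4
```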



How to compute dot product using matrix
multiplication?
Example:
                                             dog
               1                              2
               2                              1
               3                              0

    2 2 0      6              cat 1 2 0



How to compute dot product with multiple
vectors?
Example:
                                             dog bat
               1 1                            2   0
               2 1                            1   1
               3 1                            0   2

    2 2 0      6 4            cat 1 2 0       4
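
A minimal sketch of the same computation in plain Python, assuming matrices are stored as lists of rows. The helper `matmul` is illustrative, not a library function. Each column of the right-hand matrix is one stored vector, so a single multiplication scores the query against all of them at once.

```python
def matmul(A, B):
    """Multiply A (m x n) by B (n x p); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Left example: one query row times a matrix with two column vectors.
left = matmul([[2, 2, 0]], [[1, 1], [2, 1], [3, 1]])
print(left)  # [[6, 4]]

# Right example: cat is the query row; dog and bat are the columns.
cat = [[1, 2, 0]]
dog_bat = [[2, 0],
           [1, 1],
           [0, 2]]
right = matmul(cat, dog_bat)
print(right)  # [[4, 2]]
```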



Word Embedding
Where are dog, cat and bat in the “name”
space?





Which embedding is better?

Embedding 1          Embedding 2

dog cat bat          dog cat bat
 2   1   0            2   0   1
 1   2   1            1   1   0
 0   0   2            0   2   2



Which embedding is better?

Dot product similarity (H = high, L = low)

Desired

               dog cat bat
                2   1   0
                1   2   1
                0   0   2

dog  2 1 0          H   L
cat  1 2 0      H       L
bat  0 1 2      L   L

Embedding 1

               dog cat bat
                2   1   0
                1   2   1
                0   0   2

dog  2 1 0
cat  1 2 0
bat  0 1 2

Embedding 2

               dog cat bat
                2   0   1
                1   1   0
                0   2   2

dog  2 1 0
cat  0 1 2
bat  1 0 2
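
One way to compare the two embeddings is to compute the dot product for every pair of names and check it against the desired pattern (dog and cat high, every pair involving bat low). The sketch below is plain Python; the function and variable names are illustrative.

```python
from itertools import combinations

embedding_1 = {"dog": [2, 1, 0], "cat": [1, 2, 0], "bat": [0, 1, 2]}
embedding_2 = {"dog": [2, 1, 0], "cat": [0, 1, 2], "bat": [1, 0, 2]}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pairwise_similarity(embedding):
    """Dot product for every unordered pair of names."""
    return {(a, b): dot(embedding[a], embedding[b])
            for a, b in combinations(embedding, 2)}


sim_1 = pairwise_similarity(embedding_1)
sim_2 = pairwise_similarity(embedding_2)
print(sim_1)  # {('dog', 'cat'): 4, ('dog', 'bat'): 1, ('cat', 'bat'): 2}
print(sim_2)  # {('dog', 'cat'): 1, ('dog', 'bat'): 2, ('cat', 'bat'): 4}
```

Under Embedding 1 the dog and cat pair has the highest score, which matches the desired pattern. Under Embedding 2 the highest score goes to cat and bat instead.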



Sentence Embedding
How to embed sentences?

id comment user emb


1 How are you? John ?
2 Who are you? Mary ?



“How are you” → word embedding vectors
how are you

    a   an  the  how  why  who what  are   is   am   be  was  you   we    I they  she   he  she   me  him  her
    0   -1    0    1    0    1    0    0   -1    1    0    0    0    3    1    0   -1    0    0    0   -1    0
    2    0    2    0    0    0   -1    1    0    0    0    2    1    0    2    0    2    0    0    2    0    0
   -1    0   -1    1    2    0    0    1    0    1   -1    0    0   -1    0    3    0    0   -1    0    2   -1
    0    1    0    0    1    0    1    0    1    0    1   -2    0    0    0    1    0    1    0    1    0    1



Word vectors → Sentence vector
Method 1: Concatenate

how are you

1 0 0
0 1 1
1 1 0
0 0 0
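
A minimal Python sketch of concatenation, assuming the three 4-dimensional word vectors shown in the columns above. The word vectors are simply joined end to end, so the sentence vector's length grows with the number of words.

```python
# Word vectors read off the columns above (how, are, you).
word_vectors = {
    "how": [1, 0, 1, 0],
    "are": [0, 1, 1, 0],
    "you": [0, 1, 0, 0],
}
sentence = ["how", "are", "you"]

# Join the word vectors end to end, in sentence order.
concatenated = [x for word in sentence for x in word_vectors[word]]
print(concatenated)       # [1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0]
print(len(concatenated))  # 12
```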



Word vectors → Sentence vector
Method 2: Average

how are you

1 0 0
0 1 1
1 1 0
0 0 0

id comment user emb


1 How are you? John
2 Who are you? Mary
“Who are you” → word embedding vectors

    a   an  the  how  why  who what  are   is   am   be  was  you   we    I they  she   he  she   me  him  her
    0   -1    0    1    0    1    0    0   -1    1    0    0    0    3    1    0   -1    0    0    0   -1    0
    2    0    2    0    0    0   -1    1    0    0    0    2    1    0    2    0    2    0    0    2    0    0
   -1    0   -1    1    2    0    0    1    0    1   -1    0    0   -1    0    3    0    0   -1    0    2   -1
    0    1    0    0    1    0    1    0    1    0    1   -2    0    0    0    1    0    1    0    1    0    1

who are you
     0   0
     1   1
     1   0
     0   0
Word vectors → Sentence vector
Method 2: Average

who are you

1 0 0
0 1 1
0 1 0
0 0 0

id comment user emb


1 How are you? John [1/3, 2/3, 2/3, 0]
2 Who are you? Mary
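
A minimal Python sketch of the averaging method, assuming the word vectors read off the slides. `Fraction` keeps the thirds exact, and the function name `average` is illustrative. Unlike concatenation, the result always has the same length as a single word vector.

```python
from fractions import Fraction

# Word vectors read off the slides.
word_vectors = {
    "how": [1, 0, 1, 0],
    "who": [1, 0, 0, 0],
    "are": [0, 1, 1, 0],
    "you": [0, 1, 0, 0],
}


def average(words):
    """Average the word vectors entry by entry."""
    vectors = [word_vectors[w] for w in words]
    return [Fraction(sum(column), len(vectors)) for column in zip(*vectors)]


how_are_you = average(["how", "are", "you"])
who_are_you = average(["who", "are", "you"])
print([str(x) for x in how_are_you])  # ['1/3', '2/3', '2/3', '0']
print([str(x) for x in who_are_you])  # ['1/3', '2/3', '1/3', '0']
```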
How to query by SQL?

________ comment, emb<___>[__,__,__,__] AS score


FROM posts
ORDER BY ______ ASC | DESC ;



How to query using a high-level API?

query = Query(post_index)
._________(post)
._________(relevance_space.text, Param("________"))

app.query(query, _________ = "who are you?" )

Source: Superlinked.com



Search
K-Nearest Neighbor, K=3, Dot-Product
Database
ID      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20

emb

Query   9  -8   9   9   0   3   1  -6   0  11   3  13  -2   6  15  -9   7   6  -5   8

{ max | min }



K-Nearest Neighbor, K=3, L2
Database
ID      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20

emb

Query   6   8   9   9   1  10   0   9  12  15   2  13  12   6  15   9   7   6   5   8

{ max | min }
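
A minimal Python sketch of both K=3 searches, assuming the Query rows on the two search slides list the precomputed score of the query against each of the 20 records. For dot product (a similarity) the three largest scores win; for L2 (a distance) the three smallest win.

```python
import heapq

ids = range(1, 21)

# Scores copied from the Query rows of the two search slides.
dot_scores = [9, -8, 9, 9, 0, 3, 1, -6, 0, 11,
              3, 13, -2, 6, 15, -9, 7, 6, -5, 8]
l2_scores = [6, 8, 9, 9, 1, 10, 0, 9, 12, 15,
             2, 13, 12, 6, 15, 9, 7, 6, 5, 8]

# Similarity: keep the K largest scores (max).
top3_dot = heapq.nlargest(3, zip(ids, dot_scores), key=lambda pair: pair[1])
print(top3_dot)  # [(15, 15), (12, 13), (10, 11)]

# Distance: keep the K smallest scores (min).
top3_l2 = heapq.nsmallest(3, zip(ids, l2_scores), key=lambda pair: pair[1])
print(top3_l2)  # [(7, 0), (5, 1), (11, 2)]
```

Each pair is (ID, score). This scans every record, which is the brute-force form of nearest-neighbor search.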



Transformer
How to use a Transformer to get a sentence embedding
vector?

Word Embedding Vectors        Sentence Embedding Vector

1 0 0
0 1 1
1 1 0
0 0 0



How to combine across positions?
                1
                0
                1

1 0 0
0 1 1
1 1 0
0 0 0



How to combine across positions?
                1 0
                0 1
                1 1

1 0 0           1
0 1 1           1
1 1 0           1
0 0 0           0



How to combine across positions?
                1 0 0
                0 1 0
                1 1 1

1 0 0           1 0
0 1 1           1 2
1 1 0           1 1
0 0 0           0 0
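
A minimal Python sketch of this step, reading the slide as a matrix multiplication: the 4 x 3 matrix of word vectors (one column per position) is multiplied on the right by the 3 x 3 matrix shown above it. The names `X`, `A` and `matmul` are illustrative. Each output column is a weighted mix of the input positions.

```python
def matmul(A, B):
    """Multiply A (m x n) by B (n x p); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Word vectors as columns: how, are, you.
X = [[1, 0, 0],
     [0, 1, 1],
     [1, 1, 0],
     [0, 0, 0]]

# Position-mixing matrix from the slide.
A = [[1, 0, 0],
     [0, 1, 0],
     [1, 1, 1]]

mixed = matmul(X, A)
for row in mixed:
    print(row)
# [1, 0, 0]
# [1, 2, 1]
# [1, 1, 0]
# [0, 0, 0]
```

The first two columns match the values already filled in on the slide; the third column is the one left to compute.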



How to combine across features?

                1 0 0
                0 1 1
                1 1 0
                0 0 0
                1 1 1

1 0 -1 0 1



How to combine across features?

                1 0 0
                0 1 1
                1 1 0
                0 0 0
                1 1 1

1 0 -1 0 1      1 0 1
0 1  1 0 0
How to combine across positions and
features?
                1 0 0
                0 1 0
                1 1 1

1 0 0             0 0
0 1 1             2 1
1 1 0             1 0
0 0 0             0 0
                1 1 1

1 0 -1 0 1



How to use a Transformer to get a sentence embedding
vector?
Word Embedding Vectors        Sentence Embedding Vector

                1 0 0
                0 1 0
                1 1 1

1 0 0             0 0
0 1 1             2 1
1 1 0             1 0
0 0 0             0 0
                1 1 1

1 0 -1 0 1
0 1  1 0 0      2 3 1
0 0  0 1 1      1 1 1
0 0  1 1 0      1 1 0
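
The whole slide can be checked with a short Python sketch. It reads the layout as two matrix multiplications: first mix across positions, then append the row of ones and mix across features. The names `X`, `A`, `W` and `matmul` are illustrative. Reading the row of ones as a bias input is an interpretation; the slides do not state it. This reproduces only the simplified linear arithmetic shown here, not a full Transformer layer.

```python
def matmul(A, B):
    """Multiply A (m x n) by B (n x p); both are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# Word vectors as columns: how, are, you.
X = [[1, 0, 0],
     [0, 1, 1],
     [1, 1, 0],
     [0, 0, 0]]

# Position-mixing matrix (3 x 3) from the slide.
A = [[1, 0, 0],
     [0, 1, 0],
     [1, 1, 1]]

# Feature-mixing matrix (4 x 5); the fifth weight multiplies the row of ones.
W = [[1, 0, -1, 0, 1],
     [0, 1, 1, 0, 0],
     [0, 0, 0, 1, 1],
     [0, 0, 1, 1, 0]]

mixed = matmul(X, A)             # combine across positions
augmented = mixed + [[1, 1, 1]]  # append the row of ones
output = matmul(W, augmented)    # combine across features

for row in output:
    print(row)
# [1, 0, 1]
# [2, 3, 1]
# [1, 1, 1]
# [1, 1, 0]
```

Rows two to four match the values printed on the slide; the first row is the one left blank there.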
Q/A