NLP
Natural Language Processing
Summer Camp
Email: renato.hermoza@pucp.edu.pe
Web: https://renato145.github.io
Contents
1. Intro
2. IMDB challenge
3. Classic methods for NLP
4. Deep learning for NLP
5. Language models
6. Fine-tuning
7. Transformer
8. Large language models
9. Future
0. Some tools
- https://aistudio.google.com
- https://console.anthropic.com
- https://web2md.answer.ai/
1. Intro
IMDB challenge
Classic methods for NLP
Deep learning for NLP
Language models
Fine-tuning
Transformer
Large language models
Future
1.1 Questions
Experience with:
● NLP?
● CV?
● DL?
● ML?
● Using LLMs?
1.2 Setup
- https://colab.google/
- https://github.com/renato145/pucp_bootcamp_202401
● Accessibility
● Open models
● Regulations
● Ethical issues
1.9 Timeline
1997: LSTM (Hochreiter and Schmidhuber, 1997)
2007: Google Translate: SMT
2011: IMDB dataset (Maas et al., 2011)
2015: ImageNet (Russakovsky et al., 2015)
2016: Google Translate: GNMT
2017: ULMFiT (Howard and Ruder, 2018), Transformer architecture (Vaswani et al., 2017)
2018: ELMo (Peters et al., 2018), GPT-1 (Radford et al., 2018)
2019: GPT-2 (Solaiman et al., 2019)
2020: GPT-3 (Brown et al., 2020)
2023: GPT-4
1.10 Libraries
● scikit-learn
● Hugging Face
● PyTorch
● FastAI
● Axolotl
Intro
2. IMDB challenge
Classic methods for NLP
Deep learning for NLP
Language models
Fine-tuning
Transformer
Large language models
Future
3. Classic methods for NLP
Deep learning for NLP
Language models
Fine-tuning
Transformer
Large language models
Future
?
Word counts per document (bag of words):
Document 1: 5 4
Document 2: 1 2 1
Document 3: 2 3 2
…
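A minimal sketch of how such a count matrix can be built with scikit-learn (one of the libraries listed earlier); the example documents below are invented for illustration:

    # Bag-of-words document-term counts
    from sklearn.feature_extraction.text import CountVectorizer

    docs = [
        "the movie was great great great",
        "the plot was weak",
        "great plot, weak acting",
    ]
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)          # sparse matrix: rows = documents, columns = terms
    print(vectorizer.get_feature_names_out())   # vocabulary (the column labels)
    print(X.toarray())                          # per-document term counts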
Notebook 1
Intro
IMDB challenge
Classic methods for NLP
4. Deep learning for NLP
Language models
Fine-tuning
Transformer
Large language models
Future
?
[Diagram: an input sequence x1, x2, x3, x4, …]
[Diagram: the model output is compared with the desired y by a loss function (loss)]
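A tiny PyTorch illustration of that comparison (the logits and targets are made up):

    import torch
    import torch.nn.functional as F

    output = torch.tensor([[2.0, -1.0], [0.5, 1.5]])  # model output: logits for 2 examples
    y = torch.tensor([0, 1])                          # desired y: target classes
    loss = F.cross_entropy(output, y)                 # loss function compares output vs. target
    print(loss.item())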
[Diagram: Text input → Tokens → Embeddings → RNN layers (encoder) → Features → Model head → Output]
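A minimal PyTorch sketch of this pipeline; the sizes, the LSTM choice and the classification head are illustrative assumptions, not the course notebooks:

    import torch
    import torch.nn as nn

    class TinyTextClassifier(nn.Module):
        def __init__(self, vocab_size=10_000, emb_dim=64, hidden_dim=128, n_classes=2):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)                    # token ids -> embeddings
            self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)   # RNN layers (encoder)
            self.head = nn.Linear(hidden_dim, n_classes)                    # model head

        def forward(self, token_ids):                # token_ids: (batch, seq_len)
            x = self.emb(token_ids)
            _, (h, _) = self.encoder(x)              # h: (1, batch, hidden_dim)
            return self.head(h[-1])                  # output: logits per class

    logits = TinyTextClassifier()(torch.randint(0, 10_000, (4, 20)))
    print(logits.shape)                              # torch.Size([4, 2])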
4.7 Timeline
1997: LSTM (Hochreiter and Schmidhuber, 1997)
2007: Google Translate: SMT
2011: IMDB dataset (Maas et al., 2011)
2015: ImageNet (Russakovsky et al., 2015)
2016: Google Translate: GNMT
2017: ULMFiT (Howard and Ruder, 2018), Transformer architecture (Vaswani et al., 2017)
2018: ELMo (Peters et al., 2018), GPT-1 (Radford et al., 2018)
2019: GPT-2 (Solaiman et al., 2019)
2020: GPT-3 (Brown et al., 2020)
2023: GPT-4
Notebook 2
Intro
IMDB challenge
Classic methods for NLP
Deep learning for NLP
5. Language models
Fine-tuning
Transformer
Large language models
Future
[Diagram: a language model predicts the next word step by step — "La gata" → "La gata arañó" → "La gata arañó el" → … ("The cat" → "The cat scratched" → "The cat scratched the" → …). Pipeline: Tokens → Embeddings → AWD_LSTM (encoder) → Features → Model head → Next word]
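A minimal sketch of this with fastai (the AWD_LSTM used in the deck); the dataset and hyperparameters are illustrative assumptions, not the actual notebook:

    from fastai.text.all import *

    # Use the IMDB sample as a language-modelling corpus (is_lm=True: the target is the next word)
    path = untar_data(URLs.IMDB_SAMPLE)
    dls_lm = TextDataLoaders.from_csv(path, 'texts.csv', text_col='text', is_lm=True)

    learn = language_model_learner(dls_lm, AWD_LSTM, metrics=accuracy)
    learn.fit_one_cycle(1, 1e-2)

    # Ask the model to continue a prompt with its predicted next words
    print(learn.predict("La gata arañó", n_words=5))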
Notebook 3
Intro
IMDB challenge
Classic methods for NLP
Deep learning for NLP
Language models
6. Fine-tuning
Transformer
Large language models
Future
6.1 Fine-tuning
ImageNet (Russakovsky et al., 2015)
6.2 Fine-tuning
[Diagram: a pre-trained network (convolutions) extracts features/characteristics that can be reused by different task heads: classification, segmentation, detection, survival, …]
6.3 Fine-tuning
[Diagram: pre-trained language model — Text input → Tokens → Embeddings → AWD_LSTM (encoder) → Features → Model head → Next word]
6.3 Fine-tuning
[Diagram: fine-tuning on a new task — Text input → Embeddings → AWD_LSTM (encoder) are kept from the pre-trained model, and the next-word model head is replaced by a head for the new task]
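A minimal sketch of this ULMFiT-style recipe with fastai; dataset, column names and hyperparameters are illustrative assumptions:

    from fastai.text.all import *

    path = untar_data(URLs.IMDB_SAMPLE)

    # 1) Fine-tune a language model on the target corpus and keep its encoder
    dls_lm = TextDataLoaders.from_csv(path, 'texts.csv', text_col='text', is_lm=True)
    lm = language_model_learner(dls_lm, AWD_LSTM)
    lm.fit_one_cycle(1, 1e-2)
    lm.save_encoder('ft_encoder')

    # 2) Build a classifier for the new task, reusing the pre-trained encoder
    dls_clas = TextDataLoaders.from_csv(path, 'texts.csv', text_col='text',
                                        label_col='label', valid_col='is_valid',
                                        text_vocab=dls_lm.vocab)
    clas = text_classifier_learner(dls_clas, AWD_LSTM, metrics=accuracy)
    clas.load_encoder('ft_encoder')   # keep the encoder, swap in a new model head
    clas.fine_tune(1)
    print(clas.predict("What a wonderful movie!"))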
6.4 Timeline
1997: LSTM (Hochreiter and Schmidhuber, 1997)
2007: Google Translate: SMT
2011: IMDB dataset (Maas et al., 2011)
2015: ImageNet (Russakovsky et al., 2015)
2016: Google Translate: GNMT
2017: ULMFiT (Howard and Ruder, 2018), Transformer architecture (Vaswani et al., 2017)
2018: ELMo (Peters et al., 2018), GPT-1 (Radford et al., 2018)
2019: GPT-2 (Solaiman et al., 2019)
2020: GPT-3 (Brown et al., 2020)
2023: GPT-4
Notebook 4
Intro
IMDB challenge
Classic methods for NLP
Deep learning for NLP
Language models
Fine-tuning
7. Transformer
Large language models
Future
7.1 Transformers
Attention Is All You Need (Vaswani et al., 2017).
[Diagram sequence: each word is passed through linear layers that produce three d-dimensional vectors — a query Q1, a key K1 and a value V1 for word-1; Q2, K2 and V2 for word-2; and so on. The dot product Q1 K2^T is a scalar that scores how strongly word 1 attends to word 2. Stacking the vectors of a 4-word input gives matrices Q, K and V of size [4 x d]; Q K^T is then the [4 x 4] matrix of attention scores, which, after a softmax, weights the rows of V: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Vaswani et al., 2017).]
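A minimal sketch of scaled dot-product attention matching the shapes above (single head, no masking; an illustration rather than the paper's full multi-head implementation):

    import torch
    import torch.nn.functional as F

    d = 8                                   # dimension of each query/key/value vector
    x = torch.randn(4, d)                   # 4 "words" already turned into vectors

    # Linear layers that produce Q, K and V
    W_q = torch.nn.Linear(d, d, bias=False)
    W_k = torch.nn.Linear(d, d, bias=False)
    W_v = torch.nn.Linear(d, d, bias=False)
    Q, K, V = W_q(x), W_k(x), W_v(x)        # each [4 x d]

    scores = Q @ K.T / d ** 0.5             # [4 x 4] attention scores: Q K^T / sqrt(d_k)
    weights = F.softmax(scores, dim=-1)     # each row sums to 1
    out = weights @ V                       # [4 x d]: weighted mix of the value vectors
    print(weights.shape, out.shape)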
7.6 Transformers
Input
word-1 word-2 word-3 word-4 word-5 … word-n
7.7 Transformers
Attention Is All You Need (Vaswani et al., 2017).
7.8 Timeline
1997: LSTM (Hochreiter and Schmidhuber, 1997)
2007: Google Translate: SMT
2011: IMDB dataset (Maas et al., 2011)
2015: ImageNet (Russakovsky et al., 2015)
2016: Google Translate: GNMT
2017: ULMFiT (Howard and Ruder, 2018), Transformer architecture (Vaswani et al., 2017)
2018: ELMo (Peters et al., 2018), GPT-1 (Radford et al., 2018)
2019: GPT-2 (Solaiman et al., 2019)
2020: GPT-3 (Brown et al., 2020)
2023: GPT-4
7.9 Transformers
From Hugging Face.
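As a concrete illustration of the Hugging Face ecosystem, a minimal transformers pipeline call (the underlying model is whatever default the library selects, downloaded on first use):

    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    print(classifier("This movie was an absolute masterpiece."))
    # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]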
7.10 Transformers
Notebook 5
Main improvements:
- RoPE: Rotary Positional Embeddings (details; see the sketch after this list).
- Alternating Attention.
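A rough sketch of the RoPE idea: each pair of query/key dimensions is rotated by an angle that grows with the token's position (a simplified illustration, not a production implementation):

    import torch

    def rope(x, positions, base=10000.0):
        # x: [seq_len, d] with d even; rotate each (even, odd) pair of dimensions
        d = x.shape[-1]
        inv_freq = base ** (-torch.arange(0, d, 2) / d)   # theta_i for each pair of dims
        angles = positions[:, None] * inv_freq[None, :]   # [seq_len, d/2]
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., 0::2], x[..., 1::2]
        out = torch.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    q = torch.randn(6, 8)                                 # 6 positions, 8-dim queries
    q_rot = rope(q, torch.arange(6, dtype=torch.float32))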
8. Large language models
Future
From Hugging Face.
From OpenAI.
https://huggingface.co/datasets/xinlai/Math-Step-DPO-10K
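The dataset linked above can be pulled with the Hugging Face datasets library (a sketch; assuming it exposes a train split):

    from datasets import load_dataset

    # Step-wise preference data used for DPO-style training
    ds = load_dataset("xinlai/Math-Step-DPO-10K", split="train")
    print(ds)               # number of rows and column names
    print(ds[0].keys())     # inspect one preference example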
8.5 Timeline
1997: LSTM (Hochreiter and Schmidhuber, 1997)
2007: Google Translate: SMT
2011: IMDB dataset (Maas et al., 2011)
2015: ImageNet (Russakovsky et al., 2015)
2016: Google Translate: GNMT
2017: ULMFiT (Howard and Ruder, 2018), Transformer architecture (Vaswani et al., 2017)
2018: ELMo (Peters et al., 2018), GPT-1 (Radford et al., 2018)
2019: GPT-2 (Solaiman et al., 2019)
2020: GPT-3 (Brown et al., 2020)
2023: GPT-4
Notebook 6
9. Future
9.1 Future
https://www.fast.ai/posts/2023-09-04-learning-jumps/
9.2 Future
Open source LLMs:
● https://huggingface.co/
● https://ai.meta.com/llama/
● https://mistral.ai/
9.3 Future
https://www.nytimes.com/es/2023/12/27/espanol/new-york-times-demanda-openai-microsoft.html
9.4 Future
"A framework for understanding unintended consequences of machine learning." (Suresh
and Guttag, 2019).
9.5 Future
9.6 Future
9.7 Future
Ethics for Data Science
https://www.youtube.com/watch?v=krIVOb23EH8
References
● Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
● Maas, Andrew, et al. "Learning word vectors for sentiment analysis." Proceedings of the 49th annual meeting of the
association for computational linguistics: Human language technologies. 2011.
● Whitelaw, Casey, et al. "Using the web for language independent spellchecking and autocorrection." (2009).
● Russakovsky, Olga, et al. "ImageNet large scale visual recognition challenge." International journal of computer vision 115 (2015): 211-252.
● Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
● Christiano, Paul F., et al. "Deep reinforcement learning from human preferences." Advances in neural information processing
systems 30 (2017).
● Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint
arXiv:1801.06146 (2018).
● Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018).
● Radford, Alec, et al. "Improving language understanding by generative pre-training." (2018).
● Solaiman, Irene, et al. "Release strategies and the social impacts of language models." arXiv preprint arXiv:1908.09203 (2019).
● Suresh, Harini, and John V. Guttag. "A framework for understanding unintended consequences of machine learning." arXiv
preprint arXiv:1901.10002 2.8 (2019).
● Brown, Tom, et al. "Language models are few-shot learners." Advances in neural information processing systems 33 (2020):
1877-1901.
● Gu, Albert, and Tri Dao. "Mamba: Linear-time sequence modeling with selective state spaces." arXiv preprint arXiv:2312.00752
(2023).
● Rafailov, Rafael, et al. "Direct preference optimization: Your language model is secretly a reward model." Advances in Neural
Information Processing Systems 36 (2024).