Generative AI¶
LLM¶
Large Language Models
Limitations¶
- Bias
- Hallucinations
- Expensive to build & run
ChatGPT¶
- Train supervised policy
    - Provide prompt
    - Labeler demonstrates desired output behavior
    - Fine-tune model
- Collect comparison data & train reward model
    - Prompt and several model outputs are sampled
    - Labeler ranks outputs from best to worst
    - Data used to train reward model
- Policy optimization
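
The reward-model step can be illustrated with a small sketch. The notes do not say which loss is used, so the pairwise ranking loss below is an assumption: for every pair of outputs in a labeler's ranking, the reward model is penalized when it scores the lower-ranked output above the higher-ranked one. The function name `pairwise_ranking_loss` and the example scores are hypothetical.

```python
import math
from itertools import combinations


def pairwise_ranking_loss(scores_best_to_worst):
    """Average -log(sigmoid(r_better - r_worse)) over all pairs in one ranking.

    `scores_best_to_worst` holds the reward model's scalar scores for several
    outputs of the same prompt, ordered as the labeler ranked them
    (best first). This loss is an assumed choice, not taken from the notes.
    """
    # combinations() keeps input order, so the first item of each pair
    # is always the output the labeler ranked higher.
    pairs = list(combinations(scores_best_to_worst, 2))
    total = 0.0
    for better, worse in pairs:
        x = worse - better
        # Numerically stable softplus(x) == -log(sigmoid(better - worse)).
        total += max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return total / len(pairs)


# Scores that agree with the labeler's ranking give a lower loss
# than scores that contradict it.
agreeing = pairwise_ranking_loss([2.0, 1.0, 0.0])
contradicting = pairwise_ranking_loss([0.0, 1.0, 2.0])
print(agreeing < contradicting)
```

Minimizing this quantity pushes the reward model to reproduce the labelers' ordering; the trained reward model then supplies the signal for the policy optimization step.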
GAN¶
Generative Adversarial Networks
flowchart LR
n[/Noise/] ---> g[Generator] --> d
rd[Real Data] --> d[Discriminator] --> rf{Real/Fake}
rf -.->|Backpropagation| d & g
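
The real/fake decision in the diagram becomes a training signal through a loss function. The sketch below is one common formulation, assumed here rather than taken from the notes: binary cross-entropy for the discriminator and the non-saturating variant for the generator. The function names (`bce`, `discriminator_loss`, `generator_loss`) are hypothetical, and the inputs are the discriminator's predicted probabilities that a sample is real.

```python
import math


def bce(p, label):
    """Binary cross-entropy for one predicted probability `p` and a 0/1 label."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """The discriminator should output 1 on real data and 0 on generated data."""
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating loss: the generator wants its samples labelled real (1)."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)


# A discriminator that separates real from fake well has a low loss,
# while the generator's loss is high until its samples fool the discriminator.
print(discriminator_loss([0.9, 0.8], [0.1, 0.2]))
print(generator_loss([0.1, 0.2]))
```

In a full training loop, the two losses are backpropagated alternately: the discriminator's loss updates only the discriminator, and the generator's loss flows through the discriminator to update the generator.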