
Generative AI

LLM

Large Language Models

Limitations

  • Bias
  • Hallucinations
  • Expensive to build & run

ChatGPT

  1. Train supervised policy
     • Provide prompt
     • Labeler demonstrates desired output behavior
     • Fine-tune model
  2. Collect comparison data & train reward model
     • Prompt and several model outputs are sampled
     • Labeler ranks outputs from best to worst
     • Data used to train reward model
  3. Policy optimization
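
The reward-model steps above (sample several outputs for a prompt, have a labeler rank them, train on the rankings) can be sketched roughly as follows. The notes do not say which loss is used; this sketch assumes a common pairwise formulation, `-log(sigmoid(r_preferred - r_rejected))`, averaged over every pair implied by the ranking. The names `ranking_to_pairs`, `pairwise_loss`, `reward_model_loss`, and the toy reward table are hypothetical and exist only for illustration.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_outputs):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs.

    combinations() keeps input order, so the first element of each pair
    is always the one the labeler ranked higher.
    """
    return list(combinations(ranked_outputs, 2))


def pairwise_loss(reward_preferred, reward_rejected):
    """Assumed loss: -log(sigmoid(r_preferred - r_rejected)).

    Written as softplus(-diff) in a numerically stable form.
    """
    diff = reward_preferred - reward_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def reward_model_loss(ranked_outputs, reward_fn):
    """Average pairwise loss over all pairs implied by one ranking."""
    pairs = ranking_to_pairs(ranked_outputs)
    total = sum(pairwise_loss(reward_fn(a), reward_fn(b)) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical scores standing in for a learned reward model.
    toy_rewards = {"answer A": 2.0, "answer B": 1.0, "answer C": 0.0}
    ranking = ["answer A", "answer B", "answer C"]  # best to worst
    print(reward_model_loss(ranking, toy_rewards.get))
```

The loss is small when the reward model already scores outputs in the labeler's order and grows when it disagrees, which is the signal used to train it. The final policy optimization step would then tune the model against this learned reward; that step is not sketched here.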

GAN

Generative Adversarial Networks

flowchart LR
n[/Noise/] ---> g[Generator] --> d[Discriminator]
rd[Real Data] --> d
d --> rf{Real/Fake}
rf -.->|Backpropagation| d & g
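
The loop in the flowchart can be illustrated with a deliberately tiny, pure-Python sketch. Everything concrete here is an assumption made for illustration, not something the notes specify: the generator is a 1-D affine map `a * z + b`, the discriminator is a logistic unit `sigmoid(w * x + c)`, the real data is drawn from a Gaussian with mean 4.0, the generator uses the non-saturating loss `-log D(fake)`, and gradients are written out by hand instead of using a deep learning library.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def discriminator_step(w, c, x_real, x_fake, lr):
    """One gradient step on -log D(x_real) - log(1 - D(x_fake))."""
    d_real = sigmoid(w * x_real + c)
    d_fake = sigmoid(w * x_fake + c)
    grad_w = (d_real - 1.0) * x_real + d_fake * x_fake
    grad_c = (d_real - 1.0) + d_fake
    return w - lr * grad_w, c - lr * grad_c


def generator_step(a, b, w, c, z, lr):
    """One gradient step on -log D(G(z)), backpropagated through D."""
    x_fake = a * z + b
    d_fake = sigmoid(w * x_fake + c)
    grad_x = (d_fake - 1.0) * w  # gradient of the loss w.r.t. the fake sample
    return a - lr * grad_x * z, b - lr * grad_x


def train(steps=2000, lr=0.05, seed=0):
    """Alternate discriminator and generator updates."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters
    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)          # assumed real-data distribution
        x_fake = a * rng.gauss(0.0, 1.0) + b  # noise -> generator
        w, c = discriminator_step(w, c, x_real, x_fake, lr)
        a, b = generator_step(a, b, w, c, rng.gauss(0.0, 1.0), lr)
    return a, b, w, c


if __name__ == "__main__":
    print(train())
```

The discriminator's real/fake decision drives both updates: its own parameters move to separate real from fake samples, while the generator's parameters move in whatever direction makes its samples look more real to the current discriminator.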
Last Updated: 2024-05-12 ; Contributors: AhmedThahir
