Bonus session on KV Cache, Tooling and WMDP

Efficiency Safety

KV Caching in LLM:

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

Must know tools for training/finetuning/serving LLM’s -

  1. Torchtune - Build on top of Pytorch, for training and finetuning LLM’s. Uses yaml based configs for easily running experiments. Github -

  2. axolotl - Built on top on Huggigface peft and transformer library, supports fine-tuning a large number for models like Mistral, LLama etc. Provides support for techniques like RLHF, DPO, LORA, qLORA etc. Github

  3. LitGPT - Build on nanoGPT and Megatron, support pre-training and fine-tuning, has examples like Starcoder, TinyLlama etc. Github -

  4. Maxtext - Jax based library for training LLM’s on Google TPU’s with configs for models like Gemma, Mistral and LLama2 etc. Github

  5. Langchain- https://python.langchain.com/docs/get_started/introduction

  6. haystack.deepset.ai
    • https://github.com/deepset-ai/haystack
    • LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it’s best suited for building RAG, question answering, semantic search or conversational agent chatbots.
  7. LlamaIndex
    • https://docs.llamaindex.ai/en/stable/ LlamaIndex supports Retrieval-Augmented Generation (RAG). Instead of asking LLM to generate an answer immediately, LlamaIndex: retrieves information from your data sources first, / adds it to your question as context, and / asks the LLM to answer based on the enriched prompt.
  8. Making Retrieval Augmented Generation Fast
    • https://www.pinecone.io/learn/fast-retrieval-augmented-generation/
  9. OpenMoE
    • https://github.com/XueFuzhao/OpenMoE

More readings

Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond

Retentive Network: A Successor to Transformer for Large Language Models

RWKV: Reinventing RNNs for the Transformer Era

Blog: