Multimodal FMs - video / audio

Multimodal

In this session, our readings cover:

Required Readings:

NVLM: Open Frontier-Class Multimodal LLMs

LLMs Meet Multimodal Generation and Editing: A Survey

A Survey on Speech Large Language Models

Video Understanding with Large Language Models: A Survey

Beta Release of Zonos-v0.1

February 10, 2025, Palo Alto, California: We are excited to announce the release of Zonos-v0.1 beta, featuring two expressive, real-time text-to-speech (TTS) models with high-fidelity voice cloning. We are releasing our 1.6B transformer and 1.6B hybrid models under an Apache 2.0 license.