LTX-2 Open-Source Model Generates Video from Audio with Gemini Prompt Hack
A developer tested the open-weights LTX-2 diffusion model for audio-to-video generation and found that it performs poorly off the shelf. Their workaround is to run the audio through Google's Gemini to generate a descriptive text prompt, then feed both the audio and that prompt into LTX-2, which significantly improves how well the output aligns with the audio. This makes LTX-2, which is accessible but visually inferior to closed models such as Sora 2 or Veo 3.1, more functional for creators without access to those models.
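
The two-step workflow described above can be sketched roughly as follows. This is a minimal illustration, not the developer's actual code: the source gives neither the instruction sent to Gemini nor the LTX-2 invocation, so `describe_audio` and `generate_video` are hypothetical callables that the reader would wire up to a Gemini client and an LTX-2 inference script, and `CAPTION_INSTRUCTION` is assumed wording.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Assumed wording; the source does not say what instruction was given to Gemini.
CAPTION_INSTRUCTION = (
    "Listen to this audio clip and describe, in one paragraph, a video scene "
    "that would match it: setting, subjects, actions, and timing of key sounds."
)


@dataclass
class VideoRequest:
    """Inputs for the video model: the original audio plus the generated prompt."""
    audio_path: Path
    prompt: str


def build_video_request(
    audio_path: str,
    describe_audio: Callable[[Path, str], str],  # hypothetical wrapper around Gemini
) -> VideoRequest:
    """Step 1: caption the audio. Step 2: pair that caption with the same audio."""
    path = Path(audio_path)
    prompt = describe_audio(path, CAPTION_INSTRUCTION).strip()
    if not prompt:
        raise ValueError("captioning model returned an empty prompt")
    return VideoRequest(audio_path=path, prompt=prompt)


def audio_to_video(
    audio_path: str,
    describe_audio: Callable[[Path, str], str],
    generate_video: Callable[[VideoRequest], str],  # hypothetical wrapper around LTX-2
) -> str:
    """Run the full workaround and return whatever the video wrapper returns."""
    return generate_video(build_video_request(audio_path, describe_audio))
```

Passing the two model calls in as functions keeps the sketch independent of any particular SDK version or inference script; the only logic it commits to is that the same audio file is used twice, once to produce the prompt and once as conditioning for the video model.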