Show HN: VQAScore – open eval metric/reward model, now for text-to-video

Two years ago we released VQAScore: ask a VLM "does this image show {prompt}?" and use P(Yes) as the score. It became a go-to evaluation metric and reward model for image generation, replacing CLIPScore across the field (2M+ downloads on Hugging Face; used by groups at DeepMind, NVIDIA, ByteDance).

We just added text-to-video evaluation with 20+ VLMs (GPT, Gemini, Qwen). It is free and open-source, and it keeps getting better as the underlying VLMs improve.

Paper: https://arxiv.org/abs/2404.01291

Happy to answer questions and would love feedback.


Comments URL: https://news.ycombinator.com/item?id=48463672

Points: 1

# Comments: 0