ActAlign: Zero-Shot Fine-Grained Video Classification

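ActAlign is a zero-shot framework that classifies fine-grained actions in video without any video–text supervision. It uses an LLM to generate a script of sub-actions for each candidate class, then scores each class by aligning its script against the video as a sequence.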
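To make the idea of script-to-video sequence alignment concrete, here is a minimal sketch. It is not the authors' implementation: the summary above names neither the encoder, the similarity measure, nor the alignment algorithm. The sketch assumes that video frames and sub-action descriptions have already been embedded into a shared vector space, uses cosine similarity, and scores each class with a simple monotonic dynamic-programming alignment in the style of dynamic time warping. All function names (`cosine_similarity_matrix`, `alignment_score`, `classify`) and the toy embeddings are hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(frames, steps):
    """Return a (T, K) matrix of cosine similarities between
    T frame embeddings and K sub-action embeddings."""
    f = frames / np.linalg.norm(frames, axis=1, keepdims=True)
    s = steps / np.linalg.norm(steps, axis=1, keepdims=True)
    return f @ s.T


def alignment_score(sim):
    """Best monotonic alignment of frames to sub-actions (assumed scheme).

    Each frame is assigned to exactly one sub-action, assignments never
    move backwards in the script, the first frame maps to the first
    sub-action, and the last frame maps to the last sub-action.
    Returns the mean similarity along the best path.
    """
    T, K = sim.shape
    if T < K:
        return float("-inf")  # not enough frames to cover every sub-action
    dp = np.full((T, K), -np.inf)
    dp[0, 0] = sim[0, 0]
    for t in range(1, T):
        for k in range(K):
            stay = dp[t - 1, k]                               # same sub-action
            advance = dp[t - 1, k - 1] if k > 0 else -np.inf  # next sub-action
            best = max(stay, advance)
            if best > -np.inf:
                dp[t, k] = best + sim[t, k]
    return float(dp[T - 1, K - 1] / T)


def classify(frames, class_scripts):
    """Pick the class whose sub-action script aligns best with the video."""
    scores = {
        name: alignment_score(cosine_similarity_matrix(frames, steps))
        for name, steps in class_scripts.items()
    }
    return max(scores, key=scores.get), scores


# Toy data: one-hot vectors stand in for real embeddings.
basis = np.eye(4)
video = np.stack([basis[0], basis[0], basis[1], basis[1], basis[2], basis[2]])
scripts = {
    "class_a": np.stack([basis[0], basis[1], basis[2]]),  # same order as video
    "class_b": np.stack([basis[2], basis[1], basis[0]]),  # reversed order
}
label, scores = classify(video, scripts)
print(label, scores)
```

In this toy case both scripts contain the same sub-actions, so an order-agnostic score such as average similarity could not tell them apart. The monotonic constraint is what separates them, which is the motivation for treating classification as a sequence-alignment problem.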
