
# AskVideos-VideoCLIP

Joint video-text embeddings for search, classification, and more.


  • AskVideos-VideoCLIP is a language-grounded video embedding model.
  • This model produces a single context-aware embedding for each video clip.
  • 16 frames are sampled from each video clip to generate a video embedding.
  • The model is trained with contrastive and captioning loss to ground the video embeddings to text.
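The sampling step described in the list above can be sketched in Python. The helper below is hypothetical and not part of this repo's API; it only illustrates how 16 uniformly spaced frame indices might be chosen from a clip before those frames are passed to the model.

```python
# Hypothetical sketch of the "16 frames per clip" sampling step.
# `sample_frame_indices` is an assumed helper, not a function from this repo.

def sample_frame_indices(num_frames: int, num_samples: int = 16) -> list[int]:
    """Return `num_samples` uniformly spaced frame indices in [0, num_frames)."""
    if num_frames <= 0:
        raise ValueError("clip has no frames")
    step = num_frames / num_samples
    return [min(int(i * step), num_frames - 1) for i in range(num_samples)]

# For a 300-frame clip, this yields 16 indices spread evenly across the clip.
print(sample_frame_indices(300))
```

Uniform sampling like this keeps the temporal coverage of the clip roughly even regardless of clip length; clips shorter than 16 frames simply repeat indices.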

## Pre-trained & Fine-tuned Checkpoints

| Checkpoint | Link |
| --- | --- |
| AskVideos-VideoCLIP-v0.1 | link |
| AskVideos-VideoCLIP-v0.2 | link |

## Usage

### Environment Preparation

First, install ffmpeg:

```shell
apt update
apt install ffmpeg
```

Then, create and activate a conda environment:

```shell
conda create -n askvideosclip python=3.9
conda activate askvideosclip
```

Then, install the requirements:

```shell
pip3 install -U pip
pip3 install -r requirements.txt
```

## How to Run Demo Locally

```shell
python video_clip.py
```

The demo is also available to run on Colab:

| Model | Colab link |
| --- | --- |
| AskVideos-VideoCLIP-v0.1 | link |
| AskVideos-VideoCLIP-v0.2 | link |
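Since the model produces one language-grounded embedding per clip, text-to-video search reduces to a nearest-neighbour lookup over those embeddings. The sketch below uses made-up placeholder vectors and a plain cosine-similarity function; it is not this repo's API, only an illustration of how the embeddings could be used downstream.

```python
import math

# Hedged sketch: assumes video and query embeddings have already been computed
# by the model. The vectors here are made-up placeholders, not real outputs.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

video_embeddings = {
    "clip_001": [0.9, 0.1, 0.0],
    "clip_002": [0.1, 0.8, 0.2],
}
query_embedding = [0.85, 0.15, 0.05]  # placeholder for an embedded text query

# Rank clips by similarity to the query; the top hit is the search result.
best = max(video_embeddings,
           key=lambda k: cosine_similarity(query_embedding, video_embeddings[k]))
print(best)  # clip_001 is closest to this query
```

In practice one would embed a whole library of clips once, then embed each incoming text query and score it against the stored vectors (or an approximate-nearest-neighbour index) rather than a small dict.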

## Star History

[Star History Chart]

## Terms of Use

AskVideos code and models are distributed under the Apache 2.0 license.

## Acknowledgement

This model is inspired by the Video-LLaMA Video-Qformer model.

## Citation

```bibtex
@misc{askvideos2024videoclip,
  title        = {AskVideos-VideoCLIP: Language-grounded video embeddings},
  author       = {AskVideos},
  year         = {2024},
  howpublished = {GitHub},
  url          = {https://github.com/AskYoutubeAI/AskVideos-VideoCLIP}
}
```
