Yet another video caption

Overview

yet-another-video-caption

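Dataset Configuration

Preparing the Dataset

Reorganize the original dataset into the unified format below and place it in ./dataset.

The dataset is organized as follows: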

./dataset
    train/
        video/
            *.avi
        ...
        info.json
    test/
        video/ 
            *.avi
        ...
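
The layout above can be sanity-checked before any further step. The following is only a minimal sketch, assuming the directory names shown in the tree; the helper name summarize_dataset is hypothetical and not part of this repository.

```python
from pathlib import Path


def summarize_dataset(root="./dataset"):
    """Count the .avi files under <root>/<split>/video for each split."""
    root = Path(root)
    summary = {}
    for split in ("train", "test"):
        video_dir = root / split / "video"
        # A missing directory is reported as None instead of raising.
        if not video_dir.is_dir():
            summary[split] = None
            continue
        summary[split] = len(list(video_dir.glob("*.avi")))
    return summary


if __name__ == "__main__":
    for split, count in summarize_dataset().items():
        status = "missing" if count is None else f"{count} .avi files"
        print(f"{split}/video: {status}")
    # In the tree above, info.json is listed only under train/.
    print("train/info.json present:", Path("./dataset/train/info.json").is_file())
```

The sketch only counts files; it makes no assumption about the contents of info.json.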

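Automatic Configuration

Usually you only need a subset of the dataset; in that case, consider running the automatic extraction script makedata.py.

All data is located in ./data.

All videos (train, val, and test) are located in ./data/video.

All video information (train, val, and test) is written to ./data/input.json.

The program generates some intermediate files in ./data; do not modify them.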
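
To illustrate what selecting a subset with a train/val split might look like, here is a hedged sketch. It is not the actual makedata.py: the function assign_splits, its parameters, and the "split" field are assumptions made for illustration. It checks the requested sizes up front, so that asking for more videos than are available produces a clear message instead of an index error.

```python
def assign_splits(videos, n_train, n_val):
    """Mark the first n_train entries as "train" and the next n_val as "val".

    `videos` is assumed to be a list of dicts, one per video.
    """
    if n_train + n_val > len(videos):
        raise ValueError(
            f"requested {n_train} train + {n_val} val videos, "
            f"but only {len(videos)} are available"
        )
    for i in range(n_train):
        videos[i]["split"] = "train"
    for i in range(n_val):
        videos[n_train + i]["split"] = "val"
    # Only the selected subset is returned.
    return videos[: n_train + n_val]
```

If the split sizes you configure exceed the number of videos actually present in the dataset, reducing them is the first thing to try.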

Dependencies

pip install tqdm pillow pretrainedmodels nltk

In addition, make sure that the CUDA runtime, cuDNN, PyTorch (GPU build), ffmpeg, and a JDK are correctly configured in the current environment.
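
A quick way to confirm these prerequisites is a small check script. The sketch below is an illustration only: the function name check_environment is hypothetical, and looking up the java executable is used here as a stand-in for "a JDK is installed".

```python
import importlib.util
import shutil


def check_environment():
    """Report which external tools and Python packages can be found."""
    report = {}
    # Command-line tools looked up on PATH; "java" stands in for the JDK.
    for tool in ("ffmpeg", "java"):
        report[tool] = shutil.which(tool) is not None
    # Python packages; note that pillow is imported as "PIL".
    for module in ("torch", "tqdm", "PIL", "pretrainedmodels", "nltk"):
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name:18s} {'ok' if found else 'MISSING'}")
    # If PyTorch is installed, also check that it can see a GPU.
    if importlib.util.find_spec("torch") is not None:
        import torch

        print("CUDA available:", torch.cuda.is_available())
```

The script does not verify cuDNN separately; torch.cuda.is_available() only confirms that PyTorch can reach a CUDA device.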

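Usage

  1. Make sure the dataset is configured correctly.

  2. Make sure the dependencies are installed correctly.

  3. Extract the data: set the train/val/test split parameters you want in makedata.py, then run the script.

  4. Run the following commands in order (adjust the batch_size and saved_model parameters yourself!):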

python prepro_feats.py --output_dir data/feats/resnet152 --model resnet152
python prepro_vocab.py
python train.py --epochs 3001 --batch_size 1 --checkpoint_path data/save --feats_dir data/feats/resnet152 --model S2VTAttModel --with_c3d 0 --dim_vid 2048
python eval.py --recover_opt data/save/opt_info.json --saved_model data/save/model_10.pth --batch_size 1
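
The four commands above can also be driven from one Python script so that batch_size and saved_model are set in a single place. This is a sketch under assumptions: the helper names build_commands and run_pipeline are hypothetical, and the flags are copied verbatim from the commands above rather than taken from the scripts' argument parsers.

```python
import shlex
import subprocess
import sys


def build_commands(batch_size=1, saved_model="data/save/model_10.pth"):
    """Return the four pipeline steps as argument strings, in order."""
    feats_dir = "data/feats/resnet152"
    return [
        f"prepro_feats.py --output_dir {feats_dir} --model resnet152",
        "prepro_vocab.py",
        f"train.py --epochs 3001 --batch_size {batch_size} "
        f"--checkpoint_path data/save --feats_dir {feats_dir} "
        "--model S2VTAttModel --with_c3d 0 --dim_vid 2048",
        "eval.py --recover_opt data/save/opt_info.json "
        f"--saved_model {saved_model} --batch_size {batch_size}",
    ]


def run_pipeline(**kwargs):
    """Run every step with the current interpreter, stopping on failure."""
    for command in build_commands(**kwargs):
        # check=True raises CalledProcessError as soon as one step fails.
        subprocess.run([sys.executable, *shlex.split(command)], check=True)
```

Calling run_pipeline(batch_size=32) from the repository root would run the same sequence with a larger batch size; the checkpoint name passed as saved_model must match a file that training actually produced.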

Speed Test

The following results were measured on a single 2080Ti.

Preprocessing (ResNet152 feature extraction): 40 min in total

Training speed (batch_size=32): 6.20 it/s
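
The iteration rate can be turned into a rough wall-clock estimate. The sketch below is illustrative arithmetic only: the figure of 3200 training videos is an assumption (it is the subset size that appears in a user's log further down this page), and the estimate ignores validation, checkpointing, and data-loading stalls.

```python
import math


def estimate_training_hours(num_videos, batch_size, its_per_sec, epochs):
    """Estimate pure training time in hours from the iteration rate."""
    iters_per_epoch = math.ceil(num_videos / batch_size)
    seconds = iters_per_epoch / its_per_sec * epochs
    return seconds / 3600


if __name__ == "__main__":
    hours = estimate_training_hours(
        num_videos=3200, batch_size=32, its_per_sec=6.20, epochs=3001
    )
    print(f"estimated training time: {hours:.1f} h")
```

Under these assumptions one epoch is 100 iterations, or about 16 seconds, and the 3001 epochs used in the example command come to roughly 13.4 hours.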

Todo

Letter-case issues

References

https://github.com/xiadingZ/video-caption.pytorch

Comments
  • Hello, could you tell me why I get the error OSError: [Errno 22] Invalid argument? I followed the steps for everything else.

    E:\va>python train.py --epochs 3001 --batch_size 32 --checkpoint_path data/save --feats_dir data/feats/resnet152 --model S2VTAttModel --with_c3d 0 --dim_vid 2048
    save opt details to data/save\opt_info.json
    vocab size is 6019
    number of train videos: 3200
    number of val videos: 200
    number of test videos: 1000
    load feats from ['data/feats/resnet152']
    max sequence length in data is 28
    vocab size is 6019
    number of train videos: 3200
    number of val videos: 200
    number of test videos: 1000
    load feats from ['data/feats/resnet152']
    max sequence length in data is 28
    D:\Python\lib\site-packages\torch\nn\modules\rnn.py:62: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
      warnings.warn("dropout option adds dropout after all but last "
    D:\Python\lib\site-packages\torch\nn\_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
      warnings.warn(warning.format(ret))
    D:\Python\lib\site-packages\torch\optim\lr_scheduler.py:129: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
      warnings.warn("Detected call of lr_scheduler.step() before optimizer.step(). "
    新轮次 0 正在训练……
    0%| | 0/100 [00:00<?, ?it/s]D:\Python\lib\site-packages\torch\nn\functional.py:1794: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
      warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
    100%|████████████████████████████████████████████████████████████████████████████████| 100/100 [00:17<00:00, 5.82it/s]
    轮次 0 train_loss = 48.52919979095459
    模型保存到 data/save\model_0.pth
    100%|████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 20.43it/s]
    轮次 0 val_loss = 42.59846823556082
    E:\va\train.py:51: FutureWarning: pandas.io.json.json_normalize is deprecated, use pandas.json_normalize instead
      answer_dataframe = json_normalize(
    Traceback (most recent call last):
      File "E:\va\train.py", line 219, in <module>
        main(opt)
      File "E:\va\train.py", line 205, in main
        train(dataloader_train, dataloader_val, dataset_val, model, crit,
      File "E:\va\train.py", line 152, in train
        validate(model, utils.LanguageModelCriterion(),
      File "E:\va\train.py", line 70, in validate
        valid_score = scorer.score(answers, seq_outputs, seq_outputs.keys())
      File "E:\va\misc\cocoeval.py", line 85, in score
        score, scores = scorer.compute_score(gts, res)
      File "E:\va\coco-caption\pycocoevalcap\meteor\meteor.py", line 41, in compute_score
        self.meteor_p.stdin.flush()
    OSError: [Errno 22] Invalid argument

    This error appears whenever batch_size is greater than 1, and training cannot run.

    opened by ghost 4
  • IndexError: list index out of range

    Traceback (most recent call last):
      File "D:/video-caption/6/yet-another-video-caption-private-main/makedata.py", line 31, in <module>
        train_videos[i+NTRAIN]["split"] = "val"
    IndexError: list index out of range

    How can I solve this? Thanks!

    opened by cxy123-ai 1
  • ValueError("Cannot load file containing pickled data")

    I ran into an error during training.

    新轮次 42 正在训练……
    0%| | 0/100 [00:00<?, ?it/s]/root/miniconda3/envs/myconda/lib/python3.7/site-packages/torch/nn/functional.py:1340: UserWarning: nn.functional.tanh is deprecated. Use torch.tanh instead.
      warnings.warn("nn.functional.tanh is deprecated. Use torch.tanh instead.")
    67%|██████████████████████████████████████████████████████████████████████████████████████████████▍ | 67/100 [00:13<00:11, 2.78it/s]Traceback (most recent call last):
      File "train.py", line 245, in <module>
        main(opt)
      File "train.py", line 232, in main
        optimizer, exp_lr_scheduler, opt, rl_crit)
      File "train.py", line 130, in train
        for data in tqdm(loader):
      File "/root/miniconda3/envs/myconda/lib/python3.7/site-packages/tqdm/std.py", line 1104, in __iter__
        for obj in iterable:
      File "/root/miniconda3/envs/myconda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 346, in __next__
        data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
      File "/root/miniconda3/envs/myconda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/root/miniconda3/envs/myconda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/yavc/dataloader.py", line 61, in __getitem__
        fc_feat.append(np.load(os.path.join(dir, 'G%05d.npy' % (ix))))
      File "/root/miniconda3/envs/myconda/lib/python3.7/site-packages/numpy/lib/npyio.py", line 457, in load
        raise ValueError("Cannot load file containing pickled data "
    ValueError: Cannot load file containing pickled data when allow_pickle=False
    67%|██████████████████████████████████████████████████████████████████████████████████████████████▍ | 67/100 [00:13<00:06, 4.98it/s]

    Remember to find time to fix this.

    opened by mollnn 1
Owner
Fan Zhimin