Revisiting Video Saliency: A Large-scale Benchmark and a New Model (CVPR18, PAMI19)

Overview

DHF1K

===========================================================================

Wenguan Wang, J. Shen, M.-M. Cheng, and A. Borji,

Revisiting Video Saliency: A Large-scale Benchmark and a New Model,

IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 and

IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2019

===========================================================================

The code (ACLNet) and dataset (DHF1K with raw gaze records; UCF sports is newly added!) can be downloaded from:

Google Drive: https://drive.google.com/open?id=1sW0tf9RQMO4RR7SyKhU8Kmbm4jwkFGpQ

Baidu pan: https://pan.baidu.com/s/110NIlwRIiEOTyqRwYdDnVg

The Hollywood-2 data (74.6 GB, including attention maps) can be downloaded from:

Google Drive: https://drive.google.com/file/d/1vfRKJloNSIczYEOVjB4zMK8r0k4VJuWk/view?usp=sharing

Baidu pan: https://pan.baidu.com/s/16BIAuaGEDDbbjylJ8zziuA (code: bt3x)

Since so many people are interested in the training code, I have decided to upload it to the above web disks as well. Enjoy!

===========================================================================

Files:

'video': 1000 videos (videoname.AVI)

'annotation/videoname/maps': continuous saliency maps in '.png' format

'annotation/videoname/fixation': binary eye fixation maps in '.png' format

'annotation/videoname/fixation/maps': binary eye fixation maps stored in '.mat' files

'generate_frame.m': used for extracting the frame images from AVI videos.

Please note that the raw gaze data of individual viewers is stored in 'exportdata_train.rar'.

Please do not change the frame naming convention.
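
As a quick illustration, the per-frame annotations can be loaded like this (a minimal Python sketch; the 4-digit folder and frame names are only examples, and NumPy/Pillow are assumed to be installed):

    import numpy as np
    from PIL import Image

    # Example frame of video 0001; adjust paths to your local copy of DHF1K.
    sal_map = np.array(Image.open('annotation/0001/maps/0001.png'), dtype=np.float32) / 255.0  # continuous saliency map in [0, 1]
    fix_map = np.array(Image.open('annotation/0001/fixation/0001.png')) > 0                    # binary eye fixation map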

===========================================================================

Dataset splitting:

Training set: first 600 videos (001.AVI-600.AVI)

Validation set: 100 videos (601.AVI-700.AVI)

Testing set: 300 videos (701.AVI-1000.AVI)

The annotations for the training and validation sets are released; the annotations for the testing set are held out for benchmarking.
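
For convenience, the same split can be written out in code (a minimal Python sketch; it assumes the zero-padded file names listed above):

    # DHF1K split by video index, following the ranges listed above.
    train_videos = [f"{i:03d}.AVI" for i in range(1, 601)]     # 001.AVI - 600.AVI
    val_videos   = [f"{i:03d}.AVI" for i in range(601, 701)]   # 601.AVI - 700.AVI
    test_videos  = [f"{i:03d}.AVI" for i in range(701, 1001)]  # 701.AVI - 1000.AVI (annotations held out)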

===========================================================================

We have corrected some statistics of our results (baseline training setting (iii)) on the UCF sports dataset. Please see the newest version on arXiv.

===========================================================================

Note that for the Hollywood-2 dataset we used the split videos (each video contains only one shot) instead of the full videos.

===========================================================================

The raw gaze records ('exportdata_train.rar') have been uploaded.

===========================================================================

For the DHF1K dataset, we use the following MATLAB code to generate the continuous saliency maps:

[x,y]=find(fixations);

densityMap= make_gauss_masks(y,x,[video_res_y,video_res_x]);

make_gauss_masks.m has been uploaded.

For UCF sports and Hollywood-2, we directly use the following function:

densityMap = imfilter(fixations,fspecial('gaussian',150,20),'replicate');
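
For reference, a rough Python sketch of the UCF/Hollywood step above could look like this (this is only an approximation of the MATLAB call, not the code we used; it assumes SciPy is available and the input is a binary fixation map):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def fixations_to_density(fixations, sigma=20):
        # Blur the binary fixation map into a continuous density map, roughly
        # matching imfilter(fixations, fspecial('gaussian', 150, 20), 'replicate').
        density = gaussian_filter(fixations.astype(np.float64), sigma=sigma, mode='nearest')
        if density.max() > 0:
            density /= density.max()  # optional: rescale to [0, 1]
        return density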

===========================================================================

Results submission.

Please organize your results in the following format:

yourmethod/videoname/framename.png

Note that the frames and frame names should be generated by 'generate_frame.m'.
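
Before submitting, a simple layout check can save a round trip (a hedged Python sketch; the result and frame directories are placeholders, the '.png' frame extension is an assumption, and the reference frames are assumed to come from 'generate_frame.m'):

    import os

    def check_submission(results_root, frames_root):
        # Verify that results_root/videoname/framename.png exists for every
        # extracted frame under frames_root/videoname/.
        missing = []
        for video in sorted(os.listdir(frames_root)):
            for frame in sorted(os.listdir(os.path.join(frames_root, video))):
                if frame.endswith('.png') and not os.path.isfile(os.path.join(results_root, video, frame)):
                    missing.append(os.path.join(video, frame))
        return missing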

Then send your results to '[email protected]'.

You can only submit ONCE within one week.

Please first test your model on the validation set or another video saliency dataset.

The response may take more than one week.

If you want your results listed on our website, please send your name, model name, paper title, a short description of your method, and a link to your project page (if you have one).

===========================================================================

We use

Keras: 2.2.2

tensorflow: 1.10.0

to implement our model.
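
A quick way to confirm your environment matches (a trivial check, nothing model-specific):

    import keras
    import tensorflow as tf

    print(keras.__version__)  # expected: 2.2.2
    print(tf.__version__)     # expected: 1.10.0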

===========================================================================

Citation:

@InProceedings{Wang_2018_CVPR,
author = {Wang, Wenguan and Shen, Jianbing and Guo, Fang and Cheng, Ming-Ming and Borji, Ali},
title = {Revisiting Video Saliency: A Large-Scale Benchmark and a New Model},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition},
year = {2018}
}

@ARTICLE{Wang_2019_revisitingVS, 
author={W. {Wang} and J. {Shen} and J. {Xie} and M. {Cheng} and H. {Ling} and A. {Borji}}, 
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
title={Revisiting Video Saliency Prediction in the Deep Learning Era}, 
year={2019}, 
}

If you find our dataset useful, please cite the above papers.

===========================================================================

Code (ACLNet):

You can find the code on Google Drive: https://drive.google.com/open?id=1sW0tf9RQMO4RR7SyKhU8Kmbm4jwkFGpQ

===========================================================================

Terms of use:

The dataset and code are licensed under a Creative Commons Attribution 4.0 License.

===========================================================================

Contact Information Email: [email protected]


Comments
  • How are ground truth saliency maps generated from recorded fixations?

    In your data collection you gather a set of discrete fixation maps (P in the paper). From this, continuous saliency maps (Q in the paper) are generated. I found no details about how this is done, could you elaborate? I would guess that it involves gaussians centered on the spot of fixation, I am interested in the exact parameters, how you combine fixations from different test subjects and so on.

    Thanks again for providing the dataset!

    opened by wjakobw 11
  • Regarding saliency metric (especially CC)

    Hi,

First of all, thank you for providing such a nice dataset and code for the evaluation metrics. I would like to evaluate my saliency results using the five metrics discussed in the paper.

    I found that linear cross-correlation can take positive and negative values in [-1, 1]. So, when you report the CC score, did you take the absolute value of each per-frame linear CC before averaging over frames and clips? Your MATLAB code doesn't do that, though that would make sense to me.

    Look forward to your reply.

    opened by snlee81 9
  • question about testing AUC-shuffled

    When using the evaluation code in this package, the AUC-shuffled score is much lower than the one reported in the paper on the UCF dataset. I was wondering if there is anything wrong with the evaluation code, or if I missed some important details.

    opened by hkkevinhf 8
  • Is the audio presented to the viewer during fixation collection?

    Hi, thanks for collecting such a valuable dataset! Several things I want to clarify with you:

    1. I noticed the videos come with audio. Did the viewers have access to it during the data collection?
    2. As for the 1000 video clips, are they the complete clips that you downloaded directly from YouTube, or did you randomly cut some of them from the raw videos?

    Many thanks!

    opened by zijunwei 4
  • Testing setting of Hollywood2 dataset

    Did you use all the fixation points when training and testing on the Hollywood2 dataset, or did you filter out some points (e.g., points at the image edge)? Also, did you use all 884 test videos when testing on Hollywood2? How did you sync the fixation points with the video?

    I am asking because I want to replicate the same result on the Hollywood2 dataset. Could you provide more detailed information about your setting? (I divided the videos using shot boundaries, as you mentioned.)

    opened by kylemin 3
  • UCF download link?

    You mentioned in a previous issue :

    Hi, all, the data of Hollywood-2 and UCF have been uploaded.

    The code (ACLNet) and dataset (DHF1K with raw gaze records, UCF-sports are new added!) can be downloaded from:

    Google disk:https://drive.google.com/open?id=1sW0tf9RQMO4RR7SyKhU8Kmbm4jwkFGpQ

    Baidu pan: https://pan.baidu.com/s/110NIlwRIiEOTyqRwYdDnVg

    The Hollywood-2 (74.6G) can be downloaded from:

    Google disk:https://drive.google.com/open?id=1vfRKJloNSIczYEOVjB4zMK8r0k4VJuWk

    Originally posted by @wenguanwang in https://github.com/wenguanwang/DHF1K/issues/2#issuecomment-428440091

    Is there also a link for the UCF-sports dataset?

    opened by Linardos 2
  • The loss is nan.

    Hi, I'm really interested in your work. I used your training code ('ACL_full') to train on my own data, but during training the loss always becomes NaN after several iterations: 53/100 [==============>...............] - ETA: 59s - loss: nan - time_distributed_15_loss: nan - time_distributed_16_loss: nan

    I have tuned the base learning rate from 1e-4 to 1e-12, but the results are the same.

    Do you know of any possible solutions?

    And what does the 'imgs_path' ('staticimages') in config.py mean?

    Thanks very much!

    opened by yufanLIU 1
  • About ACL.h5

    Hi! Many thanks for your great work!

    The ACL.h5 file could not be opened as a result of running the program. Is it possible that this file is corrupt?

    opened by masanari-umetani 0
  • questions about the paths and files

    Hi, thank you for your dataset and the source code. I want to replicate this work with your code, but I am confused about the paths in config.py. I want to know what kinds of data were used to train the model. In your paper, Revisiting Video Saliency: A Large-scale Benchmark and a New Model, you said that you used the static dataset SALICON to train the attention module, and in your code there are several paths. Could you tell me:

    • which paths are for the video dataset and which is for SALICON? Is frames_path for all the frames extracted from the videos, and imgs_path for the SALICON data?
    • do I need to extract all the frames from the videos myself?

    The related code is as follows:

    # path of training videos
    videos_train_paths = ['D:/code/attention/DHF1K/training/']
    # path of validation videos
    videos_val_paths = ['D:/code/attention/DHF1K/val/']
    videos_test_path = 'D:/code/attention/DHF1K/testing/'
    
    # path of training maps
    maps_path = '/maps/'
    # path of training fixation maps
    fixs_path = '/fixation/maps/'
    
    frames_path = '/images/'
    
    # path of training images
    imgs_path = 'D:/code/attention/staticimages/training/'
    

    Thank you.

    opened by Andantino97 0
  • Attributes for first 700 videos

    Many thanks for your great work! As far as I can see, DHF1k_attribute.xlsx only provides data for the 300 test videos. Could you also provide this kind of attribute data for the first 700 videos? That would save me a lot of work and would be highly appreciated!

    opened by rederoth 5
  • discrepancy in exportdata_train and DHF1K fixation maps?

    Hi, thanks for the nice dataset. I want to recreate the fixation maps using the raw gaze records in the exportdata_train folder released for DHF1K.

    However, the fixation maps obtained using the record_mapping.m script and the raw data from the exportdata_train folder do not match the ones released with DHF1K.

    For example:

    1. 0001.png: this is the fixation map for the first frame of 001.AVI, copied from annotation/0001/fixation/0001.png

    [attached image: 0001.png]

    2. 0001_regenerated.png: I regenerated this fixation map using the files from the exportdata_train folder.

    [attached image: 0001_regenerated.png]

    I used the record_mapping.m file after specifying appropriate paths and modifying line 22 and line 24.

    Could you please help me understand what I might be missing?

    For your reference, here is my copy of the record_mapping.m file:

    %This function is used for mapping the fixation record into the corresponding fixation maps.
    screen_res_x = 1440;
    screen_res_y = 900;
    
    parent_dir = 'GIVE PATH TO PARENT DIRECTORY';
    
    datasetFile1 = 'movie';
    datasetFile = 'video';
    gazeFile = 'exportdata_train';
    
    videoFiles = dir(fullfile('./', datasetFile));
    videoNUM = length(videoFiles)-2;
    rate = 30;
      
    full_vid_dir = [parent_dir, datasetFile, '/'];
    
     for videonum = 1:700
            videofolder =  videoFiles(videonum+2).name
            vidObj = VideoReader([full_vid_dir,videofolder]);
            options.infolder = fullfile( './', datasetFile,  videofolder, 'images' );
            % no need to read full video if I can use VideoReader to know
            % dimensions and duration of video
            % Cache all frames in memory
            %[data.frames,names,video_res_y,video_res_x,nframe ]= readAllFrames( options );
            nframe = vidObj.NumberOfFrames;
            video_res_x = vidObj.Width;
            video_res_y = vidObj.Height;
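            % a: horizontal scale factor from screen coordinates to video coordinates
            %    (the video was shown at full screen width);
            % b: vertical letterbox margin, in screen pixels, above the video.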
            a=video_res_x/screen_res_x;
            b=(screen_res_y-video_res_y/a)/2;
            all_fixation = zeros(video_res_y,video_res_x,nframe);
            for person = 1:17
                %modified the following line to match the video naming format
                txtloc = fullfile(parent_dir, gazeFile, sprintf('P%02d',person), [sprintf('P%02d_Trail',person), sprintf('%03d.txt',videonum)]);
                if exist(txtloc, 'file')
                    %modified the following line to match the txt file format
                    [time,model,trialnum,diax, diay, x_screen,y_screen,event]=textread(txtloc,'%f%s%f%f%f%f%f%s','headerlines',1);
                    if size(time,1)
                        time = time-time(1);
                        event = cellfun(@(x) x(1), event);
                        for index = 1:nframe
                                eff = find( ((index-1)<rate*time/1000000)&(rate*time/1000000<index)&event=='F'); %framerate = 10;
                                x_stimulus=int32(a*x_screen(eff));
                                y_stimulus=int32(a*(y_screen(eff)-b));
                                t = x_stimulus<=0|x_stimulus>=video_res_x|y_stimulus<=0|y_stimulus>=video_res_y;
                                all_fixation(y_stimulus(~t),x_stimulus(~t),index) = 1;
                        end
                    end
                end
            end 
    end
    
    opened by prashnani 4