Can I know what is the size of the Kinetics 400 dataset used to reproduce the result in this repo? #1578

yxchng · 2020-12-31T11:50:22Z

Many of the links in Kinetics have expired. As a result, not everyone may be using the same Kinetics dataset. As a reference, the statistics of the Kinetics dataset used in PySlowFast can be found here: https://github.com/facebookresearch/video-nonlocal-net/blob/master/DATASET.md. However, I cannot find similar information for gluoncv. Will you be sharing the statistics and the dataset used? I need the complete dataset to reproduce the result.

bryanyzhu · 2020-12-31T18:19:52Z

Hi, we use a copy from here. There are 240618 training and 19404 validation videos. There is a simple readme for reproduction using GluonCV, and I'm still editing it. Please let me know if you have further questions. Thank you.
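For anyone comparing a local copy against these counts, here is a minimal sketch. It is not part of GluonCV: the function name `count_videos`, the assumption that videos sit in per-class sub-folders, and the `.mp4` extension are all illustrative and may need adjusting.

```python
import os

# Counts quoted in this thread for the copy used with GluonCV.
EXPECTED_TRAIN = 240618
EXPECTED_VAL = 19404


def count_videos(split_dir, extensions=('.mp4',)):
    """Count video files under split_dir, including all sub-folders.

    The extension filter is an assumption; change it if your copy
    stores videos in another container format.
    """
    total = 0
    for _root, _dirs, files in os.walk(split_dir):
        total += sum(1 for name in files if name.lower().endswith(extensions))
    return total
```

Calling `count_videos` on your own train and val folders and comparing the results with `EXPECTED_TRAIN` and `EXPECTED_VAL` shows how far your copy is from the one described above.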

irvingzhang0512 · 2021-01-01T03:08:45Z

Thank you very much! Kinetics-400/700 can be downloaded with Xunlei and Baiduyunpan...

yxchng · 2021-01-01T05:01:57Z

@bryanyzhu Does this link still work? I just tried it, but the download is always stuck at 0%.

@irvingzhang0512 Can you post the link for Xunlei and Baiduyunpan here? Is it the same dataset with 240618 training and 19404 validation videos?

irvingzhang0512 · 2021-01-01T05:45:51Z

@yxchng Load the torrents from the above link into Baiduyunpan or Xunlei.

irvingzhang0512 · 2021-01-01T11:40:47Z

After some tests, I found that the Kinetics-400 val set can be downloaded with Xunlei at 1.5M/s. Kinetics-400/700 can also be downloaded with Baiduyun, but slowly.

StevenJokess · 2021-01-07T16:10:55Z

I also ran into this error:
Setting file /root/.mxnet/datasets/kinetics400/kinetics400_train_list_rawframes.txt doesn't exist.
How can I fix it?

bryanyzhu · 2021-01-07T20:25:31Z

Hi @StevenJokess, this setting file should be generated on your end. It contains the path, duration and label of each video. Please read this tutorial on how to prepare the K400 dataset. Thank you.

BTW, if you have further questions on using the code, please open a new issue. This issue is for (downloading) the Kinetics400 dataset only.

bychen7 · 2021-01-11T13:04:32Z

@bryanyzhu If I want to train a model using video data directly, how can I generate k400_train.txt? Also, a URL in this tutorial returns 404. Have you tested the script in this tutorial? Thanks!

bryanyzhu · 2021-01-11T18:00:44Z

I tested the script half a year ago, but not recently. Thank you for pointing out the missing URL; I will fix it.

In terms of generating k400_train.txt, once you download the videos from the torrent website (or you already have the data), you can use the following code to generate it.

import os

# Root directory with one sub-folder per class, each holding that class's videos.
video_dir = '/home/ubuntu/data/kinetics400/train_256/'
# Sort the class names so class ids are assigned deterministically.
classes = sorted(os.listdir(video_dir))

anno_file = '/home/ubuntu/data/kinetics400/k400_train.txt'

# Open with 'w' so that re-running the script does not append duplicate lines.
with open(anno_file, 'w') as f:
    for class_id, each_class in enumerate(classes):
        class_dir = os.path.join(video_dir, each_class)
        videos = sorted(os.listdir(class_dir))

        for each_video in videos:
            video_name = os.path.join(each_class, each_video)
            # One line per video: <relative path> <placeholder duration> <class id>
            f.write(video_name + ' ' + str(300) + ' ' + str(class_id) + '\n')

Simply put, each video corresponds to one line in the txt file. There are three items in each line: video path, video duration and video label. Since our video reader can automatically get video duration information during video loading, we just need a placeholder 300 here to indicate the video duration (i.e., number of frames).
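To sanity-check a generated file against the three-item format described above, a small sketch like the following could help. The helper names `parse_annotation_line` and `videos_per_class` are hypothetical and not part of GluonCV; the sketch only assumes the space-separated layout of path, duration and label.

```python
from collections import Counter


def parse_annotation_line(line):
    """Split one annotation line into (video path, duration, label)."""
    # Split from the right so a path that contains spaces stays intact.
    path, duration, label = line.rstrip('\r\n').rsplit(' ', 2)
    return path, int(duration), int(label)


def videos_per_class(lines):
    """Count how many videos each class id has, skipping blank lines."""
    counts = Counter()
    for line in lines:
        if line.strip():
            _path, _duration, label = parse_annotation_line(line)
            counts[label] += 1
    return counts
```

Passing an open file handle for k400_train.txt to `videos_per_class` gives a per-class count, which makes empty or unexpectedly small classes easy to spot.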

bychen7 · 2021-01-12T06:57:39Z

Thanks for your response. I have started training, but when reading with decord, some videos cannot be loaded, as shown below. Do you know why, or how I could solve it?

video cannot be loaded by decord:  /data/kinetics400/videos_train/throwing_axe/Yse3yBzFgPo_1_10.mp4
/action/gluon-cv/gluoncv/torch/data/video_cls/dataset_classification.py:122: UserWarning: video throwing_axe/Yse3yBzFgPo_1_10.mp4 not correctly loaded during training
  warnings.warn("video {} not correctly loaded during training".format(sample))
[buffer @ 0x5607ed561640] Unable to parse option value "-1" as pixel format
[buffer @ 0x5607ed561640] Unable to parse option value "-1" as pixel format
[buffer @ 0x5607ed561640] Error setting option pix_fmt to value -1.
[in @ 0x5607edbff1c0] Error applying options to the filter.

bryanyzhu · 2021-01-12T07:13:47Z

Hi @blackmagicianZ, are you using the videos from the torrent website? If not, could you please open another issue and attach these videos? We can look into the issue of loading these videos. Thank you.

bychen7 · 2021-01-12T08:21:03Z

Maybe some videos didn't download successfully. I will download the videos from the torrent website; if those still have problems, I will open another issue. Thank you.

aaroncodebro · 2021-07-09T15:29:28Z

Hi @blackmagicianZ, did you manage to fix this? I have the same issue with my custom dataset.

bychen7 · 2021-07-13T12:13:10Z

Some of your videos might be broken; check your videos.
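One way to check is to try opening every video listed in the annotation file before training. The sketch below is only an illustration under stated assumptions: `find_unreadable_videos` is a hypothetical helper, it assumes decord is installed and that a video which cannot be decoded raises an exception when `decord.VideoReader` opens it, and it may not catch every kind of corruption.

```python
import os


def find_unreadable_videos(video_root, anno_file, open_video=None):
    """Return the relative paths from anno_file that cannot be opened.

    open_video is any callable that takes a path and returns an object
    supporting len(); by default decord.VideoReader is used (assumes
    decord is installed).
    """
    if open_video is None:
        from decord import VideoReader
        open_video = VideoReader

    bad = []
    with open(anno_file) as f:
        for line in f:
            if not line.strip():
                continue
            # Each line is "<relative path> <duration> <label>".
            rel_path = line.rstrip('\r\n').rsplit(' ', 2)[0]
            try:
                reader = open_video(os.path.join(video_root, rel_path))
                if len(reader) == 0:
                    # Opened, but contains no frames.
                    bad.append(rel_path)
            except Exception:
                # Treat any failure to open the file as unreadable.
                bad.append(rel_path)
    return bad
```

The returned paths can then be re-downloaded or dropped from the annotation file.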

bryanyzhu self-assigned this Dec 31, 2020

bryanyzhu added the "good first issue" label Jan 1, 2021