Preprocessing the original data (where can we find the DF40_cdf and DF40_ff datasets?) #33

@DeepFaux


Hello,

I’m currently working on processing the original data. In preprocess.py, it seems that dataset_name should be either 'DF40_cdf' or 'DF40_ff'. However, the unprocessed data I downloaded contains only two directories, train and test.

Could you please clarify how I should organize these directories to match the expected structure for processing?

Additionally, it is not clear to me how to add the real data (FF++ and CDF). At which stage are the real samples balanced against the fake ones?

====================================================================
Link to unprocessed data:
https://drive.google.com/drive/folders/1GB3FN4pjf9Q5hhhcBmBTdMmEmtrDe9zZ

Code in preprocess.py:

```python
# DF40
elif dataset_name == 'DF40_cdf':
    dataset_path = Path(str(dataset_path).replace('DF40_cdf', 'DF40'))
    aigc_dataset_name = ['StyleGAN2', 'StyleGAN3', 'StyleGANXL', 'ddim', 'ddpm', 'collaborative_diffusion', 'pixart', 'SiT', 'sd1.5', 'sd2.1', 'VQGAN', 'DiT', 'MidJourney', 'dalle2_face']
    # obtain all forgery names within df40
    sub_dataset_names = [d for d in os.listdir(dataset_path) if os.path.isdir(os.path.join(dataset_path, d)) and d not in aigc_dataset_name]
    # obtain all video names for each forgery folder
    sub_dataset_paths = [Path(os.path.join(dataset_path, name, 'cdf')) for name in sub_dataset_names]
elif dataset_name == 'DF40_ff':
    dataset_path = Path(str(dataset_path).replace('DF40_ff', 'DF40'))
    aigc_dataset_name = ['StyleGAN2', 'StyleGAN3', 'StyleGANXL', 'ddim', 'ddpm', 'collaborative_diffusion', 'pixart', 'SiT', 'sd1.5', 'sd2.1', 'VQGAN', 'DiT', 'MidJourney', 'dalle2_face']
    # obtain all forgery names within df40
    sub_dataset_names = [d for d in os.listdir(dataset_path) if os.path.isdir(os.path.join(dataset_path, d)) and d not in aigc_dataset_name]
    # obtain all video names for each forgery folder
    sub_dataset_paths = [Path(os.path.join(dataset_path, name, 'ff')) for name in sub_dataset_names]
```
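The two DF40 branches above differ only in the subfolder they append ('cdf' or 'ff'). As a reading aid, here is a minimal sketch that factors that logic into one hypothetical helper, `df40_sub_dataset_paths`, plus a hypothetical `report_missing` check. It assumes only the layout implied by the snippet, namely `<parent>/DF40/<forgery method>/cdf` and `<parent>/DF40/<forgery method>/ff`, and is not part of the repository. Unlike the original, it sorts the method names so the output order is deterministic.

```python
import os
from pathlib import Path

# AIGC folders that both DF40 branches in preprocess.py skip.
AIGC_DATASET_NAMES = {
    'StyleGAN2', 'StyleGAN3', 'StyleGANXL', 'ddim', 'ddpm',
    'collaborative_diffusion', 'pixart', 'SiT', 'sd1.5', 'sd2.1',
    'VQGAN', 'DiT', 'MidJourney', 'dalle2_face',
}

# Hypothetical mapping: which per-method subfolder each dataset_name reads.
SUBFOLDER_FOR = {'DF40_cdf': 'cdf', 'DF40_ff': 'ff'}


def df40_sub_dataset_paths(dataset_path, dataset_name):
    """Return the <DF40>/<method>/<cdf|ff> folders the DF40 branches would scan."""
    subfolder = SUBFOLDER_FOR[dataset_name]
    # Same path rewrite as preprocess.py, e.g. .../DF40_ff -> .../DF40
    root = Path(str(dataset_path).replace(dataset_name, 'DF40'))
    # Every non-AIGC directory directly under DF40 is treated as a forgery method.
    method_names = sorted(
        d for d in os.listdir(root)
        if os.path.isdir(os.path.join(root, d)) and d not in AIGC_DATASET_NAMES
    )
    return [root / name / subfolder for name in method_names]


def report_missing(paths):
    """Return the expected folders that do not exist on disk."""
    return [p for p in paths if not p.is_dir()]
```

For example, with dataset_path `/data/DF40_ff`, the helper lists `/data/DF40`, so each forgery-method folder would need to sit directly under a `DF40` directory and contain an `ff/` (or `cdf/`) subfolder; a top-level `train/` or `test/` directory would itself be picked up as a "method" name.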

Thank you for your work and help.
Sahar
