Preprocessing the original data (where can we find the DF40_cdf and DF40_ff datasets?) #33

Hello,

I’m currently preprocessing the original (unprocessed) data. In preprocess.py, it seems that dataset_name must be either '**DF40_cdf**' or '**DF40_ff**'. However, the unprocessed data I downloaded contains two directories, train and test.

Could you please clarify how I should organize these directories to match the expected structure for processing?

Additionally, it is not clear to me how to add the real data (FF++ and CDF). At which stage is it balanced against the fake data?

====================================================================
**Link to unprocessed data:**
https://drive.google.com/drive/folders/1GB3FN4pjf9Q5hhhcBmBTdMmEmtrDe9zZ


**Relevant code in preprocess.py:**

    # DF40
    elif dataset_name == 'DF40_cdf':
        dataset_path = Path(str(dataset_path).replace('DF40_cdf', 'DF40'))
        aigc_dataset_name = ['StyleGAN2', 'StyleGAN3', 'StyleGANXL', 'ddim', 'ddpm', 'collaborative_diffusion', 'pixart', 'SiT', 'sd1.5', 'sd2.1', 'VQGAN', 'DiT', 'MidJourney', 'dalle2_face']
        # obtain all forgery names within df40
        sub_dataset_names = [d for d in os.listdir(dataset_path) if os.path.isdir(os.path.join(dataset_path, d)) and d not in aigc_dataset_name]
        # obtain all video names for each forgery folder
        sub_dataset_paths = [Path(os.path.join(dataset_path, name, 'cdf')) for name in sub_dataset_names]
    elif dataset_name == 'DF40_ff':
        dataset_path = Path(str(dataset_path).replace('DF40_ff', 'DF40'))
        aigc_dataset_name = ['StyleGAN2', 'StyleGAN3', 'StyleGANXL', 'ddim', 'ddpm', 'collaborative_diffusion', 'pixart', 'SiT', 'sd1.5', 'sd2.1', 'VQGAN', 'DiT', 'MidJourney', 'dalle2_face' ]
        # obtain all forgery names within df40
        sub_dataset_names = [d for d in os.listdir(dataset_path) if os.path.isdir(os.path.join(dataset_path, d)) and d not in aigc_dataset_name]
        # obtain all video names for each forgery folder
        sub_dataset_paths = [Path(os.path.join(dataset_path, name, 'ff')) for name in sub_dataset_names]
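
To make my reading of this excerpt concrete, here is a small self-contained sketch that mirrors the two branches on a temporary directory tree. It is my own illustration, not code from the repository: `resolve_sub_dataset_paths`, `demo`, and the folder names `method_a` and `method_b` are hypothetical placeholders. If I am reading it correctly, preprocess.py looks for `DF40/<forgery_name>/cdf` or `DF40/<forgery_name>/ff`, which is why I am unsure where the train and test directories fit.

```python
import os
import tempfile
from pathlib import Path

# Folder names copied from the excerpt above; both branches skip them.
AIGC_DATASET_NAMES = {
    'StyleGAN2', 'StyleGAN3', 'StyleGANXL', 'ddim', 'ddpm',
    'collaborative_diffusion', 'pixart', 'SiT', 'sd1.5', 'sd2.1',
    'VQGAN', 'DiT', 'MidJourney', 'dalle2_face',
}


def resolve_sub_dataset_paths(dataset_path, dataset_name):
    """Hypothetical helper that mirrors the DF40_cdf / DF40_ff branches."""
    # 'DF40_cdf' -> 'cdf', 'DF40_ff' -> 'ff' (the per-forgery subfolder).
    subfolder = {'DF40_cdf': 'cdf', 'DF40_ff': 'ff'}[dataset_name]
    # As in the excerpt, the dataset name in the path is replaced by 'DF40'.
    root = Path(str(dataset_path).replace(dataset_name, 'DF40'))
    # Sorting is my addition, only to make the output deterministic.
    names = sorted(
        d for d in os.listdir(root)
        if (root / d).is_dir() and d not in AIGC_DATASET_NAMES
    )
    return [root / name / subfolder for name in names]


def demo(dataset_name):
    """Build a placeholder tree and return the resolved paths, relative to it."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        for method in ('method_a', 'method_b'):  # placeholder forgery names
            for subfolder in ('cdf', 'ff'):
                (base / 'DF40' / method / subfolder).mkdir(parents=True)
        (base / 'DF40' / 'StyleGAN2').mkdir()  # should be filtered out
        paths = resolve_sub_dataset_paths(base / dataset_name, dataset_name)
        return [p.relative_to(base).as_posix() for p in paths]


if __name__ == '__main__':
    print(demo('DF40_cdf'))
    print(demo('DF40_ff'))
```

Under this reading, each forgery folder directly under `DF40` needs a `cdf` and/or `ff` subfolder, and the sketch says nothing about where train and test should go, which is the part I would like clarified.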


Thank you for your work and help.
Sahar
