bug: AV_MossFormer2 audio shift for detection beyond the beginning of the file #160

Hi,

AV_MossFormer2_TSE_16K is awesome. I ran the following and it generally works, but the output needs some cleanup.

```
from clearvoice import ClearVoice

myClearVoice = ClearVoice(task='target_speaker_extraction', model_names=['AV_MossFormer2_TSE_16K'])

# First calling method: process an input video and write the output videos to output_path
output_wav = myClearVoice(input_path='input.mp4', online_write=True, output_path='separate_audio')
```

The issue is that when the model detects a speaker later in the video (not at the first frame), the extracted audio still starts at the first frame, which leads to a large desync in the video_est_x.mp4 files.

For example, if a speaker is detected at the 00:05 mark, the audio in the corresponding video_est_x.mp4 file is shifted 5 seconds earlier than it should be.
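To illustrate the alignment I would expect, here is a minimal sketch of how the extracted audio could be padded with leading silence so that it lines up with the original timeline. The function names `leading_silence_samples` and `realign` are hypothetical and not part of ClearVoice, and the 25 fps frame rate is an assumption; the 16 kHz sample rate is taken from the model name.

```python
def leading_silence_samples(start_frame, fps=25.0, sample_rate=16000):
    """Number of audio samples that cover the video time before start_frame.

    fps=25.0 is an assumed frame rate; sample_rate follows the 16K model name.
    """
    if start_frame < 0 or fps <= 0 or sample_rate <= 0:
        raise ValueError("start_frame must be >= 0; fps and sample_rate must be > 0")
    return round(start_frame / fps * sample_rate)


def realign(audio, start_frame, fps=25.0, sample_rate=16000):
    """Prepend silence so audio[0] lines up with the speaker's first detected frame."""
    pad = leading_silence_samples(start_frame, fps, sample_rate)
    return [0.0] * pad + list(audio)


# Speaker first detected at 00:05 in a 25 fps video, i.e. frame 125.
aligned = realign([0.1, -0.2, 0.3], start_frame=125)
print(len(aligned))  # 80000 samples of silence + 3 original samples = 80003
```

With something like this applied before muxing, the audio in video_est_x.mp4 would start at the 00:05 mark instead of at 00:00.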

Thank you in advance.
