I tested MOSS-Audio on the Common Voice 15 English subset and found some of the results confusing.

<img width="985" height="318" alt="Image" src="https://github.com/user-attachments/assets/a7e713fa-ce5c-4f39-aac6-e4d019bbe507" />

I did not specify any generation configuration, though. What generation configuration do you recommend to avoid this situation?