
[StyleTTS2] feat: add demo and inference integration basics #804

Open
roedoejet wants to merge 7 commits into main from dev.ap/styletts2-improvements

@roedoejet (Member) commented May 15, 2026

PR Goal?

I previously connected the everyvoice train functionality with StyleTTS2. This PR integrates StyleTTS2 with the everyvoice demo, everyvoice synthesize, and everyvoice checkpoint inspect commands.

Fixes?

Part of #686

Feedback sought?

Testing, but also a sanity check. I think there are quite a few places where we tied ourselves too closely to FS2 and its architecture. I need some distance before I can tell how to refactor them, so any insight is helpful.

I'm mostly looking for high-level analysis of whether combining the repos in this way is a reasonable approach.

Priority?

high

Tests added?

none so far

How to test?

Try running everyvoice synthesize, everyvoice demo, and everyvoice checkpoint inspect. Note that I don't think this will work on the model you just trained: I'm not adding backwards-compatibility support for that, although the hooks are in place to handle it in the future the same way we do for FS2.

Confidence?

medium

Version change?

n/a, already bumped to 0.5.0

Related PRs?

EveryVoiceTTS/FastSpeech2_lightning#142
EveryVoiceTTS/StyleTTS2#4

semanticdiff-com (bot) commented May 15, 2026

Review changes with SemanticDiff

Changed files (file, status):
  everyvoice/cli.py  8% smaller
  everyvoice/tests/test_cli.py  8% smaller
  everyvoice/tests/stubs.py  2% smaller
  everyvoice/base_cli/checkpoint.py  0% smaller
  everyvoice/base_cli/prediction_writing_callback.py  0% smaller
  everyvoice/demo/app.py  0% smaller
  everyvoice/model/e2e/StyleTTS2_lightning  0% smaller
  everyvoice/model/feature_prediction/FastSpeech2_lightning  0% smaller

roedoejet mentioned this pull request May 15, 2026
roedoejet marked this pull request as draft May 16, 2026 00:13
roedoejet marked this pull request as ready for review May 19, 2026 23:39
roedoejet requested a review from joanise May 19, 2026 23:39
github-actions (bot, Contributor) commented May 20, 2026

CLI load time: 0:00.28
Pull Request HEAD: cca19c5a7720cfc586333188fc4da3d220bba87b

Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package
import time:       648 |     117506 |   typer
import time:      5534 |     137890 |               loguru._logger
import time:      3778 |     142884 |             loguru
import time:      2489 |     192312 |           everyvoice.utils
import time:       846 |     193158 |         everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.cli.benchmark
import time:       543 |     201593 |       everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.cli.cli
import time:       312 |     201904 |     everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.cli
import time:      1306 |     203210 |   everyvoice.model.feature_prediction.FastSpeech2_lightning.fs2.cli.check_data
import time:      7048 |     422244 | everyvoice.cli
import time:      5073 |     131130 |   rich.markdown
import time:      4112 |     217495 | typer.rich_utils

codecov (bot) commented May 20, 2026

Codecov Report

❌ Patch coverage is 39.28571% with 136 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.98%. Comparing base (7f0d5d4) to head (cca19c5).

Files with missing lines Patch % Lines
everyvoice/demo/app.py 3.70% 78 Missing ⚠️
everyvoice/cli.py 54.62% 50 Missing and 4 partials ⚠️
everyvoice/base_cli/checkpoint.py 20.00% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #804      +/-   ##
==========================================
- Coverage   87.55%   84.98%   -2.57%     
==========================================
  Files          45       46       +1     
  Lines        4033     4217     +184     
  Branches      605      632      +27     
==========================================
+ Hits         3531     3584      +53     
- Misses        365      494     +129     
- Partials      137      139       +2     


roedoejet changed the title from "feat: add demo and inference integration basics" to "[StyleTTS2] feat: add demo and inference integration basics" May 20, 2026
@joanise (Member) commented May 20, 2026

To resolve the conflict, keep SCHEMAS_TO_OUTPUT: and its preceding comment just before def update_schemas.

@joanise (Member) left a comment

Not really tested yet, but I'm done for today. Generally good, but see the suggestions below.

Comment thread on everyvoice/demo/app.py
return demo


def create_demo_app_styletts2(

Member:

Is there any code at all that is shared between the styletts2 demo and the fs2+hfgl one? From a quick scan, it looks like completely separate code, and in that case I think it should go in styletts2/cli/demo.py instead of here.

That would also make it easier to add it to the styletts2 CLI as styletts2 demo.

Member Author:

I agree that this would be good, but I might leave it for another PR, since it requires refactoring code that isn't part of this PR's feature. For now, I'll leave this demo code where the other demo code lives, and I'll open an issue to refactor it the way you suggest.

Comment thread on everyvoice/cli.py (outdated)


@demo_group.command(
name="text-to-spec",

Member:

Hmm, this name isn't intuitive to me: this demo variant does text->spec->wav, while the other does text->wav, and both generate wav from text.

I just discussed it with Sam, and he came up with text-to-spec-to-wav for this one vs text-to-wav for the other. It's longer, but with autocomplete you don't have to type it all. Sam also found that we can create aliases if we don't want to always type the long names.

E.g.,

@demo_group.command(name="text-to-spec-to-wav", short_help="...")
@demo_group.command(name="t2s2w", hidden=True)
def demo...():

would define a hidden alias command, everyvoice demo t2s2w, in case we want a shorter version available. The aliases could even use the model short names: maybe everyvoice demo fs2-hfgl and everyvoice demo styletts2 could be our aliases for the two variants?
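A minimal, self-contained sketch of the alias idea with Typer, assuming demo_group behaves like a plain typer.Typer app. Typer's command() decorator returns the decorated function unchanged, so stacking two of them registers one callback under two names, and hidden=True keeps the short alias out of --help. The app, function, and argument names here (demo_app, demo_text_to_spec_to_wav, checkpoint) are hypothetical, not EveryVoice's actual code.

```python
import typer
from typer.testing import CliRunner

# Hypothetical stand-in for EveryVoice's demo_group.
demo_app = typer.Typer()


# command() returns the function unchanged, so the same callback is registered
# twice: under the long, visible name and under a short alias that hidden=True
# omits from --help.
@demo_app.command(
    name="text-to-spec-to-wav", short_help="Demo a spec predictor plus vocoder."
)
@demo_app.command(name="t2s2w", hidden=True)
def demo_text_to_spec_to_wav(checkpoint: str) -> None:
    """Hypothetical demo entry point; a real one would launch the demo app."""
    typer.echo(f"text->spec->wav demo for {checkpoint}")


if __name__ == "__main__":
    runner = CliRunner()
    # Both names dispatch to the same callback.
    print(runner.invoke(demo_app, ["t2s2w", "model.ckpt"]).output)
    print(runner.invoke(demo_app, ["text-to-spec-to-wav", "model.ckpt"]).output)
```

Only the long name appears in everyvoice-style help output; the alias still works for anyone who knows it.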

Member Author:

Hm, yes, I totally see what you mean. I like this better than what I have, but I'm also thinking that we could sidestep the whole issue by reading the metadata in the checkpoint, since EV records which type of model it is. That way, we could just run everyvoice demo path/to/ckpt and let the command figure out which of these variants to run. That would be cleaner from a UX point of view, because the user has no real need to know exactly what type of model it is when they want to demo it. If you agree, I can implement and push.

Member:

Absolutely, that would be great! And we have all the logic needed for inspect, why not use it here.

Obviously, it should be an error to provide a vocoder with a StyleTTS2 checkpoint, and a different error to provide an FS2 checkpoint without a vocoder, but that's pretty simple logic.

The other advantage of your suggestion: current demo startup commands keep working unchanged.
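The auto-detection idea, together with the two error cases above, can be sketched as a small pure function. This is a hypothetical sketch: it assumes the checkpoint metadata has already been loaded into a dict with a model_type key whose values are "StyleTTS2" or "FastSpeech2"; the real metadata layout and labels that everyvoice checkpoint inspect reads may differ. The returned strings reuse the variant names proposed earlier in this thread, and DemoConfigError and choose_demo_variant are illustrative names only.

```python
from typing import Any, Mapping, Optional


class DemoConfigError(ValueError):
    """Raised when the checkpoint and vocoder arguments are incompatible."""


def choose_demo_variant(
    metadata: Mapping[str, Any], vocoder_path: Optional[str] = None
) -> str:
    """Pick a demo variant from checkpoint metadata (hypothetical key and values)."""
    model_type = metadata.get("model_type")  # assumed key, not EveryVoice's real one
    if model_type == "StyleTTS2":
        # End-to-end model: it produces audio itself, so a vocoder is an error.
        if vocoder_path is not None:
            raise DemoConfigError("StyleTTS2 checkpoints do not take a vocoder.")
        return "text-to-wav"
    if model_type == "FastSpeech2":
        # Spectrogram predictor: a vocoder is required to produce audio.
        if vocoder_path is None:
            raise DemoConfigError("FastSpeech2 checkpoints require a vocoder.")
        return "text-to-spec-to-wav"
    raise DemoConfigError(f"Unsupported model type: {model_type!r}")


if __name__ == "__main__":
    print(choose_demo_variant({"model_type": "StyleTTS2"}))
    print(choose_demo_variant({"model_type": "FastSpeech2"}, "vocoder.ckpt"))
```

Keeping the decision in a pure function, separate from the CLI callback, makes both error cases easy to unit-test without loading a model.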

Member Author:

OK, I'll give this a go.

roedoejet force-pushed the dev.ap/styletts2-improvements branch 4 times, most recently from a5b2a20 to e8c84a4, May 21, 2026 20:56
roedoejet requested a review from joanise May 22, 2026 18:01
roedoejet force-pushed the dev.ap/styletts2-improvements branch from 1ac2571 to cca19c5, May 27, 2026 23:52

Labels

None yet

Projects

None yet


2 participants