Skip to content

Lift fallback for PyPI package names to obtain_file#5159

Open
Flamefire wants to merge 6 commits intoeasybuilders:developfrom
Flamefire:pypi-pkg-name
Open

Lift fallback for PyPI package names to obtain_file#5159
Flamefire wants to merge 6 commits intoeasybuilders:developfrom
Flamefire:pypi-pkg-name

Conversation

@Flamefire
Copy link
Copy Markdown
Contributor

@Flamefire Flamefire commented Apr 7, 2026

Followup to #4943

When the download URLs include a PYPI source and the download filename looks like a pypi source where the package name contains dashes then retry downloading with dashes replaced by underscores.

This is more generic than #4943 as it handles e.g. PythonPackage easyconfigs, not just extensions with defaulted names.

Done in 2 steps for easier checking:

  1. Factor out a "wrapper" for download_file as a EasyBlock method from obtain_file without any significant changes
    Small change to dry-run logging as "extensions typically have custom source URLs specified, only mention first" didn't seem to match behavior so removed the comment and add the URL to the initial message instead of a second one
  2. Call this again if we have a PYPI URL and at least 2 dashes in the download filename (last is assumed to be the version delimiter)

This is broader than #4943

  • Handle not only extensions but any obtain_file calls, especially top-level easyconfig sources
  • Trigger when PYPI URL appears anywhere in the URLs, not only in first position
  • Trigger when there are at least 2 dashes not only when the extensions source name is defaulted and also for dots and combinations of dashes, dots and underscores. Specs say runs of those should be replaced by single underscore

Issue I had was that at this point we do not know anymore where the (download)filename came from and which template values (for name & version) we should use as extensions temporarily update the easyconfig templates.

Example easyconfig PR that makes use of this: easybuilders/easybuild-easyconfigs#25682

@Flamefire Flamefire force-pushed the pypi-pkg-name branch 3 times, most recently from a5880fc to fe7e752 Compare April 7, 2026 10:32
@boegel boegel added this to the release after 5.3.0 milestone Apr 7, 2026
@boegel boegel changed the title Lift fallback for PYPI package names to obtain_file Lift fallback for PyPI package names to obtain_file Apr 8, 2026
@boegel boegel changed the title Lift fallback for PyPI package names to obtain_file Lift fallback for PyPI package names to obtain_file Apr 8, 2026
@boegel
Copy link
Copy Markdown
Member

boegel commented Apr 8, 2026

@smoors Are you up for taking a look at this?

No rush, it's a bit too late to get this in for EasyBuild v5.3.0...

@Flamefire
Copy link
Copy Markdown
Contributor Author

@smoors can you look into this now that EB 5.3 is released? The changed PyTorch easyconfigs need to be converted to PythonBundle and IIRC the fetch_step isn't run for extensions so the test_step will fail

Copy link
Copy Markdown
Contributor

@smoors smoors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that the retry is moved to obtain_file, would be good to add a test? also for the download_file function?

Comment thread easybuild/framework/easyblock.py Outdated
Comment thread easybuild/framework/easyblock.py Outdated
num_dashes = download_filename.count('-')
if num_dashes >= 2 and any(PYPI_PKG_URL_PATTERN in url for url in source_urls):
# Last dash is the separator between name and version
alt_download_filename = download_filename.replace('-', '_', num_dashes - 1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, according to the specs, we should also replace '.' with '_'
https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode

not sure if it should already be done in this PR?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Flamefire
Copy link
Copy Markdown
Contributor Author

I confused the 2 PRs: The more important one is #5162 to get the test file. But also getting this one ready. Writing the test might be the worst...

https://packaging.python.org/en/latest/specifications/binary-distribution-format/#escaping-and-unicode
says that any runs of dashes, dots and underscores should be replaced by a single underscore.
Try first with this rule applied, then with only dashes replaced but
possibly duplication of underscores and keeping dots, and similar for dots as last resort.
@Flamefire
Copy link
Copy Markdown
Contributor Author

Flamefire commented Apr 15, 2026

now that the retry is moved to obtain_file, would be good to add a test? also for the download_file function?

That was tricky to do. But I noticed an unrelated bug (with progress bars) which I fixed in #5162

Used the same approach now here too: unittest.mock to avoid actually downloading anything and being able to check calls.
This actually caught a few cases that were not right: E.g. [.-_] doesn't match - but is easy to overlook.

Rebased to latest develop too.

@Flamefire
Copy link
Copy Markdown
Contributor Author

@smoors Do you still want a unit test for EasyBlock.download_file?

It isn't too complicated and the main parts are covered by this and 2 more obtain_file tests.

Missing:

  • Alternative URL formats
  • Derived PyPI URL usage
  • dry-run
  • handling IOError raised from ft.download_file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants