Don't import packages just for version compatibility check #5723

@gjoseph92

In #5695 (comment), we discovered that pandas was always being imported on Nanny workers because, to check version compatibility, all these modules are imported:

required_packages = [
    ("dask", lambda p: p.__version__),
    ("distributed", lambda p: p.__version__),
    ("msgpack", lambda p: ".".join([str(v) for v in p.version])),
    ("cloudpickle", lambda p: p.__version__),
    ("tornado", lambda p: p.version),
    ("toolz", lambda p: p.__version__),
]
optional_packages = [
    ("numpy", lambda p: p.__version__),
    ("pandas", lambda p: p.__version__),
    ("lz4", lambda p: p.__version__),
    ("blosc", lambda p: p.__version__),
]

Instead of importing, we could use importlib.metadata.version to get the distribution's version number without importing the package. This could speed up worker startup time and reduce memory footprint a little.
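
A minimal sketch of what that could look like, assuming Python 3.8+ (where `importlib.metadata` is in the standard library) and assuming each distribution name matches the import name listed above, which is not guaranteed for every package. The helper names `get_version` and `get_package_versions` are hypothetical, not existing functions in distributed:

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: distribution names equal the import names from the lists above.
REQUIRED = ["dask", "distributed", "msgpack", "cloudpickle", "tornado", "toolz"]
OPTIONAL = ["numpy", "pandas", "lz4", "blosc"]


def get_version(dist_name):
    """Return the installed version string, or None if not installed.

    Reads the distribution's metadata only; the package is never imported.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def get_package_versions(names):
    """Map each distribution name to its version string (or None)."""
    return {name: get_version(name) for name in names}


if __name__ == "__main__":
    print(get_package_versions(REQUIRED + OPTIONAL))
```

One thing to check before adopting this: the metadata version is whatever the installed distribution declares, so it may not be formatted identically to the string the current lambdas build from `p.__version__` or `p.version`.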

cc @crusaderky
