TypeError: unsupported operand type(s) for |: 'type' and 'NoneType' #208

@paklui

Describe the Bug

With the latest git revision, 656b66c, I get the error TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'.
With the earlier revision a4dbc72, I was able to run PARAM comms.
I suspect the Python version on my CentOS Stream 9 system is the cause: the failing line uses the str | None annotation syntax, which Python supports only from 3.10 onward, and I am running Python 3.9.

(venv-param) [amd@hostname-1e707-b05-2 PARAMcomms]$ python --version
Python 3.9.21
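
The suspected cause can be checked on the interpreter inside the virtual environment. The snippet below is a minimal sketch, not code from PARAM; `supports_pep604_unions` is a hypothetical helper name introduced only for illustration. It evaluates the same expression that appears in the traceback and reports whether this interpreter accepts it.

```python
import sys


def supports_pep604_unions() -> bool:
    """Return True if this interpreter can evaluate `str | None` at runtime.

    Hypothetical helper for illustration; not part of PARAM.
    """
    try:
        # Same expression as the annotation in the traceback.
        eval("str | None")
        return True
    except TypeError:
        # Before Python 3.10, `type.__or__` is not defined, so this raises
        # "unsupported operand type(s) for |: 'type' and 'NoneType'".
        return False


# Parameter annotations are evaluated when the `def` statement executes,
# so an unsupported annotation makes the whole module fail at import time.
print(sys.version_info[:2], supports_pep604_unions())
```

If this prints False, any module that uses `X | None` in a signature without deferring annotation evaluation will fail on import, which matches the traceback in this report.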

Steps to Reproduce

steps:

cd param-656b66c/
cd train/compute/python/
pip install .
cd ../../comms/pt/
pip install .

Environment setup:

ROCM_PATH=${ROCM_PATH:-/opt/rocm}
NFS_PATH=/share2/amd-share
OMPI_INSTALL_DIR=${NFS_PATH}/ompi4-install
RCCL_INSTALL_DIR=${NFS_PATH}/rccl_develop/build/release
RCCL_TESTS_INSTALL_DIR=${NFS_PATH}/rccl-tests/build
export PATH=${OMPI_INSTALL_DIR}/bin:$PATH
export LD_LIBRARY_PATH=${RCCL_INSTALL_DIR}:${OMPI_INSTALL_DIR}/lib:$LD_LIBRARY_PATH
source /share2/PARAMcomms/venv-param/bin/activate

To run:

mpirun --allow-run-as-root -np 8 -x NCCL_DEBUG=INFO -x PYTHONPATH=/usr/bin/python3 -host hostname-1e707-b05-2:8 -map-by ppr:8:node --bind-to none --mca pml ucx --mca btl ^openib -x PATH=${PATH} -x LD_LIBRARY_PATH=${LD_LIBRARY_PATH} -x NCCL_IB_GID_INDEX=3 -x RCCL_ENABLE_INTRANET=1 -x NCCL_IB_HCA=bnxt_re0,bnxt_re1,bnxt_re2,bnxt_re3,bnxt_re4,bnxt_re5,bnxt_re6,bnxt_re7 -x NCCL_IGNORE_CPU_AFFINITY=1 /share2/PARAMcomms/param/train/comms/pt/comms.py --device rocm --master-ip hostname-1e707-b05-2 -b 1 -e 1G -n 10 -f 2 -z 0 --collective all_reduce --data-type float32 

python version:

(venv-param) [amd@hostname-1e707-b05-2 PARAMcomms]$ python --version
Python 3.9.21

pip version:

(venv-param) [amd@hostname-1e707-b05-2 PARAMcomms]$ pip list
Package                  Version
------------------------ ----------------------------
apex                     1.6.0+rocm6.5.0.git004991b6
fbgemm_gpu               1.2.0
filelock                 3.18.0
fsspec                   2025.3.2
future                   1.0.0
gitdb                    4.0.12
GitPython                3.1.44
Jinja2                   3.1.6
MarkupSafe               3.0.2
mpmath                   1.3.0
networkx                 3.2.1
numpy                    2.0.2
parambench-train-comms   0.0.0
parambench-train-compute 1.0.0+git.1747955991
pillow                   11.2.1
pip                      25.1.1
pydot                    4.0.0
pyparsing                3.2.3
pytorch-triton-rocm      3.2.0+rocm6.5.0.git6da9e660
scipy                    1.13.1
setuptools               53.0.0
smmap                    5.0.2
sympy                    1.13.1
torch                    2.6.0+rocm6.5.0.gitcf65c6f2
torchaudio               2.6.0+rocm6.5.0.gitd8831425
torchvision              0.21.0+rocm6.5.0.git7af69879
typing_extensions        4.13.2

Expected Behavior

I expect comms.py to run. With an older revision, such as a4dbc72, it runs successfully:


+ mpirun --allow-run-as-root -np 8 -x NCCL_DEBUG=VERSION -x PYTHONPATH=/usr/bin/python3 -host hostname-1e707-b05-2:8 -map-by ppr:8:node --bind-to none --mca pml ucx --mca btl '^openib' -x PATH=/share2/PARAMcomms/venv-param/bin:/share2/amd-share/ompi4-install/bin:/share2/PARAMcomms/venv-param/bin:/home/amd/.local/bin:/home/amd/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin -x LD_LIBRARY_PATH=/share2/amd-share/rccl_develop/build/release:/share2/amd-share/ompi4-install/lib: -x NCCL_IB_GID_INDEX=3 -x RCCL_ENABLE_INTRANET=1 -x NCCL_IB_HCA=bnxt_re0,bnxt_re1,bnxt_re2,bnxt_re3,bnxt_re4,bnxt_re5,bnxt_re6,bnxt_re7 -x NCCL_IGNORE_CPU_AFFINITY=1 /share2/PARAMcomms/param/train/comms/pt/comms.py --device rocm --master-ip hostname-1e707-b05-2 --b 4G --e 4G --n 100 --f 2 --z 0 --collective all_reduce --data-type float32
         PARAM COMM environment: {'world_size': 8, 'local_size': 8, 'global_rank': 0, 'local_rank': 0}
         backend: nccl nw-stack: pytorch-dist args.data_types: ['float32'] args.b: 4G args.e: 4G args.f: 2 args.z: 0 args.master_ip: hostname-1e707-b05-2
Hello from Rank 0: [Rank   0] host hostname-1e707-b05-2, device: cuda:0, local_rank: 0 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 1: [Rank   1] host hostname-1e707-b05-2, device: cuda:1, local_rank: 1 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 2: [Rank   2] host hostname-1e707-b05-2, device: cuda:2, local_rank: 2 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 3: [Rank   3] host hostname-1e707-b05-2, device: cuda:3, local_rank: 3 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 4: [Rank   4] host hostname-1e707-b05-2, device: cuda:4, local_rank: 4 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 5: [Rank   5] host hostname-1e707-b05-2, device: cuda:5, local_rank: 5 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 6: [Rank   6] host hostname-1e707-b05-2, device: cuda:6, local_rank: 6 world_size: 8, master_ip: hostname-1e707-b05-2
Hello from Rank 7: [Rank   7] host hostname-1e707-b05-2, device: cuda:7, local_rank: 7 world_size: 8, master_ip: hostname-1e707-b05-2
RCCL version : 2.24.3-HEAD:2c0eecf
HIP version  : 6.5.50421-a90f5536a
ROCm version : 6.5.0.0-990-de37842
Hostname     : hostname-1e707-b05-2
Librccl path : /share2/PARAMcomms/venv-param/lib64/python3.9/site-packages/torch/lib/librccl.so
[Rank   0] allSizes: [4294967296] element_size: 4 local_rank: 0, num_pg 1, groupSize 8
         collective=all_reduce, src_ranks=None, dst_ranks=None

        COMMS-RES                          total-size (B)  nElementsPerRank  nElementsPairPerRank   Latency(us):p50         p75         p95         Min         Max    AlgBW(GB/s) BusBW(GB/s)
        COMMS-RES-all_reduce-float32        4294967296        1073741824           ...

Screenshots

Output of the failing run:

+ mpirun --allow-run-as-root -np 8 -x NCCL_DEBUG=INFO -x PYTHONPATH=/usr/bin/python3 -host 1e707-b05-2:8 -map-by ppr:8:node --bind-to none --mca pml ucx --mca btl '^openib' -x PATH=/share2/PARAMcomms/venv-param/bin:/share2/amd-share/ompi4-install/bin:/share2/PARAMcomms/venv-param/bin:/home/amd/.local/bin:/home/amd/bin:/usr/share/Modules/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin -x LD_LIBRARY_PATH=/share2/amd-share/rccl_develop/build/release:/share2/amd-share/ompi4-install/lib: -x NCCL_IGNORE_CPU_AFFINITY=1 /share2/PARAMcomms/param/train/comms/pt/comms.py --device rocm --backend nccl --master-ip hostname-1e707-b05-2 -b 1 -e 1G -n 10 -f 2 -z 0 --collective all_reduce --data-type float32
CollectiveArgsMixin does not exist or module not found. Default to empty class.
Traceback (most recent call last):
  File "/share2/PARAMcomms/param/train/comms/pt/comms.py", line 19, in <module>
    from param_bench.train.comms.pt import comms_utils
  File "/share2/PARAMcomms/venv-param/lib64/python3.9/site-packages/param_bench/train/comms/pt/comms_utils.py", line 25, in <module>
    from param_bench.train.comms.pt.pytorch_backend_utils import (
  File "/share2/PARAMcomms/venv-param/lib64/python3.9/site-packages/param_bench/train/comms/pt/pytorch_backend_utils.py", line 392, in <module>
    device: str | None = None,
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[503,1],0]
  Exit code:    1
--------------------------------------------------------------------------
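
Two common ways to make an annotation like the one on line 392 of pytorch_backend_utils.py importable on Python 3.9 are sketched below. This is an illustration under assumptions, not a patch for PARAM: `init_backend` and `f` are hypothetical names, and only the `device: str | None = None` parameter is taken from the traceback.

```python
from typing import Optional


# Option 1: spell the union with typing.Optional, which works on 3.9.
# `init_backend` is a hypothetical stand-in for the real function.
def init_backend(device: Optional[str] = None) -> str:
    return device if device is not None else "cpu"


# Option 2: keep `str | None` but defer annotation evaluation with
# `from __future__ import annotations` (PEP 563). The future import must
# be the first statement of the module, so it is compiled here as its own
# small module to keep this sketch in one block.
src = (
    "from __future__ import annotations\n"
    "def f(device: str | None = None):\n"
    "    return device\n"
)
ns = {}
exec(compile(src, "<sketch>", "exec"), ns)

print(init_backend(), init_backend("cuda:0"))
# With deferred evaluation the annotation is stored as a plain string.
print(ns["f"].__annotations__)
```

Option 2 only helps where annotations are never evaluated at runtime; code that resolves them (for example through `typing.get_type_hints`) would still fail on 3.9. Alternatively, the project could declare Python 3.10 as its minimum version.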
