comm_replay encounters issue  #192

@x41lakazam

Describe the Bug

When I run comm_replay on ET (execution trace) files, I get the following error; the tracebacks from ranks 0 and 56 are shown:

$ comm_replay --enable-profiler --trace-type et --trace-path /workspace/traces --num-replays 1

 0: [rank0]: Traceback (most recent call last):
 0: [rank0]:   File "/usr/local/bin/comm_replay", line 8, in <module>
 0: [rank0]:     sys.exit(main())
 0: [rank0]:   File "/usr/local/lib/python3.10/dist-packages/et_replay/tools/comm_replay.py", line 1671, in main
 0: [rank0]:     traceBench.runBench(commsParams)
 0: [rank0]:   File "/usr/local/lib/python3.10/dist-packages/et_replay/tools/comm_replay.py", line 1324, in runBench
 0: [rank0]:     self.benchTime(commsParams)
 0: [rank0]:   File "/usr/local/lib/python3.10/dist-packages/et_replay/tools/comm_replay.py", line 1236, in benchTime
 0: [rank0]:     self.replayTrace(commsParams=commsParams, warmup=True)
 0: [rank0]:   File "/usr/local/lib/python3.10/dist-packages/et_replay/tools/comm_replay.py", line 1063, in replayTrace
 0: [rank0]:     (latency, global_latency) = self.runComms(
 0: [rank0]:   File "/usr/local/lib/python3.10/dist-packages/et_replay/tools/comm_replay.py", line 820, in runComms
 0: [rank0]:     self.collectiveArgs.waitObjIds[curComm.req] = retObj
 0: [rank0]: TypeError: unhashable type: 'list'
56: [rank56]: Traceback (most recent call last):
56: [rank56]:   File "/usr/local/bin/comm_replay", line 8, in <module>
56: [rank56]:     sys.exit(main())
56: [rank56]:   File "/usr/local/lib/python3.10/dist-packages/et_replay/tools/comm_replay.py", line 1671, in main
56: [rank56]:     traceBench.runBench(commsParams)
56: [rank56]:   File "/usr/local/lib/python3.10/dist-packages/et_replay/tools/comm_replay.py", line 1324, in runBench
56: [rank56]:     self.benchTime(commsParams)
56: [rank56]:   File "/usr/local/lib/python3.10/dist-packages/et_replay/tools/comm_replay.py", line 1236, in benchTime
56: [rank56]:     self.replayTrace(commsParams=commsParams, warmup=True)
56: [rank56]:   File "/usr/local/lib/python3.10/dist-packages/et_replay/tools/comm_replay.py", line 1063, in replayTrace
56: [rank56]:     (latency, global_latency) = self.runComms(
56: [rank56]:   File "/usr/local/lib/python3.10/dist-packages/et_replay/tools/comm_replay.py", line 820, in runComms
56: [rank56]:     self.collectiveArgs.waitObjIds[curComm.req] = retObj
56: [rank56]: TypeError: unhashable type: 'list'

The Chakra schema version is 1.1.1-chakra.0.0.4.
I've tried with param@main and param@7b19f58, as the Chakra user guide recommends.
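
For reference, the failing line uses curComm.req as a dictionary key, so the TypeError means req was a list for at least one replayed op. The snippet below is a minimal standalone sketch of that failure and of one possible workaround (converting the list to a tuple before using it as a key). It does not use et_replay: `to_hashable_key`, `wait_obj_ids`, and the sample `req_from_trace` value are hypothetical names and data chosen for illustration, and why the trace produces a list here is not established.

```python
from typing import Any, Hashable


def to_hashable_key(req: Any) -> Hashable:
    """Recursively turn lists/tuples into tuples so the value can key a dict.

    Hypothetical helper, not part of et_replay.
    """
    if isinstance(req, (list, tuple)):
        return tuple(to_hashable_key(item) for item in req)
    return req


# Stand-in for collectiveArgs.waitObjIds (assumed to be a plain dict).
wait_obj_ids: dict = {}

# Assumed shape of a request id that arrives as a list; the real value
# in the failing trace is unknown.
req_from_trace = [12, 0]

# Reproduce the failure: a list cannot be used as a dict key.
error_message = ""
try:
    wait_obj_ids[req_from_trace] = "work-handle"
except TypeError as exc:
    error_message = str(exc)

# Possible workaround: normalize the key before storing the wait object.
wait_obj_ids[to_hashable_key(req_from_trace)] = "work-handle"
```

If this is the cause, the same normalization would also have to be applied wherever the wait object is looked up again, otherwise the stored entry would never be found.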
