
Debugpy silently crashes because of Broken pipe or FileNotFoundError #1828

@jubueche


Environment data

  • debugpy version: 1.8.12
  • OS and version: Red Hat Enterprise Linux, 9.5 (Plow)
  • Python version (& distribution if applicable, e.g. Anaconda): 3.10.16, Anaconda
  • Using VS Code or Visual Studio: VS Code

Actual behavior

I am on a compute cluster that uses LSF, and I launch an interactive job to get a shell on a compute node. On that compute node, I start debugpy with:
python -m debugpy --listen 0.0.0.0:1326 --wait-for-client -c "print('hello')"

The server then waits for the client. However, when I try to connect to the server from VS Code, I get ECONNREFUSED. When I enable logging with python -m debugpy --log-to logs --listen 0.0.0.0:1326 --wait-for-client -c "print('hello')", I see the following:

debugpy.pydevd.2718046.log

0.32s - pydevd: Use libraries filter: False

0.00s - IDE_PROJECT_ROOTS []

0.00s - Collecting default library roots.
0.00s - LIBRARY_ROOTS ['/u/jub/.local/lib/python3.10/site-packages', '/u/jub/miniconda3/envs/torch/lib/python3.10', '/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages']

0.00s - Apply debug mode: debugpy-dap
0.00s - Preimport: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages;debugpy._vendored.force_pydevd
0.00s - Connecting to 127.0.0.1:47939
0.00s - Connected to: <socket.socket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 43098), raddr=('127.0.0.1', 47939)>.
0.00s - Applying patching to hide pydevd threads (Py3 version).
0.01s - ReaderThread: empty contents received (len(line) == 0).
0.00s - PyDB.dispose_and_kill_all_pydevd_threads (called from: File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 324, in _terminate_on_socket_close)
0.00s - PyDB.dispose_and_kill_all_pydevd_threads (first call)
0.00s - PyDB.dispose_and_kill_all_pydevd_threads no commands being processed.
0.00s - PyDB.dispose_and_kill_all_pydevd_threads killing thread: <ReaderThread(pydevd.Reader, started daemon 22788828034624)>
0.00s - pydevd.Reader received kill signal
0.00s - PyDB.dispose_and_kill_all_pydevd_threads killing thread: <WriterThread(pydevd.Writer, started daemon 22788898952768)>
0.00s - sending cmd (http_json) -->             CMD_EXIT {"type": "event", "event": "terminated", "seq": 2, "body": {}, "pydevd_cmd_id": 129}

0.00s - pydevd.Writer received kill signal
0.00s - PyDB.dispose_and_kill_all_pydevd_threads waiting for pydb daemon threads to finish
0.00s - PyDB.dispose_and_kill_all_pydevd_threads (called from: File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 432, in _on_run)
0.00s - PyDB.dispose_and_kill_all_pydevd_threads (already disposed - wait)
0.10s - Successfully Loaded helper lib to set tracing to all threads.
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - _on_run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py - run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap_inner
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - _on_run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py - run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap_inner
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - __wait_for_threads_to_finish
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - dispose_and_kill_all_pydevd_threads
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py - _terminate_on_socket_close
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py - _on_run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py - run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap_inner
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - wait
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - __wait_for_threads_to_finish
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - dispose_and_kill_all_pydevd_threads
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py - _on_run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_daemon_thread.py - run
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap_inner
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py - _bootstrap
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - set_tracing_for_untraced_contexts
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - _locked_settrace
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - settrace
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - _settrace
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - listen
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - debug
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/public_api.py - wrapper
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - start_debugging
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - run_code
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - main
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__main__.py - <module>
0.00s - Set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py - _run_code
0.00s - Set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py - _run_module_as_main
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/pydevd.py - settrace
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - _settrace
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - listen
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/api.py - debug
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/public_api.py - wrapper
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - start_debugging
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - run_code
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/server/cli.py - main
0.00s - SKIP set tracing of frame: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__main__.py - <module>
0.40s - PyDB.dispose_and_kill_all_pydevd_threads: finished
0.00s - The following pydb threads may not have finished correctly: pydevd.CommandThread, pydevd.Writer
0.00s - PyDB.dispose_and_kill_all_pydevd_threads: finished
0.00s - ReaderThread: exit
Traceback (most recent call last):
  File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 422, in _on_run
    cmd.send(self.sock)
  File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_net_command.py", line 109, in send
    sock.sendall(as_bytes)
BrokenPipeError: [Errno 32] Broken pipe
0.00s - WriterThread: exit

debugpy.server-2718046.log

I+00000.013: Linux-5.14.0-427.42.1.el9_4.x86_64-x86_64-with-glibc2.34 x86_64
             CPython 3.10.16 (64-bit)
             debugpy 1.8.12

I+00000.113: Initial environment:
             
             System paths:
                 sys.executable: /u/jub/miniconda3/envs/torch/bin/python(/u/jub/miniconda3/envs/torch/bin/python3.10)
                 sys.prefix: /u/jub/miniconda3/envs/torch
                 sys.base_prefix: /u/jub/miniconda3/envs/torch
                 sys.real_prefix: <missing>
                 site.getsitepackages(): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 site.getusersitepackages(): /u/jub/.local/lib/python3.10/site-packages
                 sys.path (site-packages): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('stdlib'): /u/jub/miniconda3/envs/torch/lib/python3.10
                 sysconfig.get_path('platstdlib'): /u/jub/miniconda3/envs/torch/lib/python3.10
                 sysconfig.get_path('purelib'): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('platlib'): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('include'): /u/jub/miniconda3/envs/torch/include/python3.10
                 sysconfig.get_path('scripts'): /u/jub/miniconda3/envs/torch/bin
                 sysconfig.get_path('data'): /u/jub/miniconda3/envs/torch
                 os.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/os.py
                 threading.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py
                 debugpy.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__init__.py
             
             Installed packages:
                 kiwisolver==1.4.7
                 tzdata==2024.2
                 Jinja2==3.1.4
                 torch==2.5.1
                 py-cpuinfo==9.0.0
                 filelock==3.16.1
                 multidict==6.1.0
                 pip==24.2
                 pluggy==0.13.1
                 cmake==3.31.2
                 tomlkit==0.13.2
                 pybind11==2.13.6
                 packaging==24.2
                 click==8.1.7
                 huggingface_hub==0.26.5
                 huggingface-hub==0.27.0
                 safetensors==0.4.5
                 peft==0.14.0
                 Brotli==1.0.9
                 networkx==3.2
                 sentry-sdk==2.19.2
                 PyYAML==6.0
                 mypy==0.991
                 mccabe==0.7.0
                 triton-nightly==3.0.0.post20240716052845
                 transformers==4.47.1
                 multiprocess==0.70.16
                 xformers==0.0.29.post1
                 MarkupSafe==2.1.1
                 pylint==2.15.7
                 torchaudio==2.5.1
                 async-timeout==5.0.1
                 annotated-types==0.7.0
                 Pillow==9.2.0
                 pyarrow==18.1.0
                 fast_hadamard_transform==1.0.4.post1
                 PySocks==1.7.1
                 mypy-extensions==1.0.0
                 aiosignal==1.3.2
                 hjson==3.1.0
                 fsspec==2024.9.0
                 fsspec==2024.12.0
                 setuptools==75.1.0
                 datasets==3.2.0
                 six==1.17.0
                 tqdm==4.67.1
                 typing_extensions==4.12.2
                 threadpoolctl==3.5.0
                 debugpy==1.8.12
                 smmap==5.0.1
                 ninja==1.11.1.3
                 frozenlist==1.5.0
                 scipy==1.14.1
                 scipy==1.8.1
                 gmpy2==2.1.2
                 pydantic==2.10.3
                 docker-pycreds==0.4.0
                 protobuf==5.29.2
                 pytest==6.2.4
                 aiohttp==3.11.10
                 gitdb==4.0.11
                 yarl==1.18.3
                 py==1.11.0
                 mpi4py==4.0.1
                 urllib3==2.2.3
                 propcache==0.2.1
                 wrapt==1.17.0
                 lazy-object-proxy==1.10.0
                 scikit-build==0.18.1
                 pycparser==2.22
                 cycler==0.12.1
                 distro==1.9.0
                 iniconfig==2.0.0
                 idna==3.10
                 h2==4.1.0
                 hyperframe==6.0.1
                 triton==3.1.0
                 tomli==2.2.1
                 cffi==1.15.0
                 types-dataclasses==0.6.6
                 wandb==0.19.1
                 fonttools==4.55.3
                 pycodestyle==2.10.0
                 wheel==0.44.0
                 accelerate==1.2.1
                 scikit-learn==1.6.0
                 attrs==24.3.0
                 psutil==6.1.0
                 zstandard==0.19.0
                 dill==0.3.8
                 setproctitle==1.3.4
                 black==24.3.0
                 requests==2.32.3
                 isort==5.13.2
                 mpmath==1.3.0
                 certifi==2024.12.14
                 pyparsing==3.2.0
                 hpack==4.0.0
                 pandas==2.2.3
                 tokenizers==0.21.0
                 regex==2024.11.6
                 pytz==2024.2
                 contourpy==1.3.1
                 pydantic_core==2.27.1
                 aiohappyeyeballs==2.4.4
                 pathspec==0.12.1
                 torchvision==0.20.1
                 astroid==2.13.5
                 GitPython==3.1.43
                 types-requests==2.26.3
                 matplotlib==3.10.0
                 platformdirs==4.3.6
                 parameterized==0.8.1
                 msgpack==1.1.0
                 python-dateutil==2.9.0.post0
                 toml==0.10.2
                 numpy==1.26.4
                 xxhash==3.5.0
                 joblib==1.4.2
                 charset-normalizer==3.4.0
                 colorama==0.4.6
                 sympy==1.13.1
                 aihwkit_lightning==0.0.1
                 sigmamoe==0.0
                 deepspeed==0.15.4+unknown
                 analoglora==0.0
                 analogmoe==0.0

I+00000.113: sys.argv before parsing: ['/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__main__.py', '--log-to', 'logs', '--listen', '0.0.0.0:1326', '--wait-for-client', '-c', "print('hello')"]
                      after parsing:  ['/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__main__.py']

D+00000.114: sys.argv after patching: ['-c']

D+00000.114: configure({'qt': 'none', 'subProcess': True}, {})

D+00000.114: listen(('0.0.0.0', 1326), **{})

I+00000.114: Initial debug configuration: {
                 "qt": "none",
                 "subProcess": true,
                 "python": "/u/jub/miniconda3/envs/torch/bin/python",
                 "pythonEnv": {}
             }

I+00000.114: Waiting for adapter endpoints on 127.0.0.1:37193...

I+00000.114: debugpy.listen() spawning adapter: [
                 "/u/jub/miniconda3/envs/torch/bin/python",
                 "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter",
                 "--for-server",
                 "37193",
                 "--host",
                 "0.0.0.0",
                 "--port",
                 "1326",
                 "--server-access-token",
                 "04ac658025d99f968fb21846b183284a1501cb1b3c8e54537b6a4bdd24772ce2",
                 "--log-dir",
                 "logs"
             ]

I+00000.283: Endpoints received from adapter: {
                 "client": {
                     "host": "0.0.0.0",
                     "port": 1326
                 },
                 "server": {
                     "host": "127.0.0.1",
                     "port": 47939
                 }
             }

I+00000.283: Adapter is accepting incoming client connections on 0.0.0.0:1326

D+00000.283: pydevd.settrace(*(), **{'host': '127.0.0.1', 'port': 47939, 'wait_for_ready_to_run': False, 'block_until_connected': True, 'access_token': '04ac658025d99f968fb21846b183284a1501cb1b3c8e54537b6a4bdd24772ce2', 'suspend': False, 'patch_multiprocessing': True, 'dont_trace_start_patterns': ('/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy',), 'dont_trace_end_patterns': ('debugpy_launcher.py',)})

I+00000.395: pydevd is connected to adapter at 127.0.0.1:47939

D+00000.395: wait_for_client()

debugpy.adapter-2718050.log

I+00000.013: Linux-5.14.0-427.42.1.el9_4.x86_64-x86_64-with-glibc2.34 x86_64
             CPython 3.10.16 (64-bit)
             debugpy 1.8.12

I+00000.127: debugpy.adapter startup environment:
             
             System paths:
                 sys.executable: /u/jub/miniconda3/envs/torch/bin/python(/u/jub/miniconda3/envs/torch/bin/python3.10)
                 sys.prefix: /u/jub/miniconda3/envs/torch
                 sys.base_prefix: /u/jub/miniconda3/envs/torch
                 sys.real_prefix: <missing>
                 site.getsitepackages(): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 site.getusersitepackages(): /u/jub/.local/lib/python3.10/site-packages
                 sys.path (site-packages): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('stdlib'): /u/jub/miniconda3/envs/torch/lib/python3.10
                 sysconfig.get_path('platstdlib'): /u/jub/miniconda3/envs/torch/lib/python3.10
                 sysconfig.get_path('purelib'): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('platlib'): /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages
                 sysconfig.get_path('include'): /u/jub/miniconda3/envs/torch/include/python3.10
                 sysconfig.get_path('scripts'): /u/jub/miniconda3/envs/torch/bin
                 sysconfig.get_path('data'): /u/jub/miniconda3/envs/torch
                 os.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/os.py
                 threading.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/threading.py
                 debugpy.__file__: /u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/../../debugpy/__init__.py(/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/__init__.py)
             
             Installed packages:
                 kiwisolver==1.4.7
                 tzdata==2024.2
                 Jinja2==3.1.4
                 torch==2.5.1
                 py-cpuinfo==9.0.0
                 filelock==3.16.1
                 multidict==6.1.0
                 pip==24.2
                 pluggy==0.13.1
                 cmake==3.31.2
                 tomlkit==0.13.2
                 pybind11==2.13.6
                 packaging==24.2
                 click==8.1.7
                 huggingface_hub==0.26.5
                 huggingface-hub==0.27.0
                 safetensors==0.4.5
                 peft==0.14.0
                 Brotli==1.0.9
                 networkx==3.2
                 sentry-sdk==2.19.2
                 PyYAML==6.0
                 mypy==0.991
                 mccabe==0.7.0
                 triton-nightly==3.0.0.post20240716052845
                 transformers==4.47.1
                 multiprocess==0.70.16
                 xformers==0.0.29.post1
                 MarkupSafe==2.1.1
                 pylint==2.15.7
                 torchaudio==2.5.1
                 async-timeout==5.0.1
                 annotated-types==0.7.0
                 Pillow==9.2.0
                 pyarrow==18.1.0
                 fast_hadamard_transform==1.0.4.post1
                 PySocks==1.7.1
                 mypy-extensions==1.0.0
                 aiosignal==1.3.2
                 hjson==3.1.0
                 fsspec==2024.9.0
                 fsspec==2024.12.0
                 setuptools==75.1.0
                 datasets==3.2.0
                 six==1.17.0
                 tqdm==4.67.1
                 typing_extensions==4.12.2
                 threadpoolctl==3.5.0
                 debugpy==1.8.12
                 smmap==5.0.1
                 ninja==1.11.1.3
                 frozenlist==1.5.0
                 scipy==1.14.1
                 scipy==1.8.1
                 gmpy2==2.1.2
                 pydantic==2.10.3
                 docker-pycreds==0.4.0
                 protobuf==5.29.2
                 pytest==6.2.4
                 aiohttp==3.11.10
                 gitdb==4.0.11
                 yarl==1.18.3
                 py==1.11.0
                 mpi4py==4.0.1
                 urllib3==2.2.3
                 propcache==0.2.1
                 wrapt==1.17.0
                 lazy-object-proxy==1.10.0
                 scikit-build==0.18.1
                 pycparser==2.22
                 cycler==0.12.1
                 distro==1.9.0
                 iniconfig==2.0.0
                 idna==3.10
                 h2==4.1.0
                 hyperframe==6.0.1
                 triton==3.1.0
                 tomli==2.2.1
                 cffi==1.15.0
                 types-dataclasses==0.6.6
                 wandb==0.19.1
                 fonttools==4.55.3
                 pycodestyle==2.10.0
                 wheel==0.44.0
                 accelerate==1.2.1
                 scikit-learn==1.6.0
                 attrs==24.3.0
                 psutil==6.1.0
                 zstandard==0.19.0
                 dill==0.3.8
                 setproctitle==1.3.4
                 black==24.3.0
                 requests==2.32.3
                 isort==5.13.2
                 mpmath==1.3.0
                 certifi==2024.12.14
                 pyparsing==3.2.0
                 hpack==4.0.0
                 pandas==2.2.3
                 tokenizers==0.21.0
                 regex==2024.11.6
                 pytz==2024.2
                 contourpy==1.3.1
                 pydantic_core==2.27.1
                 aiohappyeyeballs==2.4.4
                 pathspec==0.12.1
                 torchvision==0.20.1
                 astroid==2.13.5
                 GitPython==3.1.43
                 types-requests==2.26.3
                 matplotlib==3.10.0
                 platformdirs==4.3.6
                 parameterized==0.8.1
                 msgpack==1.1.0
                 python-dateutil==2.9.0.post0
                 toml==0.10.2
                 numpy==1.26.4
                 xxhash==3.5.0
                 joblib==1.4.2
                 charset-normalizer==3.4.0
                 colorama==0.4.6
                 sympy==1.13.1
                 aihwkit_lightning==0.0.1
                 sigmamoe==0.0
                 deepspeed==0.15.4+unknown
                 analoglora==0.0
                 analogmoe==0.0

I+00000.128: Listening for incoming Client connections on 0.0.0.0:1326...

I+00000.128: Listening for incoming Server connections on 127.0.0.1:47939...

I+00000.129: Sending endpoints info to debug server at localhost:37193:
             {
                 "client": {
                     "host": "0.0.0.0",
                     "port": 1326
                 },
                 "server": {
                     "host": "127.0.0.1",
                     "port": 47939
                 }
             }

I+00000.129: Writing endpoints info to '/tmp/noConfigDebugAdapterEndpoints-368cbf5a634a2ec02ed2/debuggerAdapterEndpoint.txt':
             {
                 "client": {
                     "host": "0.0.0.0",
                     "port": 1326
                 },
                 "server": {
                     "host": "127.0.0.1",
                     "port": 47939
                 }
             }

E+00000.129: Error writing endpoints info to file:
             
             Traceback (most recent call last):
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 115, in main
                 with open(listener_file, "w") as f:
             FileNotFoundError: [Errno 2] No such file or directory: '/tmp/noConfigDebugAdapterEndpoints-368cbf5a634a2ec02ed2/debuggerAdapterEndpoint.txt'
             
             Stack where logged:
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
                 return _run_code(code, main_globals, None,
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py", line 86, in _run_code
                 exec(code, run_globals)
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 233, in <module>
                 main(_parse_argv(sys.argv))
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 119, in main
                 log.reraise_exception("Error writing endpoints info to file:")
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/../../debugpy/common/log.py", line 222, in reraise_exception
                 _exception(format_string, *args, **kwargs)
             

I+00000.129: Not logging to "<stderr>" anymore.


Two errors stand out in these logs. I don't know whether one causes the other, or which one occurs first:

E+00000.129: Error writing endpoints info to file:
             
             Traceback (most recent call last):
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 115, in main
                 with open(listener_file, "w") as f:
             FileNotFoundError: [Errno 2] No such file or directory: '/tmp/noConfigDebugAdapterEndpoints-368cbf5a634a2ec02ed2/debuggerAdapterEndpoint.txt'
             
             Stack where logged:
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py", line 196, in _run_module_as_main
                 return _run_code(code, main_globals, None,
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/runpy.py", line 86, in _run_code
                 exec(code, run_globals)
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 233, in <module>
                 main(_parse_argv(sys.argv))
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/__main__.py", line 119, in main
                 log.reraise_exception("Error writing endpoints info to file:")
               File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/adapter/../../debugpy/common/log.py", line 222, in reraise_exception
                 _exception(format_string, *args, **kwargs)

and

Traceback (most recent call last):
  File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_comm.py", line 422, in _on_run
    cmd.send(self.sock)
  File "/u/jub/miniconda3/envs/torch/lib/python3.10/site-packages/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_net_command.py", line 109, in send
    sock.sendall(as_bytes)
BrokenPipeError: [Errno 32] Broken pipe
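
The two pydevd symptoms, "ReaderThread: empty contents received" followed by BrokenPipeError in the writer, are what a socket reports once the other end of the connection has been closed. The sketch below reproduces that pattern with a local socket pair from the standard library. It does not involve debugpy, and peer_close_symptoms is a name made up for illustration. It only suggests that the broken pipe may be a consequence of the adapter side going away rather than an independent failure; the logs alone do not confirm that.

```python
import socket


def peer_close_symptoms():
    """Close one end of a connected socket pair and observe the other end."""
    ours, peer = socket.socketpair()
    try:
        peer.close()  # stands in for the process on the other side exiting
        data = ours.recv(1024)  # a closed peer shows up as an empty read
        try:
            ours.sendall(b"x")  # writing to a closed peer fails
            error = None
        except ConnectionError as exc:  # BrokenPipeError is a subclass
            error = type(exc).__name__
        return data, error
    finally:
        ours.close()


if __name__ == "__main__":
    # On Linux the error is typically BrokenPipeError.
    print(peer_close_symptoms())
```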

Expected behavior

This used to work on the cluster and no longer does. Something in the cluster configuration has probably changed, but I would appreciate guidance on how to fix it, or at least some insight into what could be going on.
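
A minimal diagnostic sketch for the FileNotFoundError, to be run inside the interactive job on the compute node. It rests on two assumptions that the logs do not confirm: that the endpoint file path reaches the adapter through the environment (it was not passed on the command line), and that the directory under /tmp was created on a different machine than the compute node. The helper names parent_dir_status and env_vars_mentioning are hypothetical; the path is copied from the adapter log.

```python
import os


def parent_dir_status(path):
    """Report whether the directory that should contain `path` exists and is writable."""
    parent = os.path.dirname(path)
    return {
        "parent": parent,
        "exists": os.path.isdir(parent),
        "writable": os.access(parent, os.W_OK),
    }


def env_vars_mentioning(fragment, environ=None):
    """Return sorted names of environment variables whose value contains `fragment`."""
    environ = os.environ if environ is None else environ
    return sorted(name for name, value in environ.items() if fragment in value)


if __name__ == "__main__":
    # Path taken verbatim from the adapter log.
    endpoint_file = (
        "/tmp/noConfigDebugAdapterEndpoints-368cbf5a634a2ec02ed2/"
        "debuggerAdapterEndpoint.txt"
    )
    print(parent_dir_status(endpoint_file))
    print(env_vars_mentioning("noConfigDebugAdapterEndpoints"))
```

If the second call lists a variable, comparing the job's environment with and without it (for example by launching the job from a plain SSH session instead of an editor terminal) would show whether the inherited environment is involved.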

Steps to reproduce:

I am trying to reproduce on a different cluster right now, but it might take a while as it is very busy.
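
While reproducing, a small probe can separate "nothing is listening on the port" from "the port is not reachable". The sketch below uses only the standard library; probe is a hypothetical helper, and the default host of 127.0.0.1 is an assumption meant for running it on the compute node itself. Running it from the client machine with the node's hostname as the first argument checks the same port across the network.

```python
import socket
import sys


def probe(host, port, timeout=3.0):
    """Try a TCP connection and return 'open', 'refused', or an error description."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:  # includes timeouts, DNS failures, unreachable hosts
        return f"error: {exc}"


if __name__ == "__main__":
    # Port 1326 is the one passed to --listen in the command above.
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    print(probe(host, 1326))
```

A result of "refused" on the compute node itself would indicate that the adapter process is no longer listening, which would match the adapter log ending right after the error.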
