Heap buffer overflow when running simulation #6

@HeRaNO

Reproduce

  1. Enable the NS3_SANITIZE option (see https://github.com/aliyun/ns-3-alibabacloud/blob/master/simulation/CMakeLists.txt#L61) and rebuild.
  2. Run the simulation as usual.

Logs

maxRtt=4720 maxBdp=236000
Running Simulation.
The final active chunks per dimension 1 after allocating to queues is: 1
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
ring of node 0, id: 0 dimension: local total nodes in ring: 144 index in ring: 0 offset: 1total nodes in ring: 144
total nodes: 144
Success in opening workload file
model_parallel_NPU_group: is: 8
checkpoints layers are: 
layers initiating fwd_in_bckwd are: 
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
ring of node 0, id: 0 dimension: local total nodes in ring: 8 index in ring: 0 offset: 1total nodes in ring: 8
ring of node 0, id: 0 dimension: local total nodes in ring: 18 index in ring: 0 offset: 8total nodes in ring: 18
id: embedding_layer , depen: -1 , wg_comp_time: 1
type: HYBRID_TRANSFORMER_FWD_IN_BCKWD ,num passes: 1 ,lines: 1 compute scale: 1 ,comm scale: 1
stat path: ./ncclFlowModel_ ,total rows: 1 ,stat row: 0
CSV path and filename: ./ncclFlowModel_detailed_144.csv
CSV path and filename: ./ncclFlowModel_EndToEnd_144.csv
=================================================================
==9941==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x602000fd2f74 at pc 0x7f475725362f bp 0x7fff94b9a270 sp 0x7fff94b9a260
READ of size 4 at 0x602000fd2f74 thread T0
    #0 0x7f475725362e in MockNccl::MockNcclGroup::InterDouBinTreeShift(MockNccl::MockNcclGroup::DoubleBinaryTreeNode*, std::vector<int, std::allocator<int> >) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2038
    #1 0x7f475725200b in MockNccl::MockNcclGroup::genInterDouBinTree(MockNccl::GroupInfo) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2000
    #2 0x7f475724e5e3 in MockNccl::MockNcclGroup::gettreechannels(int, MockNccl::GroupType) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:1893
    #3 0x7f47571cf384 in MockNccl::MockNcclComm::MockNcclComm(int, MockNccl::GroupType, MockNccl::MockNcclGroup*) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclChannel.cc:22
    #4 0x7f475738f260 in AstraSim::Sys::mock_nccl_comms_init() /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:1411
    #5 0x7f4757363d59 in AstraSim::Sys::Sys(AstraSim::AstraNetworkAPI*, AstraSim::AstraMemoryAPI*, int, int, int, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float, float, float, int, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, GPUType, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, int) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:297
    #6 0x5562830980ce in main /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/scratch/AstraSimNetwork.cc:311
    #7 0x7f473bda2d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)
    #8 0x7f473bda2e3f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x29e3f)
    #9 0x556283050384 in _start (/root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/build/scratch/ns3.36.1-AstraSimNetwork-debug+0x1d3384)

0x602000fd2f74 is located 0 bytes to the right of 4-byte region [0x602000fd2f70,0x602000fd2f74)
allocated by thread T0 here:
    #0 0x7f47694b51e7 in operator new(unsigned long) ../../../../src/libsanitizer/asan/asan_new_delete.cpp:99
    #1 0x55628316e51c in __gnu_cxx::new_allocator<int>::allocate(unsigned long, void const*) /usr/include/c++/11/ext/new_allocator.h:127
    #2 0x556283156623 in std::allocator_traits<std::allocator<int> >::allocate(std::allocator<int>&, unsigned long) /usr/include/c++/11/bits/alloc_traits.h:464
    #3 0x556283125b33 in std::_Vector_base<int, std::allocator<int> >::_M_allocate(unsigned long) /usr/include/c++/11/bits/stl_vector.h:346
    #4 0x5562830fc49b in std::_Vector_base<int, std::allocator<int> >::_M_create_storage(unsigned long) /usr/include/c++/11/bits/stl_vector.h:361
    #5 0x5562830d302a in std::_Vector_base<int, std::allocator<int> >::_Vector_base(unsigned long, std::allocator<int> const&) /usr/include/c++/11/bits/stl_vector.h:305
    #6 0x5562830affda in std::vector<int, std::allocator<int> >::vector(std::vector<int, std::allocator<int> > const&) /usr/include/c++/11/bits/stl_vector.h:555
    #7 0x7f4757251f96 in MockNccl::MockNcclGroup::genInterDouBinTree(MockNccl::GroupInfo) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2000
    #8 0x7f475724e5e3 in MockNccl::MockNcclGroup::gettreechannels(int, MockNccl::GroupType) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:1893
    #9 0x7f47571cf384 in MockNccl::MockNcclComm::MockNcclComm(int, MockNccl::GroupType, MockNccl::MockNcclGroup*) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclChannel.cc:22
    #10 0x7f475738f260 in AstraSim::Sys::mock_nccl_comms_init() /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:1411
    #11 0x7f4757363d59 in AstraSim::Sys::Sys(AstraSim::AstraNetworkAPI*, AstraSim::AstraMemoryAPI*, int, int, int, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, float, float, float, int, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, GPUType, std::vector<int, std::allocator<int> >, std::vector<int, std::allocator<int> >, int) /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/Sys.cc:297
    #12 0x5562830980ce in main /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/scratch/AstraSimNetwork.cc:311
    #13 0x7f473bda2d8f  (/lib/x86_64-linux-gnu/libc.so.6+0x29d8f)

SUMMARY: AddressSanitizer: heap-buffer-overflow /root/SimAI/astra-sim-alibabacloud/extern/network_backend/ns3-interface/simulation/src/applications/astra-sim/system/MockNcclGroup.cc:2038 in MockNccl::MockNcclGroup::InterDouBinTreeShift(MockNccl::MockNcclGroup::DoubleBinaryTreeNode*, std::vector<int, std::allocator<int> >)
Shadow bytes around the buggy address:
  0x0c04801f2590: fa fa fd fa fa fa fd fd fa fa fd fa fa fa fd fa
  0x0c04801f25a0: fa fa fd fd fa fa fd fa fa fa fd fa fa fa fd fd
  0x0c04801f25b0: fa fa fd fa fa fa fd fa fa fa fd fd fa fa fd fa
  0x0c04801f25c0: fa fa fd fa fa fa fd fd fa fa fd fa fa fa fd fa
  0x0c04801f25d0: fa fa fd fd fa fa fd fa fa fa fd fa fa fa fd fd
=>0x0c04801f25e0: fa fa 04 fa fa fa 04 fa fa fa 00 fa fa fa[04]fa
  0x0c04801f25f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c04801f2600: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c04801f2610: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c04801f2620: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c04801f2630: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==9941==ABORTING
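
Reading the report: the faulting access is a 4-byte read located 0 bytes past a 4-byte heap region, and that region was allocated by a std::vector<int> copy at MockNcclGroup.cc:2000. That corresponds to reading index 1 of a one-element vector. The sketch below is not the code from MockNcclGroup.cc; checked_at is a hypothetical helper, and it only illustrates how an index could be validated before the read, assuming the index is derived from values that may reach or exceed the vector's size.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not part of SimAI): returns the element at `index`,
// or std::nullopt when `index` is outside the vector.
std::optional<int> checked_at(const std::vector<int>& values, std::size_t index) {
    if (index >= values.size()) {
        // An unchecked values[index] here would read past the heap buffer,
        // which is the class of error AddressSanitizer reports above.
        return std::nullopt;
    }
    return values[index];
}
```

With a one-element vector, checked_at(v, 0) yields the element and checked_at(v, 1) yields std::nullopt instead of reading out of bounds. Whether the right fix in InterDouBinTreeShift is a guard like this or a change to how the index is computed depends on the code at line 2038, which is not shown here.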
