Skip to content

Fix atomic alignment#2075

Open
372046933 wants to merge 1 commit intoNVIDIA:masterfrom
372046933:fix_atomic_alignment
Open

Fix atomic alignment#2075
372046933 wants to merge 1 commit intoNVIDIA:masterfrom
372046933:fix_atomic_alignment

Conversation

@372046933
Copy link
Copy Markdown

Description

Below is the smallest reproducible code. the __attribute__((packed)) on ncclGinProxyGfd_t makes the alignment of ncclGinProxyGfd_t equals to 1. The packed struct here is not needed for two reasons.

  1. We will do atomic operations on the address later. packed breaks the alignment requirement of atomics
  2. ncclGinProxyQword_t is already packed and size of it is 8 bytes. So size of ncclGinProxyGfd_t is 64 bytes. and packed is redundant here.
#include <atomic>

typedef union {
  uint64_t raw;
  struct {
    uint64_t v : 1;
    uint64_t resv : 63;
  } __attribute__((packed)) flag;
} ncclGinProxyQword_t;


static_assert(sizeof(ncclGinProxyQword_t) == sizeof(uint64_t),
              "sizeof(ncclGinProxyQword_t) != sizeof(uint64_t)");

typedef struct __attribute__((packed)) {
  ncclGinProxyQword_t qword[8];
} ncclGinProxyGfd_t;

static_assert(alignof(ncclGinProxyGfd_t) == 1,
              "Incorrect packed implementation");

void f( ncclGinProxyGfd_t *q) {
  ncclGinProxyQword_t qword;
  __atomic_load(&q[0].qword[0].raw, &qword.raw, __ATOMIC_RELAXED);
}

gcc does not give any warning. But clang does.

test_atomic.cc:24:3: warning: misaligned atomic operation may incur significant performance penalty; the expected alignment (8 bytes) exceeds the actual alignment (1 bytes) [-Watomic-alignment]
   24 |   __atomic_load(&q[0].qword[0].raw, &qword.raw, __ATOMIC_RELAXED);
      |   ^
1 warning generated.

Related Issues

If we are going to support clang, this packed must be removed, otherwise libatomic needs to be linked.
#1744

Changes & Impact

This PR does not affect GCC. For clang, the generated code does not need link to libatomic

Performance Impact

No difference for GCC. Significant performance improvement on clang since libatomic call is eliminated

Signed-off-by: 372046933 <wondertx@gmail.com>
@372046933 372046933 force-pushed the fix_atomic_alignment branch from c923306 to 9596173 Compare March 29, 2026 03:06
@372046933
Copy link
Copy Markdown
Author

Or

typedef struct alignas(8) __attribute__((packed)) {
  ncclGinProxyQword_t qword[8];
} ncclGinProxyGfd_t;

IMHO, removing these attribute makes it simple

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant