
HIP: add gfx1152 and gfx1153 to RDNA3.5 #24129

Open
harkgill-amd wants to merge 1 commit into ggml-org:master from harkgill-amd:fix/add-gfx1152-gfx1153-rdna3_5

@harkgill-amd

Overview

Add the gfx1152 and gfx1153 definitions to the RDNA3.5 macro in ggml/src/ggml-cuda/vendors/hip.h.
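As a rough illustration of the kind of change described above, here is a minimal sketch of a target-based architecture macro. It is not the actual diff: `EXAMPLE_RDNA3_5` and `example_is_rdna3_5` are hypothetical names, the exact condition in hip.h is an assumption, and `__gfx1152__` is defined by hand only so the sketch builds on a host compiler (in a real HIP device compile, the compiler predefines the macro for the selected target).

```c
#include <assert.h>

/* Simulate a device compile for gfx1152. Defining this by hand is only for
 * the sketch; a real HIP build gets it from the compiler for that target. */
#ifndef __gfx1152__
#define __gfx1152__ 1
#endif

/* Hypothetical stand-in for the RDNA3.5 macro in hip.h (name and exact
 * condition are assumptions). If gfx1152 and gfx1153 were missing from the
 * list, a build for those targets would not be classified as RDNA3.5. */
#if defined(__gfx1150__) || defined(__gfx1151__) || \
    defined(__gfx1152__) || defined(__gfx1153__)
#define EXAMPLE_RDNA3_5 1
#else
#define EXAMPLE_RDNA3_5 0
#endif

/* Compile-time check that the simulated target is classified as RDNA3.5. */
static_assert(EXAMPLE_RDNA3_5 == 1, "gfx1152 should select the RDNA3.5 path");

/* Helper so callers can query the classification at run time. */
int example_is_rdna3_5(void) {
    return EXAMPLE_RDNA3_5;
}
```

Removing the `__gfx1152__` and `__gfx1153__` terms from the condition makes the static assertion fail for this simulated target, which mirrors how an unlisted target would fall outside the RDNA3.5 code paths.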

Additional information

Resolves ROCm/TheRock#5579, where users report corrupted output when running llama.cpp built from source against TheRock nightlies. Applying this patch resolves the issue.

Requirements

  • I have read and agree with the contributing guidelines: Done
  • AI usage disclosure: Used to understand where changes needed to be made.

@harkgill-amd requested a review from IMbackK as a code owner on June 4, 2026 14:07
@github-actions (bot) added the "Nvidia GPU" and "ggml" labels on Jun 4, 2026
@Atomic-Germ
Contributor

It seems to work; I have a gfx1152 system. It's still a little slower than Vulkan of course, but it isn't a bunch of stuttering nonsense either.

@itn3rd77

itn3rd77 commented Jun 4, 2026

Not related to this PR, but why is this labeled Nvidia GPU by the bot? I have seen this obviously wrong labeling several times now.

@soulafein83

> It seems to work; I have a gfx1152 system. It's still a little slower than Vulkan of course, but it isn't a bunch of stuttering nonsense either.

On my HP OmniBook laptop with a Ryzen AI 7 350 (running CachyOS), I get 17 TPS with ROCm and 27 TPS with Vulkan. Model: Gemma 4 e4b with 32k.

@ckuethe

ckuethe commented Jun 5, 2026

Works for me on System76 Pangolin Pro

Gemma4

0.00.153.930 I device_info:
0.00.154.019 I   - ROCm0   : AMD Radeon 860M Graphics (81920 MiB, 75951 MiB free)
0.00.154.023 I   - CPU     : AMD Ryzen AI 7 350 w/ Radeon 860M (94149 MiB, 94149 MiB free)
...
3.05.452.125 I slot print_timing: id  3 | task 0 | prompt eval time =     318.70 ms /    15 tokens (   21.25 ms per token,    47.07 tokens per second)
3.05.452.129 I slot print_timing: id  3 | task 0 |        eval time =   67519.17 ms /  1227 tokens (   55.03 ms per token,    18.17 tokens per second)
3.05.452.130 I slot print_timing: id  3 | task 0 |       total time =   67837.88 ms /  1242 tokens
3.05.452.137 I slot print_timing: id  3 | task 0 |    graphs reused =       1222
3.05.452.168 I slot      release: id  3 | task 0 | stop processing: n_tokens = 1241, truncated = 0

Bonsai-8B

0.00.137.764 I device_info:
0.00.137.843 I   - ROCm0   : AMD Radeon 860M Graphics (81920 MiB, 76868 MiB free)
0.00.137.847 I   - CPU     : AMD Ryzen AI 7 350 w/ Radeon 860M (94149 MiB, 94149 MiB free)
...
1.26.916.161 I slot print_timing: id  3 | task 0 | prompt eval time =     199.98 ms /    19 tokens (   10.53 ms per token,    95.01 tokens per second)
1.26.916.163 I slot print_timing: id  3 | task 0 |        eval time =   47210.86 ms /  1313 tokens (   35.96 ms per token,    27.81 tokens per second)
1.26.916.164 I slot print_timing: id  3 | task 0 |       total time =   47410.84 ms /  1332 tokens
1.26.916.168 I slot print_timing: id  3 | task 0 |    graphs reused =       1307
1.26.916.196 I slot      release: id  3 | task 0 | stop processing: n_tokens = 1331, truncated = 0
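
The tokens-per-second figures in the print_timing lines above follow from the token count and the elapsed time. A minimal sketch of that arithmetic, using a hypothetical helper name (`example_tokens_per_second` is not part of llama.cpp):

```c
#include <assert.h>

/* Hypothetical helper: throughput in tokens per second, given a token count
 * and an elapsed time in milliseconds as reported by print_timing. */
double example_tokens_per_second(int n_tokens, double elapsed_ms) {
    assert(elapsed_ms > 0.0);  /* guard against division by zero */
    return (double)n_tokens / (elapsed_ms / 1000.0);
}
```

For the Gemma4 run, 1227 tokens in 67519.17 ms gives roughly 18.17 tokens per second; for Bonsai-8B, 1313 tokens in 47210.86 ms gives roughly 27.81, matching the logged values.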


Labels

ggml: changes relating to the ggml tensor library for machine learning
Nvidia GPU: Issues specific to Nvidia GPUs

Development

Successfully merging this pull request may close these issues.

[Bug]: Corrupted token outputs (??? / <unused>) on ROCm backend with gfx1152 target (Ryzen AI 350)

6 participants