CUDA: refactor MMQ kernel configuration #24127

Open
JohannesGaessler wants to merge 2 commits into ggml-org:master from JohannesGaessler:cuda-mmq-config-3

@JohannesGaessler
Contributor

On master it is not possible to configure the CUDA MMQ kernel as a function of batch size and data type. This PR fixes that with a general refactor of the MMQ kernel so that it more closely resembles the mma FA kernel: a table of parameters replaces a bunch of macros and functions that return hard-coded values per architecture. I also moved a lot of the code out of the main mmq.cuh file since it is pretty overloaded on master with 4k LoC. I replaced the variable names mmq_x and mmq_y with J and I respectively (same as the FA kernels) to avoid confusion with the x and y data pointers. There are no (intentional) functional changes from this PR other than:

  • On master all template specializations in terms of tile sizes are compiled for both the high-performance version without out-of-bounds checks in the src0->ne[1] direction and the fallback version with those checks (roughly a 5% end-to-end performance difference). However, for the fallback case it should be fine to compile fewer template specializations; a lot of them are only there to make pp snappier for short prompts where the number of tokens is not necessarily cleanly divisible by e.g. 64 or 128. So for the fallback case I reduced the template specializations to only powers of 2. Longer term we can consider adding a compilation option like GGML_CUDA_FULL as an opt-in for template specializations that are rarely useful but blow up the compilation time.
  • On master __launch_bounds__ is optional; with this PR it becomes mandatory in the configuration. This should only affect RDNA1, where a target occupancy of 2 is now specified.
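
As a rough illustration of the first point, the following host-side C++ sketch shows how a tile width could be chosen when the fallback path only has power-of-2 specializations. Everything in it is an assumption made for illustration: the names `compiled_fast`, `compiled_fallback`, and `pick_tile_J`, the range of 8 to 128 in steps of 8, and the "smallest tile that covers the batch" rule are not taken from the PR.

```cpp
#include <cassert>

// Hypothetical: tile widths J compiled for the fast path (no out-of-bounds checks).
// Assumed to be every multiple of 8 from 8 to 128; the real set may differ.
static bool compiled_fast(int J) {
    return J >= 8 && J <= 128 && J % 8 == 0;
}

// Hypothetical: the fallback path (with out-of-bounds checks) keeps only
// the powers of 2 from that set, i.e. 8, 16, 32, 64, 128.
static bool compiled_fallback(int J) {
    return compiled_fast(J) && (J & (J - 1)) == 0;
}

// Simplified selection rule (an assumption, not the PR's logic):
// take the smallest compiled tile width that covers ncols,
// or the largest available width (128) if none does.
static int pick_tile_J(int ncols, bool need_check) {
    int best = 128;
    for (int J = 128; J >= 8; J -= 8) {
        const bool ok = need_check ? compiled_fallback(J) : compiled_fast(J);
        if (ok && J >= ncols) {
            best = J; // keeps shrinking while the tile still covers ncols
        }
    }
    return best;
}
```

Under these assumptions a batch of 40 columns gets J = 40 on the fast path but J = 64 on the fallback path. That extra padding is the kind of cost the reduced set of specializations trades for shorter compile times.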

Going forward this PR will enable:

  • Better kernel tuning, particularly without side effects where the performance for some data types increases at the cost of a performance regression for other types. On master the kernels are tuned exclusively for large batch sizes; better tuning for small batch sizes should help with speculative decoding performance.
  • Removal of the legacy SRAM data layout for __dp4a with 4 byte loads, to be replaced with 16 byte loads (~10% end-to-end speedup for e.g. P40) that can to a large degree re-use the SRAM layout for tensor cores. Originally I was going to do this transition first so that code would be removed before the refactor, but this triggered performance regressions for some combinations of GPUs and data types. So I'm taking a more granular approach where I will do the transition piece by piece; the refactor in this PR still has some WIPs and inconsistencies that I will gradually phase out.
  • Support for converting quantized data and activations to FP16 in SRAM. This will be useful for general Volta performance as well as FP16/BF16/FP32 MoE performance on all GPUs.
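
To illustrate what per-type, per-batch-size tuning could look like with a parameter table, here is a minimal host-side C++ sketch. The struct layout, the field names, the `QType` subset, and all numeric values are hypothetical and are not the PR's actual configuration. The only ideas taken from the description are that the lookup depends on data type and batch size, and that the occupancy target for `__launch_bounds__` is always specified.

```cpp
#include <cassert>

// Hypothetical subset of quantization types, for illustration only.
enum class QType { Q4_0, Q8_0 };

// Hypothetical configuration row; field names are assumptions.
struct MmqConfig {
    QType type;
    int   max_batch;  // row applies to batch sizes <= max_batch
    int   tile_I;     // tile size along the src0 rows
    int   tile_J;     // tile size along the batch dimension
    int   nwarps;
    int   min_blocks; // occupancy target, always set (never optional)
};

// Made-up values; rows for one type are sorted by ascending max_batch
// so that the first match is the tightest one.
static const MmqConfig kTable[] = {
    {QType::Q4_0,      16,  64,  16, 4, 2},
    {QType::Q4_0, 1 << 30, 128, 128, 8, 1},
    {QType::Q8_0,      32,  64,  32, 4, 2},
    {QType::Q8_0, 1 << 30, 128,  64, 8, 1},
};

// Return the first row matching the type whose batch limit covers `batch`,
// or nullptr if the table has no entry for that type.
static const MmqConfig * find_config(QType type, int batch) {
    for (const MmqConfig & c : kTable) {
        if (c.type == type && batch <= c.max_batch) {
            return &c;
        }
    }
    return nullptr;
}
```

With a layout like this, tuning small batches for one data type means editing a single row, which leaves the large-batch rows and the other types untouched.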

Requirements

@JohannesGaessler requested a review from a team as a code owner June 4, 2026 13:31
@github-actions bot added the "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) labels Jun 4, 2026
@JohannesGaessler
Contributor Author

JohannesGaessler commented Jun 4, 2026

Performance check NVIDIA
GPU Model Microbatch size Test t/s dc71236 t/s 6838c64 Speedup
P40 llama 8B IQ1_S - 1.5625 bpw 16 pp512 417.07 417.48 1.00
P40 llama 8B IQ1_S - 1.5625 bpw 32 pp512 583.90 584.32 1.00
P40 llama 8B IQ1_S - 1.5625 bpw 64 pp512 704.20 703.08 1.00
P40 llama 8B IQ1_S - 1.5625 bpw 128 pp512 815.40 815.42 1.00
P40 llama 8B IQ1_S - 1.5625 bpw 256 pp512 913.02 913.44 1.00
P40 llama 8B IQ1_S - 1.5625 bpw 512 pp512 943.37 941.57 1.00
P40 llama 8B IQ2_S - 2.5 bpw 16 pp512 377.82 378.27 1.00
P40 llama 8B IQ2_S - 2.5 bpw 32 pp512 498.45 498.46 1.00
P40 llama 8B IQ2_S - 2.5 bpw 64 pp512 674.41 672.90 1.00
P40 llama 8B IQ2_S - 2.5 bpw 128 pp512 790.01 788.64 1.00
P40 llama 8B IQ2_S - 2.5 bpw 256 pp512 866.82 871.85 1.01
P40 llama 8B IQ2_S - 2.5 bpw 512 pp512 878.33 876.21 1.00
P40 llama 8B IQ2_XS - 2.3125 bpw 16 pp512 376.63 377.00 1.00
P40 llama 8B IQ2_XS - 2.3125 bpw 32 pp512 484.94 485.37 1.00
P40 llama 8B IQ2_XS - 2.3125 bpw 64 pp512 680.73 681.06 1.00
P40 llama 8B IQ2_XS - 2.3125 bpw 128 pp512 794.04 793.88 1.00
P40 llama 8B IQ2_XS - 2.3125 bpw 256 pp512 884.20 879.34 0.99
P40 llama 8B IQ2_XS - 2.3125 bpw 512 pp512 891.53 888.41 1.00
P40 llama 8B IQ2_XXS - 2.0625 bpw 16 pp512 419.72 419.73 1.00
P40 llama 8B IQ2_XXS - 2.0625 bpw 32 pp512 587.34 587.36 1.00
P40 llama 8B IQ2_XXS - 2.0625 bpw 64 pp512 698.70 698.04 1.00
P40 llama 8B IQ2_XXS - 2.0625 bpw 128 pp512 820.08 817.54 1.00
P40 llama 8B IQ2_XXS - 2.0625 bpw 256 pp512 913.75 913.23 1.00
P40 llama 8B IQ2_XXS - 2.0625 bpw 512 pp512 935.35 935.04 1.00
P40 llama 8B IQ3_S - 3.4375 bpw 16 pp512 356.14 355.89 1.00
P40 llama 8B IQ3_S - 3.4375 bpw 32 pp512 552.25 551.74 1.00
P40 llama 8B IQ3_S - 3.4375 bpw 64 pp512 652.09 651.88 1.00
P40 llama 8B IQ3_S - 3.4375 bpw 128 pp512 773.14 775.26 1.00
P40 llama 8B IQ3_S - 3.4375 bpw 256 pp512 849.60 844.45 0.99
P40 llama 8B IQ3_S - 3.4375 bpw 512 pp512 857.24 858.25 1.00
P40 llama 8B IQ3_S mix - 3.66 bpw 16 pp512 360.31 361.70 1.00
P40 llama 8B IQ3_S mix - 3.66 bpw 32 pp512 538.23 539.24 1.00
P40 llama 8B IQ3_S mix - 3.66 bpw 64 pp512 659.52 663.02 1.01
P40 llama 8B IQ3_S mix - 3.66 bpw 128 pp512 779.73 784.81 1.01
P40 llama 8B IQ3_S mix - 3.66 bpw 256 pp512 845.80 841.09 0.99
P40 llama 8B IQ3_S mix - 3.66 bpw 512 pp512 856.86 836.62 0.98
P40 llama 8B IQ3_XS - 3.3 bpw 16 pp512 376.80 375.22 1.00
P40 llama 8B IQ3_XS - 3.3 bpw 32 pp512 554.16 552.27 1.00
P40 llama 8B IQ3_XS - 3.3 bpw 64 pp512 654.30 649.64 0.99
P40 llama 8B IQ3_XS - 3.3 bpw 128 pp512 777.21 774.08 1.00
P40 llama 8B IQ3_XS - 3.3 bpw 256 pp512 861.72 858.83 1.00
P40 llama 8B IQ3_XS - 3.3 bpw 512 pp512 869.31 866.83 1.00
P40 llama 8B IQ3_XXS - 3.0625 bpw 16 pp512 392.30 392.12 1.00
P40 llama 8B IQ3_XXS - 3.0625 bpw 32 pp512 542.80 541.97 1.00
P40 llama 8B IQ3_XXS - 3.0625 bpw 64 pp512 664.58 662.58 1.00
P40 llama 8B IQ3_XXS - 3.0625 bpw 128 pp512 784.76 782.46 1.00
P40 llama 8B IQ3_XXS - 3.0625 bpw 256 pp512 872.57 873.67 1.00
P40 llama 8B IQ3_XXS - 3.0625 bpw 512 pp512 879.88 881.94 1.00
P40 llama 8B IQ4_NL - 4.5 bpw 16 pp512 427.27 427.11 1.00
P40 llama 8B IQ4_NL - 4.5 bpw 32 pp512 618.51 618.33 1.00
P40 llama 8B IQ4_NL - 4.5 bpw 64 pp512 736.19 737.21 1.00
P40 llama 8B IQ4_NL - 4.5 bpw 128 pp512 856.05 854.79 1.00
P40 llama 8B IQ4_NL - 4.5 bpw 256 pp512 921.39 918.17 1.00
P40 llama 8B IQ4_NL - 4.5 bpw 512 pp512 937.18 932.31 0.99
P40 llama 8B IQ4_XS - 4.25 bpw 16 pp512 427.15 427.14 1.00
P40 llama 8B IQ4_XS - 4.25 bpw 32 pp512 615.07 614.91 1.00
P40 llama 8B IQ4_XS - 4.25 bpw 64 pp512 756.30 756.73 1.00
P40 llama 8B IQ4_XS - 4.25 bpw 128 pp512 867.37 868.17 1.00
P40 llama 8B IQ4_XS - 4.25 bpw 256 pp512 930.91 919.95 0.99
P40 llama 8B IQ4_XS - 4.25 bpw 512 pp512 947.82 947.12 1.00
P40 llama 8B Q2_K_M 16 pp512 348.04 347.40 1.00
P40 llama 8B Q2_K_M 32 pp512 492.89 491.43 1.00
P40 llama 8B Q2_K_M 64 pp512 649.67 648.57 1.00
P40 llama 8B Q2_K_M 128 pp512 746.26 746.02 1.00
P40 llama 8B Q2_K_M 256 pp512 828.70 828.59 1.00
P40 llama 8B Q2_K_M 512 pp512 866.55 866.67 1.00
P40 llama 8B Q3_K_S 16 pp512 372.61 372.58 1.00
P40 llama 8B Q3_K_S 32 pp512 483.94 483.26 1.00
P40 llama 8B Q3_K_S 64 pp512 658.14 656.53 1.00
P40 llama 8B Q3_K_S 128 pp512 734.63 734.80 1.00
P40 llama 8B Q3_K_S 256 pp512 804.50 803.58 1.00
P40 llama 8B Q3_K_S 512 pp512 822.89 822.98 1.00
P40 llama 8B Q4_0 16 pp512 483.30 483.17 1.00
P40 llama 8B Q4_0 32 pp512 592.22 592.28 1.00
P40 llama 8B Q4_0 64 pp512 796.99 795.20 1.00
P40 llama 8B Q4_0 128 pp512 915.66 915.19 1.00
P40 llama 8B Q4_0 256 pp512 1005.80 1006.37 1.00
P40 llama 8B Q4_0 512 pp512 1045.58 1046.46 1.00
P40 llama 8B Q4_1 16 pp512 482.29 481.77 1.00
P40 llama 8B Q4_1 32 pp512 584.02 583.57 1.00
P40 llama 8B Q4_1 64 pp512 774.98 774.04 1.00
P40 llama 8B Q4_1 128 pp512 894.63 893.83 1.00
P40 llama 8B Q4_1 256 pp512 974.86 974.81 1.00
P40 llama 8B Q4_1 512 pp512 1007.38 1011.94 1.00
P40 llama 8B Q4_K_S 16 pp512 434.56 434.49 1.00
P40 llama 8B Q4_K_S 32 pp512 544.69 544.48 1.00
P40 llama 8B Q4_K_S 64 pp512 717.03 718.58 1.00
P40 llama 8B Q4_K_S 128 pp512 828.94 830.13 1.00
P40 llama 8B Q4_K_S 256 pp512 910.96 912.37 1.00
P40 llama 8B Q4_K_S 512 pp512 937.98 937.02 1.00
P40 llama 8B Q5_0 16 pp512 389.19 389.36 1.00
P40 llama 8B Q5_0 32 pp512 570.33 571.38 1.00
P40 llama 8B Q5_0 64 pp512 719.39 721.33 1.00
P40 llama 8B Q5_0 128 pp512 824.27 824.79 1.00
P40 llama 8B Q5_0 256 pp512 916.49 915.74 1.00
P40 llama 8B Q5_0 512 pp512 951.20 949.81 1.00
P40 llama 8B Q5_1 16 pp512 414.08 414.24 1.00
P40 llama 8B Q5_1 32 pp512 584.73 584.66 1.00
P40 llama 8B Q5_1 64 pp512 726.05 726.18 1.00
P40 llama 8B Q5_1 128 pp512 827.71 826.91 1.00
P40 llama 8B Q5_1 256 pp512 911.44 908.64 1.00
P40 llama 8B Q5_1 512 pp512 942.11 942.69 1.00
P40 llama 8B Q5_K_S 16 pp512 327.47 327.59 1.00
P40 llama 8B Q5_K_S 32 pp512 477.83 477.87 1.00
P40 llama 8B Q5_K_S 64 pp512 684.68 684.65 1.00
P40 llama 8B Q5_K_S 128 pp512 783.82 783.75 1.00
P40 llama 8B Q5_K_S 256 pp512 856.36 858.52 1.00
P40 llama 8B Q5_K_S 512 pp512 880.62 879.38 1.00
P40 llama 8B Q6_K 16 pp512 374.90 374.89 1.00
P40 llama 8B Q6_K 32 pp512 498.62 498.69 1.00
P40 llama 8B Q6_K 64 pp512 673.60 673.56 1.00
P40 llama 8B Q6_K 128 pp512 749.34 748.76 1.00
P40 llama 8B Q6_K 256 pp512 784.63 779.50 0.99
P40 llama 8B Q6_K 512 pp512 791.54 793.86 1.00
P40 llama 8B Q8_0 16 pp512 379.66 379.46 1.00
P40 llama 8B Q8_0 32 pp512 595.82 595.00 1.00
P40 llama 8B Q8_0 64 pp512 705.16 704.54 1.00
P40 llama 8B Q8_0 128 pp512 831.29 832.72 1.00
P40 llama 8B Q8_0 256 pp512 926.27 927.87 1.00
P40 llama 8B Q8_0 512 pp512 965.05 966.76 1.00
RTX 3090 llama 8B IQ1_S - 1.5625 bpw 16 pp512 1487.77 1489.94 1.00
RTX 3090 llama 8B IQ1_S - 1.5625 bpw 32 pp512 2246.34 2238.87 1.00
RTX 3090 llama 8B IQ1_S - 1.5625 bpw 64 pp512 3015.80 3045.49 1.01
RTX 3090 llama 8B IQ1_S - 1.5625 bpw 128 pp512 3490.46 3472.29 0.99
RTX 3090 llama 8B IQ1_S - 1.5625 bpw 256 pp512 4010.75 3964.27 0.99
RTX 3090 llama 8B IQ1_S - 1.5625 bpw 512 pp512 4136.11 4206.91 1.02
RTX 3090 llama 8B IQ2_S - 2.5 bpw 16 pp512 1291.59 1284.44 0.99
RTX 3090 llama 8B IQ2_S - 2.5 bpw 32 pp512 1910.07 1913.45 1.00
RTX 3090 llama 8B IQ2_S - 2.5 bpw 64 pp512 2738.58 2729.25 1.00
RTX 3090 llama 8B IQ2_S - 2.5 bpw 128 pp512 3132.54 3141.41 1.00
RTX 3090 llama 8B IQ2_S - 2.5 bpw 256 pp512 3588.60 3578.59 1.00
RTX 3090 llama 8B IQ2_S - 2.5 bpw 512 pp512 3668.12 3640.84 0.99
RTX 3090 llama 8B IQ2_XS - 2.3125 bpw 16 pp512 1321.48 1317.80 1.00
RTX 3090 llama 8B IQ2_XS - 2.3125 bpw 32 pp512 1892.93 1924.58 1.02
RTX 3090 llama 8B IQ2_XS - 2.3125 bpw 64 pp512 2718.06 2714.88 1.00
RTX 3090 llama 8B IQ2_XS - 2.3125 bpw 128 pp512 3077.54 3049.26 0.99
RTX 3090 llama 8B IQ2_XS - 2.3125 bpw 256 pp512 3538.46 3492.95 0.99
RTX 3090 llama 8B IQ2_XS - 2.3125 bpw 512 pp512 3663.50 3618.02 0.99
RTX 3090 llama 8B IQ2_XXS - 2.0625 bpw 16 pp512 1463.56 1456.68 1.00
RTX 3090 llama 8B IQ2_XXS - 2.0625 bpw 32 pp512 2157.88 2161.74 1.00
RTX 3090 llama 8B IQ2_XXS - 2.0625 bpw 64 pp512 3128.55 3097.33 0.99
RTX 3090 llama 8B IQ2_XXS - 2.0625 bpw 128 pp512 3713.23 3683.03 0.99
RTX 3090 llama 8B IQ2_XXS - 2.0625 bpw 256 pp512 4044.12 4228.46 1.05
RTX 3090 llama 8B IQ2_XXS - 2.0625 bpw 512 pp512 4413.83 4430.70 1.00
RTX 3090 llama 8B IQ3_S - 3.4375 bpw 16 pp512 1190.86 1187.53 1.00
RTX 3090 llama 8B IQ3_S - 3.4375 bpw 32 pp512 1887.22 1886.18 1.00
RTX 3090 llama 8B IQ3_S - 3.4375 bpw 64 pp512 2874.13 2870.40 1.00
RTX 3090 llama 8B IQ3_S - 3.4375 bpw 128 pp512 3535.29 3522.85 1.00
RTX 3090 llama 8B IQ3_S - 3.4375 bpw 256 pp512 4029.51 4034.93 1.00
RTX 3090 llama 8B IQ3_S - 3.4375 bpw 512 pp512 4125.04 4126.43 1.00
RTX 3090 llama 8B IQ3_S mix - 3.66 bpw 16 pp512 1211.51 1208.65 1.00
RTX 3090 llama 8B IQ3_S mix - 3.66 bpw 32 pp512 1927.09 1920.89 1.00
RTX 3090 llama 8B IQ3_S mix - 3.66 bpw 64 pp512 2895.23 2892.64 1.00
RTX 3090 llama 8B IQ3_S mix - 3.66 bpw 128 pp512 3497.47 3498.22 1.00
RTX 3090 llama 8B IQ3_S mix - 3.66 bpw 256 pp512 4016.35 3900.23 0.97
RTX 3090 llama 8B IQ3_S mix - 3.66 bpw 512 pp512 4131.71 4032.90 0.98
RTX 3090 llama 8B IQ3_XS - 3.3 bpw 16 pp512 1241.68 1239.00 1.00
RTX 3090 llama 8B IQ3_XS - 3.3 bpw 32 pp512 1917.80 1921.87 1.00
RTX 3090 llama 8B IQ3_XS - 3.3 bpw 64 pp512 2929.19 2918.30 1.00
RTX 3090 llama 8B IQ3_XS - 3.3 bpw 128 pp512 3554.39 3551.17 1.00
RTX 3090 llama 8B IQ3_XS - 3.3 bpw 256 pp512 4107.08 4063.57 0.99
RTX 3090 llama 8B IQ3_XS - 3.3 bpw 512 pp512 4110.94 4219.75 1.03
RTX 3090 llama 8B IQ3_XXS - 3.0625 bpw 16 pp512 1284.91 1287.88 1.00
RTX 3090 llama 8B IQ3_XXS - 3.0625 bpw 32 pp512 1950.22 1953.16 1.00
RTX 3090 llama 8B IQ3_XXS - 3.0625 bpw 64 pp512 2944.09 2922.51 0.99
RTX 3090 llama 8B IQ3_XXS - 3.0625 bpw 128 pp512 3473.88 3485.98 1.00
RTX 3090 llama 8B IQ3_XXS - 3.0625 bpw 256 pp512 3986.95 3822.38 0.96
RTX 3090 llama 8B IQ3_XXS - 3.0625 bpw 512 pp512 4138.45 4234.13 1.02
RTX 3090 llama 8B IQ4_NL - 4.5 bpw 16 pp512 1382.41 1381.46 1.00
RTX 3090 llama 8B IQ4_NL - 4.5 bpw 32 pp512 2128.26 2125.38 1.00
RTX 3090 llama 8B IQ4_NL - 4.5 bpw 64 pp512 3190.01 3214.63 1.01
RTX 3090 llama 8B IQ4_NL - 4.5 bpw 128 pp512 3768.55 3779.25 1.00
RTX 3090 llama 8B IQ4_NL - 4.5 bpw 256 pp512 4425.16 4367.65 0.99
RTX 3090 llama 8B IQ4_NL - 4.5 bpw 512 pp512 4617.15 4561.90 0.99
RTX 3090 llama 8B IQ4_XS - 4.25 bpw 16 pp512 1535.36 1531.72 1.00
RTX 3090 llama 8B IQ4_XS - 4.25 bpw 32 pp512 2264.88 2269.60 1.00
RTX 3090 llama 8B IQ4_XS - 4.25 bpw 64 pp512 3281.64 3276.04 1.00
RTX 3090 llama 8B IQ4_XS - 4.25 bpw 128 pp512 3796.10 3801.09 1.00
RTX 3090 llama 8B IQ4_XS - 4.25 bpw 256 pp512 4484.60 4502.76 1.00
RTX 3090 llama 8B IQ4_XS - 4.25 bpw 512 pp512 4709.37 4654.62 0.99
RTX 3090 llama 8B Q2_K_M 16 pp512 1429.17 1431.26 1.00
RTX 3090 llama 8B Q2_K_M 32 pp512 1959.80 1932.74 0.99
RTX 3090 llama 8B Q2_K_M 64 pp512 2503.49 2519.67 1.01
RTX 3090 llama 8B Q2_K_M 128 pp512 2559.16 2559.69 1.00
RTX 3090 llama 8B Q2_K_M 256 pp512 3199.41 3222.55 1.01
RTX 3090 llama 8B Q2_K_M 512 pp512 3359.91 3357.03 1.00
RTX 3090 llama 8B Q3_K_S 16 pp512 1385.94 1387.36 1.00
RTX 3090 llama 8B Q3_K_S 32 pp512 1998.07 2002.61 1.00
RTX 3090 llama 8B Q3_K_S 64 pp512 2762.37 2767.40 1.00
RTX 3090 llama 8B Q3_K_S 128 pp512 3173.98 3182.21 1.00
RTX 3090 llama 8B Q3_K_S 256 pp512 3642.58 3594.13 0.99
RTX 3090 llama 8B Q3_K_S 512 pp512 3815.40 3836.09 1.01
RTX 3090 llama 8B Q4_0 16 pp512 1512.11 1525.24 1.01
RTX 3090 llama 8B Q4_0 32 pp512 2313.84 2344.84 1.01
RTX 3090 llama 8B Q4_0 64 pp512 3329.45 3381.50 1.02
RTX 3090 llama 8B Q4_0 128 pp512 3937.63 4026.48 1.02
RTX 3090 llama 8B Q4_0 256 pp512 4661.38 4676.16 1.00
RTX 3090 llama 8B Q4_0 512 pp512 4838.31 4954.72 1.02
RTX 3090 llama 8B Q4_1 16 pp512 1480.44 1486.09 1.00
RTX 3090 llama 8B Q4_1 32 pp512 2386.64 2422.71 1.02
RTX 3090 llama 8B Q4_1 64 pp512 3148.82 3189.95 1.01
RTX 3090 llama 8B Q4_1 128 pp512 3681.91 3702.01 1.01
RTX 3090 llama 8B Q4_1 256 pp512 4248.07 4301.28 1.01
RTX 3090 llama 8B Q4_1 512 pp512 4502.41 4562.06 1.01
RTX 3090 llama 8B Q4_K_S 16 pp512 1452.27 1454.00 1.00
RTX 3090 llama 8B Q4_K_S 32 pp512 2328.13 2325.92 1.00
RTX 3090 llama 8B Q4_K_S 64 pp512 3073.99 3112.74 1.01
RTX 3090 llama 8B Q4_K_S 128 pp512 3571.34 3575.50 1.00
RTX 3090 llama 8B Q4_K_S 256 pp512 4176.14 4182.02 1.00
RTX 3090 llama 8B Q4_K_S 512 pp512 4404.40 4409.01 1.00
RTX 3090 llama 8B Q5_0 16 pp512 1206.00 1210.95 1.00
RTX 3090 llama 8B Q5_0 32 pp512 1996.29 2014.22 1.01
RTX 3090 llama 8B Q5_0 64 pp512 2975.22 2992.76 1.01
RTX 3090 llama 8B Q5_0 128 pp512 3592.86 3622.93 1.01
RTX 3090 llama 8B Q5_0 256 pp512 4148.72 4197.74 1.01
RTX 3090 llama 8B Q5_0 512 pp512 4379.44 4405.93 1.01
RTX 3090 llama 8B Q5_1 16 pp512 1235.87 1236.96 1.00
RTX 3090 llama 8B Q5_1 32 pp512 2147.23 2149.32 1.00
RTX 3090 llama 8B Q5_1 64 pp512 2909.89 2914.33 1.00
RTX 3090 llama 8B Q5_1 128 pp512 3389.79 3430.01 1.01
RTX 3090 llama 8B Q5_1 256 pp512 3919.79 3932.77 1.00
RTX 3090 llama 8B Q5_1 512 pp512 4165.21 4205.26 1.01
RTX 3090 llama 8B Q5_K_S 16 pp512 1353.69 1348.93 1.00
RTX 3090 llama 8B Q5_K_S 32 pp512 2132.32 2145.63 1.01
RTX 3090 llama 8B Q5_K_S 64 pp512 2917.35 2905.91 1.00
RTX 3090 llama 8B Q5_K_S 128 pp512 3434.16 3479.17 1.01
RTX 3090 llama 8B Q5_K_S 256 pp512 4029.42 3990.02 0.99
RTX 3090 llama 8B Q5_K_S 512 pp512 4184.82 4202.72 1.00
RTX 3090 llama 8B Q6_K 16 pp512 1153.39 1155.73 1.00
RTX 3090 llama 8B Q6_K 32 pp512 1796.45 1798.04 1.00
RTX 3090 llama 8B Q6_K 64 pp512 2628.63 2623.05 1.00
RTX 3090 llama 8B Q6_K 128 pp512 3120.33 3112.02 1.00
RTX 3090 llama 8B Q6_K 256 pp512 3577.18 3541.34 0.99
RTX 3090 llama 8B Q6_K 512 pp512 3505.04 3642.99 1.04
RTX 3090 llama 8B Q8_0 16 pp512 1105.39 1107.86 1.00
RTX 3090 llama 8B Q8_0 32 pp512 2058.19 2057.78 1.00
RTX 3090 llama 8B Q8_0 64 pp512 3007.33 3025.65 1.01
RTX 3090 llama 8B Q8_0 128 pp512 3685.90 3734.91 1.01
RTX 3090 llama 8B Q8_0 256 pp512 4339.79 4349.39 1.00
RTX 3090 llama 8B Q8_0 512 pp512 4489.43 4540.95 1.01
RTX 4090 llama 8B IQ1_S - 1.5625 bpw 16 pp512 2097.46 2095.06 1.00
RTX 4090 llama 8B IQ1_S - 1.5625 bpw 32 pp512 4567.72 4564.19 1.00
RTX 4090 llama 8B IQ1_S - 1.5625 bpw 64 pp512 6704.70 6687.75 1.00
RTX 4090 llama 8B IQ1_S - 1.5625 bpw 128 pp512 8322.56 8341.12 1.00
RTX 4090 llama 8B IQ1_S - 1.5625 bpw 256 pp512 10546.92 10546.20 1.00
RTX 4090 llama 8B IQ1_S - 1.5625 bpw 512 pp512 11963.93 11948.55 1.00
RTX 4090 llama 8B IQ2_S - 2.5 bpw 16 pp512 2324.40 2318.87 1.00
RTX 4090 llama 8B IQ2_S - 2.5 bpw 32 pp512 3875.70 3866.36 1.00
RTX 4090 llama 8B IQ2_S - 2.5 bpw 64 pp512 6072.06 6044.32 1.00
RTX 4090 llama 8B IQ2_S - 2.5 bpw 128 pp512 7599.89 7557.64 0.99
RTX 4090 llama 8B IQ2_S - 2.5 bpw 256 pp512 9165.45 9201.25 1.00
RTX 4090 llama 8B IQ2_S - 2.5 bpw 512 pp512 9462.48 9455.37 1.00
RTX 4090 llama 8B IQ2_XS - 2.3125 bpw 16 pp512 2410.86 2413.05 1.00
RTX 4090 llama 8B IQ2_XS - 2.3125 bpw 32 pp512 3993.63 3984.59 1.00
RTX 4090 llama 8B IQ2_XS - 2.3125 bpw 64 pp512 6106.01 6106.08 1.00
RTX 4090 llama 8B IQ2_XS - 2.3125 bpw 128 pp512 7639.76 7639.74 1.00
RTX 4090 llama 8B IQ2_XS - 2.3125 bpw 256 pp512 9487.91 9486.33 1.00
RTX 4090 llama 8B IQ2_XS - 2.3125 bpw 512 pp512 10560.43 10534.07 1.00
RTX 4090 llama 8B IQ2_XXS - 2.0625 bpw 16 pp512 2663.15 2663.80 1.00
RTX 4090 llama 8B IQ2_XXS - 2.0625 bpw 32 pp512 4460.93 4425.81 0.99
RTX 4090 llama 8B IQ2_XXS - 2.0625 bpw 64 pp512 6925.91 6874.92 0.99
RTX 4090 llama 8B IQ2_XXS - 2.0625 bpw 128 pp512 8825.35 8807.73 1.00
RTX 4090 llama 8B IQ2_XXS - 2.0625 bpw 256 pp512 11140.23 11148.80 1.00
RTX 4090 llama 8B IQ2_XXS - 2.0625 bpw 512 pp512 12606.14 12557.59 1.00
RTX 4090 llama 8B IQ3_S - 3.4375 bpw 16 pp512 1811.99 1810.68 1.00
RTX 4090 llama 8B IQ3_S - 3.4375 bpw 32 pp512 3217.95 3216.05 1.00
RTX 4090 llama 8B IQ3_S - 3.4375 bpw 64 pp512 5780.22 5769.71 1.00
RTX 4090 llama 8B IQ3_S - 3.4375 bpw 128 pp512 7870.43 7888.44 1.00
RTX 4090 llama 8B IQ3_S - 3.4375 bpw 256 pp512 9870.78 9863.54 1.00
RTX 4090 llama 8B IQ3_S - 3.4375 bpw 512 pp512 10354.44 10331.33 1.00
RTX 4090 llama 8B IQ3_S mix - 3.66 bpw 16 pp512 1830.05 1827.11 1.00
RTX 4090 llama 8B IQ3_S mix - 3.66 bpw 32 pp512 3271.56 3263.95 1.00
RTX 4090 llama 8B IQ3_S mix - 3.66 bpw 64 pp512 5803.31 5798.24 1.00
RTX 4090 llama 8B IQ3_S mix - 3.66 bpw 128 pp512 7907.77 7892.30 1.00
RTX 4090 llama 8B IQ3_S mix - 3.66 bpw 256 pp512 9860.02 9846.38 1.00
RTX 4090 llama 8B IQ3_S mix - 3.66 bpw 512 pp512 10357.21 10355.19 1.00
RTX 4090 llama 8B IQ3_XS - 3.3 bpw 16 pp512 1941.90 1941.71 1.00
RTX 4090 llama 8B IQ3_XS - 3.3 bpw 32 pp512 3339.29 3332.24 1.00
RTX 4090 llama 8B IQ3_XS - 3.3 bpw 64 pp512 5915.97 5912.08 1.00
RTX 4090 llama 8B IQ3_XS - 3.3 bpw 128 pp512 7754.08 7756.42 1.00
RTX 4090 llama 8B IQ3_XS - 3.3 bpw 256 pp512 9874.42 9885.17 1.00
RTX 4090 llama 8B IQ3_XS - 3.3 bpw 512 pp512 10505.22 10482.14 1.00
RTX 4090 llama 8B IQ3_XXS - 3.0625 bpw 16 pp512 2018.58 2018.72 1.00
RTX 4090 llama 8B IQ3_XXS - 3.0625 bpw 32 pp512 3384.36 3390.65 1.00
RTX 4090 llama 8B IQ3_XXS - 3.0625 bpw 64 pp512 5923.49 5901.39 1.00
RTX 4090 llama 8B IQ3_XXS - 3.0625 bpw 128 pp512 7679.40 7665.88 1.00
RTX 4090 llama 8B IQ3_XXS - 3.0625 bpw 256 pp512 9811.88 9799.02 1.00
RTX 4090 llama 8B IQ3_XXS - 3.0625 bpw 512 pp512 10427.64 10441.16 1.00
RTX 4090 llama 8B IQ4_NL - 4.5 bpw 16 pp512 2053.32 2048.19 1.00
RTX 4090 llama 8B IQ4_NL - 4.5 bpw 32 pp512 3560.07 3546.11 1.00
RTX 4090 llama 8B IQ4_NL - 4.5 bpw 64 pp512 6535.71 6546.54 1.00
RTX 4090 llama 8B IQ4_NL - 4.5 bpw 128 pp512 8716.91 8675.21 1.00
RTX 4090 llama 8B IQ4_NL - 4.5 bpw 256 pp512 11059.23 11082.94 1.00
RTX 4090 llama 8B IQ4_NL - 4.5 bpw 512 pp512 12352.26 12321.54 1.00
RTX 4090 llama 8B IQ4_XS - 4.25 bpw 16 pp512 2261.95 2262.37 1.00
RTX 4090 llama 8B IQ4_XS - 4.25 bpw 32 pp512 3921.39 3909.80 1.00
RTX 4090 llama 8B IQ4_XS - 4.25 bpw 64 pp512 6748.58 6729.39 1.00
RTX 4090 llama 8B IQ4_XS - 4.25 bpw 128 pp512 8871.58 8853.26 1.00
RTX 4090 llama 8B IQ4_XS - 4.25 bpw 256 pp512 11254.49 11249.81 1.00
RTX 4090 llama 8B IQ4_XS - 4.25 bpw 512 pp512 12766.20 12717.09 1.00
RTX 4090 llama 8B Q2_K_M 16 pp512 2405.19 2408.56 1.00
RTX 4090 llama 8B Q2_K_M 32 pp512 3890.67 3875.20 1.00
RTX 4090 llama 8B Q2_K_M 64 pp512 5622.39 5644.60 1.00
RTX 4090 llama 8B Q2_K_M 128 pp512 5624.22 5604.32 1.00
RTX 4090 llama 8B Q2_K_M 256 pp512 7548.69 7555.48 1.00
RTX 4090 llama 8B Q2_K_M 512 pp512 9250.45 9250.96 1.00
RTX 4090 llama 8B Q3_K_S 16 pp512 2213.21 2210.75 1.00
RTX 4090 llama 8B Q3_K_S 32 pp512 3709.47 3701.89 1.00
RTX 4090 llama 8B Q3_K_S 64 pp512 5941.13 5931.42 1.00
RTX 4090 llama 8B Q3_K_S 128 pp512 7932.73 7938.61 1.00
RTX 4090 llama 8B Q3_K_S 256 pp512 9871.05 9898.21 1.00
RTX 4090 llama 8B Q3_K_S 512 pp512 10828.58 10840.66 1.00
RTX 4090 llama 8B Q4_0 16 pp512 2224.75 2225.79 1.00
RTX 4090 llama 8B Q4_0 32 pp512 3892.80 3884.11 1.00
RTX 4090 llama 8B Q4_0 64 pp512 6512.64 6511.06 1.00
RTX 4090 llama 8B Q4_0 128 pp512 8898.16 8877.71 1.00
RTX 4090 llama 8B Q4_0 256 pp512 11529.31 11550.96 1.00
RTX 4090 llama 8B Q4_0 512 pp512 13064.96 13083.51 1.00
RTX 4090 llama 8B Q4_1 16 pp512 2047.71 2049.33 1.00
RTX 4090 llama 8B Q4_1 32 pp512 3851.29 3845.39 1.00
RTX 4090 llama 8B Q4_1 64 pp512 6337.36 6359.94 1.00
RTX 4090 llama 8B Q4_1 128 pp512 8508.82 8525.36 1.00
RTX 4090 llama 8B Q4_1 256 pp512 10997.06 10998.35 1.00
RTX 4090 llama 8B Q4_1 512 pp512 12345.78 12321.04 1.00
RTX 4090 llama 8B Q4_K_S 16 pp512 2115.57 2112.92 1.00
RTX 4090 llama 8B Q4_K_S 32 pp512 4044.81 4029.77 1.00
RTX 4090 llama 8B Q4_K_S 64 pp512 6415.01 6409.33 1.00
RTX 4090 llama 8B Q4_K_S 128 pp512 8545.91 8518.75 1.00
RTX 4090 llama 8B Q4_K_S 256 pp512 11021.73 10999.14 1.00
RTX 4090 llama 8B Q4_K_S 512 pp512 12394.00 12382.64 1.00
RTX 4090 llama 8B Q5_0 16 pp512 1827.73 1826.57 1.00
RTX 4090 llama 8B Q5_0 32 pp512 3462.79 3463.06 1.00
RTX 4090 llama 8B Q5_0 64 pp512 5773.58 5761.29 1.00
RTX 4090 llama 8B Q5_0 128 pp512 8077.77 8079.40 1.00
RTX 4090 llama 8B Q5_0 256 pp512 10467.52 10473.10 1.00
RTX 4090 llama 8B Q5_0 512 pp512 11810.81 11769.91 1.00
RTX 4090 llama 8B Q5_1 16 pp512 1762.34 1760.56 1.00
RTX 4090 llama 8B Q5_1 32 pp512 3481.53 3484.00 1.00
RTX 4090 llama 8B Q5_1 64 pp512 5785.93 5788.12 1.00
RTX 4090 llama 8B Q5_1 128 pp512 7970.87 7991.35 1.00
RTX 4090 llama 8B Q5_1 256 pp512 10099.20 10098.28 1.00
RTX 4090 llama 8B Q5_1 512 pp512 11281.45 11259.47 1.00
RTX 4090 llama 8B Q5_K_S 16 pp512 1919.44 1919.80 1.00
RTX 4090 llama 8B Q5_K_S 32 pp512 3572.16 3573.92 1.00
RTX 4090 llama 8B Q5_K_S 64 pp512 5885.23 5901.62 1.00
RTX 4090 llama 8B Q5_K_S 128 pp512 8214.76 8214.21 1.00
RTX 4090 llama 8B Q5_K_S 256 pp512 10547.85 10546.33 1.00
RTX 4090 llama 8B Q5_K_S 512 pp512 11853.58 11828.46 1.00
RTX 4090 llama 8B Q6_K 16 pp512 1596.40 1592.97 1.00
RTX 4090 llama 8B Q6_K 32 pp512 2937.34 2923.29 1.00
RTX 4090 llama 8B Q6_K 64 pp512 5090.07 5075.76 1.00
RTX 4090 llama 8B Q6_K 128 pp512 7238.61 7223.21 1.00
RTX 4090 llama 8B Q6_K 256 pp512 9156.79 9168.36 1.00
RTX 4090 llama 8B Q6_K 512 pp512 10134.51 10109.44 1.00
RTX 4090 llama 8B Q8_0 16 pp512 1459.97 1458.40 1.00
RTX 4090 llama 8B Q8_0 32 pp512 2780.47 2778.81 1.00
RTX 4090 llama 8B Q8_0 64 pp512 4751.79 4750.85 1.00
RTX 4090 llama 8B Q8_0 128 pp512 7383.99 7375.65 1.00
RTX 4090 llama 8B Q8_0 256 pp512 10124.91 10116.89 1.00
RTX 4090 llama 8B Q8_0 512 pp512 12275.26 12243.62 1.00
RTX 5090 llama 8B IQ1_S - 1.5625 bpw 16 pp512 1740.81 1738.35 1.00
RTX 5090 llama 8B IQ1_S - 1.5625 bpw 32 pp512 3219.25 3214.64 1.00
RTX 5090 llama 8B IQ1_S - 1.5625 bpw 64 pp512 5506.05 5502.60 1.00
RTX 5090 llama 8B IQ1_S - 1.5625 bpw 128 pp512 8309.76 8301.90 1.00
RTX 5090 llama 8B IQ1_S - 1.5625 bpw 256 pp512 9951.47 9930.84 1.00
RTX 5090 llama 8B IQ1_S - 1.5625 bpw 512 pp512 11409.08 11385.36 1.00
RTX 5090 llama 8B IQ2_S - 2.5 bpw 16 pp512 2485.01 2479.27 1.00
RTX 5090 llama 8B IQ2_S - 2.5 bpw 32 pp512 4238.94 4232.81 1.00
RTX 5090 llama 8B IQ2_S - 2.5 bpw 64 pp512 6450.79 6429.85 1.00
RTX 5090 llama 8B IQ2_S - 2.5 bpw 128 pp512 8182.60 8173.13 1.00
RTX 5090 llama 8B IQ2_S - 2.5 bpw 256 pp512 9597.76 9600.42 1.00
RTX 5090 llama 8B IQ2_S - 2.5 bpw 512 pp512 10150.35 10148.84 1.00
RTX 5090 llama 8B IQ2_XS - 2.3125 bpw 16 pp512 3021.95 3022.31 1.00
RTX 5090 llama 8B IQ2_XS - 2.3125 bpw 32 pp512 4935.82 4939.71 1.00
RTX 5090 llama 8B IQ2_XS - 2.3125 bpw 64 pp512 7195.18 7186.17 1.00
RTX 5090 llama 8B IQ2_XS - 2.3125 bpw 128 pp512 8718.91 8720.22 1.00
RTX 5090 llama 8B IQ2_XS - 2.3125 bpw 256 pp512 10603.84 10593.63 1.00
RTX 5090 llama 8B IQ2_XS - 2.3125 bpw 512 pp512 12014.16 12022.04 1.00
RTX 5090 llama 8B IQ2_XXS - 2.0625 bpw 16 pp512 3358.52 3359.38 1.00
RTX 5090 llama 8B IQ2_XXS - 2.0625 bpw 32 pp512 5665.57 5659.73 1.00
RTX 5090 llama 8B IQ2_XXS - 2.0625 bpw 64 pp512 8475.90 8455.28 1.00
RTX 5090 llama 8B IQ2_XXS - 2.0625 bpw 128 pp512 10508.42 10495.50 1.00
RTX 5090 llama 8B IQ2_XXS - 2.0625 bpw 256 pp512 13164.38 13168.51 1.00
RTX 5090 llama 8B IQ2_XXS - 2.0625 bpw 512 pp512 15331.10 15309.44 1.00
RTX 5090 llama 8B IQ3_S - 3.4375 bpw 16 pp512 1073.00 1073.33 1.00
RTX 5090 llama 8B IQ3_S - 3.4375 bpw 32 pp512 2049.47 2049.08 1.00
RTX 5090 llama 8B IQ3_S - 3.4375 bpw 64 pp512 3681.09 3680.37 1.00
RTX 5090 llama 8B IQ3_S - 3.4375 bpw 128 pp512 5989.30 5990.27 1.00
RTX 5090 llama 8B IQ3_S - 3.4375 bpw 256 pp512 6890.10 6893.55 1.00
RTX 5090 llama 8B IQ3_S - 3.4375 bpw 512 pp512 7185.64 7187.38 1.00
RTX 5090 llama 8B IQ3_S mix - 3.66 bpw 16 pp512 1233.40 1233.56 1.00
RTX 5090 llama 8B IQ3_S mix - 3.66 bpw 32 pp512 2330.08 2329.53 1.00
RTX 5090 llama 8B IQ3_S mix - 3.66 bpw 64 pp512 4140.17 4136.59 1.00
RTX 5090 llama 8B IQ3_S mix - 3.66 bpw 128 pp512 6480.65 6482.93 1.00
RTX 5090 llama 8B IQ3_S mix - 3.66 bpw 256 pp512 7528.14 7527.13 1.00
RTX 5090 llama 8B IQ3_S mix - 3.66 bpw 512 pp512 7888.47 7889.19 1.00
RTX 5090 llama 8B IQ3_XS - 3.3 bpw 16 pp512 1649.53 1643.79 1.00
RTX 5090 llama 8B IQ3_XS - 3.3 bpw 32 pp512 3033.66 3025.09 1.00
RTX 5090 llama 8B IQ3_XS - 3.3 bpw 64 pp512 5150.79 5126.75 1.00
RTX 5090 llama 8B IQ3_XS - 3.3 bpw 128 pp512 7574.31 7558.52 1.00
RTX 5090 llama 8B IQ3_XS - 3.3 bpw 256 pp512 9002.49 8972.26 1.00
RTX 5090 llama 8B IQ3_XS - 3.3 bpw 512 pp512 9506.68 9485.78 1.00
RTX 5090 llama 8B IQ3_XXS - 3.0625 bpw 16 pp512 2317.81 2315.91 1.00
RTX 5090 llama 8B IQ3_XXS - 3.0625 bpw 32 pp512 4142.17 4132.44 1.00
RTX 5090 llama 8B IQ3_XXS - 3.0625 bpw 64 pp512 6520.82 6507.30 1.00
RTX 5090 llama 8B IQ3_XXS - 3.0625 bpw 128 pp512 8780.30 8778.97 1.00
RTX 5090 llama 8B IQ3_XXS - 3.0625 bpw 256 pp512 10733.64 10724.59 1.00
RTX 5090 llama 8B IQ3_XXS - 3.0625 bpw 512 pp512 11505.27 11489.21 1.00
RTX 5090 llama 8B IQ4_NL - 4.5 bpw 16 pp512 3021.83 3018.53 1.00
RTX 5090 llama 8B IQ4_NL - 4.5 bpw 32 pp512 5253.69 5254.86 1.00
RTX 5090 llama 8B IQ4_NL - 4.5 bpw 64 pp512 8121.45 8108.33 1.00
RTX 5090 llama 8B IQ4_NL - 4.5 bpw 128 pp512 10146.47 10145.40 1.00
RTX 5090 llama 8B IQ4_NL - 4.5 bpw 256 pp512 12973.58 12976.32 1.00
RTX 5090 llama 8B IQ4_NL - 4.5 bpw 512 pp512 14933.62 14925.08 1.00
RTX 5090 llama 8B IQ4_XS - 4.25 bpw 16 pp512 3099.00 3093.70 1.00
RTX 5090 llama 8B IQ4_XS - 4.25 bpw 32 pp512 5301.90 5301.10 1.00
RTX 5090 llama 8B IQ4_XS - 4.25 bpw 64 pp512 8197.78 8184.96 1.00
RTX 5090 llama 8B IQ4_XS - 4.25 bpw 128 pp512 10148.92 10156.96 1.00
RTX 5090 llama 8B IQ4_XS - 4.25 bpw 256 pp512 13184.44 13204.78 1.00
RTX 5090 llama 8B IQ4_XS - 4.25 bpw 512 pp512 15322.62 15335.88 1.00
RTX 5090 llama 8B Q2_K_M 16 pp512 3082.86 3085.67 1.00
RTX 5090 llama 8B Q2_K_M 32 pp512 4901.08 4891.69 1.00
RTX 5090 llama 8B Q2_K_M 64 pp512 6923.91 6922.45 1.00
RTX 5090 llama 8B Q2_K_M 128 pp512 7520.97 7524.55 1.00
RTX 5090 llama 8B Q2_K_M 256 pp512 9201.08 9193.96 1.00
RTX 5090 llama 8B Q2_K_M 512 pp512 10768.29 10762.03 1.00
RTX 5090 llama 8B Q3_K_S 16 pp512 2908.31 2908.43 1.00
RTX 5090 llama 8B Q3_K_S 32 pp512 4860.67 4859.55 1.00
RTX 5090 llama 8B Q3_K_S 64 pp512 7206.47 7199.70 1.00
RTX 5090 llama 8B Q3_K_S 128 pp512 9031.09 9019.57 1.00
RTX 5090 llama 8B Q3_K_S 256 pp512 11020.24 10996.29 1.00
RTX 5090 llama 8B Q3_K_S 512 pp512 12592.00 12583.36 1.00
RTX 5090 llama 8B Q4_0 16 pp512 3059.46 3058.36 1.00
RTX 5090 llama 8B Q4_0 32 pp512 5311.11 5309.97 1.00
RTX 5090 llama 8B Q4_0 64 pp512 8161.16 8144.62 1.00
RTX 5090 llama 8B Q4_0 128 pp512 10688.82 10679.26 1.00
RTX 5090 llama 8B Q4_0 256 pp512 13802.45 13798.63 1.00
RTX 5090 llama 8B Q4_0 512 pp512 16424.65 16439.99 1.00
RTX 5090 llama 8B Q4_1 16 pp512 2943.86 2943.92 1.00
RTX 5090 llama 8B Q4_1 32 pp512 5169.32 5166.92 1.00
RTX 5090 llama 8B Q4_1 64 pp512 7962.64 7951.47 1.00
RTX 5090 llama 8B Q4_1 128 pp512 10239.84 10251.68 1.00
RTX 5090 llama 8B Q4_1 256 pp512 12875.85 12922.64 1.00
RTX 5090 llama 8B Q4_1 512 pp512 15077.24 15074.40 1.00
RTX 5090 llama 8B Q4_K_S 16 pp512 3017.85 3015.04 1.00
RTX 5090 llama 8B Q4_K_S 32 pp512 5140.93 5145.18 1.00
RTX 5090 llama 8B Q4_K_S 64 pp512 7972.05 7967.96 1.00
RTX 5090 llama 8B Q4_K_S 128 pp512 10189.26 10178.65 1.00
RTX 5090 llama 8B Q4_K_S 256 pp512 12907.53 12893.04 1.00
RTX 5090 llama 8B Q4_K_S 512 pp512 15137.85 15142.52 1.00
RTX 5090 llama 8B Q5_0 16 pp512 2614.51 2612.94 1.00
RTX 5090 llama 8B Q5_0 32 pp512 4616.91 4615.24 1.00
RTX 5090 llama 8B Q5_0 64 pp512 7308.57 7294.81 1.00
RTX 5090 llama 8B Q5_0 128 pp512 9870.46 9869.91 1.00
RTX 5090 llama 8B Q5_0 256 pp512 12610.19 12598.79 1.00
RTX 5090 llama 8B Q5_0 512 pp512 14630.65 14625.72 1.00
RTX 5090 llama 8B Q5_1 16 pp512 2668.89 2668.32 1.00
RTX 5090 llama 8B Q5_1 32 pp512 4708.55 4712.36 1.00
RTX 5090 llama 8B Q5_1 64 pp512 7271.05 7263.35 1.00
RTX 5090 llama 8B Q5_1 128 pp512 9795.71 9797.57 1.00
RTX 5090 llama 8B Q5_1 256 pp512 12200.22 12200.33 1.00
RTX 5090 llama 8B Q5_1 512 pp512 14027.25 14047.68 1.00
RTX 5090 llama 8B Q5_K_S 16 pp512 2783.45 2782.10 1.00
RTX 5090 llama 8B Q5_K_S 32 pp512 4892.17 4886.75 1.00
RTX 5090 llama 8B Q5_K_S 64 pp512 7567.02 7554.40 1.00
RTX 5090 llama 8B Q5_K_S 128 pp512 9879.67 9879.48 1.00
RTX 5090 llama 8B Q5_K_S 256 pp512 12445.16 12449.03 1.00
RTX 5090 llama 8B Q5_K_S 512 pp512 14628.29 14612.81 1.00
RTX 5090 llama 8B Q6_K 16 pp512 2382.06 2382.82 1.00
RTX 5090 llama 8B Q6_K 32 pp512 4240.58 4240.32 1.00
RTX 5090 llama 8B Q6_K 64 pp512 6447.36 6435.36 1.00
RTX 5090 llama 8B Q6_K 128 pp512 8427.11 8422.71 1.00
RTX 5090 llama 8B Q6_K 256 pp512 10356.84 10344.99 1.00
RTX 5090 llama 8B Q6_K 512 pp512 11674.04 11657.86 1.00
RTX 5090 llama 8B Q8_0 16 pp512 2163.16 2162.06 1.00
RTX 5090 llama 8B Q8_0 32 pp512 3951.37 3948.60 1.00
RTX 5090 llama 8B Q8_0 64 pp512 6678.53 6668.40 1.00
RTX 5090 llama 8B Q8_0 128 pp512 9582.04 9579.52 1.00
RTX 5090 llama 8B Q8_0 256 pp512 12868.56 12854.34 1.00
RTX 5090 llama 8B Q8_0 512 pp512 15313.12 15315.24 1.00
V100-PCIE-32GB llama 8B IQ1_S - 1.5625 bpw 16 pp512 701.25 700.60 1.00
V100-PCIE-32GB llama 8B IQ1_S - 1.5625 bpw 32 pp512 1031.11 1030.11 1.00
V100-PCIE-32GB llama 8B IQ1_S - 1.5625 bpw 64 pp512 486.84 487.14 1.00
V100-PCIE-32GB llama 8B IQ1_S - 1.5625 bpw 128 pp512 910.75 910.21 1.00
V100-PCIE-32GB llama 8B IQ1_S - 1.5625 bpw 256 pp512 1644.66 1644.86 1.00
V100-PCIE-32GB llama 8B IQ1_S - 1.5625 bpw 512 pp512 2565.78 2566.72 1.00
V100-PCIE-32GB llama 8B IQ2_S - 2.5 bpw 16 pp512 629.12 629.14 1.00
V100-PCIE-32GB llama 8B IQ2_S - 2.5 bpw 32 pp512 973.36 973.04 1.00
V100-PCIE-32GB llama 8B IQ2_S - 2.5 bpw 64 pp512 477.19 478.02 1.00
V100-PCIE-32GB llama 8B IQ2_S - 2.5 bpw 128 pp512 890.67 892.11 1.00
V100-PCIE-32GB llama 8B IQ2_S - 2.5 bpw 256 pp512 1578.06 1580.72 1.00
V100-PCIE-32GB llama 8B IQ2_S - 2.5 bpw 512 pp512 2461.57 2463.23 1.00
V100-PCIE-32GB llama 8B IQ2_XS - 2.3125 bpw 16 pp512 647.20 647.40 1.00
V100-PCIE-32GB llama 8B IQ2_XS - 2.3125 bpw 32 pp512 992.68 992.33 1.00
V100-PCIE-32GB llama 8B IQ2_XS - 2.3125 bpw 64 pp512 484.89 484.88 1.00
V100-PCIE-32GB llama 8B IQ2_XS - 2.3125 bpw 128 pp512 906.33 906.31 1.00
V100-PCIE-32GB llama 8B IQ2_XS - 2.3125 bpw 256 pp512 1647.29 1647.17 1.00
V100-PCIE-32GB llama 8B IQ2_XS - 2.3125 bpw 512 pp512 2556.64 2555.80 1.00
V100-PCIE-32GB llama 8B IQ2_XXS - 2.0625 bpw 16 pp512 703.38 703.24 1.00
V100-PCIE-32GB llama 8B IQ2_XXS - 2.0625 bpw 32 pp512 1055.93 1054.97 1.00
V100-PCIE-32GB llama 8B IQ2_XXS - 2.0625 bpw 64 pp512 482.10 481.44 1.00
V100-PCIE-32GB llama 8B IQ2_XXS - 2.0625 bpw 128 pp512 901.92 900.46 1.00
V100-PCIE-32GB llama 8B IQ2_XXS - 2.0625 bpw 256 pp512 1577.17 1575.23 1.00
V100-PCIE-32GB llama 8B IQ2_XXS - 2.0625 bpw 512 pp512 2552.72 2548.11 1.00
V100-PCIE-32GB llama 8B IQ3_S - 3.4375 bpw 16 pp512 637.93 637.53 1.00
V100-PCIE-32GB llama 8B IQ3_S - 3.4375 bpw 32 pp512 967.03 966.22 1.00
V100-PCIE-32GB llama 8B IQ3_S - 3.4375 bpw 64 pp512 471.88 474.48 1.01
V100-PCIE-32GB llama 8B IQ3_S - 3.4375 bpw 128 pp512 881.97 886.60 1.01
V100-PCIE-32GB llama 8B IQ3_S - 3.4375 bpw 256 pp512 1595.28 1600.75 1.00
V100-PCIE-32GB llama 8B IQ3_S - 3.4375 bpw 512 pp512 2442.53 2439.55 1.00
V100-PCIE-32GB llama 8B IQ3_S mix - 3.66 bpw 16 pp512 646.29 645.59 1.00
V100-PCIE-32GB llama 8B IQ3_S mix - 3.66 bpw 32 pp512 968.11 966.51 1.00
V100-PCIE-32GB llama 8B IQ3_S mix - 3.66 bpw 64 pp512 486.43 486.51 1.00
V100-PCIE-32GB llama 8B IQ3_S mix - 3.66 bpw 128 pp512 906.93 907.35 1.00
V100-PCIE-32GB llama 8B IQ3_S mix - 3.66 bpw 256 pp512 1598.56 1598.00 1.00
V100-PCIE-32GB llama 8B IQ3_S mix - 3.66 bpw 512 pp512 2484.38 2478.45 1.00
V100-PCIE-32GB llama 8B IQ3_XS - 3.3 bpw 16 pp512 630.52 628.06 1.00
V100-PCIE-32GB llama 8B IQ3_XS - 3.3 bpw 32 pp512 957.57 953.98 1.00
V100-PCIE-32GB llama 8B IQ3_XS - 3.3 bpw 64 pp512 473.21 472.25 1.00
V100-PCIE-32GB llama 8B IQ3_XS - 3.3 bpw 128 pp512 883.74 883.16 1.00
V100-PCIE-32GB llama 8B IQ3_XS - 3.3 bpw 256 pp512 1565.75 1564.90 1.00
V100-PCIE-32GB llama 8B IQ3_XS - 3.3 bpw 512 pp512 2446.01 2443.35 1.00
V100-PCIE-32GB llama 8B IQ3_XXS - 3.0625 bpw 16 pp512 627.38 628.58 1.00
V100-PCIE-32GB llama 8B IQ3_XXS - 3.0625 bpw 32 pp512 960.58 960.71 1.00
V100-PCIE-32GB llama 8B IQ3_XXS - 3.0625 bpw 64 pp512 473.76 472.63 1.00
V100-PCIE-32GB llama 8B IQ3_XXS - 3.0625 bpw 128 pp512 885.50 885.55 1.00
V100-PCIE-32GB llama 8B IQ3_XXS - 3.0625 bpw 256 pp512 1602.64 1602.06 1.00
V100-PCIE-32GB llama 8B IQ3_XXS - 3.0625 bpw 512 pp512 2451.36 2447.74 1.00
V100-PCIE-32GB llama 8B IQ4_NL - 4.5 bpw 16 pp512 759.36 757.56 1.00
V100-PCIE-32GB llama 8B IQ4_NL - 4.5 bpw 32 pp512 1113.44 1111.83 1.00
V100-PCIE-32GB llama 8B IQ4_NL - 4.5 bpw 64 pp512 646.36 647.75 1.00
V100-PCIE-32GB llama 8B IQ4_NL - 4.5 bpw 128 pp512 1180.89 1182.79 1.00
V100-PCIE-32GB llama 8B IQ4_NL - 4.5 bpw 256 pp512 2086.03 2087.94 1.00
V100-PCIE-32GB llama 8B IQ4_NL - 4.5 bpw 512 pp512 2957.41 2944.20 1.00
V100-PCIE-32GB llama 8B IQ4_XS - 4.25 bpw 16 pp512 746.45 747.71 1.00
V100-PCIE-32GB llama 8B IQ4_XS - 4.25 bpw 32 pp512 1110.95 1111.41 1.00
V100-PCIE-32GB llama 8B IQ4_XS - 4.25 bpw 64 pp512 643.06 643.94 1.00
V100-PCIE-32GB llama 8B IQ4_XS - 4.25 bpw 128 pp512 1175.76 1176.64 1.00
V100-PCIE-32GB llama 8B IQ4_XS - 4.25 bpw 256 pp512 2033.61 2030.15 1.00
V100-PCIE-32GB llama 8B IQ4_XS - 4.25 bpw 512 pp512 2958.51 2787.97 0.94
V100-PCIE-32GB llama 8B Q2_K_M 16 pp512 673.85 673.94 1.00
V100-PCIE-32GB llama 8B Q2_K_M 32 pp512 924.26 924.75 1.00
V100-PCIE-32GB llama 8B Q2_K_M 64 pp512 723.07 722.86 1.00
V100-PCIE-32GB llama 8B Q2_K_M 128 pp512 1312.26 1312.54 1.00
V100-PCIE-32GB llama 8B Q2_K_M 256 pp512 2226.06 2229.34 1.00
V100-PCIE-32GB llama 8B Q2_K_M 512 pp512 3181.78 3185.12 1.00
V100-PCIE-32GB llama 8B Q3_K_S 16 pp512 689.12 689.78 1.00
V100-PCIE-32GB llama 8B Q3_K_S 32 pp512 982.77 983.16 1.00
V100-PCIE-32GB llama 8B Q3_K_S 64 pp512 689.78 690.86 1.00
V100-PCIE-32GB llama 8B Q3_K_S 128 pp512 1255.57 1255.42 1.00
V100-PCIE-32GB llama 8B Q3_K_S 256 pp512 2203.39 2202.50 1.00
V100-PCIE-32GB llama 8B Q3_K_S 512 pp512 3101.72 3088.39 1.00
V100-PCIE-32GB llama 8B Q4_0 16 pp512 829.04 827.42 1.00
V100-PCIE-32GB llama 8B Q4_0 32 pp512 1214.38 1212.98 1.00
V100-PCIE-32GB llama 8B Q4_0 64 pp512 643.79 643.87 1.00
V100-PCIE-32GB llama 8B Q4_0 128 pp512 1181.19 1180.54 1.00
V100-PCIE-32GB llama 8B Q4_0 256 pp512 2044.91 2043.59 1.00
V100-PCIE-32GB llama 8B Q4_0 512 pp512 3119.81 3122.92 1.00
V100-PCIE-32GB llama 8B Q4_1 16 pp512 818.72 819.19 1.00
V100-PCIE-32GB llama 8B Q4_1 32 pp512 1172.77 1173.75 1.00
V100-PCIE-32GB llama 8B Q4_1 64 pp512 643.41 643.89 1.00
V100-PCIE-32GB llama 8B Q4_1 128 pp512 1179.26 1179.22 1.00
V100-PCIE-32GB llama 8B Q4_1 256 pp512 2093.66 2093.66 1.00
V100-PCIE-32GB llama 8B Q4_1 512 pp512 3102.14 3103.90 1.00
V100-PCIE-32GB llama 8B Q4_K_S 16 pp512 768.42 766.16 1.00
V100-PCIE-32GB llama 8B Q4_K_S 32 pp512 1115.42 1113.92 1.00
V100-PCIE-32GB llama 8B Q4_K_S 64 pp512 636.69 636.84 1.00
V100-PCIE-32GB llama 8B Q4_K_S 128 pp512 1167.53 1167.09 1.00
V100-PCIE-32GB llama 8B Q4_K_S 256 pp512 2073.96 2072.82 1.00
V100-PCIE-32GB llama 8B Q4_K_S 512 pp512 3028.34 3020.74 1.00
V100-PCIE-32GB llama 8B Q5_0 16 pp512 668.82 669.58 1.00
V100-PCIE-32GB llama 8B Q5_0 32 pp512 1043.87 1044.85 1.00
V100-PCIE-32GB llama 8B Q5_0 64 pp512 824.32 824.35 1.00
V100-PCIE-32GB llama 8B Q5_0 128 pp512 1472.08 1472.49 1.00
V100-PCIE-32GB llama 8B Q5_0 256 pp512 2425.07 2421.77 1.00
V100-PCIE-32GB llama 8B Q5_0 512 pp512 3331.28 3326.19 1.00
V100-PCIE-32GB llama 8B Q5_1 16 pp512 718.50 719.50 1.00
V100-PCIE-32GB llama 8B Q5_1 32 pp512 1061.50 1063.43 1.00
V100-PCIE-32GB llama 8B Q5_1 64 pp512 821.89 822.75 1.00
V100-PCIE-32GB llama 8B Q5_1 128 pp512 1468.94 1464.05 1.00
V100-PCIE-32GB llama 8B Q5_1 256 pp512 2415.19 2412.41 1.00
V100-PCIE-32GB llama 8B Q5_1 512 pp512 3303.14 3294.43 1.00
V100-PCIE-32GB llama 8B Q5_K_S 16 pp512 701.70 701.04 1.00
V100-PCIE-32GB llama 8B Q5_K_S 32 pp512 1025.04 1024.09 1.00
V100-PCIE-32GB llama 8B Q5_K_S 64 pp512 743.88 743.71 1.00
V100-PCIE-32GB llama 8B Q5_K_S 128 pp512 1347.00 1346.94 1.00
V100-PCIE-32GB llama 8B Q5_K_S 256 pp512 2281.21 2277.05 1.00
V100-PCIE-32GB llama 8B Q5_K_S 512 pp512 3226.74 3214.04 1.00
V100-PCIE-32GB llama 8B Q6_K 16 pp512 658.08 657.72 1.00
V100-PCIE-32GB llama 8B Q6_K 32 pp512 974.37 974.35 1.00
V100-PCIE-32GB llama 8B Q6_K 64 pp512 746.69 746.75 1.00
V100-PCIE-32GB llama 8B Q6_K 128 pp512 1348.28 1348.58 1.00
V100-PCIE-32GB llama 8B Q6_K 256 pp512 2284.94 2293.88 1.00
V100-PCIE-32GB llama 8B Q6_K 512 pp512 3221.82 3197.51 0.99
V100-PCIE-32GB llama 8B Q8_0 16 pp512 664.50 664.65 1.00
V100-PCIE-32GB llama 8B Q8_0 32 pp512 1004.12 1005.64 1.00
V100-PCIE-32GB llama 8B Q8_0 64 pp512 904.08 903.10 1.00
V100-PCIE-32GB llama 8B Q8_0 128 pp512 1596.96 1597.09 1.00
V100-PCIE-32GB llama 8B Q8_0 256 pp512 2646.51 2648.00 1.00
V100-PCIE-32GB llama 8B Q8_0 512 pp512 3604.49 3589.75 1.00
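For reading these tables: judging by the rows, the Speedup column is the second t/s value divided by the first, rounded to two decimals. The sketch below recomputes it and labels a row as a regression or an improvement. The function names and the 5% tolerance are assumptions made for illustration and are not part of this PR or of any llama.cpp tooling.

```python
def speedup(ts_before: float, ts_after: float) -> float:
    """Ratio of new to old throughput (t/s), rounded to two decimals like the tables."""
    return round(ts_after / ts_before, 2)


def classify(ts_before: float, ts_after: float, tolerance: float = 0.05) -> str:
    """Label a row; `tolerance` is a hypothetical noise threshold, not a project setting."""
    s = speedup(ts_before, ts_after)
    if s < 1 - tolerance:
        return "regression"
    if s > 1 + tolerance:
        return "improvement"
    return "neutral"


if __name__ == "__main__":
    # Values copied from the V100-PCIE-32GB IQ4_XS row with microbatch size 512.
    print(speedup(2958.51, 2787.97), classify(2958.51, 2787.97))
    # Values copied from the RTX 5090 Q8_0 row with microbatch size 512.
    print(speedup(15313.12, 15315.24), classify(15313.12, 15315.24))
```

With a threshold like this, most NVIDIA rows come out neutral, and the outliers in either direction are easy to pick out of the longer AMD table.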
Performance check AMD
GPU Model Microbatch size Test t/s dc71236 t/s 1f9dea4 Speedup
MI60 / MI50 llama 8B IQ1_S - 1.5625 bpw 16 pp2048 240.13 241.75 1.01
MI60 / MI50 llama 8B IQ1_S - 1.5625 bpw 32 pp2048 353.75 364.28 1.03
MI60 / MI50 llama 8B IQ1_S - 1.5625 bpw 64 pp2048 451.89 444.89 0.98
MI60 / MI50 llama 8B IQ1_S - 1.5625 bpw 128 pp2048 492.61 485.84 0.99
MI60 / MI50 llama 8B IQ1_S - 1.5625 bpw 256 pp2048 596.10 591.09 0.99
MI60 / MI50 llama 8B IQ1_S - 1.5625 bpw 512 pp2048 666.52 666.14 1.00
MI60 / MI50 llama 8B IQ1_S - 1.5625 bpw 1024 pp2048 705.15 704.21 1.00
MI60 / MI50 llama 8B IQ1_S - 1.5625 bpw 2048 pp2048 707.61 707.17 1.00
MI60 / MI50 llama 8B IQ2_S - 2.5 bpw 16 pp2048 181.74 183.88 1.01
MI60 / MI50 llama 8B IQ2_S - 2.5 bpw 32 pp2048 318.51 328.36 1.03
MI60 / MI50 llama 8B IQ2_S - 2.5 bpw 64 pp2048 429.70 421.71 0.98
MI60 / MI50 llama 8B IQ2_S - 2.5 bpw 128 pp2048 470.66 465.81 0.99
MI60 / MI50 llama 8B IQ2_S - 2.5 bpw 256 pp2048 570.32 565.69 0.99
MI60 / MI50 llama 8B IQ2_S - 2.5 bpw 512 pp2048 639.19 638.02 1.00
MI60 / MI50 llama 8B IQ2_S - 2.5 bpw 1024 pp2048 677.18 673.53 0.99
MI60 / MI50 llama 8B IQ2_S - 2.5 bpw 2048 pp2048 671.11 669.77 1.00
MI60 / MI50 llama 8B IQ2_XS - 2.3125 bpw 16 pp2048 181.38 182.60 1.01
MI60 / MI50 llama 8B IQ2_XS - 2.3125 bpw 32 pp2048 315.93 325.35 1.03
MI60 / MI50 llama 8B IQ2_XS - 2.3125 bpw 64 pp2048 405.87 398.93 0.98
MI60 / MI50 llama 8B IQ2_XS - 2.3125 bpw 128 pp2048 440.42 434.49 0.99
MI60 / MI50 llama 8B IQ2_XS - 2.3125 bpw 256 pp2048 531.83 526.95 0.99
MI60 / MI50 llama 8B IQ2_XS - 2.3125 bpw 512 pp2048 594.30 593.28 1.00
MI60 / MI50 llama 8B IQ2_XS - 2.3125 bpw 1024 pp2048 628.16 626.42 1.00
MI60 / MI50 llama 8B IQ2_XS - 2.3125 bpw 2048 pp2048 630.17 626.75 0.99
MI60 / MI50 llama 8B IQ2_XXS - 2.0625 bpw 16 pp2048 218.35 214.43 0.98
MI60 / MI50 llama 8B IQ2_XXS - 2.0625 bpw 32 pp2048 307.03 316.33 1.03
MI60 / MI50 llama 8B IQ2_XXS - 2.0625 bpw 64 pp2048 434.20 426.06 0.98
MI60 / MI50 llama 8B IQ2_XXS - 2.0625 bpw 128 pp2048 471.23 465.58 0.99
MI60 / MI50 llama 8B IQ2_XXS - 2.0625 bpw 256 pp2048 569.78 565.92 0.99
MI60 / MI50 llama 8B IQ2_XXS - 2.0625 bpw 512 pp2048 637.37 638.01 1.00
MI60 / MI50 llama 8B IQ2_XXS - 2.0625 bpw 1024 pp2048 673.78 673.94 1.00
MI60 / MI50 llama 8B IQ2_XXS - 2.0625 bpw 2048 pp2048 676.13 674.43 1.00
MI60 / MI50 llama 8B IQ3_S - 3.4375 bpw 16 pp2048 200.88 207.06 1.03
MI60 / MI50 llama 8B IQ3_S - 3.4375 bpw 32 pp2048 317.51 329.29 1.04
MI60 / MI50 llama 8B IQ3_S - 3.4375 bpw 64 pp2048 443.21 434.46 0.98
MI60 / MI50 llama 8B IQ3_S - 3.4375 bpw 128 pp2048 487.83 481.19 0.99
MI60 / MI50 llama 8B IQ3_S - 3.4375 bpw 256 pp2048 591.08 585.14 0.99
MI60 / MI50 llama 8B IQ3_S - 3.4375 bpw 512 pp2048 663.30 660.97 1.00
MI60 / MI50 llama 8B IQ3_S - 3.4375 bpw 1024 pp2048 691.75 690.14 1.00
MI60 / MI50 llama 8B IQ3_S - 3.4375 bpw 2048 pp2048 688.22 687.50 1.00
MI60 / MI50 llama 8B IQ3_S mix - 3.66 bpw 16 pp2048 202.87 212.44 1.05
MI60 / MI50 llama 8B IQ3_S mix - 3.66 bpw 32 pp2048 314.70 338.63 1.08
MI60 / MI50 llama 8B IQ3_S mix - 3.66 bpw 64 pp2048 450.70 423.97 0.94
MI60 / MI50 llama 8B IQ3_S mix - 3.66 bpw 128 pp2048 510.22 496.50 0.97
MI60 / MI50 llama 8B IQ3_S mix - 3.66 bpw 256 pp2048 617.10 596.48 0.97
MI60 / MI50 llama 8B IQ3_S mix - 3.66 bpw 512 pp2048 692.06 679.46 0.98
MI60 / MI50 llama 8B IQ3_S mix - 3.66 bpw 1024 pp2048 716.90 706.12 0.98
MI60 / MI50 llama 8B IQ3_S mix - 3.66 bpw 2048 pp2048 714.70 705.30 0.99
MI60 / MI50 llama 8B IQ3_XS - 3.3 bpw 16 pp2048 194.20 197.48 1.02
MI60 / MI50 llama 8B IQ3_XS - 3.3 bpw 32 pp2048 306.72 316.22 1.03
MI60 / MI50 llama 8B IQ3_XS - 3.3 bpw 64 pp2048 445.75 435.04 0.98
MI60 / MI50 llama 8B IQ3_XS - 3.3 bpw 128 pp2048 490.52 483.22 0.99
MI60 / MI50 llama 8B IQ3_XS - 3.3 bpw 256 pp2048 593.29 587.79 0.99
MI60 / MI50 llama 8B IQ3_XS - 3.3 bpw 512 pp2048 666.87 665.70 1.00
MI60 / MI50 llama 8B IQ3_XS - 3.3 bpw 1024 pp2048 700.89 699.37 1.00
MI60 / MI50 llama 8B IQ3_XS - 3.3 bpw 2048 pp2048 696.80 695.93 1.00
MI60 / MI50 llama 8B IQ3_XXS - 3.0625 bpw 16 pp2048 183.73 185.36 1.01
MI60 / MI50 llama 8B IQ3_XXS - 3.0625 bpw 32 pp2048 299.76 308.39 1.03
MI60 / MI50 llama 8B IQ3_XXS - 3.0625 bpw 64 pp2048 441.75 431.88 0.98
MI60 / MI50 llama 8B IQ3_XXS - 3.0625 bpw 128 pp2048 487.33 481.93 0.99
MI60 / MI50 llama 8B IQ3_XXS - 3.0625 bpw 256 pp2048 591.69 586.92 0.99
MI60 / MI50 llama 8B IQ3_XXS - 3.0625 bpw 512 pp2048 665.32 664.05 1.00
MI60 / MI50 llama 8B IQ3_XXS - 3.0625 bpw 1024 pp2048 704.35 701.11 1.00
MI60 / MI50 llama 8B IQ3_XXS - 3.0625 bpw 2048 pp2048 698.97 697.05 1.00
MI60 / MI50 llama 8B IQ4_NL - 4.5 bpw 16 pp2048 204.67 205.62 1.00
MI60 / MI50 llama 8B IQ4_NL - 4.5 bpw 32 pp2048 399.04 394.90 0.99
MI60 / MI50 llama 8B IQ4_NL - 4.5 bpw 64 pp2048 383.92 384.92 1.00
MI60 / MI50 llama 8B IQ4_NL - 4.5 bpw 128 pp2048 421.40 421.82 1.00
MI60 / MI50 llama 8B IQ4_NL - 4.5 bpw 256 pp2048 512.87 513.64 1.00
MI60 / MI50 llama 8B IQ4_NL - 4.5 bpw 512 pp2048 574.29 575.39 1.00
MI60 / MI50 llama 8B IQ4_NL - 4.5 bpw 1024 pp2048 610.88 611.67 1.00
MI60 / MI50 llama 8B IQ4_NL - 4.5 bpw 2048 pp2048 611.68 612.52 1.00
MI60 / MI50 llama 8B IQ4_XS - 4.25 bpw 16 pp2048 224.13 224.94 1.00
MI60 / MI50 llama 8B IQ4_XS - 4.25 bpw 32 pp2048 403.15 398.87 0.99
MI60 / MI50 llama 8B IQ4_XS - 4.25 bpw 64 pp2048 489.40 490.61 1.00
MI60 / MI50 llama 8B IQ4_XS - 4.25 bpw 128 pp2048 544.64 545.62 1.00
MI60 / MI50 llama 8B IQ4_XS - 4.25 bpw 256 pp2048 672.25 673.44 1.00
MI60 / MI50 llama 8B IQ4_XS - 4.25 bpw 512 pp2048 745.87 747.51 1.00
MI60 / MI50 llama 8B IQ4_XS - 4.25 bpw 1024 pp2048 785.70 788.52 1.00
MI60 / MI50 llama 8B IQ4_XS - 4.25 bpw 2048 pp2048 782.05 784.14 1.00
MI60 / MI50 llama 8B Q2_K_M 16 pp2048 172.68 172.95 1.00
MI60 / MI50 llama 8B Q2_K_M 32 pp2048 251.94 256.77 1.02
MI60 / MI50 llama 8B Q2_K_M 64 pp2048 209.02 208.29 1.00
MI60 / MI50 llama 8B Q2_K_M 128 pp2048 223.00 221.70 0.99
MI60 / MI50 llama 8B Q2_K_M 256 pp2048 251.75 250.83 1.00
MI60 / MI50 llama 8B Q2_K_M 512 pp2048 266.07 264.47 0.99
MI60 / MI50 llama 8B Q2_K_M 1024 pp2048 273.68 271.91 0.99
MI60 / MI50 llama 8B Q2_K_M 2048 pp2048 273.69 272.33 1.00
MI60 / MI50 llama 8B Q3_K_S 16 pp2048 128.42 127.45 0.99
MI60 / MI50 llama 8B Q3_K_S 32 pp2048 289.77 289.74 1.00
MI60 / MI50 llama 8B Q3_K_S 64 pp2048 384.97 384.59 1.00
MI60 / MI50 llama 8B Q3_K_S 128 pp2048 419.70 419.22 1.00
MI60 / MI50 llama 8B Q3_K_S 256 pp2048 506.90 506.37 1.00
MI60 / MI50 llama 8B Q3_K_S 512 pp2048 563.69 563.34 1.00
MI60 / MI50 llama 8B Q3_K_S 1024 pp2048 598.25 600.06 1.00
MI60 / MI50 llama 8B Q3_K_S 2048 pp2048 594.97 603.42 1.01
MI60 / MI50 llama 8B Q4_0 16 pp2048 250.92 249.39 0.99
MI60 / MI50 llama 8B Q4_0 32 pp2048 367.71 517.36 1.41
MI60 / MI50 llama 8B Q4_0 64 pp2048 626.03 485.14 0.77
MI60 / MI50 llama 8B Q4_0 128 pp2048 842.46 731.19 0.87
MI60 / MI50 llama 8B Q4_0 256 pp2048 1003.52 836.74 0.83
MI60 / MI50 llama 8B Q4_0 512 pp2048 1118.19 974.31 0.87
MI60 / MI50 llama 8B Q4_0 1024 pp2048 1173.85 1027.19 0.88
MI60 / MI50 llama 8B Q4_0 2048 pp2048 1178.32 1035.54 0.88
MI60 / MI50 llama 8B Q4_1 16 pp2048 240.68 245.17 1.02
MI60 / MI50 llama 8B Q4_1 32 pp2048 364.36 513.23 1.41
MI60 / MI50 llama 8B Q4_1 64 pp2048 608.38 485.42 0.80
MI60 / MI50 llama 8B Q4_1 128 pp2048 830.56 722.73 0.87
MI60 / MI50 llama 8B Q4_1 256 pp2048 958.42 824.81 0.86
MI60 / MI50 llama 8B Q4_1 512 pp2048 1075.28 943.30 0.88
MI60 / MI50 llama 8B Q4_1 1024 pp2048 1134.84 998.55 0.88
MI60 / MI50 llama 8B Q4_1 2048 pp2048 1146.48 1012.01 0.88
MI60 / MI50 llama 8B Q4_K_S 16 pp2048 238.32 260.88 1.09
MI60 / MI50 llama 8B Q4_K_S 32 pp2048 354.48 458.76 1.29
MI60 / MI50 llama 8B Q4_K_S 64 pp2048 534.40 423.61 0.79
MI60 / MI50 llama 8B Q4_K_S 128 pp2048 675.89 589.83 0.87
MI60 / MI50 llama 8B Q4_K_S 256 pp2048 796.30 711.61 0.89
MI60 / MI50 llama 8B Q4_K_S 512 pp2048 883.84 817.06 0.92
MI60 / MI50 llama 8B Q4_K_S 1024 pp2048 935.34 864.06 0.92
MI60 / MI50 llama 8B Q4_K_S 2048 pp2048 942.70 872.58 0.93
MI60 / MI50 llama 8B Q5_0 16 pp2048 180.59 181.87 1.01
MI60 / MI50 llama 8B Q5_0 32 pp2048 343.42 343.56 1.00
MI60 / MI50 llama 8B Q5_0 64 pp2048 360.94 362.16 1.00
MI60 / MI50 llama 8B Q5_0 128 pp2048 394.36 394.99 1.00
MI60 / MI50 llama 8B Q5_0 256 pp2048 481.63 482.53 1.00
MI60 / MI50 llama 8B Q5_0 512 pp2048 538.18 539.64 1.00
MI60 / MI50 llama 8B Q5_0 1024 pp2048 572.71 573.20 1.00
MI60 / MI50 llama 8B Q5_0 2048 pp2048 570.72 571.19 1.00
MI60 / MI50 llama 8B Q5_1 16 pp2048 167.85 167.86 1.00
MI60 / MI50 llama 8B Q5_1 32 pp2048 368.41 367.84 1.00
MI60 / MI50 llama 8B Q5_1 64 pp2048 363.29 364.02 1.00
MI60 / MI50 llama 8B Q5_1 128 pp2048 396.64 397.14 1.00
MI60 / MI50 llama 8B Q5_1 256 pp2048 487.19 487.33 1.00
MI60 / MI50 llama 8B Q5_1 512 pp2048 545.15 544.92 1.00
MI60 / MI50 llama 8B Q5_1 1024 pp2048 579.79 579.59 1.00
MI60 / MI50 llama 8B Q5_1 2048 pp2048 580.39 579.87 1.00
MI60 / MI50 llama 8B Q5_K_S 16 pp2048 137.30 134.26 0.98
MI60 / MI50 llama 8B Q5_K_S 32 pp2048 390.73 343.47 0.88
MI60 / MI50 llama 8B Q5_K_S 64 pp2048 353.58 357.81 1.01
MI60 / MI50 llama 8B Q5_K_S 128 pp2048 387.84 391.56 1.01
MI60 / MI50 llama 8B Q5_K_S 256 pp2048 474.80 479.79 1.01
MI60 / MI50 llama 8B Q5_K_S 512 pp2048 530.88 537.23 1.01
MI60 / MI50 llama 8B Q5_K_S 1024 pp2048 565.71 572.57 1.01
MI60 / MI50 llama 8B Q5_K_S 2048 pp2048 568.23 576.14 1.01
MI60 / MI50 llama 8B Q6_K 16 pp2048 119.85 140.17 1.17
MI60 / MI50 llama 8B Q6_K 32 pp2048 268.31 289.39 1.08
MI60 / MI50 llama 8B Q6_K 64 pp2048 385.00 365.85 0.95
MI60 / MI50 llama 8B Q6_K 128 pp2048 413.39 397.65 0.96
MI60 / MI50 llama 8B Q6_K 256 pp2048 500.09 488.61 0.98
MI60 / MI50 llama 8B Q6_K 512 pp2048 549.90 546.57 0.99
MI60 / MI50 llama 8B Q6_K 1024 pp2048 579.96 582.62 1.00
MI60 / MI50 llama 8B Q6_K 2048 pp2048 576.78 583.55 1.01
MI60 / MI50 llama 8B Q8_0 16 pp2048 116.30 116.94 1.01
MI60 / MI50 llama 8B Q8_0 32 pp2048 388.34 388.39 1.00
MI60 / MI50 llama 8B Q8_0 64 pp2048 334.73 336.32 1.00
MI60 / MI50 llama 8B Q8_0 128 pp2048 362.94 364.16 1.00
MI60 / MI50 llama 8B Q8_0 256 pp2048 447.50 449.43 1.00
MI60 / MI50 llama 8B Q8_0 512 pp2048 500.67 502.65 1.00
MI60 / MI50 llama 8B Q8_0 1024 pp2048 533.15 535.41 1.00
MI60 / MI50 llama 8B Q8_0 2048 pp2048 534.13 536.25 1.00
MI100 llama 8B IQ1_S - 1.5625 bpw 16 pp512 891.55 893.26 1.00
MI100 llama 8B IQ1_S - 1.5625 bpw 32 pp512 1416.17 1429.59 1.01
MI100 llama 8B IQ1_S - 1.5625 bpw 64 pp512 1786.94 1800.36 1.01
MI100 llama 8B IQ1_S - 1.5625 bpw 128 pp512 2127.93 2137.47 1.00
MI100 llama 8B IQ1_S - 1.5625 bpw 256 pp512 2502.35 2512.51 1.00
MI100 llama 8B IQ1_S - 1.5625 bpw 512 pp512 3318.71 3324.97 1.00
MI100 llama 8B IQ2_S - 2.5 bpw 16 pp512 655.27 651.83 0.99
MI100 llama 8B IQ2_S - 2.5 bpw 32 pp512 1031.09 1030.40 1.00
MI100 llama 8B IQ2_S - 2.5 bpw 64 pp512 1456.70 1463.92 1.00
MI100 llama 8B IQ2_S - 2.5 bpw 128 pp512 1678.09 1687.26 1.01
MI100 llama 8B IQ2_S - 2.5 bpw 256 pp512 2435.66 2442.15 1.00
MI100 llama 8B IQ2_S - 2.5 bpw 512 pp512 3181.28 3191.66 1.00
MI100 llama 8B IQ2_XS - 2.3125 bpw 16 pp512 669.71 667.98 1.00
MI100 llama 8B IQ2_XS - 2.3125 bpw 32 pp512 1041.81 1041.42 1.00
MI100 llama 8B IQ2_XS - 2.3125 bpw 64 pp512 1459.57 1467.52 1.01
MI100 llama 8B IQ2_XS - 2.3125 bpw 128 pp512 1673.89 1681.26 1.00
MI100 llama 8B IQ2_XS - 2.3125 bpw 256 pp512 2470.64 2472.91 1.00
MI100 llama 8B IQ2_XS - 2.3125 bpw 512 pp512 3280.19 3288.49 1.00
MI100 llama 8B IQ2_XXS - 2.0625 bpw 16 pp512 756.19 760.35 1.01
MI100 llama 8B IQ2_XXS - 2.0625 bpw 32 pp512 1222.23 1217.65 1.00
MI100 llama 8B IQ2_XXS - 2.0625 bpw 64 pp512 1765.71 1771.11 1.00
MI100 llama 8B IQ2_XXS - 2.0625 bpw 128 pp512 2117.08 2117.64 1.00
MI100 llama 8B IQ2_XXS - 2.0625 bpw 256 pp512 2482.51 2482.95 1.00
MI100 llama 8B IQ2_XXS - 2.0625 bpw 512 pp512 3295.42 3303.03 1.00
MI100 llama 8B IQ3_S - 3.4375 bpw 16 pp512 693.05 690.33 1.00
MI100 llama 8B IQ3_S - 3.4375 bpw 32 pp512 1132.56 1131.70 1.00
MI100 llama 8B IQ3_S - 3.4375 bpw 64 pp512 1614.88 1620.94 1.00
MI100 llama 8B IQ3_S - 3.4375 bpw 128 pp512 1886.62 1887.79 1.00
MI100 llama 8B IQ3_S - 3.4375 bpw 256 pp512 2427.83 2427.53 1.00
MI100 llama 8B IQ3_S - 3.4375 bpw 512 pp512 3172.21 3174.82 1.00
MI100 llama 8B IQ3_S mix - 3.66 bpw 16 pp512 705.34 710.72 1.01
MI100 llama 8B IQ3_S mix - 3.66 bpw 32 pp512 1149.71 1150.73 1.00
MI100 llama 8B IQ3_S mix - 3.66 bpw 64 pp512 1616.27 1626.07 1.01
MI100 llama 8B IQ3_S mix - 3.66 bpw 128 pp512 1903.68 1904.76 1.00
MI100 llama 8B IQ3_S mix - 3.66 bpw 256 pp512 2435.43 2434.95 1.00
MI100 llama 8B IQ3_S mix - 3.66 bpw 512 pp512 3171.68 3170.21 1.00
MI100 llama 8B IQ3_XS - 3.3 bpw 16 pp512 679.67 676.39 1.00
MI100 llama 8B IQ3_XS - 3.3 bpw 32 pp512 1119.11 1100.70 0.98
MI100 llama 8B IQ3_XS - 3.3 bpw 64 pp512 1571.14 1573.04 1.00
MI100 llama 8B IQ3_XS - 3.3 bpw 128 pp512 1869.37 1872.98 1.00
MI100 llama 8B IQ3_XS - 3.3 bpw 256 pp512 2430.91 2425.70 1.00
MI100 llama 8B IQ3_XS - 3.3 bpw 512 pp512 3174.97 3176.70 1.00
MI100 llama 8B IQ3_XXS - 3.0625 bpw 16 pp512 655.22 654.21 1.00
MI100 llama 8B IQ3_XXS - 3.0625 bpw 32 pp512 1073.80 1067.66 0.99
MI100 llama 8B IQ3_XXS - 3.0625 bpw 64 pp512 1558.10 1559.53 1.00
MI100 llama 8B IQ3_XXS - 3.0625 bpw 128 pp512 1855.18 1855.30 1.00
MI100 llama 8B IQ3_XXS - 3.0625 bpw 256 pp512 2431.57 2438.44 1.00
MI100 llama 8B IQ3_XXS - 3.0625 bpw 512 pp512 3179.99 3169.62 1.00
MI100 llama 8B IQ4_NL - 4.5 bpw 16 pp512 792.79 953.98 1.20
MI100 llama 8B IQ4_NL - 4.5 bpw 32 pp512 1446.73 1485.25 1.03
MI100 llama 8B IQ4_NL - 4.5 bpw 64 pp512 1951.36 1920.26 0.98
MI100 llama 8B IQ4_NL - 4.5 bpw 128 pp512 2351.54 2337.79 0.99
MI100 llama 8B IQ4_NL - 4.5 bpw 256 pp512 2429.08 2428.46 1.00
MI100 llama 8B IQ4_NL - 4.5 bpw 512 pp512 3224.39 3233.19 1.00
MI100 llama 8B IQ4_XS - 4.25 bpw 16 pp512 981.39 782.42 0.80
MI100 llama 8B IQ4_XS - 4.25 bpw 32 pp512 1509.30 1526.17 1.01
MI100 llama 8B IQ4_XS - 4.25 bpw 64 pp512 1970.68 1971.84 1.00
MI100 llama 8B IQ4_XS - 4.25 bpw 128 pp512 2404.49 2426.70 1.01
MI100 llama 8B IQ4_XS - 4.25 bpw 256 pp512 2437.24 2433.75 1.00
MI100 llama 8B IQ4_XS - 4.25 bpw 512 pp512 3245.71 3239.42 1.00
MI100 llama 8B Q2_K_M 16 pp512 807.67 805.31 1.00
MI100 llama 8B Q2_K_M 32 pp512 1090.91 1073.12 0.98
MI100 llama 8B Q2_K_M 64 pp512 1378.83 1395.12 1.01
MI100 llama 8B Q2_K_M 128 pp512 1643.78 1645.59 1.00
MI100 llama 8B Q2_K_M 256 pp512 2452.30 2454.53 1.00
MI100 llama 8B Q2_K_M 512 pp512 3285.95 3285.89 1.00
MI100 llama 8B Q3_K_S 16 pp512 871.93 871.93 1.00
MI100 llama 8B Q3_K_S 32 pp512 1217.85 1201.87 0.99
MI100 llama 8B Q3_K_S 64 pp512 1577.94 1581.81 1.00
MI100 llama 8B Q3_K_S 128 pp512 1831.10 1832.55 1.00
MI100 llama 8B Q3_K_S 256 pp512 2373.88 2377.47 1.00
MI100 llama 8B Q3_K_S 512 pp512 3222.90 3223.57 1.00
MI100 llama 8B Q4_0 16 pp512 928.87 839.12 0.90
MI100 llama 8B Q4_0 32 pp512 1477.54 1496.31 1.01
MI100 llama 8B Q4_0 64 pp512 1998.07 1870.92 0.94
MI100 llama 8B Q4_0 128 pp512 2436.55 2277.36 0.93
MI100 llama 8B Q4_0 256 pp512 2662.61 2485.19 0.93
MI100 llama 8B Q4_0 512 pp512 2821.40 2590.33 0.92
MI100 llama 8B Q4_1 16 pp512 942.67 1013.36 1.07
MI100 llama 8B Q4_1 32 pp512 1350.22 1475.49 1.09
MI100 llama 8B Q4_1 64 pp512 1761.05 1770.18 1.01
MI100 llama 8B Q4_1 128 pp512 2097.22 2098.89 1.00
MI100 llama 8B Q4_1 256 pp512 2272.45 2276.74 1.00
MI100 llama 8B Q4_1 512 pp512 2367.63 2369.06 1.00
MI100 llama 8B Q4_K_S 16 pp512 949.24 976.07 1.03
MI100 llama 8B Q4_K_S 32 pp512 1465.01 1475.12 1.01
MI100 llama 8B Q4_K_S 64 pp512 1795.33 1789.71 1.00
MI100 llama 8B Q4_K_S 128 pp512 2130.05 2130.97 1.00
MI100 llama 8B Q4_K_S 256 pp512 2318.43 2320.70 1.00
MI100 llama 8B Q4_K_S 512 pp512 3305.03 3312.22 1.00
MI100 llama 8B Q5_0 16 pp512 724.13 719.13 0.99
MI100 llama 8B Q5_0 32 pp512 1228.74 1277.12 1.04
MI100 llama 8B Q5_0 64 pp512 1810.20 1723.53 0.95
MI100 llama 8B Q5_0 128 pp512 2153.22 2018.13 0.94
MI100 llama 8B Q5_0 256 pp512 2338.52 2169.13 0.93
MI100 llama 8B Q5_0 512 pp512 2450.61 2247.02 0.92
MI100 llama 8B Q5_1 16 pp512 825.16 816.09 0.99
MI100 llama 8B Q5_1 32 pp512 1201.92 1382.15 1.15
MI100 llama 8B Q5_1 64 pp512 1837.44 1659.35 0.90
MI100 llama 8B Q5_1 128 pp512 2191.38 1963.14 0.90
MI100 llama 8B Q5_1 256 pp512 2384.31 2104.99 0.88
MI100 llama 8B Q5_1 512 pp512 2515.61 2174.19 0.86
MI100 llama 8B Q5_K_S 16 pp512 769.94 933.16 1.21
MI100 llama 8B Q5_K_S 32 pp512 1400.11 1395.46 1.00
MI100 llama 8B Q5_K_S 64 pp512 1805.20 1712.09 0.95
MI100 llama 8B Q5_K_S 128 pp512 2153.21 2009.64 0.93
MI100 llama 8B Q5_K_S 256 pp512 2339.78 2179.16 0.93
MI100 llama 8B Q5_K_S 512 pp512 3294.02 3295.47 1.00
MI100 llama 8B Q6_K 16 pp512 712.59 726.13 1.02
MI100 llama 8B Q6_K 32 pp512 949.04 1096.42 1.16
MI100 llama 8B Q6_K 64 pp512 1394.64 1302.66 0.93
MI100 llama 8B Q6_K 128 pp512 1587.05 1480.97 0.93
MI100 llama 8B Q6_K 256 pp512 2470.00 2454.29 0.99
MI100 llama 8B Q6_K 512 pp512 3259.50 3268.01 1.00
MI100 llama 8B Q8_0 16 pp512 809.23 820.88 1.01
MI100 llama 8B Q8_0 32 pp512 1297.90 1316.86 1.01
MI100 llama 8B Q8_0 64 pp512 1887.80 1861.64 0.99
MI100 llama 8B Q8_0 128 pp512 2248.23 2234.97 0.99
MI100 llama 8B Q8_0 256 pp512 2923.85 2917.92 1.00
MI100 llama 8B Q8_0 512 pp512 3697.45 3700.54 1.00
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 16 pp512 432.31 430.36 1.00
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 32 pp512 470.66 480.38 1.02
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 64 pp512 314.86 322.91 1.03
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 128 pp512 876.54 897.38 1.02
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 256 pp512 911.60 928.12 1.02
Radeon 8060S Graphics llama 8B IQ1_S - 1.5625 bpw 512 pp512 924.22 943.54 1.02
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 16 pp512 304.80 310.16 1.02
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 32 pp512 426.88 432.85 1.01
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 64 pp512 299.46 300.78 1.00
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 128 pp512 740.43 737.78 1.00
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 256 pp512 776.74 768.30 0.99
Radeon 8060S Graphics llama 8B IQ2_S - 2.5 bpw 512 pp512 759.13 759.59 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 16 pp512 300.72 305.25 1.02
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 32 pp512 429.32 423.01 0.99
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 64 pp512 285.97 286.11 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 128 pp512 830.96 835.75 1.01
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 256 pp512 866.48 868.86 1.00
Radeon 8060S Graphics llama 8B IQ2_XS - 2.3125 bpw 512 pp512 889.02 890.55 1.00
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 16 pp512 359.15 366.96 1.02
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 32 pp512 400.90 414.02 1.03
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 64 pp512 683.35 673.79 0.99
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 128 pp512 417.96 403.32 0.96
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 256 pp512 439.63 420.44 0.96
Radeon 8060S Graphics llama 8B IQ2_XXS - 2.0625 bpw 512 pp512 447.70 426.06 0.95
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 16 pp512 323.56 340.93 1.05
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 32 pp512 475.14 522.01 1.10
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 64 pp512 624.27 627.34 1.00
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 128 pp512 394.69 381.00 0.97
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 256 pp512 414.53 396.54 0.96
Radeon 8060S Graphics llama 8B IQ3_S - 3.4375 bpw 512 pp512 416.46 401.36 0.96
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 16 pp512 332.98 348.89 1.05
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 32 pp512 480.05 525.18 1.09
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 64 pp512 627.02 625.33 1.00
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 128 pp512 424.53 409.47 0.96
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 256 pp512 443.84 423.99 0.96
Radeon 8060S Graphics llama 8B IQ3_S mix - 3.66 bpw 512 pp512 440.88 428.37 0.97
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 16 pp512 330.19 345.56 1.05
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 32 pp512 500.75 517.60 1.03
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 64 pp512 642.12 639.21 1.00
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 128 pp512 396.36 384.74 0.97
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 256 pp512 412.82 399.67 0.97
Radeon 8060S Graphics llama 8B IQ3_XS - 3.3 bpw 512 pp512 419.79 407.70 0.97
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 16 pp512 335.95 340.26 1.01
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 32 pp512 499.09 495.67 0.99
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 64 pp512 549.46 547.57 1.00
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 128 pp512 425.72 418.25 0.98
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 256 pp512 446.96 431.72 0.97
Radeon 8060S Graphics llama 8B IQ3_XXS - 3.0625 bpw 512 pp512 443.89 431.60 0.97
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 16 pp512 446.37 442.86 0.99
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 32 pp512 506.52 496.40 0.98
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 64 pp512 576.88 572.26 0.99
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 128 pp512 377.01 371.50 0.99
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 256 pp512 395.44 390.60 0.99
Radeon 8060S Graphics llama 8B IQ4_NL - 4.5 bpw 512 pp512 399.86 395.70 0.99
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 16 pp512 470.28 468.75 1.00
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 32 pp512 519.99 496.87 0.96
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 64 pp512 570.64 565.09 0.99
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 128 pp512 382.90 374.08 0.98
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 256 pp512 401.18 390.66 0.97
Radeon 8060S Graphics llama 8B IQ4_XS - 4.25 bpw 512 pp512 404.49 398.27 0.98
Radeon 8060S Graphics llama 8B Q2_K_M 16 pp512 248.87 247.97 1.00
Radeon 8060S Graphics llama 8B Q2_K_M 32 pp512 392.15 385.68 0.98
Radeon 8060S Graphics llama 8B Q2_K_M 64 pp512 383.70 381.65 0.99
Radeon 8060S Graphics llama 8B Q2_K_M 128 pp512 665.22 660.81 0.99
Radeon 8060S Graphics llama 8B Q2_K_M 256 pp512 725.93 716.75 0.99
Radeon 8060S Graphics llama 8B Q2_K_M 512 pp512 899.37 889.81 0.99
Radeon 8060S Graphics llama 8B Q3_K_S 16 pp512 299.43 296.95 0.99
Radeon 8060S Graphics llama 8B Q3_K_S 32 pp512 411.56 408.87 0.99
Radeon 8060S Graphics llama 8B Q3_K_S 64 pp512 279.25 277.83 0.99
Radeon 8060S Graphics llama 8B Q3_K_S 128 pp512 975.33 968.08 0.99
Radeon 8060S Graphics llama 8B Q3_K_S 256 pp512 1035.12 1019.13 0.98
Radeon 8060S Graphics llama 8B Q3_K_S 512 pp512 1041.66 1025.58 0.98
Radeon 8060S Graphics llama 8B Q4_0 16 pp512 418.25 439.89 1.05
Radeon 8060S Graphics llama 8B Q4_0 32 pp512 404.04 501.56 1.24
Radeon 8060S Graphics llama 8B Q4_0 64 pp512 568.11 598.39 1.05
Radeon 8060S Graphics llama 8B Q4_0 128 pp512 379.51 1102.61 2.91
Radeon 8060S Graphics llama 8B Q4_0 256 pp512 399.41 1212.48 3.04
Radeon 8060S Graphics llama 8B Q4_0 512 pp512 410.18 1163.22 2.84
Radeon 8060S Graphics llama 8B Q4_1 16 pp512 416.40 417.05 1.00
Radeon 8060S Graphics llama 8B Q4_1 32 pp512 398.36 452.02 1.13
Radeon 8060S Graphics llama 8B Q4_1 64 pp512 255.63 280.18 1.10
Radeon 8060S Graphics llama 8B Q4_1 128 pp512 961.16 993.61 1.03
Radeon 8060S Graphics llama 8B Q4_1 256 pp512 1025.89 1069.44 1.04
Radeon 8060S Graphics llama 8B Q4_1 512 pp512 1041.25 1062.83 1.02
Radeon 8060S Graphics llama 8B Q4_K_S 16 pp512 415.99 417.90 1.00
Radeon 8060S Graphics llama 8B Q4_K_S 32 pp512 520.40 509.30 0.98
Radeon 8060S Graphics llama 8B Q4_K_S 64 pp512 646.58 637.99 0.99
Radeon 8060S Graphics llama 8B Q4_K_S 128 pp512 1011.97 1037.80 1.03
Radeon 8060S Graphics llama 8B Q4_K_S 256 pp512 1071.14 1100.65 1.03
Radeon 8060S Graphics llama 8B Q4_K_S 512 pp512 1054.20 1080.98 1.03
Radeon 8060S Graphics llama 8B Q5_0 16 pp512 361.95 379.12 1.05
Radeon 8060S Graphics llama 8B Q5_0 32 pp512 324.76 377.67 1.16
Radeon 8060S Graphics llama 8B Q5_0 64 pp512 534.08 558.42 1.05
Radeon 8060S Graphics llama 8B Q5_0 128 pp512 335.21 348.67 1.04
Radeon 8060S Graphics llama 8B Q5_0 256 pp512 358.94 369.74 1.03
Radeon 8060S Graphics llama 8B Q5_0 512 pp512 365.58 381.78 1.04
Radeon 8060S Graphics llama 8B Q5_1 16 pp512 297.33 301.59 1.01
Radeon 8060S Graphics llama 8B Q5_1 32 pp512 262.34 306.58 1.17
Radeon 8060S Graphics llama 8B Q5_1 64 pp512 210.25 220.52 1.05
Radeon 8060S Graphics llama 8B Q5_1 128 pp512 893.60 946.55 1.06
Radeon 8060S Graphics llama 8B Q5_1 256 pp512 970.09 1042.96 1.08
Radeon 8060S Graphics llama 8B Q5_1 512 pp512 991.34 1047.67 1.06
Radeon 8060S Graphics llama 8B Q5_K_S 16 pp512 418.53 408.75 0.98
Radeon 8060S Graphics llama 8B Q5_K_S 32 pp512 365.39 315.49 0.86
Radeon 8060S Graphics llama 8B Q5_K_S 64 pp512 238.18 215.89 0.91
Radeon 8060S Graphics llama 8B Q5_K_S 128 pp512 950.24 1000.42 1.05
Radeon 8060S Graphics llama 8B Q5_K_S 256 pp512 1017.95 1069.37 1.05
Radeon 8060S Graphics llama 8B Q5_K_S 512 pp512 1024.76 1064.55 1.04
Radeon 8060S Graphics llama 8B Q6_K 16 pp512 325.51 327.07 1.00
Radeon 8060S Graphics llama 8B Q6_K 32 pp512 129.86 127.35 0.98
Radeon 8060S Graphics llama 8B Q6_K 64 pp512 620.14 617.78 1.00
Radeon 8060S Graphics llama 8B Q6_K 128 pp512 720.20 722.26 1.00
Radeon 8060S Graphics llama 8B Q6_K 256 pp512 763.90 765.13 1.00
Radeon 8060S Graphics llama 8B Q6_K 512 pp512 799.49 795.90 1.00
Radeon 8060S Graphics llama 8B Q8_0 16 pp512 319.64 316.61 0.99
Radeon 8060S Graphics llama 8B Q8_0 32 pp512 430.36 404.97 0.94
Radeon 8060S Graphics llama 8B Q8_0 64 pp512 559.60 556.20 0.99
Radeon 8060S Graphics llama 8B Q8_0 128 pp512 350.51 343.27 0.98
Radeon 8060S Graphics llama 8B Q8_0 256 pp512 367.78 357.63 0.97
Radeon 8060S Graphics llama 8B Q8_0 512 pp512 377.56 364.03 0.96
RX 6800 llama 8B IQ1_S - 1.5625 bpw 16 pp512 247.24 245.03 0.99
RX 6800 llama 8B IQ1_S - 1.5625 bpw 32 pp512 391.70 393.63 1.00
RX 6800 llama 8B IQ1_S - 1.5625 bpw 64 pp512 527.11 531.94 1.01
RX 6800 llama 8B IQ1_S - 1.5625 bpw 128 pp512 675.99 679.68 1.01
RX 6800 llama 8B IQ1_S - 1.5625 bpw 256 pp512 805.68 805.54 1.00
RX 6800 llama 8B IQ1_S - 1.5625 bpw 512 pp512 867.29 867.89 1.00
RX 6800 llama 8B IQ2_S - 2.5 bpw 16 pp512 171.12 168.27 0.98
RX 6800 llama 8B IQ2_S - 2.5 bpw 32 pp512 324.09 324.93 1.00
RX 6800 llama 8B IQ2_S - 2.5 bpw 64 pp512 463.71 467.85 1.01
RX 6800 llama 8B IQ2_S - 2.5 bpw 128 pp512 593.38 596.61 1.01
RX 6800 llama 8B IQ2_S - 2.5 bpw 256 pp512 695.71 697.29 1.00
RX 6800 llama 8B IQ2_S - 2.5 bpw 512 pp512 742.69 745.22 1.00
RX 6800 llama 8B IQ2_XS - 2.3125 bpw 16 pp512 168.81 166.10 0.98
RX 6800 llama 8B IQ2_XS - 2.3125 bpw 32 pp512 348.28 348.47 1.00
RX 6800 llama 8B IQ2_XS - 2.3125 bpw 64 pp512 450.98 454.73 1.01
RX 6800 llama 8B IQ2_XS - 2.3125 bpw 128 pp512 574.58 577.14 1.00
RX 6800 llama 8B IQ2_XS - 2.3125 bpw 256 pp512 678.23 678.75 1.00
RX 6800 llama 8B IQ2_XS - 2.3125 bpw 512 pp512 729.22 730.98 1.00
RX 6800 llama 8B IQ2_XXS - 2.0625 bpw 16 pp512 182.01 227.23 1.25
RX 6800 llama 8B IQ2_XXS - 2.0625 bpw 32 pp512 217.63 222.28 1.02
RX 6800 llama 8B IQ2_XXS - 2.0625 bpw 64 pp512 489.77 493.71 1.01
RX 6800 llama 8B IQ2_XXS - 2.0625 bpw 128 pp512 627.38 630.54 1.01
RX 6800 llama 8B IQ2_XXS - 2.0625 bpw 256 pp512 745.55 744.93 1.00
RX 6800 llama 8B IQ2_XXS - 2.0625 bpw 512 pp512 802.22 801.79 1.00
RX 6800 llama 8B IQ3_S - 3.4375 bpw 16 pp512 215.99 212.34 0.98
RX 6800 llama 8B IQ3_S - 3.4375 bpw 32 pp512 215.00 220.16 1.02
RX 6800 llama 8B IQ3_S - 3.4375 bpw 64 pp512 496.43 501.06 1.01
RX 6800 llama 8B IQ3_S - 3.4375 bpw 128 pp512 635.14 638.91 1.01
RX 6800 llama 8B IQ3_S - 3.4375 bpw 256 pp512 750.84 751.93 1.00
RX 6800 llama 8B IQ3_S - 3.4375 bpw 512 pp512 803.56 805.07 1.00
RX 6800 llama 8B IQ3_S mix - 3.66 bpw 16 pp512 220.86 217.66 0.99
RX 6800 llama 8B IQ3_S mix - 3.66 bpw 32 pp512 226.83 232.15 1.02
RX 6800 llama 8B IQ3_S mix - 3.66 bpw 64 pp512 478.76 486.45 1.02
RX 6800 llama 8B IQ3_S mix - 3.66 bpw 128 pp512 605.36 617.42 1.02
RX 6800 llama 8B IQ3_S mix - 3.66 bpw 256 pp512 721.92 731.27 1.01
RX 6800 llama 8B IQ3_S mix - 3.66 bpw 512 pp512 776.84 786.00 1.01
RX 6800 llama 8B IQ3_XS - 3.3 bpw 16 pp512 217.53 214.03 0.98
RX 6800 llama 8B IQ3_XS - 3.3 bpw 32 pp512 207.79 210.51 1.01
RX 6800 llama 8B IQ3_XS - 3.3 bpw 64 pp512 494.27 498.76 1.01
RX 6800 llama 8B IQ3_XS - 3.3 bpw 128 pp512 634.11 637.78 1.01
RX 6800 llama 8B IQ3_XS - 3.3 bpw 256 pp512 751.57 752.87 1.00
RX 6800 llama 8B IQ3_XS - 3.3 bpw 512 pp512 804.01 805.96 1.00
RX 6800 llama 8B IQ3_XXS - 3.0625 bpw 16 pp512 209.00 206.31 0.99
RX 6800 llama 8B IQ3_XXS - 3.0625 bpw 32 pp512 220.45 220.99 1.00
RX 6800 llama 8B IQ3_XXS - 3.0625 bpw 64 pp512 489.67 495.23 1.01
RX 6800 llama 8B IQ3_XXS - 3.0625 bpw 128 pp512 630.40 634.26 1.01
RX 6800 llama 8B IQ3_XXS - 3.0625 bpw 256 pp512 745.87 747.20 1.00
RX 6800 llama 8B IQ3_XXS - 3.0625 bpw 512 pp512 799.20 801.65 1.00
RX 6800 llama 8B IQ4_NL - 4.5 bpw 16 pp512 236.70 226.78 0.96
RX 6800 llama 8B IQ4_NL - 4.5 bpw 32 pp512 181.85 182.17 1.00
RX 6800 llama 8B IQ4_NL - 4.5 bpw 64 pp512 561.31 561.38 1.00
RX 6800 llama 8B IQ4_NL - 4.5 bpw 128 pp512 725.16 723.88 1.00
RX 6800 llama 8B IQ4_NL - 4.5 bpw 256 pp512 861.48 860.23 1.00
RX 6800 llama 8B IQ4_NL - 4.5 bpw 512 pp512 924.36 923.69 1.00
RX 6800 llama 8B IQ4_XS - 4.25 bpw 16 pp512 224.36 215.04 0.96
RX 6800 llama 8B IQ4_XS - 4.25 bpw 32 pp512 182.65 182.63 1.00
RX 6800 llama 8B IQ4_XS - 4.25 bpw 64 pp512 562.20 561.19 1.00
RX 6800 llama 8B IQ4_XS - 4.25 bpw 128 pp512 726.28 726.54 1.00
RX 6800 llama 8B IQ4_XS - 4.25 bpw 256 pp512 862.71 861.55 1.00
RX 6800 llama 8B IQ4_XS - 4.25 bpw 512 pp512 923.43 923.34 1.00
RX 6800 llama 8B Q2_K_M 16 pp512 219.61 221.96 1.01
RX 6800 llama 8B Q2_K_M 32 pp512 306.66 306.87 1.00
RX 6800 llama 8B Q2_K_M 64 pp512 340.60 340.65 1.00
RX 6800 llama 8B Q2_K_M 128 pp512 427.03 425.04 1.00
RX 6800 llama 8B Q2_K_M 256 pp512 514.50 509.21 0.99
RX 6800 llama 8B Q2_K_M 512 pp512 553.73 548.36 0.99
RX 6800 llama 8B Q3_K_S 16 pp512 182.76 182.72 1.00
RX 6800 llama 8B Q3_K_S 32 pp512 340.71 340.62 1.00
RX 6800 llama 8B Q3_K_S 64 pp512 406.47 406.62 1.00
RX 6800 llama 8B Q3_K_S 128 pp512 513.60 514.09 1.00
RX 6800 llama 8B Q3_K_S 256 pp512 601.91 602.16 1.00
RX 6800 llama 8B Q3_K_S 512 pp512 649.16 649.20 1.00
RX 6800 llama 8B Q4_0 16 pp512 300.69 304.05 1.01
RX 6800 llama 8B Q4_0 32 pp512 467.26 467.26 1.00
RX 6800 llama 8B Q4_0 64 pp512 599.96 598.85 1.00
RX 6800 llama 8B Q4_0 128 pp512 765.67 766.89 1.00
RX 6800 llama 8B Q4_0 256 pp512 901.70 904.15 1.00
RX 6800 llama 8B Q4_0 512 pp512 972.21 974.40 1.00
RX 6800 llama 8B Q4_1 16 pp512 307.17 291.00 0.95
RX 6800 llama 8B Q4_1 32 pp512 436.49 436.55 1.00
RX 6800 llama 8B Q4_1 64 pp512 565.49 577.10 1.02
RX 6800 llama 8B Q4_1 128 pp512 726.79 741.31 1.02
RX 6800 llama 8B Q4_1 256 pp512 857.58 874.32 1.02
RX 6800 llama 8B Q4_1 512 pp512 927.41 943.09 1.02
RX 6800 llama 8B Q4_K_S 16 pp512 257.76 253.67 0.98
RX 6800 llama 8B Q4_K_S 32 pp512 365.71 366.65 1.00
RX 6800 llama 8B Q4_K_S 64 pp512 390.53 412.70 1.06
RX 6800 llama 8B Q4_K_S 128 pp512 478.91 515.86 1.08
RX 6800 llama 8B Q4_K_S 256 pp512 585.76 624.65 1.07
RX 6800 llama 8B Q4_K_S 512 pp512 638.76 679.08 1.06
RX 6800 llama 8B Q5_0 16 pp512 217.65 217.90 1.00
RX 6800 llama 8B Q5_0 32 pp512 169.78 169.71 1.00
RX 6800 llama 8B Q5_0 64 pp512 535.99 537.32 1.00
RX 6800 llama 8B Q5_0 128 pp512 681.97 683.63 1.00
RX 6800 llama 8B Q5_0 256 pp512 797.52 797.82 1.00
RX 6800 llama 8B Q5_0 512 pp512 856.05 855.72 1.00
RX 6800 llama 8B Q5_1 16 pp512 199.28 199.47 1.00
RX 6800 llama 8B Q5_1 32 pp512 398.97 399.30 1.00
RX 6800 llama 8B Q5_1 64 pp512 536.91 537.65 1.00
RX 6800 llama 8B Q5_1 128 pp512 694.94 694.66 1.00
RX 6800 llama 8B Q5_1 256 pp512 817.27 818.28 1.00
RX 6800 llama 8B Q5_1 512 pp512 880.52 881.12 1.00
RX 6800 llama 8B Q5_K_S 16 pp512 235.56 166.97 0.71
RX 6800 llama 8B Q5_K_S 32 pp512 367.02 372.94 1.02
RX 6800 llama 8B Q5_K_S 64 pp512 390.55 388.75 1.00
RX 6800 llama 8B Q5_K_S 128 pp512 481.30 477.65 0.99
RX 6800 llama 8B Q5_K_S 256 pp512 589.37 584.07 0.99
RX 6800 llama 8B Q5_K_S 512 pp512 641.94 636.43 0.99
RX 6800 llama 8B Q6_K 16 pp512 156.74 153.98 0.98
RX 6800 llama 8B Q6_K 32 pp512 332.98 329.94 0.99
RX 6800 llama 8B Q6_K 64 pp512 342.94 342.76 1.00
RX 6800 llama 8B Q6_K 128 pp512 425.22 424.53 1.00
RX 6800 llama 8B Q6_K 256 pp512 514.83 514.11 1.00
RX 6800 llama 8B Q6_K 512 pp512 558.28 557.63 1.00
RX 6800 llama 8B Q8_0 16 pp512 267.61 271.50 1.01
RX 6800 llama 8B Q8_0 32 pp512 267.61 223.29 0.83
RX 6800 llama 8B Q8_0 64 pp512 563.32 562.79 1.00
RX 6800 llama 8B Q8_0 128 pp512 723.25 723.77 1.00
RX 6800 llama 8B Q8_0 256 pp512 862.30 861.38 1.00
RX 6800 llama 8B Q8_0 512 pp512 931.91 931.91 1.00
RX 9060 XT llama 8B IQ1_S - 1.5625 bpw 16 pp512 659.89 660.99 1.00
RX 9060 XT llama 8B IQ1_S - 1.5625 bpw 32 pp512 905.17 910.45 1.01
RX 9060 XT llama 8B IQ1_S - 1.5625 bpw 64 pp512 1428.91 1424.52 1.00
RX 9060 XT llama 8B IQ1_S - 1.5625 bpw 128 pp512 1932.97 2100.75 1.09
RX 9060 XT llama 8B IQ1_S - 1.5625 bpw 256 pp512 2149.10 2347.54 1.09
RX 9060 XT llama 8B IQ1_S - 1.5625 bpw 512 pp512 2195.83 2397.54 1.09
RX 9060 XT llama 8B IQ2_S - 2.5 bpw 16 pp512 430.47 428.13 0.99
RX 9060 XT llama 8B IQ2_S - 2.5 bpw 32 pp512 719.50 724.33 1.01
RX 9060 XT llama 8B IQ2_S - 2.5 bpw 64 pp512 1185.70 1186.23 1.00
RX 9060 XT llama 8B IQ2_S - 2.5 bpw 128 pp512 1759.29 1795.86 1.02
RX 9060 XT llama 8B IQ2_S - 2.5 bpw 256 pp512 1905.36 1933.93 1.01
RX 9060 XT llama 8B IQ2_S - 2.5 bpw 512 pp512 1923.13 1947.77 1.01
RX 9060 XT llama 8B IQ2_XS - 2.3125 bpw 16 pp512 417.30 414.13 0.99
RX 9060 XT llama 8B IQ2_XS - 2.3125 bpw 32 pp512 706.23 705.32 1.00
RX 9060 XT llama 8B IQ2_XS - 2.3125 bpw 64 pp512 1166.29 1160.91 1.00
RX 9060 XT llama 8B IQ2_XS - 2.3125 bpw 128 pp512 1681.34 1704.51 1.01
RX 9060 XT llama 8B IQ2_XS - 2.3125 bpw 256 pp512 1836.28 1854.80 1.01
RX 9060 XT llama 8B IQ2_XS - 2.3125 bpw 512 pp512 1877.40 1899.61 1.01
RX 9060 XT llama 8B IQ2_XXS - 2.0625 bpw 16 pp512 546.92 548.34 1.00
RX 9060 XT llama 8B IQ2_XXS - 2.0625 bpw 32 pp512 745.78 746.16 1.00
RX 9060 XT llama 8B IQ2_XXS - 2.0625 bpw 64 pp512 1372.93 1371.60 1.00
RX 9060 XT llama 8B IQ2_XXS - 2.0625 bpw 128 pp512 2065.81 2104.73 1.02
RX 9060 XT llama 8B IQ2_XXS - 2.0625 bpw 256 pp512 2271.26 2327.47 1.02
RX 9060 XT llama 8B IQ2_XXS - 2.0625 bpw 512 pp512 2328.54 2393.90 1.03
RX 9060 XT llama 8B IQ3_S - 3.4375 bpw 16 pp512 527.74 530.30 1.00
RX 9060 XT llama 8B IQ3_S - 3.4375 bpw 32 pp512 720.59 777.47 1.08
RX 9060 XT llama 8B IQ3_S - 3.4375 bpw 64 pp512 1344.11 1338.93 1.00
RX 9060 XT llama 8B IQ3_S - 3.4375 bpw 128 pp512 2081.21 2099.25 1.01
RX 9060 XT llama 8B IQ3_S - 3.4375 bpw 256 pp512 2269.83 2298.34 1.01
RX 9060 XT llama 8B IQ3_S - 3.4375 bpw 512 pp512 2305.49 2341.83 1.02
RX 9060 XT llama 8B IQ3_S mix - 3.66 bpw 16 pp512 531.13 531.95 1.00
RX 9060 XT llama 8B IQ3_S mix - 3.66 bpw 32 pp512 732.60 788.71 1.08
RX 9060 XT llama 8B IQ3_S mix - 3.66 bpw 64 pp512 1348.39 1360.42 1.01
RX 9060 XT llama 8B IQ3_S mix - 3.66 bpw 128 pp512 2068.09 2117.69 1.02
RX 9060 XT llama 8B IQ3_S mix - 3.66 bpw 256 pp512 2262.12 2307.63 1.02
RX 9060 XT llama 8B IQ3_S mix - 3.66 bpw 512 pp512 2294.13 2346.22 1.02
RX 9060 XT llama 8B IQ3_XS - 3.3 bpw 16 pp512 533.88 533.88 1.00
RX 9060 XT llama 8B IQ3_XS - 3.3 bpw 32 pp512 721.65 747.16 1.04
RX 9060 XT llama 8B IQ3_XS - 3.3 bpw 64 pp512 1346.18 1349.49 1.00
RX 9060 XT llama 8B IQ3_XS - 3.3 bpw 128 pp512 2072.51 2110.04 1.02
RX 9060 XT llama 8B IQ3_XS - 3.3 bpw 512 pp512 2314.56 2353.27 1.02
RX 9060 XT llama 8B IQ3_XXS - 3.0625 bpw 16 pp512 518.49 519.71 1.00
RX 9060 XT llama 8B IQ3_XXS - 3.0625 bpw 32 pp512 727.24 725.63 1.00
RX 9060 XT llama 8B IQ3_XXS - 3.0625 bpw 64 pp512 1330.10 1343.37 1.01
RX 9060 XT llama 8B IQ3_XXS - 3.0625 bpw 128 pp512 2031.64 2075.09 1.02
RX 9060 XT llama 8B IQ3_XXS - 3.0625 bpw 256 pp512 2233.41 2265.61 1.01
RX 9060 XT llama 8B IQ3_XXS - 3.0625 bpw 512 pp512 2269.45 2296.51 1.01
RX 9060 XT llama 8B IQ4_NL - 4.5 bpw 16 pp512 652.31 647.12 0.99
RX 9060 XT llama 8B IQ4_NL - 4.5 bpw 32 pp512 958.43 963.69 1.01
RX 9060 XT llama 8B IQ4_NL - 4.5 bpw 64 pp512 1600.81 1599.25 1.00
RX 9060 XT llama 8B IQ4_NL - 4.5 bpw 128 pp512 2411.46 2466.90 1.02
RX 9060 XT llama 8B IQ4_NL - 4.5 bpw 256 pp512 2680.88 2718.89 1.01
RX 9060 XT llama 8B IQ4_NL - 4.5 bpw 512 pp512 2775.91 2787.84 1.00
RX 9060 XT llama 8B IQ4_XS - 4.25 bpw 16 pp512 679.52 677.92 1.00
RX 9060 XT llama 8B IQ4_XS - 4.25 bpw 32 pp512 975.74 975.18 1.00
RX 9060 XT llama 8B IQ4_XS - 4.25 bpw 64 pp512 1640.01 1643.70 1.00
RX 9060 XT llama 8B IQ4_XS - 4.25 bpw 128 pp512 2459.76 2493.38 1.01
RX 9060 XT llama 8B IQ4_XS - 4.25 bpw 256 pp512 2743.60 2758.06 1.01
RX 9060 XT llama 8B IQ4_XS - 4.25 bpw 512 pp512 2855.57 2849.36 1.00
RX 9060 XT llama 8B Q2_K_M 16 pp512 431.41 421.73 0.98
RX 9060 XT llama 8B Q2_K_M 32 pp512 585.97 584.73 1.00
RX 9060 XT llama 8B Q2_K_M 64 pp512 904.64 897.74 0.99
RX 9060 XT llama 8B Q2_K_M 128 pp512 1164.03 1156.76 0.99
RX 9060 XT llama 8B Q2_K_M 256 pp512 1267.53 1260.11 0.99
RX 9060 XT llama 8B Q2_K_M 512 pp512 1258.00 1267.00 1.01
RX 9060 XT llama 8B Q3_K_S 16 pp512 516.26 443.90 0.86
RX 9060 XT llama 8B Q3_K_S 32 pp512 790.32 791.46 1.00
RX 9060 XT llama 8B Q3_K_S 64 pp512 1284.07 1283.05 1.00
RX 9060 XT llama 8B Q3_K_S 128 pp512 1871.61 1896.22 1.01
RX 9060 XT llama 8B Q3_K_S 256 pp512 2041.17 2061.35 1.01
RX 9060 XT llama 8B Q3_K_S 512 pp512 2111.65 2131.52 1.01
RX 9060 XT llama 8B Q4_0 16 pp512 631.41 628.67 1.00
RX 9060 XT llama 8B Q4_0 32 pp512 925.50 921.58 1.00
RX 9060 XT llama 8B Q4_0 64 pp512 1548.62 1540.35 0.99
RX 9060 XT llama 8B Q4_0 128 pp512 2369.19 2384.34 1.01
RX 9060 XT llama 8B Q4_0 256 pp512 2636.61 2654.60 1.01
RX 9060 XT llama 8B Q4_0 512 pp512 2742.78 2760.74 1.01
RX 9060 XT llama 8B Q4_1 16 pp512 610.28 610.20 1.00
RX 9060 XT llama 8B Q4_1 32 pp512 905.37 908.38 1.00
RX 9060 XT llama 8B Q4_1 64 pp512 1489.81 1479.42 0.99
RX 9060 XT llama 8B Q4_1 128 pp512 2012.36 2234.12 1.11
RX 9060 XT llama 8B Q4_1 256 pp512 2209.06 2466.31 1.12
RX 9060 XT llama 8B Q4_1 512 pp512 2296.51 2555.77 1.11
RX 9060 XT llama 8B Q4_K_S 16 pp512 573.68 571.62 1.00
RX 9060 XT llama 8B Q4_K_S 32 pp512 862.93 861.61 1.00
RX 9060 XT llama 8B Q4_K_S 64 pp512 1435.03 1434.54 1.00
RX 9060 XT llama 8B Q4_K_S 128 pp512 2036.57 2186.89 1.07
RX 9060 XT llama 8B Q4_K_S 256 pp512 2237.76 2419.16 1.08
RX 9060 XT llama 8B Q4_K_S 512 pp512 2300.93 2483.36 1.08
RX 9060 XT llama 8B Q5_0 16 pp512 552.90 548.76 0.99
RX 9060 XT llama 8B Q5_0 32 pp512 827.20 845.08 1.02
RX 9060 XT llama 8B Q5_0 64 pp512 1405.35 1422.70 1.01
RX 9060 XT llama 8B Q5_0 128 pp512 2232.28 2223.87 1.00
RX 9060 XT llama 8B Q5_0 256 pp512 2465.04 2452.00 0.99
RX 9060 XT llama 8B Q5_0 512 pp512 2556.69 2534.07 0.99
RX 9060 XT llama 8B Q5_1 16 pp512 446.84 440.49 0.99
RX 9060 XT llama 8B Q5_1 32 pp512 690.70 693.79 1.00
RX 9060 XT llama 8B Q5_1 64 pp512 1283.22 1289.65 1.01
RX 9060 XT llama 8B Q5_1 128 pp512 1842.13 2039.10 1.11
RX 9060 XT llama 8B Q5_1 256 pp512 2037.35 2289.46 1.12
RX 9060 XT llama 8B Q5_1 512 pp512 2115.79 2394.65 1.13
RX 9060 XT llama 8B Q5_K_S 16 pp512 591.76 588.73 0.99
RX 9060 XT llama 8B Q5_K_S 32 pp512 890.93 890.42 1.00
RX 9060 XT llama 8B Q5_K_S 64 pp512 1451.85 1457.23 1.00
RX 9060 XT llama 8B Q5_K_S 128 pp512 1955.74 2201.97 1.13
RX 9060 XT llama 8B Q5_K_S 256 pp512 2149.69 2427.90 1.13
RX 9060 XT llama 8B Q5_K_S 512 pp512 2224.17 2511.84 1.13
RX 9060 XT llama 8B Q6_K 16 pp512 467.79 464.99 0.99
RX 9060 XT llama 8B Q6_K 32 pp512 663.46 663.80 1.00
RX 9060 XT llama 8B Q6_K 64 pp512 973.54 978.99 1.01
RX 9060 XT llama 8B Q6_K 128 pp512 1236.01 1288.82 1.04
RX 9060 XT llama 8B Q6_K 256 pp512 1347.84 1406.46 1.04
RX 9060 XT llama 8B Q6_K 512 pp512 1374.01 1430.88 1.04
RX 9060 XT llama 8B Q8_0 16 pp512 474.30 473.32 1.00
RX 9060 XT llama 8B Q8_0 32 pp512 775.68 774.15 1.00
RX 9060 XT llama 8B Q8_0 64 pp512 1359.89 1378.58 1.01
RX 9060 XT llama 8B Q8_0 128 pp512 2184.25 2225.27 1.02
RX 9060 XT llama 8B Q8_0 256 pp512 2460.16 2517.06 1.02
RX 9060 XT llama 8B Q8_0 512 pp512 2574.13 2637.62 1.02

On my NVIDIA hardware I am seeing no changes to performance beyond statistical fluctuations. On my AMD hardware, however, the performance is changing and quite frankly I don't understand why: the only thing that should really have changed is that there are now some if constexpr branches rather than macros. On average the impact is at least slightly positive. Truth be told, these unforeseen and hard-to-explain changes in AMD performance from seemingly innocuous code changes are a major reason why I want a more granular way to configure the kernel in the first place.
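For readers unfamiliar with the pattern, here is a minimal C++17 sketch of what "a table of parameters plus if constexpr instead of macros" can look like. It is not the code from this PR: `quant_type`, `mmq_config`, `get_config`, `clamp_row` and all numeric values are hypothetical and chosen only for illustration.

```cpp
// Hypothetical stand-in for the quantized data types; not the real ggml_type enum.
enum class quant_type { Q4_0, Q8_0 };

// Hypothetical kernel configuration; field names and values are illustrative only.
struct mmq_config {
    int I;         // tile size along the src0 row direction
    int J;         // tile size along the batch (token) direction
    int nwarps;    // warps per CUDA block
    int occupancy; // targeted blocks per SM, e.g. as input to __launch_bounds__
};

// Table-style lookup: the configuration is a function of data type and batch size
// rather than a set of per-architecture macros. The numbers are made up.
constexpr mmq_config get_config(quant_type type, int batch_size) {
    if (type == quant_type::Q4_0) {
        return batch_size <= 32 ? mmq_config{64, 32, 4, 2} : mmq_config{128, 128, 8, 1};
    }
    return batch_size <= 32 ? mmq_config{64, 32, 4, 2} : mmq_config{128, 64, 8, 1};
}

// need_check == true is the fallback path with out-of-bounds checks,
// need_check == false is the fast path that assumes whole tiles.
template <bool need_check>
constexpr int clamp_row(int i, int i_max) {
    if constexpr (need_check) {
        return i < i_max ? i : i_max; // clamp to the last valid row
    } else {
        return i;                     // no check compiled into this specialization
    }
}

// Everything is resolved at compile time, so it can be verified there too.
static_assert(get_config(quant_type::Q4_0, 16).J == 32);
static_assert(clamp_row<true>(70, 63) == 63);
static_assert(clamp_row<false>(70, 63) == 70);
```

Because the discarded `if constexpr` branch is not instantiated, each template specialization should contain only the code for its own path, which is why such a change would be expected to be performance-neutral.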

@ggerganov
Member

@JohannesGaessler There are some failing tests on DGX Spark with MXFP4 and NVFP:

https://github.com/ggml-org/llama.cpp/actions/runs/26955882676/job/79532858384?pr=24127#step:3:38920

@JohannesGaessler
Contributor Author

Should be fixed now; the Blackwell config was wrong, but in a way that did not consistently result in incorrect outputs.
