Git commit
sd-master-06accf2-bin-win-cuda12-x64
Operating System & Version
Windows 11
GGML backends
CUDA
Command-line arguments used
.\sd-cli.exe -m "I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors" -p "a lovely cat holding a sign say 'hidream o1 cpp'" --cfg-scale 1.0 -v -H 1024 -W 1024
Steps to reproduce
Start the CLI with the same parameters as the SD server
Generate an image with both
Compare the generation speed
What you expected to happen
The same generation speed with the CLI and the server
What actually happened
The server is roughly 6x slower: 11.49s/it (232.61s sampling) versus 1.74s/it (38.29s sampling) with sd-cli
Logs / error messages / stack trace
sd-cli.exe
PS C:\Users\copyhere2\Downloads\sd-master-06accf2-bin-win-cuda12-x64> .\sd-cli.exe -m "I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors" -p "a lovely cat holding a sign say 'hidream o1 cpp'" --cfg-scale 1.0 -v -H 1024 -W 1024
[DEBUG] main.cpp:597 - version: stable-diffusion.cpp version unknown, commit 06accf2
[DEBUG] main.cpp:598 - System Info:
SSE3 = 1 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | VSX = 0 |
[DEBUG] main.cpp:599 - SDCliParams {
mode: img_gen,
output_path: "output.png",
image_path: "",
metadata_format: "text",
verbose: true,
color: false,
canny_preprocess: false,
convert_name: false,
preview_method: none,
preview_interval: 1,
preview_path: "preview.png",
preview_fps: 16,
taesd_preview: false,
preview_noisy: false,
metadata_raw: false,
metadata_brief: false,
metadata_all: false
}
[DEBUG] main.cpp:600 - SDContextParams {
n_threads: 10,
model_path: "I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors",
clip_l_path: "",
clip_g_path: "",
clip_vision_path: "",
t5xxl_path: "",
llm_path: "",
llm_vision_path: "",
diffusion_model_path: "",
high_noise_diffusion_model_path: "",
embeddings_connectors_path: "",
vae_path: "",
audio_vae_path: "",
taesd_path: "",
esrgan_path: "",
control_net_path: "",
embedding_dir: "",
embeddings: {
}
wtype: NONE,
tensor_type_rules: "",
lora_model_dir: ".",
hires_upscalers_dir: "",
photo_maker_path: "",
rng_type: cuda,
sampler_rng_type: NONE,
offload_params_to_cpu: false,
max_vram: 0,
backend: "",
params_backend: "",
enable_mmap: false,
control_net_cpu: false,
clip_on_cpu: false,
vae_on_cpu: false,
flash_attn: false,
diffusion_flash_attn: false,
diffusion_conv_direct: false,
vae_conv_direct: false,
circular: false,
circular_x: false,
circular_y: false,
chroma_use_dit_mask: true,
qwen_image_zero_cond_t: false,
chroma_use_t5_mask: false,
chroma_t5_mask_pad: 1,
prediction: NONE,
lora_apply_mode: auto,
force_sdxl_vae_conv_scale: false
}
[DEBUG] main.cpp:601 - SDGenerationParams {
loras: "{
}",
high_noise_loras: "{
}",
prompt: "a lovely cat holding a sign say 'hidream o1 cpp'",
negative_prompt: "",
clip_skip: -1,
width: 1024,
height: 1024,
batch_count: 1,
init_image_path: "",
end_image_path: "",
mask_image_path: "",
control_image_path: "",
ref_image_paths: [],
control_video_path: "",
auto_resize_ref_image: true,
increase_ref_index: false,
pm_id_images_dir: "",
pm_id_embed_path: "",
pm_style_strength: 20,
skip_layers: [7, 8, 9],
sample_params: (txt_cfg: 1.00, img_cfg: 1.00, distilled_guidance: 3.50, slg.layer_count: 0, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: inf, shifted_timestep: 0, flow_shift: inf, extra_sample_args: ),
high_noise_skip_layers: [7, 8, 9],
high_noise_sample_params: (txt_cfg: 7.00, img_cfg: 7.00, distilled_guidance: 3.50, slg.layer_count: 0, slg.layer_start: 0.01, slg.layer_end: 0.20, slg.scale: 0.00, scheduler: NONE, sample_method: NONE, sample_steps: 20, eta: inf, shifted_timestep: 0, flow_shift: inf, extra_sample_args: ),
custom_sigmas: [],
cache_mode: "",
cache_option: "",
cache: disabled (threshold=inf, start=0.15, end=0.95),
moe_boundary: 0.875,
video_frames: 1,
fps: 16,
vace_strength: 1,
strength: 0.75,
control_strength: 0.9,
seed: 42,
upscale_repeats: 1,
upscale_tile_size: 128,
hires: { enabled: false, upscaler: "Latent", model_path: "", scale: 2, target_width: 0, target_height: 0, steps: 0, denoising_strength: 0.7, upscale_tile_size: 128 },
vae_tiling_params: { 0, 0, 0, 0, 0.5, 0, 0 },
}
[INFO ] ggml_extend.hpp:63 - ggml_cuda_init: found 1 CUDA devices (Total VRAM: 16375 MiB):
[INFO ] ggml_extend.hpp:63 - Device 0: NVIDIA GeForce RTX 4070 Ti SUPER, compute capability 8.9, VMM: yes, VRAM: 16375 MiB
[DEBUG] ggml_extend_backend.cpp:311 - Found 2 backend devices:
[DEBUG] ggml_extend_backend.cpp:314 - #0: CUDA0
[DEBUG] ggml_extend_backend.cpp:314 - #1: CPU
[DEBUG] ggml_extend_backend.cpp:291 - Initializing backend: CUDA0
[INFO ] stable-diffusion.cpp:249 - loading model from 'I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors'
[INFO ] model.cpp:219 - load I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors using safetensors format
[DEBUG] model.cpp:294 - init from 'I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors', prefix = ''
[INFO ] stable-diffusion.cpp:358 - Version: HiDream O1
[INFO ] stable-diffusion.cpp:386 - Weight type stat: bf16: 758
[INFO ] stable-diffusion.cpp:387 - Conditioner weight type stat:
[INFO ] stable-diffusion.cpp:388 - Diffusion model weight type stat:
[INFO ] stable-diffusion.cpp:389 - VAE weight type stat:
[DEBUG] stable-diffusion.cpp:391 - ggml tensor size = 400 bytes
[DEBUG] qwen2_tokenizer.cpp:14 - merges size 151387
[DEBUG] qwen2_tokenizer.cpp:39 - vocab size: 151674
[INFO ] stable-diffusion.cpp:739 - using FakeVAE
[DEBUG] stable-diffusion.cpp:880 - loading weights
[DEBUG] ggml_extend.hpp:2688 - hidream_o1_vision params backend buffer size = 875.61 MB(VRAM) (333 tensors)
[DEBUG] ggml_extend.hpp:2688 - hidream_o1 params backend buffer size = 15695.23 MB(VRAM) (407 tensors)
[DEBUG] ggml_extend.hpp:2671 - fake_vae skipping params allocation (no tensors)
[INFO ] model.cpp:799 - NOT using mmap for 'I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors' (mmap disabled by caller)
[INFO ] model.cpp:810 - model files processing completed in 0.00s
[DEBUG] model.cpp:909 - using 10 threads for model loading
[DEBUG] model.cpp:925 - loading tensors from I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors
|==================================================| 758/758 - 1.95GB/s
[INFO ] model.cpp:1143 - loading tensors completed, taking 7.69s (read: 4.42s, memcpy: 0.00s, convert: 0.05s, copy_to_backend: 1.16s)
[DEBUG] stable-diffusion.cpp:971 - finished loaded file
[INFO ] stable-diffusion.cpp:1053 - total params memory size = 16570.85MB (VRAM 16570.85MB, RAM 0.00MB): text_encoders 875.61MB(VRAM), diffusion_model 15695.23MB(VRAM), vae 0.00MB(N/A), controlnet 0.00MB(N/A), pmid 0.00MB(N/A)
[INFO ] stable-diffusion.cpp:1130 - running in FLOW mode
[INFO ] stable-diffusion.cpp:3894 - generate_image 1024x1024
[INFO ] denoiser.hpp:637 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3214 - sampling using Euler method
[DEBUG] bpe_tokenizer.cpp:207 - split prompt "<|im_start|>user
a lovely cat holding a sign say 'hidream o1 cpp'<|im_end|>
<|im_start|>assistant
<|boi_token|><|tms_token|>" to tokens ["<|im_start|>", "user", "Ċ", "a", "Ġlovely", "Ġcat", "Ġholding", "Ġa", "Ġsign", "Ġsay", "Ġ'", "hid", "ream", "Ġo", "1", "Ġcpp", "'", "<|im_end|>", "Ċ", "<|im_start|>", "assistant", "Ċ", "<|boi_token|>", "<|tms_token|>", ]
[INFO ] stable-diffusion.cpp:3695 - get_learned_condition completed, taking 0.00s
[INFO ] stable-diffusion.cpp:3928 - generating image: 1/1 - seed 42
[DEBUG] ggml_extend.hpp:1907 - hidream_o1 compute buffer size: 208.23 MB(VRAM)
|==================================================| 20/20 - 1.74s/it
[INFO ] stable-diffusion.cpp:3962 - sampling completed, taking 38.29s
[INFO ] stable-diffusion.cpp:3982 - generating 1 latent images completed, taking 38.47s
[INFO ] stable-diffusion.cpp:3719 - decoding 1 latents
[DEBUG] vae.hpp:209 - computing vae decode graph completed, taking 0.01s
[INFO ] stable-diffusion.cpp:3735 - latent 1 decoded, taking 0.01s
[INFO ] stable-diffusion.cpp:3739 - decode_first_stage completed, taking 0.01s
[INFO ] stable-diffusion.cpp:4125 - generate_image completed in 38.60s
[INFO ] main.cpp:462 - save result image 0 to 'output.png' (success)
[INFO ] main.cpp:534 - 1/1 images saved
SD server
PS C:\Users\copyhere2\Downloads\sd-master-06accf2-bin-win-cuda12-x64> .\sd-server.exe -m "I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors"
[INFO ] ggml_extend.hpp:63 - ggml_cuda_init: found 1 CUDA devices (Total VRAM: 16375 MiB):
[INFO ] ggml_extend.hpp:63 - Device 0: NVIDIA GeForce RTX 4070 Ti SUPER, compute capability 8.9, VMM: yes, VRAM: 16375 MiB
[INFO ] stable-diffusion.cpp:249 - loading model from 'I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors'
[INFO ] model.cpp:219 - load I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors using safetensors format
[INFO ] stable-diffusion.cpp:358 - Version: HiDream O1
[INFO ] stable-diffusion.cpp:386 - Weight type stat: bf16: 758
[INFO ] stable-diffusion.cpp:387 - Conditioner weight type stat:
[INFO ] stable-diffusion.cpp:388 - Diffusion model weight type stat:
[INFO ] stable-diffusion.cpp:389 - VAE weight type stat:
[INFO ] stable-diffusion.cpp:739 - using FakeVAE
[INFO ] model.cpp:799 - NOT using mmap for 'I:\LvsSdModels\hiDream\hidream_o1_image_bf16.safetensors' (mmap disabled by caller)
[INFO ] model.cpp:810 - model files processing completed in 0.00s
|==================================================| 758/758 - 2.82GB/s
[INFO ] model.cpp:1143 - loading tensors completed, taking 5.33s (read: 3.35s, memcpy: 0.00s, convert: 0.11s, copy_to_backend: 1.54s)
[INFO ] stable-diffusion.cpp:1053 - total params memory size = 16570.85MB (VRAM 16570.85MB, RAM 0.00MB): text_encoders 875.61MB(VRAM), diffusion_model 15695.23MB(VRAM), vae 0.00MB(N/A), controlnet 0.00MB(N/A), pmid 0.00MB(N/A)
[INFO ] stable-diffusion.cpp:1130 - running in FLOW mode
[INFO ] main.cpp:148 - listening on: http://127.0.0.1:1234
[INFO ] stable-diffusion.cpp:3894 - generate_image 1024x1024
[INFO ] denoiser.hpp:637 - get_sigmas with discrete scheduler
[INFO ] stable-diffusion.cpp:3214 - sampling using Euler method
[INFO ] stable-diffusion.cpp:3695 - get_learned_condition completed, taking 0.00s
[INFO ] stable-diffusion.cpp:3928 - generating image: 1/1 - seed 42
|==================================================| 20/20 - 11.49s/it
[INFO ] stable-diffusion.cpp:3962 - sampling completed, taking 232.61s
[INFO ] stable-diffusion.cpp:3982 - generating 1 latent images completed, taking 232.61s
[INFO ] stable-diffusion.cpp:3719 - decoding 1 latents
[INFO ] stable-diffusion.cpp:3735 - latent 1 decoded, taking 0.01s
[INFO ] stable-diffusion.cpp:3739 - decode_first_stage completed, taking 0.01s
[INFO ] stable-diffusion.cpp:4125 - generate_image completed in 232.63s
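For reference, the slowdown can be computed directly from the two "sampling completed" lines in the logs above. This is only a small sketch: the helper name `sampling_seconds` and the regular expression are illustrative assumptions, not part of stable-diffusion.cpp.

```python
import re

# Log lines copied verbatim from the sd-cli and sd-server runs above.
CLI_LINE = "[INFO ] stable-diffusion.cpp:3962 - sampling completed, taking 38.29s"
SERVER_LINE = "[INFO ] stable-diffusion.cpp:3962 - sampling completed, taking 232.61s"

# Hypothetical pattern for this specific log message.
SAMPLING_RE = re.compile(r"sampling completed, taking ([0-9.]+)s")


def sampling_seconds(line: str) -> float:
    """Extract the sampling time in seconds from one log line."""
    match = SAMPLING_RE.search(line)
    if match is None:
        raise ValueError(f"no sampling time found in: {line!r}")
    return float(match.group(1))


cli_s = sampling_seconds(CLI_LINE)
server_s = sampling_seconds(SERVER_LINE)
slowdown = server_s / cli_s  # how many times slower the server run was

print(f"sd-cli: {cli_s:.2f}s, sd-server: {server_s:.2f}s, slowdown: {slowdown:.2f}x")
```

The same ratio follows from the per-iteration figures on the progress bars (11.49s/it versus 1.74s/it), give or take rounding.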
Additional context / environment details
My guess is that the API call overrides some default setting that the CLI uses when it is run with the minimal command line.