Running llama.cpp on AMD integrated graphics is pain. Here's everything I learned.
What works: llama.cpp with the Vulkan backend, Qwen 3B Q4_K_M at 18-22 tok/s, LoRA hot-loading, stable at 4096 context. What doesn't: YOLO/torchvision, ROCm, running multiple models at once, anything needing >4 GB of contiguous VRAM.
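The working setup above roughly corresponds to a launch line like this. A sketch, not a definitive config: the model path, adapter path, and port are placeholders, and `-ngl 99` assumes the Vulkan build picks up the iGPU.

```shell
# Sketch of a llama-server launch matching the setup above.
# Model path, adapter path, and port are placeholders.
llama-server \
  -m ./qwen-3b-q4_k_m.gguf \
  -ngl 99 \
  -c 4096 \
  --parallel 1 \
  --threads 12 --threads-batch 14 \
  --lora ./adapter.gguf \
  --port 8080
# -ngl 99: offload all layers to the GPU (Vulkan backend)
# -c 4096: the largest context that stayed stable here
# --parallel 1: see the throughput note below
```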
Tips: always Q4_K_M (Q5 is ~30% slower for marginally better quality). --parallel 1 (--parallel 2 halves throughput). Monitor RSS with a watchdog — llama-server leaks ~50 MB/hour; restart it at 3.5 GB. --threads 12 --threads-batch 14 on an 8-core Zen 4.
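The RSS watchdog can be sketched in a few lines of shell, reading `VmRSS` from `/proc` (Linux-only; the restart command and 60 s interval are assumptions, adapt to your service manager):

```shell
#!/bin/sh
# Hypothetical watchdog: restart llama-server when RSS crosses ~3.5 GB.
LIMIT_KB=$((3500 * 1024))   # 3.5 GB expressed in kB, matching /proc units

rss_kb() {
  # VmRSS is reported in kB in /proc/<pid>/status
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

over_limit() {
  # true (exit 0) when the process's RSS exceeds the limit
  [ "$(rss_kb "$1")" -gt "$LIMIT_KB" ]
}

# Main loop (sketch): poll every minute, restart via the init system.
# while true; do
#   PID=$(pgrep -f llama-server) && over_limit "$PID" && systemctl restart llama-server
#   sleep 60
# done
```

At ~50 MB/hour of leak this trips roughly once every couple of days with a 3B model resident, so the restart hit is rare.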
What GPU/CPU combos are you running?