What is the largest home GPU cluster running LLMs?
Hi,
I'm interested in running very large models with multiple GPUs connected to a single computer. I've seen someone run 10 7900 XTXs on one consumer-level motherboard with risers. So far I've only tried 3 myself, for 72GB of VRAM. Inference speed for Llama 3.3 70B was quite good, so I'm wondering: are there ~300GB models that could be run with 13 GPUs? By my count I could attach 13 7900 XTXs to my consumer AM5 board with risers. What size GPU clusters have people here built with risers?
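My napkin math for the fit question, in case I'm off somewhere (assuming a Q4_K_M-style quant at roughly 4.5 bits/weight and leaving ~10% of each card for KV cache and buffers, both of which are just my guesses):

```python
# Napkin math: does a quantized model fit in 13 x 24GB cards?
GPU_VRAM_GB = 24      # 7900 XTX
NUM_GPUS = 13
HEADROOM = 0.9        # leave ~10% per card for KV cache/buffers (a guess)

def model_size_gb(params_b: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of a Q4_K_M-style quant (~4.5 bits/weight)."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

budget = NUM_GPUS * GPU_VRAM_GB * HEADROOM
for params in (70, 300, 405):
    size = model_size_gb(params)
    print(f"{params}B: ~{size:.0f} GB quantized, "
          f"fits in {budget:.0f} GB budget: {size <= budget}")
```

By that estimate a ~300B model quantized to 4-5 bits is around 170GB, which would fit with room to spare.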
I'm also interested in how much inference speed slows down as the model size grows, e.g. 70B -> 300B, assuming the model stays entirely in VRAM. I'm not planning to run anything on the CPU or in normal RAM.
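My rough mental model of the slowdown, assuming decode is memory-bandwidth bound and the layers are split across GPUs pipeline-style so each token streams the whole model from VRAM once (real numbers will come in lower because of riser/PCIe and software overhead):

```python
# If decode is memory-bandwidth bound and layers are split across GPUs
# (pipeline style), the GPUs work one after another per token, so:
#   time per token ~= total model bytes / single-GPU memory bandwidth
GPU_BW_GBPS = 960  # 7900 XTX theoretical memory bandwidth

def ideal_tok_per_sec(model_size_gb: float) -> float:
    return GPU_BW_GBPS / model_size_gb

# sizes assume a ~4.5 bits/weight quant: 70B ~= 39 GB, 300B ~= 169 GB
for name, size_gb in (("70B", 39), ("300B", 169)):
    print(f"{name}: theoretical ceiling ~{ideal_tok_per_sec(size_gb):.1f} tok/s")
```

If that model holds, speed drops roughly linearly with model size, so 70B -> 300B would be about a 4x slowdown. Does that match what people actually see?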