Would you rather have a 70B model @ 300 tokens per second or a 500B+ model @ 15 tokens per second?

I've been using a couple of DPU/TPU/LPU-type cloud platforms, and 70B models are surprisingly good, especially the distilled R1. Still, which one would you guys choose?