DeepSeek-V3 will make open-source MoE models way more common.
IDK why no one is talking about this, but I just finished reading DeepSeek-V3's technical report, and they've found a genuinely novel solution to one of the biggest challenges with training MoE architectures: irregular loss spikes during training.
This instability was probably the major reason we haven't seen widespread adoption of MoE models before. But now, with their solutions laid out in an open report, it's likely that other companies will start implementing similar approaches.
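For intuition, the report describes (among other things) an auxiliary-loss-free load-balancing strategy: each expert gets a bias that is added to the routing scores only when picking the top-k experts, and that bias is nudged up or down depending on whether the expert is under- or overloaded. The sketch below is my own rough PyTorch illustration of that idea, not DeepSeek's code; the class name, the sigmoid scoring, and the `bias_update_speed` value are my assumptions.

```python
import torch

class BiasBalancedTopKRouter(torch.nn.Module):
    """Rough sketch of auxiliary-loss-free load balancing:
    a per-expert bias is added to routing scores for expert *selection*
    only, and nudged after each step so overloaded experts become less
    likely to be picked. Names and hyperparameters here are illustrative."""

    def __init__(self, hidden_dim: int, num_experts: int, top_k: int = 2,
                 bias_update_speed: float = 1e-3):
        super().__init__()
        self.gate = torch.nn.Linear(hidden_dim, num_experts, bias=False)
        self.top_k = top_k
        self.bias_update_speed = bias_update_speed
        # Not a learned parameter: updated by a simple rule, not by gradients.
        self.register_buffer("expert_bias", torch.zeros(num_experts))

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, hidden_dim)
        scores = torch.sigmoid(self.gate(x))       # per-expert affinity
        biased = scores + self.expert_bias         # bias used only for selection
        topk_idx = biased.topk(self.top_k, dim=-1).indices
        # Gating weights come from the *unbiased* scores of the chosen experts.
        topk_weights = torch.gather(scores, -1, topk_idx)
        topk_weights = topk_weights / topk_weights.sum(-1, keepdim=True)

        if self.training:
            # Nudge biases: overloaded experts go down, underloaded go up,
            # so load evens out without an auxiliary balancing loss.
            load = torch.zeros_like(self.expert_bias)
            load.scatter_add_(0, topk_idx.flatten(),
                              torch.ones(topk_idx.numel(), dtype=load.dtype))
            self.expert_bias += self.bias_update_speed * torch.sign(load.mean() - load)

        return topk_idx, topk_weights
```

The point is that load balancing gets handled by this cheap bias update instead of an extra loss term fighting the language-modeling objective, which is part of why the training recipe can stay stable.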
I can already imagine an MoE-powered Qwen or Llama becoming a flagship model in the future, just like DeepSeek.