Tag: AI

  • DeepSeek-V3 (MoE)

    DeepSeek-V3 is an open-source large language model with a 671-billion-parameter Mixture-of-Experts (MoE) architecture in which only 37 billion parameters are activated per token. The model uses Multi-Head Latent Attention (MLA) for efficient inference, compressing the attention keys and values into a low-dimensional latent representation. It also adopts an auxiliary-loss-free load-balancing strategy…
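
    Below is a minimal sketch, in plain PyTorch, of the top-k expert routing that gives an MoE layer this sparse activation pattern, where only a few experts run per token while the rest of the parameters stay idle. The names (SimpleMoE, n_experts, top_k) and sizes are illustrative assumptions for the example, not DeepSeek-V3's actual implementation.

      # Illustrative top-k MoE routing sketch; not DeepSeek-V3's real code.
      import torch
      import torch.nn as nn
      import torch.nn.functional as F

      class SimpleMoE(nn.Module):
          def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
              super().__init__()
              self.top_k = top_k
              # Router scores each token against every expert.
              self.router = nn.Linear(d_model, n_experts, bias=False)
              # Each expert is a small feed-forward network.
              self.experts = nn.ModuleList(
                  nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
                  for _ in range(n_experts)
              )

          def forward(self, x):                      # x: (tokens, d_model)
              scores = self.router(x)                # (tokens, n_experts)
              weights, idx = scores.topk(self.top_k, dim=-1)
              weights = F.softmax(weights, dim=-1)   # normalize over the selected experts only
              out = torch.zeros_like(x)
              # Only the top-k experts run for each token, so most parameters are untouched.
              for slot in range(self.top_k):
                  for e, expert in enumerate(self.experts):
                      mask = idx[:, slot] == e
                      if mask.any():
                          out[mask] += weights[mask, slot, None] * expert(x[mask])
              return out

      tokens = torch.randn(5, 64)
      print(SimpleMoE()(tokens).shape)               # torch.Size([5, 64])

    In a full MoE transformer the same idea is applied inside every MoE block, which is how total parameter count (all experts) can be far larger than the parameters actually used for any one token.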