[AINews] Moonshot Kimi K2.6: the world's leading Open Model refreshes to catch up to Opus 4.6 (ahead of DeepSeek v4?)
Yay Kimi!!!
Bite-sized AI for curious minds...
Open-source reasoning powerhouse
DeepSeek is a free AI chatbot from China that surprised the industry by matching GPT-4-level performance. Its DeepSeek-R1 model excels at math, coding, and chain-of-thought reasoning. The chatbot is free to use with no subscription required, and the model is also available as open source.
Chinese AI lab DeepSeek released two preview models of its V4 series, DeepSeek-V4-Pro and DeepSeek-V4-Flash, both available on Hugging Face. Both are Mixture of Experts models with a 1 million token context window, and the release follows V3.2 from last December. The models achieve near-frontier performance at a fraction of the price of comparable models from OpenAI or Anthropic, making high-performance AI more accessible and affordable.
I'm a software engineer who works with LLMs professionally (Forward Deployed Engineer at TrueFoundry). Over the past year I built up implementations of five LLM architectures from scratch and wrote a book around them.
The progression:
- Ch1: Vanilla encoder-decoder transformer (English to Hindi translation)
- Ch2: GPT-2 124M from scratch, loads real OpenAI pretrained weights
- Ch3: Llama 3.2-3B by swapping 4 components of GPT-2 (LayerNorm to RMSNorm, learned PE to RoPE, GELU to SwiGLU,
Mihir Prabhudesai, Aryan Satpathy, Yangmin Li et al. - We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in...