Together AI dropped eight papers at ICML 2026, but the one that should make you sit up is TTT-Discover: an open 120B model that beats the best human experts across mathematics, GPU kernels, competitive algorithms, and biology, for roughly $500 per problem. Every prior result at this level leaned on closed frontier models you can't inspect or reproduce.
TTT-Discover doesn't just sample a frozen model a thousand times and keep the best output. It runs reinforcement learning at test time on the single problem in front of it. Each attempt becomes training data for the next, so the model improves as it works. With the same sampling budget, plain best-of-N never catches up. The method set a tighter bound on a 60-year-old Erdős problem in mathematics, discovered a GPU kernel faster than the best prior submission, and did it all with one unchanged recipe and an open model.
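To make the gap between a frozen sampler and one that learns from its own attempts concrete, here is a minimal toy sketch. It is not TTT-Discover's algorithm: it swaps in a simple cross-entropy-style update on a one-dimensional problem as a stand-in for reinforcement learning, and the names reward, best_of_n, and adapt_at_test_time are hypothetical. Both strategies spend the same 200 samples.

```python
import random

def reward(x, target=10.0):
    # Toy objective (an assumption): higher is better, maximum 0.0 at x == target.
    return -(x - target) ** 2

def best_of_n(n, rng, mean=0.0, std=1.0):
    # Frozen sampler: every attempt is drawn from the same distribution.
    return max(reward(rng.gauss(mean, std)) for _ in range(n))

def adapt_at_test_time(rounds, batch, elite, rng, mean=0.0, std=1.0):
    # Same total budget (rounds * batch), but the best attempts of each
    # round shift the sampler before the next round begins.
    best = float("-inf")
    for _ in range(rounds):
        attempts = sorted(
            (rng.gauss(mean, std) for _ in range(batch)),
            key=reward,
            reverse=True,
        )
        best = max(best, reward(attempts[0]))
        # Cross-entropy-style update: move the mean to the elite attempts.
        mean = sum(attempts[:elite]) / elite
    return best

frozen = best_of_n(200, random.Random(0))
adaptive = adapt_at_test_time(rounds=10, batch=20, elite=5, rng=random.Random(0))
print(f"best-of-N: {frozen:.2f}  adaptive: {adaptive:.2f}")
```

In this toy the frozen sampler starts far from the target and never moves, so extra samples barely help, while the adaptive loop walks its distribution toward the target round by round.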
Frontier Agents That Actually Ship
Alongside TTT-Discover, Together introduced DSGym and ThunderAgent. DSGym standardizes data-science agent evaluation across 1,000+ tasks in 10+ domains, unified under one API, and closes the loophole where agents solve tasks without ever touching the data. The same environment runs in reverse as a training engine: trajectory generation and synthetic query pipelines produce execution-verified data, which turned a 4B model into a state-of-the-art open-source data-science agent with zero human labeling.
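The idea of execution-verified data can be sketched in a few lines. This is an illustration under stated assumptions, not DSGym's pipeline or API: the trajectory format (a dict with "code" and "claimed_answer"), the convention that code stores its result in a variable named answer, and the function execution_verified are all hypothetical. A trajectory is kept only if its code actually runs against the data and reproduces the answer it claims.

```python
def execution_verified(trajectories, data):
    """Keep trajectories whose code runs and reproduces the claimed answer."""
    kept = []
    for traj in trajectories:
        # Fresh namespace per trajectory; the code sees the dataset as `data`.
        env = {"data": list(data)}
        try:
            exec(traj["code"], env)  # toy only: never exec untrusted code unsandboxed
        except Exception:
            continue  # crashed trajectories are discarded
        if env.get("answer") == traj["claimed_answer"]:
            kept.append(traj)
    return kept

data = [2, 4, 6, 8]
trajectories = [
    {"code": "answer = sum(data) / len(data)", "claimed_answer": 5.0},  # verified
    {"code": "answer = max(data)", "claimed_answer": 10},               # wrong claim
    {"code": "answer = data[99]", "claimed_answer": 0},                 # crashes
]
verified = execution_verified(trajectories, data)
print(len(verified))
```

Filtering on executed results rather than on the text of the answer is what removes the need for human labels in a setup like this.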
ThunderAgent tackles a different bottleneck. Parallel agent workloads collapse under load because the inference engine treats each step of a multi-turn workflow as an isolated request, inflating latency by up to 7.14x. ThunderAgent makes the workflow a first-class object the scheduler can reason about end to end, delivering 1.5 to 3.6x higher serving throughput and 1.8 to 3.9x faster RL rollouts, with three lines of code to adopt.
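A toy model shows why treating each step as an isolated request hurts multi-turn workflows. The sketch below is not ThunderAgent's scheduler. It assumes a single worker, one time unit per step, and two hypothetical policies: step_level_fifo, where a workflow's next step rejoins the back of the queue, and workflow_level, where the scheduler sees the whole workflow and runs it to completion.

```python
from collections import deque

def step_level_fifo(num_workflows, steps_per_workflow):
    # Each step is an isolated request; the follow-up step of a workflow
    # re-enters the queue behind everyone else's pending steps.
    queue = deque((w, 0) for w in range(num_workflows))
    clock = 0
    finish = {}
    while queue:
        workflow, step = queue.popleft()
        clock += 1  # every step costs one time unit
        if step + 1 < steps_per_workflow:
            queue.append((workflow, step + 1))
        else:
            finish[workflow] = clock
    return [finish[w] for w in range(num_workflows)]

def workflow_level(num_workflows, steps_per_workflow):
    # The scheduler knows which steps belong together and finishes one
    # workflow before starting the next.
    clock = 0
    finish = []
    for _ in range(num_workflows):
        clock += steps_per_workflow
        finish.append(clock)
    return finish

def mean(values):
    return sum(values) / len(values)

isolated = step_level_fifo(4, 3)   # [9, 10, 11, 12]
grouped = workflow_level(4, 3)     # [3, 6, 9, 12]
print(mean(isolated), mean(grouped))  # 10.5 7.5
```

Total work is identical in both cases, so this toy only captures the latency side: mean completion time drops from 10.5 to 7.5 time units. Throughput gains in a real engine would come from effects it does not model.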
From Algorithms to Kernels, All in the Stack
Together's ICML work doesn't stop at agents. Aurora, their adaptive speculative decoding paper, achieves a 1.25x speedup even as traffic patterns shift, and already ships in production as the ATLAS speculator. On the systems side, Untied Ulysses pushes context parallelism to 5 million tokens, and Opportunistic Expert Activation yields up to 39% gains in mixture-of-experts serving.
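For readers new to speculative decoding, here is a minimal sketch of the basic draft-and-verify loop with greedy acceptance. It covers only that general technique and none of Aurora's adaptive behavior. The "models" target_next and draft_next are hypothetical toy functions over integer tokens, and the single batched verification pass of a real system is simulated with an ordinary loop.

```python
def target_next(ctx):
    # Toy "large model": deterministic next token from the context.
    return (sum(ctx) * 7 + 3) % 10

def draft_next(ctx):
    # Toy "small model": agrees with the target except at every 4th position.
    guess = target_next(ctx)
    return (guess + 1) % 10 if len(ctx) % 4 == 0 else guess

def greedy_decode(prompt, n):
    out = list(prompt)
    for _ in range(n):
        out.append(target_next(out))
    return out[len(prompt):]

def speculative_decode(prompt, n, k=4):
    out = list(prompt)
    passes = 0  # number of (simulated) target verification passes
    while len(out) - len(prompt) < n:
        # The draft proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            token = draft_next(ctx)
            proposal.append(token)
            ctx.append(token)
        # The target checks the proposal; a real system does this in one pass.
        passes += 1
        for token in proposal:
            expected = target_next(out)
            out.append(expected)  # accepted token, or the target's correction
            if token != expected:
                break  # everything after the first mismatch is discarded
        else:
            out.append(target_next(out))  # bonus token when all k are accepted
    return out[len(prompt):][:n], passes

tokens, passes = speculative_decode([1, 2], n=12)
print(tokens == greedy_decode([1, 2], 12), passes)
```

Because every emitted token is the target's own choice, the output matches plain greedy decoding exactly. The speedup comes from needing fewer target passes than tokens, and it depends on how often the draft agrees with the target.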
The real payoff comes when these layers feed back into production. Aurora ships now, DSGym and ThunderAgent are already part of Together's inference stack, and Together's ongoing kernel research suggests the next round will squeeze another 2x from the same hardware.
Source: Together AI at ICML 2026: frontier research across the full stack
Domain: together.ai