Meta FAIR released Code World Model (CWM), a 32-billion-parameter dense decoder-only LLM that injects world modeling into code generation by training on execution traces and long-horizon agent–environment interactions—not just static source text.
What’s new: learning code by predicting execution
CWM mid-trains on two large families of observation–action trajectories: (1) Python interpreter traces that record local variable states after each executed line, and (2) agentic interactions inside Dockerized repositories that capture edits, shell commands, and test feedback. This grounding is intended to teach semantics (how state evolves) rather than only syntax.
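CWM's exact trace schema isn't shown here, but Python's built-in sys.settrace hook illustrates how line-level traces of this kind can be gathered. A minimal sketch, with an illustrative record layout (line number plus a snapshot of locals):

```python
import sys
from copy import deepcopy

def collect_trace(fn, *args):
    """Run fn(*args), recording (line number, locals) at each 'line' event.

    Each event fires as a line is about to run, so every snapshot reflects
    the state left behind by the previously executed line. Illustrative
    only: this is not CWM's released trace format.
    """
    trace = []

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is fn.__code__:
            trace.append((frame.f_lineno, deepcopy(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = fn(*args)
    finally:
        sys.settrace(None)
    return result, trace

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

result, trace = collect_trace(demo, 3)
for lineno, local_vars in trace:
    print(lineno, local_vars)  # locals evolve step by step: {'n': 3}, then total/i appear
```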
To scale collection, the research team built executable repository images from thousands of GitHub projects and foraged multi-step trajectories via a software-engineering agent (“ForagerAgent”). The release reports ~3M trajectories across ~10k images and 3.15k repos, with mutate-fix and issue-fix variants.
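The article doesn't publish the released data format, but an agentic observation–action trajectory from such a setup could plausibly be serialized as below; every field name here is an illustrative assumption, not the actual schema.

```python
# Hypothetical shape of one ForagerAgent trajectory; field names and values
# are illustrative assumptions, not the released data format.
trajectory = {
    "repo": "github.com/example/project",   # placeholder repository
    "image": "example-project:py3.11",      # Dockerized executable image
    "variant": "mutate-fix",                # or "issue-fix"
    "steps": [
        {"action": {"tool": "bash", "cmd": "python -m pytest -x"},
         "observation": "1 failed, 41 passed"},
        {"action": {"tool": "edit", "file": "pkg/core.py", "diff": "..."},
         "observation": "ok"},
        {"action": {"tool": "submit", "patch": "..."},
         "observation": "hidden tests passed"},
    ],
}
```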

Model and context window
CWM is a dense, decoder-only Transformer (no MoE) with 64 layers, GQA (48Q/8KV), SwiGLU, RMSNorm, and Scaled RoPE. Attention alternates local 8k and global 131k sliding-window blocks, enabling 131k tokens effective context; training uses document-causal masking.
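The 3:1 local:global ratio and the 8k/131k window sizes follow the description above; where the global layer sits within each group of four is not specified, so that placement is an assumption in this sketch.

```python
# Sketch of the interleaved attention layout: a 3:1 local:global pattern
# repeated across 64 layers. Placing the global layer last in each group
# of four is an assumption for illustration.
N_LAYERS = 64
LOCAL_WINDOW = 8_192      # local sliding-window attention
GLOBAL_WINDOW = 131_072   # global attention span

def window_for_layer(layer_idx: int) -> int:
    # Every 4th layer is global; the other three are local sliding-window.
    return GLOBAL_WINDOW if layer_idx % 4 == 3 else LOCAL_WINDOW

layout = [window_for_layer(i) for i in range(N_LAYERS)]
assert layout.count(GLOBAL_WINDOW) == 16 and layout.count(LOCAL_WINDOW) == 48
```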
Training recipe (pre → mid → post)
- General pretraining: 8T tokens (code-heavy) at 8k context.
- Mid-training: +5T tokens, long-context (131k) with Python execution traces, ForagerAgent data, PR-derived diffs, IR/compilers, Triton kernels, and Lean math.
- Post-training: 100B-token SFT for instruction following and reasoning, then multi-task RL (~172B tokens) across verifiable coding, math, and multi-turn SWE environments using a GRPO-style algorithm and a minimal toolset (bash/edit/create/submit); a minimal advantage sketch follows this list.
- Quantized inference fits on a single 80 GB H100.
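GRPO-style algorithms replace a learned critic with a group-relative baseline: several rollouts are sampled per task, and each rollout's reward is normalized against the group's mean and standard deviation. A simplified sketch of that advantage computation (the release's exact objective, clipping, and multi-turn credit assignment are not reproduced here):

```python
import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Group-relative advantages: z-score each rollout's reward within its group.

    group_rewards: shape (G,), rewards for G rollouts of the same task.
    Simplified sketch of a GRPO-style baseline; no critic network is needed.
    """
    mean = group_rewards.mean()
    std = group_rewards.std()
    return (group_rewards - mean) / (std + eps)

# e.g. 4 rollouts of one coding task, rewarded by hidden tests (1 = pass):
rewards = np.array([1.0, 0.0, 0.0, 1.0])
print(grpo_advantages(rewards))  # passing rollouts get positive advantage
```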
Benchmarks
The research team reports the following pass@1 results (test-time scaling noted where applicable):
- SWE-bench Verified: 65.8% (with test-time scaling).
- LiveCodeBench-v5: 68.6%; LCB-v6: 63.5%.
- Math-500: 96.6%; AIME-24: 76.0%; AIME-25: 68.2%.
- CruxEval-Output: 94.3%.
The research team positions CWM as competitive with similarly sized open-weights baselines, and even with larger or closed models on SWE-bench Verified.
For context on SWE-bench Verified’s task design and metrics, see the official benchmark resources.


Why world modeling matters for code
The release emphasizes two operational capabilities:
- Execution-trace prediction: given a function and a trace start, CWM predicts stack frames (locals) and the executed line at each step via a structured format, usable as a “neural debugger” for grounded reasoning without live execution (a hypothetical rendering follows this list).
- Agentic coding: multi-turn reasoning with tool use against real repos, verified by hidden tests and patch similarity rewards; the setup trains the model to localize faults and generate end-to-end patches (git diff) rather than snippets.
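The release's actual reserved demarcation tokens aren't reproduced in this article; the snippet below is a hypothetical rendering of what a prompt/target pair for structured trace prediction might look like, with made-up markers such as [TRACE_START] and [STEP].

```python
# Hypothetical structured trace-prediction example. The markers and layout
# are illustrative assumptions; CWM's reserved tokens are not shown here.
prompt = (
    "[TRACE_START]\n"
    "def f(x):\n"
    "    y = x + 1\n"
    "    return y * 2\n"
    "[ARGS] x=3\n"
)
# The model is trained to continue with per-step frames: executed line + locals.
expected_continuation = (
    "[STEP] line=2 locals={'x': 3}\n"
    "[STEP] line=3 locals={'x': 3, 'y': 4}\n"
    "[RETURN] 8\n"
)
print(prompt + expected_continuation)
```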
Some details worth noting
- Tokenizer: Llama-3 family with reserved control tokens; reserved IDs are used to demarcate trace and reasoning segments during SFT.
- Attention layout: the 3:1 local:global interleave is repeated across the depth; long-context training occurs at large token batch sizes to stabilize gradients.
- Compute scaling: learning-rate/batch size schedules are derived from internal scaling-law sweeps tailored for long-context overheads.
Summary
CWM is a pragmatic step toward grounded code generation: Meta ties a 32B dense transformer to execution-trace learning and agentic, test-verified patching, releases intermediate and post-trained checkpoints, and gates usage under the FAIR Non-Commercial Research License. That combination makes it a useful platform for reproducible ablations on long-context, execution-aware coding without conflating research with production deployment.
Check out the Paper, GitHub Page, and Model on Hugging Face.