Meta Superintelligence Labs Introduces REFRAG: Scaling RAG with 16× Longer Contexts and 31× Faster Decoding


A team of researchers from Meta Superintelligence Labs, the National University of Singapore, and Rice University has unveiled REFRAG (REpresentation For RAG), a decoding framework that rethinks retrieval-augmented generation (RAG) efficiency. REFRAG extends LLM context windows by 16× and achieves up to a 30.85× acceleration in time-to-first-token (TTFT) without compromising accuracy.

Why is long context such a bottleneck for LLMs?

The attention mechanism in large language models scales quadratically with input length. If a document is twice as long, the compute and memory cost can grow fourfold. This not only slows inference but also increases the size of the key-value (KV) cache, making large-context applications impractical in production systems. In RAG settings, most retrieved passages contribute little to the final answer, but the model still pays the full quadratic price to process them.
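The quadratic cost is easy to see from the shape of the attention matmuls alone. The sketch below counts the dominant multiply-adds (the QKᵀ score matrix and the attention-weighted sum over V) for a single attention pass; the hidden size is an illustrative assumption, not any particular model's configuration.

```python
# Illustrative only: dominant FLOP count for one self-attention pass over a
# sequence of length n with hidden size d. The two n x n matmuls
# (scores = Q @ K^T and out = A @ V) each cost roughly n * n * d multiply-adds.
def attention_flops(n: int, d: int = 4096) -> int:
    return 2 * n * n * d

base = attention_flops(4096)     # a 4k-token input
doubled = attention_flops(8192)  # the same input, twice as long
print(doubled / base)  # 4.0 -- doubling the length quadruples the attention cost
```

This is the "twice as long, fourfold the cost" behavior described above, and it is why trimming the sequence the decoder actually sees pays off quadratically rather than linearly.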

How does REFRAG compress and shorten context?

REFRAG introduces a lightweight encoder that splits retrieved passages into fixed-size chunks (e.g., 16 tokens) and compresses each into a dense chunk embedding. Instead of feeding thousands of raw tokens, the decoder processes this shorter sequence of embeddings. The result is a 16× reduction in sequence length, with no change to the LLM architecture.
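The compression step can be sketched in a few lines. Note the stand-in here: the chunk size (16) matches the paper, but the encoder below is simple mean pooling for illustration, whereas REFRAG trains a lightweight encoder to produce the chunk embeddings.

```python
import numpy as np

def compress_context(token_embeddings: np.ndarray, chunk_size: int = 16) -> np.ndarray:
    """Collapse each fixed-size chunk of token embeddings into one chunk embedding.

    Stand-in encoder: mean pooling over each chunk. REFRAG uses a trained
    lightweight encoder instead, but the sequence-length arithmetic is the same.
    """
    n, d = token_embeddings.shape
    n_chunks = n // chunk_size  # assume n is divisible by chunk_size for simplicity
    chunks = token_embeddings[: n_chunks * chunk_size].reshape(n_chunks, chunk_size, d)
    return chunks.mean(axis=1)  # shape (n_chunks, d): a 16x shorter sequence

tokens = np.random.randn(64, 8)        # 64 retrieved tokens, toy embedding dim 8
compressed = compress_context(tokens)  # 4 chunk embeddings replace 64 token positions
print(compressed.shape)  # (4, 8)
```

The decoder then attends over 1/16th as many positions, which is where both the context extension and the speedup come from.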

Source: https://arxiv.org/pdf/2509.01092

How is acceleration achieved?

By shortening the decoder’s input sequence, REFRAG reduces the quadratic attention computation and shrinks the KV cache. Empirical results show 16.53× TTFT acceleration at k=16 and 30.85× acceleration at k=32, far surpassing prior state-of-the-art CEPE (which achieved only 2–8×). Throughput also improves by up to 6.78× compared to LLaMA baselines.
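The KV-cache savings follow directly from the shorter sequence, since the cache grows linearly with the number of positions. A back-of-envelope sketch, with illustrative layer/head dimensions (assumptions for this example, not the exact LLaMA configuration):

```python
# Back-of-envelope KV-cache size for a decoder, before and after a 16x
# sequence-length reduction. Dimensions are illustrative assumptions.
def kv_cache_bytes(seq_len: int, n_layers: int = 32, n_heads: int = 32,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    # Two cached tensors per layer (K and V), each seq_len x n_heads x head_dim.
    return 2 * n_layers * seq_len * n_heads * head_dim * bytes_per_value

full = kv_cache_bytes(65536)               # 16x-extended context as raw tokens
reduced = kv_cache_bytes(65536 // 16)      # the same context as chunk embeddings
print(full / reduced)  # 16.0 -- the cache shrinks in proportion to sequence length
```

Combined with the quadratic attention savings, this is why TTFT improves super-linearly (16.53× at k=16, 30.85× at k=32) rather than by a flat 16×.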

How does REFRAG preserve accuracy?

A reinforcement learning (RL) policy supervises compression. It identifies the most information-dense chunks and allows them to bypass compression, feeding raw tokens directly into the decoder. This selective strategy ensures that critical details—such as exact numbers or rare entities—are not lost. Across multiple benchmarks, REFRAG maintained or improved perplexity compared to CEPE while operating at far lower latency.
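The selective-expansion idea can be sketched as follows. The scoring function here is a hypothetical stand-in (in the paper, an RL policy learns which chunks to leave uncompressed); the mixed raw-token/embedding sequence is the part that mirrors REFRAG.

```python
# Sketch of selective expansion: the highest-scoring (most information-dense)
# chunks bypass compression and enter the decoder as raw tokens, while the rest
# are replaced by a single embedding slot each. Scores are assumed given here;
# REFRAG learns them with a reinforcement learning policy.
def build_decoder_input(chunks, scores, expand_fraction=0.25):
    """chunks: list of token lists; scores: per-chunk importance (higher = keep raw)."""
    n_expand = max(1, int(len(chunks) * expand_fraction))
    keep_raw = set(sorted(range(len(chunks)), key=lambda i: -scores[i])[:n_expand])
    sequence = []
    for i, chunk in enumerate(chunks):
        if i in keep_raw:
            sequence.extend(chunk)       # raw tokens: exact numbers, rare entities
        else:
            sequence.append(("EMB", i))  # one compressed-embedding slot per chunk
    return sequence

chunks = [["the", "cat"], ["pi", "=", "3.14159"], ["some", "filler"], ["more", "filler"]]
scores = [0.1, 0.9, 0.2, 0.1]
seq = build_decoder_input(chunks, scores)
print(seq)  # the numeric chunk stays raw; the others collapse to embedding slots
```

Only the chunks whose exact tokens matter pay full price, so details like "3.14159" survive while filler text is compressed away.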

What do the experiments reveal?

REFRAG was pretrained on 20B tokens from the SlimPajama corpus (Books + arXiv) and tested on long-context datasets including Book, Arxiv, PG19, and ProofPile. On RAG benchmarks, multi-turn conversation tasks, and long-document summarization, REFRAG consistently outperformed strong baselines:

  • 16× context extension beyond standard LLaMA-2 (4k tokens).
  • ~9.3% perplexity improvement over CEPE across four datasets.
  • Better accuracy in weak retriever settings, where irrelevant passages dominate, due to the ability to process more passages under the same latency budget.

Summary

REFRAG shows that long-context LLMs don’t have to be slow or memory-hungry. By compressing retrieved passages into compact embeddings, selectively expanding only the important ones, and rethinking how RAG decoding works, Meta Superintelligence Labs has made it possible to process much larger inputs while running dramatically faster. This makes large-context applications—like analyzing entire reports, handling multi-turn conversations, or scaling enterprise RAG systems—not only feasible but efficient, without compromising accuracy.


FAQs

Q1. What is REFRAG?
REFRAG (REpresentation For RAG) is a decoding framework from Meta Superintelligence Labs that compresses retrieved passages into embeddings, enabling faster and longer-context inference in LLMs.

Q2. How much faster is REFRAG compared to existing methods?
REFRAG delivers up to 30.85× faster time-to-first-token (TTFT) and 6.78× throughput improvement compared to LLaMA baselines, while outperforming CEPE.

Q3. Does compression reduce accuracy?
No. A reinforcement learning policy ensures critical chunks remain uncompressed, preserving key details. Across benchmarks, REFRAG maintained or improved accuracy relative to prior methods.

Q4. Where will the code be available?
Meta Superintelligence Labs will release REFRAG on GitHub at facebookresearch/refrag.


Check out the paper: https://arxiv.org/pdf/2509.01092


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
