How to Build an Advanced AI Agent with Summarized Short-Term and Vector-Based Long-Term Memory


In this tutorial, we walk through building an advanced AI Agent that not only chats but also remembers. We start from scratch and show how to combine a lightweight LLM, FAISS vector search, and a summarization mechanism to create both short-term and long-term memory. By pairing embeddings with auto-distilled facts, we craft an agent that adapts to our instructions, recalls important details in later conversations, and intelligently compresses context so that interactions stay smooth and efficient. Check out the FULL CODES here.

!pip -q install transformers accelerate bitsandbytes sentence-transformers faiss-cpu


import os, json, time, uuid, math, re
from datetime import datetime
import torch, faiss
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from sentence_transformers import SentenceTransformer
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

We begin by installing the essential libraries and importing all the required modules for our agent. We set up the environment to determine whether we are using a GPU or a CPU, allowing us to run the model efficiently. Check out the FULL CODES here.

def load_llm(model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
   try:
       tok=AutoTokenizer.from_pretrained(model_name, use_fast=True)
       if DEVICE=="cuda":
           # 4-bit NF4 quantization keeps the 1.1B model within free-tier GPU memory
           bnb=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_quant_type="nf4")
           mdl=AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb, device_map="auto")
       else:
           # CPU fallback: full-precision weights, with reduced peak RAM while loading
           mdl=AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32, low_cpu_mem_usage=True)
       # Device placement is already handled above (device_map / CPU), so we let the pipeline infer it
       return pipeline("text-generation", model=mdl, tokenizer=tok, do_sample=True)
   except Exception as e:
       raise RuntimeError(f"Failed to load LLM: {e}")

We define a function to load our language model. We set it up so that if a GPU is available, we use 4-bit quantization for efficiency; otherwise, we fall back to the CPU with optimized settings. This ensures we can generate text smoothly regardless of the hardware we are running on. Check out the FULL CODES here.
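
Before wiring the model into the agent, it can help to sanity-check the generation pipeline on its own. The snippet below is a minimal smoke test of ours, not part of the original notebook; the prompt text is arbitrary.

llm = load_llm()
# Generate a short completion to confirm the pipeline runs on this hardware
out = llm("Q: What does FAISS provide?\nA:", max_new_tokens=40, temperature=0.7)[0]["generated_text"]
print(out)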

class VectorMemory:
   def __init__(self, path="/content/agent_memory.json", dim=384):
       self.path=path; self.dim=dim; self.items=[]
       self.embedder=SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device=DEVICE)
       self.index=faiss.IndexFlatIP(dim)
       if os.path.exists(path):
           data=json.load(open(path))
           self.items=data.get("items",[])
           if self.items:
               X=torch.tensor([x["emb"] for x in self.items], dtype=torch.float32).numpy()
               self.index.add(X)
   def _emb(self, text):
       v=self.embedder.encode([text], normalize_embeddings=True)[0]
       return v.tolist()
   def add(self, text, meta=None):
       e=self._emb(text); self.index.add(torch.tensor([e]).numpy())
       rec={"id":str(uuid.uuid4()),"text":text,"meta":meta or {}, "emb":e}
       self.items.append(rec); self._save(); return rec["id"]
   def search(self, query, k=5, thresh=0.25):
       if len(self.items)==0: return []
       q=self.embedder.encode([query], normalize_embeddings=True)
       D,I=self.index.search(q, min(k, len(self.items)))
       out=[]
       for d,i in zip(D[0],I[0]):
           if i==-1: continue
           if d>=thresh: out.append((d,self.items[i]))
       return out
   def _save(self):
       slim=[{k:v for k,v in it.items()} for it in self.items]
       json.dump({"items":slim}, open(self.path,"w"), indent=2)

We create a VectorMemory class that gives our agent long-term memory. We store past interactions as embeddings using MiniLM and index them with FAISS, allowing us to search and recall relevant information later. Each memory is saved to disk, enabling the agent to retain its memory across sessions. Check out the FULL CODES here.
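
To see the memory store working in isolation, here is a minimal usage sketch of ours; the path and example texts are illustrative, not from the original run.

mem = VectorMemory(path="/content/demo_memory.json")
# Store two durable facts, then retrieve by meaning rather than by keyword
mem.add("User prefers concise, bullet-point answers.", meta={"kind": "preference"})
mem.add("User is preparing for an exam in 2027.", meta={"kind": "fact"})
for score, item in mem.search("how should replies be formatted?", k=2):
    print(f"{score:.2f}  {item['text']}")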

def now_iso(): return datetime.now().isoformat(timespec="seconds")
def clamp(txt, n=1600): return txt if len(txt)<=n else txt[:n]+" ...[truncated]"

# Prompt templates for the rolling summary and for distilling durable facts (wording is illustrative; tune it for your model)
SUMMARIZE_PROMPT=lambda t: f"Summarize the conversation below in 3-5 bullet points, keeping stable facts, preferences, and open tasks:\n\n{t}\n\nSummary:"
DISTILL_PROMPT=lambda u: f"If the message below contains a durable fact or preference worth remembering long-term, restate it as one short note; otherwise reply NONE.\n\nMessage: {u}\nNote:"

class MemoryAgent:
   def __init__(self, max_turns=10):
       self.llm=load_llm(); self.mem=VectorMemory()
       self.turns=[]; self.summary=""; self.max_turns=max_turns
   def _gen(self, prompt, max_new_tokens=200, temp=0.7):
       out=self.llm(prompt, max_new_tokens=max_new_tokens, temperature=temp, top_p=0.95, pad_token_id=self.llm.tokenizer.eos_token_id)[0]["generated_text"]
       return out[len(prompt):].strip()
   def _chat_prompt(self, user, mem_ctx):
       convo="\n".join([f"{r}: {t}" for r,t in self.turns[-8:]])
       return (f"System: You are a helpful, concise assistant. Use the memories when relevant.\n"
               f"Conversation summary: {self.summary or '-'}\nRelevant memories:\n{mem_ctx or '-'}\n{convo}\nassistant:")
   def _distill_and_store(self, user):
       note=self._gen(DISTILL_PROMPT(clamp(user)), max_new_tokens=60, temp=0.1)
       if note and "NONE" not in note.upper():
           self.mem.add(note, meta={"ts": now_iso()}); return True, note
       return False, ""
   def _maybe_summarize(self):
       if len(self.turns)>self.max_turns:
           convo="\n".join([f"{r}: {t}" for r,t in self.turns])
           s=self._gen(SUMMARIZE_PROMPT(clamp(convo, 3500)), max_new_tokens=180, temp=0.2)
           self.summary=s; self.turns=self.turns[-4:]
   def recall(self, query, k=5):
       hits=self.mem.search(query, k=k)
       return "\n".join([f"- ({d:.2f}) {h['text']} [meta={h['meta']}]" for d,h in hits])
   def ask(self, user):
       self.turns.append(("user", user))
       saved, memline = self._distill_and_store(user)
       mem_ctx=self.recall(user, k=6)
       prompt=self._chat_prompt(user, mem_ctx)
       reply=self._gen(prompt)
       self.turns.append(("assistant", reply))
       self._maybe_summarize()
       status=f"💾 memory_saved: {saved}; " + (f"note: {memline}" if saved else "note: -")
       print(f"\nUSER: {user}\nASSISTANT: {reply}\n{status}")
       return reply

We bring everything together into the MemoryAgent class. We design the agent to generate responses with context, distill important facts into long-term memory, and periodically summarize conversations to manage short-term context. With this setup, we create an assistant that remembers, recalls, and adapts to our interactions with it. Check out the FULL CODES here.

agent=MemoryAgent()


print("✅ Agent ready. Try these:\n")
agent.ask("Hi! My name is Nicolaus, I prefer being called Nik. I'm preparing for UPSC in 2027.")
agent.ask("Also, I work at  Visa in analytics and love concise answers.")
agent.ask("What's my exam year and how should you address me next time?")
agent.ask("Reminder: I like agentic RAG tutorials with single-file Colab code.")
agent.ask("Given my prefs, suggest a study focus for this week in one paragraph.")

We instantiate our MemoryAgent and immediately exercise it with a few messages to seed long-term memories and verify recall. We confirm it remembers our preferred name and exam year, adapts replies to our concise style, and uses past preferences (agentic RAG, single-file Colab) to tailor study guidance in the present.
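
Since VectorMemory persists to disk, we can also verify what the agent distilled across the session. This quick check is our addition and assumes the default memory path used by the class above.

# Inspect the JSON file that backs long-term memory
data = json.load(open("/content/agent_memory.json"))
for it in data["items"]:
    print(f"- {it['text']} (meta={it['meta']})")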

In conclusion, we see how powerful it is when we give our AI Agent the ability to remember. We now have an agent that stores key details, recalls them when relevant, and summarizes conversations to stay efficient. This approach keeps our interactions contextual and evolving, making the agent feel more personal and intelligent with each exchange. With this foundation, we are ready to extend memory further, explore richer schemas, and experiment with more advanced memory-augmented agent designs.


Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes and Notebooks.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
