Integrations

Integrating with LlamaIndex

You can connect the Fastino Personalization API to LlamaIndex to give your LLM applications direct access to user-specific memories, preferences, and summaries.


LlamaIndex acts as an orchestration layer for knowledge retrieval — and Fastino becomes its personalized data source, supplying context-aware embeddings and summaries for each user.

Overview

The integration lets you use Fastino as a custom Retriever and Memory backend for LlamaIndex.

This enables your LlamaIndex app to:

  • Retrieve top-k relevant user memories and context

  • Personalize reasoning and tone per user

  • Learn continuously as new data is ingested

  • Ground model output on actual user history

Architecture

Flow overview:

  1. User interacts with your LlamaIndex-powered agent.

  2. The agent requests Fastino’s memory snippets via the /chunks endpoint.

  3. Retrieved data is injected into LlamaIndex’s context window as a personalized retriever node.

  4. New messages or events are sent back to Fastino through /ingest or update_memory.
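
In plain requests terms, step 2 boils down to a single POST against the retrieval endpoint used in the full example further down this page; the user ID, API key, and question below are placeholders. Step 4 is covered under Writing Back to Fastino.

import requests

API = "https://api.fastino.ai"
HEADERS = {"Authorization": "x-api-key sk_live_123", "Content-Type": "application/json"}

# Step 2: fetch personalized memory snippets for the current user.
resp = requests.post(
    f"{API}/personalization/profile/rag",
    headers=HEADERS,
    json={
        "user_id": "usr_42af7c",
        "conversation": [{"role": "user", "content": "When does Ash prefer meetings?"}],
        "top_k": 5,
    },
    timeout=30,
)
snippets = resp.json().get("results", [])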

Prerequisites

  • Valid Fastino API key

  • Installed packages: llama-index and requests (for example, pip install llama-index requests)

  • At least one user registered via /register (see the example after this list)

  • Optional: existing LlamaIndex document store (for non-personal data)
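
If you have not registered a user yet, a quick call like the hypothetical sketch below is enough for testing. The exact /register payload is not documented on this page, so treat the user_id field as an assumption and check the registration reference for the real schema.

import requests

# Hypothetical sketch: register a test user before retrieving memories.
# Assumption: /register accepts a caller-chosen user_id; see the /register reference for the actual fields.
requests.post(
    "https://api.fastino.ai/register",
    headers={"Authorization": "x-api-key sk_live_123", "Content-Type": "application/json"},
    json={"user_id": "usr_42af7c"},
    timeout=30,
)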

Example Setup

Below is a minimal working integration showing how to use Fastino as a Retriever inside LlamaIndex. The imports follow the llama_index.core module layout used by recent LlamaIndex releases.

import os

import requests
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle, TextNode

FASTINO_API = "https://api.fastino.ai"  # no trailing slash; endpoint paths are appended below
FASTINO_KEY = os.environ.get("FASTINO_API_KEY", "sk_live_123")
HEADERS = {"Authorization": f"x-api-key {FASTINO_KEY}", "Content-Type": "application/json"}

class FastinoRetriever(BaseRetriever):
    """Custom retriever that pulls personalized context from Fastino."""

    def __init__(self, user_id: str, top_k: int = 5):
        super().__init__()
        self.user_id = user_id
        self.top_k = top_k

    def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        payload = {
            "user_id": self.user_id,
            "conversation": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": query_bundle.query_str}
            ],
            "top_k": self.top_k
        }
        r = requests.post(
            f"{FASTINO_API}/personalization/profile/rag",
            json=payload,
            headers=HEADERS,
            timeout=30,
        )
        r.raise_for_status()
        results = r.json().get("results", [])
        # Wrap each memory excerpt in a text node so LlamaIndex can merge and rank it.
        return [NodeWithScore(node=TextNode(text=res["excerpt"]), score=1.0) for res in results]

# Example use
retriever = FastinoRetriever(user_id="usr_42af7c")
context_nodes = retriever.retrieve("When does Ash prefer meetings?")
for node_with_score in context_nodes:
    print(node_with_score.node.text)

This retriever can now be plugged directly into a LlamaIndex query engine or query pipeline.
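
A minimal sketch, assuming the default LlamaIndex Settings supply the LLM used for response synthesis:

from llama_index.core.query_engine import RetrieverQueryEngine

# Wrap the personalized retriever in a standard query engine.
personal_engine = RetrieverQueryEngine.from_args(FastinoRetriever(user_id="usr_42af7c"))
print(personal_engine.query("Summarize Ash's meeting preferences."))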

Integration with Existing Index

You can combine Fastino’s personalized retriever with your existing VectorStoreIndex for hybrid reasoning:

from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import QueryFusionRetriever

personal_retriever = FastinoRetriever("usr_42af7c", top_k=3)
project_index = VectorStoreIndex.from_documents(SimpleDirectoryReader("project_docs").load_data())

# Merge personalized memories and project documents into a single retriever.
hybrid_retriever = QueryFusionRetriever(
    [personal_retriever, project_index.as_retriever()],
    similarity_top_k=5,
    num_queries=1,  # use the query as-is; skip LLM-generated query variations
)
query_engine = RetrieverQueryEngine.from_args(hybrid_retriever)

response = query_engine.query("What's Ash's preferred work schedule and current project status?")
print(response)

Here, LlamaIndex merges Fastino’s personal memory snippets with your project documents, creating unified reasoning grounded in both personal and organizational context.

Writing Back to Fastino

To keep personalization current, you can push new information (summaries, corrections, or reflections) back into Fastino via /ingest or the MCP update_memory tool.

def log_interaction(user_id: str, note: str):
    """Push a post-conversation note back into Fastino."""
    payload = {
        "user_id": user_id,
        "source": "llamaindex",
        "documents": [
            {
                "doc_id": "note_20251027",  # example ID; generate a unique doc_id per note in practice
                "kind": "reflection",
                "title": "Post-chat summary",
                "content": note
            }
        ]
    }
    r = requests.post(f"{FASTINO_API}/ingest", json=payload, headers=HEADERS, timeout=30)
    r.raise_for_status()

log_interaction("usr_42af7c", "Ash prefers asynchronous updates after noon meetings.")

This ensures Fastino continues learning as your agent interacts with users.

Personalization Use Cases

Use Case | Description
Adaptive Retrieval | Pull user-specific snippets to refine context for each query
Personalized Reasoning | Combine world model context with external data sources
Ongoing Memory Sync | Keep LlamaIndex retrieval aligned with the user’s changing preferences
Feedback-Driven Learning | Store conversation outcomes as new documents in Fastino

Authentication

Use the same headers as any Fastino API call:

Authorization: x-api-key pk-...
Content-Type: application/json

Tip: Store your API key as an environment variable:

export FASTINO_API_KEY=sk_live_456
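
Your Python code can then read the key at startup instead of hard-coding it:

import os

# Fail fast if the key is missing rather than sending unauthenticated requests.
FASTINO_KEY = os.environ["FASTINO_API_KEY"]
HEADERS = {"Authorization": f"x-api-key {FASTINO_KEY}", "Content-Type": "application/json"}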

Example: End-to-End Query Flow

  1. User asks your app: “What’s the best time for me to focus tomorrow?”

  2. LlamaIndex queries Fastino using FastinoRetriever.

  3. Fastino returns snippets like:
    “Ash typically focuses from 9–12 PT and avoids meetings before lunch.”

  4. The LLM uses this context to generate a tailored recommendation.

  5. The agent logs the interaction to Fastino using /ingest.
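
Put together, the whole loop reuses FastinoRetriever and log_interaction from the earlier examples; a condensed sketch (the summary string written back to Fastino is illustrative):

from llama_index.core.query_engine import RetrieverQueryEngine

USER_ID = "usr_42af7c"

# Steps 1-4: retrieve personalized context and answer with the configured LLM.
engine = RetrieverQueryEngine.from_args(FastinoRetriever(user_id=USER_ID))
answer = engine.query("What's the best time for me to focus tomorrow?")
print(answer)

# Step 5: log the outcome so future retrievals reflect it.
log_interaction(USER_ID, f"Recommended a focus window based on stored preferences: {answer}")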

Best Practices

  • Always include user_id in every retrieval and ingestion call.

  • Cache retrieved snippets per user for short-term context reuse (see the sketch after this list).

  • Combine Fastino retrieval with existing indexes for hybrid RAG.

  • Use deduplication (options.dedupe) for repeated sync events.
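
For the caching point, a minimal per-user cache might look like the sketch below; the 60-second TTL and the in-memory dictionary are illustrative choices, not part of the Fastino API.

import time

_SNIPPET_CACHE: dict[tuple[str, str], tuple[float, list]] = {}
CACHE_TTL_SECONDS = 60  # illustrative: keep snippets only for quick follow-up turns

def cached_retrieve(retriever: FastinoRetriever, query: str) -> list:
    """Reuse recently retrieved snippets for the same user and query."""
    key = (retriever.user_id, query)
    now = time.time()
    hit = _SNIPPET_CACHE.get(key)
    if hit and now - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    nodes = retriever.retrieve(query)
    _SNIPPET_CACHE[key] = (now, nodes)
    return nodes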

Summary

Integrating Fastino with LlamaIndex bridges personal memory with general knowledge retrieval — allowing your agents to reason contextually about each individual user.

With this setup, you can build adaptive assistants, personalized copilots, and RAG pipelines that evolve with every interaction.

Next, continue to Integrating with OpenAI Realtime to learn how to stream personalized context into real-time conversations.
