Integrating with LlamaIndex
You can connect the Fastino Personalization API to LlamaIndex to give your LLM applications direct access to user-specific memories, preferences, and summaries.
LlamaIndex acts as an orchestration layer for knowledge retrieval — and Fastino becomes its personalized data source, supplying context-aware embeddings and summaries for each user.
Overview
The integration lets you use Fastino as a custom Retriever and Memory backend for LlamaIndex.
This enables your LlamaIndex app to:
- Retrieve top-k relevant user memories and context
- Personalize reasoning and tone per user
- Learn continuously as new data is ingested
- Ground model output on actual user history
Architecture
Flow overview:
1. The user interacts with your LlamaIndex-powered agent.
2. The agent requests Fastino's memory snippets via the `/chunks` endpoint.
3. Retrieved data is injected into LlamaIndex's context window as a personalized retriever node.
4. New messages or events are sent back to Fastino through `/ingest` or `update_memory`.
Prerequisites
- A valid Fastino API key
- Installed packages (see the install command below)
- At least one user registered via `/register`
- Optional: an existing LlamaIndex document store (for non-personal data)
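Assuming the integration is built on the `llama-index` package and calls Fastino's REST API via `requests`, the install step is:

```bash
pip install llama-index requests
```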
Example Setup
Below is a minimal working integration showing how to use Fastino as a Retriever inside LlamaIndex.
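The sketch below assumes Fastino's `/chunks` endpoint is reachable at `https://api.fastino.ai` and returns a JSON list of scored text snippets; the request and response fields shown are illustrative, so check the `/chunks` reference for the authoritative schema.

```python
import os

import requests
from llama_index.core import QueryBundle
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, TextNode


class FastinoRetriever(BaseRetriever):
    """Serves a user's Fastino memory snippets as LlamaIndex retrieval nodes."""

    def __init__(self, user_id: str, top_k: int = 5, api_key: str | None = None):
        self.user_id = user_id
        self.top_k = top_k
        self.api_key = api_key or os.environ["FASTINO_API_KEY"]
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> list[NodeWithScore]:
        # Hypothetical request shape; see the /chunks reference for exact fields.
        response = requests.post(
            "https://api.fastino.ai/chunks",  # assumed base URL
            headers={"Authorization": f"Bearer {self.api_key}"},
            json={
                "user_id": self.user_id,
                "query": query_bundle.query_str,
                "top_k": self.top_k,
            },
            timeout=10,
        )
        response.raise_for_status()
        # Wrap each returned snippet as a scored LlamaIndex node.
        return [
            NodeWithScore(node=TextNode(text=chunk["text"]), score=chunk.get("score", 0.0))
            for chunk in response.json().get("chunks", [])
        ]
```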
This retriever can now be plugged directly into your LlamaIndex query pipelines, for example via a RetrieverQueryEngine.
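For example, a minimal query pipeline (this assumes an LLM is already configured via LlamaIndex's global `Settings`; the `user_id` is illustrative):

```python
from llama_index.core.query_engine import RetrieverQueryEngine

retriever = FastinoRetriever(user_id="user_123", top_k=5)
query_engine = RetrieverQueryEngine.from_args(retriever)
print(query_engine.query("What's the best time for me to focus tomorrow?"))
```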
Integration with Existing Index
You can combine Fastino’s personalized retriever with your existing VectorStoreIndex for hybrid reasoning:
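One way to do this is LlamaIndex's `QueryFusionRetriever`, which merges results from several retrievers. The sketch below fuses the `FastinoRetriever` defined above with a standard `VectorStoreIndex` over your project documents; paths and top-k values are illustrative.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.retrievers import QueryFusionRetriever

# Your existing non-personal document store.
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

hybrid_retriever = QueryFusionRetriever(
    [
        FastinoRetriever(user_id="user_123"),    # personal memory snippets
        index.as_retriever(similarity_top_k=3),  # organizational documents
    ],
    similarity_top_k=5,
    num_queries=1,  # skip LLM query rewriting; just fuse the two result sets
)
```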
Here, LlamaIndex merges Fastino’s personal memory snippets with your project documents, creating unified reasoning grounded in both personal and organizational context.
Writing Back to Fastino
To keep personalization current, you can push new information (summaries, corrections, or reflections) back into Fastino via /ingest or the MCP update_memory tool.
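A minimal write-back sketch, assuming a POST `/ingest` endpoint that accepts a `user_id` and free-text content (field names and base URL are illustrative):

```python
import os

import requests


def log_interaction(user_id: str, text: str, api_key: str | None = None) -> None:
    """Push a new memory (summary, correction, or reflection) to Fastino."""
    requests.post(
        "https://api.fastino.ai/ingest",  # assumed base URL
        headers={"Authorization": f"Bearer {api_key or os.environ['FASTINO_API_KEY']}"},
        json={"user_id": user_id, "text": text},
        timeout=10,
    ).raise_for_status()
```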
This ensures Fastino continues learning as your agent interacts with users.
Personalization Use Cases
| Use Case | Description |
|---|---|
| Adaptive Retrieval | Pull user-specific snippets to refine context for each query |
| Personalized Reasoning | Combine world-model context with external data sources |
| Ongoing Memory Sync | Keep LlamaIndex retrieval aligned with the user's changing preferences |
| Feedback-Driven Learning | Store conversation outcomes as new documents in Fastino |
Authentication
Use the same headers as any Fastino API call:
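A sketch of the headers, assuming bearer-token authentication (confirm the exact scheme in the API reference):

```python
import os

headers = {
    "Authorization": f"Bearer {os.environ['FASTINO_API_KEY']}",
    "Content-Type": "application/json",
}
```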
Tip: Store your API key as an environment variable:
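```bash
# The variable name matches the snippets above; use whatever your deployment expects.
export FASTINO_API_KEY="your-api-key"
```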
Example: End-to-End Query Flow
1. A user asks your app: “What’s the best time for me to focus tomorrow?”
2. LlamaIndex queries Fastino using `FastinoRetriever`.
3. Fastino returns snippets like: “Ash typically focuses from 9–12 PT and avoids meetings before lunch.”
4. The LLM uses this context to generate a tailored recommendation.
5. The agent logs the interaction back to Fastino via `/ingest`.
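Putting the pieces together, reusing `FastinoRetriever` and `log_interaction` from the sketches above (the `user_id` is hypothetical, and an LLM is assumed to be configured in LlamaIndex's `Settings`):

```python
import os

from llama_index.core.query_engine import RetrieverQueryEngine

user_id = "ash_01"  # hypothetical user registered via /register
retriever = FastinoRetriever(user_id=user_id)
query_engine = RetrieverQueryEngine.from_args(retriever)

answer = query_engine.query("What's the best time for me to focus tomorrow?")
print(answer)

# Step 5: log the outcome so Fastino keeps learning.
log_interaction(user_id, f"Recommended focus window: {answer}", os.environ["FASTINO_API_KEY"])
```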
Best Practices
- Always include `user_id` in every retrieval and ingestion call.
- Cache retrieved snippets per user for short-term context reuse.
- Combine Fastino retrieval with existing indexes for hybrid RAG.
- Use deduplication (`options.dedupe`) for repeated sync events, as sketched below.
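As a sketch, a dedupe-enabled ingestion call might look like this; the placement of `options.dedupe` in the payload is inferred from the option's name, so verify it against the `/ingest` reference.

```python
import os

import requests

requests.post(
    "https://api.fastino.ai/ingest",  # assumed base URL
    headers={"Authorization": f"Bearer {os.environ['FASTINO_API_KEY']}"},
    json={
        "user_id": "user_123",
        "text": "Weekly calendar sync: focus blocks 9-12 PT",
        "options": {"dedupe": True},  # drop near-duplicate memories on repeated syncs
    },
    timeout=10,
).raise_for_status()
```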
Summary
Integrating Fastino with LlamaIndex bridges personal memory with general knowledge retrieval — allowing your agents to reason contextually about each individual user.
With this setup, you can build adaptive assistants, personalized copilots, and RAG pipelines that evolve with every interaction.
Next, continue to Integrating with OpenAI Realtime to learn how to stream personalized context into real-time conversations.