Personalization API Documentation
This guide walks you through obtaining API access, registering a user, ingesting data, and retrieving personalized context from the Fastino Personalization API.
Overview
Our personalization API provides a comprehensive solution for building AI agents with deep user understanding. Unlike traditional memory systems, we offer:
Agentic Search Over Noisy Data - A self-evolving knowledge tree about each user that runs continuously server-side, going beyond standard vector embeddings
Powerful Query Endpoint - A /query endpoint, wrappable as a tool call, for complex questions that require higher intelligence rather than simple chunk retrieval
Context-Aware Chunks - A /chunks endpoint that accepts message history (not just questions) for more contextually relevant results
Deterministic Profile Summaries - Natural language summaries, ideal for system prompts, so your agent starts with a solid understanding of the user
Flexible Ingestion - Accepts any data format and automatically extracts memories
Privacy-Focused - All data is anonymized using GLiNER-2 before storage
Base URL: https://api.fastino.ai
Before You Start
Requirements
Before integrating with the Personalization API, ensure you have:
API access token - Obtain from the Fastino Developer Portal
JSON-capable HTTP client - curl, Postman, Python requests, or similar
ISO 8601 UTC timestamps - For all time-based fields (e.g., 2025-11-11T14:30:00Z)
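The Python snippets in this guide use the requests library. A minimal client setup, assuming the x-api-key header scheme used in the complete example later in this guide, might look like:
import requests

BASE_URL = "https://api.fastino.ai"
API_KEY = "<your_api_key>"  # obtain from the Fastino Developer Portal

# Authenticated endpoints expect the key in the x-api-key header
HEADERS = {
    "x-api-key": API_KEY,
    "Content-Type": "application/json",
}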
1. Register a User
First, register a user to initialize their personalization profile. This triggers our multi-stage personalization workflow:
Endpoint: POST /register
Request:
{"email":"me@pioneer.ai","purpose":"This will be used in an AI SDR agent to help understand the user's communication style and priorities","traits":{"name":"Ash Lewis","locale":"en-US","timezone":"America/Los_Angeles","linkedin_url":"https://www.linkedin.com/in/ashlewis","twitter_url":"https://twitter.com/ashlewis","website":"https://ash.example.com","notes":"Founder/engineer; prefers concise communications"}}
2. Ingest User Data
Feed any user data into the system - conversations, documents, emails, notes, etc. Our system automatically extracts and stores relevant memories. The user_id in the request body is the one returned by /register.
Endpoint: POST /ingest
Request:
{"user_id":"usr_42af7c",// returned from /register route."source":"gmail","message_history":[{"role":"user","content":"I love sushi and usually eat late dinners around 9pm","timestamp":"2025-11-01T14:30:00Z"},{"role":"assistant","content":"Got it! I'll keep that in mind for restaurant recommendations.","timestamp":"2025-11-01T14:30:15Z"}],"documents":[{"content":"Q4 2025 Goals: Launch new product, hire 3 engineers, reach $1M ARR","title":"Quarterly Goals Q4","document_type":"document","doc_id":"doc-q4-goals","created_at":"2025-10-01T00:00:00Z"}],"options":{"dedupe":true}}
3. Get Profile Summary
Retrieve a natural language summary of the user - perfect for system prompts.
Endpoint: GET /summary
Request:
GET /summary?user_id=usr_42af7c&max_chars=1000
Response:
{"user_id":"usr_42af7c","generated_at":"2025-11-11T16:05:00Z","purpose":null,"summary":"Ash Lewis is a founder and engineer based in San Francisco (PST timezone). Ash prefers concise communications and values efficiency. Work style: Deep focus blocks from 9am-12pm, prefers meetings after 1pm. Currently focused on launching a new product and scaling the engineering team. Enjoys sushi and typically has late dinners around 9pm.","cached":true}
Usage:
# Add the profile summary to your agent's system prompt
summary = get_profile_summary(user_id)
system_prompt = f"""You are a helpful assistant for {summary['summary']}

Keep their preferences and work style in mind when responding."""
Key Parameters:
user_id (required)
max_chars (optional, default 1000) - Truncate summary to this length
Key Features:
Deterministic output for the same input (unless underlying data changes)
Low latency - suitable for every agent session
Generated from Stage 2 data
4. Retrieve Relevant Chunks
Get contextually relevant memory chunks based on the current conversation. Use this at every agent turn to ground responses in user-specific context.
Endpoint: POST /chunks
Request:
{"user_id":"usr_42af7c","history":[{"role":"system","content":"You help with restaurant recommendations"},{"role":"user","content":"I'm hungry, what should I eat tonight?"}],"k":6,"similarity_threshold":0.25}
Response:
{"chunks":[{"id":"mem_789","text":"User loves sushi and usually eats late dinners around 9pm","score":0.82,"source":"memory","created_at":"2025-11-01T14:30:00Z","updated_at":"2025-11-01T14:30:00Z"},{"id":"stage3_5","text":"Q: What are the user's food preferences?\nA: The user enjoys sushi, Italian cuisine, and prefers restaurants in the Mission district. Typically dines late (8-9pm) and values quick service.","score":0.78,"source":"stage3","created_at":"2025-10-15T10:00:00Z","updated_at":"2025-10-15T10:00:00Z","question_index":5,"question":"What are the user's food preferences?","answer":"The user enjoys sushi, Italian cuisine, and prefers restaurants in the Mission district. Typically dines late (8-9pm) and values quick service."}],"used_query":"hungry eat tonight food restaurant","debug":{"stage3_count":1,"memory_count":1,"total_candidates":8,"threshold":0.25,"embedding_time_ms":45,"search_time_ms":23,"total_time_ms":68}}
Key Features:
Message History Input - Pass conversation context, not just a question
Unified Retrieval - Searches both Stage-3 Q&A and conversation memories
Source Attribution - Know whether chunks come from memories or agentic search
Low Latency - Fast vector search, no LLM calls
Usage Pattern:
# At every agent turn, ground the response in user-specific context
chunks = get_relevant_chunks(user_id, conversation_history)
if chunks:
    context = "\n\n".join([c["text"] for c in chunks["chunks"]])
    user_message += f"\n\nRelevant context:\n{context}"
Key Parameters:
user_id (required)
history (required) - Recent conversation turns
k (optional, default 6) - Number of chunks to return
max_context_turns (optional, default 4) - How many recent turns to consider
exclude_chunk_ids (optional) - Skip specific chunks
5. Query User Profile
Ask complex natural-language questions about the user. This is a higher-latency, higher-intelligence endpoint that can run agentic search when needed.
Endpoint: POST /query
Request:
{"user_id":"usr_42af7c","question":"Who are the most important people in the user's professional network? For each person, describe their relationship, communication cadence, and what they typically discuss.","use_cache":true}
Response:
{"user_id":"usr_42af7c","question":"Who are the most important people in the user's professional network?...","answer":"Based on email analysis, here are the key relationships:\n\n1. **David Kimball (Recruiter)** - Long-term trusted career advisor. Communication: 2-3x per month with spikes during job searches. Topics: AI/ML opportunities, market intelligence, career strategy.\n\n2. **George Maloney (Fastino Co-Founder)** - Current employer relationship. Communication: Daily during onboarding, now sporadic milestone-driven. Topics: Strategic decisions, company vision.\n\n3. **Sarah Chen (Technical Co-founder)** - Close collaborator. Communication: Multiple times daily. Topics: Architecture decisions, code reviews, product strategy...","cached":false}
Key Features:
Cache-First - Attempts to answer from existing data (Stage 2 + Stage 3 + memories)
Agentic Fallback - If cache can't answer, runs full document search agent
Automatic Persistence - New answers are saved to Stage 3 for future caching
Complex Queries - Can handle multi-part questions requiring synthesis
Usage as Tool Call:
tools = [{"name":"query_user_profile","description":"Ask detailed questions about the user's preferences, relationships, work style, or any personal information","parameters":{"question":"The natural language question to ask"}}]
# Agent decides when to call thistool
# Example:User asks "Schedule a meeting with my most important contacts"
# Agent calls:query_user_profile("Who are the user's most important professional contacts?")
Key Parameters:
user_id (required)
question (required, non-empty) - Natural language question
use_cache (optional, default true) - Set to false to force fresh agent run
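For completeness, a direct call sketch (same BASE_URL/HEADERS assumptions as earlier; the generous timeout reflects the agent-run latency figures below):
resp = requests.post(f"{BASE_URL}/query", headers=HEADERS, json={
    "user_id": user_id,
    "question": "What are the user's food preferences?",
    "use_cache": False,  # force a fresh agent run instead of a cached answer
}, timeout=300)  # agent runs can take minutes; see Performance below
answer = resp.json()["answer"]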
Performance:
Cached response: ~100-500ms
Agent run: ~5-30 seconds (depending on complexity)
Integration Patterns
Pattern 1: System Prompt Enhancement
def create_agent_prompt(user_id):
    # Get the deterministic profile summary
    summary = get_profile_summary(user_id)
    return f"""You are a helpful AI assistant.

User Context: {summary['summary']}

Keep the user's preferences, work style, and context in mind when responding."""
Pattern 2: Contextual Grounding (Every Turn)
def process_message(user_id, conversation_history, new_message):
    # Add the new message to the history
    conversation_history.append({"role": "user", "content": new_message})

    # Get relevant chunks
    chunks_response = get_relevant_chunks(
        user_id=user_id,
        history=conversation_history,
        k=6,
    )

    # Append context if relevant chunks were found
    if chunks_response['chunks']:
        context = "\n".join([f"- {chunk['text']}"
                             for chunk in chunks_response['chunks']])
        enhanced_message = f"{new_message}\n\n[Relevant context:\n{context}]"
    else:
        enhanced_message = new_message

    # Send to the LLM
    return llm.chat(enhanced_message)
Pattern 3: Tool-Augmented Agent
tools = [{"name":"query_user_profile","description":"Ask detailed questions about the user when you need specific information not in the current context","parameters":{"type":"object","properties":{"question":{"type":"string","description":"Natural language question about the user"}},"required":["question"]}}]def query_user_profile_tool(question: str):response = query_user_profile(user_id=current_user_id,question=question,use_cache=True)returnresponse['answer']
# Agent automatically calls thiswhen it needs more context
# Example:User says "Book dinner with my team"
# Agent calls:query_user_profile("Who is on the user's team and what are their dietary preferences?")
Pattern 4: Continuous Learning
def after_conversation(user_id, messages):
    # Ingest the conversation for learning
    ingest_response = ingest_data(
        user_id=user_id,
        message_history=[
            {"role": msg['role'],
             "content": msg['content'],
             "timestamp": msg['timestamp']}
            for msg in messages
        ],
        options={"dedupe": True},
    )

    # The system automatically:
    # 1. Extracts facts and updates memories
    # 2. Triggers Stage 3 at memory thresholds
    # 3. Evolves user understanding over time
Complete Example: Restaurant Recommendation Agent
import requests
from datetime import datetime

BASE_URL = "https://api.fastino.ai"
API_KEY = "<your_api_key>"

def get_headers():
    return {"x-api-key": f"{API_KEY}", "Content-Type": "application/json"}

# 1. Initialize the user (one-time)
def register_user(user_id, email):
    response = requests.post(
        f"{BASE_URL}/register",
        headers=get_headers(),
        json={
            "user_id": user_id,
            "email": email,
            "purpose": "Restaurant recommendation agent that suggests dining options based on user preferences, dietary restrictions, and past experiences",
            "traits": {"name": "User Name", "timezone": "America/Los_Angeles"},
        },
    )
    return response.json()

# 2. Get a profile summary for the system prompt
def get_system_prompt(user_id):
    response = requests.get(
        f"{BASE_URL}/summary",
        headers=get_headers(),
        params={"user_id": user_id, "max_chars": 500},
    )
    summary = response.json()['summary']
    return f"""You are a restaurant recommendation assistant.

User Profile: {summary}

Provide personalized restaurant suggestions based on the user's preferences, dietary restrictions, and past experiences."""

# 3. Process a user message with context
def process_message(user_id, conversation_history, new_message):
    # Get relevant context
    chunks_response = requests.post(
        f"{BASE_URL}/chunks",
        headers=get_headers(),
        json={
            "user_id": user_id,
            "history": conversation_history + [{"role": "user", "content": new_message}],
            "k": 5,
        },
    )
    chunks = chunks_response.json()['chunks']

    # Build the enhanced message
    if chunks:
        context = "\n".join([f"- {c['text']}" for c in chunks])
        enhanced = f"{new_message}\n\n[Context: {context}]"
    else:
        enhanced = new_message

    # Send to your LLM
    # ... (your LLM call here)
    return enhanced

# 4. Learn from the conversation
def save_conversation(user_id, messages):
    requests.post(
        f"{BASE_URL}/ingest",
        headers=get_headers(),
        json={
            "user_id": user_id,
            "source": "restaurant_agent",
            "message_history": [
                {"role": msg['role'],
                 "content": msg['content'],
                 "timestamp": msg.get('timestamp', datetime.utcnow().isoformat() + 'Z')}
                for msg in messages
            ],
            "options": {"dedupe": True},
        },
    )

# Usage
user_id = "user_123"

# First-time setup
register_user(user_id, "user@example.com")

# Get the system prompt
system_prompt = get_system_prompt(user_id)

# Chat loop
conversation = []
while True:
    user_input = input("You: ")
    if user_input.lower() == 'quit':
        break

    # Process with context
    enhanced_message = process_message(user_id, conversation, user_input)

    # Get a response from your LLM
    # assistant_response = your_llm_call(system_prompt, enhanced_message)

    # Update the conversation
    conversation.append({"role": "user", "content": user_input})
    # conversation.append({"role": "assistant", "content": assistant_response})

# Save the conversation for learning
save_conversation(user_id, conversation)
API Reference Summary
| Endpoint | Method | Auth | Purpose | Latency |
|-----------|--------|----------|--------------------------|------------|
| /register | POST | Required | Initialize user profile | ~100ms |
| /ingest | POST | Required | Add user data | ~100-500ms |
| /summary | GET | Required | Get profile summary | ~50-200ms |
| /chunks | POST | None | Get relevant context | ~50-150ms |
| /query | POST | None | Ask complex questions | ~5s to 3 minutes (agent runs when cache fails) |
Best Practices
1. Registration
Try to provide a purpose to get better personalization
Include social URLs (LinkedIn, Twitter) for richer Stage 1 data
Register users as soon as they sign up
2. Ingestion
Ingest data continuously as it becomes available
Use dedupe: true to prevent duplicate processing
Include timestamps for all messages
Batch related documents together
3. Retrieval
Use /chunks at every agent turn for grounding
Use /summary once per session for system prompts
Use /query as a tool call for complex questions
Set an appropriate similarity_threshold (0.25-0.35 for broad recall, 0.4-0.5 for precise matching); see the sketch after this list
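As an illustration, the same /chunks call tuned for broad versus precise retrieval (BASE_URL and HEADERS as in the setup sketch earlier; the two threshold values are simply drawn from the ranges suggested above):
def get_chunks(user_id, history, precise=False):
    # Lower thresholds cast a wider net; higher thresholds return only close matches
    threshold = 0.45 if precise else 0.3
    resp = requests.post(f"{BASE_URL}/chunks", headers=HEADERS, json={
        "user_id": user_id,
        "history": history,
        "k": 6,
        "similarity_threshold": threshold,
    })
    resp.raise_for_status()
    return resp.json()["chunks"]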
4. Performance
Cache profile summaries per session
Use use_cache: true for queries (default)
Consider excluding already-used chunks with exclude_chunk_ids, as sketched after this list
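One way to apply exclude_chunk_ids across turns is to track the IDs already surfaced in a session and pass them on subsequent calls; the seen_ids bookkeeping here is illustrative, not part of the API:
seen_ids = set()  # chunk IDs already shown to the LLM this session

def get_fresh_chunks(user_id, history):
    resp = requests.post(f"{BASE_URL}/chunks", headers=HEADERS, json={
        "user_id": user_id,
        "history": history,
        "k": 6,
        "exclude_chunk_ids": list(seen_ids),  # skip chunks used in earlier turns
    })
    chunks = resp.json()["chunks"]
    seen_ids.update(c["id"] for c in chunks)
    return chunks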
5. Privacy
All data is automatically anonymized with GLiNER-2