Integrations

Integrating with OpenAI Realtime

The OpenAI Realtime API lets you build conversational agents that stream messages and audio in real time. By integrating the Fastino Personalization API, you can inject live, user-specific context into these Realtime sessions — enabling personalized reasoning, tone adaptation, and proactive responses.

Overview

Fastino acts as a personalization layer for Realtime models.
Each new Realtime session can be initialized with:

  • A user’s summary (identity, tone, goals)

  • Top-k retrieved memories from the user’s world model

  • Adaptive instructions that evolve across sessions

Together, this creates an agent that remembers who the user is and adapts continuously during live interactions.

Architecture

Realtime Session Initialization Flow

  1. Before creating a Realtime connection, fetch the user’s personalization context from Fastino.

  2. Pass this data into your session.update or response.create call as session instructions or a system message.

  3. Optionally, log new messages or corrections back to Fastino after each session for continual learning.
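The three-step flow above can be sketched as a single orchestration function. The helpers below are stubs standing in for the real Fastino and Realtime calls; the function names are illustrative, not part of either API:

```python
def fetch_fastino_context(user_id: str) -> str:
    # Step 1 (stub): in practice, call GET /summary and POST /chunks.
    return f"Summary and top-k memories for {user_id}"

def run_realtime_session(system_context: str, user_message: str) -> str:
    # Step 2 (stub): in practice, open a Realtime connection with the
    # context passed as a system message, then stream the reply.
    return f"[reply grounded in: {system_context}] {user_message}"

def log_session_notes(user_id: str, note: str) -> dict:
    # Step 3 (stub): in practice, POST the note to /personalization/ingest.
    return {"user_id": user_id, "source": "openai_realtime", "note": note}

def run_personalized_turn(user_id: str, user_message: str) -> str:
    context = fetch_fastino_context(user_id)             # 1. fetch context
    reply = run_realtime_session(context, user_message)  # 2. personalized session
    log_session_notes(user_id, f"Session ended after: {user_message}")  # 3. write back
    return reply
```

The real calls for each stub are shown in the example workflow below.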

Prerequisites

  • Fastino API key

  • OpenAI API key with Realtime access

  • Installed dependencies: the openai and requests Python packages

  • A registered Fastino user (via /personalization/users/register)

Example Workflow

Step 1 — Retrieve Personalized Context

Before starting a Realtime session, get the user’s deterministic summary and relevant snippets:

import requests

FASTINO_API = "https://api.fastino.ai"
FASTINO_KEY = "sk_live_123"
HEADERS = {"x-api-key": FASTINO_KEY, "Content-Type": "application/json"}

USER_ID = "usr_42af7c"

# Fetch deterministic summary (authenticated like every other call)
summary = requests.get(
    f"{FASTINO_API}/summary",
    params={"user_id": USER_ID, "purpose": "work-style", "max_chars": 500},
    headers=HEADERS,
).json()["summary"]

# Fetch top-k context
payload = {"user_id": USER_ID, "conversation": [{"role": "user", "content": "Start realtime session"}], "top_k": 3}
rag = requests.post(f"{FASTINO_API}/chunks", json=payload, headers=HEADERS).json()["results"]

context_snippets = [r["excerpt"] for r in rag]

Step 2 — Initialize Realtime Session with Fastino Context

from openai import AsyncOpenAI
import asyncio

client = AsyncOpenAI(api_key="sk-openai-xyz")

SYSTEM_PROMPT = f"""
You are a personalized assistant for user {USER_ID}.
User summary:
{summary}

Context snippets:
{context_snippets}
"""

async def main():
    async with client.beta.realtime.connect(
        model="gpt-4o-realtime-preview"
    ) as connection:
        # Inject the Fastino context as session-level instructions
        await connection.session.update(
            session={"modalities": ["text"], "instructions": SYSTEM_PROMPT}
        )
        await connection.conversation.item.create(
            item={
                "type": "message",
                "role": "user",
                "content": [
                    {"type": "input_text", "text": "Hey, can you plan tomorrow around my focus hours?"}
                ],
            }
        )
        await connection.response.create()
        async for event in connection:
            if event.type == "response.text.delta":
                print(event.delta, end="")
            elif event.type == "response.done":
                break

asyncio.run(main())

Now the Realtime model responds with full awareness of the user’s work style, preferences, and routines from their Fastino profile.

Step 3 — Log New Insights Back to Fastino

At the end of a session, you can capture conversational notes or user updates and feed them back into Fastino.

note = "User mentioned a new preference: prefers 30-minute meetings before lunch."
payload = {
  "user_id": USER_ID,
  "source": "openai_realtime",
  "documents": [
    {"doc_id": "realtime_20251027", "kind": "note", "title": "Realtime session", "content": note}
  ]
}
requests.post(f"{FASTINO_API}/personalization/ingest", json=payload, headers=HEADERS)

This keeps the world model continuously synchronized across live sessions.

Using Realtime Streams

Fastino’s context can be updated mid-stream by re-sending the session instructions with newly retrieved snippets or summaries.

Example:

# session.update replaces the instructions, so append updates to the
# original context rather than sending the update alone
await connection.session.update(
    session={"instructions": SYSTEM_PROMPT + "\nUpdate: user focus window now ends at 1:30 PM PT"}
)

Realtime agents can therefore adjust instantly to new context — ideal for assistants managing calendars, calls, or decision workflows.

Use Cases

  • Personalized live chat: feed Fastino summaries into Realtime chat sessions for tone adaptation

  • Voice assistants: stream user context into voice interactions for continuity

  • Scheduling assistants: sync focus blocks and meeting preferences from user world models

  • Decision helpers: recall user priorities in live discussions

  • Contextual escalation: pass summarized user data to another agent or channel seamlessly
Authentication

Both APIs require authenticated requests: Fastino expects an x-api-key header, while OpenAI expects a bearer token in the Authorization header.

You can safely combine them in environment variables:
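For example (the key values shown are placeholders):

```shell
export FASTINO_API_KEY="sk_live_123"
export OPENAI_API_KEY="sk-openai-xyz"
```

Your application can then read both keys at startup (for example via os.environ in Python) instead of hard-coding them in source.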


Error Handling

If Fastino or OpenAI returns an error mid-session, handle gracefully:

{
  "error": {
    "code": "USER_NOT_FOUND",
    "message": "No user found with ID usr_42af7c"
  }
}

Implement retries for transient 5xx responses, and cache the latest Fastino summary locally for fallback continuity.
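As a sketch, a retry wrapper with a cached fallback might look like the following; the /summary call mirrors the earlier example, while the retry counts, backoff, and timeout values are illustrative:

```python
import time
import requests

FASTINO_API = "https://api.fastino.ai"
HEADERS = {"x-api-key": "sk_live_123"}

_cached_summary = ""  # last known-good summary, used as a fallback

def fetch_summary_with_retry(user_id, retries=3, backoff=1.0):
    """Fetch the user's summary, retrying transient 5xx errors with
    exponential backoff and falling back to the cached copy on failure."""
    global _cached_summary
    for attempt in range(retries):
        try:
            resp = requests.get(
                f"{FASTINO_API}/summary",
                params={"user_id": user_id, "purpose": "conversation", "max_chars": 500},
                headers=HEADERS,
                timeout=5,
            )
            if resp.status_code >= 500:          # transient server error: retry
                time.sleep(backoff * (2 ** attempt))
                continue
            resp.raise_for_status()              # 4xx errors are not retried
            _cached_summary = resp.json()["summary"]
            return _cached_summary
        except requests.RequestException:
            time.sleep(backoff * (2 ** attempt))
    return _cached_summary                       # fallback: last good summary
```

On total failure the session still starts with the most recent cached summary, so the agent degrades to slightly stale personalization rather than none.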

Best Practices

  • Fetch summaries with purpose=work-style or purpose=conversation for Realtime.

  • Use short summaries (≤ 1000 chars) for latency-sensitive streaming.

  • Refresh summaries daily to reflect new learning.

  • Stream updates to Fastino every few sessions to reinforce context.

  • Never send raw PII — rely on anonymized user IDs.

Example: Live Conversation Flow

  1. User starts Realtime session → agent fetches Fastino summary.

  2. Fastino returns key preferences (tone, schedule, relationships).

  3. Agent personalizes its reasoning mid-stream.

  4. After session, agent logs updated notes back into Fastino.

  5. Next Realtime session begins with refined memory.

Summary

Integrating Fastino with OpenAI Realtime enables live, adaptive, and continuous personalization.
Every Realtime conversation becomes context-aware — grounded in the user’s world model and capable of evolving dynamically as new signals are learned.

Next, continue to Personalization Use Cases to explore how to apply these integrations across agents, copilots, and adaptive assistants.
