Building a Memory-Driven AI Homelab: DGX Spark, Knowledge Graphs, and 24 Containers From Soup to Nuts
A surgical deep-dive into running an NVIDIA DGX Spark, multi-agent AI orchestration, temporal knowledge graphs, and 24 Docker containers on Unraid — all wired together with MCP servers, HashiCorp Vault, and a custom API layer that gives AI agents persistent memory.
Most homelabs stop at Plex and Pi-hole. This one runs an NVIDIA DGX Spark serving two local LLMs, a multi-agent AI system with four specialized sub-agents, a temporal knowledge graph that gives AI persistent memory across sessions, 24 Docker containers on Unraid, an OPNsense firewall managing TLS termination for 16 services, and a secret management layer that would make an enterprise security team nod approvingly.
This post documents every layer of the architecture — physical hardware, network topology, container orchestration, AI agent routing, knowledge graph internals, and the MCP server mesh that ties it all together. No hand-waving. No “just deploy this Helm chart.” Every IP address, every config decision, every hack required to make Claude talk to a graph database through an OpenAI-compatible proxy.
The 30-Second Overview
┌─────────────────────────────────────────────────────────────────┐
│ PHYSICAL LAYER │
│ │
│ ┌──────────────┐ ┌──────────────────┐ ┌───────────────────┐ │
│ │ OPNsense │ │ Unraid Tower │ │ NVIDIA DGX Spark │ │
│ │ Firewall │ │ 24 containers │ │ (spanky1) │ │
│ │ 10.0.1.2 │ │ 10.0.128.2 │ │ 10.0.128.196 │ │
│ │ │ │ │ │ │ │
│ │ Caddy TLS │ │ br0 macvlan │ │ Qwen3-32B (vLLM) │ │
│ │ 16 proxies │ │ 10.0.3.x IPs │ │ Qwen2.5-7B │ │
│ │ WireGuard │ │ AI + Media + │ │ Whisper STT │ │
│ │ DNS/DHCP │ │ Infra services │ │ 128GB unified │ │
│ └──────────────┘ └──────────────────┘ └───────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────────┐
│ AI AGENT LAYER │
│ │
│ ┌─────────────────────┐ ┌──────────────────────────────┐ │
│ │ Agent-API │ │ OpenClaw │ │
│ │ 10.0.3.85 │◄────►│ 10.0.3.87 │ │
│ │ 4 sub-agents │ │ Sparky + Dev agents │ │
│ │ PydanticAI + tools │ │ 24 skills combined │ │
│ │ Groq/vLLM/OpenRouter│ │ Claude Sonnet/Opus 4.6 │ │
│ └─────────────────────┘ └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────────┐
│ MEMORY + DATA LAYER │
│ │
│ ┌──────────────┐ ┌──────────────────┐ ┌───────────────────┐ │
│ │ Graphiti │ │ TEI Embeddings │ │ HashiCorp Vault │ │
│ │ + FalkorDB │ │ 10.0.3.89 │ │ 10.0.3.75 │ │
│ │ 10.0.3.88 │ │ all-MiniLM-L6 │ │ Secrets + PKI │ │
│ │ Knowledge │ │ 384-dim vectors │ │ AppRole auth │ │
│ │ Graph + MCP │ │ │ │ │ │
│ └──────────────┘ └──────────────────┘ └───────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Three hosts. Twenty-four containers. Two local LLMs. Three cloud LLM providers. One knowledge graph. Zero cloud dependency for core operations.
Part 1: The Hardware
NVIDIA DGX Spark — The Local AI Powerhouse
The DGX Spark (hostname: spanky1, IP: 10.0.128.196) is the compute backbone. It’s a Grace Blackwell GB10 with 128GB of unified memory running Ubuntu 24.04 on ARM64.
Three systemd services run continuously:
| Service | Model | GPU Memory | Context | Port |
|---|---|---|---|---|
| vllm-qwen3 | Qwen3-32B (FP8) | 70% (~90GB) | 32,768 tokens | 8000 |
| vllm-router | Qwen2.5-7B-Instruct (FP8) | 15% (~19GB) | 4,096 tokens | 8002 |
| whisper | faster-whisper base (CPU) | None | — | 8003 |
Both vLLM instances run with --enforce-eager (no CUDA graph compilation for the GB10 architecture), FP8 quantization with FP8 KV cache, Triton attention backend, and Hermes tool-call parsing enabled for function calling.
The Qwen3-32B handles complex reasoning, code generation, and multi-turn conversations. The Qwen2.5-7B runs the agent router — a lightweight classifier that decides which sub-agent handles each user query. It processes classification requests in under 200ms, which means the routing decision adds negligible latency to every interaction.
Whisper handles speech-to-text for voice commands, running on CPU with int8 quantization to leave GPU memory for the LLMs.
Unraid Tower — The Container Mothership
The Unraid NAS (tower.local.lan, 10.0.128.2) runs all 24 Docker containers across a br0 macvlan network. Every container gets its own IP on the 10.0.3.0/16 subnet, communicating directly at Layer 2 without NAT.
OPNsense — Firewall, DNS, TLS Termination
OPNsense (10.0.1.2) handles routing between subnets, Kea DHCPv4 for all leases, Unbound DNS for local resolution, WireGuard tunnels, and — critically — Caddy reverse proxy for TLS termination of all 16 internal services.
Every *.int.vitalemazo.com domain terminates TLS at Caddy on OPNsense using ACME certificates with Cloudflare DNS challenge. No self-signed certs. No certificate warnings.
Part 2: Network Architecture
Subnet Layout
┌─────────────────────────────────────────────┐
│ Network Topology │
│ │
│ 10.0.1.0/24 ─── OPNsense management │
│ 10.0.3.0/16 ─── br0 macvlan (containers) │
│ 10.0.5.0/24 ─── IoT devices │
│ 10.0.128.0/24 ─── Compute (DGX Spark) │
└─────────────────────────────────────────────┘
The br0 Macvlan — Every Container Is a First-Class Citizen
All 24 containers on Unraid share the br0 macvlan network. Each gets a unique 10.0.3.x IP address. This means:
- Containers communicate directly at Layer 2 — no Docker bridge NAT
- Each container is addressable by IP from anywhere on the network
- OPNsense firewall rules do not apply to same-subnet L2 traffic
- Security between containers is application-layer: bearer tokens, API keys, IP allowlists
This is a deliberate tradeoff. Macvlan gives clean networking and easy addressability at the cost of no implicit inter-container firewall. For a homelab where every service is authenticated, that’s acceptable.
IP Assignment Map
AI / Agent Stack Infrastructure
────────────────── ─────────────────────
10.0.3.85 Agent-API 10.0.3.75 Vault
10.0.3.86 Agent-Chat (Web UI) 10.0.3.25 Home Assistant
10.0.3.87 OpenClaw 10.0.3.20 Mosquitto MQTT
10.0.3.88 Graphiti + FalkorDB 10.0.3.21 RYSE MQTT Bridge
10.0.3.89 TEI Embeddings 10.0.3.30 Docker Registry
10.0.3.90 CLI Proxy API 10.0.3.31 Registry UI
10.0.3.32 Registry Proxy
Media Stack 10.0.3.33 OAuth2 Proxy
────────────────── 10.0.3.66 Cloudflared Tunnel
10.0.3.13 Plex
10.0.3.11 Sonarr IoT / Compute
10.0.3.10 Radarr ─────────────────────
10.0.3.9 Prowlarr 10.0.5.16 RYSE SmartBridge
10.0.3.8 Overseerr 10.0.128.196 DGX Spark
10.0.3.7 Lidarr
10.0.3.5 Deluge
10.0.3.12 FlareSolverr
Caddy Reverse Proxy — 16 Services, One Wildcard
Caddy on OPNsense terminates TLS for every internal service:
vault.int.vitalemazo.com → 10.0.3.75:8200
ha.int.vitalemazo.com → 10.0.3.25:8123
plex.int.vitalemazo.com → 10.0.3.13:32400
agent.int.vitalemazo.com → 10.0.3.85:8888
openclaw.int.vitalemazo.com → 10.0.3.87:18789 (+ OAuth2 proxy)
spanky1-llm.int.vitalemazo.com → 10.0.128.196:8000
...and 10 more
OpenClaw sits behind an additional OAuth2 proxy layer (Google OAuth) via Caddy’s forward_auth directive. Every other service authenticates at the application layer.
Part 3: The Multi-Agent AI System
Agent-API — The Brain Router
The Agent-API (10.0.3.85:8888) is a custom Python application built on PydanticAI that routes every user query to the right specialist.
User Query
│
▼
┌─────────────────────────────────────────────────┐
│ ROUTER (Qwen2.5-7B on DGX Spark, port 8002) │
│ Classifies: infra | home | github | general │
│ Fallback: keyword matching if model unreachable │
└─────────────────────┬───────────────────────────┘
│
┌────────────┼────────────┬──────────────┐
▼ ▼ ▼ ▼
┌────────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ INFRA │ │ HOME │ │ GITHUB │ │ GENERAL │
│ 25 tools │ │ 8 tools │ │ MCP │ │ 10 tools │
│ │ │ │ │ Server │ │ │
│ Groq Scout │ │ Qwen3-32B│ │ GPT-OSS │ │ Qwen3-32B│
│ → GPT-OSS │ │ → Groq │ │ → Groq │ │ → GPT-OSS│
└────────────┘ └──────────┘ └──────────┘ └──────────┘
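As the diagram notes, the router falls back to keyword matching when the classifier model is unreachable. A toy Python sketch of that fallback, with invented keyword tables (the real Agent-API’s lists differ):

```python
# Hypothetical keyword fallback for the router. The keyword tables and
# scoring are illustrative, not the actual Agent-API implementation.
AGENT_KEYWORDS = {
    "infra": ["docker", "container", "firewall", "dns", "ssh", "vault", "wireguard"],
    "home": ["light", "shade", "thermostat", "sonos", "scene", "automation"],
    "github": ["repo", "pull request", "issue", "commit", "branch"],
}

def route_query(query: str) -> str:
    """Pick the sub-agent whose keywords match the query most often."""
    q = query.lower()
    scores = {
        agent: sum(kw in q for kw in kws)
        for agent, kws in AGENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No keyword hit at all -> fall through to the general agent.
    return best if scores[best] > 0 else "general"
```

Crude, but it keeps the system answering when the DGX Spark is down.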
Each sub-agent has a fallback chain. If the primary model is unreachable or returns an error, the agent automatically retries with the next provider:
| Agent | Primary | Fallback | Tools |
|---|---|---|---|
| Infrastructure | Groq (Llama 4 Scout) | GPT-OSS-120B (OpenRouter) | SSH, OPNsense API (10 tools), Terraform, Docker Registry, Cloudflare DNS |
| Home | Qwen3-32B (local) | Groq → GPT-OSS | HA entity control, state queries, automations, history |
| GitHub | GPT-OSS-120B | Groq | GitHub MCP Server (repos, issues, PRs) — 131K context for large diffs |
| General | Qwen3-32B (local) | GPT-OSS-120B | Time, weather, ping, news, web search, RAG (ChromaDB), Vault secrets |
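The fallback chain itself is simple to sketch. A minimal, hypothetical version, assuming each provider is exposed as a callable that raises on failure:

```python
from typing import Callable

# Illustrative primary/fallback chain. Provider names mirror the table
# above; the wiring is a sketch, not the Agent-API's actual code.
def complete_with_fallback(prompt: str,
                           providers: list[tuple[str, Callable[[str], str]]]) -> str:
    """Try each provider in order; return the first successful response."""
    errors = []
    for name, call in providers:
        try:
            return call(prompt)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```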
The Home agent has a regex fast path. Simple commands like “turn on the kitchen light” or “close the shades” bypass the LLM entirely — a regex parser extracts the action and entity, calls Home Assistant directly, and returns in under 500ms. The LLM only activates for complex queries like “which lights have been on for more than 2 hours?”
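A stripped-down illustration of such a fast path (the pattern and action names here are invented, not the Home agent’s actual parser):

```python
import re

# Hypothetical regex fast path for simple home commands. Anything that
# doesn't match falls through (returns None) and goes to the LLM.
FAST_PATH = re.compile(
    r"^(?:please\s+)?(turn\s+(?:on|off)|open|close)\s+(?:the\s+)?(.+?)\s*$",
    re.IGNORECASE,
)

def parse_command(text: str):
    """Return (action, target) for simple commands, or None to use the LLM."""
    m = FAST_PATH.match(text.strip())
    if not m:
        return None
    action = m.group(1).lower().replace(" ", "_")  # "turn on" -> "turn_on"
    return action, m.group(2).lower()
```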
The Monkey-Patches
When you wire together models from Groq, OpenRouter, and a local vLLM instance through the OpenAI SDK, you hit compatibility issues:
- Groq returns service_tier: "on_demand" in chat completions. The OpenAI SDK’s Pydantic model rejects this. Fix: patch ChatCompletion.model_fields["service_tier"] to accept the value.
- Groq sends null tool arguments. GPT-OSS sends {"": {}} for parameterless tools. Neither is valid per the OpenAI spec. Fix: patch ToolManager._validate_tool_args to normalize both patterns.
These are two lines of monkey-patching that save hundreds of error-handling branches.
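As a standalone illustration of the second fix, normalizing the quirky tool arguments can be as small as this (the real patch lands inside ToolManager._validate_tool_args; this version is a hypothetical sketch):

```python
import json

# Coerce provider quirks into valid OpenAI-style tool arguments:
# Groq's null and GPT-OSS's {"": {}} both become an empty dict.
def normalize_tool_args(raw):
    if raw is None:                      # Groq: null arguments
        return {}
    args = json.loads(raw) if isinstance(raw, str) else raw
    if args in (None, {"": {}}):         # GPT-OSS: {"": {}} for no-arg tools
        return {}
    return args
```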
Authentication and Rate Limiting
Every Agent-API endpoint (except /api/health) requires a bearer token. Tokens are stored in HashiCorp Vault at secret/agent-api/keys — two keys: personal (for direct API access) and openclaw (for the OpenClaw platform).
Rate limiting: 30 requests/minute per key, maximum 2 concurrent requests per key. Sessions expire after 2 hours or 20 messages per agent history.
Part 4: OpenClaw — The Agent Platform
OpenClaw (10.0.3.87) is the user-facing platform. It provides a web chat interface, agent lifecycle management, skill systems, and cron-driven autonomous behaviors.
Browser (openclaw.int.vitalemazo.com)
│
▼
┌─────────────────────────────────────────────┐
│ OpenClaw Gateway (port 18789) │
│ OAuth2 proxy → Caddy forward_auth │
│ │
│ ┌───────────────┐ ┌────────────────────┐ │
│ │ SPARKY │ │ DEV │ │
│ │ Claude │ │ Claude │ │
│ │ Sonnet 4.6 │ │ Opus 4.6 │ │
│ │ 16 skills │ │ 8 skills │ │
│ │ Home + Infra │ │ Development │ │
│ │ focus │ │ focus │ │
│ └───────────────┘ └────────────────────┘ │
└─────────────────────────────────────────────┘
│ │
▼ ▼
cli-proxy-api (10.0.3.90:8317)
│
▼
Anthropic Claude API
Two Agents, Different Roles
Sparky is the home and infrastructure assistant. It has 16 skills covering everything from controlling Sonos speakers and RYSE window shades to querying OPNsense firewall rules and managing Unraid containers. It runs on Claude Sonnet 4.6 for a balance of speed and capability, and has a heartbeat that triggers every 30 minutes during waking hours (8am–11pm) for proactive monitoring.
Dev is the software development agent. It runs on Claude Opus 4.6 for maximum reasoning capability and has skills for autonomous coding loops (clone, build, test, lint, ship), project bootstrapping, and GitHub integration. Sandbox is completely off — it has full read, write, edit, and exec access to its workspace.
The Skill System
Skills are markdown files (SKILL.md) that teach agents how to use specific tools. Some examples from Sparky’s 16 skills:
- homelab-bridge: Proxies requests to the Agent-API for infrastructure/HA/GitHub operations
- knowledge-graph: Stores and retrieves facts from the Graphiti temporal knowledge graph
- opnsense: Directly queries the OPNsense REST API for firewall rules and DHCP leases
- ryse-shades: Controls RYSE SmartBridge window shades (with the workaround that close_cover doesn’t work — only set_cover_position to 0)
- vault-secrets: CRUD operations on HashiCorp Vault secrets
- sonoscli: Speaker control (play, pause, volume, grouping)
- proactive-agent: Autonomous behavior triggered by cron heartbeats
- self-improving-agent: Learns from errors and corrections to improve future responses
The Claude Proxy
Both agents talk to Claude through cli-proxy-api at 10.0.3.90:8317 — an OpenAI-compatible proxy that forwards requests to Anthropic’s API. This means OpenClaw doesn’t need direct Anthropic API credentials; it speaks the OpenAI API format, and the proxy handles translation.
# OpenClaw model provider config
provider: "claude-proxy"
base_url: "http://10.0.3.90:8317/v1"
models:
- claude-sonnet-4-6 # Sparky
- claude-opus-4-6 # Dev
context_window: 200000
max_output: 16384
Part 5: The Knowledge Graph — Giving AI Persistent Memory
This is where it gets interesting. Most AI agents are stateless — every conversation starts from zero. Graphiti gives our agents a temporal knowledge graph that accumulates entities, relationships, and facts across every interaction.
Architecture
Agent calls graphiti-cli
│
▼
graphiti-cli (shell script)
│
│ HTTP POST (MCP Streamable HTTP protocol)
▼
┌──────────────────────────────────────────────┐
│ Graphiti MCP Server (10.0.3.88:8000) │
│ │
│ ┌──────────────┐ ┌───────────────────┐ │
│ │ Episode │ │ Entity Extraction │ │
│ │ Ingestion │────►│ Claude Sonnet 4.6 │ │
│ │ │ │ ~8-10 LLM calls │ │
│ └──────────────┘ │ per episode │ │
│ └───────────────────┘ │
│ ┌──────────────┐ ┌───────────────────┐ │
│ │ Semantic │ │ TEI Embeddings │ │
│ │ Search │────►│ 10.0.3.89:8080 │ │
│ │ │ │ all-MiniLM-L6-v2 │ │
│ └──────────────┘ │ 384 dimensions │ │
│ └───────────────────┘ │
│ ┌──────────────────────────────────────────┐ │
│ │ FalkorDB (embedded, port 6379) │ │
│ │ Graph storage: nodes, edges, temporal │ │
│ │ metadata, vector embeddings │ │
│ └──────────────────────────────────────────┘ │
└──────────────────────────────────────────────┘
How Memory Works
When an agent learns something important — a deployment outcome, a user preference, an infrastructure fact — it calls graphiti-cli add with a text description and a group ID.
graphiti-cli add "Deployed Graphiti at 10.0.3.88 with FalkorDB \
and TEI embeddings on March 6th 2026" infra
Here’s what happens in the next ~15 seconds:
- Episode creation: The text is stored as an episode in FalkorDB with a timestamp and group ID
- Entity extraction: Claude Sonnet 4.6 analyzes the text and extracts entities with types:
  - Graphiti → Organization
  - FalkorDB → Organization
  - 10.0.3.88 → Location
  - TEI embeddings → Topic
  - March 6th 2026 → Event
- Relationship extraction: Claude identifies relationships between entities:
  - Graphiti —deployed_at→ 10.0.3.88
  - Graphiti —uses→ FalkorDB
  - Graphiti —uses→ TEI embeddings
- Embedding generation: Each entity and relationship gets a 384-dimensional vector from the TEI server
- Graph storage: Nodes, edges, and vectors are persisted in FalkorDB
When an agent needs to recall information:
graphiti-cli search-facts "what database does Graphiti use" infra
This performs both semantic search (vector similarity via TEI embeddings) and graph traversal (following relationships in FalkorDB) to return relevant facts with temporal context.
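The vector half of that recall is ordinary cosine similarity over embeddings. A toy illustration, with made-up two-dimensional vectors standing in for the 384-dimensional TEI output:

```python
import math

# Rank stored facts by cosine similarity to a query embedding.
# In the real system the embeddings come from the TEI server.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_facts(query_vec, facts):
    """facts: list of (text, embedding). Returns facts sorted by similarity."""
    return sorted(facts, key=lambda f: cosine(query_vec, f[1]), reverse=True)
```

Graph traversal then widens the result set by following edges from the top-ranked nodes.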
Entity Types
The knowledge graph automatically categorizes extracted entities:
| Type | Description | Examples |
|---|---|---|
| Preference | User choices and opinions | “Prefers dark mode”, “Uses Qwen3 for routing” |
| Requirement | Needs and specs | “Must support 200K context”, “Needs FP8 quantization” |
| Procedure | Workflows and commands | “Delete wlan0 route after reboot”, “Deploy with docker run” |
| Location | Physical and network locations | “10.0.3.88”, “tower”, “DGX Spark” |
| Event | Deployments, changes, incidents | “Deployed March 6th”, “Fixed embedder base_url” |
| Organization | Services and systems | “FalkorDB”, “OpenClaw”, “Graphiti” |
| Document | Files and configs | “config.yaml”, “deploy.sh”, “SOUL.md” |
| Topic | Concepts and technologies | “Temporal knowledge graph”, “macvlan networking” |
Group IDs — Cross-Agent Memory
Both agents read and write to the same graph but tag episodes with different group IDs:
- sparky — Sparky’s observations and decisions
- dev — Dev’s coding context and project knowledge
- infra — Shared infrastructure facts
This means Dev can recall what Sparky learned about a network issue, and Sparky can reference code decisions Dev made. The knowledge graph is shared; the group IDs provide attribution and scoping for search.
The Patches That Made It Work
Graphiti’s MCP server is designed for native OpenAI APIs. Making it work with Claude through an OpenAI-compatible proxy required patching three Python files.
Problem 1: Embeddings routing. Graphiti uses the OpenAI SDK for embeddings, which picks up the OPENAI_BASE_URL environment variable. That points at the Claude proxy (10.0.3.90:8317), but embeddings need to go to the TEI server (10.0.3.89:8080). The factory code doesn’t pass base_url separately.
Fix: Patched factories.py to extract api_url from the embedder’s provider config and pass it explicitly to OpenAIEmbedderConfig(base_url=...).
Problem 2: Structured output validation. Graphiti uses OpenAI’s responses.parse() for structured output — schema validation happens inside the SDK before our code runs. Claude returns JSON wrapped in markdown code fences (```json ... ```), wrong field names (entities instead of extracted_entities), and bare lists instead of objects. All of these fail SDK validation.
Fix: Rewrote openai_client.py to use chat.completions.create() instead of responses.parse(). The JSON schema gets injected as text in the system prompt. A custom response parser strips code fences, remaps field names using fuzzy matching, and auto-wraps bare lists into the expected object structure by inspecting the Pydantic response model’s field types.
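A condensed, hypothetical version of that parser, handling the three failure modes (code fences, the wrong field name, bare lists) without the fuzzy matching:

```python
import json
import re

# Sketch of the custom response parser. Field names follow the examples
# in the text; the real implementation remaps via fuzzy matching against
# the Pydantic model's fields.
FENCE = re.compile(r"^```(?:json)?\s*|\s*```$")

def parse_structured(raw: str, list_field: str = "extracted_entities") -> dict:
    text = FENCE.sub("", raw.strip())        # strip markdown code fences
    data = json.loads(text)
    if isinstance(data, list):               # bare list -> wrap in expected object
        return {list_field: data}
    if "entities" in data and list_field not in data:
        data[list_field] = data.pop("entities")   # remap wrong field name
    return data
```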
Problem 3: Small model fallback. Graphiti uses a “small model” (defaulting to gpt-4.1-mini) for lightweight operations. The Claude proxy doesn’t serve that model.
Fix: Patched factories.py to detect non-OpenAI model names and set small_model = config.model — use Claude for everything.
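That check is a one-liner in spirit. A hedged sketch, with an invented list of OpenAI-style prefixes:

```python
# Hypothetical model-name check: if the configured model isn't an OpenAI
# family name, reuse it for "small model" duties too.
OPENAI_PREFIXES = ("gpt-", "o1", "o3", "text-")

def pick_small_model(model: str, default_small: str = "gpt-4.1-mini") -> str:
    if model.startswith(OPENAI_PREFIXES):
        return default_small   # native OpenAI backend: keep the cheap model
    return model               # proxy backend: one model for everything
```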
These three patched files are bind-mounted into the container, overriding the originals at runtime.
Part 6: Secret Management with HashiCorp Vault
Every API key, token, and credential in this infrastructure lives in HashiCorp Vault (10.0.3.75).
┌─────────────────────────────────────────┐
│ HashiCorp Vault (10.0.3.75:8200) │
│ │
│ Auth: AppRole (claude-code role) │
│ Storage: File backend (encrypted) │
│ │
│ secret/api-keys → Groq, │
│ Gemini, │
│ Cerebras, │
│ SambaNova, │
│ OpenRouter │
│ secret/agent-api/keys → API auth │
│ secret/homeassist → HA token │
│ secret/opnsense/api → OPNsense │
│ secret/github/pat → GitHub PAT │
│ secret/cloudflare/* → CF tokens │
│ secret/openclaw/gateway → GW token │
│ secret/docker/registry → Registry │
│ secret/terraform/* → TFC tokens │
│ secret/aws/credentials → AWS keys │
└─────────────────────────────────────────┘
▲ ▲ ▲
│ │ │
Agent-API OpenClaw Claude Code
(AppRole (scoped (AppRole
auto- read-only full
refresh) 15m tokens) access)
No Hardcoded Secrets
The Agent-API authenticates to Vault using AppRole with automatic token refresh. At startup, it exchanges a Role ID and Secret ID for a renewable token (1-hour TTL, extendable to 4 hours). Every API key — Groq, OpenRouter, GitHub, Home Assistant, OPNsense, Cloudflare — is fetched from Vault at runtime.
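The AppRole exchange is a single POST to auth/approle/login. A network-free sketch of the payload Vault expects and the fields read back from its reply (the response shape follows Vault’s documented login output; no live server involved):

```python
# Build the AppRole login body and extract the token from Vault's reply.
# Pure functions only; the HTTP call itself is omitted.
def approle_login_payload(role_id: str, secret_id: str) -> dict:
    return {"role_id": role_id, "secret_id": secret_id}

def extract_token(response: dict) -> tuple[str, int]:
    """Return (client_token, lease_duration_seconds) from a login reply."""
    auth = response["auth"]
    return auth["client_token"], auth["lease_duration"]
```

The client renews the returned token before its lease expires, up to the 4-hour maximum.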
OpenClaw gets scoped access through a special endpoint (/api/internal/token) on the Agent-API that mints short-lived Vault tokens with a readonly policy and 15-minute TTL. This endpoint is IP-restricted to OpenClaw’s container (10.0.3.87).
Vault MCP Server
Claude Code (my local CLI) connects to Vault through an MCP server — a Go binary that provides read_secret, write_secret, list_secrets, and delete_secret tools, plus full PKI certificate management. This means I can say “store this API key in Vault” in a Claude Code session, and it happens without me ever touching the Vault UI.
Part 7: Home Automation Integration
Home Assistant + MQTT + RYSE Shades
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Agent-API │ │ Home │ │ Mosquitto │
│ Home Agent │────►│ Assistant │ │ MQTT Broker │
│ 10.0.3.85 │REST │ 10.0.3.25 │ │ 10.0.3.20 │
└──────────────┘ └──────┬───────┘ └──────┬───────┘
│ │
│ │
┌──────┴────────────────────┘
│ MQTT
▼
┌──────────────┐ ┌──────────────┐
│ RYSE MQTT │ │ RYSE Smart │
│ Bridge │────►│ Bridge │
│ 10.0.3.21 │ │ 10.0.5.16 │
└──────────────┘ └──────────────┘
The Home Agent has 8 tools for interacting with Home Assistant via its REST API. The standout is ha_control — a combined find-and-control tool that uses fuzzy entity matching with difflib.SequenceMatcher. You can say “turn on the kitchen light” even if the entity is named light.kitchen_main_overhead — it’ll find the closest match.
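A minimal version of that fuzzy match using difflib (entity IDs here are examples; ha_control’s actual scoring may differ):

```python
from difflib import SequenceMatcher

# Pick the Home Assistant entity whose friendly name best matches what
# the user said, using difflib similarity ratios.
def best_entity(spoken: str, entity_ids: list[str]) -> str:
    def score(entity_id: str) -> float:
        # Compare against the name part, e.g. "kitchen_main_overhead"
        name = entity_id.split(".", 1)[-1].replace("_", " ")
        return SequenceMatcher(None, spoken.lower(), name).ratio()
    return max(entity_ids, key=score)
```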
The RYSE SmartBridge integration deserves special mention. The bridge controls motorized window shades but has a quirk: the standard close_cover service doesn’t work. The agent has learned (and stored in the knowledge graph) that only set_cover_position with position 0 reliably closes the shades. This is exactly the kind of operational knowledge that the temporal knowledge graph preserves across sessions.
Part 8: MCP Servers — The Connective Tissue
Model Context Protocol (MCP) servers provide tool interfaces that AI agents can discover and use. Six MCP servers are configured across the system:
| Server | Runtime | Purpose |
|---|---|---|
| OPNsense | Native binary | Firewall rules, DHCP leases, DNS, WireGuard, diagnostics |
| Vault | Go binary | Secret CRUD, PKI certificate management |
| SSH | Native binary | Remote command execution on known hosts |
| Browser | Native binary | Web page interaction and automation |
| GitHub | Stdio (in Agent-API) | Repository, issue, and PR management |
| Graphiti | HTTP (10.0.3.88:8000) | Knowledge graph read/write via MCP protocol |
The OPNsense MCP server is particularly powerful — it exposes tools for managing firewall aliases, filter rules, Kea DHCP reservations, WireGuard peers, Unbound DNS overrides, firmware updates, and system diagnostics. Instead of SSH-ing into the firewall and running pfctl commands, I tell Claude “block traffic from this IP range” and the MCP server handles the API calls.
MCP Transport: Stdio vs HTTP
Most MCP servers use stdio transport — they run as child processes that communicate over stdin/stdout. This is fine for single-client use (Claude Code on my Mac).
Graphiti uses Streamable HTTP transport — it’s a network service at 10.0.3.88:8000/mcp that multiple clients can connect to simultaneously. The graphiti-cli shell script handles the MCP session lifecycle: initialize a session (get a session ID from the response headers), call tools with that session ID, parse JSON-RPC responses.
# Simplified graphiti-cli flow
SESSION_ID=$(curl -si -X POST "$URL" \
-d '{"jsonrpc":"2.0","method":"initialize",...}' \
| grep -i "mcp-session-id:" | sed "s/^[^:]*: *//" | tr -d "\r\n")
curl -X POST "$URL" \
-H "mcp-session-id: $SESSION_ID" \
-d '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"add_episode",...}}'
Part 9: The Complete Data Flow
Here’s what happens when you type “Remember that the DGX Spark runs Qwen3-32B at 10.0.128.196” into the OpenClaw chat:
1. Browser → Caddy (TLS) → OAuth2 Proxy → OpenClaw Gateway
2. Gateway → Sparky Agent (Claude Sonnet 4.6 via cli-proxy-api)
3. Sparky recognizes this as a memory storage request
4. Sparky invokes the knowledge-graph skill
5. Skill calls: graphiti-cli add "DGX Spark runs Qwen3-32B at 10.0.128.196" infra
6. graphiti-cli → HTTP POST → Graphiti MCP Server (10.0.3.88)
7. Graphiti queues episode for processing
8. Entity extraction begins (Claude Sonnet 4.6 via cli-proxy-api):
├── Extract entities: DGX Spark (Location), Qwen3-32B (Topic),
│ 10.0.128.196 (Location)
├── Extract relationships: DGX_Spark —runs→ Qwen3-32B,
│ DGX_Spark —has_ip→ 10.0.128.196
└── Generate embeddings via TEI (10.0.3.89)
9. Store in FalkorDB (nodes + edges + vectors)
10. Sparky confirms: "Stored that fact in the knowledge graph."
Later, when you ask “What’s running on the DGX Spark?”:
1. Browser → Caddy → OAuth2 → OpenClaw → Sparky
2. Sparky invokes knowledge-graph skill
3. Skill calls: graphiti-cli search-facts "DGX Spark" infra
4. graphiti-cli → Graphiti MCP → TEI (embed query) → FalkorDB
5. FalkorDB returns: vector-similar nodes + graph-connected relationships
6. Sparky receives: "DGX Spark runs Qwen3-32B at 10.0.128.196"
7. Sparky answers with recalled context
The round-trip for recall is under 3 seconds. Storage takes ~15 seconds due to the entity extraction LLM calls.
Part 10: What This Enables
This isn’t infrastructure for its own sake. Here’s what the stack actually does in daily use:
“Turn off the office lights and close the shades” → Home Agent regex fast-path → Home Assistant → lights off in 500ms, then set_cover_position to 0 via RYSE MQTT bridge.
“What containers are running on tower?” → Infrastructure Agent → SSH to tower → docker ps → formatted response with status, IPs, and uptime.
“Create a WireGuard peer for my new laptop” → Infrastructure Agent → OPNsense API → new peer config generated and displayed.
“Review the latest PR on the agent-api repo” → GitHub Agent → GitHub MCP Server → PR diff fetched (131K context window handles large diffs) → detailed review with line-specific comments.
“What did we deploy last week?” → Sparky → Knowledge Graph → temporal query across episodes → list of deployments with dates, IPs, and outcomes.
“Remember that the wlan0 route on tower breaks DGX connectivity after reboot” → Knowledge Graph → stored as Procedure entity → recalled automatically next time DGX connectivity fails.
The knowledge graph is the force multiplier. Without it, every session starts cold. With it, the agents accumulate operational knowledge that compounds over time. Three months from now, these agents will know the history of every deployment, every workaround, every preference — without anyone maintaining a wiki.
Lessons Learned
Local LLMs change the economics. The DGX Spark running Qwen3-32B handles 80% of agent queries without touching a cloud API. Cloud LLMs (Groq, OpenRouter) are fallbacks, not defaults. Claude via the proxy is reserved for where it matters most: entity extraction (Graphiti) and complex reasoning (OpenClaw’s Opus agent).
Macvlan networking is worth the tradeoff. Clean IPs, no NAT, easy debugging. The loss of inter-container firewall rules is acceptable when every service authenticates at the application layer.
MCP servers are the right abstraction. Instead of building custom integrations for every tool, MCP provides a standard interface that any LLM client can discover and use. Adding a new capability means deploying one MCP server, not modifying every agent.
Patching upstream code is sometimes the only option. When the Graphiti image assumes native OpenAI APIs and you’re running Claude through a proxy, you patch. Three bind-mounted Python files is less maintenance than a fork.
Vault from day one. Every secret in one place with audit logs and short-lived tokens. The initial setup takes an afternoon. The payoff is never wondering where an API key lives or whether it’s been rotated.
The Numbers
| Metric | Value |
|---|---|
| Physical hosts | 3 (OPNsense, Unraid, DGX Spark) |
| Docker containers | 24 |
| Local LLMs | 2 (Qwen3-32B, Qwen2.5-7B) |
| Cloud LLM providers | 3 (Groq, OpenRouter, Anthropic) |
| AI sub-agents | 4 (infra, home, github, general) |
| OpenClaw agents | 2 (Sparky, Dev) |
| Combined skills | 24 |
| MCP servers | 6 |
| Caddy reverse proxy entries | 16 |
| Vault secret paths | 15+ |
| Knowledge graph entity types | 8 |
| Total agent tools | 50+ |
| GPU memory allocated | 109GB (of 128GB) |
Three hosts. Twenty-four containers. Fifty tools. One knowledge graph. Zero manual memory management.
The agents remember. The graph grows. The homelab learns.