AI Infrastructure by Vitale Mazo

Building a Memory-Driven AI Homelab: DGX Spark, Knowledge Graphs, and 24 Containers From Soup to Nuts

A surgical deep-dive into running an NVIDIA DGX Spark, multi-agent AI orchestration, temporal knowledge graphs, and 24 Docker containers on Unraid — all wired together with MCP servers, HashiCorp Vault, and a custom API layer that gives AI agents persistent memory.

#AI #Homelab #DGX Spark #Knowledge Graph #MCP #OpenClaw #Unraid #OPNsense #Claude #LLM #Infrastructure #Automation #FalkorDB #Vault


Most homelabs stop at Plex and Pi-hole. This one runs an NVIDIA DGX Spark serving two local LLMs, a multi-agent AI system with four specialized sub-agents, a temporal knowledge graph that gives AI persistent memory across sessions, 24 Docker containers on Unraid, an OPNsense firewall managing TLS termination for 16 services, and a secret management layer that would make an enterprise security team nod approvingly.

This post documents every layer of the architecture — physical hardware, network topology, container orchestration, AI agent routing, knowledge graph internals, and the MCP server mesh that ties it all together. No hand-waving. No “just deploy this Helm chart.” Every IP address, every config decision, every hack required to make Claude talk to a graph database through an OpenAI-compatible proxy.

The 30-Second Overview

┌─────────────────────────────────────────────────────────────────┐
│                        PHYSICAL LAYER                           │
│                                                                 │
│  ┌──────────────┐  ┌──────────────────┐  ┌───────────────────┐ │
│  │  OPNsense    │  │  Unraid Tower    │  │  NVIDIA DGX Spark │ │
│  │  Firewall    │  │  24 containers   │  │  (spanky1)        │ │
│  │  10.0.1.2    │  │  10.0.128.2      │  │  10.0.128.196     │ │
│  │              │  │                  │  │                   │ │
│  │  Caddy TLS   │  │  br0 macvlan     │  │  Qwen3-32B (vLLM) │ │
│  │  16 proxies  │  │  10.0.3.x IPs    │  │  Qwen2.5-7B       │ │
│  │  WireGuard   │  │  AI + Media +    │  │  Whisper STT      │ │
│  │  DNS/DHCP    │  │  Infra services  │  │  128GB unified    │ │
│  └──────────────┘  └──────────────────┘  └───────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                         AI AGENT LAYER                          │
│                                                                 │
│  ┌──────────────────────┐      ┌──────────────────────────────┐ │
│  │  Agent-API           │      │  OpenClaw                    │ │
│  │  10.0.3.85           │◄────►│  10.0.3.87                   │ │
│  │  4 sub-agents        │      │  Sparky + Dev agents         │ │
│  │  PydanticAI + tools  │      │  24 skills combined          │ │
│  │  Groq/vLLM/OpenRouter│      │  Claude Sonnet/Opus 4.6      │ │
│  └──────────────────────┘      └──────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────────┐
│                       MEMORY + DATA LAYER                       │
│                                                                 │
│  ┌───────────────┐  ┌──────────────────┐  ┌───────────────────┐ │
│  │  Graphiti     │  │  TEI Embeddings  │  │  HashiCorp Vault  │ │
│  │  + FalkorDB   │  │  10.0.3.89       │  │  10.0.3.75        │ │
│  │  10.0.3.88    │  │  all-MiniLM-L6   │  │  Secrets + PKI    │ │
│  │  Knowledge    │  │  384-dim vectors │  │  AppRole auth     │ │
│  │  Graph + MCP  │  │                  │  │                   │ │
│  └───────────────┘  └──────────────────┘  └───────────────────┘ │
└─────────────────────────────────────────────────────────────────┘

Three hosts. Twenty-four containers. Two local LLMs. Three cloud LLM providers. One knowledge graph. Zero cloud dependency for core operations.


Part 1: The Hardware

NVIDIA DGX Spark — The Local AI Powerhouse

The DGX Spark (hostname: spanky1, IP: 10.0.128.196) is the compute backbone. It’s a Grace Blackwell GB10 with 128GB of unified memory running Ubuntu 24.04 on ARM64.

Three systemd services run continuously:

Service      Model                       GPU Memory    Context         Port
vllm-qwen3   Qwen3-32B (FP8)             70% (~90GB)   32,768 tokens   8000
vllm-router  Qwen2.5-7B-Instruct (FP8)   15% (~19GB)   4,096 tokens    8002
whisper      faster-whisper base (CPU)   None          —               8003

Both vLLM instances run with --enforce-eager (no CUDA graph compilation for the GB10 architecture), FP8 quantization with FP8 KV cache, Triton attention backend, and Hermes tool-call parsing enabled for function calling.

The Qwen3-32B handles complex reasoning, code generation, and multi-turn conversations. The Qwen2.5-7B runs the agent router — a lightweight classifier that decides which sub-agent handles each user query. It processes classification requests in under 200ms, which means the routing decision adds negligible latency to every interaction.
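
The router's fallback path (used when the classifier model is unreachable) is simple enough to sketch. A minimal keyword version, where the keyword lists and function name are my own illustration rather than the actual Agent-API code:

```python
# Minimal sketch of the keyword fallback route (keyword lists and the
# function name are illustrative assumptions, not the real Agent-API code).
KEYWORDS = {
    "infra": ["docker", "container", "firewall", "dns", "ssh", "vault", "dhcp"],
    "home": ["light", "shade", "thermostat", "speaker", "scene", "sensor"],
    "github": ["repo", "pull request", "issue", "commit", "branch"],
}

def fallback_route(query: str) -> str:
    """Count keyword hits per agent; route to 'general' when nothing matches."""
    q = query.lower()
    scores = {agent: sum(kw in q for kw in kws) for agent, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"
```

The real classifier replaces this with a single Qwen2.5-7B completion; the keyword pass only exists so routing never hard-fails.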

Whisper handles speech-to-text for voice commands, running on CPU with int8 quantization to leave GPU memory for the LLMs.

Unraid Tower — The Container Mothership

The Unraid NAS (tower.local.lan, 10.0.128.2) runs all 24 Docker containers across a br0 macvlan network. Every container gets its own IP on the 10.0.3.0/16 subnet, communicating directly at Layer 2 without NAT.

OPNsense — Firewall, DNS, TLS Termination

OPNsense (10.0.1.2) handles routing between subnets, Kea DHCPv4 for all leases, Unbound DNS for local resolution, WireGuard tunnels, and — critically — Caddy reverse proxy for TLS termination of all 16 internal services.

Every *.int.vitalemazo.com domain terminates TLS at Caddy on OPNsense using ACME certificates with Cloudflare DNS challenge. No self-signed certs. No certificate warnings.


Part 2: Network Architecture

Subnet Layout

┌─────────────────────────────────────────────┐
│              Network Topology                │
│                                              │
│  10.0.1.0/24   ─── OPNsense management      │
│  10.0.3.0/16   ─── br0 macvlan (containers) │
│  10.0.5.0/24   ─── IoT devices              │
│  10.0.128.0/24 ─── Compute (DGX Spark)      │
└─────────────────────────────────────────────┘

The br0 Macvlan — Every Container Is a First-Class Citizen

All 24 containers on Unraid share the br0 macvlan network. Each gets a unique 10.0.3.x IP address. This means:

  • Containers communicate directly at Layer 2 — no Docker bridge NAT
  • Each container is addressable by IP from anywhere on the network
  • OPNsense firewall rules do not apply to same-subnet L2 traffic
  • Security between containers is application-layer: bearer tokens, API keys, IP allowlists

This is a deliberate tradeoff. Macvlan gives clean networking and easy addressability at the cost of no implicit inter-container firewall. For a homelab where every service is authenticated, that’s acceptable.

IP Assignment Map

AI / Agent Stack                    Infrastructure
──────────────────                  ─────────────────────
10.0.3.85  Agent-API                10.0.3.75  Vault
10.0.3.86  Agent-Chat (Web UI)      10.0.3.25  Home Assistant
10.0.3.87  OpenClaw                 10.0.3.20  Mosquitto MQTT
10.0.3.88  Graphiti + FalkorDB      10.0.3.21  RYSE MQTT Bridge
10.0.3.89  TEI Embeddings           10.0.3.30  Docker Registry
10.0.3.90  CLI Proxy API            10.0.3.31  Registry UI
                                    10.0.3.32  Registry Proxy
Media Stack                         10.0.3.33  OAuth2 Proxy
──────────────────                  10.0.3.66  Cloudflared Tunnel
10.0.3.13  Plex
10.0.3.11  Sonarr                   IoT / Compute
10.0.3.10  Radarr                   ─────────────────────
10.0.3.9   Prowlarr                 10.0.5.16  RYSE SmartBridge
10.0.3.8   Overseerr                10.0.128.196  DGX Spark
10.0.3.7   Lidarr
10.0.3.5   Deluge
10.0.3.12  FlareSolverr

Caddy Reverse Proxy — 16 Services, One Wildcard

Caddy on OPNsense terminates TLS for every internal service:

vault.int.vitalemazo.com       → 10.0.3.75:8200
ha.int.vitalemazo.com          → 10.0.3.25:8123
plex.int.vitalemazo.com        → 10.0.3.13:32400
agent.int.vitalemazo.com       → 10.0.3.85:8888
openclaw.int.vitalemazo.com    → 10.0.3.87:18789  (+ OAuth2 proxy)
spanky1-llm.int.vitalemazo.com → 10.0.128.196:8000
...and 10 more

OpenClaw sits behind an additional OAuth2 proxy layer (Google OAuth) via Caddy’s forward_auth directive. Every other service authenticates at the application layer.


Part 3: The Multi-Agent AI System

Agent-API — The Brain Router

The Agent-API (10.0.3.85:8888) is a custom Python application built on PydanticAI that routes every user query to the right specialist.

User Query
     │
     ▼
┌─────────────────────────────────────────────────┐
│  ROUTER (Qwen2.5-7B on DGX Spark, port 8002)   │
│  Classifies: infra | home | github | general    │
│  Fallback: keyword matching if model unreachable │
└─────────────────────┬───────────────────────────┘

         ┌────────────┼────────────┬──────────────┐
         ▼            ▼            ▼              ▼
  ┌────────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
  │ INFRA      │ │ HOME     │ │ GITHUB   │ │ GENERAL  │
  │ 25 tools   │ │ 8 tools  │ │ MCP      │ │ 10 tools │
  │            │ │          │ │ Server   │ │          │
  │ Groq Scout │ │ Qwen3-32B│ │ GPT-OSS  │ │ Qwen3-32B│
  │ → GPT-OSS  │ │ → Groq   │ │ → Groq   │ │ → GPT-OSS│
  └────────────┘ └──────────┘ └──────────┘ └──────────┘

Each sub-agent has a fallback chain. If the primary model is unreachable or returns an error, the agent automatically retries with the next provider:

Agent           Primary               Fallback                   Tools
Infrastructure  Groq (Llama 4 Scout)  GPT-OSS-120B (OpenRouter)  SSH, OPNsense API (10 tools), Terraform, Docker Registry, Cloudflare DNS
Home            Qwen3-32B (local)     Groq → GPT-OSS             HA entity control, state queries, automations, history
GitHub          GPT-OSS-120B          Groq                       GitHub MCP Server (repos, issues, PRs) — 131K context for large diffs
General         Qwen3-32B (local)     GPT-OSS-120B               Time, weather, ping, news, web search, RAG (ChromaDB), Vault secrets

The Home agent has a regex fast path. Simple commands like “turn on the kitchen light” or “close the shades” bypass the LLM entirely — a regex parser extracts the action and entity, calls Home Assistant directly, and returns in under 500ms. The LLM only activates for complex queries like “which lights have been on for more than 2 hours?”
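
The fast path boils down to a pattern like this (the regex and return shape are assumed for illustration; the real parser covers more verbs than on/off):

```python
import re

# Illustrative sketch of the Home agent's regex fast path (pattern and
# return shape are assumptions, not the production parser).
FAST_PATH = re.compile(
    r"^(?:please\s+)?turn\s+(on|off)\s+(?:the\s+)?(?P<entity>[\w\s]+?)\s*$",
    re.IGNORECASE,
)

def parse_fast_path(query: str):
    """Return a Home Assistant-style action for simple commands, else None."""
    m = FAST_PATH.match(query.strip())
    if not m:
        return None  # complex query: fall through to the LLM
    entity_hint = m.group("entity").strip().lower().replace(" ", "_")
    return {"service": f"turn_{m.group(1).lower()}", "entity_hint": entity_hint}
```

Anything the pattern rejects falls through to the LLM, so the fast path can afford to be conservative.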

The Monkey-Patches

When you wire together models from Groq, OpenRouter, and a local vLLM instance through the OpenAI SDK, you hit compatibility issues:

  1. Groq returns service_tier: "on_demand" in chat completions. The OpenAI SDK’s Pydantic model rejects this. Fix: patch ChatCompletion.model_fields["service_tier"] to accept the value.

  2. Groq sends null tool arguments. GPT-OSS sends {"": {}} for parameterless tools. Neither is valid per the OpenAI spec. Fix: patch ToolManager._validate_tool_args to normalize both patterns.

Together these amount to a few lines of monkey-patching that replace what would otherwise be hundreds of scattered error-handling branches.
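
A hedged sketch of the second fix's normalization logic (the real patch wraps ToolManager._validate_tool_args; this standalone version is assumed):

```python
import json

# Sketch of the tool-argument normalization described above. The real patch
# wraps ToolManager._validate_tool_args; this standalone function is assumed.
def normalize_tool_args(raw) -> dict:
    """Groq sends null; GPT-OSS sends {"": {}} for parameterless tools.
    Normalize both into a plain (possibly empty) argument dict."""
    if raw is None:
        return {}
    if isinstance(raw, str):
        raw = json.loads(raw) if raw.strip() else {}
    if isinstance(raw, dict):
        raw.pop("", None)  # drop the bogus empty-string key from GPT-OSS
        return raw
    return {}
```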

Authentication and Rate Limiting

Every Agent-API endpoint (except /api/health) requires a bearer token. Tokens are stored in HashiCorp Vault at secret/agent-api/keys — two keys: personal (for direct API access) and openclaw (for the OpenClaw platform).

Rate limiting: 30 requests/minute per key, maximum 2 concurrent requests per key. Sessions expire after 2 hours or 20 messages per agent history.


Part 4: OpenClaw — The Agent Platform

OpenClaw (10.0.3.87) is the user-facing platform. It provides a web chat interface, agent lifecycle management, skill systems, and cron-driven autonomous behaviors.

Browser (openclaw.int.vitalemazo.com)
          │
          ▼
┌─────────────────────────────────────────────┐
│  OpenClaw Gateway (port 18789)              │
│  OAuth2 proxy → Caddy forward_auth          │
│                                             │
│  ┌───────────────┐  ┌────────────────────┐  │
│  │  SPARKY       │  │  DEV               │  │
│  │  Claude       │  │  Claude            │  │
│  │  Sonnet 4.6   │  │  Opus 4.6          │  │
│  │  16 skills    │  │  8 skills          │  │
│  │  Home + Infra │  │  Development       │  │
│  │  focus        │  │  focus             │  │
│  └───────────────┘  └────────────────────┘  │
└─────────────────────────────────────────────┘
         │                      │
         ▼                      ▼
   cli-proxy-api (10.0.3.90:8317)
              │
              ▼
   Anthropic Claude API

Two Agents, Different Roles

Sparky is the home and infrastructure assistant. It has 16 skills covering everything from controlling Sonos speakers and RYSE window shades to querying OPNsense firewall rules and managing Unraid containers. It runs on Claude Sonnet 4.6 for a balance of speed and capability, and has a heartbeat that triggers every 30 minutes during waking hours (8am–11pm) for proactive monitoring.

Dev is the software development agent. It runs on Claude Opus 4.6 for maximum reasoning capability and has skills for autonomous coding loops (clone, build, test, lint, ship), project bootstrapping, and GitHub integration. Sandbox is completely off — it has full read, write, edit, and exec access to its workspace.

The Skill System

Skills are markdown files (SKILL.md) that teach agents how to use specific tools. Some examples from Sparky’s 16 skills:

  • homelab-bridge: Proxies requests to the Agent-API for infrastructure/HA/GitHub operations
  • knowledge-graph: Stores and retrieves facts from the Graphiti temporal knowledge graph
  • opnsense: Directly queries the OPNsense REST API for firewall rules and DHCP leases
  • ryse-shades: Controls RYSE SmartBridge window shades (with the workaround that close_cover doesn’t work — only set_cover_position to 0)
  • vault-secrets: CRUD operations on HashiCorp Vault secrets
  • sonoscli: Speaker control (play, pause, volume, grouping)
  • proactive-agent: Autonomous behavior triggered by cron heartbeats
  • self-improving-agent: Learns from errors and corrections to improve future responses

The Claude Proxy

Both agents talk to Claude through cli-proxy-api at 10.0.3.90:8317 — an OpenAI-compatible proxy that forwards requests to Anthropic’s API. This means OpenClaw doesn’t need direct Anthropic API credentials; it speaks the OpenAI API format, and the proxy handles translation.

# OpenClaw model provider config
provider: "claude-proxy"
base_url: "http://10.0.3.90:8317/v1"
models:
  - claude-sonnet-4-6   # Sparky
  - claude-opus-4-6     # Dev
context_window: 200000
max_output: 16384

Part 5: The Knowledge Graph — Giving AI Persistent Memory

This is where it gets interesting. Most AI agents are stateless — every conversation starts from zero. Graphiti gives our agents a temporal knowledge graph that accumulates entities, relationships, and facts across every interaction.

Architecture

Agent calls graphiti-cli
       │
       ▼
 graphiti-cli (shell script)
       │  HTTP POST (MCP Streamable HTTP protocol)
       ▼
┌──────────────────────────────────────────────┐
│  Graphiti MCP Server (10.0.3.88:8000)        │
│                                              │
│  ┌──────────────┐     ┌───────────────────┐  │
│  │ Episode      │     │ Entity Extraction │  │
│  │ Ingestion    │────►│ Claude Sonnet 4.6 │  │
│  │              │     │ ~8-10 LLM calls   │  │
│  └──────────────┘     │ per episode       │  │
│                       └───────────────────┘  │
│  ┌──────────────┐     ┌───────────────────┐  │
│  │ Semantic     │     │ TEI Embeddings    │  │
│  │ Search       │────►│ 10.0.3.89:8080    │  │
│  │              │     │ all-MiniLM-L6-v2  │  │
│  └──────────────┘     │ 384 dimensions    │  │
│                       └───────────────────┘  │
│  ┌─────────────────────────────────────────┐ │
│  │ FalkorDB (embedded, port 6379)          │ │
│  │ Graph storage: nodes, edges, temporal   │ │
│  │ metadata, vector embeddings             │ │
│  └─────────────────────────────────────────┘ │
└──────────────────────────────────────────────┘

How Memory Works

When an agent learns something important — a deployment outcome, a user preference, an infrastructure fact — it calls graphiti-cli add with a text description and a group ID.

graphiti-cli add "Deployed Graphiti at 10.0.3.88 with FalkorDB \
  and TEI embeddings on March 6th 2026" infra

Here’s what happens in the next ~15 seconds:

  1. Episode creation: The text is stored as an episode in FalkorDB with a timestamp and group ID
  2. Entity extraction: Claude Sonnet 4.6 analyzes the text and extracts entities with types:
    • Graphiti → Organization
    • FalkorDB → Organization
    • 10.0.3.88 → Location
    • TEI embeddings → Topic
    • March 6th 2026 → Event
  3. Relationship extraction: Claude identifies relationships between entities:
    • Graphiti —deployed_at→ 10.0.3.88
    • Graphiti —uses→ FalkorDB
    • Graphiti —uses→ TEI embeddings
  4. Embedding generation: Each entity and relationship gets a 384-dimensional vector from the TEI server
  5. Graph storage: Nodes, edges, and vectors are persisted in FalkorDB

When an agent needs to recall information:

graphiti-cli search-facts "what database does Graphiti use" infra

This performs both semantic search (vector similarity via TEI embeddings) and graph traversal (following relationships in FalkorDB) to return relevant facts with temporal context.
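
The vector half of that hybrid search reduces to cosine similarity over the 384-dimensional embeddings, illustrated here in plain Python (the production comparison runs inside FalkorDB's vector index, not in Python):

```python
import math

# Cosine similarity, the metric behind the semantic half of search-facts.
# Shown for illustration; FalkorDB's vector index does this natively.
def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```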

Entity Types

The knowledge graph automatically categorizes extracted entities:

Type          Description                      Examples
Preference    User choices and opinions        "Prefers dark mode", "Uses Qwen3 for routing"
Requirement   Needs and specs                  "Must support 200K context", "Needs FP8 quantization"
Procedure     Workflows and commands           "Delete wlan0 route after reboot", "Deploy with docker run"
Location      Physical and network locations   "10.0.3.88", "tower", "DGX Spark"
Event         Deployments, changes, incidents  "Deployed March 6th", "Fixed embedder base_url"
Organization  Services and systems             "FalkorDB", "OpenClaw", "Graphiti"
Document      Files and configs                "config.yaml", "deploy.sh", "SOUL.md"
Topic         Concepts and technologies        "Temporal knowledge graph", "macvlan networking"

Group IDs — Cross-Agent Memory

Both agents read and write to the same graph but tag episodes with different group IDs:

  • sparky — Sparky’s observations and decisions
  • dev — Dev’s coding context and project knowledge
  • infra — Shared infrastructure facts

This means Dev can recall what Sparky learned about a network issue, and Sparky can reference code decisions Dev made. The knowledge graph is shared; the group IDs provide attribution and scoping for search.

The Patches That Made It Work

Graphiti’s MCP server is designed for native OpenAI APIs. Making it work with Claude through an OpenAI-compatible proxy required patching three Python files.

Problem 1: Embeddings routing. Graphiti uses the OpenAI SDK for embeddings, which picks up the OPENAI_BASE_URL environment variable. That points at the Claude proxy (10.0.3.90:8317), but embeddings need to go to the TEI server (10.0.3.89:8080). The factory code doesn’t pass base_url separately.

Fix: Patched factories.py to extract api_url from the embedder’s provider config and pass it explicitly to OpenAIEmbedderConfig(base_url=...).

Problem 2: Structured output validation. Graphiti uses OpenAI’s responses.parse() for structured output — schema validation happens inside the SDK before our code runs. Claude returns JSON wrapped in markdown code fences (```json ... ```), wrong field names (entities instead of extracted_entities), and bare lists instead of objects. All of these fail SDK validation.

Fix: Rewrote openai_client.py to use chat.completions.create() instead of responses.parse(). The JSON schema gets injected as text in the system prompt. A custom response parser strips code fences, remaps field names using fuzzy matching, and auto-wraps bare lists into the expected object structure by inspecting the Pydantic response model’s field types.
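
The parser's core repairs can be sketched as follows (the regex and function name are mine; the actual openai_client.py patch also walks the Pydantic response model's field types for the fuzzy remapping):

```python
import json
import re

# Sketch of the response-repair logic described above (regex and function
# name are assumptions; the real patch also inspects the Pydantic model).
# Strips markdown code fences, wraps bare lists, remaps a lone wrong-named
# top-level field to the expected one.
FENCE = re.compile(r"^`{3}(?:json)?\s*|\s*`{3}$", re.MULTILINE)

def repair_response(text: str, expected_field: str) -> dict:
    data = json.loads(FENCE.sub("", text).strip())
    if isinstance(data, list):              # bare list: wrap in expected object
        return {expected_field: data}
    if expected_field not in data and len(data) == 1:
        ((_, value),) = data.items()        # single wrong-named field: remap
        return {expected_field: value}
    return data
```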

Problem 3: Small model fallback. Graphiti uses a “small model” (defaulting to gpt-4.1-mini) for lightweight operations. The Claude proxy doesn’t serve that model.

Fix: Patched factories.py to detect non-OpenAI model names and set small_model = config.model — use Claude for everything.
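
The patch amounts to something like this (attribute names and the prefix list are assumed from the description above):

```python
# Hedged sketch of the factories.py small-model patch (attribute names and
# the OpenAI prefix list are assumptions based on the description above).
OPENAI_PREFIXES = ("gpt-", "o1", "o3", "o4")

def pick_small_model(config) -> str:
    """Keep the default small model only for OpenAI model names; otherwise
    reuse the main (Claude) model for lightweight operations too."""
    if config.model.startswith(OPENAI_PREFIXES):
        return "gpt-4.1-mini"
    return config.model
```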

These three patched files are bind-mounted into the container, overriding the originals at runtime.


Part 6: Secret Management with HashiCorp Vault

Every API key, token, and credential in this infrastructure lives in HashiCorp Vault (10.0.3.75).

┌─────────────────────────────────────────┐
│  HashiCorp Vault (10.0.3.75:8200)       │
│                                         │
│  Auth: AppRole (claude-code role)       │
│  Storage: File backend (encrypted)      │
│                                         │
│  secret/api-keys          → Groq,       │
│                             Gemini,     │
│                             Cerebras,   │
│                             SambaNova,  │
│                             OpenRouter  │
│  secret/agent-api/keys    → API auth    │
│  secret/homeassist        → HA token    │
│  secret/opnsense/api      → OPNsense    │
│  secret/github/pat        → GitHub PAT  │
│  secret/cloudflare/*      → CF tokens   │
│  secret/openclaw/gateway  → GW token    │
│  secret/docker/registry   → Registry    │
│  secret/terraform/*       → TFC tokens  │
│  secret/aws/credentials   → AWS keys    │
└─────────────────────────────────────────┘
         ▲           ▲           ▲
         │           │           │
   Agent-API    OpenClaw     Claude Code
   (AppRole     (scoped       (AppRole
    auto-       read-only      full
    refresh)    15m tokens)    access)

No Hardcoded Secrets

The Agent-API authenticates to Vault using AppRole with automatic token refresh. At startup, it exchanges a Role ID and Secret ID for a renewable token (1-hour TTL, extendable to 4 hours). Every API key — Groq, OpenRouter, GitHub, Home Assistant, OPNsense, Cloudflare — is fetched from Vault at runtime.
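
In outline (the login path is Vault's standard AppRole endpoint; the renewal threshold and helper names here are assumptions, not the Agent-API's actual code):

```python
# Outline of the AppRole startup flow. The login path is Vault's standard
# AppRole endpoint; the renewal threshold and helper names are assumptions.
def approle_login_request(role_id: str, secret_id: str) -> tuple[str, dict]:
    """Path and POST body for exchanging Role ID + Secret ID for a token."""
    return "/v1/auth/approle/login", {"role_id": role_id, "secret_id": secret_id}

def should_renew(ttl_remaining_s: int, creation_ttl_s: int = 3600) -> bool:
    """Renew the 1-hour token before it lapses; here, once under a quarter
    of its creation TTL remains (the 4-hour max_ttl is enforced server-side)."""
    return ttl_remaining_s < creation_ttl_s // 4
```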

OpenClaw gets scoped access through a special endpoint (/api/internal/token) on the Agent-API that mints short-lived Vault tokens with a readonly policy and 15-minute TTL. This endpoint is IP-restricted to OpenClaw’s container (10.0.3.87).

Vault MCP Server

Claude Code (my local CLI) connects to Vault through an MCP server — a Go binary that provides read_secret, write_secret, list_secrets, and delete_secret tools, plus full PKI certificate management. This means I can say “store this API key in Vault” in a Claude Code session, and it happens without me ever touching the Vault UI.


Part 7: Home Automation Integration

Home Assistant + MQTT + RYSE Shades

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ Agent-API    │     │ Home         │     │ Mosquitto    │
│ Home Agent   │────►│ Assistant    │     │ MQTT Broker  │
│ 10.0.3.85    │REST │ 10.0.3.25    │     │ 10.0.3.20    │
└──────────────┘     └──────┬───────┘     └──────┬───────┘
                            │                     │
                            │                     │
                     ┌──────┴────────────────────┘
                     │ MQTT

              ┌──────────────┐     ┌──────────────┐
              │ RYSE MQTT    │     │ RYSE Smart   │
              │ Bridge       │────►│ Bridge       │
              │ 10.0.3.21    │     │ 10.0.5.16    │
              └──────────────┘     └──────────────┘

The Home Agent has 8 tools for interacting with Home Assistant via its REST API. The standout is ha_control — a combined find-and-control tool that uses fuzzy entity matching with difflib.SequenceMatcher. You can say “turn on the kitchen light” even if the entity is named light.kitchen_main_overhead — it’ll find the closest match.
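
The matching itself is a few lines of difflib (the wrapper function name and the entity IDs in the test below are invented examples):

```python
from difflib import SequenceMatcher

# Sketch of ha_control's fuzzy matching via difflib.SequenceMatcher.
# The wrapper name and any entity IDs are invented for illustration.
def best_entity(query: str, entity_ids: list[str]) -> str:
    """Return the entity ID whose string is most similar to the spoken name."""
    def score(eid: str) -> float:
        return SequenceMatcher(None, query.lower(), eid.lower()).ratio()
    return max(entity_ids, key=score)
```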

The RYSE SmartBridge integration deserves special mention. The bridge controls motorized window shades but has a quirk: the standard close_cover service doesn’t work. The agent has learned (and stored in the knowledge graph) that only set_cover_position with position 0 reliably closes the shades. This is exactly the kind of operational knowledge that the temporal knowledge graph preserves across sessions.


Part 8: MCP Servers — The Connective Tissue

Model Context Protocol (MCP) servers provide tool interfaces that AI agents can discover and use. Six MCP servers are configured across the system:

Server    Runtime                Purpose
OPNsense  Native binary          Firewall rules, DHCP leases, DNS, WireGuard, diagnostics
Vault     Go binary              Secret CRUD, PKI certificate management
SSH       Native binary          Remote command execution on known hosts
Browser   Native binary          Web page interaction and automation
GitHub    Stdio (in Agent-API)   Repository, issue, and PR management
Graphiti  HTTP (10.0.3.88:8000)  Knowledge graph read/write via MCP protocol

The OPNsense MCP server is particularly powerful — it exposes tools for managing firewall aliases, filter rules, Kea DHCP reservations, WireGuard peers, Unbound DNS overrides, firmware updates, and system diagnostics. Instead of SSH-ing into the firewall and running pfctl commands, I tell Claude “block traffic from this IP range” and the MCP server handles the API calls.

MCP Transport: Stdio vs HTTP

Most MCP servers use stdio transport — they run as child processes that communicate over stdin/stdout. This is fine for single-client use (Claude Code on my Mac).

Graphiti uses Streamable HTTP transport — it’s a network service at 10.0.3.88:8000/mcp that multiple clients can connect to simultaneously. The graphiti-cli shell script handles the MCP session lifecycle: initialize a session (get a session ID from the response headers), call tools with that session ID, parse JSON-RPC responses.

# Simplified graphiti-cli flow
SESSION_ID=$(curl -si -X POST "$URL" \
  -d '{"jsonrpc":"2.0","method":"initialize",...}' \
  | grep -i "mcp-session-id:" | sed "s/^[^:]*: *//" | tr -d "\r\n")

curl -X POST "$URL" \
  -H "mcp-session-id: $SESSION_ID" \
  -d '{"jsonrpc":"2.0","method":"tools/call","params":{"name":"add_episode",...}}'

Part 9: The Complete Data Flow

Here’s what happens when you type “Remember that the DGX Spark runs Qwen3-32B at 10.0.128.196” into the OpenClaw chat:

1. Browser → Caddy (TLS) → OAuth2 Proxy → OpenClaw Gateway
2. Gateway → Sparky Agent (Claude Sonnet 4.6 via cli-proxy-api)
3. Sparky recognizes this as a memory storage request
4. Sparky invokes the knowledge-graph skill
5. Skill calls: graphiti-cli add "DGX Spark runs Qwen3-32B at 10.0.128.196" infra
6. graphiti-cli → HTTP POST → Graphiti MCP Server (10.0.3.88)
7. Graphiti queues episode for processing
8. Entity extraction begins (Claude Sonnet 4.6 via cli-proxy-api):
   ├── Extract entities: DGX Spark (Location), Qwen3-32B (Topic),
   │   10.0.128.196 (Location)
   ├── Extract relationships: DGX_Spark —runs→ Qwen3-32B,
   │   DGX_Spark —has_ip→ 10.0.128.196
   └── Generate embeddings via TEI (10.0.3.89)
9. Store in FalkorDB (nodes + edges + vectors)
10. Sparky confirms: "Stored that fact in the knowledge graph."

Later, when you ask “What’s running on the DGX Spark?”:

1. Browser → Caddy → OAuth2 → OpenClaw → Sparky
2. Sparky invokes knowledge-graph skill
3. Skill calls: graphiti-cli search-facts "DGX Spark" infra
4. graphiti-cli → Graphiti MCP → TEI (embed query) → FalkorDB
5. FalkorDB returns: vector-similar nodes + graph-connected relationships
6. Sparky receives: "DGX Spark runs Qwen3-32B at 10.0.128.196"
7. Sparky answers with recalled context

The round-trip for recall is under 3 seconds. Storage takes ~15 seconds due to the entity extraction LLM calls.


Part 10: What This Enables

This isn’t infrastructure for its own sake. Here’s what the stack actually does in daily use:

“Turn off the office lights and close the shades” → Home Agent regex fast-path → Home Assistant → lights off in 500ms, then set_cover_position to 0 via RYSE MQTT bridge.

“What containers are running on tower?” → Infrastructure Agent → SSH to tower → docker ps → formatted response with status, IPs, and uptime.

“Create a WireGuard peer for my new laptop” → Infrastructure Agent → OPNsense API → new peer config generated and displayed.

“Review the latest PR on the agent-api repo” → GitHub Agent → GitHub MCP Server → PR diff fetched (131K context window handles large diffs) → detailed review with line-specific comments.

“What did we deploy last week?” → Sparky → Knowledge Graph → temporal query across episodes → list of deployments with dates, IPs, and outcomes.

“Remember that the wlan0 route on tower breaks DGX connectivity after reboot” → Knowledge Graph → stored as Procedure entity → recalled automatically next time DGX connectivity fails.

The knowledge graph is the force multiplier. Without it, every session starts cold. With it, the agents accumulate operational knowledge that compounds over time. Three months from now, these agents will know the history of every deployment, every workaround, every preference — without anyone maintaining a wiki.


Lessons Learned

Local LLMs change the economics. The DGX Spark running Qwen3-32B handles 80% of agent queries without touching a cloud API. Cloud LLMs (Groq, OpenRouter) are fallbacks, not defaults. Claude via the proxy is reserved for where it matters most: entity extraction (Graphiti) and complex reasoning (OpenClaw’s Opus agent).

Macvlan networking is worth the tradeoff. Clean IPs, no NAT, easy debugging. The loss of inter-container firewall rules is acceptable when every service authenticates at the application layer.

MCP servers are the right abstraction. Instead of building custom integrations for every tool, MCP provides a standard interface that any LLM client can discover and use. Adding a new capability means deploying one MCP server, not modifying every agent.

Patching upstream code is sometimes the only option. When the Graphiti image assumes native OpenAI APIs and you’re running Claude through a proxy, you patch. Three bind-mounted Python files is less maintenance than a fork.

Vault from day one. Every secret in one place with audit logs and short-lived tokens. The initial setup takes an afternoon. The payoff is never wondering where an API key lives or whether it’s been rotated.


The Numbers

Metric                        Value
Physical hosts                3 (OPNsense, Unraid, DGX Spark)
Docker containers             24
Local LLMs                    2 (Qwen3-32B, Qwen2.5-7B)
Cloud LLM providers           3 (Groq, OpenRouter, Anthropic)
AI sub-agents                 4 (infra, home, github, general)
OpenClaw agents               2 (Sparky, Dev)
Combined skills               24
MCP servers                   6
Caddy reverse proxy entries   16
Vault secret paths            15+
Knowledge graph entity types  8
Total agent tools             50+
GPU memory allocated          109GB (of 128GB)

Three hosts. Twenty-four containers. Fifty tools. One knowledge graph. Zero manual memory management.

The agents remember. The graph grows. The homelab learns.


About the Author

Vitale Mazo is a Senior Cloud Engineer with 19+ years of experience in enterprise IT, specializing in cloud native technologies and multi-cloud infrastructure design.
