I'll be back. And I route tokens properly.
A local-first agent architecture. Query classification, retrieval orchestration and skill execution: controllable and token-efficient. Runs on your hardware, escalates to cloud models on demand.
What is Clawminator?
An open-source project in active development. Not yet publicly available.
Clawminator is an agent architecture, not a chatbot. The core is token efficiency: requests are first classified, then prepared through routing, memory and retrieval, and only enriched with context that is actually relevant. Smart and small, not dumb and big. The result: an agent that works well with 16K context and 9B parameters, or runs the same pipeline with a frontier cloud model. Provider-agnostic: the orchestrator stays the same, local or cloud.
| Architecture | Query Classifier → IR Orchestrator → 7-Layer Router → Skill/LLM Execution |
| Local Reference HW | Mac Mini M4, 16 GB RAM |
| Local Model | qwen3.5:9b (16K context) |
| Cloud Option | Mistral Small (EU servers, GDPR-compliant, 128K context); more providers planned |
| Interfaces | Telegram (current) · Web / API / MS Teams planned |
| Languages | German + English |
| License | MIT (core); enterprise integration on request |
75 Skills: executable actions without LLM generation
Once the Query Classifier detects a skill action, the Router calls the matching skill function directly. No token generation, no hallucination. The LLM fallback only kicks in if no skill matches.
Files
Create, read, write, search, delete
Weather
Current + forecast via Open-Meteo (no API key)
Timer & Reminders
Cron-based create, list, delete
Memory
Remember, retrieve, search facts. Knowledge Graph (Clawminator) + full-text search (OpenClaw)
System
CPU, RAM, disk, processes, network, WiFi password
Gateway
Status, restart, view config
Browser
Open URLs, navigate, list/close tabs, screenshots
Canvas
Display HTML, hide, execute JS, snapshots
Apple Notes & Reminders
Create, search, list
Spotify
Play, pause, skip (if spotify_player installed)
Philips Hue
Light control (if openhue installed)
iMessage
Send messages via AppleScript
Telegram
Reactions, replies, polls
Sessions
Status, history, manage sub-agents
Nodes / Devices
Camera, location, battery, notifications
Health Check
Ollama + Gateway + Disk + RAM at a glance
13 Slash Commands: 0 ms, no LLM
52 SKILL.md Templates for creative tasks
When the language model needs to think (poems, explanations, code), the matching skill template is loaded via FTS5/BM25 (~2K tokens). The rest of the 16K budget stays available for chat.
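Template selection is a plain ranking problem. Here is an illustrative sketch of BM25 scoring in pure Python, standing in for the actual FTS5 index; the template names and description texts are invented for the example:

```python
import math

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank docs (name -> description text) against a query with plain BM25."""
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / n
    scores = {}
    for name, tokens in tokenized.items():
        score = 0.0
        for term in set(query.lower().split()):
            df = sum(term in t for t in tokenized.values())
            if df == 0:
                continue  # term appears in no template at all
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)
            tf = tokens.count(term)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avgdl))
        scores[name] = score
    return sorted(scores, key=scores.get, reverse=True)

# Invented SKILL.md descriptions, for illustration only
templates = {
    "poem": "write a poem creative verse rhyme stanza",
    "code": "write code python function debug implement",
    "explain": "explain concept teach summary overview",
}
best = bm25_rank("write me a rhyming poem", templates)[0]
```

Because only the single best-matching template (~2K tokens) is loaded, the rest of the context budget stays free for the conversation.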
Three-layer architecture
Every message passes through three layers; the first layer that matches responds.
Layer 1 · Slash Commands · <100 ms · 0 tokens
Instant, deterministic, no model is woken up.
Layer 2 · Classifier + Router + Skill Execution · ~50 ms · 0 meaningful tokens
The Query Classifier (multi-head neural network) detects intent, domain and task type. The 7-Layer Router picks the matching skill function, which is executed directly: no LLM generation, no hallucination.
Layer 3 · LLM + Context Pipeline · 2-15 s · 500-4,000 tokens
Everything that requires thinking (explanations, analysis, code, creative writing) goes through the full context pipeline to the local model, or optionally to the cloud.
You: What's the weather in Vienna?
→ Layer 2 · classifier + router + weather · 280 ms · 0 output tokens
🌤️ Vienna: 22°C, partly cloudy, no rain.
You: Explain monads in Haskell
→ Layer 3 · qwen3.5:9b · HEAVY · 8.2 s · 2.1K tokens
A monad is a structure that sequences...
You: /cyberclaw
→ Layer 1 · slash · <1 ms · 0 tokens
CPU: 12% · RAM: 8.2/16 GB · Disk: 142 GB free
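The layered dispatch shown in the transcript can be sketched as a simple cascade. The keyword lookup here is a toy stand-in for the real multi-head classifier; all command and skill names are illustrative:

```python
SLASH_COMMANDS = {"/cyberclaw": "CPU: 12% · RAM: 8.2/16 GB"}   # sample canned output
SKILL_KEYWORDS = {"weather": "weather_skill", "timer": "timer_skill"}  # toy classifier

def route(message):
    # Layer 1: slash commands, instant and deterministic
    if message.startswith("/"):
        return ("layer1", SLASH_COMMANDS.get(message, "unknown command"))
    # Layer 2: classifier + router -> direct skill execution, no token generation
    for keyword, skill in SKILL_KEYWORDS.items():
        if keyword in message.lower():
            return ("layer2", skill)
    # Layer 3: full context pipeline to the LLM
    return ("layer3", "llm_pipeline")
```

The point of the cascade is ordering: cheap deterministic checks run first, and the expensive LLM path is only reached when nothing else claims the message.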
Under the hood: details for developers
Expand if you want the full picture.
Query Classifier: Multi-Head Neural Network, 4 parallel classifications
Before anything else happens, Clawminator classifies the request. A shared embedding (nomic-embed-text, 768 dimensions) is computed once and consumed by four parallel heads:
| Head | Classes | Purpose |
|---|---|---|
| intent-head | 3 | actionable / conversational / ambiguous |
| domain-head | 17 | weather / calendar / email / ... (domain routing) |
| task-type-head | 8 (multi-label) | small_talk, skill_action, knowledge_personal, knowledge_general, safety_critical, meta_question, follow_up, multi_action |
| segmentation-head | Token-level BIO | Splits multi-action queries into atomic tasks |
Downstream: the Anaphora Resolver (layers A-D local, layer E optionally cloud) resolves "it", "there", "that" against the dialog state. The Dependency Detector (deixis + lastResult slot) detects references to previous answers. The output is a query plan with a list of classified segments.
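The query plan can be pictured as a small data structure. This is a sketch of its shape, with a naive split on " and " standing in for the token-level BIO segmentation head; all field names are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    text: str
    intent: str        # actionable / conversational / ambiguous
    domain: str        # one of the 17 domains
    task_types: list   # multi-label, e.g. ["skill_action"]

@dataclass
class QueryPlan:
    segments: list = field(default_factory=list)

def build_plan(message):
    # Naive stand-in for the segmentation head: split multi-action queries on " and "
    plan = QueryPlan()
    for part in message.split(" and "):
        plan.segments.append(Segment(part.strip(), "actionable", "unknown", ["skill_action"]))
    return plan
```

Each segment is then routed independently, which is what makes multi-action queries like "turn on the lights and set a timer" tractable.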
IR Orchestrator: Reciprocal Rank Fusion over 4 stores
Information retrieval is not "query one vector DB". Clawminator orchestrates four sources in parallel, policy-driven: how strongly each source is weighted depends on the task type.
| Store | Content | Retrieval |
|---|---|---|
| Knowledge Graph | Entity gazetteer + facts | Lookup + FTS fallback |
| Memory Chunks (Session) | Dialog history | BM25 + sqlite-vec cosine |
| Memory Chunks (Workspace) | SOUL / IDENTITY / USER.md | Persistent profile data |
| Document Store | User uploads: PDF/TXT/MD/DOCX | Chunk-based indexing |
RRF Fusion (Reciprocal Rank Fusion): results from all sources are fused via a policy matrix. For knowledge_personal, for example, the Knowledge Graph carries more weight; for knowledge_general, the Document Store does. No static merging: the weights come from the query classifier output.
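Weighted RRF over per-store rankings reduces to a few lines. A minimal sketch; the constant k=60 and the example weights are assumptions, not the actual policy matrix:

```python
def rrf_fuse(rankings, weights=None, k=60):
    """rankings: store name -> list of doc ids ordered best-first."""
    scores = {}
    for store, ranked in rankings.items():
        w = (weights or {}).get(store, 1.0)
        for rank, doc in enumerate(ranked, start=1):
            # Classic RRF contribution 1/(k + rank), scaled by the per-store policy weight
            scores[doc] = scores.get(doc, 0.0) + w / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# knowledge_personal policy: the Knowledge Graph gets a higher weight
fused = rrf_fuse(
    {"kg": ["fact_a", "fact_b"], "docs": ["fact_b", "fact_c"]},
    weights={"kg": 2.0, "docs": 1.0},
)
```

Note how fact_b wins despite being ranked second in the Knowledge Graph: appearing in multiple stores accumulates score, which is exactly the behavior RRF is chosen for.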
7-Layer Router: ADR-021, policy-driven skill matching
The router runs through seven layers in fixed order. Each layer can match and answer directly, match and forward, or not match at all.
| Layer | Function |
|---|---|
| 2.0 Slash/Regex Support | Explicit commands and regex skills |
| 2.1 Entity Gazetteer | Known entities from the KG |
| 2.2 KG-First (fact queries) | For knowledge_personal → KG lookup first |
| 2.3 Dialog-State-Prior | Context from the previous answer |
| 2.4 Hybrid BM25 + Dense + RRF | Core: fusion of text and vector search |
| 2.5 MLP Domain Gate | Boost for domain-specific skills (no block) |
| 2.6 Cross-Encoder Re-rank | Optional: expensive, but precise |
After the router comes the Confidence Gate (ADR-025). HIGH (gap > 0.02) → execute the skill. MEDIUM (gap 0.005-0.02) → the Intent-LLM (qwen3.5:9b local, 1-token answer, 200-500 ms) clarifies. LOW (gap < 0.005) → forward to the Layer 3 LLM.
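The gate reduces to a comparison of the score gap between the two best skill candidates. A minimal sketch using the thresholds above:

```python
def confidence_gate(scores):
    """scores: skill match scores from the router, in any order."""
    top = sorted(scores, reverse=True)
    gap = top[0] - top[1] if len(top) > 1 else top[0]
    if gap > 0.02:
        return "HIGH"    # execute the top skill directly
    if gap >= 0.005:
        return "MEDIUM"  # let the local intent-LLM decide (1-token answer)
    return "LOW"         # forward to the Layer 3 LLM pipeline
```

Using the gap rather than the absolute score means a crowded field of similar candidates is treated as uncertainty, even when the top score itself is high.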
Layer 3 LLM Pipeline: Bridge-Tool, Capability Injection, Context Guard
When a request really does need the LLM, it doesn't go in as a raw prompt. Four stations along the way keep the context small and relevant:
| Station | Function |
|---|---|
| Bridge-Tool execute_action | ~650 token budget. Unified entry point for LLM-driven skill calls. Replaces uploading all 26 tool definitions. |
| Capability Injection | Only the relevant capabilities for the current request are injected. Instead of 8,000 tokens for all tools: 400β800 for the right ones. |
| Context Engine | Two-slot system: systemMessages[0] static (KV-cache friendly), systemMessages[1] dynamic with IR chunks from the orchestrator. |
| Context Guard | Rule-based pruning at 70% context fill. Cleans up before the model can hallucinate because the context got too full. |
Quality Gate & Mistral Improve: output check, async fallback
After the LLM response, a rule-based Quality Gate checks the output locally β no additional LLM call. It looks for typical failure patterns:
- too_short: response truncated or empty
- repetition: model repeats itself (infinite loop)
- placeholder: "[insert answer here]" or similar
- refusal: unwanted "I can't do that"
- lang_mix: wrong language or mixed languages
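These checks are plain rules over the output string, so no model call is needed. An illustrative sketch; the patterns and thresholds are assumptions, and lang_mix is omitted since it requires a language detector:

```python
import re

def quality_gate(text, min_len=20):
    flags = []
    if len(text.strip()) < min_len:
        flags.append("too_short")
    words = text.split()
    # One token dominating the output suggests a repetition loop
    if words and max(words.count(w) for w in set(words)) > max(5, len(words) // 3):
        flags.append("repetition")
    if re.search(r"\[(insert|your)\b[^\]]*\]", text, re.IGNORECASE):
        flags.append("placeholder")
    if re.search(r"\b(i can't do that|i cannot help)\b", text, re.IGNORECASE):
        flags.append("refusal")
    return flags  # empty list == PASS
```

Because the gate is regex- and counting-based, it adds microseconds, not a second LLM round-trip.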
Mistral Improve (async fire-and-forget): when cloud=on and the Quality Gate flags a FAIL, the response is sent to Mistral Small in the background for an improved version. The first (local) answer goes out immediately, with no extra waiting for the user. The improvement arrives as a follow-up message marked with an icon.
Complexity Router: one model, four tiers, optional cloud
Not every request needs a full generation budget. The router analyzes the complexity of each message heuristically (no LLM call, <5ms) and automatically selects token budget and execution target.
Local model: qwen3.5:9b runs on a Mac Mini M4 with 16 GB RAM. The trick: it's not the model that has to be small, it's the context. Classifier and retrieval let only the truly relevant context in; 16K tokens are enough.
Four complexity tiers:
| Tier | Max Tokens | Target | Example |
|---|---|---|---|
| SUPER_LIGHT | 256 | Local | "Hello", "Thanks" |
| LIGHT | 512 | Local | Simple questions |
| MEDIUM | 2,048 | Local or cloud (if cloud=on) | Explanations, summaries |
| HEAVY | 3,072 | Local or cloud (if cloud=on) | Code generation, analysis |
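The heuristic needs nothing more than message length and a few trigger words. A toy sketch of tier selection; the keyword lists are invented and the real analyzer is richer:

```python
BUDGETS = {"SUPER_LIGHT": 256, "LIGHT": 512, "MEDIUM": 2048, "HEAVY": 3072}

def complexity_tier(message):
    lowered = message.lower()
    words = lowered.split()
    # Greetings and acknowledgements: tiny budget
    if len(words) <= 3 and not any(w in ("explain", "write", "code") for w in words):
        return "SUPER_LIGHT"
    # Generation-heavy tasks get the largest budget
    if any(k in lowered for k in ("code", "implement", "refactor", "analyze")):
        return "HEAVY"
    if any(k in lowered for k in ("explain", "summarize", "compare")):
        return "MEDIUM"
    return "LIGHT"
```

Since this runs before any model is loaded, a wrong guess costs only a suboptimal token budget, never a hallucination.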
Cloud option: currently integrated is Mistral Small with 128K context, chosen for EU servers and GDPR compliance. More providers are planned; routing stays identical regardless of which model ends up receiving the prompt.
Hardware profile system: current reference + planned profiles
Hardware profiles as JSON files with schema validation. Each profile defines model selection, context budget, memory parameters, timeouts and Ollama tuning.
Currently tested:
| Profile | RAM | Model | Context | Status |
|---|---|---|---|---|
| reference | 16 GB | qwen3.5:9b | 16K | Mac Mini M4, operational |
| tiny / medium / large | 4-64+ GB | - | - | planned |
What each profile will control: model configuration, context budget with ratios (system prompt 8-20%, memory 10-40%, history 37-45%, tools 10%), memory parameters (maxResults, minScore, fusion weights), stability timeouts and Ollama-specific tuning (keepAlive, flashAttention, kvCacheType).
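Turning a profile's ratios into absolute per-section token budgets is a few lines of arithmetic. The profile shape and the concrete ratios below are illustrative, picked from the ranges above:

```python
def token_budgets(profile):
    """Split the context window into absolute per-section token budgets."""
    ratios = profile["ratios"]
    total = sum(ratios.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"ratios must sum to 1.0, got {total}")
    return {section: int(share * profile["context"]) for section, share in ratios.items()}

# Hypothetical reference profile (values chosen from the documented ranges)
reference = {
    "model": "qwen3.5:9b",
    "context": 16384,
    "ratios": {"system": 0.15, "memory": 0.30, "history": 0.45, "tools": 0.10},
}
budgets = token_budgets(reference)
```

Rounding down on each section guarantees the budgets never exceed the window, at the cost of a few unused tokens.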
Memory & Knowledge Graph: Stanford Generative Agents scoring
OpenClaw provides the base: SQLite with FTS5 full-text search and vector embeddings. That works.
Clawminator adds a Knowledge Graph as a SQLite triple store: two tables (entities, relations) in memory.db, traversal via recursive CTEs up to 3 hops, temporal weighting, and confidence-based extraction via regex + LLM.
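The 3-hop traversal maps directly to a recursive CTE. A self-contained sketch against an invented mini-graph; the table layout is simplified to the two-table idea, not the actual memory.db schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE entities (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relations (src INTEGER, predicate TEXT, dst INTEGER);
    INSERT INTO entities VALUES (1, 'user'), (2, 'Vienna'), (3, 'Austria'), (4, 'EU');
    INSERT INTO relations VALUES
        (1, 'lives_in', 2), (2, 'located_in', 3), (3, 'member_of', 4);
""")

THREE_HOPS = """
    WITH RECURSIVE walk(id, depth) AS (
        SELECT :start, 0
        UNION
        SELECT r.dst, w.depth + 1
        FROM relations r JOIN walk w ON r.src = w.id
        WHERE w.depth < 3                 -- cap traversal at 3 hops
    )
    SELECT e.name FROM walk JOIN entities e ON e.id = walk.id WHERE walk.depth > 0;
"""
reachable = {row[0] for row in con.execute(THREE_HOPS, {"start": 1})}
```

The `UNION` (rather than `UNION ALL`) deduplicates revisited nodes, so cycles in the graph cannot blow up the traversal.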
Retrieval scoring (based on Park et al., 2023):
score = 0.35 × weight + 0.35 × confidence + 0.30 × recency
| Factor | Mechanism |
|---|---|
| Weight | +0.15 per repetition (Ebbinghaus-inspired) |
| Confidence | Per source: user self-report 0.95, LLM-extracted 0.50, system seed 0.40 |
| Recency | 7-day exponential decay |
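The formula and table above translate into a few lines. The exact decay form isn't specified here, so this sketch assumes recency = exp(-age/7); the 1.0 cap on reinforced weight is also an assumption:

```python
import math

def recency(age_days):
    # 7-day exponential decay (assumed functional form)
    return math.exp(-age_days / 7.0)

def memory_score(weight, confidence, age_days):
    # score = 0.35 × weight + 0.35 × confidence + 0.30 × recency
    return 0.35 * weight + 0.35 * confidence + 0.30 * recency(age_days)

def reinforce(weight, repetitions=1):
    # +0.15 per repetition, Ebbinghaus-inspired; the 1.0 cap is an assumption
    return min(1.0, weight + 0.15 * repetitions)
```

A user self-reported fact (confidence 0.95) mentioned today scores near the maximum, while an LLM-extracted fact from a month ago drops well below it, which is the intended retrieval order.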
Temporal Contradiction Detection: Single-valued predicates automatically invalidate outdated facts ("lives in Vienna" replaces "lives in Berlin").
Cross-system index: KG facts are synchronized into full-text search.
Hybrid retrieval (OpenClaw): BM25 + vector cosine fusion, configurable per profile.
Memory lifecycle: Age-based expiry (default 90 days) + count-based trim. WAL mode prevents locking issues. Seed protection: seedFromWorkspace() doesn't overwrite user-set values on restart.
Anti-hallucination: Extracted entities must appear verbatim in the user message. No phantom connections.
Fork strategy: plugin-first, minimal core patches
Clawminator uses a hybrid approach: ≥90% plugin code, ≤10% core patches. OpenClaw is included as a git subtree with automated weekly sync via GitHub Actions.
This means: upstream updates from OpenClaw flow in regularly without breaking Clawminator code. The few core patches are clearly documented and isolated.
Testing: 655 automated + 170 manual tests
Clawminator follows an ASPICE-adapted requirements process with formal IDs, MoSCoW priorities and verification methods.
| Category | Count | Coverage |
|---|---|---|
| Automated | 655 | Skills, router, memory, profiles, gateway |
| Manual | 170 | Telegram integration, macOS features, E2E |
| Total | 825 | All layers + integrations |
The manual test suite covers everything automation can't: Telegram messages, Spotify control, iMessage, camera triggers, Apple Notes β real macOS interactions that need a running desktop.
Health Monitor: real-time dashboard in the browser
Clawminator provides a built-in health endpoint at /clawminator/health that shows the overall system state at a glance, directly in the browser, no extra tools needed.
What the health monitor checks:
| Check | What's verified |
|---|---|
| Ollama | Reachability, loaded models, VRAM usage |
| Gateway | Process status, uptime, active sessions |
| Memory | SQLite state, Knowledge Graph size, FTS5 index |
| Disk | Free space, model directory |
| RAM | System memory, swap usage |
| Telegram | Bot connection, last message |
The dashboard runs on port 18789, only accessible on localhost, no external access. No authentication needed because nothing leaves the device.
Why local instead of cloud?
Five reasons. No marketing. No problemo.
Privacy
Everything stays on your device. No telemetry, no tracking, no data shared with third parties.
Local-first, cloud-optional
The agent runs locally at zero cost. 90% of your requests are handled on-device: offline, in milliseconds, free.
Always available
Works without internet (except weather + web search). Your assistant is never offline.
Fast
Skills respond in ~50 ms, the LLM in 2-15 seconds. No waiting for cloud latency.
Orchestrator mode
Too complex for the local model? Clawminator can delegate tasks to Claude Code, Codex, Gemini or other CLI agents, directly from the server; no API keys to configure in Clawminator. You decide when and whether external help is involved.
Built on OpenClaw
OpenClaw is the framework. Clawminator is a specialization for local hardware.
OpenClaw is a generic open-source AI gateway: it supports cloud LLMs and local models (via Ollama) equally. 26 native tools, 53 bundled skills, 13,700+ community skills on ClawHub. Clawminator is a specialized configuration for a specific use case: local models with 16K context on consumer hardware, where every token counts.
| | OpenClaw | Clawminator |
|---|---|---|
| Approach | Generic framework for all model sizes | Specialized for local models, consumer HW |
| Context Window | Typically 128K+ tokens | 16K tokens (every token budgeted) |
| LLM Tools | 26 native (Clawminator uses these in layer 3) | + 13 own slash commands (0 tokens) |
| Deterministic Skills | - | 75 skills via classifier + router (0 tokens, ~50 ms) |
| Model Routing | 1 model configurable | Local + cloud option, complexity analyzer (<5 ms) |
| Knowledge Graph | - | SQLite triple store with Stanford scoring |
| Languages | English | German + English |
| Target audience | Developers, power users, all model sizes | Token-efficient setups, 16K context range |
OpenClaw is a generic framework that works with all model sizes. Clawminator specializes in the 16K context range and adds its own layers: a multi-head Query Classifier, 75 deterministic skills, a 7-Layer Router, Quality Gate with async improve loop, and a Knowledge Graph with Stanford scoring.
Mission Timeline
What exists β and what's coming.
⚠️ Planned features do not exist yet; they are intended for future versions.
License & Collaboration
Open source core. Enterprise integration on request.
MIT License (core)
The full architecture will be publicly available under MIT license once the test suite is complete: query classifier, IR orchestrator, 7-layer router, memory system with Stanford scoring, CLI tooling, all ADRs. Use it, fork it, build on it.
Available separately
Trained classifier weights, enterprise connectors (MS Teams, SharePoint, M365, SAP), deployment automation and domain-specific fine-tunings are not released as open source. These parts emerge from projects with companies that need them.
Open for conversations
I'm a requirements engineer and software system designer with an automotive background (ASPICE, Bosch since 2017). Clawminator combines the two: RE discipline applied to AI agents. Open to consulting, integration projects, and the right full-time role.
Clawminator is in development. Stay tuned.
The project is not yet publicly available. Sign up and we'll notify you about progress, beta access and release. No spam, only real updates.