
agent infrastructure

The token efficiency stack — caveman + graphify in Cognitia development sessions

2026-05-11 · by Cognitia

If you're running Claude Code or Codex against a real production codebase, your single biggest cost driver is tokens — both the input tokens the agent reads to understand the codebase, and the output tokens the agent generates to make changes.

We've been running both Cognitia OS and the Cognitia portfolio brands through Claude Code for development. Single-agent sessions on the bigger repos (Skillucate, Cognitia AI scaffold, Demandara) were hitting $30–60 per session — manageable, but the math wasn't going to scale to running every brand in parallel.

This week we installed two skills that fixed it: Caveman + Graphify. Stacked, our session costs dropped 70–80%. Here's the actual breakdown.

Caveman — output compression

Open-source Claude Code skill by Julius Brussee. Auto-triggers on phrases like 'caveman mode' or 'be brief'. The agent drops articles, filler words, and pleasantries while preserving exact code blocks and technical terms. Average output reduction: 65% across our test prompts.

Six intensity levels from lite (just removes hedging) to ultra (heavy abbreviation) to wenyan (classical Chinese compression patterns for max density).

Token cost impact: a typical 5,000-token response becomes ~1,750 tokens. At Claude Opus pricing that's a meaningful per-call saving.
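As a sanity check, here's the arithmetic behind that claim. The per-million-token price below is an assumption for illustration, not a quote of current Opus rates:

```python
# Back-of-envelope per-call saving from a ~65% output reduction.
# ASSUMPTION: $75 per million output tokens — illustrative, check current pricing.
PRICE_PER_MTOK = 75.00

def output_cost(tokens: int, price_per_mtok: float = PRICE_PER_MTOK) -> float:
    """Dollar cost of generating `tokens` output tokens."""
    return tokens / 1_000_000 * price_per_mtok

saving = output_cost(5_000) - output_cost(1_750)
print(f"per-call saving: ${saving:.3f}")  # → per-call saving: $0.244
```

Fractions of a dollar per call, but a long agent session makes hundreds of calls, which is where the 65% reduction starts to show up on the invoice.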

Graphify — input compression

Indexes your codebase into a knowledge graph (tree-sitter AST + optional LLM semantic layer). Installs a PreToolUse hook in Claude Code that consults the graph before every file-search call. The agent then reads only the relevant nodes instead of grepping through everything.

On large codebases (500+ files), reported token savings are up to 70x compared to naive full-corpus reading. Our Cognitia portfolio has roughly 2,000 source files across its six brands — exactly the regime where graphify shines.

Privacy preserved — code is processed locally via tree-sitter; nothing leaves the developer's machine unless you opt into a semantic LLM extraction.
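For readers unfamiliar with Claude Code's hook system, a PreToolUse hook of the kind graphify installs looks roughly like this in `.claude/settings.json`. The overall shape (matcher plus command hook) follows Claude Code's hook schema; the `graphify query` subcommand is our guess at the wiring, so check the file the installer actually generates:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Grep|Glob",
        "hooks": [
          { "type": "command", "command": "graphify query --from-hook" }
        ]
      }
    ]
  }
}
```

The matcher means the graph is consulted only when the agent is about to do a file search, which is exactly the call pattern that burns input tokens on large repos.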

Stacked impact in our development sessions

Before: a typical "build a new Cognitia OS skill + write tests + deploy" session ran $30–60 in API costs over 30–60 minutes of agent work.

After (caveman + graphify): same shape of session now $6–15. Same agent, same outputs, dramatically less context wasted.

Stacked, the savings compound: input compression leaves more context-window headroom for actual thinking, which means fewer retry / clarification round-trips.
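To see why a $30–60 session lands in the $6–15 band, a toy model helps. The reduction figures below are illustrative assumptions, not measurements — the "up to 70x" input figure would imply an even larger input cut than the 90% used here:

```python
# Toy model of how the two compressions stack on one session.
# ASSUMPTIONS (illustrative): graphify trims ~90% of naive input reads,
# caveman trims ~65% of output tokens.
INPUT_REDUCTION = 0.90
OUTPUT_REDUCTION = 0.65

def stacked_cost(input_cost: float, output_cost: float) -> float:
    """Session cost after applying each reduction to its side of the bill."""
    return (input_cost * (1 - INPUT_REDUCTION)
            + output_cost * (1 - OUTPUT_REDUCTION))

# A $40 session split $25 input / $15 output drops to:
print(stacked_cost(25.0, 15.0))  # → 7.75
```

Because the two tools attack different sides of the bill (input reads vs. output generation), their savings multiply through rather than overlapping.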

Where they don't help

Single-file edits or small projects (<50 files): graphify's setup cost isn't worth it. Caveman still helps a little, but on small responses the absolute savings are small too.

Conversational tasks where output verbosity is the value: customer-facing chatbots, explanatory content. Caveman would produce a worse customer experience.

So we bind them to development sessions specifically.

How we installed them

Caveman: one-liner installer at github.com/JuliusBrussee/caveman.

Graphify: `pip install graphifyy && graphify install` (note double-y in package name — there's a typosquat-looking name but it's the official one, verified at PyPI).

Both wire themselves into Claude Code's hook + skill system. Activation phrases trigger them automatically; no manual mode-switching needed.

The broader point

Tooling that reduces agent token cost is the unsexy but compounding category in 2026. Everyone wants to talk about new models; the operators winning are the ones who shipped 30% lower marginal cost on existing models.

Cognitia OS will incorporate similar primitives natively as part of the platform — both caveman-style compression at output and graphify-style indexing at input. The Cognitia portfolio brands are the alpha proving ground.

FAQ

Are these skills production-ready?
Caveman has 51,000+ GitHub stars and active development. Graphify has growing adoption but is newer (under 1k stars). Both have official PyPI / npm releases. We treat them as production-ready for our own portfolio work; YMMV for high-stakes integrations.
Do they work outside Claude Code?
Caveman ships SKILL.md drop-ins for Cursor, Cline, Copilot, Gemini CLI and 40+ other agents. Graphify likewise has CLI install commands for most major agents. Both are agent-agnostic by design.

Want to talk to Cognitia?

Build engagements, advisory, or Cognitia OS alpha — all routes start here.

Reach Cognitia →