How I use my Claude subscription as a free API — the OAuth proxy that changes everything

I have a Claude Max subscription. I pay for it. I use it all day with Claude Code in the terminal. But when I want my AI agent (Hermes) to use Claude Sonnet to respond to me via Telegram... I need an API key that costs extra. Why can't I reuse what I already pay for?

Today we solved it. And the result is an 80-line Node.js proxy that changes the rules of the game.

The problem

A Claude subscription (Pro or Max) gives you access to the models through the web and the Claude Code CLI. But if you want to use Anthropic's API from your own code (a Telegram bot, an agent, an app), you need an API key with separate billing. Input tokens and output tokens are each charged on top of what you already pay.

I have an agent called Harvie running on my VPS. It's a personal assistant that talks to me via Telegram, manages my training data, prepares my daily briefing and executes tools. Until now it used a free model (StepFun via OpenRouter), but I wanted to give it the power of Claude Sonnet. The problem: I didn't want to pay twice.

The discovery

When you log in with Claude Code CLI (claude), an OAuth token is generated and saved in ~/.claude/.credentials.json. That token is what authenticates all your requests with your subscription. The question was: can I use that same token to make API calls from outside Claude Code?

The answer: yes, with two key tricks.

The two tricks that make it work

1. Authorization: Bearer instead of x-api-key

Anthropic's normal API keys use the x-api-key header. But OAuth tokens use Authorization: Bearer <token>. They're different authentication mechanisms. If you send an OAuth token as x-api-key, it doesn't work.

2. The secret header: anthropic-beta

Anthropic's API rejects OAuth tokens unless you include a specific header:

anthropic-beta: oauth-2025-04-20

Without this header, the API returns an error. With it, it accepts your OAuth token and responds as if it were a normal API key. Your Max subscription covers the cost.
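Put together, the two tricks amount to a small header rewrite. Here is a minimal sketch of it. The helper name buildOAuthHeaders is made up for illustration, and it assumes the incoming headers are a plain object with lowercase keys, which is how Node's http module delivers them:

```javascript
// Hypothetical helper: rewrite a client's headers so the API accepts an
// OAuth token. The header names and the beta value come from the text above.
function buildOAuthHeaders(clientHeaders, accessToken) {
  const headers = { ...clientHeaders };

  // Trick 1: OAuth tokens travel in Authorization: Bearer, never in x-api-key.
  delete headers["x-api-key"];
  headers["authorization"] = `Bearer ${accessToken}`;

  // Trick 2: add the OAuth beta flag. The header takes a comma-separated
  // list, so any flags the client already sent are kept.
  const flag = "oauth-2025-04-20";
  const flags = (headers["anthropic-beta"] || "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
  if (!flags.includes(flag)) flags.push(flag);
  headers["anthropic-beta"] = flags.join(",");

  return headers;
}

console.log(buildOAuthHeaders({ "x-api-key": "placeholder" }, "my-oauth-token"));
```

Returning a new object instead of mutating the input keeps the original request headers available for logging.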

The solution: a local proxy

Instead of modifying every application that wants to use the API, we set up an intermediate proxy:

Hermes / curl / your app
        ↓
  Local proxy (:[PORT])    ← injects OAuth token + headers
        ↓
  api.anthropic.com       ← responds with your subscription

The proxy is a Node.js server that:

Listens on http://127.0.0.1:[PORT]
Reads Claude's OAuth token on each request
Replaces the client's authentication with Authorization: Bearer <token>
Adds the anthropic-beta: oauth-2025-04-20 header
Forwards to api.anthropic.com and streams the response back

From Hermes' perspective (or any other app), it's like talking to Anthropic's API directly. It doesn't know there's a proxy in the middle.
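To make those steps concrete, here is a condensed sketch of such a proxy. It is not the exact code running on my server. The PROXY_PORT environment variable, the forward function, and the injectable fetchImpl parameter are all names chosen for this example, and it needs a Node.js version with a global fetch (18 or later):

```javascript
const http = require("node:http");
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");
const { Readable } = require("node:stream");

const UPSTREAM = "https://api.anthropic.com";
const CREDENTIALS = path.join(os.homedir(), ".claude", ".credentials.json");

// Re-read the token on every request so a refreshed login is picked up.
function readToken() {
  return JSON.parse(fs.readFileSync(CREDENTIALS, "utf8")).claudeAiOauth.accessToken;
}

// Forward one request upstream. fetchImpl is injectable so the function can
// be exercised without touching the network.
async function forward({ method, url, headers, body }, { token, fetchImpl = fetch }) {
  const out = { ...headers };
  // Drop headers that belong to the client connection, not the upstream one.
  for (const h of ["host", "connection", "content-length", "transfer-encoding",
                   "accept-encoding", "x-api-key"]) {
    delete out[h];
  }
  out["authorization"] = `Bearer ${token}`;
  out["anthropic-beta"] = [out["anthropic-beta"], "oauth-2025-04-20"]
    .filter(Boolean)
    .join(",");

  const upstream = await fetchImpl(UPSTREAM + url, { method, headers: out, body });

  // fetch() has already decompressed the body, so these two would be wrong.
  const resHeaders = Object.fromEntries(upstream.headers);
  delete resHeaders["content-encoding"];
  delete resHeaders["content-length"];
  return { status: upstream.status, headers: resHeaders, body: upstream.body };
}

const server = http.createServer(async (req, res) => {
  try {
    const chunks = [];
    for await (const chunk of req) chunks.push(chunk);
    const body = chunks.length ? Buffer.concat(chunks) : undefined;
    const r = await forward(
      { method: req.method, url: req.url, headers: req.headers, body },
      { token: readToken() }
    );
    res.writeHead(r.status, r.headers);
    // Stream the upstream body back as it arrives (needed for SSE streaming).
    if (r.body) Readable.fromWeb(r.body).pipe(res);
    else res.end();
  } catch (err) {
    if (!res.headersSent) res.writeHead(502, { "content-type": "application/json" });
    res.end(JSON.stringify({ error: String(err) }));
  }
});

// Only listen when a port is configured. Binding to 127.0.0.1 keeps the
// proxy unreachable from outside the machine.
if (process.env.PROXY_PORT) {
  server.listen(Number(process.env.PROXY_PORT), "127.0.0.1");
}
```

Buffering the request body is fine here because API requests are small JSON documents. Only the response needs to be streamed.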

The invisible bug that cost us hours

The first version of the proxy seemed to work. The test with curl returned perfect responses. But Hermes failed with "Connection error" after each request — even though the proxy logged a successful HTTP 200.

The problem? gzip.

Node.js's fetch() automatically decompresses gzip responses. But it doesn't remove the content-encoding: gzip header from the response. So the proxy was forwarding already decompressed data with a header saying "this is compressed". Anthropic's Python SDK (which Hermes/OpenClaw uses) tried to gunzip data that was already plain text and silently exploded.

The solution is surgical: three deletes in the headers.

// In REQUEST headers (drop the client's value; fetch() negotiates compression with upstream itself)
delete headers["accept-encoding"];

// In RESPONSE headers (don't lie to the client)
delete resHeaders["content-encoding"];
delete resHeaders["content-length"];

Without this, the proxy seems to work but breaks any Python client using streaming. It cost us hours of debugging with detailed logs in the proxy to catch it.
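If you want to see the failure with your own eyes, you can reproduce the mismatch in a few lines with Node's zlib module. The decodeAsClient helper is invented for this demo. A Python client fails in its own way, but the mechanism is the same: it trusts the header and tries to gunzip plain text.

```javascript
const zlib = require("node:zlib");

// Hypothetical helper: mimic a client that gunzips the body only because
// the content-encoding header says it is gzip.
function decodeAsClient(headers, body) {
  if (headers["content-encoding"] === "gzip") {
    return zlib.gunzipSync(body).toString("utf8"); // throws on non-gzip data
  }
  return body.toString("utf8");
}

const plain = Buffer.from('{"type":"message"}');

// Honest response: compressed body with a matching header. Decodes fine.
console.log(decodeAsClient({ "content-encoding": "gzip" }, zlib.gzipSync(plain)));

// What the buggy proxy sent: decompressed body with a stale header.
try {
  decodeAsClient({ "content-encoding": "gzip" }, plain);
} catch (err) {
  console.log("client failed:", err.message);
}

// After the fix: header removed, body passed through untouched.
console.log(decodeAsClient({}, plain));
```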

Another bug: the compression model

Hermes uses context compression — when the conversation gets very long, it summarizes it with a cheap model. We had configured google/gemini-3-flash-preview for that. The problem: that model was routed through the Anthropic proxy, which only accepts Claude models. Result: HTTP 404.

The solution: force the compression model to use OpenRouter directly.

# In Hermes' config.yaml
context_compression:
  summary_model: "google/gemini-3-flash-preview"
  summary_provider: "openrouter"  # DON'T leave on "auto"
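
A misrouted model is easier to diagnose if the proxy rejects it with a clear message instead of passing along the upstream 404. One possible guard is sketched below. It assumes Claude model IDs start with "claude-", which holds for the IDs in this post but is an assumption about naming in general, and checkModel is a made-up name:

```javascript
// Hypothetical guard: inspect the JSON request body and decide whether the
// proxy should forward it. Assumes Claude model IDs begin with "claude-".
function checkModel(rawBody) {
  let model;
  try {
    model = JSON.parse(rawBody)?.model;
  } catch {
    return { ok: false, reason: "request body is not valid JSON" };
  }
  if (typeof model !== "string" || !model.startsWith("claude-")) {
    return {
      ok: false,
      reason: `model "${model}" is not a Claude model; route it to another provider`,
    };
  }
  return { ok: true, model };
}

console.log(checkModel('{"model":"google/gemini-3-flash-preview"}'));
console.log(checkModel('{"model":"claude-haiku-4-5-20251001"}'));
```

Inside the proxy, a failed check could be answered with an HTTP 400 carrying the reason, which would have pointed straight at the compression config.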

The result

Now my Hermes agent responds via Telegram using Claude Sonnet 4, through my Max subscription. No extra API keys. No additional billing. The same subscription I already pay for.

Johnny (Telegram) → "What do I train today?"
        ↓
Hermes Agent → queries COROS data, reads weekly plan
        ↓
OAuth Proxy → injects token, calls Anthropic
        ↓
Claude Sonnet 4 → generates response
        ↓
Johnny (Telegram) ← "Today is interval run: 3km warmup..."

The proxy runs as a systemd service, so it starts automatically with the server and survives reboots. Logs go to a file for debugging. The token is re-read on each request, so if it expires you just need to run claude once to refresh it.
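
Re-reading the token per request takes only a few lines. The sketch below relies on the claudeAiOauth.accessToken layout of the credentials file. The expiresAt field (a millisecond timestamp) is an assumption about the file format, so the expiry check is skipped when the field is absent. The function name is also just for illustration:

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const DEFAULT_PATH = path.join(os.homedir(), ".claude", ".credentials.json");

// Read the current OAuth access token from disk. Calling this once per
// request means a token refreshed by running `claude` is picked up without
// restarting the proxy.
function readAccessToken(file = DEFAULT_PATH, now = Date.now()) {
  const creds = JSON.parse(fs.readFileSync(file, "utf8")).claudeAiOauth;
  if (!creds || !creds.accessToken) {
    throw new Error(`no OAuth token in ${file}; run "claude" to log in`);
  }
  // Assumed field: expiresAt as a millisecond timestamp. Ignored if missing.
  if (typeof creds.expiresAt === "number" && creds.expiresAt <= now) {
    throw new Error('OAuth token expired; run "claude" to refresh it');
  }
  return creds.accessToken;
}
```

Failing early with a readable message beats forwarding a dead token and trying to interpret whatever error comes back from upstream.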

Limitations (the ones you need to know)

Shared rate limit: If you use Claude Code CLI at the same time as Hermes, they compete for the same rate limit.
Token with expiration: The OAuth token isn't eternal. When it expires, you need to run claude to regenerate it. In practice, it lasts quite a while.
Localhost only: The proxy listens on 127.0.0.1 — it's not accessible from outside the server. This is a security feature, not a bug.
Only Anthropic models: You can't pass Google or OpenAI models through this proxy. Only Claude (Haiku, Sonnet, Opus).

How to set it up yourself

If you have a Claude subscription and a VPS:

1. Install Claude Code CLI and log in:

npm install -g @anthropic-ai/claude-code
claude  # follow the OAuth login

2. Verify you have the token:

python3 -c "import json; d=json.load(open('$HOME/.claude/.credentials.json')); print(d['claudeAiOauth']['accessToken'][:20]+'...')"

3. Create the proxy (80 lines of Node.js) and configure it as a systemd service.

4. Test:

curl http://127.0.0.1:[PORT]/v1/messages \
  -H "content-type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-haiku-4-5-20251001","max_tokens":10,"messages":[{"role":"user","content":"hello"}]}'

If you see a JSON response with "content", it works. Point your app to http://127.0.0.1:[PORT] and you're done.
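
If you'd rather run the same check from Node.js, for example as a health check, a rough equivalent of the curl command looks like this. The function names are invented for the example, the port is taken from the command line, and it needs Node.js 18 or later for the global fetch:

```javascript
// Build the same request as the curl command: POST /v1/messages on the proxy.
function buildTestRequest(port) {
  return {
    url: `http://127.0.0.1:${port}/v1/messages`,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify({
        model: "claude-haiku-4-5-20251001",
        max_tokens: 10,
        messages: [{ role: "user", content: "hello" }],
      }),
    },
  };
}

// A successful Messages API response carries a non-empty "content" array.
function looksSuccessful(json) {
  return json != null && Array.isArray(json.content) && json.content.length > 0;
}

async function smokeTest(port) {
  const { url, init } = buildTestRequest(port);
  const res = await fetch(url, init);
  const json = await res.json();
  if (!looksSuccessful(json)) {
    throw new Error(`unexpected response (HTTP ${res.status}): ${JSON.stringify(json)}`);
  }
  return json.content;
}

// Runs only when a port is passed, e.g. node smoke.js <port>
if (process.argv[2]) {
  smokeTest(process.argv[2]).then(console.log, console.error);
}
```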

All the proxy code and an automatic installation script are available. If you're interested, write to me.

What's next

With Sonnet powering Hermes, the responses are much better. The next step is to migrate completely from OpenClaw to Hermes as my main agent, something that's already underway. I also want to explore whether the proxy can serve other purposes: Next.js apps that use the API, automation scripts, or even sharing it with other developers who have Max subscriptions.

The subscription you already pay for can do much more than you think.

— I, Johnny — configured agent: Harvie. The future isn't waited for, it's built — and yes, sometimes with the help of a machine.