5 Things That Break When You Self-Host OpenClaw (And How We Fixed Them)
Real OpenClaw self-hosting errors: port 18790 conflict, Node version mismatch, Telegram webhook failure, context overflow crashes, and gateway restart loops.
The gap between "it should work" and "it works"
The OpenClaw docs are good. The install instructions are clear. And yet, a predictable set of failures hits almost every self-hosting attempt. Not because the software is buggy — but because the real world is messier than any install guide anticipates. Different OS versions, conflicting services, non-obvious dependency interactions, and infrastructure state that nobody documented.
This post covers the five most common self-hosting failures we've seen, with the exact error messages they produce, the root cause, the fix, and, critically, why none of them exist on managed hosting. The goal isn't to sell you on managed (though it might). Understanding why the managed version works helps you understand what the self-hosted version is actually doing under the hood.
Break #1: Port 18790 conflict
The error: Error: listen EADDRINUSE: address already in use :::18790
What causes it: Port 18790 is the OpenClaw gateway default. Something else is already listening on it — usually a previous failed OpenClaw install, a zombie process that didn't clean up, or another service that happened to grab the same port.
How to find it: lsof -i :18790 or ss -tulpn | grep 18790. This shows you what's holding the port and its PID.
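The fix: Stop the conflicting process (try kill <PID> first, and use kill -9 <PID> only if it ignores the polite request) or change the OpenClaw port in openclaw.json under gateway.port. If you change the port, update whatever forwards Telegram webhook traffic to the gateway, typically the proxy_pass target in nginx, to match. Otherwise your bot receives nothing. Telegram only delivers webhooks to ports 443, 80, 88, and 8443, so the public webhook URL itself normally stays the same.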
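If you'd rather script the cleanup than run the commands by hand, here is a minimal sketch of a gentler version of the kill step. It assumes lsof is installed, and stop_gracefully and free_port are illustrative helper names of ours, not part of OpenClaw. It sends SIGTERM first and falls back to SIGKILL only if the process is still alive after a grace period.

```shell
# Illustrative helpers (hypothetical names, not part of OpenClaw).

# stop_gracefully <pid> [grace_seconds]
# Sends SIGTERM, waits up to grace_seconds (default 5), then sends SIGKILL.
stop_gracefully() {
  local pid grace waited
  pid="$1"
  grace="${2:-5}"
  waited=0
  # If the signal cannot be sent, the process is already gone (or not ours to signal).
  kill -TERM "$pid" 2>/dev/null || return 0
  while [ "$waited" -lt "$grace" ]; do
    kill -0 "$pid" 2>/dev/null || return 0   # exited after SIGTERM
    sleep 1
    waited=$(( waited + 1 ))
  done
  kill -KILL "$pid" 2>/dev/null              # last resort
  return 0
}

# free_port <port>
# Stops every process that is listening on the given TCP port.
free_port() {
  local pid
  # lsof -t prints bare PIDs; -sTCP:LISTEN ignores outbound connections.
  for pid in $(lsof -t -iTCP:"$1" -sTCP:LISTEN 2>/dev/null); do
    echo "Stopping PID $pid on port $1"
    stop_gracefully "$pid"
  done
}
```

Calling free_port 18790 clears the default gateway port. Check what lsof -i :18790 reports before running it, so you don't stop a service you actually need.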
Why managed doesn't have this: Each managed instance runs in an isolated container with its own network namespace. Port conflicts between containers are impossible by design. The gateway port is internal to the container and never conflicts with anything.
Break #2: Node version mismatch
The error: SyntaxError: Unexpected token '?' or TypeError: crypto.webcrypto is undefined or any number of similar cryptic failures deep in the dependency tree.
What causes it: OpenClaw requires Node 20+. Nullish coalescing (??), optional chaining (?.), crypto.webcrypto, and several other features used throughout the codebase don't exist in older Node versions. The errors surface far from the root cause, so you end up debugging a symptom, not the problem.
How to diagnose it: node --version. If it's below 20, that's your problem. On Ubuntu 20.04, the default apt Node is 10. On Ubuntu 22.04, it's 12. Neither is sufficient.
The fix: Install Node 20+ via nvm (recommended) or NodeSource. nvm install 20 && nvm use 20 && nvm alias default 20. Then reinstall OpenClaw dependencies from scratch — don't try to fix a partially-installed state.
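To catch this before the cryptic errors show up, you can put a version gate at the top of your install script. The following is a minimal sketch. node_major and node_ok are hypothetical helper names, and the minimum of 20 comes from the requirement stated above.

```shell
# node_major <version-string>
# Extracts the major version, e.g. "v18.19.1" -> 18.
node_major() {
  printf '%s\n' "$1" | sed -e 's/^v//' -e 's/\..*$//'
}

# node_ok <version-string> [minimum_major]
# Succeeds when the major version meets the minimum (default 20).
node_ok() {
  local major
  major="$(node_major "$1")"
  # A non-numeric or empty version makes the comparison fail, which is what we want.
  [ -n "$major" ] && [ "$major" -ge "${2:-20}" ] 2>/dev/null
}

# Typical use at the top of an install script:
#   node_ok "$(node --version)" || { echo "Need Node 20+, found $(node --version)" >&2; exit 1; }
```

Failing fast here is cheaper than tracing a SyntaxError back through the dependency tree.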
Why managed doesn't have this: The managed runtime is pinned to a tested Node version and updated in a controlled way. Version drift is impossible.
Break #3: Telegram webhook not connecting
The error: Bot starts but never responds. Running curl https://api.telegram.org/bot<TOKEN>/getWebhookInfo returns: "last_error_message": "SSL certificate verification failed" or "Connection refused" or "Webhook was not set".
What causes it: Three distinct root causes depending on the error message. SSL failure = your cert is invalid, expired, self-signed, or not trusted by Telegram's servers (they use a strict set of trusted CAs). Connection refused = your nginx reverse proxy isn't running, is misconfigured, or the port it's proxying to is wrong. Webhook not set = the webhook registration step failed silently, usually because the URL was wrong at registration time.
The fix by case: For SSL failures, use Let's Encrypt with certbot and verify with openssl s_client -connect yourdomain.com:443. For connection refused, check nginx status, verify that the proxy_pass port matches your OpenClaw gateway port, and reload nginx with sudo systemctl reload nginx. For a webhook that was never set, re-register by calling the Telegram setWebhook endpoint directly with your full HTTPS URL. Then confirm with getWebhookInfo that url shows your endpoint, that no new last_error_message appears, and that pending_update_count falls back toward zero as Telegram delivers the queued updates.
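The re-registration and verification calls can be wrapped in a couple of small helpers. This is a sketch only. The function names are ours, and the /telegram/webhook path in the example is a placeholder, since the path your gateway listens on depends on your setup. The setWebhook and getWebhookInfo methods are part of the Telegram Bot API.

```shell
# Illustrative helpers (hypothetical names). The token is the one BotFather gave you.

# tg_api <token> <method>
# Builds the URL for a Bot API method.
tg_api() {
  printf 'https://api.telegram.org/bot%s/%s' "$1" "$2"
}

# tg_set_webhook <token> <https-url>
# --data-urlencode sends the URL as a form field, so characters like ? and & survive.
tg_set_webhook() {
  curl -sS "$(tg_api "$1" setWebhook)" --data-urlencode "url=$2"
}

# tg_webhook_info <token>
# Prints the JSON with url, pending_update_count, last_error_message, and so on.
tg_webhook_info() {
  curl -sS "$(tg_api "$1" getWebhookInfo)"
}

# Example (replace the placeholder path with the one your gateway expects):
#   tg_set_webhook "$TOKEN" "https://yourdomain.com/telegram/webhook"
#   tg_webhook_info "$TOKEN"
```

Run tg_webhook_info before and after the change. If url is still empty afterwards, registration failed, and the JSON returned by tg_set_webhook usually says why.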
Why managed doesn't have this: The managed platform auto-registers your Telegram webhook on deployment using a verified HTTPS endpoint with a trusted cert. You never touch nginx, certbot, or webhook registration.
Break #4: Context overflow crashes
The error: Error: context_length_exceeded — This model's maximum context length is 200000 tokens or the agent simply stops responding mid-conversation with no error, restarting silently.
What causes it: OpenClaw maintains conversation context across sessions. Over time, especially with agentic workflows that produce long outputs (code, research, multi-step tasks), the context window fills up. When it hits the limit, the request fails. If your error handling isn't configured correctly, the gateway may restart rather than gracefully truncating context.
The fix: Configure maxContextTokens in your agent config and enable context trimming — this automatically drops the oldest messages when the window is at 80% capacity, keeping system prompt and recent messages intact. You also need to set gracefulContextReset: true to prevent gateway crashes on overflow. Both settings are in openclaw.json under agent.context.
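To make the trimming policy concrete, here is a small sketch of the arithmetic. It is not OpenClaw's implementation. drop_count is a hypothetical helper, and it assumes you already know the token count of the system prompt and of each message. It works out how many of the oldest messages have to go so that the system prompt plus the remaining messages fit within 80% of the limit.

```shell
# drop_count <max_tokens> <system_tokens> <msg_tokens>...
# Message token counts are listed oldest first. Prints how many of the oldest
# messages to drop so the system prompt plus the rest fit in 80% of max_tokens.
drop_count() {
  local budget total dropped t
  budget=$(( $1 * 80 / 100 ))
  total="$2"                      # the system prompt is always kept
  shift 2
  for t in "$@"; do
    total=$(( total + t ))
  done
  dropped=0
  for t in "$@"; do
    if [ "$total" -le "$budget" ]; then
      break                       # everything left fits the budget
    fi
    total=$(( total - t ))        # drop the oldest remaining message
    dropped=$(( dropped + 1 ))
  done
  echo "$dropped"
}
```

For example, drop_count 200000 2000 90000 60000 30000 prints 1: the budget is 160000 tokens, the full history adds up to 182000, and dropping the oldest 90000-token message brings it down to 92000.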
Why managed doesn't have this: Context management is handled automatically in the managed runtime. The platform monitors context size per session and applies intelligent trimming before hitting the limit. No configuration required.
Break #5: Gateway restart loops
The error: pm2 logs openclaw-gateway shows: App [openclaw-gateway] with id [0] restarted | Restart #47. The gateway keeps restarting every 30-60 seconds. Sometimes it works briefly before crashing again.
What causes it: Almost always one of three things: (1) A skill is throwing an unhandled exception on startup — the gateway loads skills at boot and a bad require() or missing environment variable in a skill file can crash the entire process. (2) A memory leak in a long-running agent session — after 8-12 hours of continuous use without a restart, some configurations hit a Node.js heap limit. (3) A corrupt SQLite database file — usually from a hard VPS shutdown mid-write, leaving the WAL file in an inconsistent state.
The fix depends on the cause: For skill errors, run pm2 logs openclaw-gateway --lines 200 to find the last error before the restart, then fix the offending skill. For memory leaks, add --max-old-space-size=512 to the Node startup args (the value is in megabytes, so size it to your VPS's RAM) and schedule a nightly restart. For SQLite corruption, run sqlite3 .openclaw/memory.db "PRAGMA integrity_check"; a healthy database prints ok, and if you get anything else, restore from backup (you have backups, right?).
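For the SQLite case, it's worth taking a copy of the database before you poke at it. Here is a minimal sketch, assuming the sqlite3 CLI is installed and the gateway is stopped first (for example with pm2 stop openclaw-gateway) so nothing writes mid-copy. db_is_healthy and backup_then_check are hypothetical helper names.

```shell
# db_is_healthy <path>
# Succeeds only if the file exists and PRAGMA integrity_check prints exactly "ok".
db_is_healthy() {
  [ -f "$1" ] || return 1
  [ "$(sqlite3 "$1" 'PRAGMA integrity_check;' 2>/dev/null)" = "ok" ]
}

# backup_then_check <path>
# Copies the database and any -wal/-shm sidecar files aside, then reports the result.
backup_then_check() {
  local db stamp f
  db="$1"
  stamp="$(date +%Y%m%d-%H%M%S)"
  for f in "$db" "$db-wal" "$db-shm"; do
    if [ -f "$f" ]; then
      cp -p "$f" "$f.$stamp.bak"
    fi
  done
  if db_is_healthy "$db"; then
    echo "ok: $db"
  else
    echo "FAILED: $db (restore from a known-good backup)" >&2
    return 1
  fi
}

# Example, using the path from this post:
#   backup_then_check .openclaw/memory.db
```

The copy made here is a snapshot of the current, possibly broken, state. It is there so you can retry a recovery, and it does not replace the regular backups you restore from.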
Why managed doesn't have this: The managed runtime uses a supervised process manager with intelligent restart policies, automatic memory monitoring, and distributed SQLite with WAL protection. No restart loops. Managed instances have a 99.9% uptime SLA.
The pattern across all five failures
Notice what all five breaks have in common: they're not bugs in OpenClaw. They're the unavoidable complexity of running a multi-service runtime in a real-world environment. Every self-hosted system accumulates this kind of operational debt. It's the nature of owning infrastructure.
The managed platform exists precisely to absorb this complexity. We've seen all five of these failures — plus dozens more edge cases — and built systems that prevent them. Not because self-hosting is wrong, but because most people don't want to be infrastructure operators. They want an AI agent.
If you've already hit one of these errors and you're done debugging for the day, the managed version deploys in under 5 minutes. Everything above is handled. Your agent is running and you never think about gateway ports, Node versions, or webhook registration again.