Rocket.Chat, local LLMs, and a bot that knows when to shut up about HA

A personal chatbot wired to Rocket.Chat does not have to be a SaaS black box. The interesting parts are all in plain Python: incoming webhooks, a “⏳ Thinking…” message you later replace with the real answer, and an orchestrator that decides whether you are asking for lights, bus departures, or a normal chat turn. Below is how that hangs together — not as a tutorial you must copy, but as a map of where the sharp edges are.

Why update a message instead of posting three new ones?

A bot can spam the channel with a fresh post for every step, or it can behave like a product: post once, show progress, then edit that post when the work finishes. Rocket.Chat's REST endpoint chat.update is the difference between a bot that feels like a CLI and one that feels like Slack-era UX.

Conceptually:

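# Pseudocode shaped like the real flow; the helpers are placeholders
thinking_id = post_placeholder(room_id, "⏳ _Thinking…_")
try:
    reply = orchestrate(user_text, room_id)  # slow: LLM, tools, ASR…
except Exception:
    reply = "⚠️ Sorry, something broke while answering. Please try again."  # errors are replies too
# Always swap out the placeholder; if the edit fails, post a fresh message instead
if not update_message(room_id, thinking_id, reply):
    post_fallback(room_id, reply)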

If the update fails (network blip, permission edge case), fall back to sending the reply as a webhook payload. Users still get an answer. They just see two bubbles (the stale placeholder and the reply) instead of one polished edit.
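Here is a sketch of the REST side of that flow, using only the standard library. It assumes the bot authenticates with a personal access token (the X-Auth-Token and X-User-Id headers) and that the chat.postMessage response carries the new message's id under message._id. The names placeholder_id, build_chat_update and update_or_fallback are illustrative. The send and fallback callables are injected so you can plug in urllib.request.urlopen and your own webhook poster.

```python
import json
import urllib.request
from typing import Callable


def placeholder_id(post_response: dict) -> str:
    """Pull the message id out of a chat.postMessage response body."""
    return post_response["message"]["_id"]


def build_chat_update(base_url: str, user_id: str, token: str,
                      room_id: str, msg_id: str, text: str) -> urllib.request.Request:
    """Build the POST /api/v1/chat.update request that rewrites an existing message."""
    body = json.dumps({"roomId": room_id, "msgId": msg_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v1/chat.update",
        data=body,
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # assumption: personal access token auth
            "X-User-Id": user_id,
        },
        method="POST",
    )


def update_or_fallback(req: urllib.request.Request, text: str,
                       send: Callable[[urllib.request.Request], object],
                       fallback: Callable[[str], object]) -> bool:
    """Try the in-place edit; on a network or HTTP error, post the reply fresh."""
    try:
        send(req)
        return True
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        fallback(text)
        return False
```

In production, send would be something like lambda r: urllib.request.urlopen(r, timeout=10). Injecting it keeps the fallback path testable without a live server.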

The webhook payload is your contract

Whatever Rocket.Chat sends to your Flask (or FastAPI) handler, normalize it once: text, user_name and channel_id, with attachment metadata pulled out alongside them. Everything downstream (history keys, “thinking” placement, HA routing) hangs off those fields.

def normalize_incoming(data: dict) -> tuple[str, str, str]:
    """Reduce a Rocket.Chat webhook body to (text, user id, room id)."""
    user_msg = (data.get("text") or "").strip()
    # "user" may be missing or null, so guard before calling .get on it
    user_id = data.get("user_name") or (data.get("user") or {}).get("username") or "rocket_user"
    room_id = data.get("channel_id") or data.get("rid") or ""
    return user_msg, user_id, room_id

Attachments are a parallel path: images and PDFs never touch the same code path as plain text until you have bytes in hand. Voice goes to ASR first; the transcript becomes user_msg for the rest of the pipeline.
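Here is a minimal sketch of that split. It assumes you have already pulled a MIME type and the raw bytes out of the payload. Where those live in the webhook body depends on your Rocket.Chat integration, so that extraction is left out. The names attachment_lane and resolve_user_msg are hypothetical, and transcribe stands in for whatever ASR client you use.

```python
from typing import Callable, Optional


def attachment_lane(mime: str) -> str:
    """Map a MIME type to a processing lane; parameters like ';codecs=opus' are ignored."""
    base = mime.split(";", 1)[0].strip().lower()
    if base.startswith("image/"):
        return "image"
    if base == "application/pdf":
        return "pdf"
    if base.startswith("audio/"):
        return "audio"
    return "unsupported"


def resolve_user_msg(text: str, mime: Optional[str], data: Optional[bytes],
                     transcribe: Callable[[bytes], str]) -> str:
    """Voice becomes text before the rest of the pipeline sees it; other lanes keep the typed text."""
    if mime and data is not None and attachment_lane(mime) == "audio":
        return transcribe(data).strip()
    return text
```

Images and PDFs still go to their own handlers. Only audio is rewritten into user_msg, which is what lets wake phrases work the same way for voice and text.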

Voice: wake phrases as a poor man’s intent router

Voice is where wake phrases earn their keep. If someone says “hey kivi, turn off the office lights,” you strip the prefix and force Home Assistant tooling — otherwise the model might improvise a story about MQTT. Text chat can use the same trick for consistency.

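# Longest phrase wins so "hey kivi" beats "kivi"; phrases are expected in lowercase
def strip_wake_phrase(text: str, phrases: frozenset[str]) -> tuple[bool, str]:
    stripped = text.lstrip()
    lower = stripped.lower()
    for phrase in sorted(phrases, key=len, reverse=True):
        if not lower.startswith(phrase):
            continue
        # Require a word boundary so "kivi" does not fire on "kiviak"
        if len(lower) > len(phrase) and lower[len(phrase)].isalnum():
            continue
        # Slice the original (not lowercased) text so the command keeps its casing
        rest = stripped[len(phrase):].lstrip(" ,.:!?")
        return True, rest or text
    return False, text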

The ASR service itself is just another HTTP boundary: send audio bytes, get text, treat errors as first-class replies (“service down” beats silent failure).
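Here is a sketch of that boundary. It assumes a hypothetical ASR container at ASR_URL that accepts raw audio in a POST body and answers with JSON like {"text": "..."}. Your service's route and response shape may differ. The transcribe function returns (ok, text), so the caller can post an error reply as-is instead of feeding it to the orchestrator.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical endpoint and response shape: adjust to what your ASR container exposes.
ASR_URL = "http://asr.local:9000/transcribe"

Opener = Callable[[urllib.request.Request, float], bytes]


def default_opener(req: urllib.request.Request, timeout: float) -> bytes:
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def transcribe(audio: bytes, mime: str, opener: Opener = default_opener,
               timeout: float = 30.0) -> tuple[bool, str]:
    """Return (ok, text); when ok is False, text is a reply to show the user."""
    req = urllib.request.Request(ASR_URL, data=audio,
                                 headers={"Content-Type": mime}, method="POST")
    try:
        payload = json.loads(opener(req, timeout))
    except OSError as exc:  # refused, reset, timed out, or an HTTP error status
        return False, f"🎙️ Speech-to-text is unavailable right now ({type(exc).__name__}). Try again or type it."
    except ValueError:  # body was not valid JSON
        return False, "🎙️ The speech-to-text service sent back something I couldn't read."
    text = str(payload.get("text") or "").strip() if isinstance(payload, dict) else ""
    if not text:
        return False, "🎙️ I couldn't make out any speech in that clip."
    return True, text
```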

The orchestrator’s one non-negotiable rule

For Home Assistant–shaped queries, a tool-calling failure is not an excuse to free-chat. If MCP tools are unavailable, you return a boring error string. The user might be annoyed, but they will not act on a hallucinated light.living_room state. That single policy saves more family arguments than any system prompt.

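class ToolError(Exception):
    """Raised when MCP tools are unreachable or a tool call fails."""

def ask_with_tools(messages: list[dict], is_ha_query: bool) -> str:
    try:
        return ollama_tool_loop(messages)  # LLM plus MCP tool calls
    except ToolError:
        if is_ha_query:
            # Never let the model guess at device state
            return "Home Assistant tools are temporarily unavailable. Try again shortly."
        return plain_llm_fallback(messages)  # non-HA chat may degrade gracefully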

Everything else — optional web search, regex fast-paths for transit commands, !search — is negotiable. HA is not.
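The routing that sets is_ha_query can stay small. Here is a sketch that assumes a hypothetical TRANSIT_RE for bus commands and a strip_wake callable with the same (matched, rest) contract as strip_wake_phrase. Route and route_message are illustrative names. Explicit commands are checked first so that !search and transit lookups never reach the LLM, and anything carrying a wake phrase is pinned to the HA lane.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical fast-path pattern; adapt to how your transit commands are phrased.
TRANSIT_RE = re.compile(r"^(?:next bus|bus|departures)\s+(?P<stop>.+)$", re.IGNORECASE)


@dataclass(frozen=True)
class Route:
    lane: str                 # "search", "transit", "ha", or "chat"
    text: str                 # what the chosen lane should act on
    is_ha_query: bool = False


def route_message(text: str, strip_wake: Callable[[str], tuple[bool, str]]) -> Route:
    stripped = text.strip()
    # 1. Explicit command: "!search <query>"
    parts = stripped.split(maxsplit=1)
    if parts and parts[0].lower() == "!search":
        return Route("search", parts[1] if len(parts) > 1 else "")
    # 2. Regex fast-path for transit: no LLM round-trip needed
    match = TRANSIT_RE.match(stripped)
    if match:
        return Route("transit", match.group("stop").strip())
    # 3. A wake phrase pins the request to Home Assistant tooling
    woke, rest = strip_wake(stripped)
    if woke:
        return Route("ha", rest, is_ha_query=True)
    # 4. Everything else is a normal chat turn
    return Route("chat", stripped)
```

The order is a design choice. Putting the wake-phrase check first would send "hey kivi, next bus downtown" to HA instead of the transit fast-path.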

Shipping it without Kubernetes guilt

For the hobby deployment, “GitOps” is a shell script and SSH. The ASR stack builds a Docker image on the target; the bot is plain files plus systemd.

#!/usr/bin/env bash
# Trimmed from a real sync — rsync bot code, then bounce the unit
set -euo pipefail
rsync -avz --exclude="__pycache__" ./chat_ai/ rocket@bot:/home/rocket/scripts/chat_ai/
rsync -avz ./chat-ai.py rocket@bot:/home/rocket/scripts/
ssh rocket@bot 'sudo systemctl restart chat-ai.service && systemctl is-active chat-ai.service'

A minimal unit file is enough to keep the process honest:

[Unit]
Description=Rocket.Chat AI bridge
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=rocket
WorkingDirectory=/home/rocket/scripts
ExecStart=/usr/bin/python3 /home/rocket/scripts/chat-ai.py
Restart=on-failure
RestartSec=3

[Install]
WantedBy=multi-user.target

No Helm, no Argo, but the same questions you would ask in prod still apply: who can SSH, who owns the token file, and what happens when Ollama restarts mid-request? The script is not magic. It is a contract that turns “works on my machine” into “works where Rocket.Chat can reach it.” Platform work at scale is the same idea with different nouns.
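One answer to the Ollama question is to retry connection-level failures briefly before giving up, because a local server that is restarting typically refuses or resets connections for a moment. Here is a minimal sketch with a hypothetical call_with_retry helper. It assumes your client surfaces those failures as OSError subclasses, which urllib does. Other HTTP clients wrap them in their own exception types, so pass those via retry_on. Wrap only the LLM request, never a Home Assistant action, because re-sending a tool call like a light toggle is not idempotent.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                    retry_on: tuple[type[BaseException], ...] = (OSError,),
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on retry_on errors with exponential backoff (1s, 2s, 4s, ...)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller turn this into an error reply
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The final exception still propagates, so the "⏳ Thinking…" placeholder gets replaced with an error rather than hanging forever.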

A closing image

Picture three lanes merging: Rocket.Chat (human timing), local LLM (probabilistic), Home Assistant (deterministic). The orchestrator is the traffic cop. Creative models belong in the chat lane; the house does not run on vibes — it runs on entity IDs and tools that return strings your automation can trust.