Category: AI

Let’s be real, local-first tooling is an essential capability when your internet is not stable, in a world where every cloud service out-there is frequently crashing, not to mention speed-privacy-ownership concerns while sending code, prompts, logs, or half-baked ideas to a remote API.

To avoid all these noise and make my workflow more autonomous-quicker-stable-private I made a machine where I can code with an AI assistant Locally and Fully Offline.

I do not think everyone should run a local coding model, this is still a serious tech task. But if you enjoy owning and playing with your stack, and you have the hardware for it, it can be a very satisfying experience.

What I’ve used for the AI coding assistant that can run without the cloud

This is a Debian machine with 64 GB GPU and OpenCode as the coding agent with llama.cpp .

When you get your local llama-server running, OpenCode talks to it like it would talk to any OpenAI-compatible provider.

The difference is that the whole loop stays on my local machine.

Neat!

Be warned – Local AI assistant is hungry for GPU memory

Running Qwen3.6 27B Q8_0 with 256k context in reasoning mode loads around 50GB of the GPU memory and gives around 64 tokens/s for prompt+generation.

That is quite good for a local model with that much context.

There are some Caveats

Qwen3.6 27B at 256k context is about three times slower compared to a hosted frontier model.

You have to care about model storage, updates, server flags, GPU memory, and cooling.

Category: AI

The AI industry has poorly thought through the trust model for agents operating in untrusted environments

This is because the model itself doesn’t have a hardened boundary there by design.

Experiences from Setting Up Fully Offline Local Only AI Assisted Workstation

What I’ve used for the AI coding assistant that can run without the cloud

Be warned – Local AI assistant is hungry for GPU memory

There are some Caveats