Experiences from Setting Up Fully Offline Local Only AI Assisted Workstation

Let’s be real, local-first tooling is an essential capability when your internet is not stable, in a world where every cloud service out-there is frequently crashing, not to mention speed-privacy-ownership concerns while sending code, prompts, logs, or half-baked ideas to a remote API.

To avoid all these noise and make my workflow more autonomous-quicker-stable-private I made a machine where I can code with an AI assistant Locally and Fully Offline.

I do not think everyone should run a local coding model, this is still a serious tech task. But if you enjoy owning and playing with your stack, and you have the hardware for it, it can be a very satisfying experience.

What I’ve used for the AI coding assistant that can run without the cloud

This is a Debian machine with 64 GB GPU and OpenCode as the coding agent with llama.cpp .

When you get your local llama-server running, OpenCode talks to it like it would talk to any OpenAI-compatible provider.

The difference is that the whole loop stays on my local machine.

Neat!

Be warned – Local AI assistant is hungry for GPU memory

Running Qwen3.6 27B Q8_0 with 256k context in reasoning mode loads around 50GB of the GPU memory and gives around 64 tokens/s for prompt+generation.

That is quite good for a local model with that much context.

There are some Caveats

Qwen3.6 27B at 256k context is about three times slower compared to a hosted frontier model.

You have to care about model storage, updates, server flags, GPU memory, and cooling.