I Built a Free Local AI Coding Assistant with Warp Terminal and Ollama — and It Completely Replaced GitHub Copilot


Last night at 10:30 PM, I was about to add a batch retry feature to my scraper script.

GitHub Copilot's subscription had just expired. I stared at the empty editor for a moment, then remembered seeing something on Reddit about Warp terminal's AI features — "use plain English to operate the command line," no internet required, no subscription fee.

I decided to give it a try.

My Problem: AI Coding Tools Are Getting Expensive

GitHub Copilot runs $10/month, and the Claude API charges by the token. For an indie developer like me, AI expenses have become something I actually have to budget for. More importantly, some of my projects involve internal data, and I really don't want my code and context passing through third-party servers.

That night, I searched around and found a combination mentioned in a Reddit r/programming thread: Warp Terminal + Ollama. Warp is an AI-powered terminal, and Ollama is a local LLM runtime. Together, they promised "zero cost, zero privacy concerns" AI coding.

Step One: Install Warp Terminal

Honestly, at first I just wanted a prettier terminal.

Warp (warp.dev) has been trending on Hacker News for the past two years. Built in Rust, it supports macOS, Windows, and Linux. Its biggest differentiator from traditional terminals is the built-in AI Block feature — you can describe what you want to do in plain English, and the AI generates the corresponding commands, even executing them directly.

Installation is straightforward:

# macOS (Warp ships as a Homebrew cask)
brew install --cask warp

# Windows
winget install Warp.Warp

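# Linux (Debian/Ubuntu). apt-key is deprecated, so the signing key goes into a keyring file.
# If the repository has moved, Warp's Linux install docs list the current lines.
sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://releases.warp.dev/linux/keys/warp.asc | sudo gpg --dearmor -o /etc/apt/keyrings/warpdotdev.gpg
echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/warpdotdev.gpg] https://releases.warp.dev/linux/deb stable main" | sudo tee /etc/apt/sources.list.d/warpdotdev.list
sudo apt update && sudo apt install warp-terminal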

After installing, I enabled Warp's AI Block feature — Settings → AI Features → turn on "Block with AI."

Step Two: Run Ollama Locally

Next up: Ollama.

Ollama (ollama.com) has been the hottest local LLM runtime tool for the past couple of years. Its core selling point is "run a model with one command." It automatically downloads model files and runs them locally, with no cloud services required.

Installation steps:

# macOS / Linux
curl -fsSL https://ollama.com/install.sh | sh

# Windows
# Just download OllamaSetup.exe from ollama.com/download

Once installed, start a model suitable for coding (ollama run downloads the model on first use, then opens an interactive prompt):

# Qwen2.5:1.5B — compact, decent coding ability, runs fine on 8GB MacBook
ollama run qwen2.5:1.5b

# If your machine is beefier, try 7B or 14B
ollama run qwen2.5:7b

I ran Qwen2.5:1.5B on Windows. On first launch, Ollama automatically downloaded the model file (about 1GB), then ran it locally. No network round trips, no per-token cost.
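To confirm the download actually landed, you can ask the local Ollama server which models it has. The sketch below is a minimal check, assuming Ollama is listening on its default address and that its GET /api/tags endpoint returns a JSON object with a "models" list; the function names are my own and not part of any tool mentioned here.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed unchanged)


def parse_model_names(tags_payload: dict) -> list[str]:
    """Extract model names from the JSON body returned by GET /api/tags."""
    return [entry["name"] for entry in tags_payload.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Ask the running Ollama server which models are stored locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return parse_model_names(json.load(resp))
```

With Ollama running, calling list_local_models() should return a list that includes "qwen2.5:1.5b"; if the server is not up, urlopen raises an error instead.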

Step Three: Connect Warp and Ollama

This was the most delightful part.

Warp's AI Block defaults to a cloud AI provider (requires internet), but it also supports custom AI providers. I followed the docs to set Ollama as Warp's local AI backend:

  1. Open Warp → Settings → AI Providers → Add Custom Provider
  2. API Endpoint: http://localhost:11434 (Ollama's default port)
  3. Model: qwen2.5:1.5b
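Before pointing any client at that endpoint, it can help to send one prompt to it directly and see that the model answers. This is a rough sketch using only the Python standard library, assuming Ollama's /api/generate endpoint with "stream" set to false, which returns a single JSON object whose "response" field holds the text. The helper names are hypothetical, and this is not how Warp itself talks to the model.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # endpoint from step 2 above
MODEL = "qwen2.5:1.5b"                 # model from step 3 above


def build_generate_request(prompt: str, model: str = MODEL,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt: str) -> str:
    """Send the prompt to the local model and return its text reply."""
    request = build_generate_request(prompt)
    with urllib.request.urlopen(request, timeout=120) as resp:
        return json.load(resp)["response"]
```

If ask_local_model("Say hello") returns text, the endpoint and model name are right, and any remaining trouble is on the client side.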

Now, in Warp, I press Cmd/Ctrl + K to bring up the AI Block, type a description in plain English, and Warp calls my local Ollama model to generate commands or code.

Real Results: I Used It for an Entire Evening of Development

That night, I used it for the following:

  • Generate Regex: I typed "write a regex that matches Chinese email addresses," and Warp AI returned a working regex almost immediately. I copied it directly into my code.
  • Explain Error Logs: I pasted a stack trace and asked "what does this error mean?" Ollama responded in clear English with an explanation and a suggested fix.
  • Write Retry Logic: I described what I needed — "add an exponential backoff retry mechanism to this scraper, max 5 attempts" — and Warp AI generated the complete Python code. I reviewed it and used it directly.
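For reference, exponential backoff with a cap of 5 attempts can look roughly like the sketch below. This is not the code the model produced that evening, just a minimal illustration of the technique; retry_with_backoff and its parameters are hypothetical names, and the 1 second base delay, 30 second cap, and 10% jitter are assumptions.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    The wait doubles after each failure (1s, 2s, 4s, ...), capped at
    max_delay, with up to 10% random jitter added on top.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            sleep(delay + random.uniform(0, delay * 0.1))
```

In a scraper, the call would be something like retry_with_backoff(lambda: fetch_page(url)), where fetch_page stands in for whatever function makes the request. Catching a narrower exception type than Exception is usually the better choice once you know which errors are worth retrying.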

Throughout the entire session, I never opened Copilot, never searched online, never waited for an API response.

What made me even happier: since Ollama runs completely locally, my code and context never left my machine. This "privacy peace of mind" is something paid APIs simply can't provide.

The Bottom Line

Pair Warp Terminal with Ollama, and you get a free, fully local AI coding assistant with no network latency. For indie developers, the value for money is hard for any paid subscription to match.

The only prerequisite is having enough RAM on your machine (8GB+ to run the 1.5B model comfortably).
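A back-of-the-envelope way to see why a small model fits: the weights take roughly (number of parameters) × (bits per weight) / 8 bytes. The sketch below assumes about 4 bits per weight, which is common for quantized local builds but is my assumption, since the quantization is not stated above. It counts weights only; the runtime, the context cache, and the rest of your system need memory on top of that.

```python
def estimate_weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in gigabytes (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Assumed 4-bit quantization; real builds vary.
small = estimate_weight_memory_gb(1.5, 4)   # the 1.5B model
medium = estimate_weight_memory_gb(7, 4)    # the 7B model
```

Under that assumption the 1.5B model's weights come to about 0.75GB, in the same range as the roughly 1GB download, while the 7B model needs around 3.5GB for weights alone.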


rayslifelab.com
