LLM Run

By Mutahir Shahzad

“No GPU… Yet I Ran an LLM” — My Local AI Story 🚀

I had a simple computer — no GPU, just a CPU.

Initially, I thought of using free cloud resources and hosting an LLM there.

Then a friend casually said:

“You don’t even need a GPU. You can run an LLM locally.”

At first, I didn’t believe it.

But after doing a little research, I realized — he was completely right.

💡 Realization Moment

Not all LLMs are heavyweight.

Some models are so lightweight that they can run on a CPU, and heavy ones can be quantized to become smaller and faster.
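
To get a feel for why quantization matters, here is a rough back-of-the-envelope estimate of how much memory the weights alone need at different precisions. It is a sketch, not a measurement: it assumes the size is simply parameters times bits per weight, and it ignores file-format overhead, the context cache, and the rest of the runtime. The helper name weights_mb is made up for this example.

```shell
#!/bin/sh
# Rough size of model weights in megabytes (1 MB = 10^6 bytes).
# Assumption: size = parameters * bits per weight / 8. Real model files
# carry some overhead, so treat the result as a lower bound.
# weights_mb is a hypothetical helper, not part of any tool.
weights_mb() {
    params_millions=$1   # e.g. 7000 for a 7B model
    bits_per_weight=$2   # 16, 8 or 4
    echo $(( params_millions * bits_per_weight / 8 ))
}

for b in 16 8 4; do
    echo "7B model at ${b}-bit: ~$(weights_mb 7000 "$b") MB"
done
```

By this estimate, a 7B model shrinks from roughly 14 GB at 16-bit to about 3.5 GB at 4-bit, which is why a quantized model can fit in the RAM of an ordinary laptop.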

The question then was:

If I’m running an LLM locally, which model will actually be useful for me?

🧠 Model Choice

My focus was on coding — debugging, code generation, and explanations.

So I chose: CodeLlama 7B

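Code Llama is Meta’s code-focused model family, built on Llama 2 and further trained on programming data. The 7B variant is the smallest in the family, which makes it the most realistic choice for a CPU-only machine.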

Perfect tool for a developer.

🔧 Why Ollama?

There were many options for local LLMs, but I chose Ollama because it provides a zero-drama setup: install, pull the model, and run.

This time, I decided:

“Let’s run it on Docker — clean, isolated, and professional setup.”

🐳 Running an LLM on Docker (Step-by-Step)

Step 1: Install Docker

If Docker is not installed yet (these commands assume Ubuntu or Debian):


sudo apt update
sudo apt install docker.io -y
sudo systemctl start docker
sudo systemctl enable docker

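Confirm the install by running docker --version ✅

The remaining steps call docker without sudo. If you get a permission error, prefix the commands with sudo or add your user to the docker group.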

Step 2: Pull the Ollama Docker Image


docker pull ollama/ollama


Step 3: Run the Ollama Container


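docker run -d \
 --name ollama \
 -v ollama:/root/.ollama \
 -p 11434:11434 \
 ollama/ollama

This means:

  • Ollama runs in the background (-d)
  • Port 11434, Ollama's API port, is published to the host
  • Downloaded models are stored in the ollama volume, so they survive if the container is removed and recreated
  • Your system stays clean (no conflicts)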
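
Because the port is published, the Ollama HTTP API is reachable from the host, not just from inside the container. Below is a minimal sketch of how I would poke at it with curl. The names OLLAMA_URL, build_payload, and generate are made up for this example, and the payload helper assumes a simple prompt with no double quotes or backslashes. Note that generate only returns an answer once the model has been downloaded in Step 4.

```shell
#!/bin/sh
# Sketch: talking to the Ollama HTTP API from the host.
# Assumes the container from Step 3 is running and curl is installed.
# OLLAMA_URL, build_payload and generate are names made up for this example.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# Build the JSON body for POST /api/generate.
# Assumption: the prompt contains no double quotes or backslashes,
# because this helper does no JSON escaping.
build_payload() {
    printf '{"model":"%s","prompt":"%s","stream":false}' "$1" "$2"
}

# Send one prompt and print the raw JSON reply.
generate() {
    build_payload "$1" "$2" | curl -s "$OLLAMA_URL/api/generate" -d @-
}

# Usage (not executed here):
#   curl -s "$OLLAMA_URL/"     # health check, replies "Ollama is running"
#   generate codellama:7b "Write hello world in Python"
```

The generated text comes back in the "response" field of the JSON reply.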

Step 4: Pull & Run CodeLlama 7B


docker exec -it ollama ollama run codellama:7b

Note:

The first time you run this, it will take a while because the model, a download of a few gigabytes, has to be fetched first.

During this time, grabbing a coffee is completely justified ☕😄

When the download finishes, the >>> prompt appears and the model is ready for input ✅
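
If you would rather separate the download from the chat session, the pull can be done on its own and then checked. This is only a sketch: DOCKER and MODEL are convenience variables I made up, and setting DOCKER=echo prints the commands instead of running them, which is handy for a dry run.

```shell
#!/bin/sh
# Sketch: download the model first, then confirm it is there.
# DOCKER and MODEL are convenience variables made up for this example;
# set DOCKER=echo to print the commands instead of running them.
DOCKER="${DOCKER:-docker}"
MODEL="${MODEL:-codellama:7b}"

# Download the model into the running "ollama" container.
pull_model() {
    "$DOCKER" exec ollama ollama pull "$MODEL"
}

# Show the models the container already has on disk.
list_models() {
    "$DOCKER" exec ollama ollama list
}
```

Call pull_model once, then list_models to check that codellama:7b shows up before starting a chat.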

Step 5: Talk to the Model 💬

Now you can type requests directly at the >>> prompt, for example:

  • Explain this Python error
  • Write a REST API in Flask
  • Optimize this SQL query
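
You do not have to stay in the interactive session, either. ollama run also accepts the prompt as an argument, answers once, and exits, which is useful in scripts. A minimal sketch, assuming the container from Step 3 is running and the model is already downloaded; ask, DOCKER, and MODEL are names made up for this example.

```shell
#!/bin/sh
# Sketch: ask a single question without opening the interactive session.
# ask, DOCKER and MODEL are hypothetical names for this example;
# set DOCKER=echo to print the command instead of running it.
DOCKER="${DOCKER:-docker}"
MODEL="${MODEL:-codellama:7b}"

# Pass the whole prompt as one argument; the answer is printed to stdout.
ask() {
    "$DOCKER" exec ollama ollama run "$MODEL" "$1"
}

# Usage (not executed here):
#   ask "Write a REST API in Flask"
```

Inside the interactive session itself, typing /bye returns you to the shell.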

If the model feels slow:

  • Try a smaller model than 7B
  • Or use a quantized variant

🎯 Final Thoughts

This whole journey taught me one thing:

LLMs are not just for people with big GPUs.

With a little understanding, the right tools, and a smart setup — powerful AI can run on a CPU too.

Whether you’re a developer, a student, or just curious —

local LLM + Docker = future-proof skill 💪
