LLM Run

“No GPU… Yet I Ran an LLM” — My Local AI Story 🚀
I had a simple computer — no GPU, just a CPU.
Initially, I thought of using free cloud resources and hosting an LLM there.
Then a friend casually said:
“You don’t even need a GPU. You can run an LLM locally.”
At first, I didn’t believe it.
But after doing a little research, I realized — he was completely right.
💡 Realization Moment
Not all LLMs are heavyweight.
Some models are so lightweight that they can run on a CPU, and heavy ones can be quantized to become smaller and faster.
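To see why quantization matters on a CPU-only machine, here is a rough back-of-the-envelope sketch in Python. It only counts the memory needed for the weights (parameters times bits per weight) and ignores runtime overhead, so treat the numbers as approximate. The function name approx_weight_gb is made up for this illustration.

```python
def approx_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


seven_b = 7e9  # a "7B" model has roughly 7 billion parameters

print(f"16-bit weights: {approx_weight_gb(seven_b, 16):.1f} GB")  # 14.0 GB
print(f" 4-bit weights: {approx_weight_gb(seven_b, 4):.1f} GB")   # 3.5 GB
```

By this estimate, a 4-bit version of a 7B model needs about a quarter of the memory of the 16-bit one, which is what brings it within reach of an ordinary laptop's RAM.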
The question then was:
If I’m running an LLM locally, which model will actually be useful for me?
🧠 Model Choice
My focus was on coding — debugging, code generation, and explanations.
So I chose: CodeLlama 7B
This model is built on Meta’s Llama 2 and further trained on code, so it is specialized for programming tasks.
Perfect tool for a developer.
🔧 Why Ollama?
There were many options for local LLMs, but I chose Ollama because it provides a zero-drama setup: install, pull the model, and run.
This time, I decided:
“Let’s run it on Docker — clean, isolated, and professional setup.”
🐳 Running an LLM on Docker (Step-by-Step)
Step 1: Install Docker
If Docker is not installed yet:
sudo apt update
sudo apt install docker.io -y
sudo systemctl start docker
sudo systemctl enable docker
📸 Screenshot idea:
Terminal showing docker --version ✅
Step 2: Pull the Ollama Docker Image
docker pull ollama/ollama
📸 Screenshot idea:
Docker image pulling progress ✅
Step 3: Run the Ollama Container
docker run -d \
  --name ollama \
  -p 11434:11434 \
  ollama/ollama
This means:
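- Ollama runs in the background (-d)
- Port 11434, Ollama’s API port, is published on the host (-p)
- Your system stays clean (no conflicts)
- Models are stored inside the container, so they are lost if you remove it (adding -v ollama:/root/.ollama to the command keeps them in a named volume)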
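Once the container is up, a quick way to confirm that the published port really answers is to send it a plain HTTP request. The snippet below is a minimal sketch using only Python's standard library; it assumes the server is reachable at http://localhost:11434 and that its root URL replies with a short status text mentioning "Ollama". The helper name is_ollama_up is my own.

```python
import urllib.request


def is_ollama_up(base_url: str = "http://localhost:11434", timeout: float = 3.0) -> bool:
    """Return True if something that looks like Ollama answers on base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and "Ollama" in body
    except OSError:
        # Connection refused, timeouts, and HTTP errors all end up here
        # (urllib's URLError is a subclass of OSError).
        return False


if __name__ == "__main__":
    print("Ollama reachable:", is_ollama_up())
```

If this prints False, checking docker ps for the ollama container is a sensible next step.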
Step 4: Pull & Run CodeLlama 7B
docker exec -it ollama ollama run codellama:7b
⏳ Note:
The first time you run this, it will take a while — the model is being downloaded.
During this time, grabbing a coffee is completely justified ☕😄
📸 Screenshot idea:
Model download progress and the >>> prompt appearing ✅
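If you would rather check from a script that the download finished, Ollama's HTTP API can list the models it has stored locally. The sketch below assumes the API is reachable at http://localhost:11434 and that the /api/tags response is a JSON object with a "models" list whose entries carry a "name" field. The helper names are hypothetical.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> list[str]:
    """Extract the model names from an /api/tags response body."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask the local Ollama server which models it has downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        names = model_names(fetch_tags())
        print("codellama:7b downloaded:", "codellama:7b" in names)
    except OSError as exc:
        print("Could not reach Ollama:", exc)
```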
Step 5: Talk to the Model 💬
Now you can type your request directly at the >>> prompt, for example:
- Explain this Python error
- Write a REST API in Flask
- Optimize this SQL query
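The same prompts can also be sent from code through the port published in Step 3. Here is a minimal sketch using only the standard library. It assumes the API is at http://localhost:11434, uses the /api/generate endpoint with streaming turned off, and reads the answer from the "response" field of the returned JSON. The helper names build_request and ask are my own, and the long timeout is just a guess to allow for slow CPU generation.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed local endpoint


def build_request(prompt: str, model: str = "codellama:7b") -> urllib.request.Request:
    """Build a non-streaming generate request for the given prompt."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "codellama:7b", timeout: float = 600.0) -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    try:
        print(ask("Write a REST API in Flask"))
    except OSError as exc:
        print("Could not reach Ollama:", exc)
```

With streaming off, the call returns only after the whole answer is generated, so on a CPU it may sit quietly for a while before printing anything.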
If the model feels slow:
- Try a smaller model than 7B
- Or use a quantized variant
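"Slow" is easier to judge with a number. As a rough sketch, the JSON returned by Ollama's /api/generate endpoint (with streaming turned off) includes eval_count (generated tokens) and eval_duration (in nanoseconds), and dividing one by the other gives tokens per second. The sample values below are invented purely to show the arithmetic and are not a measurement.

```python
def tokens_per_second(result: dict) -> float:
    """Generation speed from the eval_count / eval_duration fields of a response."""
    eval_count = result["eval_count"]
    eval_ns = result["eval_duration"]  # reported in nanoseconds
    if eval_ns <= 0:
        return 0.0
    return eval_count / (eval_ns / 1e9)


# Invented example: 120 tokens generated in 30 seconds.
sample = {"eval_count": 120, "eval_duration": 30_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")  # 4.0 tokens/s
```

Running the same prompt against two models and comparing this figure gives a fairer picture than going by feel.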
🎯 Final Thoughts
This whole journey taught me one thing:
LLMs are not just for people with big GPUs.
With a little understanding, the right tools, and a smart setup — powerful AI can run on a CPU too.
Whether you’re a developer, a student, or just curious —
local LLM + Docker = future-proof skill 💪