Cloud-based AI tools like ChatGPT have become extremely popular, but they come with limitations such as internet dependency, privacy concerns, and recurring subscription costs. For those seeking a more private, offline, and cost-effective alternative, Ollama offers a game-changing solution. Ollama lets you run powerful large language models (LLMs) locally for free, giving you full control over your data and performance.
In this tutorial, you'll get a step-by-step guide to installing Ollama, running models like LLaMA 2, using the built-in HTTP API, and even creating custom models tailored to your needs.
What Is Ollama?
Ollama provides an open-source runtime environment for LLMs that you can install on your local machine. Unlike cloud platforms such as OpenAI or Anthropic, Ollama runs models entirely offline, ensuring:
- Full privacy: Data remains on the local system.
- Low latency: Ollama avoids communication with remote servers, delivering fast response times.
- Zero cost: No API charges or subscriptions.
Running LLMs locally requires a decent amount of system resources, including RAM and CPU/GPU power, depending on the model you use.
How to Install Ollama
Installing Ollama is straightforward, and platform-specific packages are available.
For macOS and Linux:
Use the following script to install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
For Windows:
A .exe installer is provided on the official Ollama website. After downloading it, follow the prompted steps to complete the installation.
Post-Installation Check:
Run the following command to confirm that Ollama installed successfully:
ollama
If the Ollama CLI displays help or version info, the setup is complete.
Running Your First Model in Ollama
After you install Ollama, you can run LLMs easily using simple commands.
Listing Available Models:
ollama list
Running the LLaMA 2 Model:
ollama run llama2
Ollama automatically downloads the model if it isn't already available locally. Once the download completes, you can start chatting with the model right away.
To try a different model, such as Mistral:
ollama run mistral
Having several models to choose from lets you balance performance against resource usage based on your system's capability.
Using Ollama’s Local API
You can run Ollama in server mode to enable HTTP-based API integrations for your applications and scripts.
Start the Ollama Server:
ollama serve
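Before wiring anything else up, it can help to confirm the server is reachable. Below is a minimal Python sketch (assuming the default address http://localhost:11434) that queries the /api/tags endpoint, which lists locally installed models:

import requests

# Quick reachability check for the local Ollama server
# (assumes the default address http://localhost:11434)
try:
    r = requests.get("http://localhost:11434/api/tags", timeout=5)
    r.raise_for_status()
    models = r.json().get("models", [])
    print(f"Ollama server is up with {len(models)} model(s) installed")
except requests.exceptions.ConnectionError:
    print("Ollama server is not reachable - did you run 'ollama serve'?")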
Sample Python Code to Use the API:
import requests
import json

# Send a streaming generation request to the local Ollama server
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Write a poem about AI"},
    stream=True,
)

# The streaming API returns one JSON object per line; collect the text chunks
poem = ""
for line in response.iter_lines():
    if line:
        data = json.loads(line)
        poem += data.get("response", "")

print(poem)
This approach lets you integrate AI into local apps without depending on external APIs. Use cases include chatbots, automation systems, and more.
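For conversational use cases, Ollama also exposes a /api/chat endpoint that accepts a list of messages, so you can keep multi-turn context on the client side. Here is a rough sketch of a stateful chat helper (assuming the server is running and the llama2 model is already pulled; exact response fields can vary between Ollama versions):

import requests

history = []  # running conversation context, sent with every request

def chat(user_message):
    # Append the user's turn, then ask the model for a single (non-streaming) reply
    history.append({"role": "user", "content": user_message})
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "llama2", "messages": history, "stream": False},
    )
    response.raise_for_status()
    reply = response.json()["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("What is a linked list?"))
print(chat("Show me a short example in Python."))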
Customizing LLM Behavior with Modelfiles
To tailor an LLM to specific needs, Ollama supports the creation of custom models via Modelfile definitions.
Sample Modelfile (mymodel.modelfile):
FROM llama2
PARAMETER temperature 0.7
SYSTEM "You are a helpful coding assistant."
This configuration sets the base model to llama2, adjusts the creativity level with the temperature parameter, and instructs the model to behave as a coding assistant.
Build the Custom Model:
ollama create mymodel -f mymodel.modelfile
Ollama builds the new model locally as a layer on top of the base model. Once the process finishes, the model is ready to use.
Run the Custom Model:
ollama run mymodel
You can now start personalized interactions with the model using your predefined instructions.
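The custom model is also available through the local API, just like the stock models. A small sketch (assuming the server is running and mymodel was built as shown above):

import requests

# Query the custom model through the local API; stream=False returns one JSON object
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mymodel", "prompt": "Explain Python list comprehensions", "stream": False},
)
print(response.json().get("response", ""))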
Managing Installed Models
Use the following command to list and manage all available models:
ollama list
This will display the model name, size, and modification date.
Remove unused models with the following command to free up space:
ollama rm llama2
This is particularly useful during experimentation with multiple models.
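If you prefer to manage models from code rather than the CLI, the same information is available over the API. Here is a minimal sketch using the /api/tags endpoint (the field names reflect the current Ollama API and may change between versions):

import requests

# List installed models with their approximate size and last-modified date
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for m in resp.json().get("models", []):
    size_gb = m.get("size", 0) / 1e9
    print(f"{m.get('name')}: {size_gb:.1f} GB (modified {m.get('modified_at', 'unknown')})")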
Final Thoughts
Ollama makes free, local execution of LLMs possible without sacrificing speed, privacy, or flexibility. Whether it's for personal exploration or production-grade application integration, Ollama offers an efficient and powerful alternative to cloud AI platforms.
Quick Recap:
- Installed Ollama on a local machine
- Ran LLMs like LLaMA 2 and Mistral offline
- Integrated the local API from Python
- Built and configured a custom model
- Managed installed models (listing and removal)
If you’re tired of cloud limitations and want ChatGPT-like performance locally, Ollama is the tool to try. The freedom to run models offline, without cost or data sharing, is a huge win for developers, researchers, and AI enthusiasts.
To learn how to run Ollama on Kubernetes, check out my dedicated blog post on that topic.
For more such tutorials, don’t forget to subscribe to the blog and share your thoughts or questions in the comments below!