Cloud-based AI tools like ChatGPT have become extremely popular, but they come with limitations such as internet dependency, privacy concerns, and recurring subscription costs. For those seeking a more private, offline, and cost-effective alternative, Ollama offers a game-changing solution. Ollama lets you run powerful large language models (LLMs) locally for free, giving you full control over your data and performance.
In this tutorial, you'll get a step-by-step guide to installing Ollama, running models like LLaMA 2, using the built-in HTTP API, and even creating custom models tailored to your needs.
What Is Ollama?
Ollama provides an open-source runtime environment for LLMs that you can install on your local machine. Unlike cloud platforms such as OpenAI or Anthropic, Ollama runs models entirely offline, ensuring:
- Full privacy: Data remains on the local system.
- Low latency: Ollama avoids communication with remote servers, delivering fast response times.
- Zero cost: No API charges or subscriptions.
Running LLMs locally requires a decent amount of system resources, including RAM and CPU/GPU power, depending on the model you use.
How to Install Ollama
Installing Ollama is straightforward, and platform-specific packages are available.
For macOS and Linux:
Use the following script to install Ollama:
curl -fsSL https://ollama.com/install.sh | sh
For Windows:
A .exe installer is provided on the official Ollama website. After downloading it, follow the prompted steps to complete the installation.
Post-Installation Check:
Run the following command to confirm that Ollama installed successfully:
ollama
If the Ollama CLI displays help or version info, the setup is complete.
Running Your First Model in Ollama
After you install Ollama, you can run LLMs easily using simple commands.
Listing Available Models:
ollama list
Running the LLaMA 2 Model:
ollama run llama2
Ollama automatically downloads the model if it isn't already available locally. Once the download completes, you can start chatting with the model right away.
To try a different model, such as Mistral:
ollama run mistral
Having several models to choose from lets you balance performance against resource usage based on your system's capability.
Using Ollama’s Local API
You can run Ollama in server mode to enable HTTP-based API integrations for your applications and scripts.
Start the Ollama Server:
ollama serve
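Before wiring anything else up, it can help to confirm the server is reachable. Below is a minimal Python sketch (assuming the default address http://localhost:11434) that queries the /api/tags endpoint, which lists locally installed models:

import requests

# Quick reachability check for the local Ollama server
# (assumes the default address http://localhost:11434)
try:
    r = requests.get("http://localhost:11434/api/tags", timeout=5)
    r.raise_for_status()
    models = r.json().get("models", [])
    print(f"Ollama server is up with {len(models)} model(s) installed")
except requests.exceptions.ConnectionError:
    print("Ollama server is not reachable - did you run 'ollama serve'?")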
Sample Python Code to Use the API:
import requests
import json

# Send a streaming generation request to the local Ollama server
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Write a poem about AI"},
    stream=True,
)

# The streaming API returns one JSON object per line; collect the text chunks
poem = ""
for line in response.iter_lines():
    if line:
        data = json.loads(line)
        poem += data.get("response", "")

print(poem)
This approach lets you integrate AI into local apps without depending on external APIs. Use cases include chatbots, automation systems, and more.
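For conversational use cases, Ollama also exposes a /api/chat endpoint that accepts a list of messages, so you can keep multi-turn context on the client side. Here is a rough sketch of a stateful chat helper (assuming the server is running and the llama2 model is already pulled; exact response fields can vary between Ollama versions):

import requests

history = []  # running conversation context, sent with every request

def chat(user_message):
    # Append the user's turn, then ask the model for a single (non-streaming) reply
    history.append({"role": "user", "content": user_message})
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": "llama2", "messages": history, "stream": False},
    )
    response.raise_for_status()
    reply = response.json()["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("What is a linked list?"))
print(chat("Show me a short example in Python."))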
Customizing LLM Behavior with Modelfiles
To tailor an LLM to specific needs, Ollama supports the creation of custom models via Modelfile definitions.
Sample Modelfile (mymodel.modelfile):
FROM llama2
PARAMETER temperature 0.7
SYSTEM "You are a helpful coding assistant."
This configuration sets the base model to llama2, adjusts the creativity level with the temperature parameter, and instructs the model to behave as a coding assistant.
Build the Custom Model:
ollama create mymodel -f mymodel.modelfile
Ollama builds the new model locally as a layer on top of the base model. Once the process finishes, the model is ready to use.
Run the Custom Model:
ollama run mymodel
You can now start personalized interactions with the model using your predefined instructions.
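The custom model is also available through the local API, just like the stock models. A small sketch (assuming the server is running and mymodel was built as shown above):

import requests

# Query the custom model through the local API; stream=False returns one JSON object
response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mymodel", "prompt": "Explain Python list comprehensions", "stream": False},
)
print(response.json().get("response", ""))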
Managing Installed Models
Use the following command to list and manage all available models:
ollama list
This will display the model name, size, and modification date.
Remove unused models with the following command to free up space:
ollama rm llama2
This is particularly useful during experimentation with multiple models.
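If you prefer to manage models from code rather than the CLI, the same information is available over the API. Here is a minimal sketch using the /api/tags endpoint (the field names reflect the current Ollama API and may change between versions):

import requests

# List installed models with their approximate size and last-modified date
resp = requests.get("http://localhost:11434/api/tags")
resp.raise_for_status()
for m in resp.json().get("models", []):
    size_gb = m.get("size", 0) / 1e9
    print(f"{m.get('name')}: {size_gb:.1f} GB (modified {m.get('modified_at', 'unknown')})")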
Final Thoughts
Ollama makes free, local execution of LLMs possible without sacrificing speed, privacy, or flexibility. Whether it's for personal exploration or production-grade application integration, Ollama offers an efficient and powerful alternative to cloud AI platforms.
Quick Recap:
- Installed Ollama on a local machine
- Ran LLMs like LLaMA 2 and Mistral offline
- Integrated the local API from Python
- Built and configured a custom model
- Managed installed models (listing and removal)
If you’re tired of cloud limitations and want ChatGPT-like performance locally, Ollama is the tool to try. The freedom to run models offline, without cost or data sharing, is a huge win for developers, researchers, and AI enthusiasts.
To learn how to run Ollama on Kubernetes, check out my dedicated blog post on that topic.
For more such tutorials, don’t forget to subscribe to the blog and share your thoughts or questions in the comments below!