How to Deploy an ML Model on Kubernetes

If you’ve ever trained a machine learning model and wondered, “How do I deploy it so others can actually use it?”, this guide is for you.

In this hands-on walkthrough, you’ll learn how to:

  • Train a simple ML model using scikit-learn
  • Serve it as an API using FastAPI
  • Containerize the app with Docker/Podman
  • And finally, deploy it on Kubernetes using Deployment, Service, and Ingress objects

By the end, you’ll have your own machine learning model running on Kubernetes, ready to handle real-world prediction requests.

Get the complete code and YAML files on GitHub: GitHub Repository

Step 1: Understanding the ML Deployment Flow

Before we jump into code, let’s clarify what “deployment” really means in the ML world.

  • Training: The process of teaching a model to recognize patterns from data. For example, using the Iris dataset, we train a RandomForest model to predict flower species.
  • Model File (model.pkl): After training, we save the learned model to a .pkl file — this file stores all the patterns the model has learned.
  • Inference: When we use the trained model to make predictions on new data.
  • Deployment: Turning the model into a live service that can respond to requests — typically through an API.

The typical flow looks like this:

train_model.py → model.pkl → FastAPI inference API (app.py) → container image → Kubernetes (Deployment + Service + Ingress)

Step 2: Train a Simple ML Model

We’ll start by training a small RandomForest model using the Iris dataset.
Create a file named train_model.py:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import joblib

# Load the built-in Iris dataset (150 samples, 4 features, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target

# Train a small RandomForest classifier
clf = RandomForestClassifier(n_estimators=10)
clf.fit(X, y)

# Persist the trained model to disk so the API can load it later
joblib.dump(clf, "model.pkl")
print("Saved model.pkl")

Run it locally:

python3 train_model.py

Once done, a model.pkl file will be created, which will be used for predictions.
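
Before wrapping the model in an API, you can quickly verify that the saved file loads and predicts. A one-off sanity check (the sample values are the first row of the Iris dataset):

python3 -c "import joblib; print(joblib.load('model.pkl').predict([[5.1, 3.5, 1.4, 0.2]]))"

This should print [0], i.e. class 0 (setosa).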

Step 3: Build the ML Inference API Using FastAPI

Next, we’ll create an API that loads the model and serves predictions.
This makes our machine learning model accessible to users and other services.

Create a file named app.py:

from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()

# Load the trained model once at startup
model = joblib.load("model.pkl")

# Request body: a flat list of feature values (4 for Iris)
class Instance(BaseModel):
    data: list

# Health endpoint, handy for Kubernetes probes
@app.get("/healthz")
def health_check():
    return {"status": "ok"}

@app.post("/predict")
def predict(inst: Instance):
    # Reshape the flat feature list into a single-row 2D array for the model
    data = np.array(inst.data).reshape(1, -1)
    prediction = model.predict(data).tolist()
    return {"prediction": prediction}

Run the API locally:

uvicorn app:app --host 0.0.0.0 --port 8000
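
FastAPI also serves interactive API docs at http://localhost:8000/docs, which is handy for trying out the endpoint from a browser.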

Test it using curl:

curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"data":[5.1,3.5,1.4,0.2]}'

You should get a prediction instantly.
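
With the sample input above (the first row of the Iris dataset), the response should look like {"prediction": [0]}, where 0 corresponds to the setosa class.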

Step 4: Containerize the ML App

To run this app on Kubernetes, it must be containerized.
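
The Containerfile below installs dependencies from a requirements.txt. If you don't have one yet, a minimal version looks like this (unpinned here; pin versions for reproducible builds):

fastapi
uvicorn
scikit-learn
joblib
numpy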

Create a Containerfile (or Dockerfile):

FROM python:3.11-slim
WORKDIR /app

# Build tools for any dependencies that need to compile from source
RUN apt-get update && apt-get install -y --no-install-recommends build-essential && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the trained model and the API code into the image
COPY model.pkl .
COPY app.py .

EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

Build and run it locally:

podman build -t ml-demo .
podman run -p 8000:8000 ml-demo
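
With the container running, hit the health endpoint to confirm the app started correctly inside the container:

curl http://localhost:8000/healthz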

Now your ML inference API runs inside a container, making it portable, isolated, and easy to run anywhere containers run.

Step 5: Deploy on Kubernetes

Once your image is built and pushed to a registry (like Quay or Docker Hub), create three Kubernetes manifests.
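
For example, with Podman you can tag and push the image to your own registry (substitute your own repository path; the Deployment below uses the image from this tutorial's registry, so replace it with yours if you push elsewhere):

podman tag ml-demo quay.io/<your-username>/ml-demo:latest
podman push quay.io/<your-username>/ml-demo:latest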

Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ml-demo
  template:
    metadata:
      labels:
        app: ml-demo
    spec:
      containers:
      - name: ml
        image: quay.io/nikhil811/techinik:k8s
        ports:
        - containerPort: 8000
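
Optionally, you can wire the /healthz endpoint from app.py into readiness and liveness probes so Kubernetes only routes traffic to healthy pods. A minimal sketch, added under the container entry at the same indentation level as ports:

        readinessProbe:
          httpGet:
            path: /healthz
            port: 8000
        livenessProbe:
          httpGet:
            path: /healthz
            port: 8000
          initialDelaySeconds: 5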

Service

apiVersion: v1
kind: Service
metadata:
  name: ml-demo-svc
spec:
  selector:
    app: ml-demo
  ports:
  - port: 80
    targetPort: 8000
  type: ClusterIP
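
Before setting up the Ingress, you can sanity-check the Service with a port-forward (a quick test from your workstation, not something you'd use in production):

kubectl port-forward svc/ml-demo-svc 8080:80
curl -X POST http://localhost:8080/predict \
-H "Content-Type: application/json" \
-d '{"data":[5.1,3.5,1.4,0.2]}'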

Ingress

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ml-demo-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: rke2-server
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: ml-demo-svc
            port:
              number: 80
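
Note that the host rke2-server must resolve to your ingress controller's address; the name here comes from this lab setup, so use whatever hostname fits your cluster. For local testing you can add an /etc/hosts entry such as:

<ingress-controller-ip> rke2-server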

Apply them all:

kubectl apply -f .

Check resources:

kubectl get pods,svc,ingress

Finally, test your deployed model via the ingress URL:

curl -X POST http://rke2-server/predict \
-H "Content-Type: application/json" \
-d '{"data":[5.1,3.5,1.4,0.2]}'

You’ve just deployed a machine learning model on Kubernetes!

Step 6: Recap

Let’s recap what we achieved:

  1. Trained a machine learning model using scikit-learn
  2. Built a prediction API using FastAPI
  3. Containerized the app with Docker/Podman
  4. Deployed it on Kubernetes using Deployment, Service, and Ingress

This is the same basic workflow used in production ML systems: training, then inference, then deployment.

In the next guide, we’ll scale this app using a Horizontal Pod Autoscaler and add Prometheus monitoring for real-time insights.

Final Thoughts

Machine learning deployment doesn’t have to be complicated.
With tools like FastAPI, Docker, and Kubernetes, even beginners can build scalable ML systems that run anywhere.

So go ahead: train your model, containerize it, and deploy it.
Your next AI-powered microservice is just a few commands away.
