Production · Live 24/7

Linux Homelab Stack

Production-Ready Self-Hosted Infrastructure

Docker · Traefik · Prometheus · Grafana · Ubuntu 22.04

Core Implementations

Reverse Proxy & Automated SSL

All 7 services secured behind a single entry point with automatic HTTPS certificates — no manual renewal, no exposed ports.

Traefik v3 · Let's Encrypt · TLS Termination · Docker

Container Infrastructure

Full production-style stack deployed on Linux server — isolated networks, one-command deployment, web-based management UI.

Docker · Docker Compose · Ubuntu Server · Portainer

Live Monitoring

Real-time dashboards tracking server health, resource usage and service availability. Instant alerts on downtime.

Prometheus · Grafana · PromQL · Uptime Kuma

Security Hardening

Server locked down to industry standards — key-only SSH access, minimal firewall, automated attack prevention.

UFW · SSH Hardening · fail2ban · Linux

Zero-Touch Automation

System updates itself, backs up all data nightly and recovers automatically after reboot. Built and documented as code.

Bash Scripting · Cron · Watchtower · Git

Linux Homelab Infrastructure Stack

Self-hosted production-adjacent infrastructure on Ubuntu Server 22.04 — fully managed as code.

Every component is defined in version-controlled files. No manual clicks. No undocumented state. The goal was not to collect services, but to demonstrate a system administrator's thinking: structured architecture, security awareness, operational automation, and clear documentation.


Overview

Infrastructure:  Ubuntu Server 22.04 LTS on VMware
Services:        7 containerised applications
Deployment:      Docker Compose, one-command stack launch
SSL:             Automatic HTTPS via Let's Encrypt
Monitoring:      Real-time metrics and uptime monitoring
Automation:      Nightly updates and daily backups
Repository:      github.com/overthinkinglord/homelab-stack

Architecture

All external traffic enters through Traefik — the only publicly exposed component. Backend services communicate over an isolated internal network, invisible to the outside.

Internet
    │
    ▼
Traefik (Port 80/443)          ← Reverse proxy + Auto SSL
    │
    │  [proxy network — only Traefik exposed]
    │
    ├──────────────────────────────────┐
    │                                  │
    ▼                                  ▼
Grafana                          Uptime Kuma
(Dashboards)                     (Availability monitoring)
    │
    │  [internal network]
    │
    ├──────────────────────┐
    ▼                      ▼
Prometheus            Portainer
+ Node Exporter       (Docker UI)
(Metrics)

Watchtower (background) → auto-updates selected containers
Cron (02:00 daily)      → automated volume backups

Why this design? External-facing and internal traffic are deliberately separated. A compromised external service cannot reach internal backend components. Exposure is minimal and intentional.


Services

Service       | Domain                              | Purpose                  | Auto-Update
Traefik       | traefik.stan-homelab.duckdns.org    | Reverse proxy + SSL      | Manual
Grafana       | grafana.stan-homelab.duckdns.org    | Metrics dashboards       | ✅ Yes
Prometheus    | prometheus.stan-homelab.duckdns.org | Metrics storage          | ✅ Yes
Uptime Kuma   | uptime.stan-homelab.duckdns.org     | Uptime monitoring        | ✅ Yes
Portainer     | portainer.stan-homelab.duckdns.org  | Docker management UI     | Manual
Node Exporter | Internal only                       | System metrics collector | ✅ Yes
Watchtower    | Internal only                       | Container auto-updater   | —

Tech Stack

Containerisation

Docker and Docker Compose are used to define and run all services. Each service lives in its own container with explicit network assignments and volume mounts. Portainer provides a web-based management UI on top.

Reverse Proxy & SSL

Traefik v3 acts as the single entry point for all external traffic. Routing is defined via Docker labels on each container — when a new service is added, three label lines are enough to expose it securely. Let's Encrypt certificates are provisioned automatically via DNS challenge and stored in acme.json.

Why Traefik over Nginx? Traefik reads Docker labels directly — no separate config files per service. Adding a new service takes 3 lines instead of a new Nginx config block. It also integrates Let's Encrypt natively.
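The "three label lines" pattern looks roughly like this — a sketch, not the repository's actual compose file; the router name and the certresolver name `letsencrypt` are assumptions:

```yaml
services:
  grafana:
    image: grafana/grafana
    networks: [proxy, internal]
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.grafana.rule=Host(`grafana.stan-homelab.duckdns.org`)"
      - "traefik.http.routers.grafana.tls.certresolver=letsencrypt"
```

Traefik watches the Docker socket, sees these labels at container start, and creates the route and certificate without any restart or separate config file.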

Monitoring

Two-layer monitoring approach:

  • Uptime Kuma — answers "is it up or down?" with 60-second checks and instant alerting
  • Prometheus + Grafana — answers "how is it performing?" with real-time CPU, memory, disk, and network metrics

Node Exporter runs on the internal network and exposes 60+ system metrics to Prometheus every 15 seconds. A custom PromQL dashboard was built from scratch to display real CPU load — not just imported templates.
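A node-level CPU query of the kind such a dashboard is built on — the standard node_exporter idiom, not necessarily the exact expression used here:

```promql
# CPU busy % per instance: 100 minus the rate of idle CPU time over 5 minutes
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```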

Security Hardening

The server was hardened before any services were deployed:

  • SSH key-based authentication only — password login disabled entirely
  • PermitRootLogin no — root SSH access blocked
  • UFW firewall — only ports 22, 80, and 443 open
  • fail2ban — automatic IP banning after repeated failed login attempts
  • Unattended security upgrades enabled
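The measures above map to a handful of config lines and firewall rules, roughly as follows (a sketch, not the server's actual files):

```text
# /etc/ssh/sshd_config (excerpt)
PasswordAuthentication no
PermitRootLogin no

# UFW: default-deny, then open only SSH + HTTP(S)
sudo ufw default deny incoming
sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp
sudo ufw enable
```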

Automation

The system is designed to maintain itself:

  • Watchtower runs at 03:00 daily using an opt-in label model — only explicitly tagged containers are updated. Traefik and Portainer are excluded and updated manually.
  • Backup script runs at 02:00 daily — before Watchtower — archiving all Docker volumes with 14-day retention and a full audit log.
  • Makefile provides a unified CLI for the entire stack: make up, make down, make status, make backup.
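The nightly backup job can be sketched as a small function — illustrative only, not the contents of scripts/backup.sh; the function and variable names are invented:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a nightly volume backup with 14-day retention.
set -euo pipefail

backup_volumes() {
  local src="$1" dest="$2" stamp
  stamp="$(date +%F)"
  mkdir -p "$dest"
  # Archive the whole source tree into a dated tarball.
  tar -czf "$dest/volumes-$stamp.tar.gz" -C "$(dirname "$src")" "$(basename "$src")"
  # Retention: delete archives older than 14 days, keep an audit trail.
  find "$dest" -name 'volumes-*.tar.gz' -mtime +14 -delete
  echo "$(date -Is) backup ok: volumes-$stamp.tar.gz" >> "$dest/backup.log"
}
```

Scheduled from cron at 02:00 — an hour before Watchtower — so a fresh archive always predates any automatic update.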

Infrastructure as Code

Everything lives in Git from day one. No manual state. The .gitignore ensures secrets and certificates never reach the repository. A .env.example documents required configuration without exposing real values.


Key Technical Decisions

Opt-in Watchtower model
Core infrastructure (Traefik, Portainer) is excluded from auto-updates: a misconfigured update on the reverse proxy would take down the entire stack. These services are updated manually after reviewing changelogs.

Backup before updates
The backup cron job runs at 02:00, Watchtower at 03:00. If an auto-update breaks something, a fresh backup from the same night is always available.

Network separation
Two Docker networks: proxy for external-facing traffic, internal for backend communication. Node Exporter, for example, has no business being reachable externally — it lives only on the internal network.
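In Compose terms, the two-network layout looks roughly like this (service definitions abridged; a sketch, not the repository's files):

```yaml
networks:
  proxy:
    external: true   # created once, shared by every compose file Traefik routes to
  internal:
    internal: true   # Docker gives this network no route to the outside world

services:
  grafana:
    networks: [proxy, internal]   # reachable via Traefik, can query Prometheus
  node-exporter:
    networks: [internal]          # metrics source, never externally reachable
```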

Traefik network binding
When containers are connected to multiple Docker networks, Traefik must be explicitly told which network to use for routing. This was discovered during debugging and documented in TROUBLESHOOTING.md.


Deployment

git clone git@github.com:overthinkinglord/homelab-stack.git
cd homelab-stack
cp .env.example .env
# Fill in DOMAIN, DUCKDNS_TOKEN, passwords
make up

Repository Structure

homelab-stack/
├── traefik/
│   ├── docker-compose.yml
│   ├── traefik.yml
│   └── acme.json          ← gitignored
├── monitoring/
│   ├── docker-compose.yml
│   └── prometheus/
│       └── prometheus.yml
├── portainer/
│   └── docker-compose.yml
├── watchtower/
│   └── docker-compose.yml
├── scripts/
│   └── backup.sh
├── docs/
│   ├── DECISIONS.md
│   └── TROUBLESHOOTING.md
├── Makefile
├── .env.example
├── .gitignore
└── README.md

Key Challenges Solved

Real problems encountered during the build — each one required investigation, not just googling an error message.


Traefik routing failure with multi-network containers

Traefik could reach containers directly by IP but returned Gateway Timeout on all domain requests. The root cause: the monitoring containers were attached to both the proxy and internal networks, and Traefik didn't know which one to use for routing. Fixed by explicitly setting network: proxy under providers.docker in traefik.yml. This behaviour is not documented prominently in Traefik's official docs; identifying it required reading through docker network inspect output.
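The fix lives in Traefik's static configuration — shown here as a sketch (exposedByDefault is a common companion setting, assumed rather than quoted from the repo):

```yaml
# traefik.yml — static configuration (sketch)
providers:
  docker:
    exposedByDefault: false   # only containers with traefik.enable=true are routed
    network: proxy            # always dial containers on the proxy network
```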

Traefik · Docker Networking · Debugging


Let's Encrypt DNS challenge failure for subdomains

DuckDNS's free tier does not reliably propagate TXT records for subdomains (e.g. grafana.stan-homelab.duckdns.org) within Let's Encrypt's timeout window; Traefik logs showed propagation: time limit exceeded. The solution was to use tls=true without a certresolver for internal services — Traefik then serves a self-signed certificate, acceptable on a local network. The root domain (traefik.stan-homelab.duckdns.org) retained a valid Let's Encrypt certificate.
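On the container side, the workaround amounts to one label change — a sketch with an assumed router name:

```yaml
# Router labels for an internal service (illustrative)
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.grafana.rule=Host(`grafana.stan-homelab.duckdns.org`)"
  - "traefik.http.routers.grafana.tls=true"   # TLS on, but no certresolver → Traefik's default self-signed cert
```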

Let's Encrypt · DNS Challenge · TLS · DuckDNS


Uptime Kuma Bad Gateway behind Traefik

After fixing the routing issue, Uptime Kuma still returned Bad Gateway while Grafana and Prometheus worked correctly. Uptime Kuma requires WebSocket connections, which Traefik wasn't upgrading in this setup. Fixed by adding a custom request-headers middleware with Connection: Upgrade and Upgrade: websocket labels on the container.
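The middleware described above can be sketched in labels like this (the middleware name kuma-ws is illustrative, not taken from the repo):

```yaml
labels:
  - "traefik.http.middlewares.kuma-ws.headers.customrequestheaders.Connection=Upgrade"
  - "traefik.http.middlewares.kuma-ws.headers.customrequestheaders.Upgrade=websocket"
  - "traefik.http.routers.uptime-kuma.middlewares=kuma-ws"   # attach it to the router
```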

WebSocket · Traefik · Middleware · Debugging


Watchtower Docker API version mismatch

Watchtower's bundled Docker client was too old for the installed Docker daemon — the same API-version mismatch Traefik had hit earlier. Unlike Traefik, where updating to the latest image resolved it, Watchtower required an explicit DOCKER_API_VERSION=1.44 environment variable. This pattern — outdated client libraries bundled inside container images — is a recurring gotcha worth knowing.
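A sketch of the resulting Watchtower service definition (abridged; the opt-in flag from the Automation section is included for context):

```yaml
services:
  watchtower:
    image: containrrr/watchtower
    environment:
      - DOCKER_API_VERSION=1.44        # pin the client API to match the host daemon
      - WATCHTOWER_LABEL_ENABLE=true   # only update explicitly labelled containers
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```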

Docker API · Watchtower · Version Compatibility


Skills Demonstrated

Docker · Docker Compose · Traefik v3 · Let's Encrypt · Prometheus · Grafana · PromQL · Uptime Kuma · Portainer · Ubuntu Server 22.04 · UFW · SSH Hardening · fail2ban · Bash Scripting · Cron · Watchtower · Makefile · Git · Infrastructure as Code · Network Isolation · TLS Termination


Stanislav Shtelmakh — Linux Homelab Stack — 2026 github.com/overthinkinglord/homelab-stack