Linux Homelab Stack
Production-Ready Self-Hosted Infrastructure
Core Implementations
Reverse Proxy & Automated SSL
All 7 services secured behind a single entry point with automatic HTTPS certificates — no manual renewal, no exposed ports.
Container Infrastructure
Full production-style stack deployed on a Linux server — isolated networks, one-command deployment, web-based management UI.
Live Monitoring
Real-time dashboards tracking server health, resource usage and service availability. Instant alerts on downtime.
Security Hardening
Server locked down to industry standards — key-only SSH access, deny-by-default firewall, automated brute-force banning.
Zero-Touch Automation
System updates itself, backs up all data nightly and recovers automatically after reboot. Built and documented as code.
Linux Homelab Infrastructure Stack
Self-hosted production-adjacent infrastructure on Ubuntu Server 22.04 — fully managed as code.
Every component is defined in version-controlled files. No manual clicks. No undocumented state. The goal was not to collect services, but to demonstrate a system administrator's thinking: structured architecture, security awareness, operational automation, and clear documentation.
Overview
| Infrastructure | Ubuntu Server 22.04 LTS on VMware |
| Services | 7 containerised applications |
| Deployment | Docker Compose, one-command stack launch |
| SSL | Automatic HTTPS via Let's Encrypt |
| Monitoring | Real-time metrics and uptime monitoring |
| Automation | Nightly updates and daily backups |
| Repository | github.com/overthinkinglord/homelab-stack |
Architecture
All external traffic enters through Traefik — the only publicly exposed component. Backend services communicate over an isolated internal network, invisible to the outside.
Internet
    │
    ▼
Traefik (Port 80/443)  ←  Reverse proxy + Auto SSL
    │
    │  [proxy network — only Traefik exposed]
    │
    ├──────────────────────────────┐
    │                              │
    ▼                              ▼
Grafana                       Uptime Kuma
(Dashboards)                  (Availability monitoring)
    │
    │  [internal network]
    │
    ├──────────────────────┐
    ▼                      ▼
Prometheus            Portainer
+ Node Exporter       (Docker UI)
(Metrics)

Watchtower (background) → auto-updates selected containers
Cron (02:00 daily)      → automated volume backups
Why this design? External-facing and internal traffic are deliberately separated. A compromised external service cannot reach internal backend components. Exposure is minimal and intentional.
Services
| Service | Domain | Purpose | Auto-Update |
|---|---|---|---|
| Traefik | traefik.stan-homelab.duckdns.org | Reverse proxy + SSL | Manual |
| Grafana | grafana.stan-homelab.duckdns.org | Metrics dashboards | ✅ Yes |
| Prometheus | prometheus.stan-homelab.duckdns.org | Metrics storage | ✅ Yes |
| Uptime Kuma | uptime.stan-homelab.duckdns.org | Uptime monitoring | ✅ Yes |
| Portainer | portainer.stan-homelab.duckdns.org | Docker management UI | Manual |
| Node Exporter | Internal only | System metrics collector | ✅ Yes |
| Watchtower | Internal only | Container auto-updater | — |
Tech Stack
Containerisation
Docker and Docker Compose are used to define and run all services. Each service lives in its own container with explicit network assignments and volume mounts. Portainer provides a web-based management UI on top.
Reverse Proxy & SSL
Traefik v3 acts as the single entry point for all external traffic. Routing is defined via Docker labels on each container — when a new service is added, three label lines are enough to expose it securely. Let's Encrypt certificates are provisioned automatically via DNS challenge and stored in acme.json.
Why Traefik over Nginx? Traefik reads Docker labels directly — no separate config files per service. Adding a new service takes 3 lines instead of a new Nginx config block. It also integrates Let's Encrypt natively.
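To illustrate the label-driven routing described above, a hypothetical new service could be exposed with the three router labels like so (the service name, router name, and certificate resolver name `letsencrypt` are illustrative assumptions, not taken from the repository):

```yaml
# Hypothetical compose service showing Traefik v3 label-based routing.
# Names and the resolver are illustrative; the repo's exact labels may differ.
services:
  whoami:
    image: traefik/whoami
    networks: [proxy]
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.whoami.rule=Host(`whoami.stan-homelab.duckdns.org`)"
      - "traefik.http.routers.whoami.tls.certresolver=letsencrypt"

networks:
  proxy:
    external: true   # joins the pre-existing proxy network
```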
Monitoring
Two-layer monitoring approach:
- Uptime Kuma — answers "is it up or down?" with 60-second checks and instant alerting
- Prometheus + Grafana — answers "how is it performing?" with real-time CPU, memory, disk, and network metrics
Node Exporter runs on the internal network and exposes 60+ system metric families, which Prometheus scrapes every 15 seconds. A custom PromQL dashboard was built from scratch to display real CPU load — not just imported templates.
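A CPU-load panel of the kind mentioned above is typically driven by a query of this shape (one common formulation over `node_cpu_seconds_total`; the dashboard's actual expression may differ):

```promql
# CPU utilisation per instance: 100% minus the rate of idle time.
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
```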
Security Hardening
The server was hardened before any services were deployed:
- SSH key-based authentication only — password login disabled entirely
- PermitRootLogin no — root SSH access blocked
- UFW firewall — only ports 22, 80, and 443 open
- fail2ban — automatic IP banning after repeated failed login attempts
- Unattended security upgrades enabled
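The checklist above maps to a handful of configuration lines. A sketch of the relevant settings (file paths are standard Ubuntu locations; the fail2ban values are assumptions, not the server's exact configuration):

```
# /etc/ssh/sshd_config — key-only access, no root login
PasswordAuthentication no
PermitRootLogin no

# UFW — deny everything inbound except SSH and web traffic
#   ufw default deny incoming
#   ufw allow 22/tcp && ufw allow 80/tcp && ufw allow 443/tcp

# /etc/fail2ban/jail.local — ban after repeated SSH failures (values assumed)
[sshd]
enabled  = true
maxretry = 5
bantime  = 1h
```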
Automation
The system is designed to maintain itself:
- Watchtower runs at 03:00 daily using an opt-in label model — only explicitly tagged containers are updated. Traefik and Portainer are excluded and updated manually.
- Backup script runs at 02:00 daily — before Watchtower — archiving all Docker volumes with 14-day retention and a full audit log.
- Makefile provides a unified CLI for the entire stack:
make up, make down, make status, make backup.
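A minimal sketch of the nightly backup job described above — not the repository's actual scripts/backup.sh. The real paths would be the Docker volumes directory and a backup target; here they default to local demo directories so the sketch runs anywhere without root:

```shell
#!/usr/bin/env bash
# Sketch of a nightly volume backup with retention and an audit log.
# SRC/DEST defaults are demo paths, NOT the repo's real locations.
set -euo pipefail

SRC="${SRC:-$PWD/demo-data}"             # stands in for the Docker volumes dir
DEST="${DEST:-$PWD/backups}"
RETENTION_DAYS="${RETENTION_DAYS:-14}"   # matches the 14-day retention above

mkdir -p "$SRC" "$DEST"
[ -e "$SRC/example.txt" ] || echo "demo" > "$SRC/example.txt"

STAMP="$(date +%F)"
tar -czf "$DEST/volumes-$STAMP.tar.gz" -C "$SRC" .
echo "$(date -Is) archived $SRC -> volumes-$STAMP.tar.gz" >> "$DEST/backup.log"

# Prune archives older than the retention window (the audit log is kept).
find "$DEST" -name 'volumes-*.tar.gz' -mtime +"$RETENTION_DAYS" -delete
```

Scheduled via cron, e.g. `0 2 * * * /opt/homelab-stack/scripts/backup.sh` (path illustrative), which places it an hour before Watchtower's 03:00 update run.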
Infrastructure as Code
Everything lives in Git from day one. No manual state. The .gitignore ensures secrets and certificates never reach the repository. A .env.example documents required configuration without exposing real values.
Key Technical Decisions
Opt-in Watchtower model
Core infrastructure (Traefik, Portainer) is excluded from auto-updates. A misconfigured update on the reverse proxy takes down the entire stack. These are updated manually after reviewing changelogs.
Backup before updates
The backup cron job runs at 02:00, Watchtower at 03:00. If an auto-update breaks something, a fresh backup from the same night is always available.
Network separation
Two Docker networks: proxy for external-facing traffic, internal for backend communication. Node Exporter, for example, has no business being reachable externally — it only lives on the internal network.
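In compose terms, the split looks roughly like this (a sketch only: whether the repo sets `internal: true` on the backend network, and the exact service attachments, are assumptions):

```yaml
# Sketch of the two-network layout; service definitions are abbreviated.
networks:
  proxy:              # external-facing: Traefik plus anything it routes to
  internal:
    internal: true    # backend-only: no external connectivity at all

services:
  grafana:
    networks: [proxy, internal]   # reachable via Traefik, can query Prometheus
  node-exporter:
    networks: [internal]          # never reachable from outside
```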
Traefik network binding
When containers are connected to multiple Docker networks, Traefik must be explicitly told which network to use for routing. This was discovered during debugging and documented in TROUBLESHOOTING.md.
Deployment
git clone git@github.com:overthinkinglord/homelab-stack.git
cd homelab-stack
cp .env.example .env
# Fill in DOMAIN, DUCKDNS_TOKEN, passwords
make up
Repository Structure
homelab-stack/
├── traefik/
│ ├── docker-compose.yml
│ ├── traefik.yml
│ └── acme.json ← gitignored
├── monitoring/
│ ├── docker-compose.yml
│ └── prometheus/
│ └── prometheus.yml
├── portainer/
│ └── docker-compose.yml
├── watchtower/
│ └── docker-compose.yml
├── scripts/
│ └── backup.sh
├── docs/
│ ├── DECISIONS.md
│ └── TROUBLESHOOTING.md
├── Makefile
├── .env.example
├── .gitignore
└── README.md
Key Challenges Solved
Real problems encountered during the build — each one required investigation, not just googling an error message.
Traefik routing failure with multi-network containers
Traefik could reach containers directly by IP but returned Gateway Timeout on all domain requests. The root cause was that monitoring containers were connected to both proxy and internal networks — Traefik didn't know which one to use for routing. Fixed by explicitly setting network: proxy in traefik.yml under providers.docker. This is not documented prominently in Traefik's official docs and required reading through network inspection output to identify.
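The fix amounts to a single line of static configuration, roughly:

```yaml
# traefik.yml — tell Traefik which network to use when a container
# is attached to more than one. The exposedByDefault line is an
# assumption about the surrounding config, not confirmed by the repo.
providers:
  docker:
    network: proxy
    exposedByDefault: false
```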
Let's Encrypt DNS challenge failure for subdomains
DuckDNS free tier does not reliably propagate TXT records for subdomains (e.g. grafana.stan-homelab.duckdns.org) within Let's Encrypt's timeout window. Traefik logs showed propagation: time limit exceeded. Solution was to use tls=true without a certresolver for internal services — Traefik falls back to its default self-signed certificate, acceptable for a local network. The root domain (traefik.stan-homelab.duckdns.org) retained a valid Let's Encrypt certificate.
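On an affected internal service, the router labels then look roughly like this (the router name is illustrative):

```yaml
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.grafana.rule=Host(`grafana.stan-homelab.duckdns.org`)"
  # tls=true with no certresolver: Traefik serves its default self-signed cert
  - "traefik.http.routers.grafana.tls=true"
```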
Uptime Kuma Bad Gateway behind Traefik
After fixing the routing issue, Uptime Kuma still returned Bad Gateway while Grafana and Prometheus worked correctly. Uptime Kuma requires WebSocket connections, which Traefik was not upgrading automatically. Fixed by adding a custom request-headers middleware with Connection: Upgrade and Upgrade: websocket labels on the container.
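The workaround described above, expressed as container labels (the middleware name `kuma-ws` and router name `uptime` are made up for the sketch):

```yaml
labels:
  # Force the Connection/Upgrade headers so the WebSocket handshake survives
  - "traefik.http.middlewares.kuma-ws.headers.customrequestheaders.Connection=Upgrade"
  - "traefik.http.middlewares.kuma-ws.headers.customrequestheaders.Upgrade=websocket"
  - "traefik.http.routers.uptime.middlewares=kuma-ws"
```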
Watchtower Docker API version mismatch
Watchtower's bundled Docker client was too old for the installed Docker daemon — the same API version mismatch that hit Traefik earlier in the build. Unlike Traefik, where updating to the latest image resolved it, Watchtower required an explicit DOCKER_API_VERSION=1.44 environment variable. This pattern — older client libraries bundled inside container images — is a recurring gotcha worth knowing.
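The environment pin, in compose form (a sketch; the socket mount is the standard Watchtower setup, not confirmed from the repo):

```yaml
services:
  watchtower:
    image: containrrr/watchtower
    environment:
      - DOCKER_API_VERSION=1.44   # pin the client to the daemon's API version
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```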
Skills Demonstrated
Docker · Docker Compose · Traefik v3 · Let's Encrypt · Prometheus · Grafana · PromQL · Uptime Kuma · Portainer · Ubuntu Server 22.04 · UFW · SSH Hardening · fail2ban · Bash Scripting · Cron · Watchtower · Makefile · Git · Infrastructure as Code · Network Isolation · TLS Termination
Stanislav Shtelmakh — Linux Homelab Stack — 2026 github.com/overthinkinglord/homelab-stack