# ADR-005: PM2 + Caddy Deployment

## Status
Accepted

## Context
CMS-aiChemist runs on a VPS and needs production deployment infrastructure that:

- Manages Node.js processes (restart on crash, memory limits, logging)
- Provides HTTPS with automatic certificate management
- Routes requests to appropriate backend services
- Serves static files efficiently
- Monitors application health
- Supports blue-green or rolling deploys (future)

Deployment options considered:
- **Docker + nginx**: Industry standard, but adds container complexity
- **systemd + nginx**: Native Linux, but manual TLS certificate management
- **PM2 + Caddy**: Process manager + modern web server with auto HTTPS
- **Kubernetes**: Overkill for two Node.js apps on a single VPS

## Decision
Use PM2 for Node.js process management and Caddy as the reverse proxy with automatic Let's Encrypt TLS.

### PM2 Configuration (ecosystem.config.cjs)
```javascript
module.exports = {
  apps: [
    {
      // Next.js app, served at cms.chem.dev via Caddy
      name: 'cms-web',
      cwd: '/home/jgatlit/apps/CMS/apps/web',
      script: 'pnpm',
      args: 'start',
      instances: 1,
      max_memory_restart: '256M', // PM2 restarts the process if its memory exceeds this
      env: { NODE_ENV: 'production', PORT: 3500 }
    },
    {
      // Astro site, served at cms-content.chem.dev via Caddy
      name: 'cms-content',
      cwd: '/home/jgatlit/apps/CMS/apps/cms-theme',
      script: 'pnpm',
      args: 'preview',
      // Explicit Node binary (v22.12.0 installed by fnm)
      interpreter: '/home/jgatlit/.local/share/fnm/node-versions/v22.12.0/installation/bin/node',
      instances: 1,
      max_memory_restart: '256M',
      env: { NODE_ENV: 'production', PORT: 3501 }
    }
  ]
}
```

### Caddy Configuration
- **cms.chem.dev** → reverse_proxy localhost:3500 (cms-web)
- **cms-content.chem.dev** → reverse_proxy localhost:3501 (cms-content)
- **cms.chem.dev/ref/** → file_server browse (static-site/)
- Auto TLS via Let's Encrypt
- gzip and zstd compression
- Security headers (HSTS, X-Frame-Options, etc.)
- JSON structured logging
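
The routing rules above rely on first-match precedence: for cms.chem.dev, the `/ref/` rule has to win over the host-wide proxy. The sketch below is a plain JavaScript model of that precedence, not Caddy configuration; the `ROUTES` table and `route` function are illustrative only. In an actual Caddyfile this is usually expressed with mutually exclusive `handle` blocks (or `handle_path`, which also strips the matched prefix), because Caddy's default directive order evaluates `reverse_proxy` before `file_server`.

```javascript
// Illustrative model of the Caddy routing table (not Caddy syntax).
// Rules are checked in order and the first match wins, so the more
// specific /ref/ rule must come before the catch-all proxy for the same host.
const ROUTES = [
  { host: 'cms.chem.dev', pathPrefix: '/ref/', target: 'file_server static-site/' },
  { host: 'cms.chem.dev', pathPrefix: '/', target: 'reverse_proxy localhost:3500' },
  { host: 'cms-content.chem.dev', pathPrefix: '/', target: 'reverse_proxy localhost:3501' }
];

// Return the handler for a request, or null if no site matches the host.
function route(host, path) {
  const rule = ROUTES.find((r) => r.host === host && path.startsWith(r.pathPrefix));
  return rule ? rule.target : null;
}

console.log(route('cms.chem.dev', '/ref/tokens/')); // file_server static-site/
console.log(route('cms.chem.dev', '/dashboard')); // reverse_proxy localhost:3500
```

Swapping the first two entries would send `/ref/tokens/` to the Next.js app, which is the same failure mode as an unscoped `reverse_proxy` in a Caddyfile.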

### Deployment Process
1. `git pull` latest changes
2. Build both apps from the repo root: `(cd apps/web && pnpm build)` and `(cd apps/cms-theme && pnpm build)`
3. Restart PM2: `pm2 restart cms-web cms-content`
4. Verify: `curl -I https://cms.chem.dev` (expect 200)

## Consequences

### Positive
- **Auto-restart**: PM2 restarts crashed processes automatically
- **Memory limits**: PM2 restarts any process that exceeds 256M, so a memory leak triggers a restart instead of exhausting the VPS
- **Auto HTTPS**: Caddy handles Let's Encrypt certificate issuance and renewal
- **Process logs**: PM2 captures stdout/stderr, viewable via `pm2 logs`
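- **Zero-downtime reload** (future): `pm2 reload` replaces instances gracefully without dropping traffic, but only in cluster mode; in the current fork-mode setup it behaves like `pm2 restart`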
- **Static serving**: Caddy file_server provides browsable directory for tokens/docs
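- **Modern defaults**: Caddy enables HTTPS and HTTP→HTTPS redirects automatically; compression (`encode gzip zstd`) and security headers (`header`) each take a line or two of Caddyfile config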

### Negative
- **No containerization**: Processes run directly on host (less isolation)
- **Manual deploys**: No CI/CD pipeline yet (planned)
- **Port conflicts**: Must ensure 3500/3501 are not used by other services
- **Single instance**: Not load-balanced (acceptable for current scale)

### Neutral
- **Verified live URLs** (2026-04-06):
  - https://cms.chem.dev/ (200 - Next.js home)
  - https://cms.chem.dev/dashboard (200 - Next.js dashboard)
  - https://cms.chem.dev/design-system (200 - Next.js catalog)
  - https://cms-content.chem.dev/ (302 - Astro setup redirect)
  - https://cms.chem.dev/ref/tokens/ (200 - static tokens)
  - https://penpot.chem.dev (200 - no regression)

### Operational Notes
- **PM2 commands**:
  - Start: `pm2 start ecosystem.config.cjs`
  - Restart: `pm2 restart cms-web cms-content`
  - Logs: `pm2 logs cms-web --lines 100`
  - Status: `pm2 status`
- **Caddy commands**:
  - Reload config: `sudo systemctl reload caddy`
  - View logs: `sudo journalctl -u caddy -f`
- **Critical lesson**: ALWAYS rebuild and restart after committing code changes. Stale chunks (the running server and the build on disk out of sync) caused ChunkLoadError 400s in production.
- **Static sync**: `ops/sync-static.sh` copies tokens, briefs, handoffs, docs to static-site/

### Future Enhancements
- CI/CD pipeline (GitHub Actions) to automate build + deploy
- PM2 cluster mode (multiple instances per app) if traffic increases
- Health check endpoints for monitoring
- Blue-green deployment strategy
- Automated rollback on failed health checks
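
For the cluster-mode item: PM2's cluster mode runs the app through Node's `cluster` module, so the entry has to point at a Node.js script rather than `pnpm`. A hypothetical cms-web entry is sketched below, assuming Next.js's CLI is reachable at `node_modules/next/dist/bin/next` inside `apps/web`. With `exec_mode: 'cluster'`, `pm2 reload cms-web` replaces instances one at a time instead of stopping the app; cms-content could stay in fork mode.

```javascript
// Hypothetical cluster-mode entry for cms-web (not part of the current
// ecosystem.config.cjs). Cluster mode needs a Node.js script, so PM2 runs
// Next.js's CLI directly instead of `pnpm start`.
const cmsWebCluster = {
  name: 'cms-web',
  cwd: '/home/jgatlit/apps/CMS/apps/web',
  script: 'node_modules/next/dist/bin/next',
  args: 'start',
  exec_mode: 'cluster',
  instances: 2, // 'max' would start one instance per CPU core
  max_memory_restart: '256M', // checked per instance
  env: { NODE_ENV: 'production', PORT: 3500 }
};

module.exports = { apps: [cmsWebCluster] };
```

Note that two instances double the worst-case memory budget for cms-web to 512M, which matters on a single VPS.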
