Projects

AI systems first, then the infrastructure that runs them. ForgeDocs AI leads; the rest is operating credibility — production services I provision, harden, and watch. Most are running right now; the rest ship as versioned, documented repos.

ForgeDocs AI

v0.1

Production-grade agentic document-intelligence platform, built grounding-first to do the opposite of most RAG demos. Documents are ingested with structure-aware chunking and transactional, per-tenant dedupe. Queries are answered over hybrid retrieval (dense pgvector + Postgres FTS/BM25, fused with Reciprocal Rank Fusion), and every reply carries [n] citations resolving to document, page, and char span. When it cannot ground an answer, it abstains ("I don't have enough information") rather than guessing. A LangGraph Planner→Retriever→Synthesizer→Verifier loop self-corrects and refuses ungrounded answers.

What sets it apart isn't the model call; it's everything around it that makes an LLM safe to ship: multi-tenant Postgres Row-Level Security behind a NOBYPASSRLS role, verified-JWT tenancy that overrides client headers, an always-on deterministic eval floor in CI with an optional LLM-as-judge on top, and pluggable ChatModel/Embedder/Reranker/OCRBackend/Tracer protocols that run local-first (Ollama, sentence-transformers) or degrade to a hash embedder so CI is always runnable offline.

~3,444 lines of Python across 15 modules, 12 documented architecture decisions with trade-offs. No always-on hosted demo: the repo is the artifact.
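
The Reciprocal Rank Fusion step mentioned above can be sketched in a few lines. This is a minimal illustration, assuming each retriever returns an ordered list of chunk IDs. The function name rrf_fuse and the sample IDs are hypothetical and are not taken from the ForgeDocs codebase.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A higher fused score ranks first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense = ["c3", "c1", "c7"]    # hypothetical dense (cosine) results
lexical = ["c1", "c9", "c3"]  # hypothetical lexical (full-text) results
fused = rrf_fuse([dense, lexical])
```

Because RRF only looks at ranks, the dense and lexical scores never need to be put on a common scale, which is the usual reason to pick it over weighted score blending.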

Architecture

POST /ingest  ·  PDF · text · md · image / scanned PDF
        │
        ▼
 ingestion: load → OCR (none|tesseract|vision) → structure-aware
   chunk (~512 tok, ~15% overlap) → embed → index
   · per-tenant dedupe: UNIQUE (tenant_id, sha256)
        │
        ▼
 ┌───────────────────────────────────────────────┐
 │ Postgres + pgvector (Supabase)                 │
 │  · dense (cosine)  +  lexical (FTS / BM25)     │
 │  · Row-Level Security, GUC app.current_tenant  │
 │  · NOBYPASSRLS role  ·  NULL tenant → 0 rows   │
 └───────────────────────┬───────────────────────┘
        │  POST /query · /query/stream (SSE)
        ▼
 retrieval: dense ∪ lexical → RRF (k=60) → rerank?
        │
        ▼
 agents (LangGraph): Planner → Retriever
        → Synthesizer → Verifier ⟲ self-correct
        │   refuses ungrounded answers
        ▼
 answer + [n] citations → document_id · page · span
        │                       │
        ▼                       ▼
 GET /traces (every step a span)   eval harness:
 in-proc recorder | Langfuse       faithfulness ·
                                   citation validity ·
                                   abstention  → CI floor
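
The "~512 tok, ~15% overlap" figures in the ingestion step imply a sliding window over tokens. The sketch below shows only that window arithmetic, assuming the text is already tokenized. It does not attempt the structure-aware part (respecting headings or paragraphs), and the name chunk_windows is hypothetical.

```python
def chunk_windows(tokens, size=512, overlap_ratio=0.15):
    """Split a token list into fixed-size windows that overlap.

    With size=512 and overlap_ratio=0.15 the overlap is int(512 * 0.15) = 76
    tokens, so each window starts 436 tokens after the previous one.
    """
    if size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("size must be positive and 0 <= overlap_ratio < 1")
    overlap = int(size * overlap_ratio)
    stride = size - overlap
    windows = []
    for start in range(0, len(tokens), stride):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return windows
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk, at the cost of embedding some tokens twice.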

What it demonstrates

Hybrid retrieval done right: dense pgvector cosine + Postgres FTS/BM25 fused with Reciprocal Rank Fusion (k=60) and an optional cross-encoder rerank — the cheapest, biggest quality win in RAG.
Grounding as a hard contract: every answer carries [n] markers resolving to document_id + page + char span, with honest abstention instead of confident hallucination.
Multi-agent orchestration with LangGraph — a Planner→Retriever→Synthesizer→Verifier graph whose self-correction loop refuses ungrounded answers.
Schema-guided structured extraction into Pydantic targets (invoices, papers) with a validate-then-repair loop.
An evaluation harness as a CI quality gate: deterministic RAGAS-shaped metrics (faithfulness, citation validity, context precision, abstention accuracy) over JSONL golden sets, plus an optional claim-level LLM-as-judge that caught cited-but-unsupported claims the token metric missed.
Multi-tenant isolation enforced at the database, not just the app: Postgres Row-Level Security via a per-transaction GUC behind a dedicated NOBYPASSRLS role (SET ROLE per connection), verified live to block cross-tenant reads, INSERTs, and no-context queries.
Tenant identity derived from a server-verified Supabase JWT that overrides the X-Tenant-ID header (invalid token = hard 401), with a server-side sub→tenant mapping so a user can't switch tenants by editing a claim.
Zero-setup observability: every pipeline step wrapped in a tracing span, an in-process recorder powering a GET /traces dashboard with no external infra, and a Langfuse adapter behind the same interface, selected by a flag.
Clean swappable abstractions everywhere — ChatModel/Embedder/Reranker/OCRBackend/Tracer protocols running local-first (Ollama, bge-small) or against API providers, with OCR'd images re-entering the same parse→chunk→embed→cite pipeline.
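
To illustrate the swappable-protocol idea and the offline hash-embedder fallback, here is a minimal sketch. Only the name Embedder comes from the description above. The method signature, the HashEmbedder class, and the hashing scheme are assumptions made for illustration, and the repo's actual implementation may differ.

```python
import hashlib
import math
from typing import Protocol


class Embedder(Protocol):
    """Anything with this method can be plugged in (assumed signature)."""

    def embed(self, text: str) -> list[float]: ...


class HashEmbedder:
    """Deterministic offline stand-in: hashes tokens into a fixed-size vector."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % self.dim
            sign = 1.0 if digest[4] % 2 == 0 else -1.0
            vec[index] += sign
        # L2-normalize so a dot product equals cosine similarity.
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of two already-normalized vectors."""
    return sum(x * y for x, y in zip(a, b))


embedder: Embedder = HashEmbedder()
same = cosine(embedder.embed("row level security"),
              embedder.embed("Row Level Security"))
```

A hash embedder carries no semantic signal, so it is only useful for keeping tests deterministic and runnable without model downloads, not for judging retrieval quality.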

terraform-homelab

Live

The IaC repo that provisions and operates this site. Hardened Ubuntu VPS on Vultr, DNS via Cloudflare, remote state in Cloudflare R2, HTTPS via Caddy with auto Let's Encrypt. Deployed through a GitHub Actions GitOps pipeline — open a PR to see a sticky terraform plan comment, merge to trigger a saved-plan apply that pauses at a production review gate. ./deploy.sh from a laptop still works and is the documented fallback.

Architecture

git push → PR
        │
        ▼
 GitHub Actions: pr-check
  · fmt · validate · tflint · tfsec · plan
  · sticky plan comment on the PR
        │  merge to main
        ▼
 GitHub Actions: deploy
  · validate → plan → tfplan artifact
  · pause at `production` review gate ⏸
  · snapshot tfstate → backups/<ts>.tfstate
  · terraform apply (saved plan)
        │
        ▼
 ┌──────────────┬──────────────┬──────────────┐
 │  Vultr API   │ Cloudflare   │ Cloudflare R2│
 │  (compute,   │ DNS API      │ (state +     │
 │   SSH key)   │ (A record)   │  backups/)   │
 └──────┬───────┴──────────────┴──────────────┘
        │  cloud-init user_data
        ▼
 ┌──────────────────────────────┐
 │ Ubuntu VPS                   │
 │  · non-root user, key-only   │
 │  · sshd hardening drop-in    │
 │  · UFW (22/80/443)           │
 │  · fail2ban                  │
 │  · Caddy (auto Let's Encrypt)│
 └──────────────┬───────────────┘
                │ HTTPS :443
                ▼
            user browser

What it demonstrates

Modular Terraform (ssh, compute, dns modules)
Cloud-init bootstrap with security baseline (UFW, fail2ban, sshd hardening, unattended-upgrades)
S3-compatible remote state on Cloudflare R2 — solo-operator trade-off documented in the README
Auto-HTTPS via Caddy + Let's Encrypt HTTP-01
Site redeploy via null_resource triggered by archive checksum (no VM rebuild on content changes)
GitOps pipeline: PR plan sticky comments, merge-to-main deploy with saved-plan applies, workflow_dispatch escape hatch with -replace=<address> support
production GitHub Environment as a manual approval gate — no apply runs without a human click
State-snapshot safety net: every apply copies homelab/terraform.tfstate to backups/<UTC-timestamp>.tfstate via boto3 against R2's S3-compatible API, compensating for R2's lack of object versioning (it does not implement PutBucketVersioning)
Plan and apply jobs share tfplan + site.tar.gz via a single bundled artifact, so the apply runs the exact bytes the reviewer approved
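
The state-snapshot step could look roughly like the sketch below. It assumes the copy is done server-side with the S3 CopyObject call and that s3 is a boto3 S3 client pointed at R2. The timestamp format, bucket name, and function names are illustrative assumptions, not details from the repo.

```python
from datetime import datetime, timezone


def backup_key(now=None):
    """Build the backup object key from a UTC timestamp (format assumed)."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return f"backups/{now.strftime('%Y%m%dT%H%M%SZ')}.tfstate"


def snapshot_state(s3, bucket, state_key="homelab/terraform.tfstate", now=None):
    """Copy the current state object to a timestamped key before an apply.

    `s3` is expected to behave like a boto3 S3 client, for example one
    created with boto3.client("s3", endpoint_url=...) for R2.
    """
    dest = backup_key(now)
    s3.copy_object(
        Bucket=bucket,
        Key=dest,
        CopySource={"Bucket": bucket, "Key": state_key},
    )
    return dest
```

Passing the client in as an argument keeps the function testable with a stub, so the key-naming logic can be checked without touching real storage.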

monitoring-platform

Running 24/7

Containerized observability stack I built to watch live game servers I host. Prometheus + Grafana + node-exporter + a custom Python exporter doing real-time UDP threat detection. Two alerting channels by design — Discord webhooks for event-driven pushes, Prometheus for continuous time-series.

github.com/julivnexe/monitoring-platform

What it demonstrates

Custom Prometheus exporter in Python reading iptables counters and ss -uan socket state
Host vs bridge network namespace decisions (NET_ADMIN for the exporter, bridge for Prometheus/Grafana, host.docker.internal:host-gateway for cross-namespace scraping)
Push/pull telemetry split with independent alert cooldowns
Declarative Grafana provisioning (datasource and dashboard on first boot)
Production hardening: pinned image tags, healthchecks, named volumes, restart: unless-stopped
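
As a rough idea of what the custom exporter does with ss -uan output, the sketch below parses a sample listing and renders a gauge in the Prometheus text exposition format using only the standard library. The sample text, the metric name udp_sockets, and the label local_port are assumptions. The real exporter also reads iptables counters, which is not shown, and the column layout of ss can vary between versions.

```python
from collections import Counter

# Illustrative sample only; not captured from a real host.
SAMPLE_SS_OUTPUT = """\
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
UNCONN  0       0       0.0.0.0:2302        0.0.0.0:*
UNCONN  2176    0       0.0.0.0:2302        0.0.0.0:*
UNCONN  0       0       127.0.0.53%lo:53    0.0.0.0:*
UNCONN  0       0       [::]:2303           [::]:*
"""


def udp_sockets_by_port(ss_output):
    """Count UDP sockets per local port from `ss -uan` text."""
    counts = Counter()
    for line in ss_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5:
            continue
        # Local address is the 4th column; the port follows the last colon,
        # which also works for bracketed IPv6 addresses like [::]:2303.
        port = fields[3].rsplit(":", 1)[-1]
        if port.isdigit():
            counts[int(port)] += 1
    return counts


def render_metrics(counts, name="udp_sockets"):
    """Render the counts in the Prometheus text exposition format."""
    lines = [f"# TYPE {name} gauge"]
    for port in sorted(counts):
        lines.append(f'{name}{{local_port="{port}"}} {counts[port]}')
    return "\n".join(lines) + "\n"


metrics_text = render_metrics(udp_sockets_by_port(SAMPLE_SS_OUTPUT))
```

In a real exporter the same text would be served over HTTP on a /metrics path for Prometheus to scrape.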

halo-ce-command-center

Running 24/7

A self-hosted operations toolkit for the Halo Custom Edition servers I host. Layered DDoS defense, Discord notifications for player activity, in-game K/D/A leaderboards driven by SAPP Lua hooks, and a Prometheus + Grafana monitoring stack — all wired together in one Docker Compose deployment on a single VPS.

github.com/julivnexe/Halo-CE-Command-Center

What it demonstrates

Layered defense: sysctl tuning, iptables rate limits, ipset blocklists fed from FireHOL + Spamhaus reputation lists (~4,600 CIDRs), auto-ban of attacker /24 subnets on packets-per-second (PPS) spikes
Real-time game telemetry: SAPP Lua hooks tail CSV event logs, feeding a Python bot that tracks K/D/A/captures per IP into SQLite
Leaderboards surfaced two ways — in-game commands (/stats, /top, /rank) and Discord slash commands
Discord embeds with country-flag + VPN detection on player joins/leaves
Auto-provisioned Prometheus + Grafana — datasource and dashboard live from first boot; ~153 MB RAM idle across 5 containers
Sits in the stack between halo-vps-ansible (host hardening) and monitoring-platform (observability)
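
The /24 auto-ban decision from the list above can be sketched as a small aggregation: sum packets per source /24 over a window and flag subnets above a threshold. The threshold, window, and function name here are hypothetical. The actual enforcement in the project happens through iptables and ipset, which this sketch does not touch.

```python
import ipaddress
from collections import Counter


def subnets_to_ban(packets_by_ip, pps_threshold, window_seconds=1.0):
    """Return /24 networks whose combined packet rate exceeds the threshold.

    packets_by_ip maps a source IPv4 address to packets seen in the window.
    """
    per_subnet = Counter()
    for ip, packets in packets_by_ip.items():
        # strict=False masks the host bits, so 203.0.113.57 -> 203.0.113.0/24.
        subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
        per_subnet[subnet] += packets
    return sorted(
        str(net)
        for net, packets in per_subnet.items()
        if packets / window_seconds > pps_threshold
    )


observed = {
    "203.0.113.57": 900,  # documentation-range addresses, not real traffic
    "203.0.113.58": 800,
    "198.51.100.7": 40,
}
banned = subnets_to_ban(observed, pps_threshold=1000)
```

Aggregating by /24 catches floods spread across neighboring addresses that would each stay under a per-IP limit, with the trade-off that legitimate players in the same subnet are blocked too.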

infra-automator

v0.1

A Click-based Python CLI (infra up | harden | deploy | status | destroy) that owns the lifecycle of a small cloud footprint across Vultr and DigitalOcean. Wraps Terraform for provisioning, Ansible for hardening, and Docker Compose for service deploys behind one uniform interface.

github.com/julivnexe/infra-automator

What it demonstrates

Provider-agnostic CLI design via shared Terraform output contracts
Ansible primary path with self-contained Bash fallback that reaches the same end state
Secrets passed via TF_VAR_* at apply time, never written to disk
GitHub Actions CI: lint, type-check, unit tests, terraform fmt/validate per provider
One-command teardown returns the account to zero state
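
Passing secrets through TF_VAR_* variables means building an environment for the Terraform subprocess instead of writing a tfvars file. A minimal sketch follows, assuming a hypothetical input variable named vultr_api_key. The helper name terraform_env is also an assumption.

```python
import os


def terraform_env(secrets, base_env=None):
    """Return an environment mapping for a Terraform subprocess.

    Each secret is exposed as TF_VAR_<name>, which Terraform reads as the
    value of the input variable <name>. Nothing is written to disk.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name, value in secrets.items():
        env[f"TF_VAR_{name}"] = value
    return env


env = terraform_env(
    {"vultr_api_key": "example-not-a-real-key"},
    base_env={"PATH": "/usr/bin"},
)
# The CLI could then run something like:
#   subprocess.run(["terraform", "apply"], env=env, check=True)
```

The secret lives only in the child process environment for the duration of the run, so it never lands in a tfvars file or in shell history.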

k3s-homelab

v0.1

A second deployment of this site onto a single-node k3s cluster: same static files as terraform-homelab, completely different shape. Two nginx pods behind the bundled Traefik ingress, TLS by cert-manager + Let's Encrypt prod, packaged as a Helm chart. The container image is built on the VPS and imported straight into k3s's containerd via docker save | k3s ctr images import, so no external registry is needed (the chart README documents a multi-node alternative). Built to be defensible in an interview, not just to make a thing work. Live demo available on request: the cluster is destroyed between sessions and re-spins in ~90 seconds via ./deploy.sh apply.

What it demonstrates

Helm chart authoring — Chart.yaml / values.yaml / templates with the standard _helpers.tpl name + labels scaffolding
cert-manager + Let's Encrypt via the ingress-shim pattern — one annotation on the Ingress reconciles into a Certificate, Order, Challenge chain
Staging-then-prod ClusterIssuer dance to avoid burning prod Let's Encrypt rate limits while debugging
Registry-less image distribution for single-node k3s; documented ghcr.io alternative for multi-node
klipper-lb (k3s's stand-in for a cloud LoadBalancer) understood and explained — what it is, when you'd outgrow it
Guaranteed QoS pods (requests == limits), httpGet probes against a dedicated /healthz endpoint
kubeconfig as the auth boundary — mTLS via the embedded client cert, not source-IP allowlist on port 6443
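
The "requests == limits" rule for Guaranteed QoS can be expressed as a small classifier. This sketch follows the Kubernetes rules for cpu and memory only and compares quantities as plain strings, so "100m" and "0.1" would be treated as different. The function name and the sample values in the usage note are hypothetical, not taken from the chart.

```python
def qos_class(containers):
    """Classify a pod's QoS class from its containers' resource settings.

    Each container is a dict like {"requests": {...}, "limits": {...}}.
    """
    any_set = False
    guaranteed = True
    for container in containers:
        limits = container.get("limits", {})
        # Kubernetes defaults a missing request to the limit when one is set.
        requests = {**limits, **container.get("requests", {})}
        for resource in ("cpu", "memory"):
            req, lim = requests.get(resource), limits.get(resource)
            if req is not None or lim is not None:
                any_set = True
            # Guaranteed needs a limit for both resources, equal to the request.
            if lim is None or req != lim:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"
```

For example, a container with requests and limits both set to cpu "100m" and memory "64Mi" classifies as Guaranteed, while lowering only the cpu request to "50m" makes the pod Burstable.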
