Available for work

Julian Vargas.

AI Engineer — production RAG and agent systems, and the infrastructure they run on.

Most engineers can call an LLM API. Fewer can deploy, observe, secure, and evaluate what they ship. I build production RAG and agent systems with grounding, evaluation, and observability first, and I run the infrastructure underneath them.

Open to remote AI Engineer, AI Platform, and LLMOps roles.

01 / Work 2024 — Present
01 Shipped · v0.1.0
  • 2-layer tenant isolation · Postgres RLS
  • 5 pluggable model protocols
  • 12 architecture decisions documented
  • Zero-setup in-process tracing dashboard

ForgeDocs AI

Production-grade agentic document intelligence: the flagship project here, and the clearest picture of how I build AI systems. Upload documents, extract structured data, and answer hard questions over them with inline [n] citations that resolve to document_id + page + char span.

Grounding first: hybrid retrieval (pgvector dense + Postgres BM25, fused with RRF), a LangGraph Planner → Retriever → Synthesizer → Verifier loop that refuses ungrounded answers, and honest abstention instead of hallucination.

Hardened like a service, not a demo: two-layer multi-tenant isolation enforced by Postgres Row-Level Security under a dedicated NOBYPASSRLS role, tenant identity from a verified Supabase JWT, and an always-on eval harness (faithfulness, citation validity, abstention) as an offline CI floor. Every component (ChatModel, Embedder, Reranker, OCRBackend, Tracer) is a swappable protocol with local-first defaults, so CI runs fully offline.

Observability is built in: every step is a tracing span behind a zero-setup GET /traces dashboard. Twelve architecture decisions are written down with trade-offs.
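A minimal sketch of what "swappable protocol with local-first defaults" can look like in Python, using typing.Protocol. Only the name Embedder comes from the description above; the embed() signature, the HashEmbedder stand-in, and index_chunks are hypothetical illustrations, not the project's actual code.

```python
import hashlib
from typing import Protocol, Sequence


class Embedder(Protocol):
    """Structural interface: anything with a matching embed() qualifies."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class HashEmbedder:
    """Hypothetical offline default: hashes tokens into a small count vector.

    It carries no semantic meaning; it only lets tests run with no network
    access and no model download.
    """

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in text.lower().split():
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                # The first digest byte picks the bucket this token counts toward.
                vec[digest[0] % self.dim] += 1.0
            vectors.append(vec)
        return vectors


def index_chunks(
    chunks: Sequence[str], embedder: Embedder
) -> list[tuple[str, list[float]]]:
    """Depends only on the protocol, so a hosted backend can replace the default."""
    return list(zip(chunks, embedder.embed(chunks)))


pairs = index_chunks(["tenant isolation", "hybrid retrieval"], HashEmbedder())
```

Because the dependency is structural, a class wrapping a hosted embedding API would satisfy Embedder without inheriting from it, and CI can keep using the offline stand-in.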

  • Python
  • FastAPI
  • LangGraph
  • Postgres · pgvector
  • Hybrid RAG · RRF
  • Pydantic v2
  • Next.js · React
  • Row-Level Security
  • GitHub Actions
Architecture
POST /ingest  ·  PDF · text · md · image / scanned PDF
        │
        ▼
 ingestion: load → OCR (none|tesseract|vision) → structure-aware
   chunk (~512 tok, ~15% overlap) → embed → index
   · per-tenant dedupe: UNIQUE (tenant_id, sha256)
        │
        ▼
 ┌───────────────────────────────────────────────┐
 │ Postgres + pgvector (Supabase)                 │
 │  · dense (cosine)  +  lexical (FTS / BM25)     │
 │  · Row-Level Security, GUC app.current_tenant  │
 │  · NOBYPASSRLS role  ·  NULL tenant → 0 rows   │
 └───────────────────────┬───────────────────────┘
        │  POST /query · /query/stream (SSE)
        ▼
 retrieval: dense ∪ lexical → RRF (k=60) → rerank?
        │
        ▼
 agents (LangGraph): Planner → Retriever
        → Synthesizer → Verifier ⟲ self-correct
        │   refuses ungrounded answers
        ▼
 answer + [n] citations → document_id · page · span
        │                       │
        ▼                       ▼
 GET /traces (every step a span)   eval harness:
 in-proc recorder | Langfuse       faithfulness ·
                                   citation validity ·
                                   abstention  → CI floor
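
The fusion step in the diagram (dense ∪ lexical → RRF, k=60) can be sketched as standard Reciprocal Rank Fusion: each chunk scores the sum of 1 / (k + rank) over every ranking it appears in. The function name, chunk ids, and tie-breaking rule below are illustrative assumptions, not the project's code.

```python
from collections import defaultdict
from typing import Iterable, Sequence


def rrf_fuse(
    rankings: Iterable[Sequence[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several best-first rankings with Reciprocal Rank Fusion.

    Only rank positions are used, so dense cosine scores and lexical
    scores never need to be put on a common scale.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["c1", "c2", "c3"]    # hypothetical pgvector results, best first
lexical = ["c3", "c1", "c4"]  # hypothetical full-text results, best first
fused = rrf_fuse([dense, lexical])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c2', 'c4']
```

Chunks found by both retrievers (c1, c3) outrank chunks found by only one, which is the behavior that makes rank fusion a reasonable default before an optional reranker.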
02 Live
  • ~3 min provision time
  • 3 CI workflows · 1 review gate
  • $6/mo always-on cost
  • 100% IaC coverage

terraform-homelab

The operating half of the story: proof I run what I build, not just call an API. This is the repo that builds and ships this page. It provisions a hardened Ubuntu VPS on Vultr, points a Cloudflare-managed domain at it, and serves the site over HTTPS via Caddy. Remote state lives in Cloudflare R2. The repo combines modular Terraform, cloud-init bootstrapping, a real security baseline (UFW + fail2ban + sshd hardening), and a one-shot redeploy path for the static site that doesn't rebuild the VM. Deployment runs through a GitHub Actions GitOps pipeline: PRs get a sticky terraform plan comment, merges pause at a production review gate, and the approved plan is applied byte-for-byte. State is snapshotted to backups/ before every apply.

  • Terraform
  • GitHub Actions
  • cloud-init
  • Vultr
  • Cloudflare DNS
  • Cloudflare R2
  • Caddy
  • UFW
  • fail2ban
Architecture
git push → PR
        │
        ▼
 GitHub Actions: pr-check
  · fmt · validate · tflint · tfsec · plan
  · sticky plan comment on the PR
        │  merge to main
        ▼
 GitHub Actions: deploy
  · validate → plan → tfplan artifact
  · pause at `production` review gate ⏸
  · snapshot tfstate → backups/<ts>.tfstate
  · terraform apply (saved plan)
        │
        ▼
 ┌──────────────┬──────────────┬──────────────┐
 │  Vultr API   │ Cloudflare   │ Cloudflare R2│
 │  (compute,   │ DNS API      │ (state +     │
 │   SSH key)   │ (A record)   │  backups/)   │
 └──────┬───────┴──────────────┴──────────────┘
        │  cloud-init user_data
        ▼
 ┌──────────────────────────────┐
 │ Ubuntu VPS                   │
 │  · non-root user, key-only   │
 │  · sshd hardening drop-in    │
 │  · UFW (22/80/443)           │
 │  · fail2ban                  │
 │  · Caddy (auto Let's Encrypt)│
 └──────────────┬───────────────┘
                │ HTTPS :443
                ▼
            user browser
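
The ordering in the deploy job above (plan to a file, pause for review, snapshot state, apply the saved plan) can be sketched as a pure function that only builds the command sequence. The terraform subcommands are real; using terraform state pull for the snapshot, the timestamp format, and the function name are assumptions, since the page does not say how the pipeline takes its snapshot.

```python
from datetime import datetime, timezone
from typing import Optional

Step = tuple[list[str], Optional[str]]  # (argv, file that captures stdout)


def deploy_steps(now: datetime, plan_file: str = "tfplan") -> list[Step]:
    """Return the deploy commands in order, without running anything."""
    ts = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")  # assumed format
    return [
        (["terraform", "validate"], None),
        # Save the plan so the reviewed plan is exactly what gets applied.
        (["terraform", "plan", f"-out={plan_file}"], None),
        # (The production review gate would pause the workflow here.)
        # Assumption: snapshot by redirecting `terraform state pull` to a file.
        (["terraform", "state", "pull"], f"backups/{ts}.tfstate"),
        # Applying a saved plan file re-plans nothing and asks for no approval.
        (["terraform", "apply", plan_file], None),
    ]


for argv, stdout_path in deploy_steps(datetime.now(timezone.utc)):
    suffix = f" > {stdout_path}" if stdout_path else ""
    print(" ".join(argv) + suffix)
```

Keeping the sequence as data makes the ordering guarantee (snapshot strictly before apply) easy to assert in a unit test.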
03 Running 24/7
  • 4 alert streams
  • 24/7 uptime since deploy
  • 2 channels (push · pull)

monitoring-platform

A production-shaped observability stack for the game servers I host. Prometheus + Grafana + a custom Python exporter reading iptables and socket counters, all in Docker Compose. Two channels by design — Discord for events that need a human, Prometheus for trends. Pairs with halo-ce-command-center, which handles the active defense + game-telemetry layer; this stack visualizes what the command center sees.

  • Python
  • Prometheus
  • Grafana
  • Docker Compose
  • node-exporter
  • Discord webhook
  • Linux
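
A minimal sketch of the kind of work a custom exporter for socket counters does: parse kernel counters and render them in the Prometheus text exposition format. Reading /proc/net/sockstat, the metric name socket_count, and the label names are assumptions for illustration; the real exporter's sources and metric names are not shown on this page. The sketch uses only the standard library and leaves out the HTTP server.

```python
SAMPLE_SOCKSTAT = """\
sockets: used 285
TCP: inuse 12 orphan 0 tw 3 alloc 15 mem 2
UDP: inuse 4 mem 1
"""


def parse_sockstat(text: str) -> dict[tuple[str, str], int]:
    """Parse 'PROTO: name value name value ...' lines into a flat mapping."""
    counters: dict[tuple[str, str], int] = {}
    for line in text.splitlines():
        proto, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a 'PROTO: ...' line
        fields = rest.split()
        # Fields alternate between counter name and integer value.
        for name, value in zip(fields[0::2], fields[1::2]):
            counters[(proto.strip(), name)] = int(value)
    return counters


def render_metrics(
    counters: dict[tuple[str, str], int], metric: str = "socket_count"
) -> str:
    """Render the counters as one gauge family in Prometheus text format."""
    lines = [
        f"# HELP {metric} Socket counters parsed from /proc/net/sockstat.",
        f"# TYPE {metric} gauge",
    ]
    for (proto, name), value in sorted(counters.items()):
        lines.append(f'{metric}{{proto="{proto}",field="{name}"}} {value}')
    return "\n".join(lines) + "\n"


print(render_metrics(parse_sockstat(SAMPLE_SOCKSTAT)))
```

On a live host the input would come from reading the file instead of the sample string, and the rendered text would be served on the endpoint Prometheus scrapes.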
04 Running 24/7
  • 5–10 Gbps defense ceiling, single VPS
  • 4,631 reputation CIDRs (FireHOL + Spamhaus)
  • ~153 MB RAM at idle
  • 5 containers, 1 VPS

halo-ce-command-center

A self-hosted operations toolkit for the Halo Custom Edition dedicated servers I host. Layered DDoS defense combines sysctl rate limits, iptables, ipset reputation feeds from FireHOL and Spamhaus, and automatic bans of attacker /24 subnets on PPS spikes. Discord notifications cover player joins and leaves, with country flags and VPN detection. In-game stats are tracked via SAPP Lua hooks (K/D/A and captures per IP), with /stats, /top, and /rank commands available both in-game and as Discord slash commands. Prometheus + Grafana provide metrics. All five containers are wired together in one Docker Compose stack on a single VPS.

  • Python
  • Lua (SAPP)
  • Docker Compose
  • iptables · ipset
  • Prometheus
  • Grafana
  • Discord webhook
  • SQLite
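
The "auto-ban attacker /24 subnets on PPS spikes" step can be sketched as follows: group observed source addresses by their /24 network and flag any network whose packet rate over the sampling window exceeds a threshold. The function name, the threshold value, and the IPv4-only assumption are illustrative; the actual detection logic and the ipset call that would enforce the ban are not shown on this page.

```python
import ipaddress
from collections import Counter
from typing import Iterable


def subnets_over_threshold(
    packet_sources: Iterable[str], window_seconds: float, pps_threshold: float
) -> list[str]:
    """Return /24 networks whose packets-per-second exceed the threshold.

    packet_sources holds one IPv4 source address per observed packet.
    """
    per_subnet: Counter = Counter()
    for source in packet_sources:
        # strict=False masks the host bits: 203.0.113.57 -> 203.0.113.0/24
        network = ipaddress.ip_network(f"{source}/24", strict=False)
        per_subnet[str(network)] += 1
    return sorted(
        subnet
        for subnet, packets in per_subnet.items()
        if packets / window_seconds > pps_threshold
    )


# Two hosts in the same /24 each stay under 1000 pps but exceed it together.
window = ["203.0.113.5"] * 600 + ["203.0.113.77"] * 600 + ["198.51.100.9"] * 10
print(subnets_over_threshold(window, window_seconds=1.0, pps_threshold=1000))
# ['203.0.113.0/24']
```

Aggregating per /24 instead of per address is what catches floods that are spread across many hosts in one subnet.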
05 Shipped · v0.1.0
  • 5 lifecycle commands
  • 2 cloud providers

infra-automator

A single Click-based CLI (infra up | harden | deploy | status | destroy) that owns the full lifecycle of a small cloud footprint. Terraform provisions on Vultr or DigitalOcean through a shared output contract, so the Python layer contains no provider-specific code. Ansible handles hardening (SSH key-only, UFW, fail2ban, unattended-upgrades), with a self-contained Bash fallback that reaches the same end state. Docker Compose stacks sync to every node over SSH; teardown is one command. CI runs on every push: lint, type-check, unit tests, and Terraform fmt/validate per provider stack.

  • Python
  • Click
  • Terraform
  • Ansible
  • Docker Compose
  • GitHub Actions
  • Vultr
  • DigitalOcean
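
A minimal sketch of the shared output contract idea: if every provider stack exposes the same Terraform output names, the Python layer can parse terraform output -json into one typed object without knowing which provider produced it. The output names public_ip and ssh_user and the NodeOutputs class are hypothetical; the {"value": ...} wrapper is the shape terraform output -json emits for each output.

```python
import json
from dataclasses import dataclass

REQUIRED_OUTPUTS = ("public_ip", "ssh_user")  # hypothetical contract


@dataclass(frozen=True)
class NodeOutputs:
    """Provider-neutral view of what a stack must expose."""

    public_ip: str
    ssh_user: str


def parse_outputs(terraform_output_json: str) -> NodeOutputs:
    """Validate and unwrap the JSON printed by `terraform output -json`."""
    raw = json.loads(terraform_output_json)
    missing = [name for name in REQUIRED_OUTPUTS if name not in raw]
    if missing:
        raise KeyError(f"stack violates output contract, missing: {missing}")
    return NodeOutputs(
        public_ip=raw["public_ip"]["value"],
        ssh_user=raw["ssh_user"]["value"],
    )


# Hypothetical output from either provider stack; only the values differ.
sample = json.dumps(
    {
        "public_ip": {"sensitive": False, "type": "string", "value": "203.0.113.10"},
        "ssh_user": {"sensitive": False, "type": "string", "value": "deploy"},
    }
)
print(parse_outputs(sample))
```

Failing loudly on a missing output keeps provider drift out of the harden and deploy steps, which only ever see NodeOutputs.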
06 Shipped · v0.1.0
  • ~90 s cluster rebuild time
  • 20 MiB container image
  • 2 nginx replica pods
  • 8 interview-defensible insights

k3s-homelab

A second deployment of this site: same static files, a completely different shape. A single-node k3s cluster on a Vultr VPS runs two nginx pods behind the bundled traefik ingress, with TLS from cert-manager + Let's Encrypt production, all packaged as a Helm chart. The container image is built on the VPS and imported straight into k3s's containerd via docker save | k3s ctr images import, with no external registry. It is a companion to terraform-homelab, so the two deployment styles sit side by side as references. Live demo available on request: the cluster is destroyed between sessions to avoid idle cost and rebuilds in ~90 seconds.

  • Kubernetes
  • k3s
  • Helm
  • Docker
  • cert-manager
  • Let's Encrypt
  • traefik
  • Terraform
02 / About

I'm a self-taught engineer who builds production LLM systems — retrieval, agents, structured extraction — and the infrastructure they run on. The part I care about most is whether the thing actually works outside the demo: knowing when an answer is grounded versus guessed, catching the regressions before users do, watching how the whole thing degrades when something upstream breaks.

The infra work is the other half. I provision, harden, and operate real services in code — VPS, DNS, TLS, monitoring — and try to write down what I chose, what I weighed, and the parts I'd do differently next time. Most of what's on this site I run on my own boxes.

27, bilingual (EN/ES), US/Mexican dual citizen, based in California on Pacific Time. I work remote, async-first. I move quickly when something clicks and I'd rather ship a small thing that works than plan a big thing that doesn't.

03 / Stack

AI / LLM Systems

  • LangGraph (multi-agent)
  • Hybrid RAG · RRF
  • pgvector + BM25
  • Cross-encoder rerank
  • Pydantic extraction
  • RAGAS-shaped evals
  • LLM-as-judge
  • Langfuse / tracing
  • Ollama (local-first)

Backends & Infrastructure

  • Python · FastAPI
  • Next.js · TypeScript
  • Postgres / Supabase
  • Terraform
  • Ansible · cloud-init
  • Docker · Compose
  • Cloudflare · Vultr
  • Caddy · Let's Encrypt

Observability & Security

  • Prometheus
  • Grafana
  • Postgres RLS
  • UFW · fail2ban
  • Linux · systemd
  • GitHub Actions (CI)
  • pytest · ruff · mypy
04 / Contact

Let's talk.

California · Pacific Time · Open to remote, anywhere