Projects

Production systems I've designed and built end-to-end — backend platforms, AI infrastructure, and distributed systems.

Problem: Transaction scoring latency was too high for real-time decisions, and model drift went undetected.

Approach: Built a streaming scoring API with LightGBM, added drift monitoring using the Population Stability Index (PSI), and scheduled retraining whenever drift was detected.

Result: Scoring p95 latency under 12 ms. Drift was detected two weeks before accuracy would have measurably degraded.

ML pipeline, real-time scoring, drift monitoring, cost optimization

Python, LightGBM, FastAPI, scikit-learn, Docker
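
The drift monitoring described above relies on PSI, which compares the binned distribution of a feature in live traffic against its training baseline. A minimal sketch, assuming equal-width bins over the baseline range; the bin count, the `eps` floor for empty bins, and the 0.25 retraining threshold (a widely cited rule of thumb) are assumptions, not the project's actual settings.

```python
import math
from typing import List, Sequence


def _bin_proportions(values: Sequence[float], lo: float, width: float,
                     bins: int, eps: float) -> List[float]:
    """Share of values per equal-width bin, floored at eps to avoid log(0)."""
    counts = [0] * bins
    for v in values:
        # Clamp values outside the baseline range into the first or last bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline
    e = _bin_proportions(expected, lo, width, bins, eps)
    a = _bin_proportions(actual, lo, width, bins, eps)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Assumed threshold: PSI above 0.25 is a common "significant shift" rule of thumb.
RETRAIN_THRESHOLD = 0.25


def should_retrain(score: float, threshold: float = RETRAIN_THRESHOLD) -> bool:
    return score > threshold
```

In a scheduled job, `expected` would be a feature's training distribution and `actual` a recent window of scored traffic; quantile-based bins are a common alternative to equal-width ones.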

AI Workflow Automation Platform

Problem: Workflow steps were tightly coupled, so a single failure cascaded into lost jobs with no recovery path.

Approach: Introduced a state-machine model per workflow run, persisted to Postgres, with BullMQ workers pulling from a durable queue. Idempotent step handlers make retries safe.

Result: Job failure rate dropped from ~4% to under 0.1%, and recovery from worker crashes became automatic.

event-driven, distributed workers, multi-tenant, state machine

Node.js, PostgreSQL, Prisma, BullMQ, OpenAI
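
The recovery behaviour described above rests on two properties: the run's state is persisted, so a retry resumes rather than restarts, and each step handler is idempotent, so re-running a step whose side effect already happened (for example, a crash after the effect but before the step was marked done) is harmless. A minimal Python sketch of the idea, although the project itself uses Node.js and BullMQ; the state names, the `execute` loop, and the `charge_card` handler are hypothetical, and an in-memory set stands in for the Postgres-persisted state.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict


class RunState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


# Legal transitions; FAILED -> RUNNING is the retry edge.
TRANSITIONS = {
    RunState.PENDING: {RunState.RUNNING},
    RunState.RUNNING: {RunState.SUCCEEDED, RunState.FAILED},
    RunState.FAILED: {RunState.RUNNING},
    RunState.SUCCEEDED: set(),
}

Effects = Dict[tuple, str]           # side effects keyed by (run_id, step)
Handler = Callable[[str, Effects], None]


@dataclass
class WorkflowRun:
    run_id: str
    steps: list[str]
    state: RunState = RunState.PENDING
    # Completed steps; the real system would persist these (e.g. in Postgres).
    completed: set[str] = field(default_factory=set)

    def transition(self, new_state: RunState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state


def execute(run: WorkflowRun, handlers: Dict[str, Handler], effects: Effects) -> None:
    """Run or resume a workflow; safe to call again after a failure."""
    run.transition(RunState.RUNNING)
    try:
        for step in run.steps:
            if step in run.completed:
                continue  # finished on an earlier attempt
            handlers[step](run.run_id, effects)
            run.completed.add(step)
    except Exception:
        run.transition(RunState.FAILED)
        raise
    run.transition(RunState.SUCCEEDED)


def charge_card(run_id: str, effects: Effects) -> None:
    # Idempotent: keyed by (run_id, step), so a re-run cannot charge twice.
    effects.setdefault((run_id, "charge_card"), "charged")
```

If a later step fails, the run moves to `FAILED`; calling `execute` again resumes from the first incomplete step.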

SaaS Website Builder Infrastructure

Problem: Site builds were synchronous and blocking, and concurrent publishes caused database contention.

Approach: Moved builds to asynchronous workers with S3 artifact storage and a CDN invalidation step. Added per-tenant build queues to prevent noisy-neighbour problems.

Result: Median build time fell from 8 s to 1.4 s, and p99 dropped from 40 s to under 6 s.

versioned rendering, build workers, storage pipeline, multi-tenant

Node.js, PostgreSQL, S3, CDN, Docker
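
One way per-tenant build queues can prevent noisy neighbours is round-robin dispatch: workers take one job from each tenant in turn, so a tenant publishing many sites at once cannot starve the others. A minimal in-process Python sketch; the round-robin policy and the `TenantBuildQueues` class are assumptions, since the description above doesn't say how workers choose between tenant queues.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple


class TenantBuildQueues:
    """Round-robin dispatch across per-tenant FIFO queues."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[Any]] = {}
        # Tenants with pending jobs, in the order they will next be served.
        self._rotation: Deque[str] = deque()

    def enqueue(self, tenant: str, job: Any) -> None:
        queue = self._queues.setdefault(tenant, deque())
        if not queue:
            self._rotation.append(tenant)  # tenant becomes eligible for a turn
        queue.append(job)

    def next_job(self) -> Optional[Tuple[str, Any]]:
        """Take one job from the tenant whose turn it is, then move that tenant to the back."""
        if not self._rotation:
            return None
        tenant = self._rotation.popleft()
        queue = self._queues[tenant]
        job = queue.popleft()
        if queue:
            self._rotation.append(tenant)
        return tenant, job
```

With tenant A submitting three builds and tenant B one, B's build is dispatched second rather than fourth.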

AI Customer Support Platform

Problem: LLM responses cited the wrong sources, and hallucinated product details caused support escalations.

Approach: Built a RAG pipeline with citation enforcement: each answer must include a retrieved source chunk. Added a self-critique pass that flags low-confidence answers for human review.

Result: Hallucination rate (as measured by an automated fact-check) fell by 68%, and the escalation rate fell by 31%.

RAG pipeline, vector search, ingestion pipeline, conversation orchestration

Python, PostgreSQL, Pinecone, OpenAI, FastAPI
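
The citation-enforcement rule described above (every answer must include a retrieved source chunk) can be checked mechanically before an answer reaches the customer. A minimal sketch, assuming the model is prompted to cite chunks inline as `[chunk-id]`; the citation format and the `enforce_citations` helper are hypothetical, and the self-critique pass is not shown.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical citation format: the model is prompted to cite chunks as [chunk-id].
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


@dataclass
class Verdict:
    accepted: bool
    reason: str


def enforce_citations(answer: str, retrieved_ids: set[str]) -> Verdict:
    """Accept an answer only if it cites at least one chunk and every cited chunk was retrieved."""
    cited = set(CITATION.findall(answer))
    if not cited:
        return Verdict(False, "no citation; route to human review")
    unknown = cited - retrieved_ids
    if unknown:
        return Verdict(False, f"cites chunks that were not retrieved: {sorted(unknown)}")
    return Verdict(True, "ok")
```

Rejecting citations to chunks that were never retrieved catches the "cited the wrong source" failure, not just missing citations.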

Real-Time Data Processing Pipeline

High-throughput streaming pipeline for ingesting, transforming, and routing millions of events per second.

stream processing, exactly-once delivery, schema registry, backpressure control

Go, Kafka, ClickHouse, Kubernetes, gRPC

Distributed Rate Limiter Service

Token-bucket and sliding-window rate limiting as a standalone service, supporting multi-region consistency.

token bucket, sliding window, multi-region sync, sidecar-ready

Go, Redis, gRPC, Prometheus, Docker
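
The two algorithms named above bound different things: a token bucket permits bursts up to its capacity and then a steady refill rate, while a sliding-window log caps the number of requests in any trailing window. A minimal single-process Python sketch of each (the service itself is written in Go); it leaves out the shared state and multi-region synchronisation a standalone service needs, and the injectable `clock` parameter exists only to make the sketch testable.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable

Clock = Callable[[], float]


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Clock = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Add the tokens earned since the last call, without exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class SlidingWindowLog:
    """Allows at most `limit` requests in any trailing window of `window_sec` seconds."""

    def __init__(self, limit: int, window_sec: float, clock: Clock = time.monotonic) -> None:
        self.limit = limit
        self.window_sec = window_sec
        self.clock = clock
        self.timestamps: deque[float] = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Evict requests that have fallen out of the trailing window.
        while self.timestamps and self.timestamps[0] <= now - self.window_sec:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

The log is exact but stores one timestamp per allowed request; the token bucket needs only two numbers per key, which is one reason services often offer both.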

Schema Evolution in Long-Lived Systems

Research implementation of eight real-world schema evolution scenarios across a three-service microservices architecture, covering PostgreSQL migrations, REST API versioning, and event schema evolution with backward compatibility patterns.

backward compatibility, API versioning, event schema evolution, database migrations

Python, PostgreSQL, FastAPI, Kafka, Docker
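
One common backward-compatibility pattern for event schema evolution is upcasting: consumers convert older event versions to the latest shape on read, one version step at a time, so producers and stored events never have to be rewritten in lockstep. A minimal sketch; the order-event fields, the version numbers, and the USD default are hypothetical and not taken from the eight scenarios the study covers.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]


def _v1_to_v2(e: Event) -> Event:
    # Hypothetical v2 added "currency"; assume v1 producers only ever sent USD.
    return {**e, "schema_version": 2, "currency": e.get("currency", "USD")}


def _v2_to_v3(e: Event) -> Event:
    # Hypothetical v3 replaced "amount" (float dollars) with "amount_cents" (int).
    out = {k: v for k, v in e.items() if k != "amount"}
    out["amount_cents"] = round(e["amount"] * 100)
    out["schema_version"] = 3
    return out


UPCASTERS: Dict[int, Callable[[Event], Event]] = {1: _v1_to_v2, 2: _v2_to_v3}
LATEST_VERSION = 3


def upcast(event: Event) -> Event:
    """Apply upcasters in sequence until the event reaches the latest schema version."""
    version = event.get("schema_version", 1)  # events without a version are treated as v1
    while version < LATEST_VERSION:
        event = UPCASTERS[version](event)
        version = event["schema_version"]
    return event
```

Chaining single-step upcasters means each new version adds one function rather than a converter from every older version.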

Change Data Capture Pipeline

Change data capture (CDC) pipeline that streams database changes into an event log and supports downstream consumers, replay, and schema evolution. Includes a demo consumer that builds projections.

CDC, event sourcing, stream processing, schema evolution

Python, PostgreSQL, Kafka, Docker
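
The replay and projection behaviour described above can be sketched with an offset-tracking consumer: each change event carries its log position, the projection remembers the last offset it applied, and replaying the log from the start rebuilds the read model. A minimal in-memory sketch; the event shape (borrowing Debezium's `c`/`u`/`d` operation codes) and the `Projection` class are assumptions, and a real consumer would read from Kafka and persist `last_offset` alongside the projection.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    offset: int                      # position in the event log
    op: str                          # "c" create, "u" update, "d" delete (Debezium-style codes)
    key: str                         # primary key of the changed row
    after: Optional[Dict[str, Any]]  # row image after the change; None for deletes


@dataclass
class Projection:
    """Read model: current row state keyed by primary key."""
    rows: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    last_offset: int = -1  # highest offset applied so far

    def apply(self, event: ChangeEvent) -> None:
        if event.offset <= self.last_offset:
            return  # already applied, so at-least-once redelivery is harmless
        if event.op in ("c", "u"):
            self.rows[event.key] = dict(event.after or {})
        elif event.op == "d":
            self.rows.pop(event.key, None)
        else:
            raise ValueError(f"unknown op {event.op!r}")
        self.last_offset = event.offset


def replay(log: Iterable[ChangeEvent], start: Projection | None = None) -> Projection:
    """Rebuild a projection from the log (from scratch unless `start` is given)."""
    projection = start if start is not None else Projection()
    for event in log:
        projection.apply(event)
    return projection
```

Because offsets only increase, the same log replayed from zero always yields the same projection, which is what makes replay a safe way to rebuild or backfill consumers.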

Retrieval Experiment Platform

A tool for testing and evaluating RAG retrieval pipelines by comparing chunking strategies, embedding models, and reranking methods using metrics like Precision@K and nDCG.

retrieval evaluation, chunking strategies, embedding comparison, reranking

Python, RAG, Embeddings, NLP
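
The two metrics named above can be computed directly from a ranked list of chunk IDs and relevance judgements. A minimal sketch using linear gains for nDCG (some evaluations use exponential gains, 2^rel - 1, instead); the function names and the ID-based inputs are assumptions about how the tool represents results.

```python
import math
from typing import Dict, Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved items that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def dcg(gains: Sequence[float]) -> float:
    """Discounted cumulative gain: the gain at 0-based rank i is divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked: Sequence[str], relevance: Dict[str, float], k: int) -> float:
    """DCG of the actual top-k ranking divided by the DCG of the ideal ranking."""
    actual = dcg([relevance.get(doc, 0.0) for doc in ranked[:k]])
    ideal = dcg(sorted(relevance.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0
```

Precision@K ignores order within the top K, while nDCG rewards placing the most relevant chunks first, so comparing chunking or reranking strategies usually needs both.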

Designing Idempotent APIs at Scale

Production-quality research system comparing six idempotency strategies for a payments API domain, built with FastAPI, PostgreSQL, Redis, and RabbitMQ.

idempotency patterns, distributed systems, saga pattern, outbox pattern

Python, FastAPI, PostgreSQL, Redis, RabbitMQ, Docker
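
The six strategies aren't enumerated here, but the most familiar one, an idempotency key with a stored response, looks roughly like this. A minimal sketch, assuming the client sends a unique key per logical payment; the `IdempotencyStore` class, the SHA-256 payload fingerprint, and the choice of HTTP 422 for a key reused with a different body are assumptions rather than the project's implementation.

```python
import hashlib
import json
import threading
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]


@dataclass(frozen=True)
class StoredResponse:
    fingerprint: str
    response: Response


class IdempotencyStore:
    """Replays the first response for a repeated idempotency key.

    In-memory stand-in for a Redis key or a PostgreSQL table with a unique key column.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stored: Dict[str, StoredResponse] = {}

    @staticmethod
    def fingerprint(payload: Dict[str, Any]) -> str:
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def handle(self, key: str, payload: Dict[str, Any],
               process: Callable[[Dict[str, Any]], Response]) -> Response:
        fp = self.fingerprint(payload)
        # One global lock keeps the sketch simple but serialises all requests;
        # a real store would lock per key or rely on a unique constraint.
        with self._lock:
            stored = self._stored.get(key)
            if stored is not None:
                if stored.fingerprint != fp:
                    return 422, {"error": "idempotency key reused with a different payload"}
                return stored.response  # replay: the payment is not processed again
            response = process(payload)
            self._stored[key] = StoredResponse(fp, response)
            return response
```

A client retry after a timeout returns the original charge instead of creating a second one; a failed `process` call stores nothing, so the retry is processed normally.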

Hallucination Mitigation in Enterprise LLM Apps

Production-grade research system for evaluating, benchmarking, and mitigating hallucinations in enterprise LLM applications with multiple RAG variants and guardrail frameworks.

RAG pipeline, guardrails, citation enforcement, self-critique

Python, RAG, LLM, NLP, pytest

Config Service

Centralized configuration service used across internal systems for managing application settings and feature flags.

configuration management, internal tooling

TypeScript, Node.js

Failure Recovery Patterns in Microservices

Platform simulating microservice failures to evaluate retries, circuit breakers, bulkheads, and idempotency. Measures reliability, latency, and duplicate prevention to guide resilient system design.

circuit breakers, retry patterns, bulkhead isolation, outbox pattern

Python, FastAPI, PostgreSQL, Redis, Docker, Prometheus, Grafana
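
Of the patterns the platform evaluates, the circuit breaker is the one with explicit state: after enough consecutive failures it stops calling the dependency and fails fast, then lets a trial call through once a timeout has passed. A minimal single-threaded sketch; the failure threshold, reset timeout, and state names are assumptions, and the injectable `clock` exists only for testing.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = "half_open"  # timeout elapsed: allow a trial call
        try:
            result = fn()
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self) -> None:
        self.failures += 1
        # A failed trial call, or too many consecutive failures, (re)opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _record_success(self) -> None:
        self.failures = 0
        self.state = "closed"
```

Failing fast while open is what keeps retries from piling load onto a dependency that is already down; pairing it with bounded retries and bulkheads is the usual combination.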

IaC Maintainability Study

Comprehensive empirical study examining how structural design decisions in Terraform infrastructure-as-code affect long-term maintainability, drift susceptibility, and change management complexity.

infrastructure as code, drift detection, maintainability metrics, reference architectures

Terraform, HCL, AWS, Python

LLM Gateway — AI Infrastructure

Production-ready unified API gateway for routing requests across multiple LLM providers with built-in rate limiting, response caching, cost tracking, and OpenTelemetry observability.

API gateway, model routing, rate limiting, cost tracking

Python, FastAPI, Redis, PostgreSQL, Docker, OpenTelemetry
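
The routing, caching, and cost-tracking features listed above compose into a single request path: check a response cache, try providers in preference order with fallback, and charge token usage to whichever provider answered. A minimal sketch; the `Provider` callable interface, the per-1,000-token prices, and the in-memory dicts (standing in for Redis and PostgreSQL) are assumptions, and rate limiting and OpenTelemetry tracing are left out.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical provider interface: takes a request, returns (completion text, tokens used).
Provider = Callable[[Dict[str, Any]], Tuple[str, int]]


@dataclass
class Route:
    name: str
    call: Provider
    usd_per_1k_tokens: float  # hypothetical flat price per 1,000 tokens


@dataclass
class Gateway:
    routes: List[Route]  # tried in order; later routes are fallbacks
    cache: Dict[str, str] = field(default_factory=dict)
    spend_usd: Dict[str, float] = field(default_factory=dict)

    @staticmethod
    def cache_key(request: Dict[str, Any]) -> str:
        return hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()

    def complete(self, request: Dict[str, Any]) -> str:
        key = self.cache_key(request)
        if key in self.cache:
            return self.cache[key]  # cache hit: no provider call and no cost
        last_error: Optional[Exception] = None
        for route in self.routes:
            try:
                text, tokens = route.call(request)
            except Exception as exc:
                last_error = exc
                continue  # fall back to the next provider
            cost = tokens / 1000 * route.usd_per_1k_tokens
            self.spend_usd[route.name] = self.spend_usd.get(route.name, 0.0) + cost
            self.cache[key] = text
            return text
        raise RuntimeError("all providers failed") from last_error
```

Exact-match caching like this is only safe for deterministic requests (for example, temperature 0); that restriction is a design assumption of the sketch.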