Ahmed Waregh

Writing

Technical papers and deep-dives on systems I've built and problems I've solved in production — distributed systems, AI infrastructure, and platform engineering.

Hierarchical Chunking Strategies for Production RAG Systems: Balancing Retrieval Precision and Context Coherence

2024

Internal Technical Report

Retrieval-Augmented Generation systems degrade in precision as knowledge bases grow. This paper examines chunking strategies — fixed-size, paragraph-level, and hierarchical parent–child — across corpora of varying size and domain density. We introduce a re-ranking layer using cross-encoder models and show it recovers precision lost at scale while remaining compatible with standard vector-search backends. Benchmarks are run against a golden dataset of 2,400 support queries across four enterprise tenants.

RAG · LLM · Vector Search · Information Retrieval
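As a rough illustration of the hierarchical parent–child idea, the Python sketch below matches a query against small child chunks and then returns the enclosing parent chunks, with a re-ranking hook at the end. It uses no external libraries and makes several assumptions. `overlap_score` is a hypothetical toy lexical score that stands in for both embedding similarity and a cross-encoder. The `rerank` parameter marks where a real cross-encoder would plug in. `Chunk`, `build_hierarchy`, and `retrieve` are illustrative names and are not taken from the report.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Chunk:
    chunk_id: str
    text: str
    parent_id: Optional[str] = None  # set only on child chunks


def build_hierarchy(doc_id: str, text: str, child_words: int = 50) -> Tuple[List[Chunk], List[Chunk]]:
    """Paragraphs become parent chunks; each parent is cut into fixed-size word windows (children)."""
    parents: List[Chunk] = []
    children: List[Chunk] = []
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    for i, para in enumerate(paragraphs):
        parent = Chunk(f"{doc_id}/p{i}", para)
        parents.append(parent)
        words = para.split()
        for j in range(0, len(words), child_words):
            children.append(Chunk(f"{parent.chunk_id}/c{j // child_words}",
                                  " ".join(words[j:j + child_words]), parent.chunk_id))
    return parents, children


def overlap_score(query: str, text: str) -> float:
    """Toy lexical similarity (hypothetical); stands in for vector similarity or a cross-encoder."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / (len(q) or 1)


def retrieve(query: str, parents: List[Chunk], children: List[Chunk], k_children: int = 5,
             k_final: int = 2, rerank: Callable[[str, str], float] = overlap_score) -> List[Chunk]:
    """Match on small children for precision, expand to parents for context, then re-rank parents."""
    by_id: Dict[str, Chunk] = {p.chunk_id: p for p in parents}
    hits = sorted(children, key=lambda c: overlap_score(query, c.text), reverse=True)[:k_children]
    parent_ids = list(dict.fromkeys(c.parent_id for c in hits))  # dedupe, keep first-hit order
    candidates = [by_id[pid] for pid in parent_ids if pid is not None]
    return sorted(candidates, key=lambda p: rerank(query, p.text), reverse=True)[:k_final]
```

This design keeps matching precise, because the retriever scores short windows, while still giving the generator the full paragraph around each hit. Swapping `rerank` for a cross-encoder leaves the retrieval step and the vector backend unchanged.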

Multi-Tenant Event Sourcing at Scale: Schema Isolation, Replay Semantics, and Operational Lessons

2024

Internal Technical Report

Event sourcing in multi-tenant SaaS systems introduces tension between tenant isolation and operational simplicity. We describe our experience migrating a 30-tenant workflow platform from a shared event log to a namespace-isolated architecture, covering schema-per-tenant trade-offs, aggregate snapshot strategies to bound replay time, and the tooling required to safely replay tenant event streams without cross-tenant interference.

Event Sourcing · Multi-Tenant · Distributed Systems · CQRS
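The following is a minimal in-memory sketch of two ideas from this abstract, assuming a toy counter aggregate. First, each tenant gets its own namespace, meaning its own event log and its own snapshot map, so replaying one tenant never reads another tenant's data. Second, the store writes a snapshot every `snapshot_every` events, which bounds replay to at most `snapshot_every - 1` events per aggregate. `TenantEventStore`, `Event`, `Snapshot`, and the integer payload are hypothetical. A production system would use a durable store with schema- or namespace-per-tenant isolation rather than Python dicts.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    tenant_id: str
    aggregate_id: str
    version: int  # 1-based position in the aggregate's stream
    delta: int    # toy payload: amount added to a counter aggregate


@dataclass(frozen=True)
class Snapshot:
    version: int  # last event version folded into `state`
    state: int


class TenantEventStore:
    """In-memory stand-in for a namespace-isolated event store (hypothetical):
    every tenant gets its own log and its own snapshot map."""

    def __init__(self, snapshot_every: int = 100) -> None:
        self.snapshot_every = snapshot_every
        self._logs: Dict[str, Dict[str, List[Event]]] = {}
        self._snapshots: Dict[str, Dict[str, Snapshot]] = {}

    def append(self, event: Event) -> None:
        stream = self._logs.setdefault(event.tenant_id, {}).setdefault(event.aggregate_id, [])
        expected = len(stream) + 1
        if event.version != expected:  # optimistic concurrency check
            raise ValueError(f"expected version {expected}, got {event.version}")
        stream.append(event)
        if event.version % self.snapshot_every == 0:
            state, _ = self.replay(event.tenant_id, event.aggregate_id)
            self._snapshots.setdefault(event.tenant_id, {})[event.aggregate_id] = Snapshot(event.version, state)

    def replay(self, tenant_id: str, aggregate_id: str) -> Tuple[int, int]:
        """Rebuild state from the latest snapshot plus the events after it.
        Returns (state, events_replayed); only this tenant's namespace is read."""
        stream = self._logs.get(tenant_id, {}).get(aggregate_id, [])
        snap = self._snapshots.get(tenant_id, {}).get(aggregate_id)
        state, start = (snap.state, snap.version) if snap else (0, 0)
        tail = stream[start:]  # events with versions start+1 .. len(stream)
        for event in tail:
            state += event.delta
        return state, len(tail)
```

For example, with `snapshot_every=10` and 25 appended events, `replay` starts from the version-20 snapshot and folds in only the last 5 events.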

Exactly-Once Delivery in Heterogeneous Sink Pipelines: Lessons from a High-Throughput Kafka Consumer Fleet

2025

Internal Technical Report

Exactly-once semantics in streaming pipelines are well studied within a single system but become subtle when events must be durably committed to multiple heterogeneous sinks (analytics stores, billing aggregators, and alerting systems) in a single logical transaction. We detail the rebalance-listener pattern, idempotency key design, and per-sink commit protocols that eliminated duplicate charges across more than 40M events per day on a Kafka-backed pipeline.

Apache Kafka · Stream Processing · Exactly-Once Delivery · Distributed Systems
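The sketch below shows one common way to get effectively-once results on sinks that cannot join a Kafka transaction: at-least-once delivery combined with idempotent writes. Every record gets a deterministic idempotency key. Each sink stores the keys it has applied alongside its data. The consumer advances its offsets only after all sinks have accepted the batch. This is not the report's protocol, and several parts are assumptions. Kafka is simulated with plain records. The key scheme (topic/partition/offset) is hypothetical. The names `Record`, `DedupSink`, `idempotency_key`, and `process_batch` are illustrative. The rebalance listener is omitted. In the Java client, that flush-and-commit step would typically sit in `ConsumerRebalanceListener.onPartitionsRevoked`.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Record:
    topic: str
    partition: int
    offset: int
    customer: str
    amount_cents: int


def idempotency_key(rec: Record) -> str:
    """Deterministic key: a redelivered record (after a crash or rebalance) maps to the
    same key, so every sink can recognise it. Keying on topic/partition/offset is an
    assumption; a business event ID would also catch producer-side duplicates."""
    return hashlib.sha256(f"{rec.topic}:{rec.partition}:{rec.offset}".encode()).hexdigest()


class DedupSink:
    """Toy sink that records applied keys next to its data. In a real store the key
    insert and the data write would share one local transaction."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.totals: Dict[str, int] = {}
        self.applied: Set[str] = set()

    def write(self, key: str, rec: Record) -> bool:
        if key in self.applied:
            return False  # already committed in this sink: skip the duplicate
        self.totals[rec.customer] = self.totals.get(rec.customer, 0) + rec.amount_cents
        self.applied.add(key)
        return True


def process_batch(batch: List[Record], sinks: List[DedupSink],
                  committed: Dict[Tuple[str, int], int]) -> None:
    """Write each record to every sink first, then advance the consumer offsets.
    A crash between the two steps causes redelivery, which the sinks absorb."""
    for rec in batch:
        key = idempotency_key(rec)
        for sink in sinks:
            sink.write(key, rec)
    for rec in batch:
        tp = (rec.topic, rec.partition)
        # Kafka convention: the committed offset is the next offset to read.
        committed[tp] = max(committed.get(tp, 0), rec.offset + 1)
```

Processing the same batch twice, as happens after a crash before the offset commit, leaves each sink's totals unchanged on the second pass.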

Clock-Independent Rate Limiting: Eliminating Skew Drift in Distributed Token-Bucket Implementations

2025

Internal Technical Report

Token-bucket rate limiters that compute refill amounts from client-side timestamps accumulate systematic drift when hosts' clocks are skewed. This paper quantifies the drift under realistic NTP conditions and proposes using authoritative server-side timestamps (specifically, Redis server time read inside Lua scripts) to eliminate client clock dependence entirely. We compare bucket accuracy across five implementations under 50 ms and 200 ms of injected skew.

Rate Limiting · Distributed Systems · Redis · Algorithms
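The Python simulation below makes the drift mechanism concrete. It rests on simplifying assumptions: two hosts share one bucket, host A's clock runs a constant 200 ms ahead, requests alternate every 100 ms, and a naive refill overwrites `last_refill` with whatever clock the caller used. It is not one of the five implementations compared in the report. `Bucket`, `refill`, `take`, and `refill_ratio` are hypothetical names. The "server time" branch only models the effect of every host reading one authoritative clock; it does not reproduce the Redis Lua script itself.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    tokens: float
    last_refill: float  # timestamp (seconds) written by whichever clock the caller trusts


def refill(bucket: Bucket, now: float, rate: float, capacity: float) -> float:
    """Naive refill: credit rate * elapsed, then overwrite last_refill with `now`.
    Returns the number of tokens credited."""
    elapsed = max(0.0, now - bucket.last_refill)  # a clock that is behind sees no elapsed time
    before = bucket.tokens
    bucket.tokens = min(capacity, bucket.tokens + elapsed * rate)
    bucket.last_refill = now  # may move backwards when a slower clock writes it
    return bucket.tokens - before


def take(bucket: Bucket, now: float, rate: float, capacity: float, cost: float = 1.0) -> bool:
    """Refill, then spend `cost` tokens if available."""
    refill(bucket, now, rate, capacity)
    if bucket.tokens >= cost:
        bucket.tokens -= cost
        return True
    return False


def refill_ratio(n_requests: int, skew_a: float, use_server_time: bool,
                 rate: float = 10.0, interval: float = 0.1) -> float:
    """Hosts A (clock ahead by skew_a seconds) and B (accurate) alternate requests on one
    shared bucket. Returns tokens credited divided by rate * true elapsed time (1.0 = exact)."""
    bucket = Bucket(tokens=0.0, last_refill=0.0)
    credited = 0.0
    for k in range(n_requests):
        true_now = (k + 1) * interval
        skew = skew_a if k % 2 == 0 else 0.0
        # Server-time mode: every host reads one authoritative clock (e.g. Redis TIME inside
        # a Lua script), so skew between application hosts never enters the arithmetic.
        now = true_now if use_server_time else true_now + skew
        credited += refill(bucket, now, rate, capacity=float("inf"))
    return credited / (rate * n_requests * interval)
```

With these parameters, `refill_ratio(1000, 0.2, False)` comes out at about 1.5. Each request from A credits roughly 300 ms of refill for every 200 ms of real time, while B's requests credit nothing. The server-time variant stays at 1.0. The authoritative clock does not need to be accurate, only shared: if every refill reads the same clock, its offset from true time cancels out of `now - last_refill`.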