LLM Evals Workbench
A reproducible framework for prompt/model evaluation with scenario-driven scoring and regression tracking.
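Below is a minimal sketch of what scenario-driven scoring with regression tracking against a stored baseline could look like. The Scenario dataclass, score_scenarios, check_regressions, and the baseline.json path are illustrative assumptions, not the workbench's actual API.

    # Hypothetical sketch: scenario-driven scoring with baseline regression checks.
    # Names (Scenario, score_scenarios, baseline.json) are illustrative, not the real API.
    import json
    from dataclasses import dataclass
    from pathlib import Path
    from typing import Callable

    @dataclass
    class Scenario:
        name: str
        prompt: str
        expected_substring: str  # simplistic pass/fail criterion for the sketch

    def score_scenarios(scenarios: list[Scenario],
                        generate: Callable[[str], str]) -> dict[str, float]:
        """Run each scenario through the model and score it 1.0/0.0 on the criterion."""
        return {
            s.name: 1.0 if s.expected_substring in generate(s.prompt) else 0.0
            for s in scenarios
        }

    def check_regressions(scores: dict[str, float], baseline_path: Path,
                          tolerance: float = 0.0) -> list[str]:
        """Compare current scores to a stored baseline and return scenarios that regressed."""
        if not baseline_path.exists():
            baseline_path.write_text(json.dumps(scores, indent=2))  # first run seeds the baseline
            return []
        baseline = json.loads(baseline_path.read_text())
        return [
            name for name, score in scores.items()
            if score < baseline.get(name, 0.0) - tolerance
        ]

In practice the per-scenario score would likely be metric-based or model-graded rather than a substring check, and baselines would be keyed by prompt and model version, but the regression comparison works the same way.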
A retrieval pipeline toolkit for indexing, re-ranking, and latency profiling under production constraints.
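A rough sketch of the retrieve-then-rerank shape with per-stage latency profiling follows. The overlap-based scorers stand in for a real embedding index and cross-encoder; function names and the timing keys are assumptions for illustration.

    # Hypothetical sketch: two-stage retrieve-then-rerank with per-stage latency profiling.
    # The token-overlap scorers are stand-ins for a real index and re-ranker.
    import time

    def first_stage_retrieve(query: str, corpus: list[str], k: int = 20) -> list[str]:
        """Cheap candidate generation: rank documents by token overlap with the query."""
        q_tokens = set(query.lower().split())
        ranked = sorted(corpus,
                        key=lambda doc: len(q_tokens & set(doc.lower().split())),
                        reverse=True)
        return ranked[:k]

    def rerank(query: str, candidates: list[str], k: int = 5) -> list[str]:
        """Stand-in for a heavier re-ranker (e.g. a cross-encoder) applied only to candidates."""
        q_tokens = set(query.lower().split())
        def score(doc: str) -> float:
            d_tokens = set(doc.lower().split())
            return len(q_tokens & d_tokens) / (len(d_tokens) + 1)
        return sorted(candidates, key=score, reverse=True)[:k]

    def profiled_search(query: str, corpus: list[str]) -> tuple[list[str], dict[str, float]]:
        """Run both stages and record wall-clock latency per stage in milliseconds."""
        timings: dict[str, float] = {}
        t0 = time.perf_counter()
        candidates = first_stage_retrieve(query, corpus)
        timings["retrieve_ms"] = (time.perf_counter() - t0) * 1000
        t1 = time.perf_counter()
        results = rerank(query, candidates)
        timings["rerank_ms"] = (time.perf_counter() - t1) * 1000
        return results, timings

Keeping the expensive re-ranker restricted to the first stage's candidates is what makes the latency budget workable under production constraints; the per-stage timings make it easy to see which stage is blowing that budget.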
Validation jobs and drift alerts to keep online and offline features aligned.
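As a sketch of what such a validation job might check, the snippet below compares online and offline feature values for shared keys and computes a simple PSI-style drift statistic over binned distributions. The thresholds and function names are illustrative defaults, not the tool's real configuration.

    # Hypothetical sketch: online/offline feature consistency check with a simple drift alert.
    # Thresholds and the PSI-style statistic are illustrative defaults.
    import math

    def mismatch_rate(online: dict[str, float], offline: dict[str, float],
                      atol: float = 1e-6) -> float:
        """Fraction of shared keys whose online and offline feature values disagree."""
        shared = online.keys() & offline.keys()
        if not shared:
            return 0.0
        mismatches = sum(1 for k in shared if abs(online[k] - offline[k]) > atol)
        return mismatches / len(shared)

    def population_stability_index(expected: list[float], actual: list[float],
                                   eps: float = 1e-6) -> float:
        """PSI over pre-binned distributions; inputs are bucket proportions summing to 1."""
        return sum(
            (a - e) * math.log((a + eps) / (e + eps))
            for e, a in zip(expected, actual)
        )

    def should_alert(online: dict[str, float], offline: dict[str, float],
                     baseline_bins: list[float], current_bins: list[float],
                     mismatch_threshold: float = 0.01, psi_threshold: float = 0.2) -> bool:
        """Raise an alert if either the consistency check or the drift check exceeds its threshold."""
        return (
            mismatch_rate(online, offline) > mismatch_threshold
            or population_stability_index(baseline_bins, current_bins) > psi_threshold
        )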