Issue 01 · Pipeline Inspector

Overview

An LLM benchmark pipeline: generate clinical cases, build scoring rubrics, run subject models, and score their responses. Work through the four steps in order, then dig into the stats and analysis.


The pipeline (4 steps)

01 · Generate: Create clinical cases from stems using an LLM
02 · Rubric: Generate scoring rubrics for each case
03 · Run: Run subject models against each case
04 · Score: Score each response against the rubric
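
The four steps can be sketched as a chain of plain functions, each consuming the previous step's output. This is a minimal sketch, not the pipeline's actual code: the data classes, the prompt wording, the `LLM` callable type, the one-criterion-per-line rubric format, and the yes/no judging scheme are all hypothetical, chosen only to show how cases, rubrics, responses, and scores connect.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: any function that takes a prompt and returns model text.
LLM = Callable[[str], str]

@dataclass
class Case:
    case_id: int
    stem: str
    text: str

@dataclass
class Rubric:
    case_id: int
    criteria: list[str]

@dataclass
class Response:
    case_id: int
    model: str
    text: str

@dataclass
class Score:
    case_id: int
    model: str
    value: float  # fraction of rubric criteria met, 0.0 to 1.0

def generate(stems: list[str], llm: LLM) -> list[Case]:
    """Step 01: expand each stem into a full clinical case."""
    return [Case(i, s, llm(f"Write a clinical case from this stem: {s}"))
            for i, s in enumerate(stems)]

def build_rubrics(cases: list[Case], llm: LLM) -> dict[int, Rubric]:
    """Step 02: one rubric per case; assumes one criterion per output line."""
    rubrics = {}
    for c in cases:
        raw = llm(f"List scoring criteria, one per line, for: {c.text}")
        criteria = [line.strip() for line in raw.splitlines() if line.strip()]
        rubrics[c.case_id] = Rubric(c.case_id, criteria)
    return rubrics

def run(models: dict[str, LLM], cases: list[Case]) -> list[Response]:
    """Step 03: every subject model answers every case."""
    return [Response(c.case_id, name, model(c.text))
            for name, model in models.items() for c in cases]

def score(responses: list[Response], rubrics: dict[int, Rubric], judge: LLM) -> list[Score]:
    """Step 04: ask a judge yes/no per criterion; score = fraction met."""
    scores = []
    for r in responses:
        criteria = rubrics[r.case_id].criteria
        met = sum(
            judge(f"Does this response meet the criterion '{crit}'? "
                  f"Answer yes or no.\n{r.text}").strip().lower().startswith("yes")
            for crit in criteria
        )
        value = met / len(criteria) if criteria else 0.0
        scores.append(Score(r.case_id, r.model, value))
    return scores

# Usage with stub callables standing in for real LLM clients.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("List"):
        return "Mentions diagnosis\nOrders labs"
    return "case text"

cases = generate(["chest pain, 54M"], fake_llm)
rubrics = build_rubrics(cases, fake_llm)
responses = run({"model-a": lambda p: "The likely diagnosis is MI."}, cases)
scores = score(responses, rubrics,
               judge=lambda p: "yes" if "Mentions diagnosis" in p else "no")
```

Passing every model as a plain callable keeps each step testable with stubs, as in the usage above; swapping in a real client only means wrapping it in a function with the same prompt-in, text-out shape.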

Departments

Analysis & reference

Stats

Token usage, cost, and runtime per step

Analysis

Score distributions and model comparisons

Experiments

Slideshow of experiment findings

Tutorials

Quick reference for every installed skill