Issue 01 · Pipeline Inspector

Overview

An LLM benchmark pipeline: generate cases, build rubrics, run models, score responses. Walk the four steps in order, then dig into stats and analysis.

Cases: -

Pipeline progress: - / 4

Up next: Generate

The pipeline

4 steps

  1. Generate

    Create clinical cases from stems using an LLM

    Pending
  2. Rubric

    Generate scoring rubrics for each case

    Pending
  3. Run

    Run subject models against each case

    Pending
  4. Score

    Score each response against the rubric

    Pending
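The four steps above can be sketched as a chain of plain functions. This is a minimal illustration, not the inspector's actual implementation: the `Case` fields, the function names, the prompt strings, and the scoring rule (fraction of rubric criteria met) are all assumptions made for the example, and the LLMs are passed in as ordinary callables so the sketch runs without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical type alias: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class Case:
    """One benchmark case as it moves through the pipeline (illustrative schema)."""
    case_id: str
    stem: str
    text: str
    rubric: list[str] = field(default_factory=list)
    responses: dict[str, str] = field(default_factory=dict)
    scores: dict[str, float] = field(default_factory=dict)


def generate(stems: list[str], llm: LLM) -> list[Case]:
    """Step 1: create one clinical case per stem."""
    return [
        Case(case_id=f"case-{i:03d}", stem=stem,
             text=llm(f"Write a clinical case for: {stem}"))
        for i, stem in enumerate(stems, start=1)
    ]


def build_rubrics(cases: list[Case], llm: LLM) -> list[Case]:
    """Step 2: attach a rubric (one criterion per line) to each case."""
    for case in cases:
        raw = llm(f"List scoring criteria, one per line, for: {case.text}")
        case.rubric = [line.strip() for line in raw.splitlines() if line.strip()]
    return cases


def run(cases: list[Case], subjects: dict[str, LLM]) -> list[Case]:
    """Step 3: collect each subject model's response to each case."""
    for case in cases:
        for name, model in subjects.items():
            case.responses[name] = model(case.text)
    return cases


def score(cases: list[Case], judge: Callable[[str, str], bool]) -> list[Case]:
    """Step 4: score = fraction of rubric criteria the judge marks as met (assumed rule)."""
    for case in cases:
        for name, response in case.responses.items():
            met = sum(judge(criterion, response) for criterion in case.rubric)
            case.scores[name] = met / len(case.rubric) if case.rubric else 0.0
    return cases
```

Each step takes the output of the previous one, so a full run is `score(run(build_rubrics(generate(stems, llm), llm), subjects), judge)`. Passing the models and the judge in as arguments keeps the steps independently testable with stubs.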

Departments

Analysis & reference