Changelog — Fyntune

v1.4.2 2026-04-14

Multi-model comparison and eval batch API

NEW Eval batch API — run the same criteria set across multiple models in one request. Returns side-by-side delta scores.
NEW Dashboard comparison view for multi-model evals: overlaid score timelines per model.
IMPROVED Eval run latency reduced by 18% through optimizations to the eval worker pool.
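Side-by-side delta scores can be read as per-criterion differences between each model and a reference model. The minimal Python sketch below assumes a hypothetical result shape (model name mapped to criterion scores) and a caller-chosen baseline model. It is not the Fyntune batch API or its response format.

```python
from typing import Dict

# Hypothetical shape: model name -> {criterion name -> score}
Scores = Dict[str, Dict[str, float]]


def side_by_side_deltas(scores: Scores, baseline: str) -> Dict[str, Dict[str, float]]:
    """Return per-criterion deltas of every model relative to `baseline`.

    Only criteria scored for both the baseline and the compared model are included.
    """
    base = scores[baseline]
    deltas: Dict[str, Dict[str, float]] = {}
    for model, crit_scores in scores.items():
        if model == baseline:
            continue  # the baseline has no delta against itself
        deltas[model] = {
            crit: round(score - base[crit], 4)  # positive = better than baseline
            for crit, score in crit_scores.items()
            if crit in base
        }
    return deltas


if __name__ == "__main__":
    results = {
        "model-a": {"factuality": 0.82, "coherence": 0.90},
        "model-b": {"factuality": 0.78, "coherence": 0.93},
    }
    print(side_by_side_deltas(results, baseline="model-a"))
```

The model names and criterion scores in the usage block are illustrative only.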

v1.4.0 2026-02-19

LLM-as-judge calibration tooling

NEW Calibration wizard: upload 20+ human-labeled samples to calibrate your LLM-as-judge criteria against human judgment.
NEW Per-criterion calibration score in the dashboard, showing each criterion's agreement rate with human labels.
FIX Fixed a race condition in concurrent eval runs that could produce stale delta scores for high-frequency deploys.
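One simple reading of "agreement rate with human labels" is the fraction of samples where the LLM judge's label exactly matches the human label. The sketch below assumes that definition and a hypothetical `agreement_rate` helper. Fyntune's calibration wizard may compute agreement differently (for example with a chance-corrected statistic such as Cohen's kappa).

```python
from typing import Sequence

MIN_SAMPLES = 20  # mirrors the "20+ human-labeled samples" requirement


def agreement_rate(judge_labels: Sequence[str], human_labels: Sequence[str]) -> float:
    """Fraction of samples where the LLM judge's label equals the human label."""
    if len(judge_labels) != len(human_labels):
        raise ValueError("judge and human label lists must be the same length")
    if len(human_labels) < MIN_SAMPLES:
        raise ValueError(f"need at least {MIN_SAMPLES} human-labeled samples")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(human_labels)


if __name__ == "__main__":
    human = ["pass"] * 20
    judge = ["pass"] * 17 + ["fail"] * 3
    print(agreement_rate(judge, human))
```

Exact-match agreement is easy to interpret but does not correct for agreement by chance, which matters when one label dominates the sample.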

v1.3.5 2025-12-08

Vercel integration and overage alerts

NEW Official Vercel deploy hook integration. Eval runs trigger automatically on Vercel preview and production deployments.
NEW Eval run usage alerts: email notification at 80% and 100% of the monthly eval run limit, with optional Slack alerts.
IMPROVED TypeScript SDK: added full type coverage for eval result objects and criteria config.
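Threshold alerts like the 80% and 100% notifications are usually made to fire once per threshold by checking which thresholds were crossed between two usage readings. The sketch below assumes that approach and a hypothetical `newly_crossed` function. It is not Fyntune's alerting implementation, and it uses integer percentages to avoid floating-point edge cases at the boundary.

```python
from typing import List

THRESHOLDS_PCT = (80, 100)  # alert at 80% and 100% of the monthly limit


def newly_crossed(prev_used: int, now_used: int, monthly_limit: int) -> List[int]:
    """Return the percentage thresholds crossed between two usage readings.

    A threshold counts as crossed when the previous reading was below it and the
    current reading is at or above it, so each alert fires at most once per period.
    """
    if monthly_limit <= 0:
        raise ValueError("monthly_limit must be positive")
    return [
        pct
        for pct in THRESHOLDS_PCT
        # compare prev_used/limit < pct/100 <= now_used/limit using integers only
        if prev_used * 100 < pct * monthly_limit <= now_used * 100
    ]


if __name__ == "__main__":
    print(newly_crossed(790, 805, 1000))
    print(newly_crossed(0, 1200, 1000))
```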

v1.3.0 2025-09-22

Custom YAML criteria and guardrail compliance eval

NEW Custom criteria in fyntune.yaml: define rule-based or LLM-as-judge criteria alongside the default suite.
NEW Guardrail compliance eval type: samples inputs from the production input distribution, with a configurable sample size.
IMPROVED GitLab CI integration plugin updated to support GitLab 17.x pipeline syntax.
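The changelog does not say how production inputs are sampled. One standard way to draw a fixed-size uniform sample from a stream of unknown length is reservoir sampling (Algorithm R), sketched below as a swapped-in technique, not Fyntune's method. The `reservoir_sample` name is hypothetical, and `k` plays the role of the configurable sample size.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample up to k items from a stream of unknown length (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # seedable for reproducible eval runs
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over [0, i], inclusive
            if j < k:
                sample[j] = item  # replace with probability k / (i + 1)
    return sample


if __name__ == "__main__":
    production_inputs = (f"request-{n}" for n in range(10_000))
    print(reservoir_sample(production_inputs, k=5, seed=42))
```

Reservoir sampling needs only one pass and O(k) memory, which suits high-volume input logs where the total count is not known in advance.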

v1.2.0 2025-06-15

GitHub Actions integration and Slack alerts

NEW fyntune-ai/eval-action@v2 GitHub Actions integration. Blocks PR merges on regression and posts an inline PR comment showing delta scores.
NEW Slack webhook integration: regression alerts with criterion deltas and direct link to eval run.
FIX Python SDK: resolved @track decorator incompatibility with async LLM call functions.
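A common cause of decorator incompatibility with async functions is a synchronous wrapper that returns the coroutine object without awaiting it, so timing and logging happen before the call actually runs. The sketch below shows the general pattern of a decorator that handles both cases. The `track` function and `TRACES` list here are hypothetical stand-ins, not the Fyntune SDK's `@track`.

```python
import asyncio
import functools
import inspect
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink standing in for wherever a real SDK sends traces.
TRACES: List[Dict[str, Any]] = []


def track(func: Callable) -> Callable:
    """Record call duration for both sync and async functions."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await func(*args, **kwargs)  # await so timing covers the real call
            finally:
                TRACES.append({"name": func.__name__, "seconds": time.perf_counter() - start})
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            TRACES.append({"name": func.__name__, "seconds": time.perf_counter() - start})
    return sync_wrapper


@track
def call_sync(prompt: str) -> str:
    return prompt.upper()


@track
async def call_async(prompt: str) -> str:
    await asyncio.sleep(0)  # stands in for an awaited LLM API call
    return prompt[::-1]


if __name__ == "__main__":
    print(call_sync("hi"))
    print(asyncio.run(call_async("hi")))
    print([t["name"] for t in TRACES])
```

Returning an `async def` wrapper for coroutine functions also keeps `inspect.iscoroutinefunction` true for the decorated function, which frameworks often check.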

v1.1.0 2025-03-04

Prompt version tracking and delta dashboard

NEW Prompt version tagging via CLI: the fyntune prompt tag command stores the diff and links eval results to specific prompt file versions.
NEW Dashboard delta view: side-by-side eval scores for any two prompt versions with per-criterion breakdown.
IMPROVED Factuality eval: ground truth comparison now supports multi-document context.
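The changelog does not describe how `fyntune prompt tag` computes or stores its diff. For illustration only, a unified diff between two prompt file versions can be produced with Python's standard `difflib`. The `prompt_diff` helper and the tag names below are hypothetical.

```python
import difflib
from typing import List


def prompt_diff(old: str, new: str, old_tag: str, new_tag: str) -> List[str]:
    """Unified diff between two prompt file versions, labeled with their tags."""
    return list(
        difflib.unified_diff(
            old.splitlines(keepends=True),  # keep newlines so output lines are complete
            new.splitlines(keepends=True),
            fromfile=old_tag,
            tofile=new_tag,
        )
    )


if __name__ == "__main__":
    v1 = "You are a helpful assistant.\nAnswer briefly.\n"
    v2 = "You are a helpful assistant.\nAnswer in detail.\n"
    print("".join(prompt_diff(v1, v2, "prompt@v1", "prompt@v2")))
```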

v1.0.0 2024-11-18

Initial release — Fyntune eval platform

NEW Python SDK with @track decorator for OpenAI, Anthropic, and any REST LLM API.
NEW Default eval suite: 42 criteria covering semantic similarity, factuality, coherence, and tone consistency.
NEW Web dashboard with eval run history, per-criterion scores, and quality timeline charts.
NEW Starter (free) and Team ($149/mo) pricing tiers.
