Confidence thresholds

Confidence is the dial that turns AI from cautious to aggressive. Two floors and a fallback chain.

By Christopher · Updated May 14, 2026 · 3 min read


Every AI reply and every AI label carries a confidence score from 0 to 100. The thresholds you set decide what happens at each level.

This is the most important AI knob in Ochre. It is also the most misunderstood.

What confidence is

Confidence is the AI's own estimate of how likely its answer is to be correct. It is calibrated against your past tickets: when the AI has said "90% confident" on similar messages, how often was it actually right?

It is not a probability in the strict sense, but it correlates well enough with correctness that you can set thresholds you trust.
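To make "calibrated" concrete, here is a minimal sketch of the kind of check calibration rests on. It assumes you have a list of past (confidence, was_correct) pairs. It illustrates calibration in general and is not Ochre's internal method. The function name and the sample data are hypothetical.

```python
from collections import defaultdict

def reliability_table(history, bucket_size=10):
    """Group (confidence, was_correct) pairs into confidence buckets
    and return the share of correct answers in each bucket."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for confidence, was_correct in history:
        # With bucket_size=10: buckets 0-9, 10-19, ..., 90-100 (100 folds into the top bucket).
        bucket = min(int(confidence) // bucket_size * bucket_size, 100 - bucket_size)
        totals[bucket] += 1
        correct[bucket] += int(was_correct)
    return {b: correct[b] / totals[b] for b in sorted(totals)}

# Hypothetical history: four replies scored in the 90s, four in the 50s.
history = [(92, True), (95, True), (91, False), (93, True),
           (55, True), (58, False), (52, False), (57, True)]
print(reliability_table(history))  # {50: 0.5, 90: 0.75}
```

In this made-up sample the 90s bucket is right only 75% of the time, so those scores are overconfident. A well-calibrated bucket lands close to its own range.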

Two floors that matter

Two thresholds drive behavior:

Auto-send floor. Below this, replies do not send themselves. They drop to draft. Default 85%.
Silence floor. Below this, the AI does not surface a reply at all. The ticket goes to a human untouched. Default 50%.

Between the two, the AI drafts but does not send. The agent reviews and either edits or discards.

The cascade

For a workspace in auto-send mode with default thresholds:

Confidence ≥ 85: AI sends.
Confidence 50 to 84: AI drafts, agent reviews.
Confidence < 50: AI is silent, ticket goes to a human.

For a workspace in draft mode:

Confidence ≥ 50: AI drafts.
Confidence < 50: AI is silent.

For suggest mode:

Confidence ≥ 50: AI shows a suggestion next to the composer.
Confidence < 50: agent sees the conversation with no AI hint.

See Auto-send vs draft vs suggest for the modes themselves.
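The cascade reduces to a small decision function. This is a sketch under assumptions: the function name `decide`, the mode strings, and the return values are hypothetical labels, not Ochre settings or API. The default arguments mirror the default floors (85 and 50).

```python
def decide(confidence, mode, auto_send_floor=85, silence_floor=50):
    """Map a reply's confidence score (0-100) to what the AI does,
    following the cascade for each mode."""
    if confidence < silence_floor:
        # Below the silence floor the AI produces nothing in any mode.
        return "silent"
    if mode == "auto-send":
        # Only auto-send mode uses the auto-send floor.
        return "send" if confidence >= auto_send_floor else "draft"
    if mode == "draft":
        return "draft"
    if mode == "suggest":
        return "suggest"
    raise ValueError(f"unknown mode: {mode!r}")

for score in (92, 70, 30):
    print(score, decide(score, "auto-send"), decide(score, "draft"), decide(score, "suggest"))
# 92 send draft suggest
# 70 draft draft suggest
# 30 silent silent silent
```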

Auto-labeling has its own floors

Auto-labeling thresholds are separate. Defaults:

Auto-label floor: 60% (below this, no label is applied).
Auto-route on label floor: 75% (below this, the label exists but does not drive routing).

See Auto-labeling: topic, priority, confidence.
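The two label floors stack. A label must clear the auto-label floor to be applied at all, and it must clear the higher auto-route floor before it affects routing. Here is a minimal sketch using the default values. The function name is hypothetical.

```python
def label_decision(confidence, label_floor=60, route_floor=75):
    """Return (apply_label, drive_routing) for a predicted label."""
    apply_label = confidence >= label_floor
    # Routing needs the higher floor, and only applied labels can route.
    drive_routing = apply_label and confidence >= route_floor
    return apply_label, drive_routing

print(label_decision(80))  # (True, True)
print(label_decision(65))  # (True, False)
print(label_decision(40))  # (False, False)
```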

How to set the floors

Open AI → Drafting. Three sliders:

AI reply: auto-send floor.
AI reply: silence floor.
Labels: auto-route floor (also editable on AI → Auto-labeling).

Most teams should:

Start the auto-send floor at 90%.
Lower it to 80% after two weeks of clean receipts.
Never lower the silence floor below 40% on production traffic.
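You can write these rules of thumb as a pre-flight check and run it on a proposed change before you move the sliders. This helper is hypothetical and is not part of Ochre. The 40% production minimum comes from the guidance above. Treating a silence floor at or above the auto-send floor as a mistake is an assumption.

```python
def check_floors(auto_send_floor, silence_floor, production=True):
    """Return warnings for a proposed pair of reply floors (0-100)."""
    warnings = []
    for name, value in (("auto-send", auto_send_floor), ("silence", silence_floor)):
        if not 0 <= value <= 100:
            warnings.append(f"{name} floor {value} is outside 0-100")
    if silence_floor >= auto_send_floor:
        # Assumption: an empty draft zone in auto-send mode is likely unintended.
        warnings.append("silence floor is not below auto-send floor; auto-send mode has no draft zone")
    if production and silence_floor < 40:
        warnings.append("silence floor below 40% on production traffic")
    return warnings

print(check_floors(90, 50))  # []
print(check_floors(80, 35))  # ['silence floor below 40% on production traffic']
```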

Confidence calibration

Confidence is calibrated as you accumulate receipts. Scores on the first hundred AI replies are less reliable than scores on the thousandth. Treat the first week as a calibration period.

If confidence scores consistently fail to match outcomes, use the Quality assurance review flow. Reviewers correct the AI there, and calibration improves.

Per-channel and per-topic floors

You can override floors per channel and per topic. Common overrides:

Chat widget: auto-send floor 80%, silence floor 40% (more permissive).
Email: auto-send floor 90%, silence floor 60% (more conservative).
Topic = "billing": auto-send floor 95% or send-disabled (very conservative).
Topic = "bug": send-disabled through Guardrails and bypass labels, regardless of confidence.
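Overrides layer on top of the workspace defaults. The article does not say which override wins when a channel override and a topic override both apply. This sketch assumes topic beats channel and channel beats workspace. The channel keys, the dict layout, and the use of `None` to mean "send-disabled" are hypothetical.

```python
from typing import Optional

# Hypothetical layout: None as an auto-send floor means "send-disabled".
WORKSPACE = {"auto_send": 85, "silence": 50}
CHANNEL_OVERRIDES = {
    "chat": {"auto_send": 80, "silence": 40},
    "email": {"auto_send": 90, "silence": 60},
}
TOPIC_OVERRIDES = {
    "billing": {"auto_send": 95},
    "bug": {"auto_send": None},
}

def effective_floors(channel: str, topic: Optional[str]) -> dict:
    """Apply overrides in assumed precedence order: workspace, then channel, then topic."""
    floors = dict(WORKSPACE)
    floors.update(CHANNEL_OVERRIDES.get(channel, {}))
    if topic is not None:
        floors.update(TOPIC_OVERRIDES.get(topic, {}))
    return floors

print(effective_floors("chat", "billing"))  # {'auto_send': 95, 'silence': 40}
print(effective_floors("email", "bug"))     # {'auto_send': None, 'silence': 60}
```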

What confidence does NOT mean

High confidence is not "no review needed". A 95% confident reply can still be wrong about a fact. Confidence is an estimate, not a guarantee.
Low confidence is not "the AI is broken". It often means the question is genuinely hard or ambiguous.
Confidence is not a quality score. A reply can be confidently bad. CSAT and QA reviews catch quality problems.

Watching the floors in production

In the receipts feed and the Spend page, three numbers tell you if floors are well set:

Auto-send rate. Replies sent automatically as a percent of total. 40 to 70% is healthy at default settings.
Silence rate. Replies suppressed as a percent of total. Should be under 10%.
Edit rate on auto-sent replies. Auto-sent replies that a human later rolled back or hand-edited, as a percent of auto-sent replies. Should be near zero. If it is above 1%, raise the auto-send floor.
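Here is a sketch of the same three checks computed from a list of receipts. The receipt fields (`outcome`, `edited_after`) are hypothetical, so map them to whatever data you pull from the receipts feed. The auto-send rate band only makes sense for a workspace in auto-send mode.

```python
def floor_health(receipts):
    """Compute the three floor-health numbers from a list of receipts.

    Each receipt is a dict with 'outcome' in {'sent', 'drafted', 'silent'};
    sent receipts may carry 'edited_after' (True if a human later rolled
    the reply back or edited it).
    """
    if not receipts:
        raise ValueError("need at least one receipt")
    sent = [r for r in receipts if r["outcome"] == "sent"]
    silent = sum(1 for r in receipts if r["outcome"] == "silent")
    edited = sum(1 for r in sent if r.get("edited_after", False))

    auto_send_rate = len(sent) / len(receipts)
    silence_rate = silent / len(receipts)
    edit_rate = edited / len(sent) if sent else 0.0

    flags = []
    if not 0.40 <= auto_send_rate <= 0.70:
        flags.append("auto-send rate outside 40-70%")
    if silence_rate >= 0.10:
        flags.append("silence rate at or above 10%")
    if edit_rate > 0.01:
        flags.append("edit rate above 1%: raise the auto-send floor")
    return auto_send_rate, silence_rate, edit_rate, flags

# Made-up sample of 100 receipts.
receipts = (
    [{"outcome": "sent", "edited_after": False}] * 55
    + [{"outcome": "sent", "edited_after": True}]
    + [{"outcome": "drafted"}] * 38
    + [{"outcome": "silent"}] * 6
)
print(floor_health(receipts)[3])  # ['edit rate above 1%: raise the auto-send floor']
```

In this sample the auto-send and silence rates look healthy (56% and 6%). However, 1 edit in 56 auto-sent replies is about 1.8%, which is over the 1% line.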

Recommended starting setup

Auto-send floor: 90%.
Silence floor: 50%.
Auto-label floor: 60%.
Auto-route on label floor: 75%.
Per-topic override: send-disabled on "billing" for the first month.
