POST /monce-haiku    → orchestrate, Haiku fallback if trust < 65
POST /monce-sonnet   → orchestrate, Sonnet fallback if trust < 65
POST /comprendre     → alias for /monce-haiku
POST /deterministic  → fan-out only, never calls LLM, always $0
POST /stream         → SSE: deterministic event first, then LLM event if needed
{
  "text": "Bonjour, je cherche du feuilleté 44.2...",
  "factory_id": 3,
  "anthropic": false  // default false — controls LLM fallback
}
With anthropic: false (the default), the orchestrator never calls the LLM, even when trust is low.
Set anthropic: true to allow the LLM fallback when aggregate trust drops below the threshold (65).
{
  "version": "0.1.0",
  "mode": "deterministic" | "haiku_enhanced" | "sonnet_enhanced",
  "llm_invoked": false,
  "llm_model": null,
  "trust_aggregate": 67,
  "trust_assessment": {
    "aggregate": 67,
    "needs_llm": false,
    "reason": "Aggregate trust 67 >= 65 — deterministic sufficient",
    "ok_services": [...],
    "high_trust": [...],
    "low_trust": [...],
    "failed": [...]
  },
  "trust_breakdown": {
    "emailclassifier": {"trust": 65, "status": "ok", "label": "Email", "latency_ms": 42},
    ...
  },
  "dominant": {
    "service": "businessclassifier",
    "label": "Dispatch",
    "trust": 100,
    "prediction": "demande_devis"
  },
  "consensus": {
    "emailclassifier": "Devis",
    "businessclassifier": "demande_devis",
    ...
  },
  "synthesis": null,      // null if deterministic, LLM text if enhanced
  "classifiers": { ... }, // full raw response per service
  "errors": { ... },      // errors for failed services
  "latency_ms": 287,
  "fanout_ms": 287,
  "llm_ms": null
}
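A caller can branch on `llm_invoked` and `dominant` to summarize a result. A minimal sketch, assuming the response shape documented above (the `summarize` helper itself is hypothetical, not part of the API):

```python
def summarize(resp: dict) -> str:
    """One-line summary of an orchestrator response (shape as documented above)."""
    dominant = resp["dominant"]
    if resp["llm_invoked"]:
        return f"{dominant['prediction']} (via {resp['llm_model']}, trust {resp['trust_aggregate']})"
    return f"{dominant['prediction']} (deterministic, trust {resp['trust_aggregate']})"
```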
The /stream endpoint returns Server-Sent Events for the landing page:
event: deterministic
data: { full deterministic response }
event: llm_start // only if needs_llm
data: {"model": "haiku", "reason": "..."}
event: llm_done // only if LLM succeeded
data: { full enhanced response }
event: complete // only if deterministic was sufficient
data: {"reason": "Aggregate trust 67 >= 65"}
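A client consuming /stream only needs a small SSE parser. A minimal sketch, assuming each message is one `event:` line followed by one `data:` line carrying JSON, as in the examples above (a production client would read the stream incrementally):

```python
import json

def parse_sse(raw: str) -> list[tuple[str, dict]]:
    """Parse a raw SSE payload into (event_name, data) pairs.

    Assumes the simple one-event:/one-data: message shape shown above;
    blank lines between messages are ignored.
    """
    events = []
    name = None
    for line in raw.splitlines():
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:") and name is not None:
            events.append((name, json.loads(line[len("data:"):].strip())))
            name = None  # reset until the next event: line
    return events
```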
All 10 downstream calls are made concurrently via asyncio.gather.
Every call sends anthropic: false. A slow or failing service does not block the others.
Timeout: 10s per service, 12s global.
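The fan-out pattern above can be sketched with asyncio: each call is wrapped in its own per-service timeout, and `gather(..., return_exceptions=True)` ensures one slow or failing service never blocks the rest. The `call_service` stub below is a hypothetical stand-in for the real HTTP POST (which would send anthropic: false):

```python
import asyncio

PER_SERVICE_TIMEOUT = 10.0  # seconds, per the limits above
GLOBAL_TIMEOUT = 12.0

async def call_service(name: str, delay: float) -> dict:
    """Stand-in for one downstream classifier call (hypothetical)."""
    await asyncio.sleep(delay)
    return {"service": name, "status": "ok"}

async def fan_out(services: dict[str, float]) -> dict:
    async def guarded(name: str, delay: float) -> dict:
        try:
            return await asyncio.wait_for(call_service(name, delay), PER_SERVICE_TIMEOUT)
        except asyncio.TimeoutError:
            return {"service": name, "status": "timeout"}

    # return_exceptions=True: one failing service never aborts the others
    results = await asyncio.wait_for(
        asyncio.gather(*(guarded(n, d) for n, d in services.items()),
                       return_exceptions=True),
        GLOBAL_TIMEOUT,
    )
    return {r["service"]: r["status"] for r in results if isinstance(r, dict)}
```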
Each classifier returns trust in different shapes. The orchestrator normalizes:
1. response.trust_score (int)
2. response.trust.score (nested dict)
3. response.trust (int)
4. Fallback: max(Probability) * 100