Accelerating Agent Cycles in Responses API via WebSockets

Friends, I want to share from the OpenAI ecosystem: the Responses API has added a WebSocket mode that speeds up agent cycles.
I reviewed the team's notes: they introduced a persistent connection and state cache to avoid rebuilding history on each request.
Key changes: cached rendered tokens; fewer network hops; faster safety classifiers; compatibility with response.create + previous_response_id.
Result: agent cycles improved by up to 40%; GPT‑5.3‑Codex‑Spark reached ~1,000 TPS (peaks up to 4,000 TPS). Alpha partners confirmed the gains.
Why it matters: model latency to users depends on reducing API overhead.
Which parts of your stack would you accelerate first?
#OpenAI #ResponsesAPI #WebSockets #LLM


Latest comments
No comments yet.