
Architecture Overview

Every Guava call involves two systems running simultaneously: Guava's hosted Dialog System and your Expert — a small, long-running service that connects to Guava's API and steers the conversation.

[Diagram: In Guava Cloud, the Dialog System (audio · STT · LLM · TTS) exchanges audio with the Caller and connects over a WebSocket to Your Expert (Python, TypeScript, ...), which runs either on Your Infrastructure (local or self-hosted) or on Guava Hosting (managed by Guava).]

The Dialog System

The Dialog System is Guava's managed service running in the cloud. It handles everything time-sensitive during a call: receiving the caller's audio, running speech-to-text, querying the language model, synthesizing the response, and streaming it back to the caller.
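The turn loop described above can be pictured as a chain of stages: caller audio in, synthesized audio out. The sketch below is purely conceptual (the real pipeline is Guava-managed, streaming, and not user code); every function here is a hypothetical stand-in.

```python
# Conceptual sketch of one conversational turn in the Dialog System.
# All functions are stand-ins -- the real stages are Guava-managed.

def speech_to_text(audio_chunk: bytes) -> str:
    """Stand-in STT stage: decode caller audio into a transcript."""
    return audio_chunk.decode("utf-8")

def language_model(transcript: str) -> str:
    """Stand-in LLM stage: produce the agent's textual reply."""
    return f"You said: {transcript}"

def text_to_speech(reply: str) -> bytes:
    """Stand-in TTS stage: synthesize audio to stream back."""
    return reply.encode("utf-8")

def handle_turn(audio_chunk: bytes) -> bytes:
    # One turn: audio in -> transcript -> reply -> audio out.
    return text_to_speech(language_model(speech_to_text(audio_chunk)))

print(handle_turn(b"hello"))  # b'You said: hello'
```

In production these stages run as a streaming pipeline rather than discrete function calls, which is what lets responses begin before the caller has even finished speaking.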

Because the entire pipeline runs as a single, fully integrated system rather than a chain of off-the-shelf APIs, the Dialog System delivers best-in-class latency and naturalness. Callers hear a response that feels like a real conversation, not a chatbot reading from a script.

Your Expert

Your Expert is the code you write. Using the Guava SDK (Python or TypeScript), it connects to the Dialog System over a persistent WebSocket and steers the agent in real time — setting its persona, sending mid-call instructions, responding to events like on_question or on_action, and issuing commands like transfer or hangup.
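The event-and-command shape of an Expert can be sketched locally. The `Expert` class below is a stand-in, not the actual Guava SDK: it only mirrors the pattern of registering handlers for events like `on_question` and returning steering instructions, with dispatch invoked directly instead of arriving over the WebSocket.

```python
# Local sketch of the Expert callback pattern. The event names mirror
# those mentioned above (on_question, on_action), but this Expert class
# is a hypothetical stand-in, not the Guava SDK.

from typing import Callable, Dict

class Expert:
    def __init__(self, persona: str):
        self.persona = persona
        self.handlers: Dict[str, Callable[[str], str]] = {}

    def on(self, event: str):
        """Register a handler function for a named call event."""
        def register(fn: Callable[[str], str]) -> Callable[[str], str]:
            self.handlers[event] = fn
            return fn
        return register

    def dispatch(self, event: str, payload: str) -> str:
        # In the real SDK, events arrive over the persistent WebSocket;
        # here we invoke the handler directly to show the control flow.
        return self.handlers[event](payload)

expert = Expert(persona="friendly support agent")

@expert.on("on_question")
def handle_question(text: str) -> str:
    # Steer the agent mid-call by returning an instruction.
    return f"Answer briefly: {text}"

print(expert.dispatch("on_question", "What are your hours?"))
```

The inversion here is the point: the Dialog System drives the call, and your code reacts to events and replies with instructions or commands.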

Because your Expert is just code, you can do anything: call your CRM, query a database, hit an external API, or chain into another specialized AI sub-agent. For the most common patterns — intent detection, document Q&A, vector search — Guava ships a helper library so you can get up and running fast without reinventing the wheel.

Your Expert is not in the latency-critical path. The Dialog System handles all real-time audio processing independently — your Expert can spend time on complex reasoning, external API calls, or chaining multiple AI models without the caller ever noticing a pause.

During development, your Expert runs on your local machine, and Guava routes calls to it directly. You can rapidly iterate by changing the code and restarting the process — no public web server or ngrok required.

Deployment

When it's time to move to production, you'll want your Expert deployed in a highly available configuration, running continuously and ready to handle calls at any time. Because Guava Experts only make outbound connections, it's easy to run an Expert behind a NAT or firewall.

We recommend running multiple instances of the same Expert. Guava round-robins new calls across connected Experts, giving you horizontal scaling and redundancy by default. If an Expert instance dies mid-call, Guava will attempt to hand the call off to another active instance — which means you should keep in-memory state to a minimum and design your Expert to be stateless where possible.
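One way to design for hand-off is to keep per-call state out of process memory entirely, keyed by call ID in shared storage. The `CallStateStore` below is an in-memory, hypothetical stand-in; in production you would back it with something all instances can reach (Redis, a database, etc.).

```python
# Sketch of externalizing per-call state so a different Expert instance
# can resume a handed-off call. CallStateStore is a hypothetical
# stand-in for a shared store such as Redis or a database.

import json

class CallStateStore:
    """Minimal key-value store keyed by call ID."""

    def __init__(self):
        self._data: dict = {}

    def save(self, call_id: str, state: dict) -> None:
        # Serialize, as real shared storage would require.
        self._data[call_id] = json.dumps(state)

    def load(self, call_id: str) -> dict:
        return json.loads(self._data.get(call_id, "{}"))

store = CallStateStore()

# Instance A records its progress mid-call...
store.save("call-123", {"step": "collecting_address"})

# ...so if the call is handed off, instance B resumes from the store
# rather than from anything held in instance A's memory.
resumed = store.load("call-123")
print(resumed["step"])  # collecting_address
```

With state externalized like this, any instance can serve any call, which is exactly what round-robin distribution and mid-call hand-off assume.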

You have two options for deploying your Expert:

  • Your Infrastructure — deploy to your own servers, VM, or serverless compute platform. You control the environment.
  • Guava Hosting — push your Expert with a single guava deploy command and Guava manages the rest.

See the Deployment guide for a full walkthrough of both options.

The Quickstart walks you through a complete working example in minutes. Once you're comfortable with the basics, the SDK Reference covers every callback and call command in detail. If you want to see a real-world use case before diving into reference docs, the example walkthroughs show full Expert implementations for common scenarios.

Questions? hi@goguava.ai