Architecture Overview
Every Guava call involves two systems running simultaneously: Guava's hosted Dialog System and your Expert — a small, long-running service that connects to Guava's API and steers the conversation.
The Dialog System
The Dialog System is Guava's managed service running in the cloud. It handles everything time-sensitive during a call: receiving the caller's audio, running speech-to-text, querying the language model, synthesizing the response, and streaming it back to the caller.
Because the entire pipeline runs as a fully integrated architecture rather than a chain of off-the-shelf APIs, the Dialog System delivers best-in-class latency and naturalness. Callers hear a response that feels like a real conversation, not a chatbot reading from a script.
Your Expert
Your Expert is the code you write. Using the Guava SDK (Python or TypeScript), it connects to the Dialog System over a persistent WebSocket and steers the agent in real time — setting its persona, sending mid-call instructions, responding to events like on_question or on_action, and issuing commands like transfer or hangup.
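To make the shape of an Expert concrete, here is a minimal sketch in Python. It does not use the real Guava SDK: FakeCall, its methods (set_persona, instruct, transfer, hangup), the on_start hook, and the callback signatures are all hypothetical stand-ins chosen for illustration. Only the event names on_question and on_action and the command names transfer and hangup come from the description above. See the SDK Reference for the actual API.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCall:
    """Hypothetical stand-in for an SDK call handle; it only records what is sent."""
    sent: list = field(default_factory=list)

    def set_persona(self, text: str) -> None:
        self.sent.append(("persona", text))

    def instruct(self, text: str) -> None:
        self.sent.append(("instruction", text))

    def transfer(self, number: str) -> None:
        self.sent.append(("transfer", number))

    def hangup(self) -> None:
        self.sent.append(("hangup", None))


class SupportExpert:
    """Steers one call by reacting to events (assumed callback signatures)."""

    def on_start(self, call: FakeCall) -> None:
        # Hypothetical hook: set the agent's persona when the call begins.
        call.set_persona("You are a friendly support agent.")

    def on_question(self, call: FakeCall, question: str) -> None:
        # Send a mid-call instruction, or transfer if the topic is out of scope.
        if "refund" in question.lower():
            call.instruct("Explain the refund policy and offer to start a refund.")
        else:
            call.transfer("+15550100")  # placeholder number

    def on_action(self, call: FakeCall, action: str) -> None:
        if action == "end_call":
            call.hangup()


# Drive the Expert by hand, the way the Dialog System would deliver events.
call = FakeCall()
expert = SupportExpert()
expert.on_start(call)
expert.on_question(call, "Can I get a refund?")
expert.on_action(call, "end_call")
```

After this runs, call.sent holds the persona, one instruction, and the hangup command, in that order. In a real Expert the SDK would deliver these events over the WebSocket instead of the hand-written calls at the bottom.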
Because your Expert is just code, you can do anything: call your CRM, query a database, hit an external API, or chain into another specialized AI sub-agent. For the most common patterns — intent detection, document Q&A, vector search — Guava ships a helper library so you can get up and running fast without reinventing the wheel.
Your Expert is not in the latency-critical path. The Dialog System handles all real-time audio processing independently — your Expert can spend time on complex reasoning, external API calls, or chaining multiple AI models without the caller ever noticing a pause.
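The effect of keeping the Expert off the latency-critical path can be illustrated with a small simulation. This is a sketch under stated assumptions, not Guava code: dialog_turns stands in for the Dialog System, slow_crm_lookup stands in for a slow backend call made by your Expert, and both names are hypothetical. In a real deployment the two run in separate systems; here they share one asyncio event loop purely for demonstration.

```python
import asyncio


async def slow_crm_lookup(caller_id: str) -> str:
    """Hypothetical slow backend call; the sleep stands in for network latency."""
    await asyncio.sleep(0.05)
    return f"Account for {caller_id}: premium tier"


async def dialog_turns(log: list) -> None:
    """Stand-in for the Dialog System, which keeps the conversation moving."""
    for i in range(3):
        log.append(f"dialog turn {i}")
        await asyncio.sleep(0.01)


async def main() -> list:
    log: list = []
    # Start the slow work in the background instead of awaiting it inline.
    lookup = asyncio.create_task(slow_crm_lookup("caller-42"))
    # The conversation continues while the lookup is still in flight.
    await dialog_turns(log)
    # Once the result arrives, the Expert can pass it on as an instruction.
    log.append(await lookup)
    return log


log = asyncio.run(main())
```

All three dialog turns are logged before the lookup result is appended, which mirrors the claim above: the conversation does not wait on the Expert's slower work.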
During development, your Expert runs on your local machine, and Guava routes calls to it directly. You can rapidly iterate by changing the code and restarting the process — no public web server or ngrok required.
Deployment
When it's time to move to production, you'll want your Expert deployed in a highly available configuration, running continuously and ready to handle calls at any time. Because Guava Experts only make outbound connections, it's easy to run an Expert behind a NAT or firewall.
We recommend running multiple instances of the same Expert. Guava round-robins new calls across connected Experts, giving you horizontal scaling and redundancy by default. If an Expert instance dies mid-call, Guava will attempt to hand the call off to another active instance — which means you should keep in-memory state to a minimum and design your Expert to be stateless where possible.
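The following sketch shows why a stateless design makes hand-off work. It is a local simulation under assumptions, not Guava's implementation: ExternalStore stands in for any shared store (a database or cache), ExpertInstance and handle_event are hypothetical names, and itertools.cycle is used only to imitate round-robin assignment of new calls.

```python
from itertools import cycle


class ExternalStore:
    """Hypothetical shared store; in production this would be a database or cache."""

    def __init__(self) -> None:
        self._data: dict = {}

    def load(self, call_id: str) -> dict:
        return dict(self._data.get(call_id, {}))

    def save(self, call_id: str, state: dict) -> None:
        self._data[call_id] = dict(state)


class ExpertInstance:
    """One Expert process. It keeps no per-call state in memory between events."""

    def __init__(self, name: str, store: ExternalStore) -> None:
        self.name = name
        self.store = store

    def handle_event(self, call_id: str, event: str) -> dict:
        state = self.store.load(call_id)  # fetch state on every event
        state["events"] = state.get("events", []) + [event]
        state["last_handled_by"] = self.name
        self.store.save(call_id, state)  # persist before returning
        return state


store = ExternalStore()
instances = [ExpertInstance("expert-a", store), ExpertInstance("expert-b", store)]

# Imitate round-robin assignment of new calls across connected instances.
assign = cycle(instances)
owner = {call_id: next(assign) for call_id in ["call-1", "call-2", "call-3"]}

# expert-a handles the first event of call-1, then is assumed to die mid-call.
owner["call-1"].handle_event("call-1", "caller gave order number")
# expert-b takes over and still sees the earlier event via the shared store.
state = instances[1].handle_event("call-1", "caller asked for a refund")
```

Because every event handler loads and saves state through the shared store, the second instance picks up call-1 with the full event history even though it never saw the first event.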
You have two options for deploying your Expert:
- Your Infrastructure: deploy to your own servers, VMs, or serverless compute platform. You control the environment.
- Guava Hosting: push your Expert with a single guava deploy command, and Guava manages the rest.
See the Deployment guide for a full walkthrough of both options.
What to read next
The Quickstart walks you through a complete working example in minutes. Once you're comfortable with the basics, the SDK Reference covers every callback and call command in detail. If you want to see a real-world use case before diving into reference docs, the example walkthroughs show full Expert implementations for common scenarios.
Questions? hi@goguava.ai
