
Vector Stores

Guava provides four ready-made VectorStore implementations that can be passed directly to DocumentQA as the store argument. Each wraps a popular vector database and handles embedding, indexing, and similarity search.

Python only: Vector store backends are currently available in Python; TypeScript equivalents are not yet available.

Installation

Install only the backend(s) you need:

pip install 'gridspace-guava[chromadb]'
pip install 'gridspace-guava[lancedb]'
pip install 'gridspace-guava[pgvector]'
pip install 'gridspace-guava[pinecone]'

Importing a backend class without the corresponding extra installed raises ImportError with an install hint.
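Because a missing extra surfaces as an ImportError at import time, an application that supports several backends can probe for them up front. The following is a minimal sketch of that pattern, not part of Guava: `load_store_class` is a hypothetical helper, and only the module and class names are taken from this page.

```python
import importlib


def load_store_class(module_path: str, class_name: str):
    """Return the backend class, or None when its extra is not installed."""
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        # The exception message is expected to carry the install hint.
        print(f"{class_name} unavailable: {exc}")
        return None
    return getattr(module, class_name)


# Probe for the ChromaDB backend without crashing if the extra is missing.
ChromaVectorStore = load_store_class("guava.helpers.chromadb", "ChromaVectorStore")
```

If the result is None, the caller can fall back to another backend or exit with the printed hint.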

ChromaVectorStore

from guava.helpers.chromadb import ChromaVectorStore

Parameter	Type	Required	Default	Description
`path`	`str \| None`	No	`"./chroma_data"`	Directory for persistent storage. Pass `None` for an in-memory ephemeral store.
`collection_name`	`str`	No	`"chunks"`	ChromaDB collection name.
`embedding_model`	`EmbeddingModel \| None`	No	`None`	External embedding model. When omitted, ChromaDB's built-in `all-MiniLM-L6-v2` model is used — no external API needed.

LanceDBStore

from guava.helpers.lancedb import LanceDBStore

Parameter	Type	Required	Default	Description
`path`	`str`	No	`"./lancedb_data"`	Local path or GCS URI (e.g. `"gs://bucket/lancedb"`) for storage.
`table_name`	`str`	No	`"chunks"`	LanceDB table name.
`embedding_model`	`EmbeddingModel`	Yes	—	Embedding model to use. Pass a configured instance such as `VertexAIEmbedding`.

Note: LanceDB silently drops tables that predate the current schema version. This triggers a full re-index the next time DocumentQA ingests documents.

PgVectorStore

from guava.helpers.pgvector import PgVectorStore

Parameter	Type	Required	Default	Description
`db_url`	`str`	Yes	—	PostgreSQL connection string (e.g. `"postgresql://user:pass@host/db"`).
`table_name`	`str`	No	`"guava_chunks"`	Table name for stored chunks.
`embedding_model`	`EmbeddingModel`	Yes	—	Embedding model to use. Pass a configured instance such as `VertexAIEmbedding`.

PgVectorStore creates the vector extension, chunks table, and HNSW cosine index automatically on first connect. If the connecting user lacks CREATE EXTENSION privileges, initialization will fail.
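To make the setup steps concrete, the sketch below generates SQL of the kind that such an initialization would run, using standard pgvector syntax. It is an illustration only: the column names, index name, and the 768 dimension are assumptions, and `pgvector_bootstrap_sql` is a hypothetical helper. The statements PgVectorStore actually issues may differ.

```python
from typing import List


def pgvector_bootstrap_sql(table_name: str = "guava_chunks", dim: int = 768) -> List[str]:
    """Build illustrative setup statements: extension, table, HNSW cosine index."""
    if not table_name.isidentifier():
        # Refuse anything that is not a plain identifier before interpolating it.
        raise ValueError(f"unsafe table name: {table_name!r}")
    return [
        # Requires CREATE EXTENSION privileges on the database.
        "CREATE EXTENSION IF NOT EXISTS vector",
        # Hypothetical schema: one row per chunk with its embedding.
        f"CREATE TABLE IF NOT EXISTS {table_name} ("
        f"id TEXT PRIMARY KEY, content TEXT NOT NULL, embedding vector({dim}))",
        # HNSW index using pgvector's cosine-distance operator class.
        f"CREATE INDEX IF NOT EXISTS {table_name}_embedding_hnsw "
        f"ON {table_name} USING hnsw (embedding vector_cosine_ops)",
    ]


for statement in pgvector_bootstrap_sql():
    print(statement + ";")
```

The first statement is the one that needs elevated privileges, so it is the likely point of failure for a restricted database user.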

Managed Postgres: Managed services (Cloud SQL, AlloyDB, RDS) are untested but expected to work since the implementation uses standard psycopg.

PineconeVectorStore

from guava.helpers.pinecone import PineconeVectorStore

Parameter	Type	Required	Default	Description
`api_key`	`str \| None`	No	env `PINECONE_API_KEY`	Pinecone API key. If omitted, reads from the environment.
`index_name`	`str`	No	`"guava-chunks"`	Pinecone index name. Created automatically if it does not exist.
`cloud`	`str`	No	`"aws"`	Serverless cloud provider for index creation. Ignored if the index already exists.
`region`	`str`	No	`"us-east-1"`	Serverless region for index creation. Ignored if the index already exists.
`embedding_model`	`EmbeddingModel \| None`	No	`PineconeInferenceEmbedding`	Defaults to `multilingual-e5-large` (1024-dim) via Pinecone's hosted Inference API.
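The `api_key` fallback described in the table can be checked before the store is constructed, so that a missing key fails with a clear message. This is a sketch of the documented behavior, not Guava's own resolution code; `resolve_pinecone_api_key` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_pinecone_api_key(
    explicit_key: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer an explicit key, then fall back to PINECONE_API_KEY."""
    env = os.environ if env is None else env
    key = explicit_key or env.get("PINECONE_API_KEY")
    if not key:
        raise RuntimeError(
            "No Pinecone API key: pass api_key or set PINECONE_API_KEY"
        )
    return key
```

The resolved value can then be passed as `api_key` to PineconeVectorStore.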

Cold start: Pinecone index creation can take 30–60 seconds on first use. Subsequent instantiations with the same index_name skip creation and connect immediately.

PineconeInferenceEmbedding

from guava.helpers.pinecone import PineconeInferenceEmbedding

Parameter	Type	Required	Default	Description
`pc`	`Pinecone`	Yes	—	A configured `Pinecone` client instance.
`model`	`str`	No	`"multilingual-e5-large"`	Pinecone inference model name.
`dimensionality`	`int`	No	`1024`	Output vector size.
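As an illustration of how these parameters fit together, the sketch below builds the embedding model explicitly and passes it to PineconeVectorStore instead of relying on the default. It uses only the parameters listed in the tables on this page. The wrapper function `build_pinecone_store` is hypothetical, and the values shown simply restate the documented defaults.

```python
def build_pinecone_store(api_key: str, index_name: str = "guava-chunks"):
    """Build a PineconeVectorStore with an explicitly configured embedding model."""
    # Imported lazily so this module still loads without the pinecone extra.
    from pinecone import Pinecone
    from guava.helpers.pinecone import (
        PineconeInferenceEmbedding,
        PineconeVectorStore,
    )

    pc = Pinecone(api_key=api_key)
    embedding = PineconeInferenceEmbedding(
        pc=pc,
        model="multilingual-e5-large",
        dimensionality=1024,
    )
    return PineconeVectorStore(
        api_key=api_key,
        index_name=index_name,
        embedding_model=embedding,
    )
```

A Pinecone index has a fixed vector dimension, so when changing `model` or `dimensionality` it is safest to use a new `index_name` rather than reuse an index created with different settings.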

GenerationModel

Any implementation of the guava.helpers.rag.GenerationModel interface works with DocumentQA in local mode. The examples on this page use VertexAIGeneration, but other LLM providers can be used as well.

Examples

vector_store_examples.py

from guava.helpers.rag import DocumentQA
from guava.helpers.vertexai import VertexAIEmbedding, VertexAIGeneration
from google import genai

client = genai.Client(vertexai=True, project="my-project", location="us-central1")
embedding = VertexAIEmbedding(client=client)   # gemini-embedding-001, 768-dim
generation = VertexAIGeneration(client=client)  # gemini-2.5-flash


# ChromaDB — no external embedding API required; persists to disk by default
from guava.helpers.chromadb import ChromaVectorStore

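# Two alternatives are shown; the second assignment replaces the first.
store = ChromaVectorStore()                     # path="./chroma_data" by default
store = ChromaVectorStore(path=None)            # in-memory/ephemeral

# doc1 and doc2 stand in for documents you have already loaded.
qa = DocumentQA(store=store, generation_model=generation, documents=[doc1, doc2])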
answer = qa.ask("What is the deductible?")


# LanceDB — local path or GCS URI; requires an embedding model
from guava.helpers.lancedb import LanceDBStore

# Two alternatives are shown; the second assignment replaces the first.
store = LanceDBStore("./lancedb_data", embedding_model=embedding)
store = LanceDBStore("gs://my-bucket/lancedb", embedding_model=embedding)  # GCS

qa = DocumentQA(store=store, generation_model=generation, documents=[doc1, doc2])
answer = qa.ask("What is the deductible?")


# pgvector — Postgres connection string; table and indexes created automatically
from guava.helpers.pgvector import PgVectorStore

store = PgVectorStore(
    db_url="postgresql://user:password@localhost:5432/mydb",
    embedding_model=embedding,
)
qa = DocumentQA(store=store, generation_model=generation, documents=[doc1, doc2])
answer = qa.ask("What is the deductible?")


# Pinecone — set PINECONE_API_KEY; index and embeddings are fully managed
from guava.helpers.pinecone import PineconeVectorStore

store = PineconeVectorStore()                   # index_name="guava-chunks" by default

qa = DocumentQA(store=store, generation_model=generation, documents=[doc1, doc2])
answer = qa.ask("What is the deductible?")


Questions? hi@goguava.ai
