# Vector Stores

Guava provides four ready-made `VectorStore` implementations that can be passed directly to `DocumentQA` as the `store` argument. Each wraps a popular vector database and handles embedding, indexing, and similarity search.

**Python only:** Vector store backends are currently available in Python only. TypeScript equivalents are not yet available.
## Installation

Install only the backend(s) you need:

```shell
pip install 'gridspace-guava[chromadb]'
pip install 'gridspace-guava[lancedb]'
pip install 'gridspace-guava[pgvector]'
pip install 'gridspace-guava[pinecone]'
```

Importing a backend class without the corresponding extra installed raises `ImportError` with an install hint.
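Because each backend is an optional extra, a guarded import lets an application detect a missing backend instead of crashing at import time. A minimal sketch (the fallback behavior here is illustrative, not part of Guava):

```python
# Guard an optional backend import. If the chromadb extra is not installed,
# Guava raises ImportError with an install hint, which we surface and record.
try:
    from guava.helpers.chromadb import ChromaVectorStore
    backend_available = True
except ImportError as err:
    # The error message includes the install hint for the missing extra.
    backend_available = False
    print(f"ChromaDB backend unavailable: {err}")
```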
## ChromaVectorStore

```python
from guava.helpers.chromadb import ChromaVectorStore
```

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `path` | `str \| None` | No | `"./chroma_data"` | Directory for persistent storage. Pass `None` for an in-memory ephemeral store. |
| `collection_name` | `str` | No | `"chunks"` | ChromaDB collection name. |
| `embedding_model` | `EmbeddingModel \| None` | No | `None` | External embedding model. When omitted, ChromaDB's built-in all-MiniLM-L6-v2 model is used, so no external API is needed. |
## LanceDBStore

```python
from guava.helpers.lancedb import LanceDBStore
```

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `path` | `str` | No | `"./lancedb_data"` | Local path or GCS URI (e.g. `"gs://bucket/lancedb"`) for storage. |
| `table_name` | `str` | No | `"chunks"` | LanceDB table name. |
| `embedding_model` | `EmbeddingModel` | Yes | — | Embedding model to use. Pass a configured instance such as `VertexAIEmbedding`. |
**Note:** LanceDB silently drops tables that predate the current schema version. This triggers a full re-index the next time `DocumentQA` ingests documents.
## PgVectorStore

```python
from guava.helpers.pgvector import PgVectorStore
```

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `db_url` | `str` | Yes | — | PostgreSQL connection string (e.g. `"postgresql://user:pass@host/db"`). |
| `table_name` | `str` | No | `"guava_chunks"` | Table name for stored chunks. |
| `embedding_model` | `EmbeddingModel` | Yes | — | Embedding model to use. Pass a configured instance such as `VertexAIEmbedding`. |
`PgVectorStore` creates the `vector` extension, chunks table, and HNSW cosine index automatically on first connect. If the connecting user lacks `CREATE EXTENSION` privileges, initialization will fail.

**Managed Postgres:** Managed services (Cloud SQL, AlloyDB, RDS) are untested but expected to work, since the implementation uses standard psycopg.
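For reference, the first-connect bootstrap described above is roughly equivalent to the DDL below. This is a hypothetical approximation: the actual column layout is internal to Guava, and the column names and 768-dim vector size (matching the `VertexAIEmbedding` used in the examples on this page) are illustrative assumptions; only pgvector's `CREATE EXTENSION` and HNSW index syntax are standard.

```python
# Approximate DDL mirroring what PgVectorStore is described as doing on
# first connect: enable pgvector, create the chunks table, and add an
# HNSW cosine index. Column names and dimensionality are assumptions.
BOOTSTRAP_DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS guava_chunks (
    id        TEXT PRIMARY KEY,
    content   TEXT NOT NULL,
    embedding vector(768)
);
CREATE INDEX IF NOT EXISTS guava_chunks_embedding_idx
    ON guava_chunks USING hnsw (embedding vector_cosine_ops);
"""
print(BOOTSTRAP_DDL)
```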
## PineconeVectorStore

```python
from guava.helpers.pinecone import PineconeVectorStore
```

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `api_key` | `str \| None` | No | env `PINECONE_API_KEY` | Pinecone API key. If omitted, read from the environment. |
| `index_name` | `str` | No | `"guava-chunks"` | Pinecone index name. Created automatically if it does not exist. |
| `cloud` | `str` | No | `"aws"` | Serverless cloud provider for index creation. Ignored if the index already exists. |
| `region` | `str` | No | `"us-east-1"` | Serverless region for index creation. Ignored if the index already exists. |
| `embedding_model` | `EmbeddingModel \| None` | No | `PineconeInferenceEmbedding` | Defaults to multilingual-e5-large (1024-dim) via Pinecone's hosted Inference API. |
**Cold start:** Pinecone index creation can take 30–60 seconds on first use. Subsequent instantiations with the same `index_name` skip creation and connect immediately.
## PineconeInferenceEmbedding

```python
from guava.helpers.pinecone import PineconeInferenceEmbedding
```

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `pc` | `Pinecone` | Yes | — | A configured Pinecone client instance. |
| `model` | `str` | No | `"multilingual-e5-large"` | Pinecone inference model name. |
| `dimensionality` | `int` | No | `1024` | Output vector size. |
## GenerationModel

Any implementation of the `guava.helpers.rag.GenerationModel` interface works with `DocumentQA` in local mode. The examples on this page use `VertexAIGeneration`, but other LLM providers can be used as well.
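For offline tests, a stub generation model can stand in for a real LLM backend. This is only a sketch: the `generate` method name below is a hypothetical placeholder, so check `guava.helpers.rag.GenerationModel` for the actual interface before relying on it.

```python
# Hypothetical stub generation backend for offline testing. The `generate`
# method name is an assumption about the GenerationModel interface, not a
# documented signature.
class StubGeneration:
    def generate(self, prompt: str) -> str:
        # Return a deterministic canned answer instead of calling an LLM.
        return f"[stubbed answer for: {prompt!r}]"

gen = StubGeneration()
print(gen.generate("What is the deductible?"))
```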
## Examples

```python
from guava.helpers.rag import DocumentQA
from guava.helpers.vertexai import VertexAIEmbedding, VertexAIGeneration
from google import genai

client = genai.Client(vertexai=True, project="my-project", location="us-central1")
embedding = VertexAIEmbedding(client=client)    # gemini-embedding-001, 768-dim
generation = VertexAIGeneration(client=client)  # gemini-2.5-flash

# ChromaDB — no external embedding API required; persists to disk by default
from guava.helpers.chromadb import ChromaVectorStore

store = ChromaVectorStore()          # path="./chroma_data" by default
store = ChromaVectorStore(path=None) # in-memory/ephemeral
qa = DocumentQA(store=store, generation_model=generation, documents=[doc1, doc2])
answer = qa.ask("What is the deductible?")

# LanceDB — local path or GCS URI; requires an embedding model
from guava.helpers.lancedb import LanceDBStore

store = LanceDBStore("./lancedb_data", embedding_model=embedding)
store = LanceDBStore("gs://my-bucket/lancedb", embedding_model=embedding)  # GCS
qa = DocumentQA(store=store, generation_model=generation, documents=[doc1, doc2])
answer = qa.ask("What is the deductible?")

# pgvector — Postgres connection string; table and indexes created automatically
from guava.helpers.pgvector import PgVectorStore

store = PgVectorStore(
    db_url="postgresql://user:password@localhost:5432/mydb",
    embedding_model=embedding,
)
qa = DocumentQA(store=store, generation_model=generation, documents=[doc1, doc2])
answer = qa.ask("What is the deductible?")

# Pinecone — set PINECONE_API_KEY; index and embeddings are fully managed
from guava.helpers.pinecone import PineconeVectorStore

store = PineconeVectorStore()  # index_name="guava-chunks" by default
qa = DocumentQA(store=store, generation_model=generation, documents=[doc1, doc2])
answer = qa.ask("What is the deductible?")
```

Questions? hi@goguava.ai