Margin is a permission-aware company knowledge base for Indian SMEs. Your team uploads or forwards documents; anyone with access asks questions in natural language and gets answers grounded in those documents with clickable citations, or an honest “not in your documents.” This policy explains what data we collect, where it lives, and how it is processed.
1. What Data We Collect
Margin collects the minimum data required to run your company knowledge base:
- Account info — name, email address, and profile picture from your Google OAuth sign-in, plus the workspace and role you belong to.
- Documents you upload or forward — files your team adds through manual upload, the Google Picker, email-in, or WhatsApp (contracts, policies, invoices, spreadsheets, etc.). Prose documents (PDF/DOCX/TXT/MD) have their text extracted; scanned pages are run through OCR. Spreadsheets (XLSX/CSV) are parsed sheet-by-sheet into structured rows. We interpret messy structure in memory to answer accurately, but we never alter, reformat, or write back your original file.
- Document chunks and embeddings — prose documents are split into chunks; for each chunk we generate a vector embedding via Voyage AI. Chunks, vectors, and source metadata are stored in your workspace and used solely to retrieve relevant context when someone on your team asks a question.
- Structured spreadsheet data — tabular rows are stored as structured data (with a short, AI-generated description per sheet) so numerical questions are computed, not guessed. Spreadsheets are queried, never embedded as prose.
- Permissions and visibility settings — each document's visibility (everyone, admins only, or a named list of users) and its confidential flag, which you control.
- Questions, answers, and Q&A history — each question, the grounded answer, the sources cited, the detected intent, and a timestamp are saved to your workspace so answers are searchable, autofill suggests past questions, and admins can mark answers as verified.
- Connector query results — when you use a read-only connector (e.g. Zoho), live results are returned to you in real time and attributed to their source. They are not stored or cached into your knowledge base.
2. How Your Data Is Stored
All user data is stored in a Supabase project (PostgreSQL + pgvector + Storage) hosted in the India region (AWS ap-south-1, Mumbai). Data is encrypted at rest and in transit (TLS 1.2+).
Row-Level Security is enabled on every table, with no exceptions. Identity is always derived from the authenticated server-side session — never from a client-supplied workspace or user identifier — and the permission filter that decides which documents a person can see is enforced both by RLS and in the query layer (defence in depth). A user can never see, or get metadata about, a document they lack rights to.
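As an illustration of the query-layer half of this check, here is a minimal Python sketch. The `Session`, `Document`, `can_see`, and `visible_documents` names are hypothetical and are not Margin's actual code. The sketch also assumes that a named list does not automatically include admins. In production the same rule is additionally enforced by RLS in the database.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Session:
    """Identity taken from the authenticated server-side session (hypothetical shape)."""
    user_id: str
    workspace_id: str
    is_admin: bool


@dataclass(frozen=True)
class Document:
    doc_id: str
    workspace_id: str
    visibility: str  # "everyone", "admins", or "named"
    allowed_user_ids: frozenset = field(default_factory=frozenset)


def can_see(session: Session, doc: Document) -> bool:
    """Return True only if the session's user has rights to the document."""
    if doc.workspace_id != session.workspace_id:
        return False
    if doc.visibility == "everyone":
        return True
    if doc.visibility == "admins":
        return session.is_admin
    if doc.visibility == "named":
        # Assumption: only users on the named list qualify.
        return session.user_id in doc.allowed_user_ids
    return False  # unknown visibility values fail closed


def visible_documents(session: Session, docs: list[Document]) -> list[Document]:
    """Filter before any title or metadata is returned to the caller."""
    return [d for d in docs if can_see(session, d)]
```

Note that the function takes no workspace or user identifier from the caller: everything it needs comes from the session object, which mirrors the rule that identity is never client-supplied.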
The application is hosted on Vercel. Secrets, API keys, and connector tokens live only in the server environment and are never exposed to client code. When you ask a computational (spreadsheet) question, the relevant structured rows are sent to Vercel Sandbox — an isolated, short-lived microVM — so generated code can compute the answer rather than the model guessing at arithmetic. That data exists only for the lifetime of the computation and is not retained by the sandbox afterwards.
3. Google API Services — Scopes and Limited Use Disclosure
Margin requests only the narrowest Google scopes needed to sign you in and let you pick specific files:
- openid, email, and profile — to authenticate you and show your name, email, and avatar.
- drive.file — used together with the Google Picker so that we can access only the specific files you explicitly select to import into your knowledge base.
We never request any gmail.* scope, never request drive.readonly, and never request any Drive scope broader than drive.file. We cannot see, list, or read any file in your Drive that you did not pick. Margin's use and transfer of information received from Google APIs adheres to the Google API Services User Data Policy, including the Limited Use requirements. We do not use Google user data for advertising, and we do not use it to develop, improve, or train any generalized AI / machine-learning model.
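One way to keep a scope list this narrow is an allow-list check that runs before any sign-in request is built. The sketch below is an illustration only: `ALLOWED_SCOPES` and `disallowed_scopes` are hypothetical names, and `drive.file` is written in its full URL form.

```python
# Scopes this policy says Margin requests; anything else is rejected.
ALLOWED_SCOPES = frozenset({
    "openid",
    "email",
    "profile",
    "https://www.googleapis.com/auth/drive.file",
})


def disallowed_scopes(requested: list[str]) -> list[str]:
    """Return every requested scope that falls outside the allow-list."""
    return [scope for scope in requested if scope not in ALLOWED_SCOPES]
```

For example, `disallowed_scopes(["openid", "https://www.googleapis.com/auth/drive.readonly"])` returns a list containing only the `drive.readonly` scope, so a configuration change that widened access would be caught before it reached Google.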
4. How AI Processing Works
Margin uses the Anthropic Claude API to classify what a question is asking, to write grounded answers from your documents, and to reason over spreadsheet data. It uses Voyage AI to turn document chunks and queries into embeddings for retrieval, and Google Cloud Document AI to read text from scanned pages, handwriting, and images.
- What is sent to Anthropic: only the specific document chunks, spreadsheet rows, and question text relevant to the query someone on your team just asked — never your whole corpus in one batch.
- What is sent to Voyage AI: only the text being embedded (a document chunk or a query). It is returned as a vector and is not retained for training.
- What is sent to Google Cloud Document AI: only the scanned pages, photos, or image-based files that need optical character recognition (OCR) — used to extract their text so the document becomes searchable.
- Your documents are never used to train AI models. Per Anthropic's Commercial Terms of Service, inputs and outputs sent via the API are not used for model training, and Voyage likewise does not train on the text we send per their privacy policy.
- Grounded or silent. Every knowledge answer cites its sources. When retrieval confidence is low, Margin returns an explicit “not in your documents” and suggests the nearest documents you are permitted to see — it never fabricates an answer from the model's own priors.
- Confidential items. Documents marked Confidential are excluded from proactive surfacing and from any retrieval the asking user is not permitted to perform. The “not in your documents” suggestions are computed from the permitted set only, so a document you cannot access never leaks — not even its title.
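The "grounded or silent" rule and the permitted-set rule can be sketched together. This is a simplified illustration, not Margin's implementation: the `Hit` type, the `respond` function, the 0 to 1 score scale, the `MIN_CONFIDENCE` value, and the limit of three suggestions are all assumptions made for the example.

```python
from dataclasses import dataclass

MIN_CONFIDENCE = 0.5  # hypothetical threshold; the real value is not stated in this policy


@dataclass(frozen=True)
class Hit:
    title: str
    score: float     # retrieval confidence, assumed to lie in [0, 1]
    permitted: bool  # result of the permission check for the asking user


def respond(hits: list[Hit]) -> dict:
    """Answer with citations, or say 'not in your documents' with permitted suggestions."""
    # Drop everything the asking user may not see before doing anything else.
    allowed = sorted(
        (h for h in hits if h.permitted), key=lambda h: h.score, reverse=True
    )
    grounded = [h for h in allowed if h.score >= MIN_CONFIDENCE]
    if grounded:
        return {"status": "answered", "citations": [h.title for h in grounded]}
    return {
        "status": "not in your documents",
        "suggestions": [h.title for h in allowed[:3]],
    }
```

Because unpermitted hits are removed in the first step, a highly relevant document the user cannot access contributes neither a citation nor a suggestion title.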
5. Connectors and Ingestion Channels
Margin is “live but curated”: you always choose what enters the corpus. We never pull data you did not push or pick.
- Read-only, per-user connectors. External systems such as Zoho CRM and Zoho Books are queried with your own OAuth token, never a shared company login, so the external system's own permissions apply automatically. Connectors are read-only: Margin reads status and records, and never writes, updates, or takes actions on your behalf.
- Live passthrough, never ingested. Connector results are fetched in real time and returned with the source named. They are not embedded or cached into your knowledge base.
- Ingestion channels you control. Beyond manual upload and the Google Picker, you can forward documents in by email (handled by Resend, which also sends our sign-in and transactional email) or via WhatsApp (handled by an Indian Business Solution Provider such as AiSensy or Interakt, only when you enable that channel). Items arriving through these channels are staged for a member to review and approve before they enter the corpus.
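The review step for forwarded items can be pictured as a small state machine. The sketch below is illustrative only: `Status`, `StagedItem`, and `corpus` are hypothetical names, and the real review flow may track more states or metadata.

```python
from enum import Enum


class Status(Enum):
    STAGED = "staged"
    APPROVED = "approved"
    REJECTED = "rejected"


class StagedItem:
    """A document that arrived by email or WhatsApp and awaits review."""

    def __init__(self, filename: str, channel: str) -> None:
        self.filename = filename
        self.channel = channel  # e.g. "email" or "whatsapp"
        self.status = Status.STAGED

    def review(self, approve: bool) -> None:
        """Record a member's decision; each item can be reviewed only once."""
        if self.status is not Status.STAGED:
            raise ValueError("item has already been reviewed")
        self.status = Status.APPROVED if approve else Status.REJECTED


def corpus(items: list[StagedItem]) -> list[str]:
    """Only approved items are part of the knowledge base."""
    return [item.filename for item in items if item.status is Status.APPROVED]
```

An item that is still staged or has been rejected never appears in `corpus`, which is the property the policy describes: nothing enters the knowledge base until a member approves it.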
6. Data Retention
- Account and workspace data is retained for as long as your account and workspace are active.
- Uploaded documents, chunks, embeddings, and structured spreadsheet data are retained until a workspace admin deletes or rejects the document in the Library, or until the workspace is closed.
- Q&A history is retained within your workspace for as long as the workspace is active.
- Connector tokens are refreshed automatically and overwritten as needed; if you revoke access from the external system, the stored token becomes invalid.
- On a verified deletion request (see Section 7), the relevant data is permanently removed from our application database. Managed point-in-time backups used for disaster recovery age out on their normal schedule; we do not retain or restore from them after a deletion request.
7. Your Rights
The controls available to you include:
- Access — view the documents, answers, and history available to you through the app, subject to your workspace permissions.
- Correction and visibility — re-upload a corrected document, and (as a workspace admin) set each document's visibility — everyone, admins only, or a named list of users — and its confidential flag at any time.
- Document deletion — a workspace admin can delete or reject individual documents from the Library, which removes their stored content, chunks, embeddings, and structured rows.
- Account deletion and data export — Margin does not currently offer self-serve account deletion or a one-click “export all my data” download. To request deletion of your account, or a copy of the data we hold about you, contact your workspace admin or email us (see Section 9) and we will action the request.
- Revoke access — disconnect Google or any connector at any time. For Google, use your Google Account permissions page.
We do not sell personal data, and we do not use your documents to train AI models.
8. Third-Party Services (Subprocessors)
Margin relies on the following third parties to process data on its behalf. The last two are engaged only when you enable the relevant feature.
| Service | Purpose | Data shared |
|---|---|---|
| Supabase | PostgreSQL database, pgvector store, authentication, file storage | All user data (stored in AWS Mumbai, ap-south-1), with Row-Level Security on every table |
| Anthropic | Claude API — intent routing, grounded answers, spreadsheet reasoning | The document chunks, spreadsheet rows, and question text relevant to a query. Not used for training. |
| Voyage AI | Text embeddings for retrieval | Document chunks and query text — sent as text, returned as vectors. Not used for training. |
| Google Cloud Document AI | OCR of scanned pages, handwriting, and images | Only the image-based files that need text extraction |
| Google APIs | OAuth sign-in; Google Picker file import | OAuth tokens (openid, email, profile, drive.file); access only to the specific files you pick |
| Vercel | Application hosting; Vercel Sandbox for spreadsheet computation | Request logs (IP, user agent); for computational questions, the relevant spreadsheet rows are sent to an isolated sandbox to run the computation |
| Resend | Inbound email ingestion; sign-in and transactional email | Emails (and their attachments) you forward in, plus the address we send notifications to |
| WhatsApp BSP (AiSensy / Interakt) | WhatsApp ingestion and queries — only when you enable it | Messages and documents you send to the WhatsApp channel |
| Zoho (read-only connector) | Live CRM / Books lookups — only when a user connects it | Queried with that user's own OAuth token, read-only; results are returned live and never cached into the knowledge base |
We may update this policy as Margin evolves; material changes will be reflected in the “last updated” date above and, where appropriate, communicated to you.