Privacy Policy

Last updated: 21 July 2026 · Version 2.0

Semurg is designed for privacy-first operation. The Shield pipeline tokenises personal data ("PII") at the perimeter before any external model call. Raw PII never leaves your instance unmasked. This Privacy Policy explains what we collect, why, who we share it with, and how you control it.

1. Who We Are (Data Controller)

Semurg AI ("we", "us"). Postal address available on request via [email protected]. Where Semurg processes Customer Data on behalf of a Customer (i.e. the Customer is the controller of personal data about its end-users), Semurg acts as a processor and the relationship is governed by the Data Processing Agreement (DPA) at /dpa.

2. Data We Collect

Account data — pseudonymous session identifiers (no email required by default), Access Key hash, country selected at onboarding.
Optional contact email — only if you explicitly provide one for billing receipts or account recovery.
Usage telemetry — request counts, latency, error rates, model selected, modality (text/image/audio/video), bytes moved (no message content).
Documents you upload — stored in your knowledge graph; PII tokenised on ingestion via the Shield pipeline.
Conversation history — stored with PII tokens, not raw PII; retained until you delete it or request erasure.
Third-party API keys — stored AES-256-GCM encrypted in your sovereign vault, derived from your Access Key; used only for model calls you initiate; never logged in plaintext.
Billing data — Stripe customer ID, last-4 of card (held by Stripe, not by us), invoice history, country for tax purposes.
Diagnostic logs — stripped of PII via the Semurg PIILogger module before write; retained 30 days, then rotated.

3. Why We Process It (Legal Bases under GDPR Art. 6)

Performance of contract (Art. 6(1)(b)) — to provide the Platform you have requested.
Legitimate interests (Art. 6(1)(f)) — to secure the Platform, prevent fraud, and improve service reliability. We balance these interests against your rights and freedoms; you may object at any time (see § 7).
Legal obligation (Art. 6(1)(c)) — to comply with tax, accounting and regulatory obligations.
Consent (Art. 6(1)(a)) — for any optional processing where consent is the appropriate basis (e.g. marketing emails, never sent without opt-in).

We do not train models on your inputs, outputs, or knowledge graph. We do not sell your data.

4. Data Residency

You select a Home Cloud and Region during onboarding. Customer Data is stored at rest only in the Home Region. You additionally select Allowed Processing Regions and Denied Processing Regions; we will not move your data into a Denied Region for any purpose. Migration between Home Regions is initiated by you via Settings → Data Residency and is logged immutably.
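As an illustration only, the region rule above can be read as a simple predicate. The function name, the region strings, and the choice to refuse any region that is neither the Home Region nor explicitly allowed are assumptions made for this sketch; they do not describe Semurg's actual routing code.

```python
def may_process_in(region, home_region, allowed_regions, denied_regions):
    """Return True if processing in `region` is permitted (hypothetical helper)."""
    # A Denied Processing Region is refused first, whatever else is configured.
    if region in denied_regions:
        return False
    # Assumption: anything not explicitly allowed (and not the Home Region) is refused.
    return region == home_region or region in allowed_regions


# Example with made-up region names.
home = "eu-central"
allowed = {"eu-west"}
denied = {"us-east"}
print(may_process_in("eu-west", home, allowed, denied))
print(may_process_in("us-east", home, allowed, denied))
```

Checking the denied set first means a region listed as both allowed and denied is still refused, which matches the statement that data is not moved into a Denied Region for any purpose.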

5. International Transfers

Where data is processed outside the EEA or UK, we use Standard Contractual Clauses (SCCs) approved by the European Commission under GDPR Art. 46(2)(c) (Module Two or Module Three, as applicable) and, for UK transfers, the UK International Data Transfer Addendum to those SCCs. Where you have set Denied Processing Regions, no transfer to those regions occurs regardless of legal basis.

6. Sub-Processors

We engage the following categories of sub-processor. The current authoritative list, updated at least 30 days in advance of changes, is published at /dpa#sub-processors.

Cloud infrastructure: Vultr (primary), AWS, GCP, Hetzner, Exoscale, Akamai (per your residency selection).
Payments: Stripe (PCI-DSS L1).
Transactional email: Postmark / SendGrid (where you provide an email address).
Edge / TLS: Cloudflare (proxy + Universal SSL).
Optional external LLM providers, only if you bring your own keys: OpenAI, Anthropic, Google, xAI, OpenRouter, etc. We never share data with an external provider unless you have configured the key for that provider.

7. Your Rights (EU/UK GDPR, plus equivalent rights under CCPA/CPRA, LGPD, PIPEDA, APP)

Access (Art. 15) — request a copy of data we hold about you.
Rectification (Art. 16) — correct inaccurate data.
Erasure / Right to be Forgotten (Art. 17) — request deletion.
Restriction (Art. 18) — limit processing in defined circumstances.
Portability (Art. 20) — receive your data in JSON / CSV.
Object (Art. 21) — object to processing based on legitimate interests.
Withdraw consent (Art. 7(3)) — at any time, where processing is consent-based.
Right against automated decision-making (Art. 22) — we do not make solely-automated decisions producing legal or similarly significant effects on you.
Right to lodge a complaint (Art. 77) — with the ICO (ico.org.uk), the Office of the Australian Information Commissioner (oaic.gov.au), or your local data-protection authority.
California-specific rights (CCPA/CPRA): the right to know, delete, correct, opt-out of "sale" / "sharing" (we do neither), and limit use of sensitive personal information.

Most rights can be exercised directly from Settings; otherwise email [email protected]. We will respond within 30 calendar days (GDPR Art. 12), extendable by a further two months for complex requests.

8. Retention

Account data — for the life of your account, then 30 days for export, then deleted.
Conversation history — retained until you delete it or request erasure.
Documents and knowledge graph — retained until you delete the items or the account.
PII token maps — 90 days (configurable lower), then swept.
Operational logs — 30 days, then rotated; aggregate (non-personal) telemetry retained up to 13 months.
Billing records — minimum legally required period (typically 7 years for AU/EU/UK tax purposes).
Backups — encrypted; rotated within 90 days. Erasure requests propagate through the next backup-rotation cycle.

9. Security

See /security for a fuller description. Highlights: TLS 1.3 in transit; AES-256-GCM at rest; per-user vault encryption derived from the user's Access Key; no shared admin master key; sovereign-deployment option (Sovereign Bind) where the entire Platform runs on Customer-controlled hardware bound to a TPM 2.0 device.

10. Children

The Platform is not directed at children under 16 (under 13 in the United States). We do not knowingly collect personal data from children. If you believe a child has provided us personal data, please contact [email protected] and we will delete it promptly.

11. Cookies

We use a minimal set of strictly-necessary cookies for session authentication and CSRF protection. We do not use advertising or cross-site tracking cookies. See /cookies for the full list.

12. Changes to this Policy

We may update this Policy from time to time. Material changes will be communicated to you at least 30 days in advance via in-product notice or email (where you have provided one). The "Last updated" date at the top reflects the current version.

13. PII Tokenisation Details (the Shield pipeline)

Detection. A multi-stream NER pipeline (regex + neural NER + credential scanner + entropy scanner) detects personal data spans across 13 default jurisdictions (AU, US, UK, EU, CA, IN, SG, JP, BR, MX, ZA, NZ, CN) plus any jurisdictions Customer enables. Job titles, form labels, and common legal vocabulary are explicitly stop-listed to avoid false positives.
Tokenisation = per-session. Within one chat session, the SAME personal data value receives the SAME [PII:TYPE:N] token across every turn, every retrieved document, and every history entry — so the AI can reason coherently. Across DIFFERENT sessions, the same value MAY receive different tokens (this bounds correlation risk).
Egress = tokenised; return = restored. External LLM providers receive only tokens, never raw values. Responses are restored inside the Semurg perimeter before being presented to Customer or stored.
Local-model path = raw. When Customer routes a chat to a local model that runs on-cluster (no egress), no tokenisation is applied — there is no privacy boundary to enforce.
Audit trail. Every PII restore attempt is logged (token types only — never raw values). Customer can review this trail under Settings → Audit.
Email outbound (agents). When a Semurg agent sends mail to an external recipient, non-name personal data is STRIPPED entirely (not tokenised) because the recipient cannot detokenise. Names are preserved in plaintext (the recipient already knows the parties).
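The per-session behaviour described above can be sketched as follows. This is a minimal illustration, not the Shield implementation: the class name SessionTokenMap, the single email regex standing in for the multi-stream detector, and the counter-based numbering are all assumptions. Only the [PII:TYPE:N] token shape comes from this Policy.

```python
import re

# Stand-in detector: one simple email pattern (the real pipeline is multi-stream).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
TOKEN_RE = re.compile(r"\[PII:[A-Z]+:\d+\]")


class SessionTokenMap:
    """Hypothetical per-session map between raw values and [PII:TYPE:N] tokens."""

    def __init__(self):
        self._value_to_token = {}
        self._token_to_value = {}
        self._counters = {}

    def _token_for(self, pii_type, value):
        key = (pii_type, value)
        if key not in self._value_to_token:
            # First sighting in this session: allocate the next number for the type.
            n = self._counters.get(pii_type, 0) + 1
            self._counters[pii_type] = n
            token = f"[PII:{pii_type}:{n}]"
            self._value_to_token[key] = token
            self._token_to_value[token] = value
        return self._value_to_token[key]

    def tokenise(self, text):
        # Applied before egress to an external provider.
        return EMAIL_RE.sub(lambda m: self._token_for("EMAIL", m.group(0)), text)

    def restore(self, text):
        # Applied inside the perimeter; unknown tokens are left untouched.
        return TOKEN_RE.sub(
            lambda m: self._token_to_value.get(m.group(0), m.group(0)), text
        )


session = SessionTokenMap()
turn_1 = session.tokenise("Mail alice@example.com now")
turn_2 = session.tokenise("bob@example.com cc alice@example.com")
print(turn_1)
print(turn_2)
print(session.restore(turn_2))
```

In this sketch a second SessionTokenMap numbers values in the order it first sees them, so the same address can map to a different token in a different session.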

14. Agents — What Data They Hold

Per-agent identity. Each agent (Cluster, Intelligence, Sales, Marketing, Partnerships, Support, Analytics, plus Customer-defined agents) has its own identifier and its own mailbox alias. Agents are first-class Platform members, not bot-API shims.
Per-agent state. An agent stores its conversation memory, pending actions, sent mail, and Approval-Gate decisions in its own owner-scoped subgraph. Other Customers cannot read this subgraph.
Sub-agent spawning. Where an agent spawns a sub-agent (e.g. a research delegate), the sub-agent inherits the parent's owner-scope. Cross-owner sub-agent spawning requires the target owner's explicit share grant.
Agent activity in /messages. Customer can review every email an agent has sent or received under /messages → Inbox → filter by recipient. The filter shows only Customer's own agents' mail.

15. Sharing — Opt-in, User-Controlled

Default-deny. No data crosses a user boundary unless that user explicitly grants access.
Per-scope (User / Team / Org). Customer toggles sharing per resource and per recipient scope (a specific user, a team, the whole org).
Per-feature (Chat / Dashboards / Reports / RAG-context). Customer can share a resource for one feature and withhold it from another.
Org admins cannot bypass. An organisation admin can manage seats and billing, but cannot read, search, export, or decrypt member content without an explicit share grant from that member.
Revocation. Withdrawing a share invalidates access on the recipient's next request.
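The sharing rules above amount to a default-deny lookup, which the following sketch illustrates. Every name in it (Grant, Requester, ShareRegistry, the scope and feature strings) is hypothetical and chosen for the example; it is not the Platform's access-control code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    resource_id: str
    scope: str      # "user", "team" or "org"
    scope_id: str   # the specific user, team or org the grant targets
    feature: str    # e.g. "chat", "dashboards", "reports", "rag"


@dataclass
class Requester:
    user_id: str
    org_id: str
    team_ids: frozenset = frozenset()
    is_org_admin: bool = False  # deliberately never consulted below


class ShareRegistry:
    def __init__(self):
        self._grants = set()

    def grant(self, g):
        self._grants.add(g)

    def revoke(self, g):
        # Checks read the live set, so the next request is already denied.
        self._grants.discard(g)

    def can_access(self, who, owner_id, resource_id, feature):
        if who.user_id == owner_id:
            return True
        scopes = [("user", who.user_id), ("org", who.org_id)]
        scopes += [("team", team_id) for team_id in who.team_ids]
        # Default-deny: access requires a grant matching resource, scope and feature.
        return any(
            Grant(resource_id, scope, scope_id, feature) in self._grants
            for scope, scope_id in scopes
        )


registry = ShareRegistry()
share = Grant("doc-1", "team", "team-a", "chat")
registry.grant(share)
bob = Requester("u-bob", "org-1", frozenset({"team-a"}))
print(registry.can_access(bob, "u-alice", "doc-1", "chat"))
print(registry.can_access(bob, "u-alice", "doc-1", "reports"))
```

Because the admin flag is never read in can_access, an organisation admin in this sketch gets exactly the access their grants provide and nothing more.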

16. Free-Tier Device Fingerprint

What we collect. A SHA-256 hash of (IP address + User-Agent string + browser fingerprint signature) is computed locally and stored at the time you claim the one-time 10 MB free allocation.
Why. To deter abuse of the free tier (repeat-claim by the same device).
What we do NOT do. We do not reverse-look-up the hash to identify you, we do not share the hash with third parties, and we do not use it for advertising. The raw IP is not retained alongside the hash.
Retention. The fingerprint hash is retained for 24 months from the date you claim the free allocation, then deleted.
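For readers who want a concrete picture of the fingerprint, here is one way such a hash could be computed. This Policy states only that SHA-256 is applied to the three inputs; the function name, the UTF-8 encoding, and the length-prefixed concatenation are assumptions made for this sketch.

```python
import hashlib


def device_fingerprint_hash(ip_address, user_agent, browser_signature):
    """Hypothetical helper: SHA-256 over the three fingerprint inputs."""
    digest = hashlib.sha256()
    for field in (ip_address, user_agent, browser_signature):
        data = field.encode("utf-8")
        # Length-prefix each field so that shifting characters between
        # adjacent fields cannot produce the same byte stream.
        digest.update(len(data).to_bytes(4, "big"))
        digest.update(data)
    return digest.hexdigest()


# 203.0.113.7 is a documentation-only address; the other values are made up.
fingerprint = device_fingerprint_hash("203.0.113.7", "Mozilla/5.0", "sig-abc")
print(len(fingerprint))
```

Only the resulting hex string would need to be stored to recognise a repeat claim; the inputs themselves can be discarded once the hash is computed.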

17. Privacy Posture — Today vs Roadmap

We separate what is enforced today from what is on the architectural roadmap, so that Customer does not rely on a property that is not yet enforced by construction:

Today (policy-enforced). Organisation admins cannot read individual member content via the product surface. This is enforced by access-control policy and code path, not by structural cryptography.
On the roadmap (architecture-enforced). Client-held-key zero-knowledge mode in which the org's admins cannot decrypt employee content by construction. Not live today.
No model training on Customer Data. Customer inputs and outputs are never used to train any model, internal or external. Enforced both by code path (no training corpus reads from Customer-data tables) and by policy.

18. Contact

Privacy enquiries: /contact or [email protected]. Security disclosures: /contact?topic=security or [email protected].
