Platform Architecture
Neostra's microservices architecture, service interactions, data flows, and database strategy.
Service Overview
Neostra is composed of independent microservices, each owning its domain. Services communicate via REST APIs and Google Cloud Pub/Sub for event-driven processing.
| Service | Port | Language | Database | Purpose |
|---|---|---|---|---|
| neostra-core | 9000 | Java/Spring Boot | MongoDB | DSAR, workflows, assessments, privacy centers, breach management |
| astra-core | 9010 | Java/Spring Boot | MongoDB | Tenant provisioning, user management, RBAC, subscriptions |
| cpmp-api-service | 9001 | Java/Spring Boot | PostgreSQL (read) | Consent REST API, publishes to Pub/Sub |
| cpmp-log-receipt-service | 9002 | Java/Spring Boot | PostgreSQL (write) | Pub/Sub consumer, writes consent ledger |
| cpmp-reporting-service | 9003 | Java/Spring Boot | PostgreSQL | Analytics, exports, preference center |
| cpmp-crawler-service | 9004 | Java/Spring Boot | MongoDB | Selenium-based cookie and script scanning |
| data-discovery-boot | 8080 | Java/Spring Boot | PostgreSQL | Scan orchestration, integration management |
| data-discovery-scanner | Worker | Python 3 | PostgreSQL | PII detection using Presidio + regex |
| cpmp-modal | CDN | Svelte 5/TypeScript | — | Cookie consent banner widget |
Database Strategy
Neostra uses a polyglot persistence approach:
MongoDB is used by neostra-core, astra-core, and cpmp-crawler-service for flexible document storage.
Key collections include: tenants, users, roles, permissions, subject-requests, workflow-instances, assessments, assessment-templates, cookies, collection-points, privacy-centers, integrations, audit-records, and 30+ more.
All documents inherit from BaseDocument with fields: id, createdAt, updatedAt, version (optimistic locking). Multi-tenant isolation is enforced via indexed tenantId on every document.
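The version field and the tenantId scoping described above can be illustrated with a small sketch. This is not the platform's code: the services are Java/Spring Boot on MongoDB, whereas the example below is plain Python with an in-memory dictionary standing in for a collection. InMemoryCollection, StaleVersionError, and the method names are hypothetical; only the field names come from the text.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


class StaleVersionError(Exception):
    """Raised when a write carries an outdated version number."""


@dataclass(frozen=True)
class BaseDocument:
    # Field names mirror the ones listed in the text (snake_case for Python).
    id: str
    tenant_id: str
    created_at: datetime
    updated_at: datetime
    version: int = 0


class InMemoryCollection:
    """Hypothetical stand-in for a collection; every lookup is tenant-scoped."""

    def __init__(self):
        self._docs = {}  # keyed by (tenant_id, id)

    def insert(self, tenant_id, doc_id):
        now = datetime.now(timezone.utc)
        doc = BaseDocument(doc_id, tenant_id, now, now)
        self._docs[(tenant_id, doc_id)] = doc
        return doc

    def find(self, tenant_id, doc_id):
        # A document is only visible to the tenant that owns it.
        return self._docs.get((tenant_id, doc_id))

    def save(self, doc):
        stored = self._docs.get((doc.tenant_id, doc.id))
        # Optimistic locking: reject the write if someone else saved first.
        if stored is None or stored.version != doc.version:
            raise StaleVersionError(f"stale write for document {doc.id}")
        updated = replace(
            doc,
            version=doc.version + 1,
            updated_at=datetime.now(timezone.utc),
        )
        self._docs[(doc.tenant_id, doc.id)] = updated
        return updated
```

If two callers load the same document and both try to save, the first save bumps the version and the second raises StaleVersionError, so the second caller has to reload and retry instead of silently overwriting.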
PostgreSQL is used by the CPMP API, log-receipt, and reporting services and by Data Discovery for structured, immutable records.
CPMP tables: consent_receipts (one row per subject, JSONB receipt data), consent_ledger (append-only audit trail with hash chain), users, otps.
Discovery tables: sources, scans, scan_tasks, findings, data_points, attributes, integrations.
Schema migrations managed via Flyway (CPMP) and Liquibase (Discovery).
Consent Pipeline (CPMP)
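The consent management system uses an event-driven architecture for reliable, auditable consent processing: cpmp-api-service exposes the consent REST API and publishes consent events to Pub/Sub, and cpmp-log-receipt-service consumes those events and writes them to the consent ledger.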
The consent ledger maintains integrity using SHA-256 hash chains. Each entry references the previous entry's hash, creating an immutable, tamper-evident audit trail.
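A minimal sketch of a SHA-256 hash chain follows, to show why editing any past entry becomes detectable. The entry layout (prev_hash, payload, hash), the all-zero genesis value, and the canonical JSON serialization are assumptions made for illustration; the text does not specify the actual consent_ledger row format.

```python
import hashlib
import json

# Assumed starting value for the first entry's "previous hash".
GENESIS_HASH = "0" * 64


def entry_hash(prev_hash, payload):
    # Serialize deterministically so the same payload always hashes the same.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(ledger, payload):
    """Append a payload, linking it to the hash of the previous entry."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    entry = {
        "prev_hash": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    }
    ledger.append(entry)
    return entry


def verify_chain(ledger):
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the previous hash as well as the payload, changing one historical entry invalidates that entry and every entry after it, which is what makes the trail tamper-evident.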
Data Discovery Pipeline
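Data discovery uses a two-tier architecture that separates orchestration from scanning: data-discovery-boot (Java/Spring Boot) orchestrates scans and manages integrations, while data-discovery-scanner (a Python 3 worker) performs the PII detection.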
The Python scanner supports multiple data sources: PostgreSQL, MySQL, MongoDB, AWS S3, and AWS DynamoDB. Its PII detection identifies emails, phone numbers, Aadhaar numbers, PAN cards, IP addresses, UPI IDs, and more.
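The regex half of this detection can be sketched with the standard library alone. The patterns below are illustrative assumptions, not the scanner's actual rules, and the Presidio side is omitted entirely. The PAN pattern follows the common five letters, four digits, one letter layout, and the Aadhaar pattern matches twelve digits with optional spaces between groups of four. Both would need validation (for example checksums) in real use to keep false positives down.

```python
import re

# Hypothetical pattern set; the real scanner's rules are not shown in this page.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    "AADHAAR": re.compile(r"\b[2-9][0-9]{3}\s?[0-9]{4}\s?[0-9]{4}\b"),
    "IPV4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
}


def find_pii(text):
    """Return (pii_type, matched_text) pairs for every pattern hit in text."""
    findings = []
    for pii_type, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((pii_type, match.group(0)))
    return findings


if __name__ == "__main__":
    sample = "Contact asha@example.com, PAN ABCDE1234F, last login from 10.0.0.1"
    for pii_type, value in find_pii(sample):
        print(pii_type, value)
```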
DSAR Workflow
Data Subject Access Requests (DSARs) flow through a configurable workflow engine in neostra-core.
Cross-Service Communication
Services communicate through two integration patterns: synchronous REST API calls and asynchronous events over Google Cloud Pub/Sub.
Internal service calls use private key authentication. Each service holds a secret key configured via environment variables (CONSENT_API_PRIVATE_KEY, CRAWLER_SERVICE_PRIVATE_KEY, discProxyRequestPrivateKey). The calling service includes this key in request headers for verification.
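The receiving side of this check might look roughly like the sketch below. The header name X-Private-Key and the function verify_service_key are hypothetical, since the text names the environment variables but not the header. The constant-time comparison is a general good practice for comparing secrets, not something the source confirms the services do.

```python
import hmac
import os


def verify_service_key(headers, env_var, header_name="X-Private-Key"):
    """Check the key presented in a request header against the configured one.

    header_name is an assumed value; env_var would be one of the variables
    named in the text, such as CONSENT_API_PRIVATE_KEY.
    """
    expected = os.environ.get(env_var)
    presented = headers.get(header_name)
    # Reject if the service is unconfigured or the caller sent no key.
    if not expected or not presented:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

A caller would pass the incoming request headers as a mapping, for example verify_service_key(request_headers, "CONSENT_API_PRIVATE_KEY"), and return an authorization error when the result is False.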
Infrastructure
Deployment
All services are containerized with Docker and deployed on Google Kubernetes Engine (GKE). Google Cloud Build handles CI/CD pipelines. Database connections use Cloud SQL Proxy for PostgreSQL.
Storage
Neostra uses four GCS buckets: published-public-privacy-page, dpdp-public-file-uploads, neostra-attachments, and neostra-conversation. An additional AWS S3 bucket, published-privacy-page-public, is hosted in eu-north-1 for EU data sovereignty.