
Platform Architecture

This page describes Neostra's microservices architecture, service interactions, data flows, and database strategy.

Service Overview

Neostra is composed of independent microservices, each owning its domain. Services communicate via REST APIs and Google Cloud Pub/Sub for event-driven processing.

Service	Port	Language	Database	Purpose
neostra-core	9000	Java/Spring Boot	MongoDB	DSAR, workflows, assessments, privacy centers, breach management
astra-core	9010	Java/Spring Boot	MongoDB	Tenant provisioning, user management, RBAC, subscriptions
cpmp-api-service	9001	Java/Spring Boot	PostgreSQL (read)	Consent REST API, publishes to Pub/Sub
cpmp-log-receipt-service	9002	Java/Spring Boot	PostgreSQL (write)	Pub/Sub consumer, writes consent ledger
cpmp-reporting-service	9003	Java/Spring Boot	PostgreSQL	Analytics, exports, preference center
cpmp-crawler-service	9004	Java/Spring Boot	MongoDB	Selenium-based cookie and script scanning
data-discovery-boot	8080	Java/Spring Boot	PostgreSQL	Scan orchestration, integration management
data-discovery-scanner	Worker	Python 3	PostgreSQL	PII detection using Presidio + regex
cpmp-modal	CDN	Svelte 5/TypeScript	—	Cookie consent banner widget

Database Strategy

Neostra uses a polyglot persistence approach:

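MongoDB is used by neostra-core, astra-core, and cpmp-crawler-service for flexible document storage. PostgreSQL backs the consent API, ledger, and reporting services as well as the data discovery services.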

Key collections include: tenants, users, roles, permissions, subject-requests, workflow-instances, assessments, assessment-templates, cookies, collection-points, privacy-centers, integrations, audit-records, and 30+ more.

All documents inherit from BaseDocument with fields: id, createdAt, updatedAt, version (optimistic locking). Multi-tenant isolation is enforced via indexed tenantId on every document.
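The version-based optimistic locking and tenant scoping described above can be sketched as follows. This is a minimal illustration only, not Neostra's implementation: the services are Java/Spring Boot, while this sketch uses Python with an in-memory dictionary standing in for a MongoDB collection. `InMemoryCollection`, `StaleVersionError`, and `SubjectRequest` are hypothetical names introduced for the example.

```python
import uuid
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass(frozen=True)
class BaseDocument:
    """Fields named in the text, plus the tenant_id used for isolation."""
    tenant_id: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)
    version: int = 0


@dataclass(frozen=True)
class SubjectRequest(BaseDocument):
    """Hypothetical document type used only for this example."""
    status: str = "OPEN"


class StaleVersionError(Exception):
    """Raised when a write carries an outdated version number."""


class InMemoryCollection:
    """Hypothetical stand-in for a collection, keyed by (tenant_id, id)."""

    def __init__(self) -> None:
        self._docs: Dict[Tuple[str, str], BaseDocument] = {}

    def insert(self, doc: BaseDocument) -> BaseDocument:
        self._docs[(doc.tenant_id, doc.id)] = doc
        return doc

    def find(self, tenant_id: str, doc_id: str) -> Optional[BaseDocument]:
        # Every lookup is scoped by tenant, so one tenant cannot read another's data.
        return self._docs.get((tenant_id, doc_id))

    def save(self, doc: BaseDocument) -> BaseDocument:
        current = self._docs.get((doc.tenant_id, doc.id))
        # Optimistic locking: reject the write if someone else saved first.
        if current is None or current.version != doc.version:
            raise StaleVersionError(f"document {doc.id} was modified concurrently")
        updated = replace(doc, version=doc.version + 1, updated_at=_now())
        self._docs[(doc.tenant_id, doc.id)] = updated
        return updated
```

A caller reads a document, changes it, and saves it; if another writer saved in between, the stored version no longer matches and the second save is rejected instead of silently overwriting the first.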

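The consent management system uses an event-driven architecture for reliable, auditable consent processing: cpmp-api-service exposes the consent REST API and publishes events to Pub/Sub, and cpmp-log-receipt-service consumes those events and writes them to the consent ledger in PostgreSQL.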

The consent ledger maintains integrity using SHA-256 hash chains. Each entry references the previous entry's hash, creating an immutable, tamper-evident audit trail.
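The hash-chain idea can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the ledger's actual schema: the canonical JSON serialization, the all-zero genesis hash, and the names `LedgerEntry`, `append_entry`, and `verify_chain` are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

# Assumed placeholder for the "previous hash" of the very first entry.
GENESIS_HASH = "0" * 64


@dataclass(frozen=True)
class LedgerEntry:
    payload: dict
    prev_hash: str
    entry_hash: str


def compute_hash(payload: dict, prev_hash: str) -> str:
    # Serialize deterministically so the same payload always hashes the same way.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_entry(ledger: List[LedgerEntry], payload: dict) -> LedgerEntry:
    prev_hash = ledger[-1].entry_hash if ledger else GENESIS_HASH
    entry = LedgerEntry(payload, prev_hash, compute_hash(payload, prev_hash))
    ledger.append(entry)
    return entry


def verify_chain(ledger: List[LedgerEntry]) -> bool:
    """Recompute every hash and check each link to the previous entry."""
    prev_hash = GENESIS_HASH
    for entry in ledger:
        if entry.prev_hash != prev_hash:
            return False
        if entry.entry_hash != compute_hash(entry.payload, entry.prev_hash):
            return False
        prev_hash = entry.entry_hash
    return True
```

Because each hash covers the previous entry's hash, changing any historical payload invalidates that entry and breaks the link checked for every entry after it, which is what makes tampering detectable.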

Data Discovery Pipeline

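Data discovery uses a two-tier architecture separating orchestration from scanning: data-discovery-boot (Java/Spring Boot) handles scan orchestration and integration management, while data-discovery-scanner (a Python 3 worker) performs the PII detection.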

The Python scanner supports multiple data sources: PostgreSQL, MySQL, MongoDB, AWS S3, AWS DynamoDB. PII detection identifies: emails, phone numbers, Aadhaar numbers, PAN cards, IP addresses, UPI IDs, and more.
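The regex side of PII detection might look roughly like the sketch below. It does not use Presidio and is not the scanner's actual rule set. The patterns are simplified assumptions for illustration (for example, the Aadhaar pattern checks only the 12-digit shape, not the checksum), and `PII_PATTERNS` and `detect_pii` are hypothetical names. The sample values are fictional.

```python
import re
from typing import Dict, List

# Simplified, illustrative patterns; a production scanner would need stricter validation.
PII_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # PAN: five uppercase letters, four digits, one uppercase letter.
    "PAN": re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b"),
    # Aadhaar: 12 digits, optionally grouped 4-4-4 with spaces or hyphens.
    "AADHAAR": re.compile(r"\b[2-9][0-9]{3}[ -]?[0-9]{4}[ -]?[0-9]{4}\b"),
    "IPV4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])\.){3}"
        r"(?:25[0-5]|2[0-4][0-9]|1?[0-9]?[0-9])\b"
    ),
}


def detect_pii(text: str) -> Dict[str, List[str]]:
    """Return every match per PII type; types with no match are omitted."""
    findings: Dict[str, List[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[label] = matches
    return findings


sample = "Contact asha@example.com, PAN ABCDE1234F, Aadhaar 2345 6789 0123, from 192.168.1.10"
findings = detect_pii(sample)
```

Keeping the patterns in a single dictionary makes it straightforward to add further types such as phone numbers or UPI IDs, though those overlap with other patterns and need more careful rules than shown here.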

DSAR Workflow

Data Subject Access Requests (DSARs) flow through a configurable workflow engine in neostra-core.

Cross-Service Communication

Services integrate through synchronous REST calls and asynchronous Google Cloud Pub/Sub events.

Internal service calls use private key authentication. Each service holds a secret key configured via environment variables (CONSENT_API_PRIVATE_KEY, CRAWLER_SERVICE_PRIVATE_KEY, discProxyRequestPrivateKey). The calling service includes this key in request headers for verification.
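A minimal sketch of this header check, assuming the key is sent as a plain header value and compared against the receiver's environment variable. The header name `X-Internal-Private-Key` and both function names are hypothetical; the source does not say which header the services use.

```python
import hmac
import os
from typing import Dict, Mapping

# Hypothetical header name; the actual header is not documented here.
INTERNAL_KEY_HEADER = "X-Internal-Private-Key"


def build_internal_headers(env_var: str) -> Dict[str, str]:
    """Caller side: attach the shared key read from the environment."""
    return {INTERNAL_KEY_HEADER: os.environ[env_var]}


def is_authorized(headers: Mapping[str, str], env_var: str) -> bool:
    """Receiver side: compare the presented key with the configured one."""
    expected = os.environ.get(env_var, "")
    presented = headers.get(INTERNAL_KEY_HEADER, "")
    if not expected or not presented:
        # Fail closed when the key is missing on either side.
        return False
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

For example, a caller would build its headers with `build_internal_headers("CONSENT_API_PRIVATE_KEY")`, and the receiving service would reject the request unless `is_authorized(request_headers, "CONSENT_API_PRIVATE_KEY")` returns true.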

Infrastructure

Deployment

All services are containerized with Docker and deployed on Google Kubernetes Engine (GKE). Google Cloud Build handles CI/CD pipelines. Database connections use Cloud SQL Proxy for PostgreSQL.

Storage

Neostra uses four GCS buckets: published-public-privacy-page, dpdp-public-file-uploads, neostra-attachments, and neostra-conversation. An AWS S3 bucket, published-privacy-page-public, is hosted in eu-north-1 for EU data sovereignty.


Last updated 1 week ago
