Compliance Documents, Profiles & Questionnaire Intake — Design
_Status: Draft / In Review_ • _Author: Shielda Platform_ • _Last updated: 2026-04-19_
Sibling spec to COMPLIANCE.md. This doc is the authoritative plan for the compliance-docs expansion; it extends — it does not replace — the existing complianceDocuments pipeline (templates, collector, generator, exporter, clarification chat) already shipped in control-plane/src/lib/compliance-docs.
---
Goals
| # | Goal | Hard constraint |
| --- | --- | --- |
| G1 | Generate compliance documents: automatic for named certifications (SOC 2, ISO 27001, HIPAA, GDPR, PCI-DSS, NIS 2) or manual on request. | All LLM execution runs inside the customer's Go agent (BYOK). |
| G2 | Users can upload existing compliance documents (PDF / DOCX / MD / TXT) and we store them. | Bytes live on user-side storage (storageConfigs); control plane stores metadata only. |
| G3 | Org- and feature-level profiles become first-class resources (like integrations). Users can review, edit, or request auto-discovery. | Every field carries provenance (agent / user / LLM / integration / scan) and is versioned. |
| G4 | When data is missing, ask the user; every answer is persisted into the correct profile (not buried in chat). | Questions are deduped by profileField so we don't ask twice. |
| G5 | Users can upload or paste questionnaires; the platform answers what it can and asks for the rest. | Answers are plain text, stored server-side. Whitelisted file formats; reply is not an auto-generated doc unless explicitly requested. |
| G6 | Scheduled runs for compliance docs / questionnaires / profile auto-fill to batch LLM calls and save BYOK tokens. | Hard per-schedule + per-org monthly token caps; off-peak windows. |
| G7 | All information + operations executed on the user side. The control plane dashboard only displays. | LLM imports are restricted to the agent relay (CI-enforced). |
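The G6 constraint can be pictured as a pre-flight check before each scheduled run. The sketch below is only an illustration under stated assumptions: the names `TokenBudget`, `OffPeakWindow`, and `canRunSchedule` are hypothetical, the per-schedule cap is assumed to share the monthly period, and token usage is assumed to be tracked as simple counters.

```typescript
// Hypothetical shapes. The design only states that hard caps exist per schedule
// and per org (monthly), and that runs should land in off-peak windows.
interface TokenBudget {
  scheduleCap: number;    // max tokens this schedule may spend (period assumed monthly)
  scheduleUsed: number;
  orgMonthlyCap: number;  // max tokens the whole org may spend this month
  orgMonthlyUsed: number;
}

interface OffPeakWindow {
  startHour: number; // inclusive, 0-23 (UTC assumed)
  endHour: number;   // exclusive, 0-23; may be smaller than startHour (wraps midnight)
}

type BudgetDecision =
  | { allowed: true }
  | { allowed: false; reason: "outside-window" | "schedule-cap" | "org-cap" };

function inOffPeakWindow(hourUtc: number, w: OffPeakWindow): boolean {
  return w.startHour <= w.endHour
    ? hourUtc >= w.startHour && hourUtc < w.endHour
    : hourUtc >= w.startHour || hourUtc < w.endHour; // window wraps past midnight
}

function canRunSchedule(
  estimatedTokens: number,
  budget: TokenBudget,
  hourUtc: number,
  offPeak: OffPeakWindow,
): BudgetDecision {
  if (!inOffPeakWindow(hourUtc, offPeak)) {
    return { allowed: false, reason: "outside-window" };
  }
  if (budget.scheduleUsed + estimatedTokens > budget.scheduleCap) {
    return { allowed: false, reason: "schedule-cap" };
  }
  if (budget.orgMonthlyUsed + estimatedTokens > budget.orgMonthlyCap) {
    return { allowed: false, reason: "org-cap" };
  }
  return { allowed: true };
}
```

Checking the cap against an estimate before dispatch, rather than after the run, is what makes the cap "hard" in this sketch; how the estimate is produced is left open here.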
---
What Already Exists (Build-on List)
| Surface | Location |
| --- | --- |
| 22 document templates (individual / package / vendor, incl. caiq-response, sig-lite-response, custom-vendor-qa) | control-plane/src/lib/compliance-docs/templates |
| Section data collector (10 data sources) | control-plane/src/lib/compliance-docs/collector.ts |
| Doc generator + exporter (markdown / html / json) | control-plane/src/lib/compliance-docs/generator.ts |
| complianceDocuments table (sections + approval flow) | control-plane/src/db/schema/tables.ts (L2591) |
| Per-framework evaluators (SOC 2, ISO 27001, HIPAA, GDPR, PCI-DSS, NIS 2) | control-plane/src/lib/compliance/\*-evaluator.ts |
| Agent → control-plane HTTP relay pattern (Counselor) | agent/internal/grpc/httprelay.go, control-plane/src/lib/agent-relay.ts |
| Scan-task dispatch via heartbeat (scantasks table, agent pickup) | /api/agents/heartbeat |
| Pre-signed storage via storageConfigs | control-plane/src/db/schema/tables.ts (L482) |
| Cron dispatch pattern | control-plane/src/app/api/schedules |
| Clarification chat (in-memory session) | [control-plane/src/app/api/compliance-docs/[id]/chat](../control-plane/src/app/api/compliance-docs/%5Bid%5D/chat) |
---
Gap Analysis
| Need | Gap |
| --- | --- |
| G1 | Today callLLM() runs in the control plane. Needs routing through the agent relay (same pattern as Counselor). |
| G2 | No upload endpoint; no origin='uploaded' path on complianceDocuments; no link from a doc row to storageConfigs. |
| G3 | Only users.onboardingProfile (free-form JSONB) and organizations.settings (industry, companySize, region) exist. There is no structured, reviewable, versioned org / feature profile. |
| G4 | Clarification is a chat session. Answers vanish after the session closes; we ask the same question when generating the next doc. |
| G5 | Questionnaire templates exist as narrative sections, not row-level answers. No file parser, no per-row answer model, no "needs user" inbox, no export-back path. |
| G6 | No schedule type for docs / questionnaires; no token budget; no batching / model-tier awareness. |
| G7 | callLLM() is reachable from anywhere in the control plane today; there is no lint gate. |
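One way to picture the lint gate missing for G7 is a check that flags restricted imports in any file other than the agent relay. This is a rough sketch, not the planned CI rule: `LlmImportRule`, `findLlmImportViolations`, and the module specifier `@/lib/llm` used in the usage note are hypothetical, and a real gate would more likely be an ESLint rule or an AST walk than a regex.

```typescript
// Hypothetical rule config: which module specifiers count as "LLM imports"
// and which files may import them. Only the relay path comes from the doc.
interface LlmImportRule {
  restrictedModules: string[];
  allowedFiles: string[];
}

function findLlmImportViolations(
  filePath: string,
  sourceText: string,
  rule: LlmImportRule,
): string[] {
  // Files on the allowlist (e.g. the agent relay) may import anything.
  if (rule.allowedFiles.includes(filePath)) return [];

  // Matches `import ... from "x"` and bare `import "x"` statements.
  const importRe = /import\s+(?:[^"']*?\s+from\s+)?["']([^"']+)["']/g;
  const violations: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = importRe.exec(sourceText)) !== null) {
    const specifier = match[1];
    if (rule.restrictedModules.includes(specifier)) {
      violations.push(`${filePath}: forbidden import of "${specifier}"`);
    }
  }
  return violations;
}
```

With `restrictedModules: ["@/lib/llm"]` and `allowedFiles: ["control-plane/src/lib/agent-relay.ts"]`, a page file importing `@/lib/llm` yields one violation while the relay file yields none; CI would fail the build on any non-empty result.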
---
Architecture (End State)
---
Data Model
All schema changes are additive. Follow the existing Drizzle conventions in tables.ts; relations go into relations.ts; the migration file is named drizzle/0024complianceprofilesandintake.sql.
5.1 orgprofiles — one row per org
| Column | Type | Notes |
| --- | --- | --- |
| id | uuid PK | |
| orgid | uuid → organizations (unique) | Tenancy scope |
| version | int | Bumped on each snapshot replace |
| snapshot | jsonb | { industry, regionsOfOperation, employeeCount, revenueBand, regulatedData[], customerGeos[], certificationTargets[], executiveContacts[] } |
| status | text | draft \| active \| stale |
| lastautodiscoveredat | timestamp | |
| lasteditedby | uuid → users | |
| lasteditedat | timestamp | |
| createdat / updatedat | timestamp | updatedAt via trigger (existing 0015updatedattrigger.sql) |
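The "bumped on each snapshot replace" rule for version might look like the following. This is a minimal in-memory sketch, not the Drizzle code: `OrgProfileRow` covers only a subset of the columns, `replaceOrgSnapshot` is a hypothetical name, and the snapshot is typed loosely because the doc does not give per-key types.

```typescript
// Subset of the orgprofiles columns; the snapshot keys are left untyped on purpose.
interface OrgProfileRow {
  orgid: string;
  version: number;
  snapshot: Record<string, unknown>;
  status: "draft" | "active" | "stale";
  lasteditedby: string | null;
  lasteditedat: Date | null;
}

// Replaces the whole snapshot and bumps the version in one step,
// returning a new row object rather than mutating the input.
function replaceOrgSnapshot(
  row: OrgProfileRow,
  nextSnapshot: Record<string, unknown>,
  editorId: string,
  now: Date,
): OrgProfileRow {
  return {
    ...row,
    version: row.version + 1,
    snapshot: nextSnapshot,
    lasteditedby: editorId,
    lasteditedat: now,
  };
}
```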
5.2 featureprofiles — one row per (org, service?, domain)
Domain is the compliance lens: access-control, data-handling, logging-monitoring, incident-response, network-security, third-party-risk, business-continuity, change-management, physical-security, vendor-assessment. Null serviceid means org-level.
| Column | Type | Notes |
| --- | --- | --- |
| id | uuid PK | |
| orgid | uuid → organizations | |
| serviceid | uuid → services nullable | |
| domain | text | Enum above |
| snapshot | jsonb | Schema per domain (registry in lib/profiles/domains.ts) |
| source | text | auto \| user \| hybrid |
| confidence | real | Aggregate 0..1 |
| status | text | draft \| confirmed \| stale |
| discoveredfromscanid | uuid → scans nullable | Provenance for auto-fill |
| createdat / updatedat | timestamp | |
| unique (orgid, serviceid, domain) | | |
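The domain enum and the (orgid, serviceid, domain) identity can be sketched in application code as below. The names `COMPLIANCE_DOMAINS`, `isComplianceDomain`, and `featureProfileKey` are hypothetical, and the `org-level` sentinel is an illustration of "null serviceid means org-level", not a stored value.

```typescript
// The ten domains listed in this section.
const COMPLIANCE_DOMAINS = [
  "access-control",
  "data-handling",
  "logging-monitoring",
  "incident-response",
  "network-security",
  "third-party-risk",
  "business-continuity",
  "change-management",
  "physical-security",
  "vendor-assessment",
] as const;

type ComplianceDomain = (typeof COMPLIANCE_DOMAINS)[number];

// Type guard for values arriving as plain strings (e.g. from a request body).
function isComplianceDomain(value: string): value is ComplianceDomain {
  return (COMPLIANCE_DOMAINS as readonly string[]).includes(value);
}

// Builds one identity string per (org, service?, domain).
// A null serviceid maps to a fixed sentinel so org-level rows compare equal.
function featureProfileKey(
  orgid: string,
  serviceid: string | null,
  domain: ComplianceDomain,
): string {
  return [orgid, serviceid ?? "org-level", domain].join("|");
}
```

Treating null as a comparable value matters here. If the database is PostgreSQL (an assumption; this doc does not name it), a plain unique constraint treats NULLs as distinct by default, so two org-level rows for the same domain would not collide unless the constraint uses NULLS NOT DISTINCT or a separate partial index covers the null case.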
5.3 profilefields — per-field provenance & history
Gives us diff, restore, and source attribution. Every write to a profile snapshot also writes a row here.
| Column | Type | Notes |
| --- | --- | --- |
| id | uuid PK | |
| orgid | uuid | |
| profiletype | text | org \| feature |
| profileid | uuid | Polymorphic (FK enforced by trigger + app code) |
| key | text | Dotted path, e.g. accessControl.mfaEnforced |
| value | jsonb | |
| valuetype | text | string, number, boolean, enum, list, object |
| source | text | agent \| user \| llm \| integration \| scan |
| evidenceref | jsonb | { kind, id, excerpt?, url? } |
| confidence | real | |
| confirmedby | uuid → users nullable | |
| confirmedat | timestamp nullable | |
| supersededby | uuid self-ref nullable | Chain: current = null, older rows point forward |
| createdat | timestamp | |
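The supersededby chain can be illustrated with an in-memory model: writing a new value appends a row and points the previous current row at it. This is a sketch under assumptions, not the planned data-access layer; `ProfileFieldRow` is a subset of the columns, and `writeField`, `currentField`, and `fieldHistory` are hypothetical names.

```typescript
interface ProfileFieldRow {
  id: string;
  profileid: string;
  key: string; // dotted path, e.g. accessControl.mfaEnforced
  value: unknown;
  source: "agent" | "user" | "llm" | "integration" | "scan";
  supersededby: string | null; // null marks the current row
}

// Appends a new version and makes the old current row point forward to it.
function writeField(
  rows: ProfileFieldRow[],
  next: Omit<ProfileFieldRow, "supersededby">,
): ProfileFieldRow[] {
  const updated = rows.map((r) =>
    r.profileid === next.profileid && r.key === next.key && r.supersededby === null
      ? { ...r, supersededby: next.id }
      : r,
  );
  return [...updated, { ...next, supersededby: null }];
}

// The current value is the only row for the key that nothing supersedes.
function currentField(
  rows: ProfileFieldRow[],
  profileid: string,
  key: string,
): ProfileFieldRow | undefined {
  return rows.find(
    (r) => r.profileid === profileid && r.key === key && r.supersededby === null,
  );
}

// Walks the chain from the oldest row to the current one (useful for diff / restore).
function fieldHistory(
  rows: ProfileFieldRow[],
  profileid: string,
  key: string,
): ProfileFieldRow[] {
  const mine = rows.filter((r) => r.profileid === profileid && r.key === key);
  const byId = new Map(mine.map((r) => [r.id, r] as const));
  const pointedTo = new Set(mine.map((r) => r.supersededby));
  let cursor = mine.find((r) => !pointedTo.has(r.id)); // oldest: nothing points at it
  const chain: ProfileFieldRow[] = [];
  while (cursor) {
    chain.push(cursor);
    cursor = cursor.supersededby ? byId.get(cursor.supersededby) : undefined;
  }
  return chain;
}
```

Because older rows are never rewritten apart from the forward pointer, a restore in this model is just another `writeField` call that copies an older value into a new row.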
5.4 profilequestions — dedup'd question inbox
| Column | Type | Notes |
| --- | --- | --- |
| id | uuid PK | |
| orgid | uuid | |
| origintype | text | doc \| questionnaire \| autofill |
| originid | uuid nullable | docId / intakeId |
| profilefield | text | Dotted path; dedup key |
| questiontext | text | |
| options | jsonb nullable | { choices:[…], allowFreeText:bool } |
| severity | text | blocking \| optional |
| answer | jsonb nullable | |
| answeredby | uuid → users nullable | |
| answeredat | timestamp nullable | |
| status | text | open \| answered \| dismissed \| expired |
| expiresat | timestamp nullable | |
| createdat | timestamp | |
| unique (orgid, profilefield) where status='open' | | Partial unique |
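The partial unique constraint means at most one open question per (orgid, profilefield). An application-level mirror of that rule might look like the sketch below; `ProfileQuestion` is a subset of the columns and `askOnce` is a hypothetical helper, not an existing function.

```typescript
interface ProfileQuestion {
  id: string;
  orgid: string;
  profilefield: string; // dotted path, used as the dedup key
  questiontext: string;
  severity: "blocking" | "optional";
  status: "open" | "answered" | "dismissed" | "expired";
}

// Adds the question only if no open question exists for the same org and field.
// Returns the (possibly unchanged) inbox plus a flag saying whether a row was created.
function askOnce(
  inbox: ProfileQuestion[],
  question: Omit<ProfileQuestion, "status">,
): { inbox: ProfileQuestion[]; created: boolean } {
  const alreadyOpen = inbox.some(
    (q) =>
      q.orgid === question.orgid &&
      q.profilefield === question.profilefield &&
      q.status === "open",
  );
  if (alreadyOpen) return { inbox, created: false };
  return { inbox: [...inbox, { ...question, status: "open" }], created: true };
}
```

The constraint only covers open rows, so it does not by itself stop a question from being re-asked after it was answered. Under G4 the answer is persisted into the profile, so a caller would presumably check the profile field first and call `askOnce` only when the value is still missing.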
5.5 compliancedocuments — extend existing
Additive columns:
- origin ∈ generated | uploaded | imported
- sourcefile = { objectKey, contentType, bytes, checksumSha256 }
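A validation pass over the two new columns could look like this. It is a sketch under several assumptions that the doc does not confirm: `bytes` is taken to be the file size (the control plane stores metadata only), the checksum is taken to be lowercase hex, the MIME types are assumed mappings for the PDF / DOCX / MD / TXT formats in G2, and uploaded documents are assumed to require a sourcefile. `validateDocOrigin` is a hypothetical name.

```typescript
type DocOrigin = "generated" | "uploaded" | "imported";

interface SourceFileMeta {
  objectKey: string;
  contentType: string;
  bytes: number;          // assumed: size in bytes, not the content itself
  checksumSha256: string; // assumed: 64 lowercase hex characters
}

// Assumed MIME types for the PDF / DOCX / MD / TXT formats named in G2.
const ALLOWED_UPLOAD_TYPES: readonly string[] = [
  "application/pdf",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "text/markdown",
  "text/plain",
];

// Returns a list of problems; an empty list means the metadata looks consistent.
function validateDocOrigin(
  docOrigin: DocOrigin,
  sourcefile: SourceFileMeta | null,
): string[] {
  const errors: string[] = [];
  if (docOrigin === "uploaded" && sourcefile === null) {
    errors.push("uploaded documents need a sourcefile");
  }
  if (sourcefile !== null) {
    if (!ALLOWED_UPLOAD_TYPES.includes(sourcefile.contentType)) {
      errors.push(`content type not allowed: ${sourcefile.contentType}`);
    }
    if (!Number.isInteger(sourcefile.bytes) || sourcefile.bytes <= 0) {
      errors.push("bytes must be a positive integer");
    }
    if (!/^[0-9a-f]{64}$/.test(sourcefile.checksumSha256)) {
      errors.push("checksumSha256 must be 64 lowercase hex characters");
    }
  }
  return errors;
}
```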
5.6 questionnaireintakes & questionnaireitems
questionnaireitems:

- id
- intakeid
- rowindex
- section?
- controlid?
- questiontext
- expectedformat ('yesno' | 'text' | 'evidencelink' | 'rating')
- answertext?
- answerevidence jsonb? (shape: [{kind,id,excerpt}])
- confidence real
- needsuser bool
- answeredat?
- answeredby ('agent' | 'user:\<uuid\>')
- profilefieldrefs text[]
- status ('pending' | 'answered' | 'skipped' | 'needsuser')
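The per-row answer model and the "needs user" inbox could interact roughly as sketched below. The confidence threshold of 0.7, the choice to keep a low-confidence answer as a draft, and the names `applyAgentAnswer` and `needsUserInbox` are all assumptions made for illustration; the doc does not specify how an item moves to needsuser.

```typescript
type ItemStatus = "pending" | "answered" | "skipped" | "needsuser";

// Subset of the questionnaireitems fields.
interface QuestionnaireItem {
  id: string;
  rowindex: number;
  questiontext: string;
  answertext: string | null;
  confidence: number | null;
  needsuser: boolean;
  answeredby: string | null; // "agent" or "user:<uuid>"
  status: ItemStatus;
}

// Records an agent-produced answer. Below the (assumed) threshold the text is
// kept as a draft and the row is routed to the user instead of marked answered.
function applyAgentAnswer(
  item: QuestionnaireItem,
  answertext: string,
  confidence: number,
  minConfidence = 0.7,
): QuestionnaireItem {
  const confident = confidence >= minConfidence;
  return {
    ...item,
    answertext,
    confidence,
    needsuser: !confident,
    answeredby: confident ? "agent" : null,
    status: confident ? "answered" : "needsuser",
  };
}

// Rows the user still has to handle, in questionnaire order.
function needsUserInbox(items: QuestionnaireItem[]): QuestionnaireItem[] {
  return items
    .filter((i) => i.status === "needsuser")
    .sort((a, b) => a.rowindex - b.rowindex);
}
```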
5.7 complianceschedules
---
Agent Execution Model
All LLM calls for this feature must execute inside the Go agent to satisfy G7.
6.1 New agent packages
All parsers run on the agent — the control plane never sees raw uploads. Bytes land in storageConfigs (user-side); only normalized rows / metadata return to the control plane.
6.2 gRPC additions
In proto/shielda/v1/agent.proto:
HTTP relay fallback mirrors the Counselor pattern for customers without gRPC connectivity.