
Data Protocol Reference

The Data Protocol models warehouse tables, streams, and files with governance metadata and lifecycle automation hooks.

API Surface

import { createDataProtocol, createDataCatalog, registerValidator } from '@cpms/data';
| Method | Description |
| --- | --- |
| createDataProtocol(manifest) | Normalizes a manifest and returns an immutable protocol instance. |
| protocol.validate(names?) | Runs the named validators from the registry, or every registered validator when names is omitted. |
| protocol.match(expr) | Small path:operator:value query language for selecting subsets (e.g. schema.fields:=:email, lineage.consumers:contains:external). |
| protocol.diff(next) | Computes a structural and semantic diff against next, identifying additions, removals, and changes. |
| protocol.generateMigration(next) | Produces SQL-style migration hints for moving to next. |
| protocol.generateDocs() | Emits Markdown reference docs. |
| protocol.generateSchema() | Emits a JSON Schema suitable for the Monaco editor and other tooling. |
| protocol.set(path, value) | Returns a new protocol with the update applied; the original instance is left unchanged. |
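protocol.set() is described as an immutable update. A minimal sketch of that idea, assuming plain JSON-like manifests and dot-separated paths, copies only the objects along the path and shares every other branch with the original. setPath, setKeys, and Json are hypothetical names for illustration, not part of @cpms/data.

```typescript
// Hypothetical sketch of an immutable update in the spirit of
// protocol.set(path, value); these names are illustrative, not the library API.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(node: Json | undefined): node is JsonObject {
  return typeof node === "object" && node !== null && !Array.isArray(node);
}

// Shallow-copies each object along the path and reuses every other branch
// (structural sharing). The input is never mutated.
function setKeys(node: Json | undefined, keys: readonly string[], value: Json): Json {
  const [head, ...rest] = keys;
  if (head === undefined) return value; // path exhausted: write the value here
  // Assumption: missing or non-object nodes on the path are replaced by objects.
  const base: JsonObject = isObject(node) ? node : {};
  return { ...base, [head]: setKeys(base[head], rest, value) };
}

function setPath(root: Json, path: string, value: Json): Json {
  return setKeys(root, path.split("."), value);
}
```

With this design an untouched section such as schema keeps the same object identity in the old and new manifest, so updates stay cheap and unchanged branches are easy to recognize.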

Manifest Fields

  • dataset – Identity, type, lifecycle state, owners, tags.
  • schema – Field dictionary plus primary_key, keys.unique, keys.partition.
  • lineage – Sources + consumers with URN references.
  • operations – Refresh cadence, expected-by SLAs, retention windows.
  • governance – Policy classification, legal basis, residency.
  • quality – Freshness timestamp, row-count estimates, null-rate instrumentation.
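The sections above can be summarized as a TypeScript type. This is a sketch that takes property names from the example manifest on this page; owners, tags, the shape of lineage entries, keys.unique, and the contents of operations and quality are not spelled out in this reference, so they are typed loosely and marked as assumed. DataManifest and FieldSpec are hypothetical names.

```typescript
// Hypothetical manifest type. Names marked "assumed" are guesses; the rest are
// copied from the example manifest (datasets/user_events.json).
interface FieldSpec {
  type: string;
  required?: boolean;
  pii?: boolean;
}

interface DataManifest {
  protocol: "data";
  dataset: {
    name: string;
    type: string;
    lifecycle: { status: string };
    owners?: string[]; // assumed
    tags?: string[]; // assumed
  };
  schema: {
    primary_key?: string;
    fields: Record<string, FieldSpec>;
    keys?: {
      unique?: unknown; // shape not documented
      partition?: { field: string; type: string };
    };
  };
  lineage?: { sources?: unknown[]; consumers?: unknown[] }; // URN-based entries; shape assumed
  operations?: Record<string, unknown>; // refresh cadence, SLAs, retention
  governance?: {
    policy?: { classification?: string; legal_basis?: string };
    storage_residency?: { region?: string; encrypted_at_rest?: boolean };
  };
  quality?: Record<string, unknown>; // freshness, row counts, null rates
}

// The example manifest from this page, checked against the type at compile time.
const userEvents: DataManifest = {
  protocol: "data",
  dataset: { name: "user_events", type: "fact-table", lifecycle: { status: "active" } },
  schema: {
    primary_key: "event_id",
    fields: {
      event_id: { type: "string", required: true },
      user_id: { type: "string", required: true },
      email: { type: "string", pii: true },
    },
    keys: { partition: { field: "event_date", type: "daily" } },
  },
  governance: {
    policy: { classification: "pii", legal_basis: "gdpr" },
    storage_residency: { region: "eu-west-1", encrypted_at_rest: true },
  },
};
```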

Built-In Validators

| Name | Purpose |
| --- | --- |
| core.shape | Ensures the dataset name and schema fields exist and the lifecycle status is valid. |
| schema.keys | Validates that declared primary keys exist among the schema fields. |
| governance.pii_policy | Warns when PII fields lack a PII classification or encryption. |
| operations.refresh | Validates that the refresh schedule uses an allowed enumeration value. |

Extend the registry with registerValidator('team.rule', fn) to enforce domain-specific checks.
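A registry of this kind can be sketched as a table from dotted names to functions that return issues. The Issue and Manifest shapes, runValidators, and the team.naming rule below are hypothetical; only the registerValidator(name, fn) calling convention and the intent of schema.keys come from this reference.

```typescript
// Hypothetical validator registry; Issue, Manifest, and runValidators are
// illustrative assumptions, not the @cpms/data API.
interface Issue {
  validator: string;
  level: "error" | "warning";
  message: string;
}

interface Manifest {
  dataset?: { name?: string };
  schema?: { primary_key?: string; fields?: Record<string, unknown> };
}

type Validator = (m: Manifest) => Issue[];

const registry: Record<string, Validator | undefined> = {};

function registerValidator(id: string, fn: Validator): void {
  registry[id] = fn;
}

// Runs the listed validators, or every registered one when ids is omitted.
function runValidators(m: Manifest, ids?: string[]): Issue[] {
  const issues: Issue[] = [];
  for (const id of ids ?? Object.keys(registry)) {
    const fn = registry[id];
    if (!fn) throw new Error(`unknown validator: ${id}`);
    issues.push(...fn(m));
  }
  return issues;
}

// A rule in the spirit of schema.keys: the primary key must be a declared field.
registerValidator("schema.keys", (m) => {
  const pk = m.schema?.primary_key;
  const fields = m.schema?.fields ?? {};
  if (pk === undefined || pk in fields) return [];
  return [{ validator: "schema.keys", level: "error", message: `primary key "${pk}" is not a declared field` }];
});

// A hypothetical team rule: dataset names must be snake_case.
registerValidator("team.naming", (m) => {
  const dsName = m.dataset?.name ?? "";
  if (/^[a-z][a-z0-9_]*$/.test(dsName)) return [];
  return [{ validator: "team.naming", level: "warning", message: `dataset name "${dsName}" is not snake_case` }];
});
```

Omitting the second argument of runValidators mirrors protocol.validate(names?), which runs every registered validator by default.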

Catalog Helpers

const catalog = createDataCatalog(manifests.map((m) => createDataProtocol(m)));
const governanceAlerts = catalog.piiEgressWarnings(); // PII flowing to external consumers
const cycles = catalog.detectCycles(); // lineage loops

detectCycles() surfaces lineage loops, while piiEgressWarnings() highlights when PII flows to external consumers.
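Both catalog checks reduce to simple graph passes. The sketch below assumes each catalog entry exposes its upstream source URNs, its consumers with an external flag, and a precomputed hasPii flag; CatalogEntry and both function bodies are hypothetical, not the createDataCatalog implementation. Cycle detection uses a depth-first search that records a loop whenever it reaches a dataset still on the current path. This reliably reports whether loops exist, but it does not enumerate every distinct cycle.

```typescript
// Hypothetical catalog sketch; CatalogEntry and both functions are
// illustrative assumptions, not the @cpms/data API.
interface CatalogEntry {
  urn: string;
  sources: string[]; // URNs of upstream datasets
  consumers: { urn: string; external: boolean }[];
  hasPii: boolean; // true if any schema field is flagged as PII
}

// Depth-first search over upstream edges. Reaching a dataset that is still on
// the current path ("gray") closes a loop, recorded as a list of URNs.
function detectCycles(entries: CatalogEntry[]): string[][] {
  const upstream: Record<string, string[] | undefined> = {};
  for (const e of entries) upstream[e.urn] = e.sources;
  const state: Record<string, "gray" | "black" | undefined> = {};
  const path: string[] = [];
  const cycles: string[][] = [];

  const visit = (urn: string): void => {
    state[urn] = "gray";
    path.push(urn);
    for (const next of upstream[urn] ?? []) {
      if (state[next] === "gray") cycles.push(path.slice(path.indexOf(next)).concat(next));
      else if (state[next] === undefined) visit(next);
    }
    path.pop();
    state[urn] = "black";
  };

  for (const e of entries) if (state[e.urn] === undefined) visit(e.urn);
  return cycles;
}

// Flags every external consumer of a dataset that carries PII.
function piiEgressWarnings(entries: CatalogEntry[]): string[] {
  const warnings: string[] = [];
  for (const e of entries) {
    if (!e.hasPii) continue;
    for (const c of e.consumers) {
      if (c.external) warnings.push(`${e.urn} sends PII to external consumer ${c.urn}`);
    }
  }
  return warnings;
}
```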

Example Manifest

datasets/user_events.json
{
  "protocol": "data",
  "dataset": {
    "name": "user_events",
    "type": "fact-table",
    "lifecycle": { "status": "active" }
  },
  "schema": {
    "primary_key": "event_id",
    "fields": {
      "event_id": { "type": "string", "required": true },
      "user_id": { "type": "string", "required": true },
      "email": { "type": "string", "pii": true }
    },
    "keys": {
      "partition": { "field": "event_date", "type": "daily" }
    }
  },
  "governance": {
    "policy": { "classification": "pii", "legal_basis": "gdpr" },
    "storage_residency": { "region": "eu-west-1", "encrypted_at_rest": true }
  }
}

Try this manifest in the playground to see the validator output inline.
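Assuming governance.pii_policy behaves as its description suggests, this manifest should pass it: email is marked pii, the policy classification is pii, and storage is encrypted at rest. The hypothetical piiPolicyWarnings below approximates that rule so the outcome can be reasoned about outside the playground; requiring the exact classification value "pii" is an assumption read off the example, not the validator's documented logic.

```typescript
// Hypothetical approximation of governance.pii_policy: if any field is marked
// pii, require classification "pii" (assumed) and encryption at rest.
interface PiiManifest {
  schema: { fields: Record<string, { type?: string; pii?: boolean } | undefined> };
  governance?: {
    policy?: { classification?: string };
    storage_residency?: { encrypted_at_rest?: boolean };
  };
}

function piiPolicyWarnings(m: PiiManifest): string[] {
  const piiFields = Object.keys(m.schema.fields).filter((f) => m.schema.fields[f]?.pii === true);
  if (piiFields.length === 0) return []; // nothing to govern
  const warnings: string[] = [];
  if (m.governance?.policy?.classification !== "pii") {
    warnings.push(`PII fields (${piiFields.join(", ")}) lack a pii classification`);
  }
  if (m.governance?.storage_residency?.encrypted_at_rest !== true) {
    warnings.push(`PII fields (${piiFields.join(", ")}) are not encrypted at rest`);
  }
  return warnings;
}
```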