Purpose
Provide a single, repeatable operational model for running district integrations with consistent auth, job execution, logging, and audit. This playbook documents how to operate the platform as it evolves from prototype to a stable integration kernel.
When to use this playbook
- Before running or re-running connector jobs in production.
- When onboarding a new integration connector or data contract.
- When triaging failed runs, retries, or audit log gaps.
Signals to stop or escalate
- Diff outputs exceed expected thresholds or include mass removals.
- Auth or credential failures block access to source or vendor systems.
- Schema drift or mapping mismatches invalidate the run.
Current maturity
- Platform status: prototyping.
- Many operational steps are provisional and should be confirmed as the kernel solidifies.
Audience and access
- Primary operators: integration owner and platform maintainer.
- Secondary reviewers: IT leadership and future maintainers.
- Required access: platform admin credentials, database access (TBD), connector credentials (per integration).
Platform kernel goals (what must exist before production)
- Job runner with retry, scheduling, and status reporting.
- Connector interface contract (inputs, outputs, error handling); a sketch follows this list.
- Audit log model with immutable run summaries.
- Admin UI for visibility into runs and errors.
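The interface contract itself is still TBD. As a working sketch only, with `Connector`, `Diff`, and `RunResult` as placeholder names and shapes rather than a settled API, the contract might look like:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Diff:
    """Planned changes computed before any write to the vendor."""
    creates: list[dict] = field(default_factory=list)
    updates: list[dict] = field(default_factory=list)
    removes: list[dict] = field(default_factory=list)


@dataclass
class RunResult:
    """Immutable summary persisted to the audit log."""
    planned: int
    applied: int
    failed: int
    errors: list[str] = field(default_factory=list)


class Connector(ABC):
    """Contract every integration connector must satisfy."""

    @abstractmethod
    def extract(self) -> list[dict]:
        """Pull raw records from the source system (SIS/HR feed)."""

    @abstractmethod
    def normalize(self, records: list[dict]) -> list[dict]:
        """Apply mapping tables and normalization rules."""

    @abstractmethod
    def diff(self, desired: list[dict]) -> Diff:
        """Compare desired state against current vendor state."""

    @abstractmethod
    def apply(self, diff: Diff, dry_run: bool = True) -> RunResult:
        """Apply the diff idempotently; dry_run validates without writing."""
```

Keeping `dry_run` in the contract makes the staging step in the lifecycle below a first-class mode rather than an afterthought.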
Patterns in use
Connector lifecycle (standard operating pattern)
- Define the data contract
  - Source fields, transformation rules, and target schema.
  - Mapping tables for IDs and vendor-specific values.
- Implement connector logic (diff and dry-run apply sketched after this list)
  - Extract source data, normalize, and compute diff.
  - Apply changes with idempotent safeguards.
- Configure credentials and secrets
  - Store vendor tokens securely (TBD: secret storage approach).
- Run a dry-run / staging mode
  - Validate diffs without applying.
- Execute production runs
  - Monitor logs and audit summaries.
- Review outcomes and tune
  - Adjust mapping rules and retry policies based on failure data.
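A minimal sketch of the diff and apply steps above, reusing the `Diff` and `RunResult` types from the interface sketch and assuming a hypothetical vendor client with `upsert`/`remove` methods (all placeholders). The removal threshold ties into the mass-removal escalation signal:

```python
def compute_diff(desired: dict[str, dict], current: dict[str, dict]) -> Diff:
    """Bucket records keyed by stable ID into creates, updates, and removes."""
    diff = Diff()
    for rid, record in desired.items():
        if rid not in current:
            diff.creates.append(record)
        elif record != current[rid]:
            diff.updates.append(record)
    diff.removes = [rec for rid, rec in current.items() if rid not in desired]
    return diff


def apply_diff(vendor, diff: Diff, dry_run: bool = True,
               max_removals: int = 50) -> RunResult:
    """Dry-run reports the plan only; real runs upsert/remove idempotently."""
    planned = len(diff.creates) + len(diff.updates) + len(diff.removes)
    if len(diff.removes) > max_removals:
        # Mass removals are an escalation signal; stop before applying anything.
        raise RuntimeError(f"{len(diff.removes)} removals exceed {max_removals}")
    if dry_run:
        return RunResult(planned=planned, applied=0, failed=0)
    applied, failed, errors = 0, 0, []
    for record in diff.creates + diff.updates:
        try:
            vendor.upsert(record)  # upsert keeps re-runs and retries idempotent
            applied += 1
        except Exception as exc:
            failed += 1
            errors.append(str(exc))
    for record in diff.removes:
        try:
            vendor.remove(record["id"])
            applied += 1
        except Exception as exc:
            failed += 1
            errors.append(str(exc))
    return RunResult(planned=planned, applied=applied, failed=failed, errors=errors)
```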
Operational workflow (current and near-term)
- Trigger runs manually via the admin UI (TBD) or CLI (TBD); a hypothetical CLI sketch follows this list.
- Validate job status, error counts, and run duration.
- Collect audit reports and attach to integration records.
- Report notable changes and failures in the Decision Log.
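Since both trigger mechanisms are TBD, the following is only one possibility: a minimal manual-trigger CLI whose `run-connector` name and flags are invented for illustration:

```python
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(
        prog="run-connector",
        description="Trigger a connector run (hypothetical CLI; tooling is TBD).",
    )
    parser.add_argument("connector",
                        help="Registered connector name, e.g. 'sis-to-vendor'")
    parser.add_argument("--apply", action="store_true",
                        help="Apply changes; default is dry-run (diff only)")
    parser.add_argument("--max-removals", type=int, default=50,
                        help="Abort if the diff plans more removals than this")
    args = parser.parse_args()
    # Placeholder: look up the connector, run extract/normalize/diff/apply,
    # and persist the RunResult to the audit log.
    print(f"{args.connector}: dry_run={not args.apply}, "
          f"max_removals={args.max_removals}")


if __name__ == "__main__":
    main()
```

Defaulting to dry-run mirrors the lifecycle above: applying changes requires an explicit flag.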
Inputs
- SIS/HR source data feeds.
- Vendor API endpoints and credentials.
- Mapping tables and normalization rules.
Outputs
- Vendor state changes.
- Audit logs and run summaries.
- Admin UI visibility and job status data.
Monitoring and observability
- Required: per-run summary counts (planned/applied/failed); see the metrics sketch after this list.
- Required: error rate and top failure categories.
- Recommended: time per phase and job latency.
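One way to derive these metrics from a run, reusing the `RunResult` sketch above; the failure categories ("auth", "throttle", "other") are illustrative, not a settled taxonomy:

```python
from collections import Counter


def summarize(result: RunResult) -> dict:
    """Derive the required per-run metrics from a RunResult."""
    error_rate = result.failed / result.planned if result.planned else 0.0
    # Coarse bucketing by matching status codes in error strings;
    # real categorization rules are TBD.
    categories = Counter(
        "auth" if ("401" in err or "403" in err)
        else "throttle" if "429" in err
        else "other"
        for err in result.errors
    )
    return {
        "planned": result.planned,
        "applied": result.applied,
        "failed": result.failed,
        "error_rate": error_rate,
        "top_failures": categories.most_common(3),
    }
```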
Failure modes and recovery
- Schema drift: block apply and surface validation errors.
- Vendor API throttling: apply backoff and retry policies (backoff sketched after this list).
- Partial runs: enable checkpointing and resume support.
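A minimal exponential-backoff-with-jitter sketch for the throttling case. `ThrottledError` stands in for whatever rate-limit exception the vendor client raises, and the attempt counts and delays are illustrative rather than tuned policy:

```python
import random
import time


class ThrottledError(Exception):
    """Placeholder for the vendor client's rate-limit exception."""


def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry a throttled call with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # out of attempts: escalate per the failure-mode policy
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            time.sleep(delay)
```

Usage would wrap individual vendor calls, e.g. `with_backoff(lambda: vendor.upsert(record))`.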
Security and privacy
- The platform centralizes PII processing; enforce least privilege per connector.
- Document data retention and redaction standards (TBD).
- Require explicit access reviews for platform admins.
Change management
- Treat connector changes as versioned releases.
- Add runbook updates alongside connector updates.
- Log architecture decisions in the Decision Log.
Open questions and TBD items
- Job runner tech and scheduling strategy.
- Secret storage and credential rotation process.
- Staging vs production environment separation.
- Minimum viable admin UI for operators.