Purpose
Provide a single, repeatable operational model for running district integrations with consistent auth, job execution, logging, and audit. This playbook documents how to operate the platform as it evolves from prototype to a stable integration kernel.
When to use this playbook
- Before running or re-running connector jobs in production.
- When onboarding a new integration connector or data contract.
- When triaging failed runs, retries, or audit log gaps.
Signals to stop or escalate
- Diff outputs exceed expected thresholds or include mass removals.
- Auth or credential failures block access to source or vendor systems.
- Schema drift or mapping mismatches invalidate the run.
Current maturity
- Platform status: prototyping.
- Many operational steps are provisional and should be confirmed as the kernel solidifies.
Audience and access
- Primary operators: integration owner and platform maintainer.
- Secondary reviewers: IT leadership and future maintainers.
- Required access: platform admin credentials, database access (TBD), connector credentials (per integration).
Platform kernel goals (what must exist before production)
- Job runner with retry, scheduling, and status reporting.
- Connector interface contract (inputs, outputs, error handling); a sketch follows this list.
- Audit log model with immutable run summaries.
- Admin UI for visibility into runs and errors.
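The interface contract itself is still TBD. As a working sketch only, with `Connector`, `Diff`, and `RunResult` as placeholder names and shapes rather than a settled API, the contract might look like:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class Diff:
    """Planned changes computed before any write to the vendor."""
    creates: list[dict] = field(default_factory=list)
    updates: list[dict] = field(default_factory=list)
    removes: list[dict] = field(default_factory=list)


@dataclass
class RunResult:
    """Immutable summary persisted to the audit log."""
    planned: int
    applied: int
    failed: int
    errors: list[str] = field(default_factory=list)


class Connector(ABC):
    """Contract every integration connector must satisfy."""

    @abstractmethod
    def extract(self) -> list[dict]:
        """Pull raw records from the source system (SIS/HR feed)."""

    @abstractmethod
    def normalize(self, records: list[dict]) -> list[dict]:
        """Apply mapping tables and normalization rules."""

    @abstractmethod
    def diff(self, desired: list[dict]) -> Diff:
        """Compare desired state against current vendor state."""

    @abstractmethod
    def apply(self, diff: Diff, dry_run: bool = True) -> RunResult:
        """Apply the diff idempotently; dry_run validates without writing."""
```

Keeping `dry_run` in the contract makes the staging step in the lifecycle below a first-class mode rather than an afterthought.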
Patterns in use
Connector lifecycle (standard operating pattern)
- Define the data contract
  - Source fields, transformation rules, and target schema.
  - Mapping tables for IDs and vendor-specific values.
- Implement connector logic (diff and dry-run apply sketched after this list)
  - Extract source data, normalize, and compute diff.
  - Apply changes with idempotent safeguards.
- Configure credentials and secrets
  - Store vendor tokens securely (TBD: secret storage approach).
- Run a dry-run / staging mode
  - Validate diffs without applying.
- Execute production runs
  - Monitor logs and audit summaries.
- Review outcomes and tune
  - Adjust mapping rules and retry policies based on failure data.
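A minimal sketch of the diff and apply steps above, reusing the `Diff` and `RunResult` types from the interface sketch and assuming a hypothetical vendor client with `upsert`/`remove` methods (all placeholders). The removal threshold ties into the mass-removal escalation signal:

```python
def compute_diff(desired: dict[str, dict], current: dict[str, dict]) -> Diff:
    """Bucket records keyed by stable ID into creates, updates, and removes."""
    diff = Diff()
    for rid, record in desired.items():
        if rid not in current:
            diff.creates.append(record)
        elif record != current[rid]:
            diff.updates.append(record)
    diff.removes = [rec for rid, rec in current.items() if rid not in desired]
    return diff


def apply_diff(vendor, diff: Diff, dry_run: bool = True,
               max_removals: int = 50) -> RunResult:
    """Dry-run reports the plan only; real runs upsert/remove idempotently."""
    planned = len(diff.creates) + len(diff.updates) + len(diff.removes)
    if len(diff.removes) > max_removals:
        # Mass removals are an escalation signal; stop before applying anything.
        raise RuntimeError(f"{len(diff.removes)} removals exceed {max_removals}")
    if dry_run:
        return RunResult(planned=planned, applied=0, failed=0)
    applied, failed, errors = 0, 0, []
    for record in diff.creates + diff.updates:
        try:
            vendor.upsert(record)  # upsert keeps re-runs and retries idempotent
            applied += 1
        except Exception as exc:
            failed += 1
            errors.append(str(exc))
    for record in diff.removes:
        try:
            vendor.remove(record["id"])
            applied += 1
        except Exception as exc:
            failed += 1
            errors.append(str(exc))
    return RunResult(planned=planned, applied=applied, failed=failed, errors=errors)
```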
Operational workflow (current and near-term)
- Trigger runs manually via the admin UI (TBD) or CLI (TBD); a hypothetical CLI sketch follows this list.
- Validate job status, error counts, and run duration.
- Collect audit reports and attach to integration records.
- Report notable changes and failures in the Decision Log.
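Since both trigger mechanisms are TBD, the following is only one possibility: a minimal manual-trigger CLI whose `run-connector` name and flags are invented for illustration:

```python
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(
        prog="run-connector",
        description="Trigger a connector run (hypothetical CLI; tooling is TBD).",
    )
    parser.add_argument("connector",
                        help="Registered connector name, e.g. 'sis-to-vendor'")
    parser.add_argument("--apply", action="store_true",
                        help="Apply changes; default is dry-run (diff only)")
    parser.add_argument("--max-removals", type=int, default=50,
                        help="Abort if the diff plans more removals than this")
    args = parser.parse_args()
    # Placeholder: look up the connector, run extract/normalize/diff/apply,
    # and persist the RunResult to the audit log.
    print(f"{args.connector}: dry_run={not args.apply}, "
          f"max_removals={args.max_removals}")


if __name__ == "__main__":
    main()
```

Defaulting to dry-run mirrors the lifecycle above: applying changes requires an explicit flag.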
Inputs
- SIS/HR source data feeds.
- Vendor API endpoints and credentials.
- Mapping tables and normalization rules.
Outputs
- Vendor state changes.
- Audit logs and run summaries.
- Admin UI visibility and job status data.
Monitoring and observability
- Required: per-run summary counts (planned/applied/failed); see the metrics sketch after this list.
- Required: error rate and top failure categories.
- Recommended: time per phase and job latency.
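One way to derive these metrics from a run, reusing the `RunResult` sketch above; the failure categories ("auth", "throttle", "other") are illustrative, not a settled taxonomy:

```python
from collections import Counter


def summarize(result: RunResult) -> dict:
    """Derive the required per-run metrics from a RunResult."""
    error_rate = result.failed / result.planned if result.planned else 0.0
    # Coarse bucketing by matching status codes in error strings;
    # real categorization rules are TBD.
    categories = Counter(
        "auth" if ("401" in err or "403" in err)
        else "throttle" if "429" in err
        else "other"
        for err in result.errors
    )
    return {
        "planned": result.planned,
        "applied": result.applied,
        "failed": result.failed,
        "error_rate": error_rate,
        "top_failures": categories.most_common(3),
    }
```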
Failure modes and recovery
- Schema drift: block apply and surface validation errors.
- Vendor API throttling: apply backoff and retry policies (backoff sketched after this list).
- Partial runs: enable checkpointing and resume support.
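A minimal exponential-backoff-with-jitter sketch for the throttling case. `ThrottledError` stands in for whatever rate-limit exception the vendor client raises, and the attempt counts and delays are illustrative rather than tuned policy:

```python
import random
import time


class ThrottledError(Exception):
    """Placeholder for the vendor client's rate-limit exception."""


def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0):
    """Retry a throttled call with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ThrottledError:
            if attempt == max_attempts:
                raise  # out of attempts: escalate per the failure-mode policy
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
            time.sleep(delay)
```

Usage would wrap individual vendor calls, e.g. `with_backoff(lambda: vendor.upsert(record))`.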
Security and privacy
- The platform centralizes PII processing; enforce least privilege per connector.
- Document data retention and redaction standards (TBD).
- Require explicit access reviews for platform admins.
Change management
- Treat connector changes as versioned releases.
- Add runbook updates alongside connector updates.
- Log architecture decisions in the Decision Log.
Open questions and TBD items
- Job runner tech and scheduling strategy.
- Secret storage and credential rotation process.
- Staging vs production environment separation.
- Minimum viable admin UI for operators.