Atlas project concept

Email Corpus Ingestion Pipeline (EML)

Transform raw email archives into a structured, searchable knowledge system with strict privacy controls. The pipeline ingests an export of EML files, optionally with attachment metadata, and produces a structured email dataset, a search/index layer, and an assistant-ready knowledge base.

Internal-only entry. Do not publish externally without review.
Type
Experiment
Lifecycle
Idea
Last touched
2024-06-19 (concept surfaced)
Visibility
Internal

Purpose

Transform raw email archives into a structured, searchable knowledge system with strict privacy controls.

Current state

Intent captured; implementation is TBD pending governance decisions.

Next step

Write the data governance spec before any technical implementation.

Interfaces

Inputs
  • EML file export
  • attachment metadata (optional)
Outputs
  • structured email dataset
  • search/index layer
  • assistant-ready knowledge base
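
As a rough illustration of the input-to-output mapping above, the sketch below turns one raw EML message into a structured record using only Python's standard email package. The function name parse_eml and the record field names are assumptions made for illustration; the real schema is still TBD and depends on the governance decisions.

```python
from email import policy
from email.parser import BytesParser
from email.utils import getaddresses


def _header(msg, name):
    """Return a header as a plain string, or None when it is absent."""
    value = msg[name]
    return str(value) if value is not None else None


def parse_eml(raw: bytes) -> dict:
    """Parse one raw EML message into a flat record (hypothetical schema)."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)

    # Prefer the plain-text body; HTML-only messages yield an empty body here.
    body = msg.get_body(preferencelist=("plain",))

    # Attachment metadata only: the payload itself is never copied.
    attachments = [
        {"filename": part.get_filename(), "content_type": part.get_content_type()}
        for part in msg.iter_attachments()
    ]

    return {
        "message_id": _header(msg, "Message-ID"),
        "in_reply_to": _header(msg, "In-Reply-To"),
        "references": (_header(msg, "References") or "").split(),
        "from": [addr for _, addr in getaddresses(msg.get_all("From", []))],
        "to": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "subject": _header(msg, "Subject"),
        "date": _header(msg, "Date"),
        "body_text": body.get_content() if body is not None else "",
        "attachments": attachments,
    }
```

Keeping attachments as metadata only matches the optional "attachment metadata" input and avoids storing file contents before the governance spec says whether that is allowed.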

Reality to Action trace

Reality Ingestion

Contributes in this stage.

Canonical Storage

Contributes in this stage.

Automation Engines

Contributes in this stage.

Human Interfaces

Not in scope.

Operational Adoption

Not in scope.

Core workflow

TBD. Document the 5-10 steps that define the core workflow.

Operational notes

Reliability posture

Must tolerate messy or malformed message formats, deduplicate repeated messages, and reconstruct conversation threads.
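
One possible shape for the dedupe and threading requirement is sketched below. It assumes records are dicts with message_id and in_reply_to fields (hypothetical names), deduplicates on Message-ID with a content-hash fallback, and groups threads by walking In-Reply-To links to a root. This is a simplified heuristic, not a full threading algorithm, and the eventual approach is undecided.

```python
import hashlib
from collections import defaultdict


def dedupe_key(record: dict) -> str:
    """Prefer Message-ID; fall back to a hash of stable fields when it is missing."""
    message_id = (record.get("message_id") or "").strip()
    if message_id:
        return message_id
    basis = "\x1f".join(
        str(record.get(k) or "") for k in ("from", "date", "subject", "body_text")
    )
    return "sha256:" + hashlib.sha256(basis.encode("utf-8")).hexdigest()


def dedupe(records):
    """Keep the first record seen for each key, preserving input order."""
    first_seen = {}
    for record in records:
        first_seen.setdefault(dedupe_key(record), record)
    return list(first_seen.values())


def thread_root(start_id, by_id):
    """Follow In-Reply-To links upward until no further parent is known."""
    current_id = start_id
    visited = set()
    while current_id not in visited:  # guard so reply cycles still terminate
        visited.add(current_id)
        current = by_id.get(current_id)
        if current is None:
            return current_id  # parent referenced but absent from the corpus
        parent_id = current.get("in_reply_to")
        if not parent_id:
            return current_id  # reached a message that replies to nothing
        current_id = parent_id
    return current_id


def group_threads(records):
    """Map each thread root ID to the records that belong to that thread."""
    by_id = {r["message_id"]: r for r in records if r.get("message_id")}
    threads = defaultdict(list)
    for record in records:
        if record.get("message_id"):
            root = thread_root(record["message_id"], by_id)
        else:
            root = dedupe_key(record)  # no ID: treat as its own thread
        threads[root].append(record)
    return dict(threads)
```

Replies whose parent is missing from the export are grouped under the missing parent's ID, so siblings of a lost message still land in one thread.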

Observability

  • ingest stats
  • parse error logs
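
A minimal sketch of how ingest stats and parse error logs might be collected is shown below. The names IngestStats and ingest, and the parse callable passed in, are hypothetical. The sketch logs only the file name and exception type, on the assumption that message content should stay out of logs given the privacy posture.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("eml_ingest")  # hypothetical logger name


@dataclass
class IngestStats:
    seen: int = 0
    parsed: int = 0
    failed: int = 0
    errors: list = field(default_factory=list)  # (file name, exception type)


def ingest(items, parse):
    """Run `parse` over (name, raw_bytes) pairs, counting successes and failures."""
    stats = IngestStats()
    records = []
    for name, raw in items:
        stats.seen += 1
        try:
            records.append(parse(raw))
            stats.parsed += 1
        except Exception as exc:
            stats.failed += 1
            error_type = type(exc).__name__
            stats.errors.append((name, error_type))
            # Log the file name and error type only, never message content.
            logger.warning("parse failed for %s: %s", name, error_type)
    return records, stats
```

One bad file is recorded and skipped rather than aborting the whole run, which suits the "messy formats" reliability posture.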

Security and privacy

Extreme sensitivity; requires redaction, data minimization, and access control.
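
To make redaction and minimization concrete, the sketch below masks email addresses and phone-like numbers in free text and keeps only an allow-listed set of fields. The patterns are deliberately naive and will both over-match (for example some date strings) and under-match. The allow-list and field names are assumptions; the actual rules belong in the data governance spec. Access control is not covered here.

```python
import re

# Naive illustrative patterns; real redaction rules are a governance decision.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical allow-list: any field not named here is dropped.
ALLOWED_FIELDS = ("message_id", "date", "subject", "body_text")
TEXT_FIELDS = ("subject", "body_text")


def redact_text(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def minimize(record: dict) -> dict:
    """Keep only allow-listed fields and redact the free-text ones."""
    kept = {k: record[k] for k in ALLOWED_FIELDS if k in record}
    for key in TEXT_FIELDS:
        if isinstance(kept.get(key), str):
            kept[key] = redact_text(kept[key])
    return kept
```

An allow-list drops unknown fields by default, so a new field added upstream is not exposed until someone explicitly approves it.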

Dependencies

Upstream
  • email exports
  • data governance decisions
Downstream
  • custom assistants
  • search experiences

Ownership

Owners

Josh Barton

Users

you, Josh Barton (owner)