Atlas project development

gpt_kit

Shell scripts that transform ticket EML exports into a normalized knowledge base (KB), with tagging and optional redaction. Make the scripts executable, then run them with an input EML directory, an output KB directory, and an optional tag map. The scripts support agent-only extraction, role filtering, merging with prior KB runs, and PII redaction flags.

Type
Component
Lifecycle
Active
Last touched
2025-08-18
Visibility
Public

Purpose

Scripts transform ticket EML exports into a normalized knowledge base with tagging and optional redaction.

Current state

Last touched: 2025-08-18. Functionality and completeness: Core EML-to-KB pipeline exists; documentation and tests are pending.

Next step

  • Add baseline automated tests covering critical flows
  • Add a CI pipeline for build, test, and lint
  • Document the deployment/runtime environment (or add a Dockerfile)
  • Document interfaces (CLI flags, API endpoints, file formats)
  • Add structured logging and basic health checks

Interfaces

Inputs
  • EML files (.eml)
  • Tag-map CSV (.csv)
  • Optional org-domain filters
Outputs
  • Markdown KB entries (knowledge base files)
  • CSV summary artifacts

Reality to Action trace

Reality Ingestion

Contributes in this stage.

Canonical Storage

Contributes in this stage.

Automation Engines

Contributes in this stage.

Human Interfaces

Contributes in this stage.

Operational Adoption

Contributes in this stage.

Core workflow

TBD. Document the 5-10 steps that define the core workflow.

Artifacts

  • Tag map CSV defines keyword-to-tag mapping
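
To make the keyword-to-tag idea concrete, here is a minimal sketch. The actual tag-map layout is not documented on this page, so the two-column `keyword,tag` format (no header row, no quoted commas) and the function name `tags_for_text` are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the gpt_kit implementation.
# Assumes a tag map with two comma-separated columns: keyword,tag
# (no header row, no quoted fields).

# tags_for_text TAG_MAP_CSV TEXT_FILE
# Prints, once each, every tag whose keyword occurs in the text
# (literal, case-insensitive match).
tags_for_text() {
  local tag_map="$1" text_file="$2" keyword tag
  # Strip carriage returns so CSVs saved on Windows behave the same.
  tr -d '\r' < "$tag_map" |
    while IFS=, read -r keyword tag || [ -n "$keyword" ]; do
      # Skip blank or incomplete rows.
      [ -n "$keyword" ] && [ -n "$tag" ] || continue
      # -F literal match, -q quiet, -i ignore case.
      if grep -Fqi -- "$keyword" "$text_file"; then
        printf '%s\n' "$tag"
      fi
    done |
    sort -u
}
```

Because matching is a plain substring search, short keywords can over-match (for example, "net" inside "internet"); this is one way tag-map quality affects the resulting KB.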

Operational notes

Constraints and scars

  • Depends on consistent EML exports and tag-map CSV quality; large corpora can be slow to process.

Reliability posture

  • Failure modes and safe behavior: Malformed EMLs may be skipped; redaction flags reduce PII exposure.
  • Idempotency / retries / batching: Re-running with merge flags avoids duplicating prior KB entries.
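
The skip-on-malformed and no-duplicates-on-re-run behavior described above could look roughly like the sketch below. Everything in it is an assumption: "malformed" is taken to mean "no From: header before the first blank line", entries are keyed by a `cksum` of the EML content, and the function names `looks_like_eml` and `ingest_dir` are hypothetical. A real pipeline would more likely key on Message-ID or a stronger hash.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the gpt_kit implementation.

# looks_like_eml FILE
# Succeeds if the header block (lines before the first blank line)
# contains a From: header. This is an assumed definition of "well-formed".
looks_like_eml() {
  awk '
    /^\r?$/ { exit }                          # end of header block
    tolower($0) ~ /^from:/ { found = 1; exit }
    END { exit(found ? 0 : 1) }
  ' "$1"
}

# ingest_dir IN_DIR OUT_DIR
# Writes one Markdown stub per well-formed EML. Malformed inputs and
# inputs already present in OUT_DIR are skipped, so a second run over
# the same corpus adds nothing.
ingest_dir() {
  local in_dir="$1" out_dir="$2" eml key target
  mkdir -p "$out_dir"
  for eml in "$in_dir"/*.eml; do
    [ -f "$eml" ] || continue                 # glob matched nothing
    if ! looks_like_eml "$eml"; then
      echo "skip (malformed): $eml" >&2
      continue
    fi
    # Content checksum as a stable key (assumption; weak but portable).
    key="$(cksum < "$eml" | awk '{ print $1 }')"
    target="$out_dir/$key.md"
    if [ -e "$target" ]; then
      echo "skip (already in KB): $eml" >&2
      continue
    fi
    printf '# %s\n' "$(basename "$eml" .eml)" > "$target"
  done
}
```

Skips are reported on stderr, which matches the logging approach noted under Observability.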

Observability

  • Logs: stdout/stderr from scripts; outputs are written to the knowledge base folders.
  • Metrics/health checks: None documented

Security and privacy

Sensitive secret material detected in gpt_kit/emails/gpt/jb_email_exemplars.md; ensure it is excluded from docs and CI.

Dependencies

Upstream
  • None (file-based processing)

Ownership

Owners

Josh Barton

Users

Josh Barton (owner)

gpt_kit

Architecture & Major Components

  • High-level diagram (text):

    • Entry/trigger -> core logic -> outputs (details per docs below)
  • Entry points: gpt_kit/ticket_eml_to_kb*.sh, gpt_kit/eml_to_gpt_kit.sh

  • Top-level folders: emails, ticket_kb, tickets

  • Key abstractions: EML parser, tag mapping, KB merge logic, redaction filters
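
As an illustration of what a redaction filter among these abstractions might do, the sketch below masks email addresses in a text stream. What the scripts actually scrub is not documented here, so the pattern, the `[REDACTED_EMAIL]` placeholder, and the function name `redact_emails` are all assumptions; other PII classes (phone numbers, names, account IDs) would need their own rules.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the gpt_kit implementation.

# redact_emails
# Reads text on stdin and writes it to stdout with anything that looks
# like an email address replaced by a fixed placeholder.
redact_emails() {
  sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/[REDACTED_EMAIL]/g'
}
```

Writing the filter as a stdin-to-stdout function keeps it composable: it can sit between the parsing step and the step that writes KB entries without either side knowing about it.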

Setup / Build / Run

  • Build system(s): None (shell scripts).
  • Example usage (after making the script executable): ./ticket_eml_to_kb.sh -i /path/to/emls -o ./ticket_kb --role all --scrub
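
The "make executable, then run" steps can be wrapped in a small helper, sketched below. The helper name `run_kb` is hypothetical; it uses only the `-i` and `-o` flags that appear in the example usage and passes any other flags through unchanged, without assuming what they do.

```shell
#!/usr/bin/env bash
# Hypothetical convenience wrapper, not part of gpt_kit.

# run_kb SCRIPT IN_DIR OUT_DIR [extra flags...]
# Checks the inputs, makes the script executable if needed, then runs it
# with -i/-o as in the example usage. Extra flags are passed through as-is.
run_kb() {
  local script="$1" in_dir="$2" out_dir="$3"
  [ -f "$script" ] || { echo "not found: $script" >&2; return 1; }
  [ -d "$in_dir" ] || { echo "no such input directory: $in_dir" >&2; return 1; }
  shift 3
  [ -x "$script" ] || chmod +x "$script"
  "$script" -i "$in_dir" -o "$out_dir" "$@"
}

# Example (paths are placeholders):
#   run_kb ./ticket_eml_to_kb.sh /path/to/emls ./ticket_kb --role all --scrub
```

Failing early on a missing script or input directory keeps a bad path from producing an empty KB folder.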