Atlas project production

csv_mapper

CSV Mapper and Transformer is a command-line tool written in Rust that maps, transforms, and filters CSV files using user-defined configurations. Run it with a mapping JSON file (required) and an optional global config; CLI flags override config values. Mappings support transformations, cross-references to external CSVs, and filter rules.

Type
Component
Lifecycle
Active
Last touched
2025-10-28
Visibility
Public

Purpose

CSV Mapper and Transformer is a command-line tool written in Rust that maps, transforms, and filters CSV files using user-defined configurations. It suits data migration, cleaning, and integration tasks where source CSV data must be reformatted and filtered before being loaded into another system.

Current state

Last touched: 2025-10-28. Functionality and completeness: Features are documented; tests and CI coverage are still limited.

Next step

  • Add baseline automated tests to cover critical flows.
  • Add a CI pipeline for build/test/lint.
  • Document the deployment/runtime environment (or add a Dockerfile).
  • Document interfaces (CLI flags, API endpoints, file formats).
  • Add structured logging and basic health checks.

Interfaces

Inputs
  • Mapping JSON, optional app_config JSON, source CSV, optional reference CSVs
  • Configuration files (TOML/YAML/JSON/INI/CONF)
  • CSV files
Outputs
  • CSV files

Reality to Action trace

Reality Ingestion

Contributes in this stage.

Canonical Storage

Not in scope.

Automation Engines

Not in scope.

Human Interfaces

Contributes in this stage.

Operational Adoption

Contributes in this stage.

Core workflow

TBD. Document the 5-10 steps that define the core workflow.

Artifacts

  • Mapping JSON schema defines source/target fields and transformations
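
As a rough illustration of what a mapping entry carries, the sketch below models the source, target, and order fields as a Rust struct and applies them to one row. The struct name FieldMapping, the helper names, and the choice to emit an empty string for an absent source column are all assumptions made for this example; the actual JSON schema and behavior may differ.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of one mapping entry; the tool's real JSON
/// schema may use different field names.
#[derive(Debug, Clone)]
pub struct FieldMapping {
    pub source: String, // column name in the source CSV
    pub target: String, // column name in the output CSV
    pub order: usize,   // position of the target column in the output
}

/// Returns the mappings sorted by their `order` field.
fn sorted_by_order(mappings: &[FieldMapping]) -> Vec<&FieldMapping> {
    let mut sorted: Vec<&FieldMapping> = mappings.iter().collect();
    sorted.sort_by_key(|m| m.order);
    sorted
}

/// Builds the output header from the target names, in `order` sequence.
pub fn output_header(mappings: &[FieldMapping]) -> Vec<String> {
    sorted_by_order(mappings)
        .into_iter()
        .map(|m| m.target.clone())
        .collect()
}

/// Builds one output row from a source row keyed by column name.
/// Source columns absent from the row become empty strings in this sketch.
pub fn map_row(mappings: &[FieldMapping], row: &HashMap<String, String>) -> Vec<String> {
    sorted_by_order(mappings)
        .into_iter()
        .map(|m| row.get(&m.source).cloned().unwrap_or_default())
        .collect()
}
```

Sorting by order on every call keeps the sketch short; a real implementation would likely sort once when the mapping is loaded.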

Operational notes

Constraints and scars

  • Complex mappings and large reference CSVs can impact runtime; log files help diagnose slow transforms.
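
One common way to keep large reference CSVs from dominating runtime is to index them once by key, so that each source row costs a single hash lookup rather than a scan. The sketch below shows that idea only; the function name build_reference_index is hypothetical, the naive comma split ignores quoting, and nothing here is confirmed to match how csv_mapper handles cross-references internally.

```rust
use std::collections::HashMap;

/// Indexes reference rows by the value in `key_col`, keeping `value_col`.
/// Returns None if the text is empty or either column is not in the header.
/// Parsing uses a naive comma split (no quoting support), which is a
/// simplification for this sketch only.
pub fn build_reference_index(
    csv_text: &str,
    key_col: &str,
    value_col: &str,
) -> Option<HashMap<String, String>> {
    let mut lines = csv_text.lines();
    let header: Vec<&str> = lines.next()?.split(',').collect();
    let key_idx = header.iter().position(|h| *h == key_col)?;
    let val_idx = header.iter().position(|h| *h == value_col)?;

    let mut index = HashMap::new();
    for line in lines {
        let fields: Vec<&str> = line.split(',').collect();
        // Rows too short to contain both columns are skipped.
        if let (Some(k), Some(v)) = (fields.get(key_idx), fields.get(val_idx)) {
            index.insert(k.to_string(), v.to_string());
        }
    }
    Some(index)
}
```

If a key appears more than once, the last row wins in this sketch; whether the tool does the same is not documented here.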

Reliability posture

  • Failure modes and safe behavior: Missing columns or an invalid mapping config abort the run; logs describe the offending fields.
  • Idempotency / retries / batching behavior: Deterministic output for identical inputs; no retries needed.
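
The fail-fast behavior for missing columns can be pictured as a check that runs before any row is processed. The sketch below is a minimal version under that assumption; the function name validate_columns and the error message wording are invented for illustration and are not taken from the tool's logs.

```rust
use std::collections::HashSet;

/// Returns Ok(()) when every required source column is present in the header,
/// otherwise an error message naming the missing columns in the order given.
pub fn validate_columns(header: &[&str], required: &[&str]) -> Result<(), String> {
    let present: HashSet<&str> = header.iter().copied().collect();
    let missing: Vec<&str> = required
        .iter()
        .copied()
        .filter(|col| !present.contains(col))
        .collect();

    if missing.is_empty() {
        Ok(())
    } else {
        // Hypothetical message format; the real tool's wording may differ.
        Err(format!("missing source columns: {}", missing.join(", ")))
    }
}
```

Collecting every missing column before returning lets one run report all offending fields at once rather than one per attempt.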

Observability

  • Logs: Console output with optional log file and configurable log level; uses a Rust logging framework (log/tracing/env_logger).
  • Metrics/health checks: None documented

Security and privacy

Treat CSV inputs/outputs as sensitive if they contain PII; restrict file access and storage.

Dependencies

Upstream
  • None (file-based transforms only)

Ownership

Owners

Josh Barton

Users

Josh Barton (owner)

csv_mapper

Architecture & Major Components

  • High-level diagram (text):

    • CLI invocation (mapping JSON, optional app config, source CSV, optional reference CSVs) -> mapping engine (transformations, filters) -> output CSV
  • Mapping config contains mappings (source/target/order/transformations) and optional filter rules.

  • Transformations include lookups, regex lookup, JSON extraction, hashing, and cross-reference to external CSVs.

  • Entry points: src/main.rs

  • Top-level folders: src

  • Key abstractions: Mapping engine, transformation pipeline, filter evaluator, CSV IO with custom quoting
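
To make the transformation pipeline idea concrete, the sketch below chains a few steps over a single field value, feeding each step's output into the next. The Transform enum and its three variants are hypothetical stand-ins chosen because they need only the standard library; the tool's own transformations (regex lookup, JSON extraction, hashing, cross-reference) are not reproduced here, and keeping the value unchanged on a lookup miss is an assumption.

```rust
use std::collections::HashMap;

/// Illustrative transformation steps; names and behavior are assumptions.
pub enum Transform {
    /// Strip leading and trailing whitespace.
    Trim,
    /// Convert the value to upper case.
    Uppercase,
    /// Replace the value via a lookup table, keeping it unchanged on a miss.
    Lookup(HashMap<String, String>),
}

/// Applies each step in order, feeding the output of one into the next.
pub fn apply_pipeline(value: &str, steps: &[Transform]) -> String {
    steps.iter().fold(value.to_string(), |acc, step| match step {
        Transform::Trim => acc.trim().to_string(),
        Transform::Uppercase => acc.to_uppercase(),
        Transform::Lookup(table) => table.get(&acc).cloned().unwrap_or(acc),
    })
}
```

Step order matters: placing the lookup after Trim and Uppercase means the table only needs normalized keys.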

Setup / Build / Run

  • Build system(s): Cargo.
  • Build with Cargo and run the binary with --config and CSV paths.
  • Use --app_config for global defaults (delimiter, log level, output defaults).
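
The precedence between CLI flags and app config values (CLI wins) can be sketched as a simple merge of optional settings. The struct and field names below are hypothetical, and the fallback defaults (comma delimiter, "info" log level) are assumptions for the example rather than documented defaults of the tool.

```rust
/// Hypothetical subset of settings that may come from the CLI or app config.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Settings {
    pub delimiter: Option<char>,
    pub log_level: Option<String>,
}

/// Fully resolved settings with no missing values.
#[derive(Debug, PartialEq)]
pub struct Resolved {
    pub delimiter: char,
    pub log_level: String,
}

/// CLI values win over app-config values; assumed defaults fill the rest.
pub fn resolve(cli: Settings, app_config: Settings) -> Resolved {
    Resolved {
        delimiter: cli.delimiter.or(app_config.delimiter).unwrap_or(','),
        log_level: cli
            .log_level
            .or(app_config.log_level)
            .unwrap_or_else(|| "info".to_string()),
    }
}
```

Modeling every setting as an Option keeps "not provided" distinct from "provided with the default value", which is what makes the override order unambiguous.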