Atlas project production

BOUSD-PADC-Extract

Uses UnixODBC + Microsoft ODBC driver to query Aeries SQL Server. Uses a Google service account to update reporting sheets after each extract. It ingests Configuration files (TOML/YAML/JSON/INI/CONF), CSV files and produces CSV files.

Type
Field Tool
Lifecycle
Maintenance
Last touched
2025-02-10
Visibility
Public

Purpose

Uses UnixODBC + Microsoft ODBC driver to query Aeries SQL Server. Uses a Google service account to update reporting sheets after each extract.

Current state

Last touched: 2025-02-10. Functionality and completeness: Scripted pipeline exists; scheduling/monitoring are not documented.

Next step

Add baseline automated tests to cover critical flows; Add CI pipeline for build/test/lint; Document deployment/runtime environment (or add Dockerfile); Document interfaces (CLI flags, API endpoints, file formats); Add structured logging and basic health checks.

Interfaces

Inputs
  • Configuration files (TOML/YAML/JSON/INI/CONF), CSV files
Outputs
  • CSV files

Reality to Action trace

Reality Ingestion

Contributes in this stage.

Canonical Storage

Contributes in this stage.

Automation Engines

Not in scope.

Human Interfaces

Contributes in this stage.

Operational Adoption

Contributes in this stage.

Core workflow

TBD. Document the 5-10 steps that define the core workflow.

Artifacts

  • CSV columns defined by queries referenced in `conf.d/`; sheet ranges in `run.sh` must match destination tab schemas

Operational notes

Constraints and scars

  • Depends on fixed Google Sheet ranges and SQL output shape; sheet column changes require query/config updates.

Reliability posture

Failure modes and safe behavior: ODBC connection/auth failures, CSV merge errors, or Google API permission errors stop the script and leave prior sheet data intact. Idempotency / retries / batching behavior: Re-runs overwrite the same sheet ranges; no explicit retry/backoff beyond underlying tools.

Observability

  • Logs: stdout/stderr from `run.sh`, `mssql_query_to_csv`, and `google-sheet-download`
  • Metrics/health checks: None documented; rely on exit codes and CSV outputs
  • Logs: stdout/stderr from run.sh and bundled CLI tools; CSV artifacts in data/ and temp/ act as run outputs.

Security and privacy

Attendance/calendar data is processed; protect CSV outputs and logs accordingly. Service account JSON and DB credentials must be stored securely and excluded from git. README includes internal hostnames and service-account identifiers; keep those details private. Sensitive secret material detected in BOUSD-PADC-Extract/priv_key.json; ensure it is excluded from docs and CI.

Dependencies

Upstream
  • Aeries SQL Server via ODBC
  • Google Sheets API via service account

Ownership

Owners

Josh Barton

Users

Josh Barton (owner)

BOUSD-PADC-Extract

Architecture & Major Components

  • High-level diagram (text):

    • Entry/trigger -> core logic -> outputs (details per docs below)
  • Entry points: BOUSD-PADC-Extract/run.sh

  • Top-level folders: conf.d, data, temp, google-sheet-download, mssql_query_to_csv

  • Key abstractions: Config-driven query execution, CSV merge step, sheet range overwrite

Setup / Build / Run

  • Build system(s): Shell scripts plus Rust CLI binaries (Cargo in tool subdirectories).
  • Install UnixODBC and the Microsoft ODBC driver for SQL Server; configure odbcinst.ini and odbc.ini.
  • Provide per-extract TOML configs under conf.d/ (DSN, query, output paths).
  • Place the Google service-account key where the sheet tool expects it (default is priv_key.json).
  • Run ./run.sh to execute the extract and sync pipeline.