Purpose
Uses UnixODBC + Microsoft ODBC driver to query Aeries SQL Server. Uses a Google service account to update reporting sheets after each extract.
Current state
Last touched: 2025-08-18. Functionality and completeness: Scripted pipeline exists; scheduling/monitoring are not documented.
Next step
Add baseline automated tests to cover critical flows; Add CI pipeline for build/test/lint; Document deployment/runtime environment (or add Dockerfile); Document interfaces (CLI flags, API endpoints, file formats); Add structured logging and basic health checks.
Interfaces
Inputs- Configuration files (TOML/YAML/JSON/INI/CONF), Database tables (SQL scripts), CSV files
Outputs- Query results (reports/extracts), CSV files
Reality to Action trace
Reality IngestionContributes in this stage.
Canonical StorageContributes in this stage.
Automation EnginesNot in scope.
Human InterfacesContributes in this stage.
Operational AdoptionContributes in this stage.
Core workflow
TBD. Document the 5-10 steps that define the core workflow.
Artifacts
- CSV columns defined by the SQL in `sql/`; sheet ranges in `run.sh` must match destination tab schemas
Operational notes
Constraints and scars
- Depends on fixed Google Sheet ranges and SQL output shape; sheet column changes require query/config updates.
Reliability posture
Failure modes and safe behavior: ODBC connection/auth failures or Google API permission errors stop the script and leave prior sheet data intact. Idempotency / retries / batching behavior: Re-runs overwrite the same sheet ranges; no explicit retry/backoff beyond underlying tools.
Observability
- Logs: stdout/stderr from `run.sh`, `mssql_to_csv`, and `google-sheet-download`
- Metrics/health checks: None documented; rely on exit codes and CSV outputs
- Logs: stdout/stderr from run.sh and bundled CLI tools; CSV artifacts in data/ and temp/ act as run outputs.
Security and privacy
Student/class size data is processed; protect CSV outputs and logs accordingly. Service account JSON and DB credentials must be stored securely and excluded from git. README includes internal hostnames and service-account identifiers; keep those details private. Sensitive secret material detected in BOUSD-ClassSize-Extract/priv_key.json; ensure it is excluded from docs and CI.
Dependencies
Upstream- Aeries SQL Server via ODBC
- Google Sheets API via service account
Ownership
OwnersJosh Barton
UsersJosh Barton (owner)
Architecture & Major Components
High-level diagram (text):
- Entry/trigger -> core logic -> outputs (details per docs below)
Entry/trigger
→core logic
→outputs
Entry points: BOUSD-ClassSize-Extract/run.sh
Top-level folders: bin, conf.d, data, sql, temp
Key abstractions: Config-driven query execution, sheet range overwrite, per-extract CSV artifacts
Setup / Build / Run
- Build system(s): Shell scripts plus Rust CLI binaries (Cargo in tool subdirectories).
- Install UnixODBC and the Microsoft ODBC driver for SQL Server; configure
odbcinst.ini and odbc.ini. - Provide per-extract TOML configs under
conf.d/ (DSN, query, output paths). - Place the Google service-account key where the sheet tool expects it (default is
priv_key.json). - Run
./run.sh to execute the extract and sync pipeline.