Operator guide¶
This is the daily operating map for beampipe-core v2. Use it after installation to understand which process should run, what it owns, and where to look when work stalls.
Operating model¶
beampipe-core is one binary with multiple roles. All roles share PostgreSQL as the durable state store.
operator / API
|
v
+---------------+ +----------------+ +------------------+
| API process | ---> | PostgreSQL | <--- | worker replicas |
| /api/v2 | | configs/jobs | | claim/run jobs |
| auth/uploads | | events/ledger | | TAP/TM/DIM/Slurm |
+-------+-------+ +-------+--------+ +--------+---------+
| ^ ^
| | |
+--------------+-------+------------------------+
|
scheduler role
enqueue ticks
Run exactly one scheduler-enabled process per environment. Scale API processes for HTTP traffic and worker-only processes for queue throughput.
Role contract¶
| Role | Starts with | Owns | Scale rule |
|---|---|---|---|
| API | beampipe serve --worker false |
HTTP API, auth, uploads, readiness, metrics | Scale for request volume |
| Scheduler | beampipe serve --worker true or scheduler-enabled beampipe worker |
Recurring discovery, execution, DIM, and Slurm poll ticks | Run exactly one |
| Worker | BEAMPIPE_WORKER_SCHEDULER_ENABLED=false beampipe worker |
Queue claims, TAP calls, manifests, staging, translation, deployment, polling | Scale horizontally |
| Database | PostgreSQL | Configs, source metadata, jobs, executions, provenance | One logical primary |
Operator workflow¶
| Phase | Action | Command or API |
|---|---|---|
| Bootstrap | Apply migrations | beampipe migrate |
| Bootstrap | Create first operator | beampipe admin create-user --username admin --password ... --email ... |
| Config | Validate survey YAML | beampipe project validate -f config/wallaby_hires.v1.yaml |
| Config | Upload survey YAML | POST /api/v2/project-configs |
| Config | Upload deployment profile | POST /api/v2/deployment-profiles |
| Source load | Register sources | POST /api/v2/sources or POST /api/v2/sources/bulk |
| Discovery | Trigger or schedule discovery | POST /api/v2/sources/discover |
| Execution | Create execution intent | POST /api/v2/executions |
| Execution | Queue execution | POST /api/v2/executions/{id}/execute |
| Monitoring | Check readiness | GET /api/v2/ready |
| Monitoring | Inspect provenance | GET /api/v2/executions/{id}/events |
Use API workflow guide for concrete request examples.
Mock to real backend path¶
Start with mock backends to validate project config, discovery, manifest construction, and dry execution. Move to real backends only after the environment has credentials, TAP reachability, Translator Manager access, and a tested DIM or Slurm deployment profile.
export BEAMPIPE_USE_REAL_BACKENDS=true
export BEAMPIPE_ENV=production
export CASDA_USERNAME=...
export CASDA_PASSWORD_FILE=/run/secrets/casda_password
export SLURM_SSH_PRIVATE_KEY_FILE=/run/secrets/slurm_ssh_key
export SLURM_SSH_KNOWN_HOSTS_SOURCE=/run/slurm-ssh/known_hosts
Run beampipe security check before production startup and beampipe slurm ping --profile <name> before live Slurm submission.
What to watch¶
| Signal | Healthy shape | Where |
|---|---|---|
| Readiness | Database, queue, workers, and dependencies report ready | GET /api/v2/ready |
| Queue depth | Does not grow indefinitely after ticks | readiness payload, metrics |
| Oldest queued job | Stays near expected job time | metrics |
| Discovery metadata | Sources receive metadata and discovery flags | source events |
| Execution ledger | Runs move through stage, translate, submit, poll, terminal state | execution events |
| Backend debug fields | DIM or Slurm fields appear on execution responses | execution response |
First triage¶
| Symptom | First checks | Next page |
|---|---|---|
| API cannot start | DATABASE_URL, migrations, bind address, startup security gates |
Production runbook |
| Login fails | Admin user exists, BEAMPIPE_JWT_SECRET is stable |
First run |
| Discovery stalls | TAP health, source enabled state, queue depth, discovery caps | Workers and scheduling |
| Execution remains pending | Project config active, metadata ready, execution caps | Workers and scheduling |
| Slurm run stalls | Known hosts, SSH key, login node, poll events | Deployment profiles |
| Redoc stale | Export OpenAPI and copy it into docs assets | OpenAPI export |
Next: tune worker capacity in Workers and scheduling.