Resumable Execution Runbook
Resumable execution ensures long-running ACM workflows can pause, recover, and continue without losing deterministic guarantees.
Responsibilities
- Persist checkpoints after every task boundary with ledger correlation IDs.
- Store checkpoint payloads in durable storage (S3, Blob, GCS) with retention aligned to replay bundles.
- Track resume tokens inside the decision ledger (TASK_RESUMED, TASK_WAITING).
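The checkpoint responsibilities above can be sketched as follows. This is a minimal illustration, not the ACM runtime's actual API: the `Checkpoint` shape, the `CheckpointStore` contract, the `checkpoints/<run>/<id>.json` key layout, and the `persistCheckpoint` helper are all hypothetical. The store is synchronous and in-memory to keep the example short; a real adapter for S3, Blob, or GCS would be asynchronous.

```typescript
// All names below are hypothetical; they are not taken from the ACM runtime.
interface Checkpoint {
  id: string;
  runId: string;
  taskId: string;
  ledgerCorrelationId: string; // ties the checkpoint to its decision-ledger entry
  payload: string;             // serialized task outputs
}

// Minimal storage contract; a real adapter would wrap S3, Blob, or GCS.
interface CheckpointStore {
  put(key: string, body: string): void;
  get(key: string): string | undefined;
}

// In-memory stand-in for durable storage, for illustration only.
function createMemoryStore(): CheckpointStore {
  const objects: Record<string, string> = {};
  return {
    put: (key, body) => { objects[key] = body; },
    get: (key) => objects[key],
  };
}

// Assumed key layout: one object per checkpoint, grouped by run.
function checkpointKey(runId: string, checkpointId: string): string {
  return "checkpoints/" + runId + "/" + checkpointId + ".json";
}

// Intended to be called once per task boundary.
function persistCheckpoint(
  store: CheckpointStore,
  runId: string,
  seq: number,
  taskId: string,
  ledgerCorrelationId: string,
  outputs: unknown
): Checkpoint {
  const checkpoint: Checkpoint = {
    id: runId + "-" + seq,
    runId,
    taskId,
    ledgerCorrelationId,
    payload: JSON.stringify(outputs),
  };
  store.put(checkpointKey(runId, checkpoint.id), JSON.stringify(checkpoint));
  return checkpoint;
}

// Usage: persist a checkpoint after a (made-up) task finishes.
const demoStore = createMemoryStore();
const saved = persistCheckpoint(demoStore, "run-42", 1, "fetch-orders", "corr-001", { rows: 3 });
console.log(saved.id);
```

Storing the ledger correlation ID inside the payload means a checkpoint can be matched back to its ledger entry even when the object is retrieved on its own.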
Recovery flow
- Runtime detects failure or external pause signal.
- Operator inspects ledger entries and selects a checkpoint via CLI (pnpm --filter @ddse/acm-runtime run resume --from <id>).
- Policy and verification hooks re-evaluate before resuming downstream tasks.
- Replay bundles merge original and resumed segments for full audit coverage.
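The selection and merge steps of the recovery flow can be sketched in the same spirit. Only the TASK_WAITING and TASK_RESUMED event names come from this runbook; the `LedgerEntry` fields, the assumption that each entry carries a checkpoint ID, and both helper functions are hypothetical and stand in for whatever the runtime and replay tooling actually do.

```typescript
// Hypothetical ledger shape; only the two event names appear in this runbook.
type LedgerEventType = "TASK_WAITING" | "TASK_RESUMED";

interface LedgerEntry {
  seq: number;          // assumed monotonically increasing per run
  type: LedgerEventType;
  taskId: string;
  checkpointId: string; // assumption: each entry references a checkpoint
}

// Pick the checkpoint referenced by the most recent TASK_WAITING entry.
function latestWaitingCheckpoint(entries: LedgerEntry[]): string | undefined {
  let latest: LedgerEntry | undefined;
  for (const entry of entries) {
    if (entry.type === "TASK_WAITING" && (latest === undefined || entry.seq > latest.seq)) {
      latest = entry;
    }
  }
  return latest === undefined ? undefined : latest.checkpointId;
}

// Combine original and resumed segments into one sequence ordered by seq.
// concat() copies first, so neither input array is mutated by the sort.
function mergeSegments(original: LedgerEntry[], resumed: LedgerEntry[]): LedgerEntry[] {
  return original.concat(resumed).sort((a, b) => a.seq - b.seq);
}

// Usage with made-up entries.
const originalSegment: LedgerEntry[] = [
  { seq: 1, type: "TASK_WAITING", taskId: "fetch-orders", checkpointId: "run-42-1" },
  { seq: 2, type: "TASK_WAITING", taskId: "transform", checkpointId: "run-42-2" },
];

const chosen = latestWaitingCheckpoint(originalSegment);
if (chosen === undefined) {
  throw new Error("no checkpoint to resume from");
}

const resumedSegment: LedgerEntry[] = [
  { seq: 3, type: "TASK_RESUMED", taskId: "transform", checkpointId: chosen },
];

const merged = mergeSegments(originalSegment, resumedSegment);
console.log(chosen, merged.length);
```

In practice the operator makes the selection and passes the chosen ID to the resume command; the helper only illustrates one plausible default, namely the latest waiting checkpoint.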
References
- framework/node/docs/tdr/RUNBOOK_RESUMABLE.md
- framework/node/docs/tdr/RUNBOOK_RESUMABLE.md#decision-ledger for event taxonomy