Error Handling & Retry Logic - Best Practices for State Machines

Handle errors gracefully using onResponse callbacks (sync or async) and context-based retry.

Error delivery

Errors are delivered immediately via onResponse. In addition, stimulation.waitUntilComplete() rejects if any response listener (local or global) throws or rejects:

const stimulation = cns.stimulate(signal, {
  onResponse: async (response) => {
    if (response.error) {
      await errorsRepo.store({
        signal:
          response.outputSignal?.collateral || response.inputSignal?.collateral,
        error: String(response.error),
      });
      
      if (response.error instanceof ValidationError) {
        handleValidationError(response.error);
      }
    }
  }
});
await stimulation.waitUntilComplete();

If you do not await stimulation.waitUntilComplete(), the run still proceeds, but rejections from listeners won't be observed by the caller.
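
One way to make sure listener rejections are always observed is to wrap the promise returned by waitUntilComplete() in a small helper that turns a rejection into a value. The sketch below is library-agnostic: observeRun and RunOutcome are hypothetical names used for illustration, not part of CNStra.

```typescript
// Hypothetical helper (not part of CNStra): converts a rejection into a value
// so the caller always sees whether a run, or one of its listeners, failed.
type RunOutcome<T> =
  | { ok: true; value: T }
  | { ok: false; error: unknown };

async function observeRun<T>(run: Promise<T>): Promise<RunOutcome<T>> {
  try {
    return { ok: true, value: await run };
  } catch (error) {
    return { ok: false, error };
  }
}

// Possible usage with the stimulation from the example above:
//   const outcome = await observeRun(stimulation.waitUntilComplete());
//   if (!outcome.ok) console.error('listener failed:', outcome.error);
```

Returning a tagged result instead of rethrowing keeps the decision about what to do with a failed listener at the call site.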

Error recovery with context

Save context for retry on failure:

let savedContext: ICNSStimulationContextStore | undefined;

const stimulation = cns.stimulate(signal, {
  onResponse: (response) => {
    if (response.error) {
      savedContext = response.stimulation.getContext(); // save for retry
    }
  }
});
await stimulation.waitUntilComplete();

// Retry with preserved context
if (savedContext) {
  const retryStimulation = cns.stimulate(retrySignal, { ctx: savedContext });
  await retryStimulation.waitUntilComplete();
}
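
The single retry above can be generalized into a bounded loop. The following is a minimal sketch under stated assumptions: runWithContextRetries is a hypothetical helper, and runOnce is assumed to perform one stimulation and resolve to the saved context when an error response was seen, or to undefined when the run was clean.

```typescript
// Hypothetical helper (not part of CNStra). `runOnce` receives the context saved
// by the previous run (undefined on the first run) and resolves to a newly saved
// context if that run produced an error, or to undefined if it succeeded.
async function runWithContextRetries<C>(
  runOnce: (ctx: C | undefined) => Promise<C | undefined>,
  maxRetries: number,
): Promise<{ succeeded: boolean; runs: number }> {
  // One initial run plus at most `maxRetries` retries.
  const totalRuns = Math.max(1, maxRetries + 1);
  let ctx: C | undefined = undefined;

  for (let runs = 1; runs <= totalRuns; runs++) {
    ctx = await runOnce(ctx);
    if (ctx === undefined) {
      return { succeeded: true, runs }; // clean run, stop retrying
    }
  }
  return { succeeded: false, runs: totalRuns }; // retry budget exhausted
}
```

Here runOnce would wrap the cns.stimulate(...) call shown above, passing the previous context on retries and returning savedContext once waitUntilComplete() has settled.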

Retry with backoff (self-loop)

Use a self-looping neuron with context to track per-neuron per-stimulation retry attempts (metadata), while business data flows through payloads:

import { withCtx, collateral } from '@cnstra/core';

const tryTask = collateral<{ taskId: string }>();
const completed = collateral<{ taskId: string }>();
const failed = collateral<{ taskId: string; reason: string }>();

const taskRunner = withCtx<{ attempt: number }>()
  .neuron({ tryTask, completed, failed })
  .dendrite({
    collateral: tryTask,
    response: async (payload, axon, ctx) => {
      // Context stores per-neuron per-stimulation metadata (retry attempts)
      const prev = ctx.get() ?? { attempt: 0 };
      const attempt = prev.attempt + 1;
      ctx.set({ attempt });

      try {
        // Business data (taskId) comes from payload
        await performTask(payload.taskId);
        return axon.completed.createSignal({ taskId: payload.taskId });
      } catch (err) {
        if (attempt < 5) {
          const backoff = Math.pow(2, attempt) * 100; // exponential backoff
          await new Promise(resolve => setTimeout(resolve, backoff));
          // Business data flows through payload, not context
          return axon.tryTask.createSignal(payload); // self-loop retry
        }
        return axon.failed.createSignal({ taskId: payload.taskId, reason: String(err) });
      }
    },
  });

Key point: Context stores per-neuron per-stimulation metadata (attempt count), while business data (taskId) flows through signal payloads.
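
The delay calculation in the dendrite can be pulled into a small pure function, which makes the schedule easy to test and to cap. This is a sketch only: backoffDelayMs and jitteredDelayMs are hypothetical names, and the 5000 ms cap and the jitter strategy are assumptions rather than anything the library prescribes.

```typescript
// Hypothetical helper: same schedule as Math.pow(2, attempt) * 100 in the
// example above, with an upper bound so late attempts do not wait too long.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 5000): number {
  const exponential = baseMs * 2 ** attempt;
  return Math.min(exponential, maxMs);
}

// Optional jitter: pick a delay in [0, cappedDelay) so that many stimulations
// failing together do not all retry at the same instant.
// `random` is injectable to keep the function deterministic in tests.
function jitteredDelayMs(attempt: number, random: () => number = Math.random): number {
  return Math.floor(random() * backoffDelayMs(attempt));
}
```

With the defaults, attempts 1 through 4 wait 200, 400, 800, and 1600 ms, matching the self-loop example; the cap only takes effect from attempt 6 onward.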

Tips

Use onResponse for real-time error logging/monitoring; make it async if you need to persist.
Store minimal retry state in context (attempt count, correlation IDs).
For long-lived sagas, persist context to a DB/OIMDB and re-stimulate on external triggers.
Always set a max retry limit to avoid infinite loops.

Global listeners

Global listeners registered via addResponseListener run for every stimulation, alongside the local onResponse. They can also be async; a failure in any listener rejects the Promise returned by stimulation.waitUntilComplete().

Best practices

Timeouts: wrap external I/O in timeouts inside dendrites and async onResponse to avoid hanging runs.
Idempotency: design onResponse persistence to be idempotent (e.g., upserts, unique keys) so retries are safe.
Retry policy: prefer bounded retries with exponential backoff; use context to track attempts; avoid hot loops.
Partial failure: emit explicit failure signals from dendrites when business errors occur; reserve thrown errors for exceptional cases.
Observability: include your own run/correlation id (if you have one) and the collateral names in logs and metrics; capture queueLength to identify bottlenecks.
Isolation: keep onResponse lightweight; move heavy processing to dedicated neurons/signals when possible.
Concurrency: if persisting from onResponse, consider batching or a queue to smooth spikes in traffic.
Ordering: if ordering matters, include sequence numbers in payloads, or serialize writes per run/correlation id that you supply.
Durability: when persisting context for retries, write before emitting downstream effects; verify on restart.
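
As an illustration of the Timeouts item above, external I/O can be raced against a timer. This is a minimal sketch: withTimeout is a hypothetical helper, not a CNStra API, and it only stops waiting; it does not cancel the underlying operation.

```typescript
// Hypothetical helper (not part of CNStra): rejects if `work` does not settle
// within `ms` milliseconds. The underlying operation keeps running; pass an
// AbortSignal to the I/O call itself if it supports cancellation.
function withTimeout<T>(work: Promise<T>, ms: number, label = "operation"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms,
    );
  });

  // Whichever settles first wins; always clear the timer afterwards so a
  // pending timeout cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Inside a dendrite this could wrap the call from the retry example, for instance await withTimeout(performTask(payload.taskId), 2000, "performTask"), so that a hung task surfaces as a thrown error and goes through the same retry path.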
