Performance Optimization & Memory Management for State Machines

CNStra is designed to be memory-efficient and fast for reactive orchestration.

Memory-efficient design

Zero dependencies: No third-party packages, minimal bundle size.
No error storage: Errors are delivered via callbacks, not accumulated in memory.
Streaming responses: Signal traces are delivered via onResponse callbacks, not buffered.
Context on-demand: Context stores are created only when needed via withCtx().
No global state: Each stimulation starts with a clean slate; no ambient listeners.

Memory overhead per stimulation

Each active CNSStimulation instance carries some internal overhead, but most of it is allocated lazily — a minimal synchronous fire-and-forget run (no listener, no context, never awaited) allocates only the object itself plus a small ring-buffer array.

Base overhead (minimal context, ~2-3 keys)

CNSStimulation object: ~0.6-1.5 KB depending on what's actually used
- Activation queue: a small ring-buffer array (lazily allocated on first enqueue); the active-tasks Set is only created while an async task is in flight
- Context store (Map): lazily created — only when a neuron reads/writes ctx
- Pending/failed task tracking: lazily created — only on async onResponse or an actual failure
- Completion Promise: created only if you call waitUntilComplete()
Active tasks in queue: ~100-150 bytes per task (metadata only)
- ⚠️ Important: Each task stores the full input signal with payload
Task structure: { neuron, dendriteCollateral, input: { collateral, payload } }
- Typical stimulation has 5-10 tasks simultaneously: ~500-1500 bytes (metadata) + payload size
Context data: ~100-200 bytes (for minimal context with 2-3 simple values)

Total per stimulation (minimal context, small payloads): ~1.8-3.2 KB

Queue size and payload impact

Critical: The activation queue stores complete signal payloads in memory. Queue size directly multiplies payload memory usage.

Per-task memory breakdown:

Task metadata (object references): ~100-150 bytes
Signal payload: variable, can be any size (from 0 bytes to MBs)

Queue memory = (task metadata × queue length) + (payload size × queue length)

Examples:

Queue Length	Small Payloads (100 bytes)	Medium Payloads (1 KB)	Large Payloads (10 KB)	Very Large Payloads (100 KB)
10 tasks	~2.5 KB	~11.5 KB	~101.5 KB	~1 MB
100 tasks	~25 KB	~115 KB	~1 MB	~10 MB
1,000 tasks	~250 KB	~1.15 MB	~10 MB	~100 MB
10,000 tasks	~2.5 MB	~11.5 MB	~100 MB	~1 GB

At scale (1,000 concurrent stimulations):

Queue Length per Stimulation	Small Payloads	Medium Payloads	Large Payloads	Very Large Payloads
10 tasks	~2.5 MB	~11.5 MB	~100 MB	~1 GB
100 tasks	~25 MB	~115 MB	~1 GB	~10 GB
1,000 tasks	~250 MB	~1.15 GB	~10 GB	~100 GB

⚠️ Memory warning: If your payloads are large (e.g., full documents, images, large JSON objects) and queues grow (e.g., due to slow processing or high concurrency), memory usage can explode quickly.
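The queue-memory formula above can be turned into a quick estimator for checking your own payload sizes. This is a minimal sketch, not part of CNStra: `estimateQueueBytes` and `formatBytes` are hypothetical helpers, and the 150-byte default is simply the upper end of the per-task metadata range quoted above.

```typescript
// Hypothetical helper (not a CNStra API). Applies:
// queue memory = (task metadata × queue length) + (payload size × queue length)
function estimateQueueBytes(
  queueLength: number,
  payloadBytes: number,
  metadataBytes: number = 150 // assumed: upper end of the ~100-150 byte range
): number {
  return queueLength * (metadataBytes + payloadBytes);
}

// Formats a byte count using decimal units (1 KB = 1,000 bytes)
function formatBytes(bytes: number): string {
  const units = ['bytes', 'KB', 'MB', 'GB'];
  let value = bytes;
  let unit = 0;
  while (value >= 1000 && unit < units.length - 1) {
    value /= 1000;
    unit++;
  }
  return `${Number(value.toFixed(2))} ${units[unit]}`;
}

console.log(formatBytes(estimateQueueBytes(10, 100)));    // 10 tasks, 100-byte payloads: "2.5 KB"
console.log(formatBytes(estimateQueueBytes(1000, 1000))); // 1,000 tasks, 1 KB payloads: "1.15 MB"
```

To estimate a whole process, multiply the result by the number of concurrent stimulations.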

How much can you queue before hitting 1 GB?

CNStra's queue is purely in-memory — a ring buffer that holds every pending task until a dendrite processes it. There is no spilling to disk, no backpressure from the queue itself. All pending work lives in RAM.

Each queued task occupies roughly:

~200 bytes (task metadata + ring buffer slot overhead) + payload size

From that, the maximum number of tasks you can hold in 1 GB of RAM:

Payload size	Max tasks in 1 GB	Typical scenario
~0 bytes (IDs only)	~5,000,000	Event IDs, simple triggers
~100 bytes (small record)	~3,300,000	Minimal user events
~1 KB (typical entity)	~830,000	Order, session, compact DTO
~10 KB (medium document)	~97,000	Product page, large form submission
~100 KB (large document)	~9,800	Full report, paginated result set
~1 MB (binary / large JSON)	~1,000	Image metadata, export chunk

These numbers are per-process totals across all concurrent stimulations. If you have 1,000 stimulations running in parallel and each queues 100 tasks with 10 KB payloads, that's already 100,000 tasks × 10 KB ≈ 1 GB consumed.
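The same arithmetic can be run in reverse to budget queue depth for your own payload sizes. The sketch below is illustrative only: `maxTasksInBudget` and `maxTasksPerStimulation` are hypothetical helpers, the 200-byte overhead is the approximate per-task figure quoted above, and a decimal gigabyte is assumed.

```typescript
// Hypothetical helper (not a CNStra API): how many queued tasks fit in a memory budget
function maxTasksInBudget(
  budgetBytes: number,
  payloadBytes: number,
  overheadBytes: number = 200 // assumed: the ~200-byte per-task figure quoted above
): number {
  return Math.floor(budgetBytes / (overheadBytes + payloadBytes));
}

// The budget is per process, so it is shared by all concurrent stimulations
function maxTasksPerStimulation(
  budgetBytes: number,
  stimulations: number,
  payloadBytes: number
): number {
  return Math.floor(maxTasksInBudget(budgetBytes, payloadBytes) / stimulations);
}

const ONE_GB = 1_000_000_000; // decimal gigabyte

console.log(maxTasksInBudget(ONE_GB, 0));                     // IDs only
console.log(maxTasksInBudget(ONE_GB, 1_000));                 // ~1 KB entities
console.log(maxTasksPerStimulation(ONE_GB, 1_000, 10_000));   // 1,000 stimulations, 10 KB payloads
```

With 1,000 parallel stimulations and 10 KB payloads, the per-stimulation allowance comes out just under 100 tasks, which matches the example in the paragraph above.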

When not to use CNStra's queue for work items

The in-memory queue is designed for short fan-outs within a single business flow — a stimulation that spawns a handful of parallel tasks, each completing quickly. It is not a job queue.

Stop and reach for an external queue (BullMQ, SQS, RabbitMQ) when:

You don't control queue depth — work arrives faster than dendrites can drain it (e.g., webhook bursts, large imports)
The total item count is unbounded — processing "all users", "all orders", "all rows in a CSV" fans out into a queue that grows linearly with data size
Payloads are large — passing full documents or large objects through signals; at 100 KB per payload you hit 1 GB at fewer than 10,000 queued tasks
The process might restart — everything in the ring buffer is lost on crash; if durability matters, use an external queue
You need visibility and retries — external queues give you dead-letter queues, retry policies, and dashboards out of the box

The correct pattern for large workloads: let the external queue control depth, and have each worker create exactly one CNStra stimulation per job:

// ✅ External queue controls how many items are in-flight at once
import { Worker } from 'bullmq';

new Worker('jobs', async (job) => {
  // One stimulation per job, so CNStra queue depth stays tiny
  const stimulation = cns.stimulate(myCollateral.createSignal(job.data));
  await stimulation.waitUntilComplete();
});

This way CNStra's queue never holds more than the tasks for a single job, regardless of how many jobs are waiting.

Memory usage at scale

Concurrent Stimulations	Minimal Context (2-3 keys)	Growing Context (10 keys, objects)
1,000	~1.8-3.2 MB	~2.2-5 MB
10,000	~18-32 MB	~22-50 MB
1,000,000	~1.8-3.2 GB	~2.2-5 GB

Context size impact

Context growth significantly impacts memory usage:

Minimal context (2-3 keys, primitives): +100-200 bytes per stimulation
Small context (5-10 keys, primitives): +300-500 bytes per stimulation
Medium context (10-20 keys, small objects): +500-1500 bytes per stimulation
Large context (20+ keys, complex objects): +1.5-5 KB per stimulation

Example: If you store full user objects (5-10 KB each) in context instead of just IDs:

1,000 stimulations: +5-10 MB → ~7-13 MB total
10,000 stimulations: +50-100 MB → ~70-130 MB total
1,000,000 stimulations: +5-10 GB → ~7-13 GB total
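The totals above are base overhead plus extra context, multiplied by the number of stimulations. A rough sketch of that calculation follows; `estimateFleetBytes` is a hypothetical helper, and the inputs are the ranges quoted in this section (1.8-3.2 KB base, 5-10 KB per stored user object).

```typescript
// Hypothetical helper (not a CNStra API): total memory for N concurrent stimulations,
// each carrying a base overhead plus extra context data
function estimateFleetBytes(
  stimulations: number,
  baseBytes: number,
  extraContextBytes: number
): number {
  return stimulations * (baseBytes + extraContextBytes);
}

// Assumed inputs: low and high ends of the ranges quoted above
const low = estimateFleetBytes(10_000, 1_800, 5_000);
const high = estimateFleetBytes(10_000, 3_200, 10_000);

console.log(`${low / 1e6}-${high / 1e6} MB`); // range for 10,000 stimulations
```

Swapping the 5-10 KB user object for an ID of a few dozen bytes removes almost all of the extra term, which is the point of the example.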

Best practices for memory efficiency

Keep payloads small - this is the most critical factor:

// ✅ Good: small payload (~50 bytes)
return axon.output.createSignal({ userId: '123', action: 'created' });

// ❌ Bad: large payload (~50 KB+)
return axon.output.createSignal({ 
  user: fullUserObject, 
  history: largeArray, 
  metadata: hugeObject 
});

Use references instead of full data in signals:

// ✅ Good: pass only ID, fetch data when needed
return axon.process.createSignal({ documentId: 'doc-123' });

// ❌ Bad: pass entire document
return axon.process.createSignal({ document: fullDocumentObject });

Context stores per-neuron per-stimulation metadata only (retry attempts, debounce state), not business data:

// ✅ Good: Context stores metadata (~50 bytes)
ctx.set({ attempt: 2, startTime: Date.now() });

// ❌ Bad: Don't store business data in context
// Business data should flow through signal payloads
ctx.set({ user: fullUserObject, history: lotsOfData });

Monitor and limit queue size to prevent memory bloat. queueLength counts every activation the stimulation still owns, both waiting and in flight, so it is the right backpressure signal. When you need to know why it is high, break it down into pendingActivations (waiting) and activeActivations (in flight):
```
onResponse: (r) => {
  if (r.queueLength > 1000) {
    // Queue is growing - consider:
    // - Reducing concurrency
    // - Adding backpressure
    // - Investigating slow processing
    console.warn(
      `Queue ${r.queueLength} (${r.pendingActivations} waiting, ` +
      `${r.activeActivations} in flight)`
    );
  }
}
```
A high pendingActivations with a low activeActivations means the graph is producing work faster than the concurrency limit lets it start. The reverse means slow neuron bodies.
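That rule of thumb can be captured in a small classifier to label log lines or metrics. This is a sketch under assumptions: the field names mirror the onResponse fields used above, but the `QueueStats` type, the `diagnoseQueue` function, the 1,000 threshold, and the "whichever count is larger" rule are all illustrative choices, not CNStra behavior.

```typescript
// Local stand-in type; field names mirror the onResponse fields used above
type QueueStats = {
  queueLength: number;
  pendingActivations: number;
  activeActivations: number;
};

type Diagnosis = 'ok' | 'producing-faster-than-starting' | 'slow-neuron-bodies';

// Hypothetical classifier: threshold and comparison rule are assumptions
function diagnoseQueue(stats: QueueStats, threshold: number = 1000): Diagnosis {
  if (stats.queueLength <= threshold) return 'ok';
  // Mostly waiting work: the graph outpaces the concurrency limit.
  // Mostly in-flight work: the neuron bodies themselves are slow.
  return stats.pendingActivations >= stats.activeActivations
    ? 'producing-faster-than-starting'
    : 'slow-neuron-bodies';
}

console.log(diagnoseQueue({ queueLength: 1200, pendingActivations: 1190, activeActivations: 10 }));
console.log(diagnoseQueue({ queueLength: 1200, pendingActivations: 50, activeActivations: 1150 }));
```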

Set reasonable concurrency limits to prevent queue buildup:

// Limit concurrent operations to match your processing capacity
const stimulation = cns.stimulate(signal, {
  concurrency: 10 // At most 10 operations in flight; the rest wait in the queue
});

Avoid storing large arrays or nested objects in context. Use external storage (DB, cache) and reference by ID.
Clean up completed stimulations promptly if you're tracking them externally. The stimulation object is garbage-collected when no longer referenced.

For large payloads, consider streaming or chunking:

// Instead of one large signal, split into smaller chunks
const chunks = splitIntoChunks(largeData, 1000);
return chunks.map((chunk, index) => axon.process.createSignal({ chunk, index }));
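The snippet above relies on a `splitIntoChunks` helper that is not defined in this guide. One possible implementation is sketched below; the name and signature are taken from the call site, and everything else (generic typing, the argument check) is an assumption.

```typescript
// Hypothetical implementation of the splitIntoChunks helper used above
function splitIntoChunks<T>(items: readonly T[], chunkSize: number): T[][] {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += chunkSize) {
    // slice() clamps at the end of the array, so the last chunk may be shorter
    chunks.push(items.slice(start, start + chunkSize));
  }
  return chunks;
}

// Each chunk becomes one small payload; index lets the consumer restore order
const payloads = splitIntoChunks([1, 2, 3, 4, 5], 2).map((chunk, index) => ({ chunk, index }));
console.log(JSON.stringify(payloads));
```

Chunking bounds the size of each individual payload. If all chunks are emitted at once they still sit in the queue together, so it works best combined with a concurrency limit or the batch pattern described later.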

Use external queue systems to control memory load:

⚠️ Critical: For production systems processing high volumes, use external queue systems (BullMQ, RabbitMQ, AWS SQS) to control memory usage instead of creating thousands of stimulations in memory.

// ✅ Good: Use external queue to control load
import { Queue, Worker } from 'bullmq';

// Assumption: Redis on localhost; adjust for your environment
const connection = { host: 'localhost', port: 6379 };

const queue = new Queue('jobs', { connection });

new Worker('jobs', async (job) => {
  // Process one job at a time, controlling memory
  const stimulation = cns.stimulate(
    myCollateral.createSignal(job.data),
    { concurrency: 10 }
  );
  await stimulation.waitUntilComplete();
}, {
  connection,
  // In BullMQ the rate limiter is a Worker option: at most 100 jobs per second
  limiter: { max: 100, duration: 1000 }
});

// Enqueue work externally - doesn't consume memory until processed
await queue.add('process', { userId: '123' });

Benefits:

Work is persisted externally, not in memory
Rate limiting and backpressure handled by queue system
Survives process restarts
Better observability and retry mechanisms

See Integrations for examples.

Context is per-neuron per-stimulation and automatically cleaned up:

Each neuron in each stimulation has its own context instance. Contexts hold memory for the entire duration of a stimulation and are automatically cleaned up when the stimulation completes. Use context for metadata only, not business data:

// Context stores per-neuron per-stimulation metadata (processing stats)
const processor = withCtx<{ processedCount: number; startTime: number }>()
  .neuron({ next })
  .dendrite({
    collateral: input,
    response: async (payload, axon, ctx) => {
      // Context stores per-neuron per-stimulation metadata
      const metadata = ctx.get() ?? { processedCount: 0, startTime: Date.now() };
      ctx.set({ ...metadata, processedCount: metadata.processedCount + 1 });
      
      // Business data flows through payloads
      const batch = await fetchBatch(payload.batchId);
      const results = await processBatch(batch);
      
      return axon.next.createSignal({ 
        batchId: payload.nextBatchId,
        results // Business data in payload
      });
    }
  });

Best practice: Context is automatically cleaned up when stimulation completes. Store only metadata in context, pass business data through signal payloads.

Use batch processing with recursive self-calls (instead of fan-out):

❌ Bad: Creating 10,000 signals with large payloads floods memory:

// This creates 10,000 tasks in queue, each with full payload
const items = await db.fetchAll(10000);
return items.map(item => 
  axon.process.createSignal({ fullItem: item }) // 10KB each = 100MB in queue!
);

✅ Good: Process in batches with recursive self-calls. Pass offset through payload, not context:

const BATCH_SIZE = 20;

const batchProcessor = neuron({ processBatch, nextBatch })
  .dendrite({
    collateral: processBatch,
    response: async (payload, axon) => {
      // Offset comes from payload, not context
      const offset = payload.offset ?? 0;
      
      // Fetch only one batch from DB
      const batch = await db.fetchBatch(offset, BATCH_SIZE);
      
      // Process this batch (small memory footprint)
      await processItems(batch);
      
      // If more items exist, recursively call self with next offset in payload
      if (batch.length === BATCH_SIZE) {
        // Recursive self-call - pass offset through payload
        return axon.nextBatch.createSignal({ 
          offset: offset + BATCH_SIZE 
        });
      }
      
      // Done: no more batches, so end the chain (this neuron uses no context, so there is nothing to clean up)
      return undefined;
    }
  });

// Start processing
cns.stimulate(processBatch.createSignal({ offset: 0 }));

Benefits:

Only one batch (20 items) in memory at a time
Queue length stays at 1-2 tasks instead of 10,000
Memory usage: ~200KB instead of ~100MB
Natural backpressure: next batch only starts after current completes
Works perfectly with per-neuron concurrency limits

Pattern: Fetch → Process → Recurse (if more) → Done
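To see why the queue stays short with this pattern, it can be simulated without CNStra at all. The sketch below is a toy model under stated assumptions: `rows`, `fetchBatch`, `handleBatch`, and `run` are hypothetical stand-ins for the database, the dendrite, and the stimulation queue, and everything is synchronous to keep the example small (a real fetch would be async).

```typescript
type BatchSignal = { offset: number };

const BATCH_SIZE = 20;

// Hypothetical in-memory stand-in for the database (95 rows)
const rows: number[] = Array.from({ length: 95 }, (_, i) => i);
const fetchBatch = (offset: number, size: number): number[] =>
  rows.slice(offset, offset + size);

let processed = 0;

// Mirrors the dendrite: handle one batch, then return the next signal or undefined
function handleBatch(signal: BatchSignal): BatchSignal | undefined {
  const batch = fetchBatch(signal.offset, BATCH_SIZE);
  processed += batch.length; // stand-in for processItems(batch)
  return batch.length === BATCH_SIZE
    ? { offset: signal.offset + BATCH_SIZE } // full batch: there may be more
    : undefined;                             // short batch: done
}

// Toy driver standing in for the stimulation queue; records the peak queue length
function run(start: BatchSignal): { batches: number; peakQueueLength: number } {
  const queue: BatchSignal[] = [start];
  let batches = 0;
  let peakQueueLength = 0;
  while (queue.length > 0) {
    peakQueueLength = Math.max(peakQueueLength, queue.length);
    const next = handleBatch(queue.shift()!);
    batches++;
    if (next) queue.push(next);
  }
  return { batches, peakQueueLength };
}

const stats = run({ offset: 0 });
console.log(stats, processed); // 5 batches, peak queue length 1, 95 rows processed
```

Because each batch emits at most one follow-up signal, the queue depth in this model is independent of the total row count.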

Performance characteristics

Sync-first: Synchronous neuron chains execute in a single tick without extra Promise overhead.
Minimal async overhead: An async response costs one scheduled microtask and is not inherently slower. Promises are created only when a neuron returns an async result.
Stack-safe: Deep chains are handled via an internal queue, avoiding stack overflow.
Bounded execution: maxNeuronHops prevents runaway processing in cyclic graphs.
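The last two properties are easier to picture with a toy driver. The sketch below is not CNStra's implementation; `Step` and `runChain` are hypothetical names. It only illustrates the general technique: running a chain from a queue in a loop keeps the call stack flat however long the chain is, and a hop counter gives a hard stop for cycles.

```typescript
// A step receives a value and returns the next one, or undefined to end the chain
type Step = (value: number) => number | undefined;

// Hypothetical driver: iterates over a queue instead of recursing,
// and stops once maxHops steps have run (if a cap is given)
function runChain(step: Step, initial: number, maxHops?: number): { hops: number; last: number } {
  const queue: number[] = [initial];
  let hops = 0;
  let last = initial;
  while (queue.length > 0) {
    if (maxHops !== undefined && hops >= maxHops) break; // bounded execution
    last = queue.shift()!;
    hops++;
    const next = step(last);
    if (next !== undefined) queue.push(next);
  }
  return { hops, last };
}

// A 100,000-step chain completes without growing the call stack
const deep = runChain((v) => (v < 100_000 ? v + 1 : undefined), 0);
console.log(deep.hops); // 100001 (values 0 through 100000)

// A cyclic chain never ends on its own; the cap stops it after 10 hops
const capped = runChain((v) => (v + 1) % 3, 0, 10);
console.log(capped.hops); // 10
```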

Best practices

Keep context data minimal

Store only essential data (IDs, counters, flags) in context. Avoid large objects or full entities.

// ✅ Good: minimal context
ctx.set({ userId: '123', attempt: 2 });

// ❌ Bad: bloated context
ctx.set({ user: fullUserObject, history: lotsOfData });

Use synchronous responses when possible

If a neuron doesn't perform I/O, return the next signal synchronously:

// ✅ Sync response (fast)
.dendrite({
  collateral: input,
  response: (p, axon) => axon.output.createSignal({ value: p.value * 2 })
});

// ⚠️ Async response (schedules a microtask; use when doing I/O)
.dendrite({
  collateral: input,
  response: async (p, axon) => {
    const res = await fetch('/api');
    const result = await res.json(); // pass the parsed body, not the Response object
    return axon.output.createSignal(result);
  }
});

Set reasonable `maxNeuronHops`

Default: undefined (disabled, no hop limit). If you need a safety cap for cyclic graphs, set an explicit limit:

const stimulation = cns.stimulate(signal, {
  maxNeuronHops: 10 // stop after 10 hops (optional, disabled by default)
});
await stimulation.waitUntilComplete();

Implement proper error handling

Use onResponse to log errors without blocking the flow:

const stimulation = cns.stimulate(signal, {
  onResponse: (r) => {
    if (r.error) logger.error(r.error);
    if (r.queueLength === 0) logger.info('done'); // terminal response
  }
});
await stimulation.waitUntilComplete();

Avoid `autoCleanupContexts` in production

The CNS autoCleanupContexts option adds overhead:

Initialization cost: the SCC (Strongly Connected Components) analysis is built lazily on first use — so it is only paid when this option is enabled (or you read the SCC APIs), not on every new CNS(). Building it is O(V + E) for the graph/SCC pass plus the SCC-ancestor transitive closure (up to O(V²) for densely connected graphs).
O(1 + A) runtime cost per cleanup check (where A = number of SCC ancestors)
Memory overhead for storing SCC graphs and ancestor relationships

Use only when:

Memory leaks are a critical issue
You have a small to medium-sized neuron graph (< 1000 neurons)
Performance is less critical than memory management

For production systems, prefer manual context cleanup or custom cleanup strategies.

Measuring performance

Use onResponse to track signal flow timing:

const start = Date.now();
const stimulation = cns.stimulate(signal, {
  onResponse: (r) => {
    if (r.queueLength === 0) {
      console.log(`Completed in ${Date.now() - start}ms, ${r.hops} hops`);
    }
  }
});
await stimulation.waitUntilComplete();

To measure the synchronous batches inside a run rather than the run as a whole, use onDrain — it fires once per synchronous turn, so counting drains tells you how many times the stimulation had to yield to the event loop:

const start = Date.now();
let turns = 0;
const stimulation = cns.stimulate(signal, {
  onDrain: (d) => {
    turns++; // one drain per synchronous turn
    if (d.queueLength === 0) {
      console.log(`Completed in ${Date.now() - start}ms across ${turns} turns`);
    }
  }
});

Or integrate with your APM/tracing tool (e.g., OpenTelemetry):

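import { trace } from '@opentelemetry/api';

// Tracer and span names are illustrative
const span = trace.getTracer('cnstra').startSpan('stimulation');
const stimulation = cns.stimulate(signal, {
  onResponse: (r) => {
    span.addEvent('neuron', { collateral: r.outputSignal?.collateral });
    if (r.error) span.recordException(r.error);
  }
});
await stimulation.waitUntilComplete();
span.end(); // close the span once the stimulation has finished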
