Performance Optimization & Memory Management for State Machines
CNStra is designed to be memory-efficient and fast for reactive orchestration.
Memory-efficient design
- Zero dependencies: No third-party packages, minimal bundle size.
- No error storage: Errors are delivered via callbacks, not accumulated in memory.
- Streaming responses: Signal traces are delivered via `onResponse` callbacks, not buffered.
- Context on-demand: Context stores are created only when needed via `withCtx()`.
- No global state: Each stimulation starts with a clean slate; no ambient listeners.
Memory overhead per stimulation
Each active CNSStimulation instance carries some internal overhead, but most
of it is allocated lazily — a minimal synchronous fire-and-forget run
(no listener, no context, never awaited) allocates only the object itself plus a
small ring-buffer array.
Base overhead (minimal context, ~2-3 keys)
- `CNSStimulation` object: ~0.6-1.5 KB depending on what's actually used
  - Activation queue: a small ring-buffer array (lazily allocated on first enqueue); the active-tasks `Set` is only created while an async task is in flight
  - Context store (`Map`): lazily created, only when a neuron reads/writes `ctx`
  - Pending/failed task tracking: lazily created, only on async `onResponse` or an actual failure
  - Completion `Promise`: created only if you call `waitUntilComplete()`
- Active tasks in queue: ~100-150 bytes per task (metadata only)
  - ⚠️ Important: Each task stores the full input signal with payload
  - Task structure: `{ neuron, dendriteCollateral, input: { collateral, payload } }`
  - A typical stimulation has 5-10 tasks simultaneously: ~500-1500 bytes (metadata) + payload size
- Context data: ~100-200 bytes (for minimal context with 2-3 simple values)
Total per stimulation (minimal context, small payloads): ~1.8-3.2 KB
Queue size and payload impact
Critical: The activation queue stores complete signal payloads in memory. Queue size directly multiplies payload memory usage.
Per-task memory breakdown:
- Task metadata (object references): ~100-150 bytes
- Signal payload: variable, can be any size (from 0 bytes to MBs)
Queue memory = (task metadata × queue length) + (payload size × queue length)
Examples:
| Queue Length | Small Payloads (100 bytes) | Medium Payloads (1 KB) | Large Payloads (10 KB) | Very Large Payloads (100 KB) |
|---|---|---|---|---|
| 10 tasks | ~2.5 KB | ~11.5 KB | ~101.5 KB | ~1 MB |
| 100 tasks | ~25 KB | ~115 KB | ~1 MB | ~10 MB |
| 1,000 tasks | ~250 KB | ~1.15 MB | ~10 MB | ~100 MB |
| 10,000 tasks | ~2.5 MB | ~11.5 MB | ~100 MB | ~1 GB |
At scale (1,000 concurrent stimulations):
| Queue Length per Stimulation | Small Payloads (100 bytes) | Medium Payloads (1 KB) | Large Payloads (10 KB) | Very Large Payloads (100 KB) |
|---|---|---|---|---|
| 10 tasks | ~2.5 MB | ~11.5 MB | ~100 MB | ~1 GB |
| 100 tasks | ~25 MB | ~115 MB | ~1 GB | ~10 GB |
| 1,000 tasks | ~250 MB | ~1.15 GB | ~10 GB | ~100 GB |
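The formula and both tables above can be reproduced with a small estimator, which is handy for plugging in your own payload sizes. This is a minimal sketch, not a CNStra API: `estimateQueueBytes`, `estimateTotalBytes`, and `formatBytes` are hypothetical helpers, and the 150-byte metadata default and decimal units (1 KB = 1,000 bytes) are assumptions chosen because they match the table values.

```typescript
// Hypothetical helper (not a CNStra API): rough memory held by one stimulation's queue.
// Queue memory = (task metadata × queue length) + (payload size × queue length)
function estimateQueueBytes(
  queueLength: number,
  payloadBytes: number,
  metadataBytes = 150, // upper end of the ~100-150 bytes per-task metadata figure
): number {
  return queueLength * (metadataBytes + payloadBytes);
}

// Scale the per-stimulation estimate to many concurrent stimulations.
function estimateTotalBytes(
  concurrentStimulations: number,
  queueLength: number,
  payloadBytes: number,
  metadataBytes = 150,
): number {
  return concurrentStimulations * estimateQueueBytes(queueLength, payloadBytes, metadataBytes);
}

// Format with decimal units (1 KB = 1,000 bytes), matching the tables.
function formatBytes(bytes: number): string {
  const units = ["bytes", "KB", "MB", "GB", "TB"];
  let value = bytes;
  let i = 0;
  while (value >= 1000 && i < units.length - 1) {
    value /= 1000;
    i++;
  }
  return `${Number(value.toFixed(2))} ${units[i]}`;
}

// 10 tasks × (150 + 100) bytes
console.log(formatBytes(estimateQueueBytes(10, 100))); // "2.5 KB"
```

Swap in your own measured metadata size if it differs; the tables assume the same 150-byte figure throughout.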
⚠️ Memory warning: If your payloads are large (e.g., full documents, images, large JSON objects) and queues grow (e.g., due to slow processing or high concurrency), memory usage can explode quickly.
How much can you queue before hitting 1 GB?
CNStra's queue is purely in-memory — a ring buffer that holds every pending task until a dendrite processes it. There is no spilling to disk, no backpressure from the queue itself. All pending work lives in RAM.
Each queued task occupies roughly:
~200 bytes (task metadata + ring buffer slot overhead) + payload size
From that, the maximum number of tasks you can hold in 1 GB of RAM:
| Payload size | Max tasks in 1 GB | Typical scenario |
|---|---|---|
| ~0 bytes (IDs only) | ~5,000,000 | Event IDs, simple triggers |
| ~100 bytes (small record) | ~3,300,000 | Minimal user events |
| ~1 KB (typical entity) | ~830,000 | Order, session, compact DTO |
| ~10 KB (medium document) | ~97,000 | Product page, large form submission |
| ~100 KB (large document) | ~9,800 | Full report, paginated result set |
| ~1 MB (binary / large JSON) | ~1,000 | Image metadata, export chunk |
These numbers are per-process totals across all concurrent stimulations. If you have 1,000 stimulations running in parallel and each queues 100 tasks with 10 KB payloads, that's already 100,000 tasks × 10 KB ≈ 1 GB consumed.
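The same arithmetic run in reverse gives the table above and lets you check a planned workload against a memory budget. A minimal sketch under stated assumptions: `maxTasksInBudget` and `fitsInBudget` are hypothetical names, and the 200-byte per-task overhead is the rough figure from this section, not a measured constant.

```typescript
// Hypothetical helper (not a CNStra API): how many queued tasks fit in a memory budget,
// assuming ~200 bytes per task (metadata + ring-buffer slot) plus the payload.
function maxTasksInBudget(budgetBytes: number, payloadBytes: number, perTaskOverhead = 200): number {
  return Math.floor(budgetBytes / (perTaskOverhead + payloadBytes));
}

// Check a planned workload (across all concurrent stimulations) before running it.
function fitsInBudget(
  budgetBytes: number,
  concurrentStimulations: number,
  tasksPerStimulation: number,
  payloadBytes: number,
): boolean {
  const totalTasks = concurrentStimulations * tasksPerStimulation;
  return totalTasks <= maxTasksInBudget(budgetBytes, payloadBytes);
}

const ONE_GB = 1_000_000_000; // decimal GB, as in the table
console.log(maxTasksInBudget(ONE_GB, 0)); // 5000000
console.log(maxTasksInBudget(ONE_GB, 100)); // 3333333
```

`fitsInBudget(ONE_GB, 1000, 100, 10_000)` returns `false`, which matches the example above: 100,000 tasks at ~10.2 KB each slightly exceed 1 GB.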
When not to use CNStra's queue for work items
The in-memory queue is designed for short fan-outs within a single business flow — a stimulation that spawns a handful of parallel tasks, each completing quickly. It is not a job queue.
Stop and reach for an external queue (BullMQ, SQS, RabbitMQ) when:
- You don't control queue depth — work arrives faster than dendrites can drain it (e.g., webhook bursts, large imports)
- The total item count is unbounded — processing "all users", "all orders", "all rows in a CSV" fans out into a queue that grows linearly with data size
- Payloads are large — passing full documents or large objects through signals; at 100 KB per payload you hit 1 GB at fewer than 10,000 queued tasks
- The process might restart — everything in the ring buffer is lost on crash; if durability matters, use an external queue
- You need visibility and retries — external queues give you dead-letter queues, retry policies, and dashboards out of the box
The correct pattern for large workloads: let the external queue control depth, and have each worker create exactly one CNStra stimulation per job:
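// ✅ External queue controls how many items are in flight at once
import { Worker } from 'bullmq';
// Assumes `cns` and `myCollateral` are defined elsewhere; Redis connection options are omitted for brevity
new Worker('jobs', async (job) => {
  // One stimulation per job: CNStra's queue depth stays tiny
  const stimulation = cns.stimulate(myCollateral.createSignal(job.data));
  await stimulation.waitUntilComplete();
});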
This way CNStra's queue never holds more than the tasks for a single job, regardless of how many jobs are waiting.
Memory usage at scale
| Concurrent Stimulations | Minimal Context (2-3 keys) | Growing Context (10 keys, objects) |
|---|---|---|
| 1,000 | ~1.8-3.2 MB | ~2.2-5 MB |
| 10,000 | ~18-32 MB | ~22-50 MB |
| 1,000,000 | ~1.8-3.2 GB | ~2.2-5 GB |
Context size impact
Context growth significantly impacts memory usage:
- Minimal context (2-3 keys, primitives): +100-200 bytes per stimulation
- Small context (5-10 keys, primitives): +300-500 bytes per stimulation
- Medium context (10-20 keys, small objects): +500-1500 bytes per stimulation
- Large context (20+ keys, complex objects): +1.5-5 KB per stimulation
Example: If you store full user objects (5-10 KB each) in context instead of just IDs:
- 1,000 stimulations: +5-10 MB → ~7-15 MB total
- 10,000 stimulations: +50-100 MB → ~70-150 MB total
- 1,000,000 stimulations: +5-10 GB → ~7-15 GB total
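To see where a given context or payload falls in the size bands above, you can measure its serialized size. A rough sketch, assuming JSON size is an acceptable proxy: the real heap footprint in V8 differs (object headers, hidden classes, string encoding), so treat the result as an order-of-magnitude check. `approxJsonBytes` and `checkContextSize` are hypothetical helpers, and the 200-byte default limit simply mirrors the "minimal context" band.

```typescript
// Rough proxy only: UTF-8 length of the JSON encoding, not the actual V8 heap size.
function approxJsonBytes(value: unknown): number {
  // JSON.stringify returns undefined for undefined, functions, and symbols
  const json: string | undefined = JSON.stringify(value);
  return json === undefined ? 0 : new TextEncoder().encode(json).length;
}

// Hypothetical guard (not a CNStra API): warn when a context value outgrows a budget.
function checkContextSize(value: unknown, limitBytes = 200): boolean {
  const size = approxJsonBytes(value);
  if (size > limitBytes) {
    console.warn(`Context is ~${size} bytes (limit ${limitBytes}); store IDs instead of full objects`);
    return false;
  }
  return true;
}

console.log(approxJsonBytes({ userId: "123" })); // 16
```

Calling `checkContextSize` on the value you are about to pass to `ctx.set()` in development is a cheap way to catch full entities slipping into context.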
Best practices for memory efficiency
- Keep payloads small; this is the most critical factor:
  // ✅ Good: small payload (~50 bytes)
  return axon.output.createSignal({ userId: '123', action: 'created' });
  // ❌ Bad: large payload (~50 KB+)
  return axon.output.createSignal({
    user: fullUserObject,
    history: largeArray,
    metadata: hugeObject
  });
- Use references instead of full data in signals:
  // ✅ Good: pass only the ID and fetch the data when needed
  return axon.process.createSignal({ documentId: 'doc-123' });
  // ❌ Bad: pass the entire document
  return axon.process.createSignal({ document: fullDocumentObject });
- Use context only for per-neuron, per-stimulation metadata (retry attempts, debounce state), not business data:
  // ✅ Good: context stores metadata (~50 bytes)
  ctx.set({ attempt: 2, startTime: Date.now() });
  // ❌ Bad: don't store business data in context;
  // business data should flow through signal payloads
  ctx.set({ user: fullUserObject, history: lotsOfData });
- Monitor and limit queue size to prevent memory bloat:
  const stimulation = cns.stimulate(signal, {
    onResponse: (r) => {
      if (r.queueLength > 1000) {
        // The queue is growing. Consider:
        // - reducing concurrency
        // - adding backpressure
        // - investigating slow processing
        console.warn(`Queue length: ${r.queueLength}`);
      }
    }
  });
- Set reasonable concurrency limits to prevent queue buildup:
  // Limit concurrent operations to match your processing capacity
  const stimulation = cns.stimulate(signal, {
    concurrency: 10 // prevents the queue from growing unbounded
  });
- Avoid storing large arrays or nested objects in context. Use external storage (DB, cache) and reference data by ID.
- Clean up completed stimulations promptly if you're tracking them externally. The stimulation object is garbage-collected when no longer referenced.
- For large payloads, consider streaming or chunking:
  // Instead of one large signal, split the data into smaller chunks
  // (splitIntoChunks stands for your own chunking helper)
  const chunks = splitIntoChunks(largeData, 1000);
  return chunks.map((chunk, index) => axon.process.createSignal({ chunk, index }));
- Use external queue systems to control memory load:
  ⚠️ Critical: For production systems processing high volumes, use an external queue system (BullMQ, RabbitMQ, AWS SQS) to control memory usage instead of creating thousands of stimulations in memory.
  // ✅ Good: use an external queue to control load
  import { Queue, Worker } from 'bullmq';
  // Redis connection options omitted for brevity; `cns` and `myCollateral` are assumed to be defined elsewhere
  const queue = new Queue('jobs');
  new Worker(
    'jobs',
    async (job) => {
      // The worker processes one job at a time by default, keeping memory bounded
      const stimulation = cns.stimulate(myCollateral.createSignal(job.data), { concurrency: 10 });
      await stimulation.waitUntilComplete();
    },
    { limiter: { max: 100, duration: 1000 } } // rate limit (a Worker option in BullMQ): at most 100 jobs per second
  );
  // Enqueue work externally; it doesn't consume process memory until a worker picks it up
  await queue.add('process', { userId: '123' });
  Benefits:
  - Work is persisted externally, not in memory
  - Rate limiting and backpressure are handled by the queue system
  - Survives process restarts
  - Better observability and retry mechanisms
  See Integrations for examples.
- Context is per-neuron, per-stimulation, and cleaned up automatically:
  Each neuron in each stimulation has its own context instance. A context holds memory for the entire duration of its stimulation and is cleaned up automatically when the stimulation completes, so use it for metadata only, not business data:
  // Context stores per-neuron, per-stimulation metadata (processing stats).
  // `input`, `next`, `fetchBatch`, and `processBatch` are assumed to be defined elsewhere.
  const processor = withCtx<{ processedCount: number; startTime: number }>()
    .neuron({ next })
    .dendrite({
      collateral: input,
      response: async (payload, axon, ctx) => {
        const metadata = ctx.get() ?? { processedCount: 0, startTime: Date.now() };
        ctx.set({ ...metadata, processedCount: metadata.processedCount + 1 });
        // Business data flows through payloads
        const batch = await fetchBatch(payload.batchId);
        const results = await processBatch(batch);
        return axon.next.createSignal({
          batchId: payload.nextBatchId,
          results // business data in the payload
        });
      }
    });
  Best practice: store only metadata in context and pass business data through signal payloads.
- Use batch processing with recursive self-calls (instead of fan-out):
  ❌ Bad: creating 10,000 signals with large payloads floods memory:
  // This creates 10,000 tasks in the queue, each holding its full payload
  const items = await db.fetchAll(10000);
  return items.map((item) =>
    axon.process.createSignal({ fullItem: item }) // 10 KB each = 100 MB in the queue!
  );
  ✅ Good: process in batches with recursive self-calls, passing the offset through the payload rather than context:
  // `db` and `processItems` are assumed to be defined elsewhere
  const BATCH_SIZE = 20;
  const batchProcessor = neuron({ processBatch })
    .dendrite({
      collateral: processBatch,
      response: async (payload, axon) => {
        // The offset comes from the payload, not from context
        const offset = payload.offset ?? 0;
        // Fetch only one batch from the DB
        const batch = await db.fetchBatch(offset, BATCH_SIZE);
        // Process this batch (small memory footprint)
        await processItems(batch);
        // A full batch means more items may exist: call self again with the next offset
        if (batch.length === BATCH_SIZE) {
          return axon.processBatch.createSignal({ offset: offset + BATCH_SIZE });
        }
        // Done: nothing left to process, so emit no signal
        return undefined;
      }
    });
  // Start processing
  cns.stimulate(processBatch.createSignal({ offset: 0 }));
  Benefits:
  - Only one batch (20 items) in memory at a time
  - Queue length stays at 1-2 tasks instead of 10,000
  - Memory usage: ~200 KB instead of ~100 MB
  - Natural backpressure: the next batch starts only after the current one completes
  - Works well with per-neuron concurrency limits
  Pattern: Fetch → Process → Recurse (if more) → Stop
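Two library-free sketches for the last tips in this list. The chunking tip calls a `splitIntoChunks` helper that this page does not define, so one possible implementation is shown (an assumption, not part of CNStra). `simulateBatchRecursion` is a hypothetical model of the Fetch → Process → Recurse loop that uses a plain array as a stand-in for the activation queue; it only illustrates why the queue holds at most one pending task and is not CNStra's scheduler.

```typescript
// One possible implementation of the splitIntoChunks helper used in the chunking tip.
function splitIntoChunks<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) throw new RangeError("size must be a positive integer");
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Models the batch loop: each task "fetches" one batch, processes it,
// and enqueues at most one follow-up task carrying the next offset.
interface BatchTask {
  offset: number;
}

function simulateBatchRecursion(totalItems: number, batchSize: number) {
  const queue: BatchTask[] = [{ offset: 0 }]; // stand-in for the activation queue
  let processed = 0;
  let batches = 0;
  let maxQueueLength = queue.length;

  while (queue.length > 0) {
    const task = queue.shift()!;
    // Fetch: only this batch's items exist at once
    const batchLength = Math.max(0, Math.min(batchSize, totalItems - task.offset));
    // Process
    processed += batchLength;
    batches++;
    // Recurse: same condition as the dendrite (a full batch means more may exist)
    if (batchLength === batchSize) {
      queue.push({ offset: task.offset + batchSize });
    }
    maxQueueLength = Math.max(maxQueueLength, queue.length);
  }
  return { processed, batches, maxQueueLength };
}

console.log(splitIntoChunks([1, 2, 3, 4, 5], 2)); // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
console.log(simulateBatchRecursion(10_000, 20)); // { processed: 10000, batches: 501, maxQueueLength: 1 }
```

The 501st batch is an empty fetch: because the loop recurses whenever a batch is full, a total that is an exact multiple of the batch size costs one extra query before it stops.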
Performance characteristics
- Sync-first: Synchronous neuron chains execute in a single tick without extra Promise overhead.
- Minimal async overhead: An async response costs only a scheduled microtask, so async is not inherently slow. Promises are created only when a neuron returns an async result.
- Stack-safe: Deep chains are handled via an internal queue, avoiding stack overflow.
- Bounded execution: `maxNeuronHops` prevents runaway processing in cyclic graphs.
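The "Stack-safe" and "Bounded execution" points come down to two ideas: drive the chain with a loop over a queue instead of nested calls, and stop after a hop budget. The sketch below illustrates both in isolation; it is a conceptual model, not CNStra's internals, and `runChain`, `Step`, and the `maxHops` parameter are illustrative names (in CNStra the cap is the `maxNeuronHops` option).

```typescript
// A step receives a value and returns the next value, or undefined to stop.
type Step<T> = (value: T) => T | undefined;

interface RunResult<T> {
  last: T;
  hops: number;
  stoppedByCap: boolean;
}

// Drives the chain with a loop over a queue instead of recursive calls,
// so stack depth stays constant no matter how long the chain is.
function runChain<T>(start: T, step: Step<T>, maxHops?: number): RunResult<T> {
  const queue: T[] = [start];
  let last = start;
  let hops = 0;
  while (queue.length > 0) {
    // Optional safety cap, e.g. for cyclic graphs
    if (maxHops !== undefined && hops >= maxHops) {
      return { last, hops, stoppedByCap: true };
    }
    const current = queue.shift() as T;
    last = current;
    hops++;
    const next = step(current);
    if (next !== undefined) queue.push(next);
  }
  return { last, hops, stoppedByCap: false };
}

// A million-step linear chain completes without growing the stack
const deep = runChain(0, (n) => (n < 1_000_000 ? n + 1 : undefined));
// A two-node cycle (a → b → a → ...) is stopped by the cap
const cyclic = runChain("a", (s) => (s === "a" ? "b" : "a"), 10);
console.log(deep.hops, cyclic.hops, cyclic.stoppedByCap); // 1000001 10 true
```

A recursive version of the first call (one stack frame per step) would typically throw a `RangeError` long before a million steps; the loop keeps the stack depth constant.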
Best practices
Keep context data minimal
Store only essential data (IDs, counters, flags) in context. Avoid large objects or full entities.
// ✅ Good: minimal context
ctx.set({ userId: '123', attempt: 2 });
// ❌ Bad: bloated context
ctx.set({ user: fullUserObject, history: lotsOfData });
Use synchronous responses when possible
If a neuron doesn't perform I/O, return the next signal synchronously:
// ✅ Sync response (fast)
.dendrite({
collateral: input,
response: (p, axon) => axon.output.createSignal({ value: p.value * 2 })
});
// ⚠️ Async response (schedules a microtask; use when doing I/O)
.dendrite({
collateral: input,
response: async (p, axon) => {
const res = await fetch('/api');
return axon.output.createSignal(await res.json()); // signal the parsed body, not the Response object
}
});
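The sync-first idea can be illustrated with a helper that only allocates a Promise when its input is actually asynchronous. This is a hypothetical sketch of the general technique, not CNStra's internal code; `isPromiseLike` and `mapMaybeAsync` are illustrative names.

```typescript
// True for Promises and any other "thenable".
function isPromiseLike<T>(value: T | PromiseLike<T>): value is PromiseLike<T> {
  return (
    value !== null &&
    (typeof value === "object" || typeof value === "function") &&
    typeof (value as PromiseLike<T>).then === "function"
  );
}

// Applies fn to a value that may or may not be a Promise.
// Synchronous inputs stay synchronous: no Promise is allocated and no microtask is scheduled.
function mapMaybeAsync<T, U>(value: T | PromiseLike<T>, fn: (v: T) => U): U | Promise<U> {
  if (isPromiseLike(value)) {
    const thenable: PromiseLike<T> = value; // keep the narrowed type inside the closure
    // Only the async path pays for a Promise
    return new Promise<U>((resolve, reject) => {
      thenable.then((v) => {
        try {
          resolve(fn(v));
        } catch (err) {
          reject(err);
        }
      }, reject);
    });
  }
  return fn(value as T);
}

const doubled = mapMaybeAsync(21, (x: number) => x * 2); // 42, returned synchronously
const later = mapMaybeAsync(Promise.resolve(21), (x: number) => x * 2); // a Promise resolving to 42
```

The trade-off is a union return type (`U | Promise<U>`): callers must handle both shapes, which is the price for skipping Promise allocation on the common synchronous path.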
Set reasonable maxNeuronHops
Default: `undefined` (disabled). If you need a safety cap for cyclic graphs, set an explicit limit:
const stimulation = cns.stimulate(signal, {
maxNeuronHops: 10 // stop after 10 hops (optional, disabled by default)
});
await stimulation.waitUntilComplete();
Implement proper error handling
Use onResponse to log errors without blocking the flow:
const stimulation = cns.stimulate(signal, {
onResponse: (r) => {
if (r.error) logger.error(r.error);
if (r.queueLength === 0) logger.info('done');
}
});
await stimulation.waitUntilComplete();
Avoid autoCleanupContexts in production
The CNS autoCleanupContexts option adds overhead:
- Initialization cost: the SCC (Strongly Connected Components) analysis is built lazily on first use, so it is paid only when this option is enabled (or when you read the SCC APIs), not on every `new CNS()`. Building it is O(V + E) for the graph/SCC pass (V = neurons, E = connections between them) plus the SCC-ancestor transitive closure (up to O(V²) for densely connected graphs).
- O(1 + A) runtime cost per cleanup check (where A = number of SCC ancestors)
- Memory overhead for storing SCC graphs and ancestor relationships
Use only when:
- Memory leaks are a critical issue
- You have a small to medium-sized neuron graph (< 1000 neurons)
- Performance is less critical than memory management
For production systems, prefer manual context cleanup or custom cleanup strategies.
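For a sense of what the SCC pass involves, here is a generic sketch of Tarjan's algorithm, which finds strongly connected components in O(V + E) time. It is not CNStra's implementation: the adjacency-list shape (`Map<string, string[]>`) and the function name are assumptions, and the SCC-ancestor transitive closure mentioned above is a separate step that is not shown.

```typescript
// Tarjan's SCC algorithm over an adjacency list: O(V + E).
// Recursive for clarity; very deep graphs may need an iterative version.
function stronglyConnectedComponents(graph: Map<string, string[]>): string[][] {
  let nextIndex = 0;
  const index = new Map<string, number>(); // discovery order
  const lowLink = new Map<string, number>(); // smallest index reachable from the node
  const onStack = new Set<string>();
  const stack: string[] = [];
  const components: string[][] = [];

  const visit = (v: string): void => {
    index.set(v, nextIndex);
    lowLink.set(v, nextIndex);
    nextIndex++;
    stack.push(v);
    onStack.add(v);

    for (const w of graph.get(v) ?? []) {
      if (!index.has(w)) {
        visit(w);
        lowLink.set(v, Math.min(lowLink.get(v)!, lowLink.get(w)!));
      } else if (onStack.has(w)) {
        lowLink.set(v, Math.min(lowLink.get(v)!, index.get(w)!));
      }
    }

    // v is the root of a component: pop every node above it off the stack
    if (lowLink.get(v) === index.get(v)) {
      const component: string[] = [];
      let w: string;
      do {
        w = stack.pop()!;
        onStack.delete(w);
        component.push(w);
      } while (w !== v);
      components.push(component);
    }
  };

  for (const v of graph.keys()) {
    if (!index.has(v)) visit(v);
  }
  return components;
}

// a → b → c → a forms a cycle; d hangs off c
const graph = new Map<string, string[]>([["a", ["b"]], ["b", ["c"]], ["c", ["a", "d"]], ["d", []]]);
console.log(stronglyConnectedComponents(graph)); // [ [ 'd' ], [ 'c', 'b', 'a' ] ]
```

Each node and edge is visited once, which is where the O(V + E) bound comes from; the quadratic worst case in the bullet above comes from the ancestor closure, not from this pass.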
Measuring performance
Use onResponse to track signal flow timing:
const start = Date.now();
const stimulation = cns.stimulate(signal, {
onResponse: (r) => {
if (r.queueLength === 0) {
console.log(`Completed in ${Date.now() - start}ms, ${r.hops} hops`);
}
}
});
await stimulation.waitUntilComplete();
Or integrate with your APM/tracing tool (e.g., OpenTelemetry):
const stimulation = cns.stimulate(signal, {
onResponse: (r) => {
span.addEvent('neuron', { collateral: r.outputSignal?.collateral });
if (r.error) span.recordException(r.error);
}
});
await stimulation.waitUntilComplete();
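If you want the `onResponse` error and timing checks from this section in a reusable form, one option is a small tracker object. This is a sketch under assumptions: it defines its own `ResponseLike` type with only the fields used on this page (`queueLength`, `hops`, `error`) instead of importing CNStra's response type, and `createStatsTracker` is a hypothetical name. The injectable clock makes it testable without real timers.

```typescript
// Only the response fields used on this page; adapt to CNStra's actual response type.
interface ResponseLike {
  queueLength: number;
  hops?: number;
  error?: unknown;
}

interface StimulationStats {
  durationMs?: number;
  hops?: number;
  errors: number;
  responses: number;
}

// Hypothetical tracker: pass `onResponse` to a stimulation, read `stats` afterwards.
function createStatsTracker(now: () => number = Date.now) {
  const start = now();
  const stats: StimulationStats = { errors: 0, responses: 0 };
  const onResponse = (r: ResponseLike): void => {
    stats.responses++;
    if (r.error) stats.errors++;
    // Same completion check as the examples above: the queue has drained
    if (r.queueLength === 0) {
      stats.durationMs = now() - start;
      stats.hops = r.hops;
    }
  };
  return { onResponse, stats };
}
```

In an application you would pass `tracker.onResponse` as the `onResponse` option and read `tracker.stats` after `waitUntilComplete()` resolves; depending on CNStra's typings you may need to adapt `ResponseLike` to the library's response type.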