# Performance Optimization & Memory Management for State Machines
CNStra is designed to be memory-efficient and fast for reactive orchestration.
## Memory-efficient design
- Zero dependencies: No third-party packages, minimal bundle size.
- No error storage: Errors are delivered via callbacks, not accumulated in memory.
- Streaming responses: Signal traces are delivered via `onResponse` callbacks, not buffered.
- Context on-demand: Context stores are created only when needed via `withCtx()`.
- No global state: Each stimulation starts with a clean slate; no ambient listeners.
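For example, only neurons created through the `withCtx()` builder allocate a context store; a plain neuron never pays that cost. A minimal sketch using the builder API shown later on this page (the `input` and `next` collaterals are assumed to exist):

```ts
// No context store is ever allocated for this neuron
const stateless = neuron('stateless', { next }).dendrite({
  collateral: input,
  response: (payload, axon) => axon.next.createSignal(payload),
});

// A context store is created on demand, only for this neuron
const stateful = withCtx<{ attempt: number }>()
  .neuron('stateful', { next })
  .dendrite({
    collateral: input,
    response: (payload, axon, ctx) => {
      // Per-neuron per-stimulation metadata lives in the context store
      ctx.set({ attempt: (ctx.get()?.attempt ?? 0) + 1 });
      return axon.next.createSignal(payload);
    },
  });
```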
## Memory overhead per stimulation
Each active `CNSStimulation` instance has a fixed overhead for internal data structures:

- Base overhead (minimal context, ~2-3 keys):
  - `CNSStimulation` object: ~1.2-1.5 KB
  - Context store (Map): ~200 bytes
  - Task queue (array + Set): ~500 bytes
  - Pending/failed tasks tracking: ~200 bytes
  - Promise and metadata: ~300 bytes
- Active tasks in queue: ~100-150 bytes per task (metadata only)
  - ⚠️ Important: each task stores the full input signal, including its payload
  - Task structure: `{ stimulationId, neuronId, dendriteCollateralName, input: { collateralName, payload } }`
  - A typical stimulation has 5-10 tasks in flight simultaneously: ~500-1500 bytes (metadata) + payload size
- Context data: ~100-200 bytes (for a minimal context with 2-3 simple values)

Total per stimulation (minimal context, small payloads): ~1.8-3.2 KB
## Queue size and payload impact
Critical: The activation queue stores complete signal payloads in memory. Queue size directly multiplies payload memory usage.
Per-task memory breakdown:
- Task metadata (IDs, names): ~100-150 bytes
- Signal payload: variable, can be any size (from 0 bytes to MBs)
Queue memory = (task metadata × queue length) + (payload size × queue length)
Examples:
| Queue Length | Small Payloads (100 bytes) | Medium Payloads (1 KB) | Large Payloads (10 KB) | Very Large Payloads (100 KB) |
|---|---|---|---|---|
| 10 tasks | ~2.5 KB | ~11.5 KB | ~101.5 KB | ~1 MB |
| 100 tasks | ~25 KB | ~115 KB | ~1 MB | ~10 MB |
| 1,000 tasks | ~250 KB | ~1.15 MB | ~10 MB | ~100 MB |
| 10,000 tasks | ~2.5 MB | ~11.5 MB | ~100 MB | ~1 GB |
At scale (1,000 concurrent stimulations):
| Queue Length per Stimulation | Small Payloads | Medium Payloads | Large Payloads | Very Large Payloads |
|---|---|---|---|---|
| 10 tasks | ~2.5 MB | ~11.5 MB | ~100 MB | ~1 GB |
| 100 tasks | ~25 MB | ~115 MB | ~1 GB | ~10 GB |
| 1,000 tasks | ~250 MB | ~1.15 GB | ~10 GB | ~100 GB |
⚠️ Memory warning: If your payloads are large (e.g., full documents, images, large JSON objects) and queues grow (e.g., due to slow processing or high concurrency), memory usage can explode quickly.
## Memory usage at scale
| Concurrent Stimulations | Minimal Context (2-3 keys) | Growing Context (10 keys, objects) |
|---|---|---|
| 1,000 | ~1.8-3.2 MB | ~2.2-5 MB |
| 10,000 | ~18-32 MB | ~22-50 MB |
| 1,000,000 | ~1.8-3.2 GB | ~2.2-5 GB |
## Context size impact
Context growth significantly impacts memory usage:
- Minimal context (2-3 keys, primitives): +100-200 bytes per stimulation
- Small context (5-10 keys, primitives): +300-500 bytes per stimulation
- Medium context (10-20 keys, small objects): +500-1500 bytes per stimulation
- Large context (20+ keys, complex objects): +1.5-5 KB per stimulation
Example: If you store full user objects (5-10 KB each) in context instead of just IDs:
- 1,000 stimulations: +5-10 MB → ~7-15 MB total
- 10,000 stimulations: +50-100 MB → ~70-130 MB total
- 1,000,000 stimulations: +5-10 GB → ~7-15 GB total
## Best practices for memory efficiency
- Keep payloads small - this is the most critical factor:

  ```ts
  // ✅ Good: small payload (~50 bytes)
  return axon.output.createSignal({ userId: '123', action: 'created' });

  // ❌ Bad: large payload (~50 KB+)
  return axon.output.createSignal({
    user: fullUserObject,
    history: largeArray,
    metadata: hugeObject
  });
  ```
- Use references instead of full data in signals:

  ```ts
  // ✅ Good: pass only an ID, fetch the data when needed
  return axon.process.createSignal({ documentId: 'doc-123' });

  // ❌ Bad: pass the entire document
  return axon.process.createSignal({ document: fullDocumentObject });
  ```
- Use context only for per-neuron per-stimulation metadata (retry attempts, debounce state), never business data:

  ```ts
  // ✅ Good: context stores metadata (~50 bytes)
  ctx.set({ attempt: 2, startTime: Date.now() });

  // ❌ Bad: don't store business data in context;
  // business data should flow through signal payloads
  ctx.set({ user: fullUserObject, history: lotsOfData });
  ```
- Monitor and limit queue size to prevent memory bloat:

  ```ts
  onResponse: (r) => {
    if (r.queueLength > 1000) {
      // Queue is growing - consider:
      // - reducing concurrency
      // - adding backpressure
      // - investigating slow processing
      console.warn(`Queue length: ${r.queueLength}`);
    }
  }
  ```
- Set reasonable concurrency limits to prevent queue buildup:

  ```ts
  // Limit concurrent operations to match your processing capacity
  const stimulation = cns.stimulate(signal, {
    concurrency: 10 // prevents the queue from growing unbounded
  });
  ```
- Avoid storing large arrays or nested objects in context. Use external storage (DB, cache) and reference by ID, as sketched below.
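  A minimal sketch, assuming a generic external cache client (`cache.set`/`cache.get`, `buildLargeReport`, and the `notify` collateral are placeholders, not CNStra APIs):

  ```ts
  // ✅ Good: keep the heavy object in external storage, pass only its key
  const report = await buildLargeReport(payload.jobId);
  const key = `report:${payload.jobId}`;
  await cache.set(key, report); // e.g. Redis, a DB row, or object storage
  return axon.notify.createSignal({ reportKey: key });

  // A downstream neuron re-fetches by key only when it needs the data:
  // const report = await cache.get(payload.reportKey);
  ```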
- Clean up completed stimulations promptly if you're tracking them externally. The stimulation object is garbage-collected when no longer referenced; see the sketch below.
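  A minimal sketch of external tracking with prompt cleanup (the `jobId` key and `run` helper are illustrative):

  ```ts
  const inFlight = new Map<string, unknown>();

  async function run(jobId: string, signal: unknown) {
    const stimulation = cns.stimulate(signal);
    inFlight.set(jobId, stimulation);
    try {
      await stimulation.waitUntilComplete();
    } finally {
      // Drop the reference so the stimulation can be garbage-collected
      inFlight.delete(jobId);
    }
  }
  ```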
- For large payloads, consider streaming or chunking:

  ```ts
  // Instead of one large signal, split the data into smaller chunks
  const chunks = splitIntoChunks(largeData, 1000);
  return chunks.map((chunk, index) =>
    axon.process.createSignal({ chunk, index })
  );
  ```
- Use external queue systems to control memory load.

  ⚠️ Critical: For production systems processing high volumes, use external queue systems (BullMQ, RabbitMQ, AWS SQS) to control memory usage instead of creating thousands of stimulations in memory.

  ```ts
  // ✅ Good: use an external queue to control load
  import { Queue, Worker } from 'bullmq';

  const queue = new Queue('jobs');

  new Worker(
    'jobs',
    async (job) => {
      // Process one job at a time, controlling memory
      const stimulation = cns.stimulate(
        myCollateral.createSignal(job.data),
        { concurrency: 10 }
      );
      await stimulation.waitUntilComplete();
    },
    { limiter: { max: 100, duration: 1000 } } // rate limit (a Worker option in BullMQ)
  );

  // Enqueue work externally - it doesn't consume memory until processed
  await queue.add('process', { userId: '123' });
  ```

  Benefits:

  - Work is persisted externally, not in memory
  - Rate limiting and backpressure are handled by the queue system
  - Survives process restarts
  - Better observability and retry mechanisms

  See Integrations for examples.
- Context is per-neuron per-stimulation and automatically cleaned up.

  Each neuron in each stimulation has its own context instance. Contexts hold memory for the entire duration of a stimulation and are automatically cleaned up when the stimulation completes. Use context for metadata only, not business data:

  ```ts
  // Context stores per-neuron per-stimulation metadata (processing stats)
  const processor = withCtx<{ processedCount: number; startTime: number }>()
    .neuron('processor', { next })
    .dendrite({
      collateral: input,
      response: async (payload, axon, ctx) => {
        // Metadata goes in the context store
        const metadata = ctx.get() ?? { processedCount: 0, startTime: Date.now() };
        ctx.set({ ...metadata, processedCount: metadata.processedCount + 1 });

        // Business data flows through payloads
        const batch = await fetchBatch(payload.batchId);
        const results = await processBatch(batch);
        return axon.next.createSignal({
          batchId: payload.nextBatchId,
          results // business data in the payload
        });
      }
    });
  ```

  Best practice: Context is cleaned up automatically when the stimulation completes. Store only metadata in context and pass business data through signal payloads.
- Use batch processing with recursive self-calls (instead of fan-out).

  ❌ Bad: creating 10,000 signals with large payloads floods memory:

  ```ts
  // This creates 10,000 tasks in the queue, each carrying a full payload
  const items = await db.fetchAll(10000);
  return items.map(item =>
    axon.process.createSignal({ fullItem: item }) // 10 KB each = ~100 MB in the queue!
  );
  ```

  ✅ Good: process in batches with recursive self-calls. Pass the offset through the payload, not the context:

  ```ts
  const BATCH_SIZE = 20;

  // The neuron listens on processBatch and emits processBatch again,
  // forming the recursive self-call
  const batchProcessor = neuron('batch-processor', { processBatch })
    .dendrite({
      collateral: processBatch,
      response: async (payload, axon) => {
        // Offset comes from the payload, not the context
        const offset = payload.offset ?? 0;

        // Fetch only one batch from the DB
        const batch = await db.fetchBatch(offset, BATCH_SIZE);

        // Process this batch (small memory footprint)
        await processItems(batch);

        // If more items may exist, recursively call self with the next offset
        if (batch.length === BATCH_SIZE) {
          return axon.processBatch.createSignal({
            offset: offset + BATCH_SIZE
          });
        }

        // Done - returning undefined ends this branch
        return undefined;
      }
    });

  // Start processing
  cns.stimulate(processBatch.createSignal({ offset: 0 }));
  ```

  Benefits:

  - Only one batch (20 items) in memory at a time
  - Queue length stays at 1-2 tasks instead of 10,000
  - Memory usage: ~200 KB instead of ~100 MB
  - Natural backpressure: the next batch only starts after the current one completes
  - Works well with per-neuron concurrency limits

  Pattern: Fetch → Process → Recurse (if more) → Cleanup
## Performance characteristics

- Sync-first: Synchronous neuron chains execute in a single tick without extra Promise overhead.
- Minimal async overhead: Async responses only schedule a microtask and are not inherently slower; Promises are created only when a neuron returns an async result.
- Stack-safe: Deep chains are processed through an internal queue, avoiding stack overflow.
- Bounded execution: `maxNeuronHops` prevents runaway processing in cyclic graphs.
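Stack safety means recursive patterns like the batch processor above can run through very long chains without growing the call stack. A minimal sketch (the `tick` collateral and CNS wiring follow the patterns used elsewhere on this page):

```ts
// A self-recursive neuron that hops 100,000 times. Because the engine
// drains an internal queue instead of recursing, the call stack stays flat.
const counter = neuron('counter', { tick }).dendrite({
  collateral: tick,
  response: (payload, axon) => {
    if (payload.n >= 100_000) return undefined; // done
    return axon.tick.createSignal({ n: payload.n + 1 });
  },
});

await cns.stimulate(tick.createSignal({ n: 0 })).waitUntilComplete();
```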
## Best practices
### Keep context data minimal

Store only essential data (IDs, counters, flags) in context. Avoid large objects or full entities.

```ts
// ✅ Good: minimal context
ctx.set({ userId: '123', attempt: 2 });

// ❌ Bad: bloated context
ctx.set({ user: fullUserObject, history: lotsOfData });
```
### Use synchronous responses when possible

If a neuron doesn't perform I/O, return the next signal synchronously:

```ts
// ✅ Sync response (fast)
.dendrite({
  collateral: input,
  response: (p, axon) => axon.output.createSignal({ value: p.value * 2 })
});

// ⚠️ Async response (schedules a microtask; use when doing I/O)
.dendrite({
  collateral: input,
  response: async (p, axon) => {
    const result = await fetch('/api').then(res => res.json());
    return axon.output.createSignal(result);
  }
});
```
### Set a reasonable maxNeuronHops

Default: `undefined` (disabled). If you need a safety cap for cyclic graphs, set a limit:

```ts
const stimulation = cns.stimulate(signal, {
  maxNeuronHops: 10 // stop after 10 hops (optional, disabled by default)
});
await stimulation.waitUntilComplete();
```
### Implement proper error handling

Use `onResponse` to log errors without blocking the flow:

```ts
const stimulation = cns.stimulate(signal, {
  onResponse: (r) => {
    if (r.error) logger.error(r.error);
    if (r.queueLength === 0) logger.info('done');
  }
});
await stimulation.waitUntilComplete();
```
### Avoid autoCleanupContexts in production

The CNS `autoCleanupContexts` option adds significant overhead:

- O(V²) initialization cost for building SCC (Strongly Connected Components) structures
- O(1 + A) runtime cost per cleanup check (where A = the number of SCC ancestors)
- Memory overhead for storing SCC graphs and ancestor relationships

Use it only when:

- Memory leaks are a critical issue
- You have a small to medium-sized neuron graph (< 1,000 neurons)
- Performance is less critical than memory management

For production systems, prefer manual context cleanup or a custom cleanup strategy; a sketch follows below.
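A minimal sketch of manual cleanup, under the assumption that overwriting the stored value with `ctx.set(...)` releases the previous one (only the `ctx.get()`/`ctx.set()` calls shown above are used; the terminal `done` collateral is illustrative):

```ts
// Terminal neuron: eagerly reset its own context once work is finished,
// so large metadata doesn't linger for the rest of the stimulation
const finalizer = withCtx<{ attempt: number } | undefined>()
  .neuron('finalizer', {})
  .dendrite({
    collateral: done,
    response: (_payload, _axon, ctx) => {
      ctx.set(undefined); // drop per-neuron metadata eagerly (assumed semantics)
      return undefined;   // end this branch
    },
  });
```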
## Measuring performance

Use `onResponse` to track signal flow timing:

```ts
const start = Date.now();
const stimulation = cns.stimulate(signal, {
  onResponse: (r) => {
    if (r.queueLength === 0) {
      console.log(`Completed in ${Date.now() - start}ms, ${r.hops} hops`);
    }
  }
});
await stimulation.waitUntilComplete();
```
Or integrate with your APM/tracing tool (e.g., OpenTelemetry):

```ts
import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('cnstra-app');
const span = tracer.startSpan('stimulation');

const stimulation = cns.stimulate(signal, {
  onResponse: (r) => {
    span.addEvent('neuron', { collateral: r.outputSignal?.collateralName });
    if (r.error) span.recordException(r.error);
  }
});
await stimulation.waitUntilComplete();
span.end();
```