ADR-020: IPC Architecture - Single-Writer Channels

Status

Accepted

Context

The catnet overlay system requires high-performance inter-process communication (IPC) between the main process and overlay processes. Initial designs considered bidirectional channels to simplify the API, but this approach had fundamental issues:

  1. Race Conditions: Multiple writers to the same ring buffer cause data corruption
  2. Performance: Synchronization mechanisms (locks, CAS loops) add contention on the hot path and severely degrade latency
  3. Complexity: Bidirectional channels hide the multiple-writer problem
  4. Lock-Free Goals: Our sub-microsecond latency target requires lock-free design

Extensive testing revealed that lock-free ring buffers cannot safely support multiple writers without severe performance penalties. Three implementations were tested:

  • RingBuffer V1: Race conditions with concurrent writers
  • RingBuffer V2: Attempted fix with broken space accounting
  • RingBuffer V3: Correct but with 50-100x performance degradation

Decision

We will use single-writer, multiple-reader channels as the fundamental IPC primitive.
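To illustrate why a single writer removes the need for locks or CAS loops, here is a minimal in-process sketch. It is not catnet's RingBuffer: the `Ring` type, its capacity, and the `broadcast_demo` helper are hypothetical, the slots live in ordinary memory rather than shared memory, and overrun handling (a writer lapping a slow reader) is left out.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

const CAPACITY: usize = 8;

/// Hypothetical single-writer broadcast ring (illustration only).
struct Ring {
    slots: [AtomicU64; CAPACITY],
    /// Count of published values; only the single writer ever stores to it.
    published: AtomicUsize,
}

impl Ring {
    fn new() -> Self {
        Ring {
            slots: std::array::from_fn(|_| AtomicU64::new(0)),
            published: AtomicUsize::new(0),
        }
    }

    /// Writer side: a plain load and two stores, no CAS loop, because no
    /// other thread modifies `published`.
    fn publish(&self, value: u64) {
        let seq = self.published.load(Ordering::Relaxed);
        self.slots[seq % CAPACITY].store(value, Ordering::Relaxed);
        // Release makes the slot write visible before the new count.
        self.published.store(seq + 1, Ordering::Release);
    }

    /// Reader side: each reader owns its cursor, so readers do not contend
    /// with each other or with the writer.
    fn read(&self, cursor: &mut usize) -> Option<u64> {
        if *cursor >= self.published.load(Ordering::Acquire) {
            return None; // nothing new yet
        }
        let value = self.slots[*cursor % CAPACITY].load(Ordering::Relaxed);
        *cursor += 1;
        Some(value)
    }
}

/// One writer publishes five values; two readers each observe all of them.
/// Fewer values than CAPACITY are written, so no slot is overwritten.
fn broadcast_demo() -> (Vec<u64>, Vec<u64>) {
    let ring = Arc::new(Ring::new());
    let readers: Vec<_> = (0..2)
        .map(|_| {
            let ring = Arc::clone(&ring);
            thread::spawn(move || {
                let mut cursor = 0usize;
                let mut seen = Vec::new();
                while seen.len() < 5 {
                    match ring.read(&mut cursor) {
                        Some(v) => seen.push(v),
                        None => std::hint::spin_loop(),
                    }
                }
                seen
            })
        })
        .collect();
    for v in 1..=5u64 {
        ring.publish(v * 10);
    }
    let mut results = readers.into_iter().map(|h| h.join().unwrap());
    (results.next().unwrap(), results.next().unwrap())
}
```

With a second writer, the load of `published` followed by a store would no longer be safe, since both writers could claim the same slot. That is the situation that forces a CAS loop or a lock.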

Design Principles

  1. One Writer Per Channel

    • Each channel has exactly one writer (producer)
    • Multiple readers are supported via atomic operations
    • Bidirectional communication uses two separate channels
  2. Environment Variable Channel Discovery

    • Parent process creates channels before spawning children
    • Channel names passed via environment variables:
      • OVERLAY_CONTROL_CHANNEL - Control messages to overlay
      • OVERLAY_ID - Unique overlay identifier
      • CATNET_DISABLE_IPC_PERMISSIONS - Permission bypass for trusted processes
  3. Multi-Producer Patterns

    • Recommended: One channel per producer
    • Alternative: Aggregator thread pattern
    • Advanced: Sharded channels with hash-based routing
  4. Permission System

    • Channels are protected by default, which prevents unauthorized processes from accessing them
    • Parent sets CATNET_DISABLE_IPC_PERMISSIONS=1 for trusted children
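The aggregator thread pattern listed under Multi-Producer Patterns could look roughly like the sketch below. It uses `std::sync::mpsc` channels as a stand-in for catnet channels (an assumption made so the example runs on its own), and `run_aggregator` is a hypothetical name. Each `Sender` is moved into exactly one thread and never cloned, so every channel keeps a single writer.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread;

/// Starts `producers` threads, each writing to its own private channel, and
/// funnels all messages through one aggregator thread. Messages are
/// (producer id, sequence number) pairs.
fn run_aggregator(producers: usize, per_producer: usize) -> Vec<(usize, usize)> {
    let mut inputs: Vec<Receiver<(usize, usize)>> = Vec::new();
    let mut handles = Vec::new();

    for id in 0..producers {
        // One input channel per producer: this thread is its only writer.
        let (tx, rx) = channel::<(usize, usize)>();
        inputs.push(rx);
        handles.push(thread::spawn(move || {
            for seq in 0..per_producer {
                tx.send((id, seq)).unwrap();
            }
            // `tx` is dropped here, which closes this producer's channel.
        }));
    }

    // The aggregator is the only writer of the output channel.
    let (out_tx, out_rx) = channel();
    let aggregator = thread::spawn(move || {
        for rx in inputs {
            // Drain each input until its producer hangs up.
            for msg in rx {
                out_tx.send(msg).unwrap();
            }
        }
    });

    for handle in handles {
        handle.join().unwrap();
    }
    aggregator.join().unwrap();
    // `out_tx` was dropped with the aggregator thread, so this terminates.
    out_rx.iter().collect()
}
```

This sketch drains the inputs one after another to keep the output order deterministic. A real aggregator would more likely poll its inputs in turn so that no producer waits on another.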

Consequences

Positive

  • Performance: Low-microsecond latency (0.6-4.9µs typical), meeting the sub-microsecond target at the low end of that range
  • Correctness: No race conditions or data corruption
  • Simplicity: Clear ownership model (one writer per channel)
  • Scalability: Readers are lock-free, so adding readers introduces no lock contention

Negative

  • API Complexity: Users must manage channel pairs for bidirectional communication
  • Channel Proliferation: Multi-producer scenarios require multiple channels
  • Discovery: Applications must implement their own channel naming/discovery

Neutral

  • Explicit Design: Forces users to think about data flow direction
  • Compatibility: Works with any serialization format
  • Memory Usage: Each channel requires separate shared memory segment

Implementation Notes

The single-writer design is not a limitation but a fundamental requirement for lock-free performance. Key implementation details:

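// Parent process
use std::env;
use std::process::Command;

// Create the channel before spawning so the child can open it by name.
let channel = TypedChannel::<Message>::create("control_channel")?;
// Children inherit the parent's environment at spawn time.
env::set_var("OVERLAY_CONTROL_CHANNEL", "control_channel");
env::set_var("CATNET_DISABLE_IPC_PERMISSIONS", "1");
let child = Command::new("overlay.exe").spawn()?;

// Child process
use std::env;

// Discover the channel name from the environment, then open it.
let channel_name = env::var("OVERLAY_CONTROL_CHANNEL")?;
let channel = TypedChannel::<Message>::open(&channel_name)?;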

For bidirectional communication:

  • Parent creates two channels: overlay_X_control and overlay_X_status
  • Parent writes to control, reads from status
  • Child reads from control, writes to status
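The channel-pair naming above can be sketched as follows. This assumes the X in overlay_X_control stands for the overlay ID. `OVERLAY_STATUS_CHANNEL` is a hypothetical variable name (the ADR only names `OVERLAY_CONTROL_CHANNEL` and `OVERLAY_ID`), and `ChannelPair`, `channel_pair`, `overlay_command`, and `env_of` are illustrative helpers. The command is built but not spawned, and no catnet channel is created.

```rust
use std::ffi::OsStr;
use std::process::Command;

/// Names of the two single-writer channels for one overlay.
struct ChannelPair {
    control: String, // parent writes, child reads
    status: String,  // child writes, parent reads
}

/// Derives both channel names from the overlay ID (assumed naming scheme).
fn channel_pair(overlay_id: &str) -> ChannelPair {
    ChannelPair {
        control: format!("overlay_{overlay_id}_control"),
        status: format!("overlay_{overlay_id}_status"),
    }
}

/// Builds the child command with the discovery variables set on the child
/// only, leaving the parent's own environment untouched.
fn overlay_command(program: &str, overlay_id: &str) -> Command {
    let pair = channel_pair(overlay_id);
    let mut cmd = Command::new(program);
    cmd.env("OVERLAY_ID", overlay_id)
        .env("OVERLAY_CONTROL_CHANNEL", &pair.control)
        // Hypothetical variable; the ADR does not define one for status.
        .env("OVERLAY_STATUS_CHANNEL", &pair.status);
    cmd
}

/// Looks up a variable that was explicitly set on the command.
fn env_of(cmd: &Command, key: &str) -> Option<String> {
    cmd.get_envs()
        .find(|(k, _)| *k == OsStr::new(key))
        .and_then(|(_, v)| v.map(|v| v.to_string_lossy().into_owned()))
}
```

Deriving both names from one ID keeps the pair consistent. The child could equally rebuild the names from `OVERLAY_ID` alone instead of reading a second variable.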

This architecture keeps the data path lock-free while maintaining correctness and channel access control.