# BranchPy Server Lifecycle Management

Version: 1.1.1
Last Updated: January 23, 2026
Status: Production
Audience: Developers, Extension Maintainers


Overview

The BranchPy daemon manages its own lifecycle and is designed to need no manual intervention in normal operation:

  • ✅ Auto-Start - Daemon starts automatically when needed
  • ✅ Auto-Diagnose - Startup failures provide actionable diagnostics
  • ✅ Auto-Recovery - Crashes trigger automatic restart and reload
  • ✅ Auto-Stop - Clean shutdown when VS Code closes

Core Philosophy: Users should never think about server management. Everything “just works.”

For comprehensive observability and monitoring details, see Technical/backend/observability.md. For logging configuration and troubleshooting, see Technical/logging/README.md.


Auto-Start

When It Happens

  • User opens any BranchPy feature (PILOT, Compare Lanes, Semantics)
  • Extension detects daemon is not running
  • Daemon spawns automatically in background
  • Health check confirms server ready
  • Feature loads seamlessly

Implementation Flow

User Action (Click PILOT)
    ↓
Extension: ensureRunning()
    ↓
Check metadata file: ~/.branchpy/default/wsd.meta.json
    ↓ (if not found or PID not alive)
Start daemon: python branchpy/ws/run.py
    ↓
Wait for health check: GET /pilot/health
    ↓ (max 10 seconds)
Server ready → proceed with UI

Code Example (TypeScript)

async ensureRunning(): Promise<boolean> {
  // 1. Quick check: already running?
  if (this.isRunning()) {
    return true;
  }

  // 2. Start daemon if not running
  const started = await this.start();
  if (!started) {
    return false;
  }

  // 3. Wait for health check (max 10 seconds)
  const healthy = await this.waitForHealth(10000);
  return healthy;
}
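
The metadata check from the implementation flow could be sketched as follows. This is a minimal illustration, not the extension's actual code: the `pid` field and the `needsStart` helper are assumptions, since the layout of wsd.meta.json is not specified here.

```typescript
// Hypothetical shape of ~/.branchpy/default/wsd.meta.json (field name assumed).
interface DaemonMeta {
  pid: number;
}

// Returns true when the daemon should be started: the metadata is missing,
// unreadable, or points at a process that is no longer alive.
// `isAlive` is injected so the decision stays testable; in Node it could wrap
// `process.kill(pid, 0)` in a try/catch.
function needsStart(
  metaText: string | null,
  isAlive: (pid: number) => boolean
): boolean {
  if (metaText === null) {
    return true; // metadata file not found
  }

  let parsed: unknown;
  try {
    parsed = JSON.parse(metaText);
  } catch {
    return true; // corrupted metadata is treated as "not running"
  }
  if (typeof parsed !== "object" || parsed === null) {
    return true;
  }

  const pid = (parsed as Partial<DaemonMeta>).pid;
  if (typeof pid !== "number" || !Number.isInteger(pid) || pid <= 0) {
    return true;
  }
  return !isAlive(pid);
}
```

Keeping the liveness probe as a parameter means the same function can be exercised in unit tests without spawning a real process.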

Configuration

{
  "branchpy.daemon.autoStart": true,  // default
  "branchpy.daemon.host": "127.0.0.1",
  "branchpy.daemon.port": 8765
}
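
One way to consume these settings is to merge user overrides onto defaults and derive the health-check URL from the result rather than hard-coding it. The sketch below assumes the values shown above are the defaults; `resolveSettings` and `healthUrl` are illustrative names, not part of the extension's API.

```typescript
// Mirrors the branchpy.daemon.* settings above (helper names are illustrative).
interface DaemonSettings {
  autoStart: boolean;
  host: string;
  port: number;
}

// Assumption: the values shown in the configuration block are the defaults.
const DEFAULT_SETTINGS: DaemonSettings = {
  autoStart: true,
  host: "127.0.0.1",
  port: 8765,
};

// Merge user overrides onto the defaults and reject out-of-range ports.
function resolveSettings(overrides: Partial<DaemonSettings> = {}): DaemonSettings {
  const settings: DaemonSettings = { ...DEFAULT_SETTINGS, ...overrides };
  if (!Number.isInteger(settings.port) || settings.port < 1 || settings.port > 65535) {
    throw new RangeError(`Invalid daemon port: ${settings.port}`);
  }
  return settings;
}

// Build the health-check URL from the resolved settings.
function healthUrl(settings: DaemonSettings): string {
  return `http://${settings.host}:${settings.port}/pilot/health`;
}
```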

Auto-Diagnose

Purpose

When daemon startup fails, auto-diagnose runs a series of checks to identify the root cause and provide actionable guidance. Diagnostic results are automatically logged for analysis. For detailed logging configuration, see Technical/logging/README.md.

Diagnostic Checks

1. Python Availability

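async checkPython(): Promise<DiagnosticResult> {
  // `exec` is assumed to be the promisified child_process.exec, which
  // rejects when the command is missing or exits with a non-zero code.
  let output: string;
  try {
    const result = await exec('python --version');
    // Python 2.x prints its version to stderr, so inspect both streams
    output = `${result.stdout}${result.stderr}`;
  } catch {
    return { status: 'error', message: 'Python not found in PATH' };
  }

  const version = output.match(/Python (\d+\.\d+\.\d+)/)?.[1];
  if (!version) {
    return { status: 'error', message: 'Could not determine Python version' };
  }

  const [major, minor] = version.split('.').map(Number);
  if (major < 3 || (major === 3 && minor < 8)) {
    return {
      status: 'error',
      message: `Python ${version} too old (need ≥ 3.8)`
    };
  }

  return { status: 'ok', message: `Python ${version}` };
}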

2. BranchPy Installation

async checkBranchPy(): Promise<DiagnosticResult> {
  try {
    const result = await exec('python -m branchpy --version');
    return { status: 'ok', message: `BranchPy ${result.stdout.trim()}` };
  } catch {
    return { 
      status: 'error', 
      message: 'BranchPy not installed (run: pip install branchpy)' 
    };
  }
}

3. Port Availability

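async checkPort(port: number): Promise<DiagnosticResult> {
  try {
    const server = net.createServer();
    await new Promise<void>((resolve, reject) => {
      // Attach the error handler before listening
      server.once('error', reject);
      // Probe the loopback interface the daemon binds to (default host)
      server.listen(port, '127.0.0.1', () => resolve());
    });
    // Release the port before reporting it as available
    await new Promise<void>(resolve => server.close(() => resolve()));
    return { status: 'ok', message: `Port ${port} available` };
  } catch {
    const pid = await findProcessOnPort(port);
    // taskkill exists only on Windows; use kill elsewhere
    const fix = process.platform === 'win32'
      ? `Run: taskkill /F /PID ${pid}`
      : `Run: kill ${pid}`;
    return {
      status: 'error',
      message: `Port ${port} in use by PID ${pid}`,
      fix
    };
  }
}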

4. Startup Logs Analysis

async checkStartupLogs(): Promise<DiagnosticResult> {
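  // ChildProcess.stdout is a stream, not an array, so this assumes the daemon's
  // output (stdout and stderr, where Python tracebacks are written) is buffered
  // line by line into a hypothetical `this.startupLogLines: string[]`.
  const logs: string[] = (this.startupLogLines ?? []).slice(-50);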
  
  if (logs.some(line => line.includes('ModuleNotFoundError'))) {
    return { 
      status: 'error', 
      message: 'Missing Python dependencies',
      fix: 'Run: pip install -r requirements.txt'
    };
  }
  
  if (logs.some(line => line.includes('SyntaxError'))) {
    return { 
      status: 'error', 
      message: 'Syntax error in BranchPy code',
      fix: 'Check recent code changes'
    };
  }
  
  return { status: 'ok', message: 'No startup errors' };
}

User Experience

╔══════════════════════════════════════════════════════╗
║        🩺 BranchPy Daemon Diagnostics                ║
╠══════════════════════════════════════════════════════╣
║ ✅ Python: 3.11.5                                    ║
║ ✅ BranchPy: 0.9.0                                   ║
║ ❌ Port: 8766 in use by PID 12345                    ║
║    Fix: taskkill /F /PID 12345                       ║
║ ✅ Startup Logs: No errors                           ║
╠══════════════════════════════════════════════════════╣
║ Recommendation: Kill conflicting process or use      ║
║                 a different port (e.g., 8767)        ║
╚══════════════════════════════════════════════════════╝

[Kill Process]  [Use Port 8767]  [Show Full Logs]
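
The report body above could be assembled from the individual check results along these lines. This is only a sketch: `DiagnosticResult` follows the shape returned by the checks in this section, while `NamedResult`, `renderReport`, and `allPassed` are hypothetical helpers.

```typescript
// Shape returned by the diagnostic checks in this section.
interface DiagnosticResult {
  status: "ok" | "error";
  message: string;
  fix?: string;
}

// Hypothetical wrapper pairing a result with its report label.
interface NamedResult {
  name: string; // e.g. "Python", "Port"
  result: DiagnosticResult;
}

// One line per check, plus an indented "Fix:" line when a failed check
// carries a suggested fix.
function renderReport(results: NamedResult[]): string[] {
  const lines: string[] = [];
  for (const { name, result } of results) {
    const icon = result.status === "ok" ? "✅" : "❌";
    lines.push(`${icon} ${name}: ${result.message}`);
    if (result.status === "error" && result.fix) {
      lines.push(`   Fix: ${result.fix}`);
    }
  }
  return lines;
}

// True when every check passed, so no diagnostics panel is needed.
function allPassed(results: NamedResult[]): boolean {
  return results.every(({ result }) => result.status === "ok");
}
```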

Auto-Recovery

Problem Solved

Before: User sees “Can’t reach this page” → must manually restart daemon → refresh webview

After: Webview detects failure → auto-restarts daemon → auto-reloads → user sees brief notification

Architecture

Webview Health Monitor (JavaScript)
  • Polls GET /pilot/health every 10 seconds
  • Tracks consecutive failures (threshold: 2)
    ↓ (2 failures detected)
vscode.postMessage({ type: 'restartDaemon', reason: 'HTTP 500' })
    ↓
Extension: _restartDaemonAndNotify()
  • Show notification: "🔄 Auto-restarting server..."
  • Execute: daemonService.restart()
  • Wait 2 seconds for startup
  • Send 'daemonRestarted' message to webview
    ↓
Webview: window.location.reload()
  • Fresh page load with new daemon

For detailed health check implementation and monitoring, see Technical/backend/observability.md.

Health Monitor Implementation (JavaScript)

function setupHealthMonitor() {
  let failureCount = 0;
  const MAX_FAILURES = 2;
  const CHECK_INTERVAL = 10000; // 10 seconds
  let isRecovering = false;

  async function checkHealth() {
    if (isRecovering) return;
    
    try {
      // Host and port must match branchpy.daemon.host / branchpy.daemon.port
      const response = await fetch('http://127.0.0.1:8766/pilot/health');
      if (response.ok) {
        failureCount = 0; // Reset on success
      } else {
        handleFailure(`HTTP ${response.status}`);
      }
    } catch (error) {
      handleFailure(error.message);
    }
  }

  function handleFailure(reason) {
    failureCount++;
    
    if (failureCount >= MAX_FAILURES) {
      attemptAutoRecovery(reason);
    }
  }

  function attemptAutoRecovery(reason) {
    isRecovering = true;
    
    // Notify extension to restart daemon
    vscode.postMessage({ type: 'restartDaemon', reason });
    
    // Lock recovery for 15 seconds (prevents loops)
    setTimeout(() => {
      isRecovering = false;
      failureCount = 0;
    }, 15000);
  }

  setInterval(checkHealth, CHECK_INTERVAL);
}

Restart Handler (TypeScript)

private async _restartDaemonAndNotify(reason: string): Promise<void> {
  try {
    // 1. User notification
    void vscode.window.showInformationMessage(
      `🔄 Auto-restarting BranchPy server (${reason})...`
    );

    // 2. Execute restart
    await vscode.commands.executeCommand('bpy.ws.restart');
    
    // 3. Wait for startup
    await new Promise(resolve => setTimeout(resolve, 2000));
    
    // 4. Notify webview to reload
    this._panel.webview.postMessage({ type: 'daemonRestarted' });
    
    // 5. Success notification
    void vscode.window.showInformationMessage(
      '✅ BranchPy server restarted successfully'
    );
  } catch (error) {
    void vscode.window.showErrorMessage(
      `❌ Failed to restart: ${error}`
    );
  }
}

Recovery Timeline

[00:00] User working normally
[00:05] Daemon crashes
[00:10] Health check #1 fails
[00:12] Status: "Server issue (1/2)"
[00:20] Health check #2 fails
[00:22] Auto-recovery triggered
[00:22] Notification: "🔄 Auto-restarting..."
[00:23] Daemon stops (old process killed)
[00:24] Daemon starts (new process spawned)
[00:25] Webview reloads
[00:25] Notification: "✅ Restarted successfully"
[00:26] User continues working ✨
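
The detection delay in this timeline follows from the polling interval and the failure threshold. The helper below is a rough worked calculation, assuming checks fire at exact multiples of the interval and fail instantly once the daemon is down; `detectionTimeMs` is an illustrative name.

```typescript
// Time (ms since the monitor started) at which the Nth consecutive failed
// health check is observed. A crash that lands exactly on a tick is assumed
// to be caught by the following check.
function detectionTimeMs(
  crashAtMs: number,
  intervalMs: number,
  maxFailures: number
): number {
  const firstFailedCheck = (Math.floor(crashAtMs / intervalMs) + 1) * intervalMs;
  return firstFailedCheck + (maxFailures - 1) * intervalMs;
}

// Timeline above: crash at 00:05, 10 s interval, threshold of 2 failures.
const detectedAt = detectionTimeMs(5000, 10000, 2); // second failed check at 00:20
const delayAfterCrash = detectedAt - 5000;          // 15 s between crash and detection
```

With the default settings the delay between a crash and detection therefore ranges from just over 10 seconds (crash right before a check) to 20 seconds (crash right after one), before any request latency is added.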

Restart Policies

The daemon implements the following restart policies to ensure system stability:

Restart Threshold Policy

  • Maximum restarts: 3 attempts
  • Time window: 5 minutes
  • Action on threshold: Disable auto-restart, notify user

Backoff Policy

  • First restart: Immediate (0s delay)
  • Second restart: 5s delay
  • Third restart: 15s delay
  • After threshold: Manual intervention required

Recovery Lock

  • Duration: 15 seconds
  • Purpose: Prevent restart loops during ongoing failures
  • Behavior: Ignore health check failures while locked
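
The threshold and backoff policies above can be combined in a small sliding-window tracker. This is one possible reading of the policy, not the daemon's actual implementation: `RestartPolicy`, `nextDelayMs`, and `reset` are hypothetical names, and it assumes auto-restart stays disabled until someone intervenes manually.

```typescript
const MAX_RESTARTS = 3;
const RESTART_WINDOW_MS = 5 * 60 * 1000;

// Backoff table from the policy: immediate, then 5 s, then 15 s.
function backoffDelayMs(restartsInWindow: number): number {
  if (restartsInWindow === 0) return 0;
  if (restartsInWindow === 1) return 5000;
  return 15000;
}

class RestartPolicy {
  private attempts: number[] = []; // timestamps (ms) of recent restarts
  private disabled = false;

  // Returns the delay to wait before the next restart, or null when
  // auto-restart is disabled and manual intervention is required.
  nextDelayMs(nowMs: number): number | null {
    if (this.disabled) {
      return null;
    }
    // Keep only restarts that fall inside the sliding window.
    this.attempts = this.attempts.filter(t => nowMs - t < RESTART_WINDOW_MS);
    if (this.attempts.length >= MAX_RESTARTS) {
      this.disabled = true; // threshold reached: stop restarting, notify user
      return null;
    }
    const delay = backoffDelayMs(this.attempts.length);
    this.attempts.push(nowMs);
    return delay;
  }

  // Re-enable auto-restart after a manual fix and restart.
  reset(): void {
    this.attempts = [];
    this.disabled = false;
  }
}
```

Passing the current time in as a parameter keeps the policy deterministic, which makes the window and backoff behavior easy to verify in tests.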

Configuration

{
  "branchpy.autoRecovery.enabled": true,
  "branchpy.autoRecovery.maxFailures": 2,
  "branchpy.autoRecovery.checkInterval": 10000,
  "branchpy.autoRecovery.recoveryLockTimeout": 15000,
  "branchpy.autoRecovery.maxRestarts": 3,
  "branchpy.autoRecovery.restartWindow": 300000
}

Auto-Stop

Purpose

Gracefully stop daemon when VS Code closes to prevent orphaned processes.

Implementation Flow

User closes VS Code
    ↓
VS Code: extension.deactivate()
    ↓
daemonService.stop()
  • Send SIGTERM to daemon process
  • Wait 5 seconds for graceful exit
  • Send SIGKILL if still running (force)
  • Clean metadata files
    ↓
Daemon exits cleanly

Graceful Stop (TypeScript)

async stop(): Promise<void> {
  if (!this.daemonProcess) {
    return; // Already stopped
  }

  const pid = this.daemonProcess.pid;
  
  try {
    // 1. Send SIGTERM (graceful shutdown)
    process.kill(pid, 'SIGTERM');
    
    // 2. Wait up to 5 seconds for exit
    const exited = await this.waitForExit(pid, 5000);
    
    if (!exited) {
      // 3. Force kill if still running
      console.warn(`PID ${pid} didn't exit gracefully, forcing...`);
      process.kill(pid, 'SIGKILL');
    }
    
    // 4. Clean metadata
    await this.cleanMetadata();
    
    // 5. Update UI
    this.updateStatusBar('stopped');
  } catch (error) {
    console.error(`Error stopping PID ${pid}:`, error);
  } finally {
    this.daemonProcess = null;
  }
}

Daemon SIGTERM Handler (Python)

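import logging
import signal
import sys

logger = logging.getLogger(__name__)

# Placeholder: set to the server instance once the daemon has started.
# Its shutdown() method is assumed to close active connections.
app = None

def sigterm_handler(signum, frame):
    """Graceful shutdown on SIGTERM"""
    logger.info("[Daemon] Received SIGTERM, shutting down...")
    
    # 1. Close active connections
    if app:
        app.shutdown()
    
    # 2. Flush logs
    logging.shutdown()
    
    # 3. Exit cleanly
    sys.exit(0)

# Register handler
signal.signal(signal.SIGTERM, sigterm_handler)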

State Machine

┌──────────┐
│ STOPPED  │ ← Initial state
└──────────┘
     │ [User opens feature]
     ▼
┌──────────┐
│ STARTING │ ← Auto-start triggered
└──────────┘
     │ [Health check passes]
     ▼
┌──────────┐
│ RUNNING  │ ← Normal operation
└──────────┘
     │         │
     │         │ [Crash detected]
     │         ▼
     │    ┌──────────┐
     │    │RESTARTING│ ← Auto-recovery
     │    └──────────┘
     │         │
     │         ▼
     │    ┌──────────┐
     │    │ RUNNING  │
     │    └──────────┘
     │
     │ [VS Code closes]
     ▼
┌──────────┐
│ STOPPING │ ← Auto-stop triggered
└──────────┘
     │
     ▼
┌──────────┐
│ STOPPED  │ ← Clean exit
└──────────┘
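
The diagram above maps naturally onto a transition table. The sketch below is illustrative only: the state names come from the diagram, but the event names (in particular `restartCompleted` and `processExited`, which label edges the diagram leaves unnamed) are assumptions.

```typescript
type DaemonState = "STOPPED" | "STARTING" | "RUNNING" | "RESTARTING" | "STOPPING";

// Event names are hypothetical labels for the edges in the diagram.
type DaemonEvent =
  | "featureOpened"
  | "healthCheckPassed"
  | "crashDetected"
  | "restartCompleted"
  | "vscodeClosed"
  | "processExited";

// Allowed transitions; any event not listed for a state is ignored.
const TRANSITIONS: Record<DaemonState, Partial<Record<DaemonEvent, DaemonState>>> = {
  STOPPED: { featureOpened: "STARTING" },
  STARTING: { healthCheckPassed: "RUNNING" },
  RUNNING: { crashDetected: "RESTARTING", vscodeClosed: "STOPPING" },
  RESTARTING: { restartCompleted: "RUNNING" },
  STOPPING: { processExited: "STOPPED" },
};

// Returns the next state, or the current state when the event does not apply.
function transition(state: DaemonState, event: DaemonEvent): DaemonState {
  return TRANSITIONS[state][event] ?? state;
}
```

Encoding the transitions as data keeps invalid moves (for example, a crash report while the daemon is already stopping) from silently changing state.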

Diagnostic Checks and Health Monitoring

Health Check Flow

The daemon implements comprehensive health checks at multiple levels:

Startup Health Checks

  1. Python environment validation
    • Python version ≥ 3.8
    • Required packages installed
    • Virtual environment activated (if configured)
  2. Network availability
    • Port binding successful
    • No conflicting processes
    • Firewall rules allow local connections
  3. Resource availability
    • Sufficient memory (minimum 100MB free)
    • Disk space for logs and metadata
    • File descriptor limits not exceeded

Runtime Health Checks

  • Endpoint: GET /pilot/health
  • Frequency: Every 10 seconds (configurable)
  • Timeout: 5 seconds
  • Expected response:
    {
      "status": "healthy",
      "uptime": 3600,
      "version": "1.1.1",
      "pid": 12345
    }
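
A client can validate the response body before trusting it. The sketch below checks the fields shown in the expected response; `parseHealth` and `isHealthy` are hypothetical helper names, and treating any `status` other than "healthy" as a failure is an assumption.

```typescript
// Fields taken from the expected response above.
interface HealthResponse {
  status: string;
  uptime: number;
  version: string;
  pid: number;
}

// Returns a typed response when all expected fields are present with the
// right types, otherwise null.
function parseHealth(body: unknown): HealthResponse | null {
  if (typeof body !== "object" || body === null) {
    return null;
  }
  const { status, uptime, version, pid } = body as Record<string, unknown>;
  if (
    typeof status !== "string" ||
    typeof uptime !== "number" ||
    typeof version !== "string" ||
    typeof pid !== "number"
  ) {
    return null;
  }
  return { status, uptime, version, pid };
}

// Assumption: only the literal status "healthy" counts as a passing check.
function isHealthy(body: unknown): boolean {
  const health = parseHealth(body);
  return health !== null && health.status === "healthy";
}
```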

Deep Health Checks (Periodic)

  • Frequency: Every 60 seconds
  • Checks:
    • Memory usage within limits
    • No deadlocked threads
    • Database connections healthy
    • Cache hit rates acceptable
    • Request queue not backing up

For comprehensive observability metrics and monitoring dashboards, see Technical/backend/observability.md.


Troubleshooting

Issue: Daemon Won’t Start

Symptoms: Webview shows “Can’t reach this page”, status bar shows red

Auto-Diagnose Checks:

  1. ✅ Python installed? → python --version
  2. ✅ BranchPy installed? → python -m branchpy --version
  3. ✅ Port available? → Check for conflicts
  4. ✅ Startup logs? → .branchpy/logs/ws_daemon.log

Manual Fix:

Command Palette → "BranchPy: Run Diagnostics"

For detailed logging configuration and log analysis, see Technical/logging/README.md.


Issue: Daemon Keeps Crashing

Symptoms: Auto-restart notifications every 30 seconds

Auto-Recovery Behavior:

  • After 3 failed restarts in 5 minutes → Disables auto-restart
  • Shows: “❌ Auto-restart disabled (too many failures)”

Manual Fix:

  1. Check Output panel: “BranchPy Server” logs
  2. Identify error (syntax error, missing dependency, etc.)
  3. Fix error in code
  4. Command Palette → "BranchPy: Restart Daemon"

Common Causes:

  • Python syntax errors in recent changes
  • Missing or incompatible dependencies
  • Port conflicts with other applications
  • Insufficient system resources (memory, disk space)
  • Corrupted metadata files

For comprehensive error analysis and debugging, see Technical/logging/README.md.


Issue: Orphaned Daemon After Crash

Symptoms: VS Code crashed, daemon still running, new window can’t start

Auto-Detect: On extension activation, checks for orphaned daemons:

const orphanedPid = await findOrphanedDaemon();

if (orphanedPid) {
  const action = await vscode.window.showWarningMessage(
    `Found orphaned daemon (PID ${orphanedPid})`,
    'Kill Process', 'Adopt Process', 'Ignore'
  );
  
  if (action === 'Kill Process') {
    process.kill(orphanedPid, 'SIGKILL');
  }
}

Manual Fix:

# Windows (PowerShell)
Get-CimInstance Win32_Process -Filter "Name = 'python.exe'" |
  Where-Object { $_.CommandLine -like '*branchpy*' } |
  ForEach-Object { Stop-Process -Id $_.ProcessId -Force }

# Linux/macOS
pkill -f "python.*branchpy"

Performance Targets

Latency Budget

Operation       Target   Actual    P99
Health check    <10ms    5-8ms     12ms
Auto-start      <3s      1.5-2s    2.8s
Auto-recovery   <5s      3-4s      5.5s
Auto-stop       <2s      1s        1.8s

Resource Limits

Resource       Minimum   Recommended   Maximum
Memory         100MB     256MB         512MB
CPU (idle)     0-1%      <2%           5%
CPU (active)   5-15%     <25%          50%
Disk I/O       Minimal   <10MB/s       50MB/s

Metrics & Monitoring

Tracked Metrics

The daemon tracks comprehensive operational metrics for monitoring and diagnostics. For full metrics specification and dashboard implementation, see Technical/backend/observability.md.

Lifecycle Metrics

  • Uptime: Seconds since last start
  • Restart count: Total restarts in current session
  • Failed health checks: Count in last 24 hours
  • Recovery attempts: Count of successful vs. failed recoveries
  • Stop outcomes: Count of graceful stops vs. forced terminations

Performance Metrics

  • Health check latency: P50, P95, P99
  • Memory usage: RSS (resident set size), heap size
  • Request count: Total, per endpoint, per hour
  • Error rate: 4xx and 5xx responses
  • Active connections: Current WebSocket/HTTP connections
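
For reference, the P50/P95/P99 figures can be derived from raw latency samples with a nearest-rank percentile. This is a generic sketch; the document does not say which percentile method the daemon uses, and `percentile` is an illustrative name.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Requires 0 < p <= 100.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new RangeError("percentile requires at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new RangeError(`percentile must be in (0, 100], got ${p}`);
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = samples.slice().sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Example with five health-check latencies in milliseconds.
const latenciesMs = [8, 5, 12, 6, 7];
const p50 = percentile(latenciesMs, 50); // 7
const p99 = percentile(latenciesMs, 99); // 12
```

With few samples the high percentiles collapse onto the maximum, so P99 is only meaningful once a reasonable number of health checks has been recorded.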

Diagnostic Metrics

  • Startup failures: Count and reasons
  • Port conflicts: Frequency and resolution
  • Orphaned processes: Detection and cleanup rate
  • Log volume: Bytes written per minute
  • Exception rate: Unhandled exceptions per hour

Dashboard (Future UI)

╔══════════════════════════════════════════════════════╗
║         🔍 BranchPy Daemon Health Dashboard          ║
╠══════════════════════════════════════════════════════╣
║ Status: 🟢 RUNNING                                   ║
║ Uptime: 2 hours 34 minutes                           ║
║ Restarts: 0 (last 24 hours)                          ║
║ Health: 100% (0 failed checks)                       ║
║ Response Time: 12ms (avg)                            ║
║ Memory: 145 MB / 512 MB (28%)                        ║
║ Active Connections: 3 WebSocket, 0 HTTP              ║
║ Request Rate: 45 req/min                             ║
╚══════════════════════════════════════════════════════╝


Changelog

v1.1.1 (January 23, 2026)

  • 📝 Updated documentation to version 1.1.1
  • 🔗 Added cross-references to observability and logging documentation
  • 📊 Enhanced metrics tracking specification
  • 🔧 Clarified restart policies and thresholds
  • ✨ Added deep health checks and resource limits
  • 📈 Updated performance targets with P99 latencies

v1.0.0 (November 7, 2025)

  • ✅ Auto-start on feature use
  • ✅ Auto-diagnose on startup failure
  • ✅ Auto-recovery from crashes
  • ✅ Auto-stop on VS Code shutdown
  • ✅ Health monitoring with 10s intervals
  • ✅ User notifications for all lifecycle events
  • ✅ Orphaned daemon detection


Source Reference

This document is a consolidated version of the lifecycle management documentation from BranchPy v1.1.0.

Original Source: docs/v1.1.0/Server/LIFECYCLE.md