Version: 1.1.1
Last Updated: January 23, 2026
Status: Production
Audience: Developers, Extension Maintainers
Overview
The BranchPy daemon features fully automated lifecycle management requiring zero manual intervention:
- ✓ Auto-Start - Daemon starts automatically when needed
- ✓ Auto-Diagnose - Startup failures provide actionable diagnostics
- ✓ Auto-Recovery - Crashes trigger automatic restart and reload
- ✓ Auto-Stop - Clean shutdown when VS Code closes
Core Philosophy: Users should never think about server management. Everything "just works."
For comprehensive observability and monitoring details, see Technical/backend/observability.md. For logging configuration and troubleshooting, see Technical/logging/README.md.
Auto-Start
When It Happens
- User opens any BranchPy feature (PILOT, Compare Lanes, Semantics)
- Extension detects daemon is not running
- Daemon spawns automatically in background
- Health check confirms server ready
- Feature loads seamlessly
Implementation Flow
User Action (Click PILOT)
↓
Extension: ensureRunning()
↓
Check metadata file: ~/.branchpy/default/wsd.meta.json
↓ (if not found or PID not alive)
Start daemon: python branchpy/ws/run.py
↓
Wait for health check: GET /pilot/health
↓ (max 10 seconds)
Server ready → proceed with UI
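The metadata check in this flow could look roughly like the sketch below. The schema of wsd.meta.json is not given in this document, so the `pid` and `port` fields are assumptions, and `parseMeta`, `decideStartup`, and the `isPidAlive` probe are hypothetical names. In Node.js the probe is commonly written as `process.kill(pid, 0)` inside a try/catch.

```typescript
// Hypothetical shape of wsd.meta.json; the real schema may differ.
interface DaemonMeta {
  pid: number;
  port: number;
}

type StartupDecision =
  | { action: 'reuse'; meta: DaemonMeta }
  | { action: 'start'; reason: string };

// Parse the metadata text; return null if it is missing or malformed.
function parseMeta(text: string | null): DaemonMeta | null {
  if (text === null) return null;
  try {
    const raw: unknown = JSON.parse(text);
    if (typeof raw !== 'object' || raw === null) return null;
    const { pid, port } = raw as Record<string, unknown>;
    if (typeof pid !== 'number' || !Number.isInteger(pid) || pid <= 0) return null;
    if (typeof port !== 'number' || !Number.isInteger(port)) return null;
    return { pid, port };
  } catch {
    return null; // invalid JSON
  }
}

// Decide whether to reuse a running daemon or spawn a new one.
// `isPidAlive` is supplied by the caller (assumption, see lead-in).
function decideStartup(
  metaText: string | null,
  isPidAlive: (pid: number) => boolean
): StartupDecision {
  const meta = parseMeta(metaText);
  if (!meta) {
    return { action: 'start', reason: 'metadata missing or invalid' };
  }
  if (!isPidAlive(meta.pid)) {
    return { action: 'start', reason: `PID ${meta.pid} not alive` };
  }
  return { action: 'reuse', meta };
}
```

Keeping the decision separate from file and process access means a stale metadata file (left behind by a crash) is handled by the same path as a missing one.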
Code Example (TypeScript)
async ensureRunning(): Promise<boolean> {
  // 1. Quick check: already running?
  if (this.isRunning()) {
    return true;
  }
  // 2. Start daemon if not running
  const started = await this.start();
  if (!started) {
    return false;
  }
  // 3. Wait for health check (max 10 seconds)
  const healthy = await this.waitForHealth(10000);
  return healthy;
}
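`waitForHealth` is not shown in this document. One way to sketch it is a polling loop with a deadline. The `probe` callback, the 250 ms poll interval, and the standalone function shape are assumptions for illustration, not the extension's actual implementation.

```typescript
// Resolve after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>(resolve => setTimeout(resolve, ms));

// Poll `probe` until it reports healthy or `timeoutMs` elapses.
// `probe` is a caller-supplied check, for example a GET to /pilot/health.
async function waitForHealth(
  probe: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs = 250
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    try {
      if (await probe()) return true;
    } catch {
      // A refused connection is expected while the daemon boots: keep polling.
    }
    // Give up if the next poll would land past the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await sleep(intervalMs);
  }
}
```

A probe that throws is treated the same as an unhealthy response, so the caller only has to distinguish "ready" from "timed out".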
Configuration
{
  "branchpy.daemon.autoStart": true, // default
  "branchpy.daemon.host": "127.0.0.1",
  "branchpy.daemon.port": 8765
}
Auto-Diagnose
Purpose
When daemon startup fails, auto-diagnose runs a series of checks to identify the root cause and provide actionable guidance. Diagnostic results are automatically logged for analysis. For detailed logging configuration, see Technical/logging/README.md.
Diagnostic Checks
1. Python Availability
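// Assumes `exec` is the promisified child_process.exec:
//   import { exec as execCb } from 'child_process';
//   import { promisify } from 'util';
//   const exec = promisify(execCb);
async checkPython(): Promise<DiagnosticResult> {
  let output: string;
  try {
    const result = await exec('python --version');
    // Python 2 prints its version to stderr, so inspect both streams.
    output = `${result.stdout}${result.stderr}`;
  } catch {
    // exec rejects when the command is missing or exits non-zero.
    return { status: 'error', message: 'Python not found in PATH' };
  }
  const version = output.match(/Python (\d+\.\d+\.\d+)/)?.[1];
  if (!version) {
    return { status: 'error', message: 'Python not found in PATH' };
  }
  const [major, minor] = version.split('.').map(Number);
  if (major < 3 || (major === 3 && minor < 8)) {
    return {
      status: 'error',
      message: `Python ${version} too old (need >= 3.8)`
    };
  }
  return { status: 'ok', message: `Python ${version}` };
}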
2. BranchPy Installation
async checkBranchPy(): Promise<DiagnosticResult> {
  try {
    const result = await exec('python -m branchpy --version');
    return { status: 'ok', message: `BranchPy ${result.stdout.trim()}` };
  } catch {
    return {
      status: 'error',
      message: 'BranchPy not installed (run: pip install branchpy)'
    };
  }
}
3. Port Availability
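async checkPort(port: number): Promise<DiagnosticResult> {
  const server = net.createServer();
  try {
    // Bind to the loopback address the daemon uses, and register the
    // error handler before listening so a bind failure is never missed.
    await new Promise<void>((resolve, reject) => {
      server.once('error', reject);
      server.listen(port, '127.0.0.1', () => resolve());
    });
    // Wait until the probe socket is released before reporting success.
    await new Promise<void>(resolve => server.close(() => resolve()));
    return { status: 'ok', message: `Port ${port} available` };
  } catch {
    const pid = await findProcessOnPort(port);
    // taskkill exists only on Windows; use kill elsewhere.
    const killCommand = process.platform === 'win32'
      ? `taskkill /F /PID ${pid}`
      : `kill -9 ${pid}`;
    return {
      status: 'error',
      message: `Port ${port} in use by PID ${pid}`,
      fix: `Run: ${killCommand}`
    };
  }
}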
4. Startup Logs Analysis
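async checkStartupLogs(): Promise<DiagnosticResult> {
  // A ChildProcess stream cannot be sliced directly. This assumes the
  // service buffers the daemon's stdout and stderr lines as they arrive in
  // a `startupLog: string[]` field (hypothetical name). Python tracebacks
  // such as ModuleNotFoundError are written to stderr.
  const logs: string[] = (this.startupLog ?? []).slice(-50);
  if (logs.some(line => line.includes('ModuleNotFoundError'))) {
    return {
      status: 'error',
      message: 'Missing Python dependencies',
      fix: 'Run: pip install -r requirements.txt'
    };
  }
  if (logs.some(line => line.includes('SyntaxError'))) {
    return {
      status: 'error',
      message: 'Syntax error in BranchPy code',
      fix: 'Check recent code changes'
    };
  }
  return { status: 'ok', message: 'No startup errors' };
}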
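The four checks above can be combined into a single report. The sketch below assumes a `DiagnosticResult` shape inferred from the snippets (`status`, `message`, optional `fix`). The `runDiagnostics` helper and the `NamedCheck` and `DiagnosticReport` types are hypothetical names, not part of the documented extension.

```typescript
// Shape inferred from the checks above; the real type may carry more fields.
interface DiagnosticResult {
  status: 'ok' | 'error';
  message: string;
  fix?: string;
}

interface NamedCheck {
  name: string;
  run: () => Promise<DiagnosticResult>;
}

interface DiagnosticReport {
  ok: boolean;
  lines: string[];
}

// Run every check in order. A check that throws is recorded as an error
// instead of aborting the remaining checks.
async function runDiagnostics(checks: NamedCheck[]): Promise<DiagnosticReport> {
  const lines: string[] = [];
  let ok = true;
  for (const check of checks) {
    let result: DiagnosticResult;
    try {
      result = await check.run();
    } catch (error) {
      result = { status: 'error', message: `check failed: ${String(error)}` };
    }
    if (result.status === 'error') ok = false;
    const tag = result.status === 'ok' ? 'OK  ' : 'FAIL';
    lines.push(`${tag} ${check.name}: ${result.message}`);
    if (result.fix) lines.push(`     Fix: ${result.fix}`);
  }
  return { ok, lines };
}
```

Running the checks sequentially keeps the report in a stable order, which matters when the first failure (for example, a missing interpreter) explains the later ones.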
User Experience
╔══════════════════════════════════════════════════════╗
║ 🩺 BranchPy Daemon Diagnostics                       ║
╠══════════════════════════════════════════════════════╣
║ ✅ Python: 3.11.5                                    ║
║ ✅ BranchPy: 0.9.0                                   ║
║ ❌ Port: 8766 in use by PID 12345                    ║
║    Fix: taskkill /F /PID 12345                       ║
║ ✅ Startup Logs: No errors                           ║
╠══════════════════════════════════════════════════════╣
║ Recommendation: Kill conflicting process or use      ║
║ a different port (e.g., 8767)                        ║
╚══════════════════════════════════════════════════════╝
[Kill Process] [Use Port 8767] [Show Full Logs]
Auto-Recovery
Problem Solved
Before: User sees "Can't reach this page" → must manually restart daemon → refresh webview
After: Webview detects failure → auto-restarts daemon → auto-reloads → user sees brief notification
Architecture
Webview Health Monitor (JavaScript)
• Polls GET /pilot/health every 10 seconds
• Tracks consecutive failures (threshold: 2)
↓ (2 failures detected)
vscode.postMessage({ type: 'restartDaemon', reason: 'HTTP 500' })
↓
Extension: _restartDaemonAndNotify()
• Show notification: "Auto-restarting server..."
• Execute: daemonService.restart()
• Wait 2 seconds for startup
• Send 'daemonRestarted' message to webview
↓
Webview: window.location.reload()
• Fresh page load with new daemon
For detailed health check implementation and monitoring, see Technical/backend/observability.md.
Health Monitor Implementation (JavaScript)
// Runs inside the webview. `vscode` is the handle returned by
// acquireVsCodeApi(), which may only be called once per webview.
function setupHealthMonitor() {
  let failureCount = 0;
  const MAX_FAILURES = 2;
  const CHECK_INTERVAL = 10000; // 10 seconds
  let isRecovering = false;
  async function checkHealth() {
    if (isRecovering) return;
    try {
      const response = await fetch('http://127.0.0.1:8766/pilot/health');
      if (response.ok) {
        failureCount = 0; // Reset on success
      } else {
        handleFailure(`HTTP ${response.status}`);
      }
    } catch (error) {
      // Network-level failure (e.g. connection refused)
      handleFailure(error.message);
    }
  }
  function handleFailure(reason) {
    failureCount++;
    if (failureCount >= MAX_FAILURES) {
      attemptAutoRecovery(reason);
    }
  }
  function attemptAutoRecovery(reason) {
    isRecovering = true;
    // Notify extension to restart daemon
    vscode.postMessage({ type: 'restartDaemon', reason });
    // Lock recovery for 15 seconds (prevents loops)
    setTimeout(() => {
      isRecovering = false;
      failureCount = 0;
    }, 15000);
  }
  // Reload once the extension reports that the daemon is back
  window.addEventListener('message', event => {
    if (event.data?.type === 'daemonRestarted') {
      window.location.reload();
    }
  });
  setInterval(checkHealth, CHECK_INTERVAL);
}
Restart Handler (TypeScript)
private async _restartDaemonAndNotify(reason: string): Promise<void> {
  try {
    // 1. User notification
    void vscode.window.showInformationMessage(
      `Auto-restarting BranchPy server (${reason})...`
    );
    // 2. Execute restart
    await vscode.commands.executeCommand('bpy.ws.restart');
    // 3. Wait for startup
    await new Promise(resolve => setTimeout(resolve, 2000));
    // 4. Notify webview to reload
    this._panel.webview.postMessage({ type: 'daemonRestarted' });
    // 5. Success notification
    void vscode.window.showInformationMessage(
      '✅ BranchPy server restarted successfully'
    );
  } catch (error) {
    void vscode.window.showErrorMessage(
      `❌ Failed to restart: ${error}`
    );
  }
}
Recovery Timeline
[00:00] User working normally
[00:05] Daemon crashes
[00:10] Health check #1 fails
[00:12] Status: "Server issue (1/2)"
[00:20] Health check #2 fails
[00:22] Auto-recovery triggered
[00:22] Notification: "Auto-restarting..."
[00:23] Daemon stops (old process killed)
[00:24] Daemon starts (new process spawned)
[00:25] Webview reloads
[00:25] Notification: "✅ Restarted successfully"
[00:26] User continues working ✨
Restart Policies
The daemon implements the following restart policies to ensure system stability:
Restart Threshold Policy
- Maximum restarts: 3 attempts
- Time window: 5 minutes
- Action on threshold: Disable auto-restart, notify user
Backoff Policy
- First restart: Immediate (0s delay)
- Second restart: 5s delay
- Third restart: 15s delay
- After threshold: Manual intervention required
Recovery Lock
- Duration: 15 seconds
- Purpose: Prevent restart loops during ongoing failures
- Behavior: Ignore health check failures while locked
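The threshold and backoff rules above can be expressed as a small policy object. This is a sketch built only from the stated numbers (3 restarts per 5 minutes, delays of 0 s, 5 s, and 15 s). The `RestartPolicy` class and its `next` method are hypothetical names, and time is passed in explicitly so the logic can be exercised without real timers.

```typescript
type RestartDecision =
  | { allowed: true; delayMs: number }
  | { allowed: false; reason: string };

class RestartPolicy {
  private restarts: number[] = []; // timestamps (ms) of recent restarts
  private readonly maxRestarts: number;
  private readonly windowMs: number;
  private readonly delaysMs: number[];

  constructor(
    maxRestarts = 3,
    windowMs = 5 * 60 * 1000,
    delaysMs: number[] = [0, 5000, 15000]
  ) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
    this.delaysMs = delaysMs;
  }

  // Decide whether a restart may happen at time `now` (ms) and, if so,
  // how long to wait before performing it.
  next(now: number): RestartDecision {
    // Forget restarts that have fallen out of the sliding window.
    this.restarts = this.restarts.filter(t => now - t < this.windowMs);
    if (this.restarts.length >= this.maxRestarts) {
      return {
        allowed: false,
        reason: 'restart threshold reached; manual intervention required',
      };
    }
    // The delay grows with the number of restarts already in the window.
    const index = Math.min(this.restarts.length, this.delaysMs.length - 1);
    this.restarts.push(now);
    return { allowed: true, delayMs: this.delaysMs[index] };
  }
}
```

A caller would invoke `policy.next(Date.now())` before each restart, wait `delayMs` when allowed, and otherwise disable auto-restart and notify the user.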
Configuration
{
  "branchpy.autoRecovery.enabled": true,
  "branchpy.autoRecovery.maxFailures": 2,
  "branchpy.autoRecovery.checkInterval": 10000,
  "branchpy.autoRecovery.recoveryLockTimeout": 15000,
  "branchpy.autoRecovery.maxRestarts": 3,
  "branchpy.autoRecovery.restartWindow": 300000
}
Auto-Stop
Purpose
Gracefully stop daemon when VS Code closes to prevent orphaned processes.
Implementation Flow
User closes VS Code
↓
VS Code: extension.deactivate()
↓
daemonService.stop()
• Send SIGTERM to daemon process
• Wait 5 seconds for graceful exit
• Send SIGKILL if still running (force)
• Clean metadata files
↓
Daemon exits cleanly
Graceful Stop (TypeScript)
async stop(): Promise<void> {
  // ChildProcess.pid is undefined when the process failed to spawn.
  if (!this.daemonProcess || this.daemonProcess.pid === undefined) {
    return; // Already stopped
  }
  const pid = this.daemonProcess.pid;
  try {
    // 1. Send SIGTERM (graceful shutdown)
    process.kill(pid, 'SIGTERM');
    // 2. Wait up to 5 seconds for exit
    const exited = await this.waitForExit(pid, 5000);
    if (!exited) {
      // 3. Force kill if still running
      console.warn(`PID ${pid} didn't exit gracefully, forcing...`);
      process.kill(pid, 'SIGKILL');
    }
    // 4. Clean metadata
    await this.cleanMetadata();
    // 5. Update UI
    this.updateStatusBar('stopped');
  } catch (error) {
    // process.kill throws (e.g. ESRCH) if the process is already gone.
    console.error(`Error stopping PID ${pid}:`, error);
  } finally {
    this.daemonProcess = null;
  }
}
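`waitForExit(pid, 5000)` is called above but not shown. As an illustration of the same "wait, then escalate" step, the sketch below uses an event-based alternative that takes the child process handle instead of a PID. `waitForChildExit` and the `ExitEmitter` interface are hypothetical; the interface is a minimal structural view of the `exitCode`, `signalCode`, and `once('exit', ...)` members that a Node.js `ChildProcess` provides.

```typescript
// Minimal structural view of a Node.js ChildProcess (only what is needed).
interface ExitEmitter {
  exitCode: number | null;   // set when the process exited normally
  signalCode: string | null; // set when the process was killed by a signal
  once(event: 'exit', listener: () => void): unknown;
}

// Resolve true if the process exits within `timeoutMs`, false otherwise.
function waitForChildExit(child: ExitEmitter, timeoutMs: number): Promise<boolean> {
  // Already exited (normally or by signal): nothing to wait for.
  if (child.exitCode !== null || child.signalCode !== null) {
    return Promise.resolve(true);
  }
  return new Promise<boolean>(resolve => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    child.once('exit', () => {
      clearTimeout(timer); // avoid a late timeout after a clean exit
      resolve(true);
    });
  });
}
```

Listening for the `exit` event avoids polling, but it only works for a process the extension spawned itself; an adopted or orphaned daemon known only by PID would still need a polling check.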
Daemon SIGTERM Handler (Python)
import logging
import signal
import sys
logger = logging.getLogger(__name__)
def sigterm_handler(signum, frame):
    """Graceful shutdown on SIGTERM"""
    logger.info("[Daemon] Received SIGTERM, shutting down...")
    # 1. Close active connections
    #    (`app` is the daemon's server object, defined elsewhere)
    if app:
        app.shutdown()
    # 2. Flush logs
    logging.shutdown()
    # 3. Exit cleanly
    sys.exit(0)
# Register handler
signal.signal(signal.SIGTERM, sigterm_handler)
State Machine
┌──────────┐
│ STOPPED  │ ← Initial state
└──────────┘
     │ [User opens feature]
     ▼
┌──────────┐
│ STARTING │ ← Auto-start triggered
└──────────┘
     │ [Health check passes]
     ▼
┌──────────┐
│ RUNNING  │ ← Normal operation
└──────────┘
  │      │
  │      │ [Crash detected]
  │      ▼
  │ ┌──────────┐
  │ │RESTARTING│ ← Auto-recovery
  │ └──────────┘
  │      │
  │      ▼
  │ ┌──────────┐
  │ │ RUNNING  │
  │ └──────────┘
  │
  │ [VS Code closes]
  ▼
┌──────────┐
│ STOPPING │ ← Auto-stop triggered
└──────────┘
     │
     ▼
┌──────────┐
│ STOPPED  │ ← Clean exit
└──────────┘
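The diagram can also be written as a transition table, which makes illegal transitions explicit. The state names come from the diagram; the event names (`featureOpened`, `healthCheckPassed`, and so on) and the `transition` helper are hypothetical labels chosen for this sketch. Failure paths that the diagram does not show (for example, a startup that never becomes healthy) are deliberately left out.

```typescript
type DaemonState = 'STOPPED' | 'STARTING' | 'RUNNING' | 'RESTARTING' | 'STOPPING';

// Hypothetical event names, one per labelled edge in the diagram.
type DaemonEvent =
  | 'featureOpened'
  | 'healthCheckPassed'
  | 'crashDetected'
  | 'restartSucceeded'
  | 'vscodeClosed'
  | 'exited';

// Every edge drawn in the diagram; anything absent is an invalid transition.
const TRANSITIONS: Record<DaemonState, Partial<Record<DaemonEvent, DaemonState>>> = {
  STOPPED: { featureOpened: 'STARTING' },
  STARTING: { healthCheckPassed: 'RUNNING' },
  RUNNING: { crashDetected: 'RESTARTING', vscodeClosed: 'STOPPING' },
  RESTARTING: { restartSucceeded: 'RUNNING' },
  STOPPING: { exited: 'STOPPED' },
};

// Return the next state, or throw if the diagram has no such edge.
function transition(state: DaemonState, event: DaemonEvent): DaemonState {
  const next = TRANSITIONS[state][event];
  if (next === undefined) {
    throw new Error(`Invalid transition: ${event} in state ${state}`);
  }
  return next;
}
```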
Diagnostic Checks and Health Monitoring
Health Check Flow
The daemon implements comprehensive health checks at multiple levels:
Startup Health Checks
- Python environment validation
  - Python version ≥ 3.8
  - Required packages installed
  - Virtual environment activated (if configured)
- Network availability
  - Port binding successful
  - No conflicting processes
  - Firewall rules allow local connections
- Resource availability
  - Sufficient memory (minimum 100MB free)
  - Disk space for logs and metadata
  - File descriptor limits not exceeded
Runtime Health Checks
- Endpoint: GET /pilot/health
- Frequency: Every 10 seconds (configurable)
- Timeout: 5 seconds
- Expected response:
{ "status": "healthy", "uptime": 3600, "version": "1.1.1", "pid": 12345 }
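A client can validate this payload before trusting it. The sketch below derives a `HealthResponse` type from the expected response shown above; the `isHealthResponse` and `isHealthy` helpers are hypothetical, and the real endpoint may return additional fields.

```typescript
// Shape taken from the expected response shown above.
interface HealthResponse {
  status: string;
  uptime: number; // seconds since last start
  version: string;
  pid: number;
}

// Narrow an unknown JSON payload to HealthResponse.
function isHealthResponse(value: unknown): value is HealthResponse {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.status === 'string' &&
    typeof v.uptime === 'number' &&
    typeof v.version === 'string' &&
    typeof v.pid === 'number'
  );
}

// Treat anything other than a well-formed "healthy" payload as a failure.
function isHealthy(payload: unknown): boolean {
  return isHealthResponse(payload) && payload.status === 'healthy';
}
```

Checking the body as well as the HTTP status guards against a different service answering on the same port with a 200 response.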
Deep Health Checks (Periodic)
- Frequency: Every 60 seconds
- Checks:
  - Memory usage within limits
  - No deadlocked threads
  - Database connections healthy
  - Cache hit rates acceptable
  - Request queue not backing up
For comprehensive observability metrics and monitoring dashboards, see Technical/backend/observability.md.
Troubleshooting
Issue: Daemon Won't Start
Symptoms: Webview shows "Can't reach this page", status bar shows red
Auto-Diagnose Checks:
- ✅ Python installed? → python --version
- ✅ BranchPy installed? → python -m branchpy --version
- ❌ Port available? → Check for conflicts
- ✅ Startup logs? → .branchpy/logs/ws_daemon.log
Manual Fix:
Command Palette → "BranchPy: Run Diagnostics"
For detailed logging configuration and log analysis, see Technical/logging/README.md.
Issue: Daemon Keeps Crashing
Symptoms: Auto-restart notifications every 30 seconds
Auto-Recover Behavior:
- After 3 failed restarts in 5 minutes → Disables auto-restart
- Shows: "Auto-restart disabled (too many failures)"
Manual Fix:
- Check Output panel: "BranchPy Server" logs
- Identify error (syntax error, missing dependency, etc.)
- Fix error in code
Command Palette → "BranchPy: Restart Daemon"
Common Causes:
- Python syntax errors in recent changes
- Missing or incompatible dependencies
- Port conflicts with other applications
- Insufficient system resources (memory, disk space)
- Corrupted metadata files
For comprehensive error analysis and debugging, see Technical/logging/README.md.
Issue: Orphaned Daemon After Crash
Symptoms: VS Code crashed, daemon still running, new window can't start
Auto-Detect: On extension activation, checks for orphaned daemons:
const orphanedPid = await findOrphanedDaemon();
if (orphanedPid) {
  const action = await vscode.window.showWarningMessage(
    `Found orphaned daemon (PID ${orphanedPid})`,
    'Kill Process', 'Adopt Process', 'Ignore'
  );
  if (action === 'Kill Process') {
    process.kill(orphanedPid, 'SIGKILL');
  }
}
Manual Fix:
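# Windows (PowerShell; Get-Process exposes CommandLine only in PowerShell 7+, so query CIM instead)
Get-CimInstance Win32_Process -Filter "Name LIKE 'python%'" | Where-Object { $_.CommandLine -like '*branchpy*' } | ForEach-Object { Stop-Process -Id $_.ProcessId -Force }
# Linux/macOS
pkill -f "python.*branchpy"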
Performance Targets
Latency Budget
| Operation | Target | Actual | P99 |
|---|---|---|---|
| Health check | <10ms | 5-8ms | 12ms |
| Auto-start | <3s | 1.5-2s | 2.8s |
| Auto-recovery | <5s | 3-4s | 5.5s |
| Auto-stop | <2s | 1s | 1.8s |
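The P99 column is a percentile over recorded latencies. As an illustration only, a nearest-rank percentile can be computed as below. How BranchPy actually aggregates these figures is not specified in this document, so the `percentile` helper is an assumption.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile requires at least one sample');
  }
  if (p <= 0 || p > 100) {
    throw new RangeError('p must be in the range (0, 100]');
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = samples.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}
```

With few samples the P99 is simply the slowest observation, which is one reason a P99 can exceed the stated target even when typical latencies sit well inside it.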
Resource Limits
| Resource | Minimum | Recommended | Maximum |
|---|---|---|---|
| Memory | 100MB | 256MB | 512MB |
| CPU (idle) | 0-1% | <2% | 5% |
| CPU (active) | 5-15% | <25% | 50% |
| Disk I/O | Minimal | <10MB/s | 50MB/s |
Metrics & Monitoring
Tracked Metrics
The daemon tracks comprehensive operational metrics for monitoring and diagnostics. For full metrics specification and dashboard implementation, see Technical/backend/observability.md.
Lifecycle Metrics
- Uptime: Seconds since last start
- Restart count: Total restarts in current session
- Failed health checks: Count in last 24 hours
- Recovery attempts: Successful vs. failed recoveries
- Graceful stops: Count of graceful stops vs. forced terminations
Performance Metrics
- Health check latency: P50, P95, P99
- Memory usage: RSS (resident set size), heap size
- Request count: Total, per endpoint, per hour
- Error rate: 4xx and 5xx responses
- Active connections: Current WebSocket/HTTP connections
Diagnostic Metrics
- Startup failures: Count and reasons
- Port conflicts: Frequency and resolution
- Orphaned processes: Detection and cleanup rate
- Log volume: Bytes written per minute
- Exception rate: Unhandled exceptions per hour
Dashboard (Future UI)
╔══════════════════════════════════════════════════════╗
║ BranchPy Daemon Health Dashboard                     ║
╠══════════════════════════════════════════════════════╣
║ Status: 🟢 RUNNING                                   ║
║ Uptime: 2 hours 34 minutes                           ║
║ Restarts: 0 (last 24 hours)                          ║
║ Health: 100% (0 failed checks)                       ║
║ Response Time: 12ms (avg)                            ║
║ Memory: 145 MB / 512 MB (28%)                        ║
║ Active Connections: 3 WebSocket, 0 HTTP              ║
║ Request Rate: 45 req/min                             ║
╚══════════════════════════════════════════════════════╝
Related Documentation
- ARCHITECTURE.md - System design and components
- PROTOCOL.md - Protocol specification
- PORT_MANAGEMENT.md - Port selection and conflicts
- VS_CODE_INTEGRATION_GUIDE.md - Extension integration
- Technical/backend/observability.md - Observability and monitoring
- Technical/logging/README.md - Logging configuration and analysis
Changelog
v1.1.1 (January 23, 2026)
- Updated documentation to version 1.1.1
- Added cross-references to observability and logging documentation
- Enhanced metrics tracking specification
- Clarified restart policies and thresholds
- Added deep health checks and resource limits
- Updated performance targets with P99 latencies
v1.0.0 (2025-11-07)
- ✓ Auto-start on feature use
- ✓ Auto-diagnose on startup failure
- ✓ Auto-recovery from crashes
- ✓ Auto-stop on VS Code shutdown
- ✓ Health monitoring with 10s intervals
- ✓ User notifications for all lifecycle events
- ✓ Orphaned daemon detection
Source Reference
This document is a consolidated version of the lifecycle management documentation from BranchPy v1.1.0.
Original Source: docs/v1.1.0/Server/LIFECYCLE.md