# BranchPy Server Lifecycle Management
Version: 1.1.1
Last Updated: January 23, 2026
Status: Production
Audience: Developers, Extension Maintainers
Overview
The BranchPy daemon features fully automated lifecycle management requiring zero manual intervention:
- ✅ Auto-Start - Daemon starts automatically when needed
- ✅ Auto-Diagnose - Startup failures provide actionable diagnostics
- ✅ Auto-Recovery - Crashes trigger automatic restart and reload
- ✅ Auto-Stop - Clean shutdown when VS Code closes
Core Philosophy: Users should never think about server management. Everything “just works.”
For comprehensive observability and monitoring details, see Technical/backend/observability.md. For logging configuration and troubleshooting, see Technical/logging/README.md.
Auto-Start
When It Happens
- User opens any BranchPy feature (PILOT, Compare Lanes, Semantics)
- Extension detects daemon is not running
- Daemon spawns automatically in background
- Health check confirms server ready
- Feature loads seamlessly
Implementation Flow
User Action (Click PILOT)
↓
Extension: ensureRunning()
↓
Check metadata file: ~/.branchpy/default/wsd.meta.json
↓ (if not found or PID not alive)
Start daemon: python branchpy/ws/run.py
↓
Wait for health check: GET /pilot/health
↓ (max 10 seconds)
Server ready → proceed with UI
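The metadata check in the flow above could look like the following sketch. It assumes wsd.meta.json is a JSON object with a numeric `pid` field; the real file format is not specified here. The names `isPidAlive` and `isDaemonRunning` are hypothetical. Sending signal 0 with `process.kill` is the documented Node.js way to test whether a process exists without affecting it.

```typescript
import { readFileSync } from 'node:fs';
import { homedir } from 'node:os';
import { join } from 'node:path';

// Assumed shape of wsd.meta.json; only `pid` is used here.
interface DaemonMeta {
  pid: number;
  port?: number;
}

// Signal 0 performs only the existence/permission check; nothing is delivered.
export function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user. ESRCH: no such process.
    return (err as NodeJS.ErrnoException).code === 'EPERM';
  }
}

// "Metadata file not found or PID not alive" means the daemon is not running.
export function isDaemonRunning(
  metaPath: string = join(homedir(), '.branchpy', 'default', 'wsd.meta.json')
): boolean {
  let meta: DaemonMeta;
  try {
    meta = JSON.parse(readFileSync(metaPath, 'utf8')) as DaemonMeta;
  } catch {
    return false; // missing, unreadable, or malformed metadata
  }
  if (typeof meta !== 'object' || meta === null) {
    return false;
  }
  // Reject 0 and negative values, which process.kill treats as process groups.
  return Number.isInteger(meta.pid) && meta.pid > 0 && isPidAlive(meta.pid);
}
```

A live PID alone does not prove the process is the daemon, because PIDs are reused after a crash. The health check that follows in the flow is what confirms it.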
Code Example (TypeScript)
async ensureRunning(): Promise<boolean> {
// 1. Quick check: already running?
if (this.isRunning()) {
return true;
}
// 2. Start daemon if not running
const started = await this.start();
if (!started) {
return false;
}
// 3. Wait for health check (max 10 seconds)
const healthy = await this.waitForHealth(10000);
return healthy;
}
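`waitForHealth()` is not shown above. The sketch below is a minimal polling version. It assumes Node.js 18 or later (global `fetch` and `AbortSignal.timeout`) and the default host and port from the configuration below. The 250 ms poll interval and 2-second per-request cap are illustrative choices, not documented values.

```typescript
// Assumed default: host and port from the branchpy.daemon.* settings.
const DEFAULT_HEALTH_URL = 'http://127.0.0.1:8765/pilot/health';

const delay = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls the health endpoint until it returns a 2xx status or `timeoutMs` elapses.
export async function waitForHealth(
  timeoutMs: number,
  url: string = DEFAULT_HEALTH_URL,
  pollMs = 250
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      // Cap each probe so one hung request cannot use up the whole startup budget.
      const remaining = Math.max(1, deadline - Date.now());
      const response = await fetch(url, { signal: AbortSignal.timeout(Math.min(2000, remaining)) });
      if (response.ok) {
        return true;
      }
    } catch {
      // Connection refused or probe timed out: the daemon is not listening yet.
    }
    await delay(pollMs);
  }
  return false;
}
```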
Configuration
{
"branchpy.daemon.autoStart": true, // default
"branchpy.daemon.host": "127.0.0.1",
"branchpy.daemon.port": 8765
}
Auto-Diagnose
Purpose
When daemon startup fails, auto-diagnose runs a series of checks to identify the root cause and provide actionable guidance. Diagnostic results are automatically logged for analysis. For detailed logging configuration, see Technical/logging/README.md.
Diagnostic Checks
1. Python Availability
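async checkPython(): Promise<DiagnosticResult> {
  // `exec` is a promisified child_process.exec, which rejects when the command fails
  let output: string;
  try {
    const result = await exec('python --version');
    // Python 2 and 3.0-3.3 print the version to stderr; 3.4+ prints it to stdout
    output = `${result.stdout}${result.stderr}`;
  } catch {
    return { status: 'error', message: 'Python not found in PATH' };
  }
  const version = output.match(/Python (\d+\.\d+\.\d+)/)?.[1];
  if (!version) {
    return { status: 'error', message: `Could not parse Python version from "${output.trim()}"` };
  }
  const [major, minor] = version.split('.').map(Number);
  if (major < 3 || (major === 3 && minor < 8)) {
    return {
      status: 'error',
      message: `Python ${version} too old (need ≥ 3.8)`
    };
  }
  return { status: 'ok', message: `Python ${version}` };
}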
2. BranchPy Installation
async checkBranchPy(): Promise<DiagnosticResult> {
try {
const result = await exec('python -m branchpy --version');
return { status: 'ok', message: `BranchPy ${result.stdout.trim()}` };
} catch {
return {
status: 'error',
message: 'BranchPy not installed (run: pip install branchpy)'
};
}
}
3. Port Availability
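async checkPort(port: number, host = '127.0.0.1'): Promise<DiagnosticResult> {
  const server = net.createServer();
  try {
    // Bind the same interface the daemon uses; EADDRINUSE means another process holds the port
    await new Promise<void>((resolve, reject) => {
      server.once('error', reject);
      server.listen(port, host, () => resolve());
    });
    // Release the port before the daemon tries to bind it
    await new Promise<void>((resolve) => server.close(() => resolve()));
    return { status: 'ok', message: `Port ${port} available` };
  } catch {
    const pid = await findProcessOnPort(port);
    return {
      status: 'error',
      message: `Port ${port} in use by PID ${pid}`,
      // taskkill exists only on Windows
      fix: process.platform === 'win32' ? `Run: taskkill /F /PID ${pid}` : `Run: kill ${pid}`
    };
  }
}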
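`findProcessOnPort()` is called above but not defined in this document. One possible cross-platform sketch shells out to `netstat -ano` on Windows and to `lsof` elsewhere. It assumes English-language netstat output (the LISTENING state is localized on some Windows installs) and that lsof is installed. The parsers are split out so they can be tested without either tool.

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Windows: find the PID of a LISTENING TCP socket on `port` in `netstat -ano` output.
export function parseNetstatPid(output: string, port: number): number | undefined {
  for (const line of output.split(/\r?\n/)) {
    // Expected columns: Proto, Local Address, Foreign Address, State, PID
    const cols = line.trim().split(/\s+/);
    if (cols.length === 5 && cols[0] === 'TCP' && cols[3] === 'LISTENING' && cols[1].endsWith(`:${port}`)) {
      return Number(cols[4]);
    }
  }
  return undefined;
}

// Linux/macOS: `lsof -t` prints one PID per line; take the first one.
export function parseLsofPid(output: string): number | undefined {
  const first = output.split(/\r?\n/).find((line) => /^\d+$/.test(line.trim()));
  return first === undefined ? undefined : Number(first.trim());
}

export async function findProcessOnPort(port: number): Promise<number | undefined> {
  try {
    if (process.platform === 'win32') {
      const { stdout } = await execFileAsync('netstat', ['-ano']);
      return parseNetstatPid(stdout, port);
    }
    const { stdout } = await execFileAsync('lsof', ['-nP', `-iTCP:${port}`, '-sTCP:LISTEN', '-t']);
    return parseLsofPid(stdout);
  } catch {
    // Tool missing, or lsof exited with status 1 because nothing matched
    return undefined;
  }
}
```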
4. Startup Logs Analysis
async checkStartupLogs(): Promise<DiagnosticResult> {
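// ChildProcess.stdout is a stream (it has no slice()), and Python writes tracebacks to stderr,
// so this assumes `this.startupOutput` (hypothetical) holds lines captured from both streams.
const logs: string[] = this.startupOutput.slice(-50);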
if (logs.some(line => line.includes('ModuleNotFoundError'))) {
return {
status: 'error',
message: 'Missing Python dependencies',
fix: 'Run: pip install -r requirements.txt'
};
}
if (logs.some(line => line.includes('SyntaxError'))) {
return {
status: 'error',
message: 'Syntax error in BranchPy code',
fix: 'Check recent code changes'
};
}
return { status: 'ok', message: 'No startup errors' };
}
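A `ChildProcess` exposes stdout and stderr as streams, and Python writes tracebacks such as ModuleNotFoundError to stderr. A startup-log check therefore needs the output captured into lines first. The sketch below is a minimal bounded line buffer. The class name is hypothetical, and the 50-line default is chosen to match the window used above.

```typescript
// Keeps the most recent complete lines from a stream, up to `capacity`.
export class LineRingBuffer {
  private readonly capacity: number;
  private lines: string[] = [];
  private partial = ''; // text after the last newline, waiting for more data

  constructor(capacity = 50) {
    this.capacity = capacity;
  }

  // Chunks can end mid-line, so only newline-terminated text becomes a line.
  push(chunk: string | Buffer): void {
    const parts = (this.partial + chunk.toString()).split(/\r?\n/);
    this.partial = parts.pop() ?? '';
    for (const line of parts) {
      this.lines.push(line);
      if (this.lines.length > this.capacity) {
        this.lines.shift(); // drop the oldest line
      }
    }
  }

  // Buffered lines plus any unterminated trailing text (e.g. output written just before a crash).
  snapshot(): string[] {
    return this.partial ? [...this.lines, this.partial] : [...this.lines];
  }
}
```

Feed it from `child.stdout.on('data', …)` and `child.stderr.on('data', …)`. Calling `setEncoding('utf8')` on each stream avoids splitting multi-byte characters across chunks. One buffer per stream also avoids splicing a partial stdout line onto stderr text.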
User Experience
╔════════════════════════════════════════════════════════╗
║ 🩺 BranchPy Daemon Diagnostics                         ║
╠════════════════════════════════════════════════════════╣
║ ✅ Python: 3.11.5                                      ║
║ ✅ BranchPy: 0.9.0                                     ║
║ ❌ Port: 8766 in use by PID 12345                      ║
║    Fix: taskkill /F /PID 12345                         ║
║ ✅ Startup Logs: No errors                             ║
╠════════════════════════════════════════════════════════╣
║ Recommendation: Kill conflicting process or use        ║
║ a different port (e.g., 8767)                          ║
╚════════════════════════════════════════════════════════╝
[Kill Process] [Use Port 8767] [Show Full Logs]
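The checks above share a result type that this document does not spell out. Below is a sketch of that type as implied by the return values (status, message, optional fix). It is paired with a hypothetical runner that produces report lines in the style of the diagnostics panel above. A check that throws is reported as an error instead of aborting the whole report.

```typescript
// Assumed result type: the checks return `status`, `message`, and sometimes `fix`.
export interface DiagnosticResult {
  status: 'ok' | 'error';
  message: string;
  fix?: string;
}

export type DiagnosticCheck = () => Promise<DiagnosticResult>;

// Runs checks in order and renders one line per check, plus a Fix line when present.
export async function runDiagnostics(checks: Record<string, DiagnosticCheck>): Promise<string[]> {
  const lines: string[] = [];
  for (const [name, check] of Object.entries(checks)) {
    let result: DiagnosticResult;
    try {
      result = await check();
    } catch (err) {
      result = { status: 'error', message: `check threw: ${String(err)}` };
    }
    lines.push(`${result.status === 'ok' ? '✅' : '❌'} ${name}: ${result.message}`);
    if (result.fix) {
      lines.push(`   Fix: ${result.fix}`);
    }
  }
  return lines;
}
```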
Auto-Recovery
Problem Solved
Before: User sees “Can’t reach this page” → must manually restart daemon → refresh webview
After: Webview detects failure → auto-restarts daemon → auto-reloads → user sees brief notification
Architecture
Webview Health Monitor (JavaScript)
• Polls GET /pilot/health every 10 seconds
• Tracks consecutive failures (threshold: 2)
↓ (2 failures detected)
vscode.postMessage({ type: 'restartDaemon', reason: 'HTTP 500' })
↓
Extension: _restartDaemonAndNotify()
• Show notification: "🔄 Auto-restarting server..."
• Execute: daemonService.restart()
• Wait 2 seconds for startup
• Send 'daemonRestarted' message to webview
↓
Webview: window.location.reload()
• Fresh page load with new daemon
For detailed health check implementation and monitoring, see Technical/backend/observability.md.
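The two messages in this flow form a small protocol between the webview and the extension. The sketch below types and validates them on the extension side. The type and constant names are hypothetical; only the `type` and `reason` fields come from the flow above.

```typescript
// Message shapes from the flow above.
export type WebviewToExtension = { type: 'restartDaemon'; reason: string };
export type ExtensionToWebview = { type: 'daemonRestarted' };

// Webview messages arrive untyped; validate before acting on them.
export function parseWebviewMessage(raw: unknown): WebviewToExtension | undefined {
  if (typeof raw !== 'object' || raw === null) {
    return undefined;
  }
  const msg = raw as Record<string, unknown>;
  if (msg.type === 'restartDaemon' && typeof msg.reason === 'string') {
    return { type: 'restartDaemon', reason: msg.reason };
  }
  return undefined;
}

// What the extension posts back once the new daemon is up.
export const DAEMON_RESTARTED: ExtensionToWebview = { type: 'daemonRestarted' };
```

In the extension, this would sit behind `panel.webview.onDidReceiveMessage(...)`, which passes the validated `reason` to `_restartDaemonAndNotify`.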
Health Monitor Implementation (JavaScript)
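// acquireVsCodeApi() is injected by VS Code into webviews and may be called only once.
const vscode = acquireVsCodeApi();

// The URL's port must match the port the daemon actually listens on (branchpy.daemon.port).
function setupHealthMonitor(healthUrl = 'http://127.0.0.1:8766/pilot/health') {
  let failureCount = 0;
  const MAX_FAILURES = 2;
  const CHECK_INTERVAL = 10000; // 10 seconds between checks
  const REQUEST_TIMEOUT = 5000; // treat a check that hangs for 5 seconds as failed
  const RECOVERY_LOCK = 15000; // ignore failures for 15 seconds after a restart request
  let isRecovering = false;

  async function checkHealth() {
    if (isRecovering) return;
    try {
      const response = await fetch(healthUrl, {
        signal: AbortSignal.timeout(REQUEST_TIMEOUT)
      });
      if (response.ok) {
        failureCount = 0; // Reset on success
      } else {
        handleFailure(`HTTP ${response.status}`);
      }
    } catch (error) {
      // Network errors and timeouts both end up here
      handleFailure(error.message);
    }
  }

  function handleFailure(reason) {
    if (isRecovering) return; // a check that was in flight when recovery started
    failureCount++;
    if (failureCount >= MAX_FAILURES) {
      attemptAutoRecovery(reason);
    }
  }

  function attemptAutoRecovery(reason) {
    isRecovering = true;
    // Ask the extension to restart the daemon
    vscode.postMessage({ type: 'restartDaemon', reason });
    // Lock recovery for 15 seconds (prevents restart loops)
    setTimeout(() => {
      isRecovering = false;
      failureCount = 0;
    }, RECOVERY_LOCK);
  }

  // The extension replies with 'daemonRestarted' once the new daemon is up
  window.addEventListener('message', (event) => {
    if (event.data && event.data.type === 'daemonRestarted') {
      window.location.reload();
    }
  });

  setInterval(checkHealth, CHECK_INTERVAL);
}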
Restart Handler (TypeScript)
private async _restartDaemonAndNotify(reason: string): Promise<void> {
try {
// 1. User notification
void vscode.window.showInformationMessage(
`🔄 Auto-restarting BranchPy server (${reason})...`
);
// 2. Execute restart
await vscode.commands.executeCommand('bpy.ws.restart');
// 3. Wait for startup
await new Promise(resolve => setTimeout(resolve, 2000));
// 4. Notify webview to reload
this._panel.webview.postMessage({ type: 'daemonRestarted' });
// 5. Success notification
void vscode.window.showInformationMessage(
'✅ BranchPy server restarted successfully'
);
} catch (error) {
void vscode.window.showErrorMessage(
`❌ Failed to restart: ${error}`
);
}
}
Recovery Timeline
[00:00] User working normally
[00:05] Daemon crashes
[00:10] Health check #1 fails
[00:12] Status: "Server issue (1/2)"
[00:20] Health check #2 fails
[00:22] Auto-recovery triggered
[00:22] Notification: "🔄 Auto-restarting..."
[00:23] Daemon stops (old process killed)
[00:24] Daemon starts (new process spawned)
[00:25] Webview reloads
[00:25] Notification: "✅ Restarted successfully"
[00:26] User continues working ✨
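The detection part of this timeline follows from the monitor settings. Let $T$ be the check interval (10 s), $k$ the failure threshold (2), and $\tau$ the per-request timeout (5 s, as listed under Runtime Health Checks; this assumes the monitor enforces it). The first check after a crash runs somewhere in $(0, T]$ seconds later. The $k$-th consecutive failure comes $(k-1)T$ after that, plus up to $\tau$ when a check hangs instead of failing fast:

```math
(k-1)\,T \;<\; t_{\text{detect}} - t_{\text{crash}} \;\le\; k\,T + \tau
\quad\Longrightarrow\quad
10\ \text{s} \;<\; t_{\text{detect}} - t_{\text{crash}} \;\le\; 25\ \text{s}
```

In the timeline above, the crash at 00:05 is acted on at 00:22. That is 17 s later, inside this range. The restart itself then adds the 3 to 4 s listed as typical auto-recovery time under Performance Targets.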
Restart Policies
The daemon implements the following restart policies to ensure system stability:
Restart Threshold Policy
- Maximum restarts: 3 attempts
- Time window: 5 minutes
- Action on threshold: Disable auto-restart, notify user
Backoff Policy
- First restart: Immediate (0s delay)
- Second restart: 5s delay
- Third restart: 15s delay
- After threshold: Manual intervention required
Recovery Lock
- Duration: 15 seconds
- Purpose: Prevent restart loops during ongoing failures
- Behavior: Ignore health check failures while locked
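The three policies interact: the lock suppresses duplicate triggers, the backoff spaces out restarts, and the threshold stops the loop. One way to combine them is sketched below, with hypothetical names and an injectable clock for testing. The defaults mirror `maxRestarts`, `restartWindow` and `recoveryLockTimeout` in the configuration below. The sketch also assumes the lock starts counting after the backoff delay, which this document does not specify.

```typescript
export class RestartPolicy {
  private restarts: number[] = []; // decision times (ms) of recent restarts
  private lockedUntil = 0;
  private disabled = false;
  private readonly maxRestarts: number;
  private readonly windowMs: number;
  private readonly backoffMs: readonly number[];
  private readonly lockMs: number;
  private readonly now: () => number;

  constructor(
    maxRestarts = 3,
    windowMs = 300_000,
    backoffMs: readonly number[] = [0, 5_000, 15_000],
    lockMs = 15_000,
    now: () => number = Date.now
  ) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
    this.backoffMs = backoffMs;
    this.lockMs = lockMs;
    this.now = now;
  }

  // Delay (ms) to wait before restarting, or null when no restart should happen.
  nextRestartDelay(): number | null {
    if (this.disabled) return null; // threshold reached: manual intervention required
    const t = this.now();
    if (t < this.lockedUntil) return null; // recovery lock: ignore this failure
    this.restarts = this.restarts.filter((ts) => t - ts < this.windowMs);
    if (this.restarts.length >= this.maxRestarts) {
      this.disabled = true; // disable auto-restart; the caller should notify the user
      return null;
    }
    const delay = this.backoffMs[Math.min(this.restarts.length, this.backoffMs.length - 1)] ?? 0;
    this.restarts.push(t);
    this.lockedUntil = t + delay + this.lockMs;
    return delay;
  }

  isDisabled(): boolean {
    return this.disabled;
  }

  // Called after a manual restart re-enables auto-recovery.
  reset(): void {
    this.restarts = [];
    this.lockedUntil = 0;
    this.disabled = false;
  }
}
```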
Configuration
{
"branchpy.autoRecovery.enabled": true,
"branchpy.autoRecovery.maxFailures": 2,
"branchpy.autoRecovery.checkInterval": 10000,
"branchpy.autoRecovery.recoveryLockTimeout": 15000,
"branchpy.autoRecovery.maxRestarts": 3,
"branchpy.autoRecovery.restartWindow": 300000
}
Auto-Stop
Purpose
Gracefully stop the daemon when VS Code closes to prevent orphaned processes.
Implementation Flow
User closes VS Code
↓
VS Code: extension.deactivate()
↓
daemonService.stop()
• Send SIGTERM to daemon process
• Wait 5 seconds for graceful exit
• Send SIGKILL if still running (force)
• Clean metadata files
↓
Daemon exits cleanly
Graceful Stop (TypeScript)
async stop(): Promise<void> {
  const pid = this.daemonProcess?.pid;
  if (pid === undefined) {
    this.daemonProcess = null;
    return; // Already stopped (or the spawn never produced a PID)
  }
  try {
    // 1. Send SIGTERM (graceful shutdown). On Windows, Node.js has no real signals:
    //    SIGTERM terminates the process immediately, so the Python handler never runs.
    process.kill(pid, 'SIGTERM');
    // 2. Wait up to 5 seconds for exit
    const exited = await this.waitForExit(pid, 5000);
    if (!exited) {
      // 3. Force kill if still running
      console.warn(`PID ${pid} didn't exit gracefully, forcing...`);
      process.kill(pid, 'SIGKILL');
    }
  } catch (error) {
    // ESRCH here means the process had already exited
    console.error(`Error stopping PID ${pid}:`, error);
  } finally {
    // 4. Clean metadata and 5. update UI, even if a kill() call threw
    await this.cleanMetadata();
    this.updateStatusBar('stopped');
    this.daemonProcess = null;
  }
}
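`waitForExit()` is referenced above but not defined. The sketch below polls with signal 0; the helper names are hypothetical. When the extension spawned the daemon itself, awaiting the `ChildProcess` 'exit' event is an equally valid alternative.

```typescript
// Signal 0 tests whether the process still exists without delivering a signal.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means it exists but belongs to another user; ESRCH means it is gone.
    return (err as NodeJS.ErrnoException).code === 'EPERM';
  }
}

const sleep = (ms: number): Promise<void> => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves true as soon as `pid` is gone, or false if it is still alive after `timeoutMs`.
export async function waitForExit(pid: number, timeoutMs: number, pollMs = 100): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (!isAlive(pid)) {
      return true;
    }
    await sleep(pollMs);
  }
  return !isAlive(pid);
}
```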
Daemon SIGTERM Handler (Python)
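import logging
import signal
import sys

logger = logging.getLogger(__name__)

def sigterm_handler(signum, frame):
    """Graceful shutdown on SIGTERM"""
    logger.info("[Daemon] Received SIGTERM, shutting down...")
    # 1. Close active connections (`app` is the daemon's server object, defined elsewhere in run.py)
    if app:
        app.shutdown()
    # 2. Flush and close all logging handlers
    logging.shutdown()
    # 3. Exit cleanly (raises SystemExit in the main thread)
    sys.exit(0)

# Register handler; signal.signal() must be called from the main thread
signal.signal(signal.SIGTERM, sigterm_handler)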
State Machine
┌──────────┐
│ STOPPED  │ ← Initial state
└──────────┘
     │ [User opens feature]
     ▼
┌──────────┐
│ STARTING │ ← Auto-start triggered
└──────────┘
     │ [Health check passes]
     ▼
┌──────────┐
│ RUNNING  │ ← Normal operation
└──────────┘
     │      │
     │      │ [Crash detected]
     │      ▼
     │ ┌──────────┐
     │ │RESTARTING│ ← Auto-recovery
     │ └──────────┘
     │      │
     │      ▼
     │ ┌──────────┐
     │ │ RUNNING  │
     │ └──────────┘
     │
     │ [VS Code closes]
     ▼
┌──────────┐
│ STOPPING │ ← Auto-stop triggered
└──────────┘
     │
     ▼
┌──────────┐
│ STOPPED  │ ← Clean exit
└──────────┘
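The diagram can be encoded as a transition table, so that every allowed edge is visible in one place and anything else is rejected. The sketch below keeps the state names from the diagram; the event names are hypothetical labels for its arrows. The diagram shows no edge for a failed start or a failed restart, so in this sketch those events would throw until explicit transitions are added.

```typescript
export type DaemonState = 'STOPPED' | 'STARTING' | 'RUNNING' | 'RESTARTING' | 'STOPPING';
export type LifecycleEvent =
  | 'featureOpened'
  | 'healthCheckPassed'
  | 'crashDetected'
  | 'restartSucceeded'
  | 'vscodeClosing'
  | 'processExited';

const TRANSITIONS: Record<DaemonState, Partial<Record<LifecycleEvent, DaemonState>>> = {
  STOPPED: { featureOpened: 'STARTING' },
  STARTING: { healthCheckPassed: 'RUNNING' },
  RUNNING: { crashDetected: 'RESTARTING', vscodeClosing: 'STOPPING' },
  RESTARTING: { restartSucceeded: 'RUNNING' },
  STOPPING: { processExited: 'STOPPED' },
};

// Returns the next state, or throws for a transition the diagram does not contain.
export function transition(from: DaemonState, event: LifecycleEvent): DaemonState {
  const to = TRANSITIONS[from][event];
  if (to === undefined) {
    throw new Error(`Invalid transition: ${from} + ${event}`);
  }
  return to;
}
```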
Diagnostic Checks and Health Monitoring
Health Check Flow
The daemon implements comprehensive health checks at multiple levels:
Startup Health Checks
- Python environment validation
  - Python version ≥ 3.8
  - Required packages installed
  - Virtual environment activated (if configured)
- Network availability
  - Port binding successful
  - No conflicting processes
  - Firewall rules allow local connections
- Resource availability
  - Sufficient memory (minimum 100MB free)
  - Disk space for logs and metadata
  - File descriptor limits not exceeded
Runtime Health Checks
- Endpoint: GET /pilot/health
- Frequency: Every 10 seconds (configurable)
- Timeout: 5 seconds
- Expected response:
{ "status": "healthy", "uptime": 3600, "version": "1.1.1", "pid": 12345 }
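A 2xx response whose body does not match the documented shape can also be treated as a failure. The sketch below validates the expected response shown above. The names are hypothetical, and only the four documented fields are checked.

```typescript
// Documented GET /pilot/health response (uptime in seconds).
export interface HealthResponse {
  status: string;
  uptime: number;
  version: string;
  pid: number;
}

// Narrow an untrusted JSON body to HealthResponse; anything malformed yields undefined.
export function parseHealth(body: unknown): HealthResponse | undefined {
  if (typeof body !== 'object' || body === null) {
    return undefined;
  }
  const b = body as Record<string, unknown>;
  if (typeof b.status === 'string' && typeof b.uptime === 'number' &&
      typeof b.version === 'string' && typeof b.pid === 'number') {
    return { status: b.status, uptime: b.uptime, version: b.version, pid: b.pid };
  }
  return undefined;
}

export function isHealthy(body: unknown): boolean {
  return parseHealth(body)?.status === 'healthy';
}
```

Comparing the returned `pid` with the one recorded in wsd.meta.json could also confirm that the responder is the expected daemon instance rather than another process on the same port.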
Deep Health Checks (Periodic)
- Frequency: Every 60 seconds
- Checks:
- Memory usage within limits
- No deadlocked threads
- Database connections healthy
- Cache hit rates acceptable
- Request queue not backing up
For comprehensive observability metrics and monitoring dashboards, see Technical/backend/observability.md.
Troubleshooting
Issue: Daemon Won’t Start
Symptoms: Webview shows “Can’t reach this page”, status bar shows red
Auto-Diagnose Checks:
- ✅ Python installed? → python --version
- ✅ BranchPy installed? → python -m branchpy --version
- ✅ Port available? → Check for conflicts
- ✅ Startup logs? → .branchpy/logs/ws_daemon.log
Manual Fix:
Command Palette → "BranchPy: Run Diagnostics"
For detailed logging configuration and log analysis, see Technical/logging/README.md.
Issue: Daemon Keeps Crashing
Symptoms: Auto-restart notifications every 30 seconds
Auto-Recovery Behavior:
- After 3 failed restarts within 5 minutes → Disables auto-restart
- Shows: “❌ Auto-restart disabled (too many failures)”
Manual Fix:
- Check Output panel: “BranchPy Server” logs
- Identify the error (syntax error, missing dependency, etc.)
- Fix the error in code
- Restart: Command Palette → "BranchPy: Restart Daemon"
Common Causes:
- Python syntax errors in recent changes
- Missing or incompatible dependencies
- Port conflicts with other applications
- Insufficient system resources (memory, disk space)
- Corrupted metadata files
For comprehensive error analysis and debugging, see Technical/logging/README.md.
Issue: Orphaned Daemon After Crash
Symptoms: VS Code crashed, daemon still running, new window can’t start
Auto-Detect: On activation, the extension checks for orphaned daemons:
const orphanedPid = await findOrphanedDaemon();
if (orphanedPid) {
const action = await vscode.window.showWarningMessage(
`Found orphaned daemon (PID ${orphanedPid})`,
'Kill Process', 'Adopt Process', 'Ignore'
);
if (action === 'Kill Process') {
process.kill(orphanedPid, 'SIGKILL');
}
}
Manual Fix:
# Windows
Get-CimInstance Win32_Process -Filter "Name = 'python.exe'" | Where-Object { $_.CommandLine -like '*branchpy*' } | ForEach-Object { Stop-Process -Id $_.ProcessId -Force }
# Linux/macOS
pkill -f "python.*branchpy"
Performance Targets
Latency Budget
| Operation | Target | Actual | P99 |
|---|---|---|---|
| Health check | <10ms | 5-8ms | 12ms |
| Auto-start | <3s | 1.5-2s | 2.8s |
| Auto-recovery | <5s | 3-4s | 5.5s |
| Auto-stop | <2s | 1s | 1.8s |
Resource Limits
| Resource | Minimum | Recommended | Maximum |
|---|---|---|---|
| Memory | 100MB | 256MB | 512MB |
| CPU (idle) | 0-1% | <2% | 5% |
| CPU (active) | 5-15% | <25% | 50% |
| Disk I/O | Minimal | <10MB/s | 50MB/s |
Metrics & Monitoring
Tracked Metrics
The daemon tracks comprehensive operational metrics for monitoring and diagnostics. For full metrics specification and dashboard implementation, see Technical/backend/observability.md.
Lifecycle Metrics
- Uptime: Seconds since last start
- Restart count: Total restarts in current session
- Failed health checks: Count in last 24 hours
- Recovery attempts: Successful vs. failed recoveries
- Graceful stops: Count of graceful stops vs. forced terminations
Performance Metrics
- Health check latency: P50, P95, P99
- Memory usage: RSS (resident set size), heap size
- Request count: Total, per endpoint, per hour
- Error rate: 4xx and 5xx responses
- Active connections: Current WebSocket/HTTP connections
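P50, P95 and P99 can be computed from a window of recorded latencies. The sketch below uses the nearest-rank definition, one common convention; monitoring systems that interpolate between samples will report slightly different values. The function names are hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample with at least p% of samples at or below it.
export function percentile(samples: readonly number[], p: number): number {
  if (samples.length === 0) {
    throw new Error('percentile of an empty sample set');
  }
  if (!(p > 0 && p <= 100)) {
    throw new RangeError('p must be in (0, 100]');
  }
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing so the rank is exact for whole-number p.
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based
  return sorted[rank - 1];
}

export function latencySummary(samplesMs: readonly number[]): { p50: number; p95: number; p99: number } {
  return {
    p50: percentile(samplesMs, 50),
    p95: percentile(samplesMs, 95),
    p99: percentile(samplesMs, 99),
  };
}
```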
Diagnostic Metrics
- Startup failures: Count and reasons
- Port conflicts: Frequency and resolution
- Orphaned processes: Detection and cleanup rate
- Log volume: Bytes written per minute
- Exception rate: Unhandled exceptions per hour
Dashboard (Future UI)
╔════════════════════════════════════════════════════════╗
║ 🔍 BranchPy Daemon Health Dashboard                    ║
╠════════════════════════════════════════════════════════╣
║ Status: 🟢 RUNNING                                     ║
║ Uptime: 2 hours 34 minutes                             ║
║ Restarts: 0 (last 24 hours)                            ║
║ Health: 100% (0 failed checks)                         ║
║ Response Time: 12ms (avg)                              ║
║ Memory: 145 MB / 512 MB (28%)                          ║
║ Active Connections: 3 WebSocket, 0 HTTP                ║
║ Request Rate: 45 req/min                               ║
╚════════════════════════════════════════════════════════╝
Related Documentation
- ARCHITECTURE.md - System design and components
- PROTOCOL.md - Protocol specification
- PORT_MANAGEMENT.md - Port selection and conflicts
- VS_CODE_INTEGRATION_GUIDE.md - Extension integration
- Technical/backend/observability.md - Observability and monitoring
- Technical/logging/README.md - Logging configuration and analysis
Changelog
v1.1.1 (January 23, 2026)
- 📝 Updated documentation to version 1.1.1
- 🔗 Added cross-references to observability and logging documentation
- 📊 Enhanced metrics tracking specification
- 🔧 Clarified restart policies and thresholds
- ✨ Added deep health checks and resource limits
- 📈 Updated performance targets with P99 latencies
v1.0.0 (2025-11-07)
- ✅ Auto-start on feature use
- ✅ Auto-diagnose on startup failure
- ✅ Auto-recovery from crashes
- ✅ Auto-stop on VS Code shutdown
- ✅ Health monitoring with 10s intervals
- ✅ User notifications for all lifecycle events
- ✅ Orphaned daemon detection
Status: ✅ Production Ready
Source Reference
This document is a consolidated version of the lifecycle management documentation from BranchPy v1.1.0.
Original Source: docs/v1.1.0/Server/LIFECYCLE.md