
How the Ren'Py media scanner is structured

March 2026 7 min read Architecture

The media scanner in BranchPy can be described in four words: scan, classify, protect, review. This note explains what each step actually does, why it comes before the next, and what the consequences would be of doing them in a different order.

The short version: the ordering is the safety mechanism. It's not organisational tidiness.

Step 1 — Scan: build a complete file inventory

The first step has one job: enumerate every media file present in the project tree. It does not judge, rank, or filter. It produces a flat inventory: this file exists, at this path, with these properties.
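A minimal sketch of such an inventory pass, not BranchPy's actual code. The names `MediaFile` and `build_inventory`, the extension list, and the recorded properties (suffix and size) are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical extension list; the real scanner's list is not given in this note.
MEDIA_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".ogg", ".mp3", ".wav", ".opus", ".webm"}


@dataclass(frozen=True)
class MediaFile:
    path: str    # POSIX-style path relative to the project root
    suffix: str  # lowercased extension, e.g. ".png"
    size: int    # size in bytes


def build_inventory(project_root):
    """Enumerate every media file under project_root without judging any of them."""
    root = Path(project_root)
    inventory = []
    for p in sorted(root.rglob("*")):
        if p.is_file() and p.suffix.lower() in MEDIA_EXTENSIONS:
            inventory.append(
                MediaFile(p.relative_to(root).as_posix(), p.suffix.lower(), p.stat().st_size)
            )
    return inventory
```

The function only records facts about files. Nothing in it knows whether a file is used, which is what keeps this pass independent of the reference pass.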

At the same time, a second sub-process walks all .rpy files and builds a reference map — every path string found in image declarations, show, scene, play music, play sound, voice, and equivalent Python API calls. This reference set is separate from the file inventory. Neither depends on the other yet.
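The reference pass could be sketched roughly as follows. This is a deliberate simplification: it collects any quoted string that ends in a media extension instead of parsing Ren'Py statements, it only skips whole-line comments, and the names `extract_references` and `build_reference_map` are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical extension list, assumed for this sketch only.
MEDIA_EXTENSIONS = ("png", "jpg", "jpeg", "webp", "ogg", "mp3", "wav", "opus", "webm")

# A quoted string literal whose content ends in one of the extensions above.
QUOTED_MEDIA_PATH = re.compile(
    r"""["']([^"'\n]+\.(?:%s))["']""" % "|".join(MEDIA_EXTENSIONS),
    re.IGNORECASE,
)


def extract_references(script_text):
    """Return the set of media path strings mentioned in one script's text."""
    refs = set()
    for line in script_text.splitlines():
        if line.strip().startswith("#"):
            continue  # skip whole-line comments; inline comments are not handled
        refs.update(QUOTED_MEDIA_PATH.findall(line))
    return refs


def build_reference_map(project_root):
    """Map each referenced path string to the .rpy files that mention it."""
    root = Path(project_root)
    reference_map = {}
    for script in sorted(root.rglob("*.rpy")):
        text = script.read_text(encoding="utf-8", errors="replace")
        for ref in extract_references(text):
            reference_map.setdefault(ref, set()).add(script.relative_to(root).as_posix())
    return reference_map
```

Note that this pass never touches the media files themselves, so a reference to a file that is not on disk is recorded like any other.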

Why two separate passes: Keeping the file inventory and the reference map separate means each can be used independently. A missing-file report doesn't need the full scan to finish. A reference that points to a file that doesn't exist yet is still a valid reference. Mixing the two passes would collapse information that's useful to keep distinct.

Step 2 — Classify: assign a status to every file

With both the file inventory and the reference map available, classification assigns a status to every known file. The possible statuses are:

  • Referenced — the file appears in the reference map.
  • Unreferenced — the file is in the inventory but not in the reference map.
  • Missing — the file appears in the reference map but not in the inventory (it's used but not present on disk).

Classification is intentionally mechanical at this stage. It doesn't yet ask whether an unreferenced file is safe to delete — that's the next step's job. The classifier just answers: does what's on disk match what the scripts expect?
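Because the three statuses are defined purely by set membership, classification can be sketched with plain set operations. The `Status` enum and `classify` function are illustrative names, and the sketch assumes both inputs have already been normalised to the same base directory.

```python
from enum import Enum


class Status(Enum):
    REFERENCED = "Referenced"
    UNREFERENCED = "Unreferenced"
    MISSING = "Missing"


def classify(inventory_paths, referenced_paths):
    """Assign a status to every path known from either the disk or the scripts."""
    inventory = set(inventory_paths)
    referenced = set(referenced_paths)
    statuses = {}
    for path in inventory & referenced:
        statuses[path] = Status.REFERENCED    # on disk and used by a script
    for path in inventory - referenced:
        statuses[path] = Status.UNREFERENCED  # on disk, no script mentions it
    for path in referenced - inventory:
        statuses[path] = Status.MISSING       # mentioned by a script, not on disk
    return statuses
```

The function makes no decision about deletion; it only reports how the two sets differ.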

What classification doesn't cover: Script-level reference scanning has known limits. Assets loaded through dynamically constructed path strings, assets referenced only in config variables or style blocks, and assets loaded by built-in engine systems don't appear in the reference map. This is expected; they're handled in the protection step.

Step 3 — Protect: apply protection rules before results are shown

Protection is the step that makes the difference between a useful tool and a dangerous one. Before any result is presented to the user, every file in the unreferenced set is checked against a protection ruleset.

Files that match a protected path receive a distinct classification, Protected, and are removed from the actionable set entirely. They still appear in the results, but with no checkbox and nothing to act on.

Protected paths cover two categories:

  • Engine-managed paths: game/gui/, renpy/, common/, launcher/ — paths where Ren'Py's engine itself loads files, bypassing the script system.
  • Policy-protected paths: game/fonts/ and fonts/ — paths containing assets that may be loaded indirectly through config or style definitions, where the loading mechanism is less predictable.
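One way to express these rules is as directory prefixes checked against each candidate path. The sketch below uses the paths listed above; the function name `protection_reason`, the reason-string format, and the component-wise matching strategy are assumptions, not BranchPy's documented behaviour.

```python
from pathlib import PurePosixPath

ENGINE_MANAGED = ("game/gui", "renpy", "common", "launcher")
POLICY_PROTECTED = ("game/fonts", "fonts")


def protection_reason(path, extra_protected=()):
    """Return a reason string if path falls under a protected directory, else None."""
    candidate = PurePosixPath(path)
    rules = (
        [(p, "engine-managed path") for p in ENGINE_MANAGED]
        + [(p, "policy-protected path") for p in POLICY_PROTECTED]
        + [(p, "user-configured path") for p in extra_protected]
    )
    for prefix, label in rules:
        prefix_path = PurePosixPath(prefix)
        # Compare whole path components, so "game/guitar/" does not match "game/gui/".
        if candidate == prefix_path or prefix_path in candidate.parents:
            return f"{label}: {prefix_path.as_posix()}/"
    return None
```

Matching on path components instead of raw string prefixes avoids protecting a sibling directory that merely shares the first few characters.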

Beyond the built-in rules, users can extend the protection list through branchpy.config.json using the protectedPaths key. If your project has a custom directory structure that needs to be excluded from cleanup candidates, this is how you tell the scanner.
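Reading that key could look something like the sketch below. The file name and the `protectedPaths` key come from this note; that the value is a JSON array of path strings, that the file sits at the project root, and the helper name `load_user_protected_paths` are all assumptions.

```python
import json
from pathlib import Path


def load_user_protected_paths(project_root, filename="branchpy.config.json"):
    """Return the user's extra protected paths, or an empty list if none are set."""
    config_file = Path(project_root) / filename  # location at the root is assumed
    if not config_file.is_file():
        return []
    data = json.loads(config_file.read_text(encoding="utf-8"))
    paths = data.get("protectedPaths", []) if isinstance(data, dict) else []
    # Assumed shape: a JSON array of path strings.
    if not isinstance(paths, list) or not all(isinstance(p, str) for p in paths):
        raise ValueError("protectedPaths is expected to be a list of strings")
    # Normalise separators and trailing slashes so entries compare consistently.
    return [p.replace("\\", "/").rstrip("/") for p in paths]
```

Under the same assumption, a config file containing `{"protectedPaths": ["game/extras/"]}` would add `game/extras` to the protected set.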

Why protection happens before presentation: If protection were applied at the UI layer (only hiding the checkbox for certain files), a bug in the UI could expose protected files as selectable. By stripping protected files from the actionable set before building the UI at all, a UI-layer failure can't escalate into an accidental deletion of an engine asset.
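The idea of stripping protected files before any UI exists can be sketched as a pure function that partitions the unreferenced set. `split_unreferenced` and the `reason_for` callback are hypothetical names; `reason_for` stands in for whatever ruleset check the scanner really uses and is assumed to return a reason string or `None`.

```python
def split_unreferenced(unreferenced, reason_for):
    """Partition unreferenced paths into an actionable list and a protected mapping.

    reason_for(path) is assumed to return a reason string for protected paths
    and None for everything else.
    """
    actionable = []
    protected = {}
    for path in sorted(unreferenced):
        reason = reason_for(path)
        if reason is None:
            actionable.append(path)
        else:
            protected[path] = reason  # shown in results, never offered for deletion
    return actionable, protected
```

A UI built only from `actionable` has no protected path to expose by mistake, because the protected entries are never in the data it receives for selection.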

Step 4 — Review: present findings for human judgement

What reaches the Review step is already filtered. The user sees:

  • Files classified as Unreferenced that aren't protected — these are candidates for review.
  • Files classified as Missing — referenced in scripts but absent from disk.
  • Files classified as Protected — visible but not actionable, with a reason string.

Each unreferenced file shows the path, the file type, and — where available — contextual information about why it might be unreferenced (for example, GUI assets that look like alternates for an existing referenced file). This context exists to support the human reviewer, not to replace them.

Nothing is deleted from this view without a second confirmation. Selection and confirmation are separate steps: selecting files marks them as candidates; a distinct confirmation dialog lists them again before any filesystem operation occurs.
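The separation between selecting and confirming can be modelled as a small state object. This is an illustrative sketch only: the class `DeletionRequest` and its methods are invented here, and the rule that changing the selection resets a prior confirmation is an assumption about reasonable behaviour, not something this note states.

```python
class DeletionRequest:
    """Tracks selected files and refuses to release them until confirmed."""

    def __init__(self, actionable):
        self._actionable = frozenset(actionable)
        self._selected = set()
        self._confirmed = False

    def select(self, path):
        if path not in self._actionable:
            raise ValueError(f"{path} is not an actionable file")
        self._selected.add(path)
        self._confirmed = False  # assumed: a changed selection needs a fresh confirmation

    def pending(self):
        """The list a confirmation dialog would show again."""
        return sorted(self._selected)

    def confirm(self, listed):
        # The confirmation must cover exactly the files that were listed to the user.
        if sorted(listed) != self.pending():
            raise ValueError("confirmed list does not match the current selection")
        self._confirmed = True

    def paths_to_delete(self):
        if not self._confirmed:
            raise RuntimeError("selection has not been confirmed")
        return self.pending()
```

No filesystem call appears anywhere in the class; it only decides whether a list of paths may be handed to the code that performs the operation.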

Why the order is the safety mechanism

The sequence matters because each step's output is the next step's input, so every transition is a point where an error from the previous step can be caught instead of cascading:

  • A classification error (a protected file incorrectly marked unreferenced) is caught by the protect step — the file is reclassified before it reaches the UI.
  • A protect-step miss — a file that should be protected but isn't caught by the ruleset — is a known limitation category, not a silent failure. It surfaces in review where a human is present.
  • A UI bug that renders the wrong files as selectable is caught by the delete handler, which independently validates the file set against the protection list before any operation proceeds.

Three independent layers must each fail in the same direction simultaneously to produce a false deletion. That's intentional.
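The last of those layers, the delete handler's own check, might be sketched like this. It deliberately repeats the protection test instead of trusting what the UI sent. The function names, the all-or-nothing refusal policy, and the use of `PermissionError` are assumptions; the prefixes are the built-in paths listed in the protect step.

```python
from pathlib import PurePosixPath

# Built-in protected directories named in the protect step.
PROTECTED_PREFIXES = ("game/gui", "renpy", "common", "launcher", "game/fonts", "fonts")


def is_protected(path, prefixes=PROTECTED_PREFIXES):
    """True if path is a protected directory or lies inside one."""
    candidate = PurePosixPath(path)
    return any(
        PurePosixPath(p) == candidate or PurePosixPath(p) in candidate.parents
        for p in prefixes
    )


def validate_delete_request(paths, prefixes=PROTECTED_PREFIXES):
    """Re-check every requested path; refuse the whole request if any is protected."""
    offenders = [p for p in paths if is_protected(p, prefixes)]
    if offenders:
        raise PermissionError(f"refusing to delete protected paths: {offenders}")
    return list(paths)  # only now would the caller touch the filesystem
```

Rejecting the entire request when one path is protected is a conservative choice for this sketch: a request that contains a protected file suggests something upstream went wrong, so none of it is trusted.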

The principle behind the design: Each stage should be able to fail without making the next stage dangerous. Safety in a destructive pipeline isn't one good check; it's redundant checks at each transition point, each unaware of whether the previous one worked correctly.

Summary

  • Scan builds a complete file inventory and a reference map as two separate passes; it does not compare them.
  • Classify compares the two and assigns a mechanical status to every known file.
  • Protect removes engine-managed and policy-protected files from the actionable set before any UI is rendered.
  • Review presents filtered, classified, contextual results for human decision — with a confirmation gate before any deletion.
  • The ordering is the safety mechanism. Each step's output shapes what the next step can do.