Releases: doutsis/vmbackup
vmbackup 0.6.0 — Unification release
Unification release. `vmbackup` and `vmrestore` now ship from one source tree as one Debian package containing two binaries. Both binaries always carry the same version, share a single `lib/` of cross-tool helpers, and read from a single SQLite catalogue. Existing flags, invocations, systemd units, and operator scripts continue to work unchanged. The old standalone `vmrestore` package is replaced cleanly on upgrade.
Added
- Unified package: `vmbackup` and `vmrestore` ship from a single source tree, build via one `Makefile`, and install from one `.deb`. The package declares `Provides: vmrestore`, `Replaces: vmrestore (<< 0.6.0)`, and `Conflicts: vmrestore (<< 0.6.0)`, so `apt` removes the old standalone `vmrestore` package automatically on upgrade. The `vmrestore` binary continues to live at `/usr/local/bin/vmrestore` (symlink to `/opt/vmbackup/vmrestore.sh`).
- Shared `lib/` consumed by both binaries: 16 libraries now provide one canonical implementation of behaviour that was previously duplicated or divergent across the two tools. This covers logging, exit codes, versioning, per-VM locking, signal traps, config-instance resolution, period handling, the backup-tree walker, the read-only SQLite reader, path and VM-name helpers, TPM artefact reading, and the `virtnbdbackup`/`virtnbdrestore` and `virsh` wrappers. Where a behaviour exists in both binaries, it now comes from exactly one place, so the two can no longer disagree by accident.
- `vmrestore` is catalogue-aware: `vmrestore --list` reads the same SQLite catalogue that drives `vmbackup --status --chains` and appends a per-VM `Chains: <N> active, <N> broken, last backup <ISO>` line. It falls back to walker-only output when the catalogue is unavailable, preserving the standalone DR contract.
- `vmrestore` writes restore-session rows to the catalogue: the schema is bumped to v2.2 with a new `restore_sessions` table. Every invocation records start, end, VM, restore type, and final outcome. A new `vmbackup --status --restores` subcommand reads the table, so backup and restore history live in one place. Catalogue failures degrade to a single `WARN` and never block the restore; `--dry-run` writes no row.
- `vmrestore` gains per-VM locking and signal handlers: restoring a VM now takes the same lock that a backup of that VM takes, so backup and restore can no longer race each other. SIGINT and SIGTERM clean up staging directories and partial disk files.
- `vmrestore --restore-path` overlap guard: refuses any path that equals or sits inside a configured `vmbackup` `BACKUP_PATH` (checked across all config instances), preventing accidental restores into the live backup tree.
- Broken-chain detector for `vmrestore`: incomplete chains (truncated by an interrupted backup, or partially archived) are no longer offered as the default `latest` restore target. The reason for skipping is logged, and the operator can override with `--include-incomplete` (forensic use only).
- In-session re-entry guard for chain archival: `vmbackup` refuses to archive the same VM twice within one invocation, eliminating collision-suffixed `.archives/chain-<date>.1` directories.
- Misplaced-database guard: `vmbackup` refuses to create the SQLite catalogue inside `.archives/` or under a period directory. This closes a class of bugs where a misconfigured backup path could spawn a second catalogue that silently diverged from the canonical one.
- `vmbackup --cleanup-stale-manifests`: a one-shot subcommand that removes leftover per-VM `chain-manifest.json` files from the backup tree. It is invoked automatically by `debian/postinst` on package upgrade and is safe to re-run manually.
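The `--restore-path` overlap rule can be sketched as a containment test on canonicalised paths. This is a minimal illustration that assumes GNU `realpath -m`. `path_within` and `check_restore_path` are hypothetical names, not the shipped code, which also walks every config instance to collect the `BACKUP_PATH` values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when $1 is the same path as $2 or lies inside it.
# GNU realpath -m canonicalises ".." and symlinks even if the target is missing.
path_within() {
    local candidate base
    candidate=$(realpath -m -- "$1") || return 2
    base=$(realpath -m -- "$2") || return 2
    [[ $base == / ]] && return 0              # everything is inside "/"
    # The "/" after $base stops /srv/backups-old from matching /srv/backups.
    [[ $candidate == "$base" || $candidate == "$base"/* ]]
}

# Hypothetical guard: refuse a restore path that overlaps any BACKUP_PATH given.
check_restore_path() {
    local restore_path=$1 backup_path
    shift
    for backup_path in "$@"; do
        if path_within "$restore_path" "$backup_path"; then
            echo "refusing --restore-path $restore_path: inside BACKUP_PATH $backup_path" >&2
            return 1
        fi
    done
    return 0
}
```

Canonicalising first matters. A plain string-prefix test would accept `/srv/restore/../backups/x` even though it resolves into the backup tree.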
Changed
- `chain_health.archive_size_bytes` populated at archive transition: the retention-path archive caller now writes the archive size immediately, matching the active-path caller. Previously the column stayed at 0 until manual reconciliation.
- TPM-restore reporting is now truthful: when disks restore successfully but TPM/BitLocker unlock fails, `vmrestore` no longer reports overall success. The summary line carries a `TPM ✓` / `TPM ✗ (manual unlock required)` token (omitted on VMs without TPM).
- SharePoint replication verify logs actionable diagnostics on mismatch: when the post-upload `rclone check` reports a difference, the cloud transport now logs the `rclone check` exit code, the elapsed time, and the specific differing or missing files. This replaces the previous opaque `Found differences` message, so transient SharePoint verify warnings can now be diagnosed.
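The reporting rule can be sketched as a small function. The token is omitted for VMs without TPM, and the overall status is success only when both the disks and the TPM step succeed. The function, its arguments, and the `restore <vm>: disks=<result>` line format are hypothetical; only the two TPM tokens come from the notes above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: compose a restore summary and overall exit status.
# $1 = VM name, $2 = disk result (ok|fail), $3 = TPM result (ok|fail|none).
# The line format is illustrative; only the TPM tokens come from the notes.
restore_summary() {
    local vm=$1 disks=$2 tpm=$3 status=0
    local line="restore $vm: disks=$disks"
    [[ $disks == ok ]] || status=1
    case $tpm in
        ok)   line+=", TPM ✓" ;;
        fail) line+=", TPM ✗ (manual unlock required)"; status=1 ;;
        none) ;;                       # no TPM on this VM: omit the token
        *)    echo "bad TPM state: $tpm" >&2; return 2 ;;
    esac
    echo "$line"
    return "$status"                  # non-zero unless disks AND TPM succeeded
}
```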
Fixed
- False-positive backup failures from substring `ERROR` matches in the `virtnbdbackup` log: the post-run guard used a case-insensitive substring match for `ERROR`. This mis-flagged successful runs whenever the log mentioned `internal error` or `ERROR — trim not supported`, or carried ANSI colour codes. False positives recorded the chain as failed and promoted the next monthly backup from incremental to full, inflating destination write volume. The match is now anchored to `virtnbdbackup`'s own log-line format and runs on ANSI-stripped text. Originally reported and proposed by @hostarts with co-author @houssamchergui.
- Email notifier "intentionally skipped" return value logged as a delivery failure: all four `send_backup_report` call sites collapsed the notifier's three return values (delivered / transport failure / intentionally skipped) into pass-or-fail. As a result, operators who set `EMAIL_ON_SUCCESS=no` saw a misleading "Failed to send email report" WARN on every successful run. A new `_handle_notifier_rc` dispatcher distinguishes the three cases and is wired into all four call sites. Originally proposed by @hostarts (email-only scope adopted).
- `get_last_backup_timestamp()` blind to archived chains: after the chain-archive layout change, the probe's `find -maxdepth` was too shallow to see archived data. Offline-unchanged VMs were therefore treated as having no prior backup and re-ran a full backup nightly. The probe depth is corrected, and the offline-skip path now fires as intended.
- False "incomplete backup" WARN on clean shutdown: `cleanup_on_exit` emitted a misleading WARN on every clean exit. Its duplicate-call gate was an in-memory flag that the success path could not clear before the trap fired. The gate now uses the `sqlite_session_end()` return code itself. Independently reported by @hostarts in PR #4.
- TPM artefact validation accepted empty bundles: `validate_tpm_backup()` was `-s`-testing the `tpm2/` directory instead of its files, so an empty TPM bundle passed validation. It is replaced with a per-file size check and a minimum-size floor on `tpm2-00.permall`.
- `xmllint` listed as required but never invoked: the phantom dependency is removed.
- Dead `restore_vm_tpm()` body removed: it had different semantics from `vmrestore`'s `restore_tpm()` and would have corrupted a recovery if ever called. It was already marked `# DEAD CODE` and is now physically gone.
- Undefined VM-name sanitisation helper in prune paths: `vmbackup`'s prune code paths called a helper that had never been defined, so the call was a silent no-op. It is replaced by the canonical helper in `lib/vm_name_utils.sh`.
- NVRAM/disk coherency on restore (`BdsDxe: No mapping` boot failure): `vmrestore` paired restored disks with the live host NVRAM instead of the NVRAM captured at the backed-up checkpoint. Restoring an older period over a VM that had since rebooted therefore left UEFI variables (SecureBoot keys, `BootOrder`, MOK) out of step with the disks, and the guest failed to boot. `vmrestore` now pairs each restore with the matching checkpoint's NVRAM, for clones and in-place restores alike. It first backs up the live NVRAM to `<path>.before-restore.<timestamp>`.
- `chain_check_complete` false positive on chains containing CD-ROM devices: the completeness check treated every `<disk>` in the libvirt checkpoint XML as a data disk. That XML carries no `device=` attribute, so CD-ROMs (which `virtnbdbackup` correctly skips) were indistinguishable from genuinely missing disks. Healthy chains were therefore flagged `⚠ INCOMPLETE` in `--list-restore-points` and refused without `--include-incomplete`. The check now consults the per-checkpoint domain XML snapshot, which preserves `device='cdrom'`, and skips those phantom targets. Chains without that snapshot keep the prior behaviour.
- Stale `chain_id` recorded on SIGTERM / SIGINT: interrupted backups wrote a `chain_health` row whose `chain_id` was derived from an in-memory index that had never been committed to disk. The interrupted-chain entry therefore could not be correlated with anything retention or restore could see. The id is now derived from the on-disk chain layout, so the row matches the chain that actually exists.
- `vmrestore` skipped valid restore points on large backup trees (SIGPIPE under `pipefail`): the chain-presence probe used `find … | grep -q .`. With `set -o pipefail` now globally enabled, `grep -q` closed the pipe on the first match, the resulting SIGPIPE made `find` exit non-zero, and `has_backup_data()` wrongly returned false. The probe is rewritten to `find … -print -quit`. The same pipefail-vulnerable idiom was audited and fixed everywhere it occurred (across `vmbackup.sh`, `lib/chain_validation.sh`, and an integration test).
- `_state/logs/` rotation never ran, so central logs grew unbounded: the rotation routine was gated behind a directory that no code path ever created, so it had been dead since the early-2026 modular refactor. Per-VM, replication, and email logs accumulated indefinitely, and `vmbackup.log`/`vmprune.log` grew append-only forever. Rotation now runs at most once per calendar day from the pre-backup hook. The central logs are also size-capped by a new `LOG_MAX_BYTES` knob (default 50 MiB): an oversized file is rolled to `<name>.<epoch>` and aged out under the existing `LOG_KEEP_DAYS` rule. Deployed installs inherit the default with no config change; the first post-upgrade session clears...
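The SIGPIPE failure and its fix can be reproduced in a few lines. The `*.data` pattern and the single-directory argument are assumptions; the shipped `has_backup_data()` may probe a different layout. Whether the fragile form fails depends on how much output `find` still has to write when `grep` exits, which is why the bug showed up on large trees.

```shell
#!/usr/bin/env bash
set -o pipefail

# Fragile under pipefail: grep -q exits at the first match, so a find that is
# still writing can be killed by SIGPIPE (status 141) and the pipeline fails.
has_backup_data_fragile() {
    find "$1" -type f -name '*.data' | grep -q .
}

# Robust: find stops by itself after the first match, so no pipe can break.
# find exits 0 whether or not it matched, hence the test for non-empty output.
has_backup_data() {
    [[ -n $(find "$1" -type f -name '*.data' -print -quit 2>/dev/null) ]]
}
```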
v0.5.6
Changed
- Structured exit codes: categorised non-zero exits (config / lock / storage / VM / tool / CLI / dependency) let monitoring distinguish why a run failed without parsing logs. Symmetric with `vmrestore`.
Fixed
- Retention not enforced for skipped or excluded VMs: retention was wired only to the post-backup success path. Any VM that was skipped (`SKIP_OFFLINE_UNCHANGED_BACKUPS=true`) or excluded (`policy=never`) therefore accumulated period directories indefinitely with no rotation. The same code path also created the period directory via `mkdir -p` before deciding whether the backup would run. This left an empty stub on disk every time a VM was skipped, excluded, or failed before its first write. In production, the combined effect was VMs at `RETENTION_WEEKS=4` carrying 8+ weekly directories, including pure stubs that no later session would ever clean up. Retention is now invoked from the skip and exclude paths via the new `run_retention_for_unbacked_vm` wrapper. The wrapper runs stub cleanup before retention, so the period count is correct before the limit check runs. Excluded VMs (`policy=never`) have their stubs removed, but their populated periods are preserved by the policy short-circuit. Failed-path retention remains intentionally suppressed; failed-path stubs are reaped on the next non-failed session.
Added
- Stub-aware retention pipeline for unbacked VMs: a new `run_retention_for_unbacked_vm` wrapper in `modules/retention_module.sh` runs stub cleanup → retention → orphan retention, in that order, whenever a VM is skipped or excluded. This ensures the on-disk period count is correct before the limit check fires. Stub cleanup is performed by the new `_remove_empty_period_dirs` helper. It removes pure stub directories (zero `*.data` files, no `.archives/`) and is anchored to `BACKUP_PATH` with a path-shape regex guard. The helper deliberately bypasses `_remove_period`'s keep-last, replication, and protected-period guards, all of which are inappropriate for empty directories. Stub deletions in SQLite go through a new UPDATE-only library function, `sqlite_mark_chain_deleted_if_exists` (in `lib/sqlite_module.sh`). This avoids injecting phantom `active`-then-`deleted` `chain_health` rows when a pure stub never had a row to begin with.
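A stub check along these lines can be sketched as follows. A period directory counts as a pure stub only if it holds no `*.data` file and no `.archives/` entry. Deletion is refused unless the path has the expected shape under `BACKUP_PATH`. The `<BACKUP_PATH>/<vm>/<period>` layout, the shape pattern, and the function names are assumptions for illustration, not the shipped `_remove_empty_period_dirs`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of stub detection and guarded removal.
# Assumed layout: <BACKUP_PATH>/<vm>/<period>/ holding <disk>*.data files.

# A pure stub has no .archives/ entry and no *.data file anywhere below it.
is_pure_stub() {
    local dir=$1
    [[ -d $dir && ! -e $dir/.archives ]] || return 1
    [[ -z $(find "$dir" -type f -name '*.data' -print -quit) ]]
}

# Returns 0 = removed, 1 = not a stub (kept), 2 = path failed the shape guard.
remove_empty_period_dir() {
    local backup_path=${1%/} dir=${2%/}
    local rel=${dir#"$backup_path"/}
    # Shape guard: exactly two components below BACKUP_PATH, neither starting
    # with "." (rules out ".", "..", and .archives).
    local shape='^[^/.][^/]*/[^/.][^/]*$'
    if [[ $rel == "$dir" || ! $rel =~ $shape ]]; then
        echo "refusing to touch $dir" >&2
        return 2
    fi
    is_pure_stub "$dir" || return 1
    rm -rf -- "$dir"
}
```

Keeping the shape guard separate from the stub test means a bad path is rejected before anything on disk is inspected.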
Changed
- `retention_events` audit attribution: field 12 (`triggered_by`) no longer carries hardcoded function-name literals. It now records the high-level event that drove the prune, with new enum values `skipped`, `excluded`, and `orphan_dir` joining the existing `post_backup`, `prune`, and `orphan_retention`. The `action` column also gains `remove_stub` for the new stub-cleanup path. Internally, `_remove_period`, `_remove_orphan_period`, `_remove_archive_chain`, `_remove_archives_in_period`, `_remove_vm_all`, `run_retention_for_vm`, and `run_orphan_retention_for_vm` all gain a new `trigger` parameter, so the originating event propagates through the call chain into the audit row. For the first time, retention activity can be attributed to skipped-VM and excluded-VM sessions.
v0.5.5
Added
- Configurable backup-destination space thresholds: four new optional `vmbackup.conf` settings (`DISK_ABORT_PCT`, `DISK_WARN_PCT`, `DISK_ABORT_GB`, `DISK_WARN_GB`) let `check_disk_space()` be tuned per instance. Percent and absolute thresholds are evaluated together, so either can fire independently; setting any threshold to `0` disables it. The defaults (20%/30% and 10 GB/50 GB) preserve previous behaviour.
- Disk-space snapshot per session (schema v2.1): the `sessions` table gains `disk_free_bytes` and `disk_total_bytes` columns, populated by `sqlite_session_end()` from a `df` capture against `BACKUP_PATH`. Migration from v2.0 is automatic, idempotent, and additive.
- `--status` reporting command: seven report modes (sessions, VM history, failures, replication, chains, storage, policies). Output is terminal tables by default, with `--csv` for export, `--days N` for the time window, and `--all-instances` to span every config instance. Sessions output is job-type-aware (backup / prune / replicate-only / mixed) and scoped to the active `CONFIG_INSTANCE` by default. The storage report includes per-VM size trends and a destination-growth projection that names the configured `DISK_ABORT_PCT` threshold.
- Post-upgrade config advisory in `postinst`: on dpkg upgrade, lists `.dpkg-dist` files awaiting merge with per-file `diff -u` commands, and points custom config instances at `config/template/vmbackup.conf` for new variables. Visible only on upgrade.
- `vmbackup --config-prune-removed`: a cleanup helper that comments out configuration variables removed in the running release. It is idempotent and supports `--dry-run`. It operates on `default/` and all custom instances and skips `template/`. A per-name allowlist, keyed to release version, is designed to be extended by future config-pruning ENHs.
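The combined threshold rule can be sketched as a pure function over free and total byte counts, which keeps it testable without a real filesystem. The sketch makes several assumptions: the thresholds are minimum free space (20%/10 GB abort, 30%/50 GB warn), `*_GB` means GiB, and the return convention is 0 = ok, 1 = warn, 2 = abort. `classify_disk_space` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: classify free space on the backup destination.
# Assumptions: thresholds are *minimum free* space, *_GB means GiB, and the
# defaults below mirror the release notes. Returns 0 ok, 1 warn, 2 abort.
: "${DISK_ABORT_PCT:=20}" "${DISK_WARN_PCT:=30}"
: "${DISK_ABORT_GB:=10}" "${DISK_WARN_GB:=50}"

# True when a threshold is enabled (> 0) and the value has fallen below it.
_below() { (( $2 > 0 && $1 < $2 )); }

classify_disk_space() {
    local free=$1 total=$2 gib=$((1024 ** 3))
    (( total > 0 )) || return 2
    local free_pct=$(( free * 100 / total ))     # integer percent, rounded down
    # Percent and absolute checks are independent: either one can fire.
    if _below "$free_pct" "$DISK_ABORT_PCT" || _below "$free" $(( DISK_ABORT_GB * gib )); then
        return 2
    fi
    if _below "$free_pct" "$DISK_WARN_PCT" || _below "$free" $(( DISK_WARN_GB * gib )); then
        return 1
    fi
    return 0
}
```

On a real host, the two inputs could come from GNU `df -B1 --output=avail,size "$BACKUP_PATH"`.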
Fixed
- Pre-flight aborts failed silently with no email: `check_backup_destination()`, `check_scratch_path()` and `check_disk_space()` exit before `main()` reaches its normal email send. Destination, scratch, and space failures therefore left no notification (only a journal entry). `cleanup_on_exit()` now sends a failure report on any non-zero exit once a SQLite session has been registered. It is gated by `_EMAIL_SENT`, so the existing success/failure path remains the single source of truth on normal runs.
- `SKIP_OFFLINE_UNCHANGED_BACKUPS` is now honoured: the variable was defined and validated but never read, so offline-unchanged VMs were always skipped regardless of the setting. The change-detection call in `backup_vm()` is now gated by this flag.
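With the flag honoured, the skip decision reduces to a flag check plus the strict `mtime > last_backup` comparison. This sketch assumes the last-backup time arrives as epoch seconds and that disk images are passed as paths. `should_skip_offline_vm` is a hypothetical name, and `stat -c %Y` is GNU coreutils syntax.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: decide whether an offline VM can be skipped.
# $1 = last backup time (epoch seconds); remaining args = disk image paths.
# Returns 0 = skip, 1 = back it up.
should_skip_offline_vm() {
    local last_backup=$1; shift
    # Default true, matching the v0.5.3 default change.
    [[ ${SKIP_OFFLINE_UNCHANGED_BACKUPS:-true} == true ]] || return 1
    local disk mtime
    for disk in "$@"; do
        mtime=$(stat -c %Y -- "$disk") || return 1   # unreadable: do not skip
        # Strict comparison: a disk counts as changed only if mtime > last_backup.
        (( mtime > last_backup )) && return 1
    done
    return 0
}
```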
Removed
- `OFFLINE_CHANGE_DETECTION_THRESHOLD`: never read by the change-detection code, which uses a strict `mtime > last_backup` comparison. The variable inverted the safe default and would have introduced false negatives if implemented. Existing values in operator configs are inert; run `vmbackup --config-prune-removed` to clean them up.
- `EMAIL_INCLUDE_REPLICATION`: never read. Hiding replication results from the email is operator-hostile (silent on success, dangerous on failure). The empty-section logic already handles the no-replication case.
- `EMAIL_INCLUDE_DISK_SPACE`: never read; it gated a section that was never built. A real disk-usage email section is tracked as ENH-16.
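The idempotent comment-out step can be sketched with GNU `sed -i -E`. Only uncommented assignments of the named variables are touched, so a second run changes nothing. The `# removed in this release:` marker and `prune_removed_vars` are hypothetical. The real helper also handles the per-release allowlist, `--dry-run`, and the walk over config instances.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: comment out assignments of removed config variables.
# $1 = config file; remaining args = variable names (plain identifiers).
prune_removed_vars() {
    local conf=$1; shift
    local name
    for name in "$@"; do
        # Match "NAME=" at line start (optionally indented or "export"-prefixed).
        # Lines that already start with "#" never match, which keeps this idempotent.
        sed -i -E "s/^([[:space:]]*(export[[:space:]]+)?${name}=)/# removed in this release: \1/" "$conf"
    done
}
```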
v0.5.4
Fixed
- SQLite session not finalised on normal exit: sessions could be left permanently "in progress" in the database. They are now finalised unconditionally in `cleanup_on_exit()`, with an idempotency guard.
- Silent permission failures on backup path: `chown`/`chmod` failures now log warnings instead of being silently suppressed with `|| true`.
- Stale lock cleanup could delete active locks: it now validates PID liveness before deletion and uses the correct `vmbackup-*.lock` glob.
- Session PID lock race condition: the non-atomic check-then-write is replaced with a `noclobber` pattern.
- Double email on SIGTERM: added an `_EMAIL_SENT` guard flag.
- `virtnbdbackup` not confirmed dead before retry: added `pgrep`/`pkill` cleanup and `virsh domjobabort` before retry.
- Config-instance validation moved before the session lock: `--config-instance nonexistent` now fails immediately at startup.
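The `noclobber` lock and the PID-liveness check from the items above can be sketched together. `set -C` makes `>` fail if the file already exists, so "check" and "write" become one step. A lock is treated as stale only when `kill -0` reports that its PID is gone. `acquire_lock` and the lock-file name in the test are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: atomic PID lock plus a liveness-checked stale-lock path.
acquire_lock() {
    local lock=$1 pid
    # With noclobber (set -C) the ">" redirect fails if the file exists, so
    # "is it free?" and "take it" are one atomic step. The subshell keeps
    # noclobber from leaking into the caller; $$ is still the caller's PID.
    if ( set -C; echo "$$" > "$lock" ) 2>/dev/null; then
        return 0
    fi
    pid=$(cat -- "$lock" 2>/dev/null) || return 1
    # kill -0 sends no signal; it only reports whether the PID exists
    # (it also fails on EPERM, which a root-run tool does not normally hit).
    if [[ $pid =~ ^[0-9]+$ ]] && kill -0 "$pid" 2>/dev/null; then
        return 1                        # live holder: never delete its lock
    fi
    # Stale lock. Two processes recovering the same stale lock at the same
    # moment can still race here; flock(1) on a held descriptor avoids that.
    rm -f -- "$lock"
    ( set -C; echo "$$" > "$lock" ) 2>/dev/null
}
```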
Install
wget https://github.com/doutsis/vmbackup/releases/download/v0.5.4/vmbackup_0.5.4_all.deb
sudo dpkg -i vmbackup_0.5.4_all.deb
Full changelog: CHANGELOG.md
v0.5.3
Added
- `--run` flag required to start backups: explicit mode for all operations
- `--vm` targeted backup mode: back up specific VMs on demand
- Unknown flag detection and `--cancel-replication` conflict guards
- Root privilege check with clear error message
- Global session lock to prevent concurrent invocations
- `session_type` column in SQLite sessions table (schema v2.0)
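The argument rules can be sketched as follows: nothing runs without an explicit mode, and unrecognised options are rejected rather than ignored. Only `--run`, `--vm`, and `--help` come from these notes. The parsing style, the global variable names, exit code 2, and the assumption that `--vm` narrows a `--run` to named VMs are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: nothing runs without an explicit mode, and unknown
# options are rejected instead of silently ignored. Exit code 2 is assumed.
parse_args() {
    MODE=""
    VM_TARGETS=()
    while (( $# > 0 )); do
        case $1 in
            --run)  MODE=run ;;
            --help) MODE=help ;;
            --vm)   # assumed here to narrow a --run to named VMs
                    if [[ -z ${2:-} ]]; then
                        echo "--vm requires a VM name" >&2; return 2
                    fi
                    VM_TARGETS+=("$2"); shift ;;
            *)      echo "unknown option: $1" >&2; return 2 ;;
        esac
        shift
    done
    if [[ -z $MODE ]]; then
        echo "no mode given: pass --run to start backups" >&2
        return 2
    fi
}
```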
Changed
- `SKIP_OFFLINE_UNCHANGED_BACKUPS` default changed to `true`
- `--help` output restructured
- Systemd service updated with `--run` flag
- Documentation rewritten: condensed from ~4,500 to ~2,900 lines
Removed
- Host Configuration Backup feature
Fixed
- `RETENTION_ORPHAN_DRY_RUN` config setting was being ignored
See CHANGELOG.md for full details.
v0.5.2
See CHANGELOG.md for details.
v0.5.1
Fixed
- `chain_health` off-by-one: `total_checkpoints` and `restorable_count` were 0 after the first backup instead of 1
- `restore_points` counted per-disk instead of per-backup: multi-disk VMs reported 3× the correct count
- `csv_` variable name remnants: 25 stale variable names and dead CSV cleanup code removed
- Archived chains missing vmconfig XML and TPM marker: chain archives were incomplete; they are now self-contained
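Counting per backup means collapsing the per-disk files of one checkpoint into a single restore point. This sketch assumes `virtnbdbackup`-style names such as `vda.full.data` and `vda.inc.virtnbdbackup.1.data`, with the disk name before the first dot; check the naming against your own backup tree. The function name is hypothetical, and `-printf` is GNU `find`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: count restore points per backup, not per disk.
# Assumes data files named <disk>.full.data or <disk>.inc.<checkpoint>.data.
count_restore_points() {
    local dir=$1
    find "$dir" -maxdepth 1 -type f -name '*.data' -printf '%f\n' |
        sed -E 's/^[^.]+\.//' |   # drop the disk name: vda.full.data -> full.data
        sort -u |                 # one line per checkpoint, however many disks
        wc -l
}
```

Take a VM with three disks and three checkpoints. It holds nine `.data` files, so a per-disk count reports 9 while a per-backup count reports 3, which is the 3× discrepancy described above.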
See CHANGELOG.md for details.