Can TrueNAS back up a Proxmox host using ZFS replication?
As part of my series exploring backup options, I’m looking at how to pull a backup of a Proxmox Virtual Environment (PVE) host to a TrueNAS SCALE server. In this case, the PVE host has local ZFS storage, and the TrueNAS system is acting as the backup server. Ideally, PVE would snapshot in ZFS and we could sync those snapshots with a TrueNAS Data Replication task, but PVE doesn’t use the ZFS snapshot features by default. This complicates our setup somewhat. This test is part of the Backup Software Project; click there for other related projects.
There’s a video for this article! Click the thumbnail below to watch it.
Proxmox Internal Backup Scheduler
Proxmox’s internal backup system relies on a qemu feature to export the VM disk coherently. When using the snapshot mode in the backup command, it sends a command to the qemu guest agent to tell the guest to synchronize disk activity (so ideally there aren’t in-progress writes on the VM disk), snapshots the ZFS zvol, dumps the contents of the snapshot to a compressed img file, and then deletes the snapshot. This means that, even though the image is stored on a zvol in ZFS, we end up with a compressed img file in a dataset in ZFS instead of a snapshot of the zvol in ZFS. Not ideal. Proxmox DOES create ZFS snapshots for the ‘snapshot’ feature (not the ‘backup’ method ‘snapshot’), but there doesn’t seem to be a way to schedule snapshots like you can schedule backups. I’d still like ZFS snapshots to be created so TrueNAS can pull from them with zfs send/recv.
Using ZFS Snapshots on Proxmox
While the GUI won’t help us, there’s no reason we can’t execute the same commands to sync the VM and then take a ZFS snapshot. In fact, someone has already written a Python script to do this, so I’m going to use it. I first tested the script standalone and confirmed that it does in fact snapshot the zvol, then added the hourly/daily/monthly cronjobs and ensured those worked as well. Thank you to apprell for this.
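To make the "sync the VM, then snapshot" idea concrete, here is a minimal sketch of the command sequence. It is not the script referenced above; the function names, the snapshot naming scheme, and the VM ID and zvol in the demo are all hypothetical, and it assumes the qemu guest agent is installed in the guest so that `qm guest cmd` can freeze and thaw its filesystems.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_plan(vmid: int, zvol: str, label: str, now: datetime) -> list[list[str]]:
    """Return the commands for one guest-consistent snapshot, in order."""
    stamp = now.strftime("%Y%m%d-%H%M")  # hypothetical naming scheme
    return [
        # Ask the guest agent to flush and freeze the guest filesystems.
        ["qm", "guest", "cmd", str(vmid), "fsfreeze-freeze"],
        # Snapshot the zvol while the guest is quiet.
        ["zfs", "snapshot", f"{zvol}@{label}-{stamp}"],
        # Let the guest resume writing.
        ["qm", "guest", "cmd", str(vmid), "fsfreeze-thaw"],
    ]


def run_plan(plan: list[list[str]]) -> None:
    """Run freeze and snapshot, and always attempt the thaw afterwards."""
    freeze, snap, thaw = plan
    subprocess.run(freeze, check=True)
    try:
        subprocess.run(snap, check=True)
    finally:
        subprocess.run(thaw, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    plan = snapshot_plan(100, "rpool/data/vm-100-disk-0", "hourly",
                         datetime.now(timezone.utc))
    for cmd in plan:
        print(" ".join(cmd))
```

The thaw sits in a `finally` block so that a failed snapshot doesn’t leave the guest frozen. Calling `run_plan` from cron with different labels would give the hourly/daily/monthly pattern described above.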
Pulling the snapshots from TrueNAS
Now that we have PVE creating ZFS snapshots instead of files in the backup storage location, we can tell TrueNAS to pull them over SSH (using zfs send/recv). TrueNAS automates this almost entirely, which is super handy.
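For intuition, a pull over SSH boils down to running zfs send on the remote side and piping it into zfs receive locally. The sketch below only builds that pipeline as a string; it is an illustration of the idea, not what TrueNAS actually executes internally, and the host, dataset, and snapshot names are made up.

```python
import shlex
from typing import Optional


def pull_command(host: str, source: str, dest: str, newest: str,
                 last_common: Optional[str] = None) -> str:
    """Build a shell pipeline that pulls `source@newest` from `host` into `dest`.

    If `last_common` names a snapshot both sides already have, send an
    incremental stream (-I includes every intermediate snapshot);
    otherwise send a full stream.
    """
    send = ["zfs", "send"]
    if last_common is not None:
        send += ["-I", f"{source}@{last_common}"]
    send.append(f"{source}@{newest}")
    remote = " ".join(shlex.quote(part) for part in send)
    return (f"ssh {shlex.quote(host)} {shlex.quote(remote)}"
            f" | zfs receive {shlex.quote(dest)}")


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(pull_command("root@pve.example", "rpool/data/vm-100-disk-0",
                       "tank/pve/vm-100-disk-0", "hourly-2", "hourly-1"))
```

After the first full copy, only the incremental form is needed, which is why frequent replication runs stay cheap.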
The first step is to create an SSH key pair which can be used to log in to the PVE system from the TrueNAS system. To do this, on TrueNAS, go to Credentials -> Backup Credentials -> SSH Connections and click Add. Set the connection to Manual since this isn’t a remote TrueNAS system, add the host, port, and username, and generate a new private key. Then, you can discover the remote host key. Save. Now, go to SSH Keypairs on the same screen, open the new keypair you generated, and copy the public key. I swapped the name (truenas.local) for the IP address of the system, then copied the key.
We need to enter this key in PVE. Open the shell in the PVE web GUI and run nano /etc/pve/priv/authorized_keys. Add the new key on a new line in this file and save it. It should look like the lines before it.
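If you would rather script this step than edit the file by hand, the idea is simply "append the key on its own line unless it is already there". Here is a small sketch of that; the function name and the example keys are hypothetical, and the demo writes to a temporary file rather than the real /etc/pve/priv/authorized_keys.

```python
import tempfile
from pathlib import Path


def add_public_key(authorized_keys: Path, public_key: str) -> bool:
    """Append `public_key` on its own line unless it is already present.

    Returns True if the key was added, False if it was already there.
    """
    key = public_key.strip()
    text = authorized_keys.read_text() if authorized_keys.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False
    with authorized_keys.open("a") as handle:
        # Make sure the new key starts on a fresh line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    return True


if __name__ == "__main__":
    # Demo against a throwaway file with placeholder keys.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "authorized_keys"
        path.write_text("ssh-ed25519 AAAAexisting root@pve\n")
        print(add_public_key(path, "ssh-rsa AAAAnew root@192.0.2.10"))
        print(path.read_text())
```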
Now, we can add the replication task in TrueNAS to pull the snapshots which are automatically being taken on PVE and copy them to the ZFS pool on TrueNAS. To create this, go to Data Protection -> Replication Tasks -> Add in TrueNAS, and follow the wizard. Here are the important fields:
- Load Previous Replication Task = none
- Source Location = On A Different System
- SSH Connection = select the one you created earlier
- Source = rpool/data (the zfs parent where PVE creates all of the VMs / CTs)
- Check ‘Recursive’ under Source to copy all of the VMs / CTs
- Destination = a new dataset under an existing dataset on your existing pool - type the name in, it will create a new dataset with the correct settings for you
- Naming Convention = Snapshot Name Regular Expression
- Regular Expression = .* (meaning all)
- Task Name = something to help you remember
- Set up the schedule of your choosing (at least as often as snapshots are being taken; if no new snapshots exist, it will copy nothing)
- Deletion = Same as Source (the destination will delete a snapshot that no longer exists on the source)
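Two of these fields are easier to reason about with a toy model: the regular expression decides which snapshot names are eligible, and "Same as Source" means anything on the destination that has vanished from the source gets deleted. The sketch below assumes the expression is matched against the whole snapshot name (the part after the @); that is my assumption, not something confirmed against TrueNAS, and the snapshot names are invented.

```python
import re


def select_snapshots(names: list[str], pattern: str) -> list[str]:
    """Keep the snapshot names that the pattern matches in full."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


def stale_on_destination(source: list[str], destination: list[str]) -> list[str]:
    """'Same as Source' retention: destination snapshots missing on the source."""
    present = set(source)
    return [name for name in destination if name not in present]


if __name__ == "__main__":
    names = ["hourly-01", "daily-01", "truenas"]  # hypothetical names
    print(select_snapshots(names, ".*"))          # everything is eligible
    print(select_snapshots(names, "hourly-.*"))   # a narrower alternative
    print(stale_on_destination(["daily-01"], ["hourly-01", "daily-01"]))
```

With .* every snapshot qualifies, so pruning on the PVE side is what ultimately controls how much history TrueNAS keeps.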
One quirk I found: I tried to replicate the rpool/data dataset in PVE (the parent dataset) after I had created snapshots of the VMs using the autosnap script above. However, TrueNAS refused to replicate recursively from rpool/data since it found no snapshots of rpool/data itself. To fix this, I created a single snapshot of rpool/data (zfs snapshot rpool/data@truenas) and found that TrueNAS was able to successfully copy the snapshots for the child datasets/zvols, even when they were newer than my manually created snapshot. Since a non-recursive ZFS snapshot only affects the snapshotted item (not its children), this snapshot of an empty dataset shouldn’t contain any actual data, and thus I can leave it around forever just to make TrueNAS happy.
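A quick way to spot this situation ahead of time is to list the snapshots under the parent and check whether the parent itself has any. The sketch below parses the output of `zfs list -H -t snapshot -o name -r rpool/data` (one snapshot name per line); the function names and the sample output are hypothetical.

```python
def datasets_with_snapshots(zfs_list_output: str) -> set[str]:
    """Return the datasets/zvols that have at least one snapshot."""
    names = set()
    for line in zfs_list_output.splitlines():
        line = line.strip()
        if "@" in line:
            # Snapshot names look like dataset@snapshot.
            names.add(line.split("@", 1)[0])
    return names


def parent_needs_snapshot(parent: str, zfs_list_output: str) -> bool:
    """True when children have snapshots but the parent itself has none."""
    return parent not in datasets_with_snapshots(zfs_list_output)


if __name__ == "__main__":
    sample = (
        "rpool/data/vm-100-disk-0@hourly-01\n"
        "rpool/data/subvol-101-disk-0@hourly-01\n"
    )
    if parent_needs_snapshot("rpool/data", sample):
        print("zfs snapshot rpool/data@truenas")
```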
This backup method is functional, but incomplete. It only backs up the VM / CT disks, not the PVE configuration and qemu config file. So, the minimum process to restore a VM to a brand new Proxmox system is:
- Create a new VM and configure it, then delete the hard drive. The configuration is lost with the ZFS backup, so you’ll need to recreate the entire VM configuration. Alternatively, you could restore an older Proxmox vzdump backup and then replace the hard drive with a newer copy from TrueNAS’s zfs snapshots, which gets you the VM configuration at the older point when the backup was taken and the VM disk at the newer point when the snapshot was taken.
- Create a new replication task to push from TrueNAS to the new Proxmox system, including only the VM disk you want to restore, with the destination of rpool/data/vm-xxx-disk-y, where xxx is the VM ID from Proxmox that you are restoring to (which can be different from the existing name) and where y is the disk number within Proxmox (usually 0).
- Force Proxmox to rescan the storage using
- An ‘unused disk’ will appear in the VM, matching the zvol you named in step 2; you can then add it as a hard disk and configure it.
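The naming rule in the restore steps is mechanical, so here is a tiny sketch of it. The helper names are hypothetical. The rescan command is my assumption: the text above doesn’t say which command was used, and `qm rescan --vmid <id>` is simply one Proxmox command that rescans storage for a VM’s unreferenced volumes.

```python
def restore_target(vmid: int, disk: int = 0, parent: str = "rpool/data") -> str:
    """Destination zvol name for a restored disk: <parent>/vm-<vmid>-disk-<n>."""
    return f"{parent}/vm-{vmid}-disk-{disk}"


def rescan_command(vmid: int) -> list[str]:
    """Assumed rescan command; not taken from the original write-up."""
    return ["qm", "rescan", "--vmid", str(vmid)]


if __name__ == "__main__":
    # Restoring into a new VM with ID 205 (hypothetical), first disk.
    print(restore_target(205))
    print(" ".join(rescan_command(205)))
```

The VM ID in the destination name is the ID of the new VM you are restoring into, which is why it can differ from the name the zvol had on the original host.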
This method is functional, but incomplete.
It backs up the VM disks and CT datasets as snapshots in ZFS, and replicates the ZFS snapshots to the TrueNAS host. However, we don’t back up the Proxmox configuration metadata (including the qemu configuration file), so we would need to recreate the VM manually and replace the virtual drive with the archived zvol to restore it. You could mix this feature with Proxmox-side backups, pushing compressed image files to TrueNAS over NFS/SMB, so you have periodic backups of the configuration in addition to more frequent snapshots of the disk images, but then you have two different types of backups managed in two places (pushed from Proxmox and pulled from TrueNAS) to deal with.
Like with the TrueNAS -> PBS setup, we had to do some minor console work to get this set up, as the feature isn’t accessible from the GUI. In theory it should work with TrueNAS CORE or SCALE (although I only tested this on SCALE).
How does it handle HA / clustered setups? You need to create a separate SSH connection for each node in the cluster, and a separate replication task for each node as well. Backups from each node will be in separate hierarchies on TrueNAS, and if VMs are moved between cluster nodes, the zvols from the old node will be orphaned in TrueNAS and need to be manually deleted. Likewise, if you outright delete a VM, its zvols will be orphaned in TrueNAS and also need to be manually deleted. So it could be worse, but it’s not a perfectly smooth experience.
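Finding those orphans is a set difference per node: whatever TrueNAS holds for a node that the node no longer has. A minimal sketch, assuming you have already collected both lists yourself (for example from zfs list on each side); the node and zvol names are hypothetical.

```python
def find_orphans(replicated: dict[str, set[str]],
                 live: dict[str, set[str]]) -> dict[str, list[str]]:
    """Per node, list replicated volumes that no longer exist on that node."""
    orphans = {}
    for node, datasets in replicated.items():
        missing = sorted(datasets - live.get(node, set()))
        if missing:
            orphans[node] = missing
    return orphans


if __name__ == "__main__":
    # vm-101 was migrated from pve1 to pve2, so its old copy is orphaned.
    replicated = {
        "pve1": {"vm-100-disk-0", "vm-101-disk-0"},
        "pve2": {"vm-101-disk-0"},
    }
    live = {
        "pve1": {"vm-100-disk-0"},
        "pve2": {"vm-101-disk-0"},
    }
    print(find_orphans(replicated, live))
```

This only reports candidates; deleting them is still a manual decision, since the orphaned copy may be the only backup of a deleted VM.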