Backup Implementation

It's been a few weeks since I put my thoughts to keyboard regarding various backup solutions, and since I've decided on one and it's been in use for a week or two now, it's time to write up what I went with.

The short answer is Restic, Arq, and Rclone for software, and Backblaze B2 for cloud storage.

Systems

I have multiple systems that need to be backed up:

  • My parents' laptop (at their house)
  • My parents' desktop (at their house)
  • My local fileserver
  • My laptop
  • My wife's Surface Pro
  • My two Linode servers

I also have a server that is used to store backups at my parents' house. It doesn't run any backups itself, but will be used for syncing.

The ultimate goal for all backups is to have three backup locations for critical data in addition to the original source:

  1. Fileserver at my house
  2. Fileserver at my parents' house
  3. Cloud storage

Arq

This is probably the simplest to describe as it's almost an all-in-one solution.
Configuration:

  • sftp destination to the local fileserver at each location, using a per-laptop user account and strong password.
  • Email notification on errors using a gmail account I have on my domain for such purposes.
  • For my parents' systems: every 4 hours
  • For my wife's: every hour
  • The hourly/daily/weekly/monthly prune option, to hopefully limit the amount of space empty backups use.
  • All systems back up C:\Users and C:\ProgramData. I spent a little bit of time excluding things that generate a bunch of changes even when the system isn't in use (e.g. caches, antivirus)

While using the local fileserver as a backup destination means the systems will not be backed up when away from the house, it is very rare that they are used outside of it.

Restic

I chose restic because, of the options I tried, it seemed to be the fastest solution that supported cloud destinations.

Originally I performed a full backup of my fileserver to Backblaze B2, but upon thinking about how I wanted to sync I decided to back up to a local path instead, so I spent a few hours downloading the archive using rclone. I then configured my laptop to back up to the same location over sftp. This means that deduplication is effective when I move data from my laptop to my fileserver for permanent storage.
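For reference, the commands involved look roughly like this; the repository path, hostname, and user are placeholders rather than my actual setup:

# One-time repository creation on the fileserver (path is hypothetical)
restic -r /srv/backups/restic init

# Back up the fileserver's data to the local repository
restic -r /srv/backups/restic backup /srv/data

# From the laptop, use the same repository over sftp (user/host are placeholders)
restic -r sftp:backup@fileserver:/srv/backups/restic backup /home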

I then configured the restic rest-server for my two Linode servers. I made a simple modification to the server to enable append-only mode, meaning that while the servers can create new backups they cannot remove or modify existing backups; this provides "hack" protection of sorts, as the credentials to access the backup are stored on the servers. To keep the implementation simple I added a backup sub-domain to one of my domains and configured one of the servers to reverse proxy the connection over a VPN to my local fileserver. This means that the backups are stored on my local fileserver, but the destination is accessible at a fixed hostname over SSL (thanks, Let's Encrypt).
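On the Linode side, pointing restic at that proxied endpoint looks something like the following; the sub-domain and repository path are placeholders:

# Back up to the rest-server through the reverse-proxied HTTPS endpoint
restic -r rest:https://backup.example.com/linode1 backup /etc /var/www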

After a few days I ran into an issue with my local fileserver and laptop backups due to Linux permissions. The fileserver was using the root user for backup (yah, yah.. I'm lazy) but the laptop used a non-root user for sftp. I eventually configured two different rest-servers: one that listens on localhost and another that listens on the local interface in append-only mode.
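Roughly, that means two rest-server processes along these lines (recent rest-server releases expose append-only as a flag, whereas I patched it in at the time; the paths and addresses here are just illustrative):

# Local-only instance used by the fileserver's own root backups
rest-server --listen 127.0.0.1:8000 --path /srv/backups/restic

# LAN-facing instance for the laptop and proxied traffic, append-only
rest-server --listen 192.168.1.10:8010 --path /srv/backups/restic --append-only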

Scheduling

As restic is a simple app that does not implement its own scheduling, I fell back on a simple cron script to do the backup. I first configured Postfix to use Mailgun following a tutorial on DigitalOcean, then installed cronic to curb the amount of email, as I only need a notification when something fails. For my fileserver I just added a new line in cron to run my backup script, but for my Linode servers and laptop I wanted to just drop a script into /etc/cron.daily. In order for the script to run itself through cronic I had to do a bit of magic at the start:

#!/bin/bash
# Re-exec the script through cronic (unless cronic already started it),
# so cron only emails output when the backup actually fails.
in_cronic=$1
if [[ "$in_cronic" != 'in_cronic' ]]; then
	exec cronic $0 in_cronic
fi

set -xeu
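On the fileserver, the cron entry itself is just a normal crontab line run through cronic; the time and script path here are only illustrative:

# m h dom mon dow  command
30 2 * * * cronic /usr/local/bin/restic-backup.sh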

rclone

The last feature I wanted was to sync the backups to the other locations. Using rclone I configured a Backblaze B2 endpoint, and then used the copy command to sync the backups. I might eventually switch to the sync command for the Arq destinations if I notice the space growing significantly.
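The copy itself is a single rclone invocation along these lines; the remote and bucket names are placeholders:

# Push the restic repository to Backblaze B2; unlike "sync", "copy" never
# deletes files that only exist on the destination
rclone copy /srv/backups/restic b2:my-backup-bucket/restic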

Another feature of rclone I took advantage of is the ability to sync Google Drive. I configured endpoints for both my wife's account and my own, and now sync them on the fileserver prior to performing the backup; never hurts to have extra copies of stuff.
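The Google Drive pulls look roughly like this, with hypothetical remote names set up via rclone config and local paths chosen to sit inside the area restic backs up:

# Mirror each Google Drive account to a local folder before the backup runs
rclone sync gdrive-mine: /srv/data/gdrive/mine
rclone sync gdrive-wife: /srv/data/gdrive/wife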

One thing I ran into while trying to push the backup to my parents' house was their slow internet speed, so I spent $70 on an external hard drive to copy the backup to and then snail-mail to my parents. This turned out to be way faster than doing the sync over the internet, as the speed would have been limited to approximately 1 Mbit/s. With my 1.3 TB of data that works out to over 100 days (1.3 TB is roughly 10.4 Tbit, which at 1 Mbit/s is about 10 million seconds, or 120 days), whereas I was able to get the data copied to the hard drive, mailed, and transferred off in less than a week. I then copied their data onto the hard drive and they mailed it back to me.
