Backup software strategies

I am curious about your backups and software used.

Is everybody in an rsync world?

What about data in motion, such as databases and VMs?

A few options out of the box:

rsync - tried, true, works!

Veeam - Community Edition allows backups of up to 10 workloads. I use this for my “core” VMs that I really would hate to build from scratch (looking at you M$oft)

SNAPSHOTS!!! - 10000% something you should familiarize yourself with if using ZFS. They’re wonderful, can be scheduled, and have saved my butt a few times. If you don’t have snapshots configured, do you really have ZFS?!
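For anyone who has not scheduled snapshots yet, here is a minimal sketch of what a scheduled job could run. It only wraps the standard `zfs snapshot` command; the dataset name `tank/data`, the `auto` prefix, and the timestamp format are assumptions for illustration, not anything from the post above.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_name(dataset: str, when: datetime, prefix: str = "auto") -> str:
    """Return a snapshot name such as tank/data@auto-2024-01-31_0300."""
    return f"{dataset}@{prefix}-{when.strftime('%Y-%m-%d_%H%M')}"


def snapshot_command(dataset: str, when: datetime, recursive: bool = True) -> list[str]:
    """Build the `zfs snapshot` argument list without running anything."""
    cmd = ["zfs", "snapshot"]
    if recursive:
        cmd.append("-r")  # also snapshot every descendant dataset
    cmd.append(snapshot_name(dataset, when))
    return cmd


def take_snapshot(dataset: str) -> None:
    """Run the snapshot; needs ZFS installed and sufficient privileges."""
    subprocess.run(snapshot_command(dataset, datetime.now(timezone.utc)), check=True)
```

Calling `take_snapshot("tank/data")` from cron or a systemd timer would give timestamped snapshots. Pruning old ones is left out here; dedicated tools handle retention better than a hand-rolled script.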

Popular Open Source:

Duplicati - I’ve heard praises on this one. Alex (The Badger), from Jupiter Broadcasting’s “Self-Hosted” podcast, constantly speaks highly of it. IIRC he’s using it to back up from England to the US. (If you don’t know of the Self-Hosted podcast and the stories of Lady Jupes, you’re missing out!)

Syncthing - One I need to get my head around. As far as I can tell it is not built on rsync: it is a continuous, peer-to-peer file synchronization tool with its own protocol and a web front end.

Alternative Solutions:
Don’t Backup!!!
Yup, you heard me right! Your precious VMs can and really should be ephemeral. Let ’em die! Save your container configs and Docker Compose files to a Git repo of your choosing and just back up the actual data you care about!
Or, take a peek at a few guides and videos from our very own @geerlingguy and learn some Ansible! I spent a week or so learning it, wrote some playbooks, and used it to image, configure, and deploy almost two dozen Dell servers and their ESXi config, from the OOBM configs and ESXi install to joining our VMware environment. I can now use that again whenever one blows up or we add a new branch!

https://zrepl.github.io/

I utilize a mixture of zrepl and rclone. Most of my data is stored on ZFS but a few cloud services are used like Microsoft’s OneDrive. For cloud services, I have rclone running in a cronjob which synchronizes the contents down into ZFS meaning my pools contain all of our online data.
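A rough sketch of what such a cron-driven pull could look like. It shells out to `rclone sync`; the remote name `onedrive` and the target path `/tank/cloud/onedrive` are made-up placeholders, and the remote would need to be set up with `rclone config` first.

```python
import subprocess


def rclone_sync_command(remote: str, remote_path: str, local_dir: str,
                        dry_run: bool = False) -> list[str]:
    """Build an `rclone sync` call that mirrors a cloud remote into a local directory."""
    # `rclone sync SOURCE DEST` makes DEST identical to SOURCE,
    # which includes deleting files from DEST that no longer exist in SOURCE.
    cmd = ["rclone", "sync", f"{remote}:{remote_path}", local_dir]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without touching anything
    return cmd


def pull_cloud_copy(remote: str, remote_path: str, local_dir: str) -> None:
    """Run the sync; a non-zero exit code from rclone raises CalledProcessError."""
    subprocess.run(rclone_sync_command(remote, remote_path, local_dir), check=True)


if __name__ == "__main__":
    # Hypothetical remote and dataset mountpoint, for illustration only.
    pull_cloud_copy("onedrive", "", "/tank/cloud/onedrive")
```

Because `sync` propagates deletions, ZFS snapshots on the target dataset are what make the local copy recoverable if something is removed upstream by mistake.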

From there -

  1. All of my “Tier 1” content exists locally on two different NAS systems (via zrepl), is snapshotted regularly, and is point-in-time recoverable (my “RPO”, or recovery point objective, is typically a few hours of data loss at most, and I can whack “Snapshot Now!” if I do something very consequential). Everything is also encrypted and copied offsite via rclone, typically into Dropbox. (This could mean I download from OneDrive, store in ZFS, and then copy back up into Dropbox!) Additionally, I’ve got an LTO-9 drive and I keep tape-based copies of this data, but I usually only bust out a tape and write to it every 2-3 months; I’m really relying on all the other layers of protection :slight_smile: In terms of scale this is 10TiB of data today and grows ~1TiB per year.

  2. All of my “Tier 2” content exists locally and has two copies on different NAS systems (more zrepl), but I don’t ship this offsite to Dropbox. It’s snapshotted. I do also keep this on LTO. This is about 30TiB of data, and Tier 2 tends to grow unpredictably, but usually by 2-5TiB per year.

  3. I have a “Tier 3” which is data that never changes, i.e. rips of movies. These sit on RAID-Z2 vdevs but no replication between NAS, no offsite story. For much of this, I’ll move the rip onto an LTO tape whenever I get around to it. This is 400TiB today and tends to grow by about 100TiB yearly.

That’s it, and Tier 3 is mostly why I’m building 3x HL15s each with 6x 6.4TB NVMe and 15x 20TB SATA. Not having to think too much about storage expansion until 2026 is a goal! <3
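The sizing above can be sanity-checked with a little arithmetic. This is only a sketch: it converts the raw SATA capacity from decimal TB to TiB and projects linear growth. It ignores RAID-Z2 parity, the NVMe drives, and filesystem overhead, because the vdev layout is not stated, so real usable space will be lower than the raw figure.

```python
TB = 10**12   # drive vendors quote decimal terabytes
TIB = 2**40   # the tier sizes above are quoted in binary tebibytes


def raw_capacity_tib(chassis: int, drives_per_chassis: int, drive_tb: float) -> float:
    """Raw capacity in TiB before any parity or filesystem overhead."""
    return chassis * drives_per_chassis * drive_tb * TB / TIB


def projected_tib(current_tib: float, growth_tib_per_year: float, years: float) -> float:
    """Size after `years`, assuming constant linear growth."""
    return current_tib + growth_tib_per_year * years


def years_until_full(current_tib: float, growth_tib_per_year: float,
                     capacity_tib: float) -> float:
    """Years until linear growth reaches the given capacity."""
    return (capacity_tib - current_tib) / growth_tib_per_year


if __name__ == "__main__":
    raw = raw_capacity_tib(chassis=3, drives_per_chassis=15, drive_tb=20)
    print(f"raw SATA capacity: {raw:.1f} TiB")
    print(f"Tier 3 after 2 years: {projected_tib(400, 100, 2):.0f} TiB")
```

Plugging a realistic usable capacity (after parity) into `years_until_full` gives a rough idea of when the next expansion is due.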

All our Macs are using Time Machine and fall into Tier 2 as far as how the data is managed. I don’t use Ansible on these, and I’m really picky about how my Mac is setup, but recovery from Time Machine (even onto new hardware) is super straightforward.

There’s no complicated backup story like Veeam because outside Macs, I tend to just containerize everything (no virtual machines) and it’s all extremely recoverable – some container volumes like Plex’s database are managed as Tier 2, and things like Loki and ElasticSearch are managed as Tier 3 because losing that is no big deal to me; it’s mostly only useful for hours/days after, if I need to go look at a firewall log or see why something crashed/rebooted.

Oh, tapes are in a 120-minute fireproof safe, and we have remotely monitored fire alarms, so the lack of offsite story for some of our data doesn’t cause me to lose too much sleep.

Any thoughts on Proxmox Backup Server?

I only* have a local 20TiB hdd to rsync to sitting in an old machine :-), with no offsite backup (which scares me sometimes); and a bunch of ‘as code’ containers and VMs as ansible or pulumi programs that more often than not call ansible :slight_smile:

A while back I had a setup to p2p backup with friends, encrypted; not sure what the space of that kind of tooling is today. I forget what the app was called, they went out of business.

If some of us want to store each others’ data, what tools could we use for that? IPFS comes to mind, but I digress… :smiley:

I forgot to mention proxmox backup server. I’ve only spun proxmox up a couple times as I have a VMUG license and use ESXi all day at work. But that’s a good option!

I have a few different methods I use.

Proxmox VM’s → PBS → Backblaze Bucket via Rclone
PBS → Off-Site PBS

My “Vault” and “Editing” shares → TrueNAS sync job to backblaze bucket

And finally, I don’t backup some of my data. All my media I have “ripped” for my Plex server does not get backed up. Where am I even going to store 50TB of data in the cloud that doesn’t drive me broke? Even storing that much data on Backblaze really isn’t that cheap.
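To put a number on that, the cost is just size times rate. The rate in the example is a placeholder assumption, not a quoted price, so check the provider’s current price list; egress and API fees are left out entirely.

```python
def monthly_cost_usd(size_tb: float, usd_per_tb_month: float) -> float:
    """Storage-only monthly cost; ignores egress, API calls, and minimum-retention fees."""
    return size_tb * usd_per_tb_month


def yearly_cost_usd(size_tb: float, usd_per_tb_month: float) -> float:
    """Twelve months of storage at a constant size and rate."""
    return 12 * monthly_cost_usd(size_tb, usd_per_tb_month)


if __name__ == "__main__":
    ASSUMED_RATE = 6.0  # USD per TB per month; placeholder, not a quoted price
    print(f"50 TB: ${monthly_cost_usd(50, ASSUMED_RATE):.2f}/month, "
          f"${yearly_cost_usd(50, ASSUMED_RATE):.2f}/year")
```

At a few dollars per TB per month, a 50TB library quickly reaches thousands of dollars a year, which is why replaceable media often gets left out of cloud backups.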

I just choose what is the important stuff to have and make sure I back that up!

I run a local MySQL database that has snapshots taken and stored on my PBS every day. Snapshots are the way.

Wait a minute, everyone with an HL15 is “the cloud”, feel free to ship a DAS over, I’ll attach it to my cluster.

Though I wonder how much lower can we get relative to this 1 PiB reference architecture, and still have everything “proof of storage”'d: https://docs.filecoin.io/storage-providers/infrastructure/reference-architectures

The problem with p2p file sharing is that you don’t know when I delete your files or format your drives. This seems to solve for that.
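To make the idea concrete, here is a toy challenge-response check between two friends. To be clear, this is not Filecoin’s actual proof system, just a much simpler illustration of the same goal; the function names are invented. The owner precomputes a few single-use challenges before handing the data over, and the peer can only answer correctly while it still holds every byte.

```python
import hashlib
import hmac
import os


def make_challenges(data: bytes, count: int) -> list[tuple[bytes, bytes]]:
    """Owner side, before upload: precompute (nonce, expected digest) pairs."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)  # unpredictable, so answers cannot be prepared in advance
        challenges.append((nonce, hashlib.sha256(nonce + data).digest()))
    return challenges


def respond(stored: bytes, nonce: bytes) -> bytes:
    """Peer side: hash the nonce together with the full stored data."""
    return hashlib.sha256(nonce + stored).digest()


def verify(expected: bytes, answer: bytes) -> bool:
    """Owner side: constant-time comparison of the peer's answer."""
    return hmac.compare_digest(expected, answer)
```

Each challenge can be used only once, since a peer could cache its answer, so the owner has to precompute as many as it expects to need. It also only proves that the data existed at challenge time, not that it will still be there tomorrow.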

Are you suggesting that I could use something like the IPFS to backup my Plex Media?

Hm, didn’t think it was possible, looks like at least one person tried it: Backups with IPFS - Kevin Cox

Wait a minute, everyone with an HL15 is “the cloud”, feel free to ship a DAS over, I’ll attach it to my cluster.

After briefly reading about “Storj” or whatever it’s called, I’m inclined to agree!

So how do we join this? Storj - How It Works

You have about as much information as I do now on the matter I’m afraid. It was a brief read over a cup of coffee one morning. I need my storage space for me! hahahahah.

In all honesty, it seems like a really neat project, but I would have to delve into the risks, both security and legality, before getting into it. I’m not sure what the TOS/vetting looks like for the storage consumers. For example, if someone were to sign up and use MY local storage and ran into legal trouble that resulted in their data being investigated, what does that mean for my equipment and me?

Bacula is a great open source suite.

It’s more geared toward the command line and text-file config. It can back up to different targets such as tape, hard drives, or cloud.

I ran into this on YouTube the other day. The first thing that came to my mind was:

“I wonder how long it would take to backup my media to Discord?”

lol

Why would you use Discord to do that? I do not associate Discord with backup storage.

I mean, I would not back up anything important using that method. It was more a fun project that this person did than a proper backup solution. The method that was used COULD totally be leveraged, though. So if it’s done right, you’re looking at free storage. I don’t think anyone regards Discord as serious backup storage.

Sure there are many things that can be done technically.

When Amazon first offered their Drive, they were offering deals that gave people free space with their purchases, such as 6 months of unlimited space. I took advantage of those deals to rack up about 3 years of the free unlimited tier. The deal was honored.

This was back in 2013/2014, and I backed up my 8-bay DroboPro to Amazon’s Drive.

These providers eventually realized that unlimited storage is not a profitable business model. Amazon emailed me in 2018 and gave me 6 months to either pay $700 a month for 24 TB of storage or reduce my usage to fit the Amazon Prime tier I qualified for at no additional cost.

Microsoft had the same problem with their $99 unlimited plan (offered to corporate customers). I put 4 terabytes (mostly test data) in my OneDrive account. It was free for many years, but eventually I was emailed and asked to either reduce the space I was using or pay for the tier that replaced the unlimited tier.

I am sure the same will happen with Discord.

Have you ever used Discord? Did you watch the video I posted at all? With all due respect, you’re incredibly hypercritical about something that is pretty much actually a joke.

Discord is NOT a cloud storage provider. They also don’t offer any free direct storage. It’s basically a fancy messaging app with some social media features. Someone just found a clever way to take advantage of their CDN and use it as a way to back up files.

Part of the reason I said nobody is taking that seriously.

I work for the parent company of where Discord is hosting their stuff, and I can tell you i3dnet is very price aware.

Nice hack though :slight_smile:

I was not trying to be hypercritical.

I shared the Amazon and OneDrive stories because these companies did not do a lot of research on these types of offerings. This month Amazon will finish its removal of their Drive product.

While I have logged into Discord, I do not use it frequently. When both Discord and a forum are offered, I tend to use the forum.

Your original post did not make your intention clear: whether you were joking or whether the post was a “what if”.

I am sorry if you feel that way as well.