Making your storage highly available

I am in a good place with my Proxmox cluster but want to take the next step with my networked storage. I want the ability to take my HL-15 down for maintenance and tinkering without affecting the rest of my cluster or people watching Plex on my storage.

What is the recommended next step? Do I just buy another HL-15 and mirror things in some way? I am not very sure about the mechanics!

Appreciate the help!

context:

  • 1x HL15 - Fully Built & Burned In
  • 3x compute nodes in a Proxmox cluster (set up so I can tinker with an individual node and not affect the Kubernetes cluster)

current storage pools:

Just saw this video on the 45Drives YouTube channel: https://www.youtube.com/watch?v=vT5zzOO8HNc

I think the gist was… Ceph, and I'm going to need at least 3 nodes, like my Proxmox cluster?

Unsure if I am in it for TWO more HL-15s, but I will keep researching.

You could look into something like zrep for the “buy another HL-15 and mirror things in some way” route,

but I think some of the approach depends on how much writing is going on to your pool(s) and how seamless you expect failover and failback to be (i.e., do you expect a Plex watcher to experience zero dropped frames and no stutter?).
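
For anyone wondering what the "mirror things in some way" route looks like mechanically: tools in this space are built around ZFS snapshots plus incremental send/receive. Here is a rough Python sketch that only builds the commands for one replication pass and prints them (it does not run anything). The dataset names, the host name `hl15-b`, and the snapshot naming scheme are all made up for illustration, and this is not how zrep itself is implemented or invoked.

```python
import shlex


def replication_commands(dataset, target_host, target_dataset, prev_snap, new_snap):
    """Build the shell commands for one incremental ZFS replication pass.

    Dry run only: the commands are returned as strings, never executed.
    All names passed in are hypothetical placeholders.
    """
    prev_ref = shlex.quote(f"{dataset}@{prev_snap}")
    new_ref = shlex.quote(f"{dataset}@{new_snap}")

    # 1. Take a new snapshot on the primary box.
    snapshot_cmd = f"zfs snapshot {new_ref}"

    # 2. Send only the blocks that changed since the previous snapshot and
    #    receive them on the second box over SSH.
    send_cmd = f"zfs send -i {prev_ref} {new_ref}"
    recv_cmd = f"ssh {shlex.quote(target_host)} zfs receive -F {shlex.quote(target_dataset)}"

    return [snapshot_cmd, f"{send_cmd} | {recv_cmd}"]


if __name__ == "__main__":
    for cmd in replication_commands(
        dataset="tank/media",          # hypothetical source dataset
        target_host="hl15-b",          # hypothetical second HL15
        target_dataset="tank/media",   # hypothetical destination dataset
        prev_snap="sync-0001",
        new_snap="sync-0002",
    ):
        print(cmd)
```

The thing to notice is that the second box is only as fresh as the last snapshot that was sent, which is why the amount of writing to the pool matters: anything written after the last pass is missing on the replica if you fail over.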

expectations:

  • for write loads on the cluster, some unavailability is fine but the k8s containers should be able to heal as if there was a blip in the network
  • for Plex, stuttering is fine as long as it steadies out within a few minutes

overall, nothing too stringent, I just want to be able to pull one HL15 for routine maintenance or tinkering without having to declare an outage window

Great example of architecture tying your hands up.

Running a second HL15 so you can do maintenance or tinker seems borderline insane.

Ceph is complete overkill and fully insane unless you are protecting a workload worth a lot, where downtime costs more. Plex media is pretty easy to replace in a catastrophic failure. Ideally your clients have cached enough of the episode/movie they are currently watching for your maintenance to finish, but this is client/player dependent. Ultimately I have to ask, how often are you bringing the thing down for maintenance?
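
To put a rough number on the client caching point, here is a back-of-the-envelope sketch in Python. The buffer size, bitrate, and maintenance window are invented figures, not measurements from any particular Plex client, so plug in your own.

```python
def buffered_seconds(buffer_megabytes, bitrate_megabits_per_s):
    """Seconds of playback a client can cover from its local buffer."""
    return buffer_megabytes * 8 / bitrate_megabits_per_s


def survives_outage(buffer_megabytes, bitrate_megabits_per_s, outage_seconds):
    """True if the buffered playback time is at least as long as the outage."""
    return buffered_seconds(buffer_megabytes, bitrate_megabits_per_s) >= outage_seconds


if __name__ == "__main__":
    # Hypothetical numbers: 100 MB client buffer, 8 Mbit/s stream, 15 minute window.
    print(buffered_seconds(100, 8))
    print(survives_outage(100, 8, 15 * 60))
```

With those made-up numbers the buffer covers well under two minutes of a 15 minute window, so whether viewers notice depends heavily on the player and on how short you can keep the maintenance.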

What is the size of the working set for your containers? If you need 100% uptime on specific services, set up a separate server for them. You are running into the limitations of a monolithic architecture. I run my NVR separately from my HL15 for this reason.

Consider a dev box for tinkering; it doesn’t need to match perfectly.

The fewer dependencies, the better. Figure out where you have created dependencies that make it difficult to maintain this system.
