HL15 for intensive compute + memory

Hi everyone, I could use your feedback on my selected hardware, since I’m not building this for myself.

I initially signed up for the full HL15 build, but then realized that 6 threads and 16 GiB of RAM won’t cut it for the work my partner needs to do for her job. She’s a neuroscience researcher, basically looking at pictures like this all day:

(which I myself don’t understand)

The basic short-term need is to run ANTs (GitHub - ANTsX/ANTs: Advanced Normalization Tools) and similar neuroimaging workflows like TractoFlow (GitHub - scilus/tractoflow: a robust, efficient and reproducible diffusion MRI pipeline leveraging Nextflow & Singularity).

Computing this kind of result for 200 patients takes about 3 days on one machine with 12 threads and 128 GiB of RAM (anything less ends in OOM errors, so she has to start over, which royally sucks).
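To make the memory constraint concrete, here’s a minimal sketch of how I imagine throttling such a batch on one box; the ~10 GiB per registration job, the paths, and the patient IDs are all assumptions on my part, not her actual pipeline:

```python
import os
import subprocess
from concurrent.futures import ProcessPoolExecutor

# Hypothetical sizing: total RAM on the box and peak RAM per registration job.
TOTAL_RAM_GIB = 128
RAM_PER_JOB_GIB = 10      # rough peak per ANTs SyN registration (assumption)
THREADS_PER_JOB = 2       # ITK threads handed to each job

# Cap concurrency by memory, not core count, to avoid the OOM-and-restart cycle.
MAX_JOBS = max(1, TOTAL_RAM_GIB // RAM_PER_JOB_GIB)


def register(patient_id: str) -> int:
    """Run a SyN registration for one patient via the ANTs wrapper script."""
    env = dict(os.environ, ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=str(THREADS_PER_JOB))
    cmd = [
        "antsRegistrationSyN.sh",
        "-d", "3",
        "-f", "template.nii.gz",               # fixed image (hypothetical path)
        "-m", f"data/{patient_id}/t1.nii.gz",  # moving image (hypothetical path)
        "-o", f"out/{patient_id}_",
        "-n", str(THREADS_PER_JOB),
    ]
    return subprocess.run(cmd, env=env, check=False).returncode


if __name__ == "__main__":
    patients = [f"sub-{i:03d}" for i in range(1, 201)]  # the 200-patient batch
    with ProcessPoolExecutor(max_workers=MAX_JOBS) as pool:
        codes = list(pool.map(register, patients))
    print(f"{codes.count(0)}/{len(patients)} registrations finished cleanly")
```

The point being: the RAM, not the core count, decides how many of these can run at once.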

The longer-term need is to collaborate with the rest of her lab, at least 3 other researchers doing similar work. For that, they use Compute Canada, but space is limited to an obnoxiously low 20 TiB, which is why they all have external drives and physically move patient data around town when not working from home(!). And with https://www.science.org/content/article/white-house-requires-immediate-public-access-all-u-s--funded-research-papers-2025 going into effect in 2 years, well, they’re just not going to be able to accomplish anything at that scale, say 2,000 patients, or 10x today’s workload…

So the longer-term idea is that each peer researcher gets 3+ HL15s at home, in an HA configuration, so they can back up each other’s data/results. That would be about 1/10 of the cost of using AWS S3, last I checked. Welcome to the other meaning of “home lab”, i.e.: bring your lab home :slight_smile:

Here’s what I have so far: https://pcpartpicker.com/user/pcuci/saved/LD4ckL

Feel free to share feedback either here or on pcpartpicker

The reason I went the PCPartPicker route is that researchers, not being that rich, need to make do with what they have; my thinking is that readily available consumer hardware would be a much better long-term, reproducible choice(?)

2 Likes

I don’t see any problem with using consumer parts. I personally value having IPMI built into the motherboard, but that’s probably less important to a person running an application vs. managing the box. Plus you could get something like a Pi-based KVM as an add-on to consumer hardware. Your part choices seem reasonable to me. The HL15’s packaged CPU and RAM are smaller than I’d want even just for storage and light compute. Depending on your budget (grant funding?) you could go with a Threadripper and process that much quicker, if that would be beneficial.

I know it’s not helpful to recommend going in a totally different direction but going through a centralized, high-performance computing service would allow the lab members more time to work on their research and less on learning/doing system administration. That said, if HPC was available to your partner I suspect you might not be looking for a home solution! My partner is a researcher too so I understand trying to help where you can. Good luck with your project.

3 Likes

If I were you I’d be looking into an HL15 server setup but offloading the compute to a separate server: use the HL15 for what it’s intended for, storage, and then look into buying a compute blade better suited to handle the workload you’re talking about.

You could get the pre-built HL15 and use it just for its storage, and then maybe look at a more professional compute server from Supermicro. They have server solutions designed with enterprise hardware, with the memory and compute power you need, maybe even dual-socket motherboards.

2 Likes

Is this very CPU intensive? I would guess it is.

Because then I would definitely look at the Threadripper, if not EPYC, line of CPUs, which also gives you more PCIe lanes for adding network cards, GPU acceleration, more NVMe storage, etc.

2 Likes

Following up on some of the ideas you guys shared, it appears that on a raw performance-per-watt-per-dollar basis, a Threadripper configuration would be the most cost effective, iff used 24x7.

Then I ChatGPT’d my way through an analysis: https://chat.openai.com/share/15e2f953-cb05-4e6a-953e-301aab7e2b01

These neuro homelab systems won’t be used 24x7, and so getting anything too powerful seems to be an exercise in creating e-waste(?)
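Here’s a toy version of that utilization argument; every price, wattage, and compute score below is a made-up placeholder, just to show how the ranking can flip once the machine isn’t busy around the clock:

```python
# Toy total-cost-per-compute comparison; every number below is a placeholder.
ELECTRICITY_PER_KWH = 0.15   # USD, assumed
YEARS = 4
HOURS_PER_YEAR = 24 * 365

systems = {
    # name: (purchase price USD, average watts under load, relative compute score)
    "consumer build": (2000, 300, 1.0),
    "Threadripper":   (6000, 350, 2.5),
}

for utilization in (1.0, 0.25):  # 24x7 vs roughly 6 hours a day
    print(f"\nutilization = {utilization:.0%}")
    for name, (price, watts, score) in systems.items():
        busy_hours = YEARS * HOURS_PER_YEAR * utilization
        energy_cost = watts / 1000 * busy_hours * ELECTRICITY_PER_KWH
        work_done = score * busy_hours  # arbitrary "compute units"
        print(f"  {name:15s} ${(price + energy_cost) / work_done:.3f} per compute unit")
```

With these invented numbers the Threadripper only comes out ahead on cost per unit of work at full utilization; at a quarter utilization the cheaper box wins.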

On the other hand, it is a production system that needs to “just work” when needed, much like an AWS spot instance, but at a fraction of cloud computing cost.

If there were a way to share the downtime compute/mem/storage/bandwidth with others, via, say, https://www.golem.network/, https://rendernetwork.com/, or https://filecoin.io/ (?), then that might change the economics(?). I haven’t played with these yet, so I can’t reason about sharing resources with the wider community.

Going back to the original needs, I think I’ll need to cap the initial purchase price to make it affordable to the average “Mac-user researcher”. Some of them buy NASes, but most use external USB-attached HDDs (from what I see today), and a small minority build their own computers; and as @nick pointed out, they still have the HPC platform for some scientific collaboration.

The dedicated compute option @Glitch3dPenguin is suggesting also seems viable, but I’m afraid that would introduce 2 points of failure: if either the storage or the compute machine dies, my partner won’t be able to do any work. Though it could work if we had 4 machines, 2 of each. The biggest frustration I’ve seen them experience is when infra tech doesn’t work, for whatever reason.

Going the route of professional redundant power supplies seems overkill and expensive(?)

There is one thing that bothers me though, and that is non-ECC memory on these consumer systems, whereas the HPC presumably has this covered(?) With an estimated one random bit flip per GiB of RAM per month, at 128 GiB the “neuro homelab HL15” would corrupt data about 4 times per day(!) (at hypothetical 24x7 usage), which would render the scientific results non-reproducible, i.e.: a peer researcher running the same pipeline against the same data on the exact same HL15 setup would get different results(!) Not that it stops them from publishing today… :slight_smile: I just have a hard time believing their computed-at-home results are as reproducible or even replicable.
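Back-of-the-envelope version of that estimate; the one-flip-per-GiB-per-month rate is the assumption doing all the work here:

```python
# Assumption: 1 random bit flip per GiB of RAM per month (not a measurement).
flips_per_gib_per_month = 1
ram_gib = 128
days_per_month = 30

flips_per_day = flips_per_gib_per_month * ram_gib / days_per_month
print(f"~{flips_per_day:.1f} bit flips per day at 24x7 usage")  # ~4.3
```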

Thanks for your feedback, much appreciated!

Feel free to weigh in to guide my thinking, as I mostly feel like I don’t know what I’m doing, haha.

Check out the discussion on Reddit: it sounds like some of the AM5 motherboards may have unofficial support for ECC memory.

Also it sounds like Zen 4 Threadrippers are going to be announced “real soon now” ™, see AMD Ryzen Threadripper PRO 7000 "Storm Peak" with up to 96 Zen4 cores to launch on October 19 - VideoCardz.com.

If I were you I’d probably wait for that announcement before making any final decisions.

3 Likes

You might not want this, but you can score great deals on eBay for used AMD EPYC or Threadripper CPU/motherboard combos. That could keep costs down a lot. These also support ECC memory.

2 Likes

Sounds like she needs something more powerful than consumer gear can offer. Check out EPYC.

Milan CPUs are a lower-cost option than the latest generation, which comes at a large premium. Single-socket Milan systems are available with 8 DIMM slots and accept 64 GB DIMMs, and at up to 64 cores they aren’t cheap, but they’re cheaper than EPYC Genoa. Milan motherboards and CPUs are available on eBay for a few hundred dollars; the higher core count parts cost more, as do the DIMMs.

You may want to just get the chassis and power supply and build your own, or use the current system as the NAS (upgrading its RAM and possibly CPU) and then add a second EPYC Milan system. Also see if any GPUs will help: the EPYC systems have lots of slots and PCIe lanes, great for graphics processing, ML, and AI workloads.

2 Likes

ASRock Rack motherboards support ECC RAM with consumer Ryzen chips, though they are a bit more expensive. For example this one, which also includes 10Gb Ethernet, and you can use ECC UDIMM RAM such as this.

(You’d have to use last year’s 5950X with that motherboard, but that also saves you a couple hundred bucks, which helps offset the pricier motherboard.)

That board would leave your (very limited, on consumer CPUs) PCIe slots available for things other than networking, like an HBA or a GPU.

1 Like

I use that board paired with a 5900X and it has been great. It leaves room for the HBA and a graphics card for Plex, since the board has IPMI built in.
I would take a second look at the cooler, however; in the FAQ they said it might become an issue (the included one is 108 mm and most 4U coolers are 120 mm, and I don’t know how tall the motherboard sits in the case).

1 Like

Just learned that all DDR5 seems to have some ECC built in? (Via Hacker News: “One thing I’d like to understand better about DDR5 is how well the built-in ECC ...”)

Phew, it’s pretty wild out there…

“The reliability characterization of fabricated 14nm DDR5 DRAMs with On-die Error Correction Code (ECC) and EUV process is presented for the first time. Intrinsic reliability of FEOL and BEOL WLR showed well above 10yrs of lifetime, 125°C. The products demonstrated no fails in high temperature operating lifetime (HTOL) of 1000hrs. The On-Die ECC design improved the single bit error rate by 10⁻⁶ times (refresh time >4x). The failure rate, ppm of manufacturing burn-in process confirmed the healthiness of the baseline material and also effectively screen out and monitor any random defects. The presented 14nm DDR5 DRAMs are well in production for the PC segments and have been shipping and qualified for the Server segments.”

While the memory itself might be sound, I don’t think I understand yet where “full-ECC” comes in?

What would be a home-grown way to test DDR5’s actual real-world error rates?

Would a monthly Memtest86 test help in spotting issues?

Since this would be a neuro home lab on the cheap, I’m thinking we could run the workloads twice, to see if we get the same bit-by-bit result(?)
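A crude run-it-twice check could be as simple as hashing every output file from both runs and diffing; run_a/ and run_b/ below are hypothetical output directories:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a (possibly multi-GB) file through SHA-256."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    base = Path(root)
    return {str(p.relative_to(base)): sha256_of(p)
            for p in sorted(base.rglob("*")) if p.is_file()}


a, b = checksums("run_a"), checksums("run_b")
differing = [name for name in a.keys() & b.keys() if a[name] != b[name]]
only_one = a.keys() ^ b.keys()

print(f"{len(differing)} files differ, {len(only_one)} files exist in only one run")
for name in differing[:10]:
    print("  differs:", name)
```

One caveat: some pipeline steps aren’t bit-reproducible by design (multithreaded reductions, random seeds, embedded timestamps), so a mismatch doesn’t automatically mean a flipped bit.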

I’ve been chewing on this for a couple of days, so it may not be complete or in order.

Regarding the full HL15 build and your concerns about the 6-core / 6-thread CPU and 16 GB of RAM:
Intel Xeon Bronze 3204: well suited for storage appliances and low-compute use cases
- no turbo boost or hyper-threading; not designed for performance, but fine for a home lab
- great cheap CPU for increasing node count in clusters
- easy and relatively cheap CPU upgrades from the second-hand market, especially to last-generation parts (4100, 5100, or 6100 series)
- LGA 3647 has a lot of pins that are easy to damage if the CPU isn’t installed correctly, which can kill the motherboard. I haven’t killed a board in this series, but I do plan on a CPU swap at some point in this board’s life.
RAM at 16 GB
- I’m guessing this is just to certify that a working board was delivered. It sounds like different configurations will be available from the 45HomeLab store. Since this is a home lab for me, I got 6x 32 GB DIMMs for under $50 each.

Regarding use case:
I see two competing use cases here: storage and processing power.
For a plain storage server, I feel the full HL15 build is overkill. I see the appeal of 15 drives for storage, but you could use a much cheaper or older CPU/motherboard combo and a 16i HBA card. If this is a place for data at rest, you might not even need a 10Gb NIC.

For intensive compute processing it’s going to depend on your workload. In my wife’s neuroscience lab they usually run into 3 scenarios:

  1. highly parallelizable workloads, which benefit from GPU acceleration
  2. independent component analysis, which requires lots of memory
  3. simple mathematical or transformational image operations, which are I/O bound (more time spent requesting data than processing it)
    - This was summed up from https://osf.io/7synk/download; the author was making a case for provisioning cloud computing, but the same issues present themselves in the lab.

Of course there are scripts that get used that call for all 3 of the above conditions at different times.

Regarding Compute Canada and 20 TiB limit:
Is that per job or the total allowed?
I am assuming she is part of a university. If so, she might also be able to book time on the university’s servers (overnight or on weekends, using its excess computing power).

2 Likes

Hi @daemon1001 - thanks for chiming in!

4.3 Data management

Storage options in the cloud are incredibly powerful, but form a model that is more complex than most on-premises storage systems and cost models. Cloud object storage is scalable and highly reliable. However, pricing for object storage is typically based on the size of the data stored, the storage tier (how readily accessible and/or available is the data), and access charges. There are also potentially charges for data egress from the cloud or between regions. Entire machines can be saved along with everything necessary to create an analysis. Finally, object storage is not suitable as a file system; to use data on object storage one needs to stage it to a file system. To use these features cost-effectively requires thinking about data management at the start of an analysis.

1. Save virtual machines, containers, code and data together at the end of a completed analysis so that you can reproduce the analysis.
2. Use directory and file naming conventions consistently so that you can create automatic rules for creating versions of objects for backup, deleting versions of objects, moving objects to less expensive tiers of storage, or archiving objects.
3. Automatically migrate important objects to lower-cost archival storage when appropriate.
4. Save data products for completed analyses when the cost to store them for an appropriate timeframe is less than the cost to reproduce them, and delete them otherwise.
5. Write workflows to copy data from object storage to file system storage and write back results to minimize the size of a working disk
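To be fair to the authors, here’s roughly what item 5 would mean in practice: a minimal hypothetical sketch (bucket name, keys, and paths all invented) that stages one subject from object storage, runs an ANTs step, and pushes the result back:

```python
import subprocess
from pathlib import Path

import boto3  # AWS SDK; any S3-compatible endpoint would work similarly

s3 = boto3.client("s3")
BUCKET = "lab-neuro-data"           # hypothetical bucket
subject = "sub-001"                 # hypothetical subject ID
work = Path("/scratch") / subject   # local file-system staging area
work.mkdir(parents=True, exist_ok=True)

# 1. Stage the input from object storage onto a real file system.
s3.download_file(BUCKET, f"raw/{subject}/t1.nii.gz", str(work / "t1.nii.gz"))

# 2. Run the processing step against local files.
subprocess.run(["N4BiasFieldCorrection", "-d", "3",
                "-i", str(work / "t1.nii.gz"),
                "-o", str(work / "t1_n4.nii.gz")], check=True)

# 3. Write the result back to object storage and clean up the working disk.
s3.upload_file(str(work / "t1_n4.nii.gz"), BUCKET, f"derived/{subject}/t1_n4.nii.gz")
(work / "t1.nii.gz").unlink()
```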

That is so far removed from their day to day, I don’t see it happening.

I wish the authors had put numbers to their claims, though. S3 storage is 10x more expensive (when accounting for ingress/egress) than lugging a DAS around, so I found it an incredibly difficult selling point :slight_smile: in 2023. Though I suspect this could change when they suddenly need to physically carry 10 external drives; we can revisit this assumption in 2 years.

great cheap CPU for increasing node count in clusters

For a plain storage server, I feel the full HL15 build is overkill. I see the appeal of 15 drives for storage, but you could use a much cheaper or older CPU/motherboard combo and a 16i HBA card. If this is a place for data at rest, you might not even need a 10Gb NIC.

This was my original take before I discovered the HL15. :slight_smile:

Just backing up her few TiB of data every once in a while means a few hours of wait time to the home backup server (on the current 1 Gbps home LAN). I was hoping to upgrade that, but it’s not just the network; it’s way more complicated. E.g., the backup drive is an HDD and her laptop’s drive is an SSD, so the HDD saturates first; then I set up a Ceph cluster, and now the network saturates first on a 2+2 erasure-coding setup, lol. It’s a whack-a-mole exercise of squashing the next bottleneck…
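Roughly how those bottlenecks stack up; the dataset size and HDD speed below are placeholder assumptions, and the ~2x factor is just what a 2+2 erasure-coded write implies (4 chunks, each half the object size):

```python
# Placeholder dataset size and HDD speed; link speed and EC layout from the post.
dataset_tib = 2                      # "a few TiB" (assumption)
dataset_bits = dataset_tib * 2**40 * 8

hdd_write_bps = 180e6 * 8            # ~180 MB/s sequential HDD write (typical figure, assumption)
lan_bps = 1e9                        # 1 Gbps LAN
ec_overhead = (2 + 2) / 2            # 2+2 EC stores 4 chunks, each half the object -> ~2x on the wire

print(f"HDD-bound:           {dataset_bits / hdd_write_bps / 3600:.1f} h")
print(f"LAN-bound (no EC):   {dataset_bits / lan_bps / 3600:.1f} h")
print(f"LAN-bound (2+2 EC):  {dataset_bits * ec_overhead / lan_bps / 3600:.1f} h")
```

Whichever term is largest at any given moment is the mole I end up whacking next.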

Compute Canada 20 TiB limit

I’ll let them handle that admin. My impression is that only half of the applicants per year get accepted to work on the HPC clusters. The funniest thing to me is that they ask the researchers how much compute they need before they apply; really? Half the time they don’t even know what data they’ll be working on. See 2023 Resource Allocations Competition Results | Digital Research Alliance of Canada.

Basically, all limits are reached.


Enter the HL15 neuro home lab :smiley:

Not sure if you’re leaning towards the full build with a processor upgrade, or are still in AMD-vs-Intel mode,
but check out this guide for Skylake-SP:

" Intel Scalable Processors Xeon Skylake-SP (Purley) Buyers Guide"
(https://www.pugetsystems.com/labs/hpc/intel-scalable-processors-xeon-skylake-sp-purley-buyers-guide-1077/#My_picks_for_best_Xeon_Scalable_Skylake-SP_processors_by_Usage)

The author breaks down Intel Skylake CPUs by usage (single-threaded, multi-threaded, memory/I/O bound, and “doesn’t matter”) and gives his recommendations.

For now I’m avoiding higher-TDP CPUs (150 W or higher) until I can verify cooling solutions. I think the stock HL15 build just uses a heat sink (no fan) on the CPU.

Serve the Home also did a good value analysis of first- vs. second-generation Xeon-SP.

2 Likes

These are thousands of dollars just for the CPU :slight_smile:

How would you compare the “upgrade CPU” and “consumer (AMD vs. Intel)” options, in a more apples-to-apples way? (I’ve never bought Xeons before, so I’m not sure what to look for.)

Is ECC memory part of your decision grid? Other important criteria for your home lab use case?

Perhaps I don’t understand the fascination with running corporate equipment in a home lab; I guess to each their own? :smiley:

I’ll back up a bit. I assumed something and I apologize.

For me, home labbing is as much about running wires and cables and assembling and configuring second-hand gear as it is about learning new software packages. The assumption I made was that you’d be following a similar path on hardware.

For instance, I chose a full-build HL15 because after pricing parts online and via eBay, the savings came to less than $150. I was willing to pay the difference for a new motherboard and new CPU vs. used, and to skip the potential hassle of troubleshooting a used CPU/motherboard/memory combo that won’t boot. I was able to source 6x 32 GB ECC RDIMMs at $47 each.
I plan on swapping the CPU (I call it an upgrade) for a previous-generation Xeon Gold 5122. They may have been $1,500 at launch, but they’re around $30 USD now. I’ll personally stay away from the 6000 and 8000 series, as I feel those were developed for multi-processor boards and have a higher TDP (I’m trying to stay under 150 W for cooling and energy reasons).
This would also raise my PassMark score to ~8800, vs. ~4800 for the included Bronze 3204.
I sourced 10x 8 TB SAS drives for $450 total.

My Proxmox box (for virtual machines and containers) in my lab is currently an ASUS X570 (from 2019) with an AMD Ryzen 5 and 64 GB of ECC memory.
My backup server is an old ASUS P5E WS Pro (a quad core from 2009) with 8 GB of ECC memory.
I also have a TrueNAS box as my file server.
Plex server on an ASUS WS480 ACE (my PC).

I linked the suggested buyer’s guide in case you were looking to get more performance per $. Buying any of those CPUs new, along with memory and drives, is out of my budget.
In my wife’s neuroscience lab they use unlocked, overclockable 12th-gen Intel i9 and Xeon W workstations for single-threaded workflows. They also have portable enclosures with NVIDIA 3090s for when offloading calculations to a GPU is beneficial. Most workstations max out at 128 GB or 256 GB of memory, but the image sets being analyzed are in the TB range.

Excuse my wandering. If you are trying to self-host these neuroscience workloads, I would approach it this way:
What are your main goals vs. secondary goals? I feel like your primary is data backup, and data processing is a close second.
What is your budget? (This sounds like your pocket, not the lab’s budget.)
Are you comfortable with used gear, or does it have to be new? Even old gear, like (2014) X99 dual-processor boards, can be had for around $300. With a used Intel Xeon E5-2620 v3 @ 2.40 GHz that would have a PassMark score of about 7800, and these CPUs go for around $25 (they are power hungry).

If you are looking for a file server and backup solution, the HL15 would be fine, but probably overkill; an Intel i3, or even an Intel Atom C3000-series chip, would do for that role.
If you are trying to self-host the data processing, used gear will stretch your budget further. If you are unsure of your workload requirements, I would prioritize a high turbo clock speed for single-threaded processes (an i9 or a workstation Xeon W). If the workflow can be parallelized, you can add in a graphics card. Consumer boards will usually bottleneck on I/O throughput, as they often have less than half the lanes.

If you want a baseline for CPU comparison, try https://www.cpubenchmark.net/
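Using the rough numbers already floating around this thread (stock Bronze 3204 at ~4800 PassMark, used Gold 5122 at ~$30, used E5-2620 v3 at ~$25), a quick side-by-side looks like this; treat all of these as ballpark figures:

```python
# PassMark scores and used prices as quoted earlier in this thread (ballpark).
cpus = {
    "Xeon Bronze 3204 (HL15 stock)": (4800, None),  # included in the full build
    "Xeon Gold 5122 (used)":         (8800, 30),
    "Xeon E5-2620 v3 (used)":        (7800, 25),
}

for name, (score, price) in cpus.items():
    value = f"{score / price:,.0f} points per $" if price else "already included"
    print(f"{name:32s} {score:5d} PassMark   {value}")
```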
ECC memory is a consideration for certain workflows (my file server and backup server), but even then it’s not required unless the data is irreplaceable. I didn’t trust AMD’s ECC support for DDR4: if there was an error, it was up to the operating system to catch and report it (oversimplified version), and this wasn’t implemented consistently across manufacturers.
For my home lab, the considerations are: power draw, noise, and no rack-mount servers (I do have small racks for switches, routers, and patch panels). IPMI is a nice feature; I currently use a PiKVM.

2 Likes

Given the wealth of knowledge/insight in this thread, I’m now thinking that maybe starting small could be the right approach.

The most annoying thing about this problem is that I hit all bottlenecks in one go:

  • 1 Gbps internet bandwidth at home means ingesting new data overnight :slight_smile:; it’s 10x slower to ship results back out to Compute Canada or the university lab (though results seem 100x smaller), since the local ISP throttles uploads (ISP duopolies suck :sob:); see the rough transfer math after this list
  • single-threaded pipeline steps hit the boost clock of the CPU
  • parallelizable steps require tons of memory, so I hit that ceiling too in the same pipeline
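The rough transfer math behind that first bullet; the data volume is a placeholder and the throttled upload rate is a guess:

```python
# Placeholder data volume; link speeds per the bullets above.
new_data_gb = 2000                 # fresh imaging data to ingest (assumption)
results_gb = new_data_gb / 100     # results ~100x smaller, per the bullet above
down_bps = 1e9                     # 1 Gbps down
up_bps = down_bps / 10             # throttled uploads, ~10x slower (assumption)

print(f"ingest:    {new_data_gb * 8e9 / down_bps / 3600:.1f} h")
print(f"ship out:  {results_gb * 8e9 / up_bps / 3600:.1f} h")
```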

These would be my primary concerns.

Secondary concerns would be LAN-based backup speed (which can happen overnight) and HA, which is out of budget for the time being, though I do share @SpringerSpaniel’s “hyperconverged” ambition!

Then I found this: H12SSL-i | Motherboards | Super Micro Computer, Inc.


It supports both 2nd-gen EPYC Rome and 3rd-gen EPYC Milan chips, so I could start small with an EPYC 7402P (so that my partner can feel an improvement relative to her current workhorse laptop).

The idea would be to enable an upgrade to a 3rd-gen EPYC Milan later, which seems inevitable…

I see the home lab machine evolve as follows:

  1. EPYC 7402P 24-core CPU + 128 GB (2x 64 GB) RAM + 4x 20 TB Seagate Exos (Ceph with an OSD failure domain on a 2+2 erasure-coded setup; PVE host)
  2. Observe reality, and see which bottlenecks get hit most regularly and affect the basic collaborative scientific workflow (I already have 2 of her colleagues SSHing into VMs on my scrap-metal PVE HA cluster)
  3. Double the RAM, or double the storage; probably both
  4. Get the research lab to pay for 3 additional identical machines to upgrade PVE to an HA configuration, and Ceph to a host failure domain. At that point we’re looking at a total of 96 cores (192 threads), 1 TB of RAM, and ~0.32 PB of raw storage (rough capacity math after this list); but most importantly, the entire lab would work on the same “hyperconverged” home lab, unhindered by early grant applications for data/compute :smiley: (I’ll need to practice my “sales” pitch, haha)
    4.1 connect all of these with MikroTik’s CRS504-4XQ-IN switch (4x 100Gb QSFP28 ports), plus NICs for each node
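The rough capacity math for step 4; the usable figure assumes the 2+2 erasure-coded pool keeps its 2x overhead at the host failure domain:

```python
nodes = 4
cores_per_node = 24           # EPYC 7402P
threads_per_core = 2
ram_per_node_gb = 256         # after doubling the initial 128 GB
drives_per_node = 4
drive_tb = 20
ec_k, ec_m = 2, 2             # 2+2 erasure coding -> usable = k / (k + m) of raw

raw_tb = nodes * drives_per_node * drive_tb
usable_tb = raw_tb * ec_k / (ec_k + ec_m)

print(f"{nodes * cores_per_node} cores / {nodes * cores_per_node * threads_per_core} threads")
print(f"{nodes * ram_per_node_gb / 1000:.0f} TB of RAM")
print(f"{raw_tb} TB raw (~{raw_tb / 1000:.2f} PB), {usable_tb:.0f} TB usable after 2+2 EC")
```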

Basically, no more downloading patient data to your own laptop to run a local, irreproducible pipeline; work straight off the cluster instead.

There seems to be a huge jump in $ per GB from 64 GB to 128 GB RAM sticks; what’s up with that? This basically caps practical RAM upgrades at 0.5 TB per node (so 2 TB total).

On the motherboard spec sheet, I see a number of SATA ports listed.

Does this mean I can fill all 15 bays with SATA drives? There is no mention of 6 Gbps next to them.

I see some folks on the forums here aim for dual mirrored NVMe drives for the OS boot volume, I imagine for reliability reasons.

Could that be achieved with the 2 remaining SATA DOM ports instead?

I also wonder about the ongoing maintenance cost of such a system: with 60 drives in a fully loaded 4x node HA setup, one drive must be dying every month(?)
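For the drive-failure worry, the expected replacement rate mostly depends on what annualized failure rate you assume; a rough sketch (the AFR values are generic assumptions, not measurements for any specific model):

```python
drives = 60  # fully loaded 4-node setup

for afr in (0.01, 0.015, 0.05):  # assumed annualized failure rates
    failures_per_year = drives * afr
    print(f"AFR {afr:.1%}: ~{failures_per_year:.1f} failures/yr, "
          f"one every ~{12 / failures_per_year:.0f} months")
```

So if the drives track typical published fleet stats (low single-digit AFR), it’s closer to one or two replacements a year than one a month, though a bad batch could change that quickly.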

Still thinking out loud at the moment, so any and all feedback very much appreciated before I pull the trigger on some of this gear :slight_smile:

1 Like

What kind of setup did you end up with? Full build? Or which CPU and how much RAM?

Aiming for the chassis + power supply, plus the parts in point 1 above.

I haven’t placed a full order yet (only the initial deposit), since I’m not in a rush

And seeing how the 45 Drives team is overwhelmed with the current number of orders until at least January 2024, I’ll let them sort out the kinks first :slight_smile:

1 Like

IPMI, iLO, iDRAC, and OOBM in general are among the most underrated features a server-grade motherboard offers, in my opinion. But if the OP has easy access to their HL15, such as a keyboard and monitor ready to go, I can understand many home labs choosing to forgo IPMI. It does add a security risk that needs to be accounted for as well.

I also see the appeal of consumer/prosumer grade hardware. It’s standardized, ATX/mini-ATX, easy to walk in and grab off the shelf, and offers the path of least resistance when it comes to hardware upgrades and additions. I can fix my gaming rig by taking a quick trip to Best Buy or Microcenter. I cannot always do the same for one of my various enterprise servers.

Looking back at my own HL15, I could have easily just moved my i7-9700K build from my main desktop into the chassis and saved some cash. Heck, financially, I should have! But I wanted IPMI, ECC, and the ready-to-go Xeon, knowing it had all the PCIe lanes required to get the intended performance out of the chassis.

2 Likes