I’m upgrading old hardware and TrueNAS (Core [Supermicro X9 MB] → Scale [HL15, Supermicro X11 MB]). Both systems have 32 GB RAM, 5 spinning disks in a RAIDZ2 config, a 500 GB SSD for log, and a 500 GB SSD for cache. Both datasets use “standard” sync.
Both systems have 10Gb SFP+ links to an aggregation switch, and iperf3 results show the links are fully functional.
A remote VM on a Proxmox system has two NFS mounts, one to CORE and one to SCALE, but the VM is limited to a 2.5 Gb/s link.
An fio test script (doing random and sequential read/write) shows markedly different results between the two NFS mounts:
CORE : ~145 MB/s write and 265 MB/s read, average IOPS 31,543
SCALE: ~15 MB/s write and 39 MB/s read, average IOPS 3,654
Running the fio random-write command directly on the CORE system shows 2.5 GB/s write and 659k average IOPS.
Running the same command directly on the SCALE system shows 98 MB/s write and 23,875 average IOPS.
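For context, an illustrative version of the kind of fio calls the script makes (paths, sizes, and queue depths here are placeholders, not my exact commands):

fio --name=randwrite --filename=/mnt/nfs-test/fio.dat --rw=randwrite --bs=4k --size=4G --ioengine=libaio --iodepth=16 --direct=1 --runtime=60 --time_based --group_reporting
fio --name=seqread --filename=/mnt/nfs-test/fio.dat --rw=read --bs=1M --size=4G --ioengine=libaio --iodepth=16 --direct=1 --runtime=60 --time_based --group_reporting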
Any pointers on where to start looking to find the cause of the discrepancy?
Boy, I’m really struggling on this one. You’re seeing a big difference between the two so I have to believe something is wrong. I guess I’d start with these questions:
What parameters are you using with fio?
What drives are you using? Have you tried testing without the log and/or cache vdev?
Is the pool on the HL15 brand new? Is record size also the same on the two datasets?
Maybe use iostat and zpool iostat to confirm that the SLOG is actually being used on Scale. It seems like it is not. You might also want to check the autotrim setting on Scale and turn it off for this type of testing.
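For example, something along these lines while a test is running (pool name is a placeholder):

zpool iostat -v tank 2
zpool get autotrim tank
sudo zpool set autotrim=off tank

If the log vdev shows essentially no writes during a sync-write test, the SLOG isn’t being used.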
Also, for comparison testing you probably want to be sure you disable (or at least limit) the ARC and clear any file system cache before each test.
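One way to do that on the dataset under test (dataset name is a placeholder):

sudo zfs set primarycache=metadata tank/bench
sudo zfs set secondarycache=none tank/bench
# run the tests, then restore the defaults:
sudo zfs set primarycache=all tank/bench
sudo zfs set secondarycache=all tank/bench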
I’d get R/W working as expected on the system directly before investigating NFS.
Tell us a bit more about the HL15 system; what bays are the HDDs and SSDs in and are those bays the ones going to the SAS controller or the SATA controller? Is the motherboard BIOS and SAS controller BIOS up to date?
So are you trying to test synchronous or asynchronous writes? Adding --sync=1 to the fio command might be more representative of the writes the VMs should be able to achieve. For spitballing performance, I assume the "500 GB SSD"s you refer to are SATA or SAS SSDs, not NVMe.
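For example, a sync-write run along these lines (path and sizing are placeholders):

fio --name=syncwrite --filename=/mnt/test/fio.dat --rw=randwrite --bs=4k --size=1G --ioengine=libaio --iodepth=1 --direct=1 --sync=1 --runtime=60 --time_based --group_reporting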
Maybe set up a VM locally on the Scale box and see what the NFS performance is? Perhaps there’s something going on with the networking on the Proxmox box?
What are rsize and wsize set to for the NFS mounts?
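On the client you can check what was actually negotiated with:

nfsstat -m

which lists rsize, wsize, NFS version, and transport for each mount.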
I’m trying to stick with synchronous writes, for a safe VM and database environment.
I ran the above fio command again on the TrueNAS SCALE system, adding ‘--sync=1’, and got these results:
Random write: 36.1 MB/s, IOPS: 9,185.
So, what does the “sync” command in my test line (“sudo sync; fio….”) do?
As this is a new system, I get to experiment and try to learn and understand various configs. From my YouTube and online research, I’m thinking I’d like a vdev of 5 Seagate EXOS 20 TB drives in a RAIDZ2 config, with LOG and CACHE devices and a three-way mirrored NVMe METADATA (special) vdev. The metadata vdev would be new territory for me to configure.
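Roughly what I have in mind, as a zpool create sketch (pool and device names are placeholders; I’d use /dev/disk/by-id paths on the real system):

sudo zpool create tank \
  raidz2 sda sdb sdc sdd sde \
  special mirror nvme0n1 nvme1n1 nvme2n1 \
  log nvme3n1 \
  cache nvme4n1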
However, when I noted the stark difference between my old NAS system (CORE v13) and the new 45Drives system (SCALE 25.04 and/or 25.10), I started digging.
Your comment about Proxmox being the issue might be valid, yet my CORE TrueNAS system has significantly higher throughput from that same VM. Both systems use 10GbE (via direct SFP+ connections) through the USW Aggregation switch (see original post).
The Linux sync command forces all pending changes and buffered data held in RAM (e.g. the page cache and ZFS ARC) to be written out to the physical disks immediately. This ensures that all pending writes are committed before the next fio test starts. It has nothing to do with the sync/async acknowledgement mode of the test itself.
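If the goal is just a clean starting state before each run, the usual pattern on Linux is something like this (as root):

sync
echo 3 > /proc/sys/vm/drop_caches

That flushes dirty data and asks the kernel to drop clean caches (it should also prompt the ARC to release memory), but it doesn’t change how fio itself issues writes; that’s what --sync=1 is for.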
I agree it should be looked into. I’m not an expert here, just trying to think through the general process I would use. There are lots of things that can affect read and write performance, but they’re often a few percent, or maybe 10% here and there, not an order of magnitude difference. Also read and write follow different paths and (under-)performance in one may or may not be related to the other. Finally, my understanding is BSD has a bit better support for sync writes and NFS than Linux, but there again I don’t think the difference is an order of magnitude one.
If you are getting a random sync write of 36.1 MB/s, I believe (but could be corrected) that is consistent with the expectation for a SATA SSD. So I guess you could proceed one of three ways: try to dig into getting the remote NFS performance up to 36.1 MB/s, add in the SLOG and HDDs and improve the local performance of the pool, or continue to compare back to your Core system. For that last option I’d suggest just using Core as a final target rather than comparing a bunch of interim tests to it.
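As a quick sanity check on that 36.1 MB/s figure (assuming the test used 4 KiB blocks): 9,185 IOPS × 4 KiB ≈ 37.6 MB/s, which lines up with the throughput you reported, so the IOPS and bandwidth numbers are at least internally consistent for a 4 KiB sync-write workload.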
I’m not an NFS expert, but if it was me, my next step would probably be to set up a local VM and test the NFS performance over the bridged network connection. If that seemed OK, I’d add back a pool with a VDEV and SLOG.
Is there some reason you’re mounting the NFS shares within a VM and not the Proxmox host directly? That might be the desired end state, but for testing it might be better to mount and test the shares just in the host first.
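A quick host-side check could look like this (server address, export path, and mount point are placeholders):

sudo mkdir -p /mnt/nfs-test
sudo mount -t nfs -o vers=4.2,hard 10.0.0.10:/mnt/tank/share /mnt/nfs-test
fio --name=hosttest --filename=/mnt/nfs-test/fio.dat --rw=randwrite --bs=4k --size=1G --ioengine=libaio --iodepth=16 --direct=1 --sync=1 --runtime=60 --time_based --group_reporting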