SAS = Too Slow, maybe NVME?

I’ve likely done something wrong with my setup, but I have 15 900GB SAS drives in my kit and a single HBA (this is on the HL15 chassis+power supply model), running Proxmox and ZFS RAIDZ1. It’s DOG slow. Need to run some perf testing, but sooooo many variables.

In my wish list, I was thinking about switching out to NVME drives because of course. But not sure if I should pony up for a new controller and 15 drives, or try to figure out what’s cooking on my spinning platters. Absent good telemetry, I know it’s an amorphous ask, but what would be GOOD performance on the above setup? What am I missing?

  • Read or write or both?
  • What is the layout of the ZFS pool re VDEVs?
  • What specific HBA?
  • Where are you measuring this? On the proxmox host, in a VM guest, or over the network?
  • What RPM are the HDDs?
  • The pool is healthy and not going degraded?

A single 7200RPM drive is going to have a sustained R/W of something like 150MB/s. SAS drives are able to burst higher than that briefly while they empty a cache, but the physical rotation of the disk is always going to be a limiting factor whether it’s SATA 3Gb/s or SAS 12 Gb/s. How the disks are arranged in the pool will affect the amount of parallelism you can expect across the drives based on the RAID level chosen.
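As a very rough back-of-envelope, assuming ~150MB/s per drive: a 15-wide RAIDZ1 has 14 data disks, so a big sequential transfer could in theory approach 14 x 150 ≈ 2000MB/s, while striped mirrors across the same disks would top out around 7 x 150 ≈ 1000MB/s for writes and roughly double that for reads. Real numbers land well below those ceilings, but it gives you an order of magnitude to compare against.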

4 Likes

I’m just going to add to @DigitalGarden’s questions: what kind of files are you transferring, and what is your workload like? Is it all kinds of tiny files, or is it more sequential large transfers?

You may not be running into a bandwidth issue; instead, you might simply not have enough IOPS for your use case with your current RAID configuration, which, again, is information that would be wonderful to know.

I’ll add

  • Workload?
  • How many users at a single time?
  • Is this over a network or local storage that’s having the slowness?
  • How much RAM is available to the system?
  • Do you have any SSD cache drives for a SLOG or a ZIL?
2 Likes

Oh wow, amazing!

  • Read or write or both? both
  • What is the layout of the ZFS pool re ZVOLs? ZFS RaidZ1 all 15 drives
  • What specific HBA? LSI 9300-16i
  • Where are you measuring this? On the proxmox host, in a VM guest, or over the network? inside a Windows VM, perf measures below
  • What RPM are the HDDs? 10K
  • The pool is healthy and not going degraded? healthy
  • Workload? windows VM, just doing random things like patching
  • How many users at a single time? just 1
  • Is this over a network or local storage that’s having the slowness? local VM to local ZFS, provided by Proxmox itself, nothing fancy like TrueNAS and passthrough (yet haha)
  • How much RAM is available to the system? 192GB on host, 16GB on VM
  • Do you have any SSD cache drives for a SLOG or a ZIL? no, and that may be a big part of the problem, couldn’t figure out how to do that in proxmox directly and thought maybe no big deal, but maybe HUGE deal haha

From within VM:
(CrystalDiskMark screenshot)

Open to thoughts on how best to test perf from Proxmox shell, maybe dd or hdparm? (Sorry, n00b)

I thought it was a great reply! Definitely curious if @Hutch-45Drives has some thoughts as well.

I wanted to run a test on a system here for you to compare to. I think most of what I said is accurate, but I don’t want to post something inaccurate and confuse anyone if I can help it.

1 Like

Sorry, I meant VDEVs.

Yes, your CDM test seems odd. You say you’re having issues with both read and write, but from that test, I think your issues are probably just on the write side, unless you have some truly unattainable expectations of read throughput from spinning disks. I’m not sure how to best drill into where the issue is, though. I could throw out all sorts of questions, but they may all be irrelevant. How familiar are you with Proxmox or ZFS? A place to start would be testing outside the VM. In the VM, what sort of Antivirus is running, just the normal Windows Defender? What version of Windows? How full is the ZFS pool?
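A few quick commands from the Proxmox shell would answer some of that (substitute your real pool name for the placeholder):

zpool status -v                                 # layout, errors, any scrub/resilver in progress
zpool list                                      # capacity and how full/fragmented the pool is
zfs get compression,recordsize,sync <poolname>  # dataset settings that affect throughput
arc_summary | head -40                          # how much RAM the ARC is actually using (ships with OpenZFS)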

I’m not saying it is your problem, but typical recommendations these days avoid RAID5/Z1 and either recommend RAIDZ2 for redundancy requirements or a striped/mirror configuration like RAID 1+0 for performance. Also, typically the recommendation is to limit the number of disks in a VDEV to around 8. For small 800GB disks, this probably isn’t as important as someone with 20TB drives, but is something to be aware of. Rules are meant to be broken, and you may have valid reasons for your setup.
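Just to illustrate what those layouts look like (device names are placeholders for /dev/disk/by-id paths, “tank” is a made-up pool name, and creating a pool wipes the disks):

# 15 disks as two 7-wide RAIDZ2 VDEVs plus a hot spare
zpool create tank raidz2 d1 d2 d3 d4 d5 d6 d7 raidz2 d8 d9 d10 d11 d12 d13 d14 spare d15

# 14 disks as striped mirrors (RAID 1+0 style) plus a hot spare
zpool create tank mirror d1 d2 mirror d3 d4 mirror d5 d6 mirror d7 d8 mirror d9 d10 mirror d11 d12 mirror d13 d14 spare d15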

I can’t replicate your system exactly, but I set up a Windows 7 VM on a system running TrueNAS Scale. It has 12x 10TB SATA drives in a single VDEV with RAIDZ2. It also has a 93xx-16i card. I don’t think the other specs are remarkable or relevant. I created the VM basically using the defaults and a Windows 7+SP1 ISO, so it also isn’t remarkable or tuned in any way.

Here is what I got.

You might want to read through this thread if you haven’t seen it already;

https://forum.45homelab.com/t/zfs-write-errors-with-hl15-full-build-and-sas-drives/827

It has some commands you can use in the Linux shell for performance testing. I don’t think the particular failure mode they were experiencing (45Drives-supplied SFF-8643 cables connecting SAS drives to the full-build motherboard) applies to you, although possible problems with those cables, assuming that is what you are using, are something to keep in the back of your mind, even though you say the pool isn’t going degraded.

I tried to google quickly for issues with Proxmox Windows guest VM disk performance, but most of what came up was about SATA SSD and NVMe drives. I do have Proxmox in the lab, but not enough drives to make a suitable comparison out of it.

1 Like

Hi @doodlemania and thank you for that info.

Based on the test you provided it seems like your reads are what we would expect but the writes are not.

I would recommend starting by testing the ZFS pool with something like an fio test to make sure we can get the same reads and writes on the pool.

Here are some tests to run directly on the pool

cd /directory/of/storage
fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
fio --name=seqread --ioengine=posixaio --rw=read --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based

The first test will show the maximum bandwidth of writes to the ZFS pool; the second will test reads.

Could you post these results?

Can you also provide “free -h” so I can see how much memory you have available?

Could you also provide more details about this VM you created?
Things like what the OS is and how you created the disk for the VM.
If it’s a Windows VM (which CrystalDiskMark makes me believe it is), did you follow Proxmox’s best practices for creating a Windows VM here: https://pve.proxmox.com/wiki/Windows_10_guest_best_practices
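If you’re not sure, dumping the VM config from the Proxmox shell will show how the disk is attached (the VM ID 100 below is just a placeholder):

qm config 100

The things to look for are the disk line (ideally scsi0 rather than ide0/sata0), the scsihw line (VirtIO SCSI), and whether you installed the VirtIO drivers inside the guest.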

This community is just amazing! Going to run the tests recommended on the host today to compare.

Agreed - it does look like my writes are the bottleneck. After I run the tests, I’ll definitely experiment with a striped mirror, fewer disks per VDEV, and, if I can figure out how to do it in Proxmox directly, getting an SSD cache (L2ARC) or SLOG going - but as DG mentioned, the cache is probably only performant for reads, not writes.
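(From what I can tell, adding those later would just be a couple of zpool add commands from the Proxmox shell; rough sketch with a placeholder pool name and device paths, so don’t take it as gospel:)

# SSD read cache (L2ARC)
zpool add <pool> cache /dev/disk/by-id/<ssd>
# SLOG for sync writes (ideally a mirrored pair of SSDs)
zpool add <pool> log mirror /dev/disk/by-id/<ssd1> /dev/disk/by-id/<ssd2>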

The super-scientific reason I started the thread (“it’s dog slow”) is predicated on basic things in Windows being painful: opening Windows Explorer, general GUI responsiveness, etc. I went down the rabbit hole of the Guest Agent being a cause as well, and it was kind of inconclusive. It’s harder to tell on Linux VMs since it’s all terminal and SSH, but Windows being unusable is what got me tinkering here.

Y’all are amazing! Back soon with some stats.

Here are some stats from the Proxmox box (haven’t changed any VDEVs yet):

Writes:
root@pve:/hdd# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=338MiB/s][w=338 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=2463525: Fri Jan 19 14:23:30 2024
write: IOPS=363, BW=363MiB/s (381MB/s)(21.3GiB/60043msec); 0 zone resets
slat (usec): min=2, max=594, avg=15.40, stdev=10.40
clat (usec): min=1495, max=242034, avg=43953.35, stdev=20686.13
lat (usec): min=1501, max=242047, avg=43968.75, stdev=20688.75
clat percentiles (msec):
| 1.00th=[ 4], 5.00th=[ 4], 10.00th=[ 4], 20.00th=[ 42],
| 30.00th=[ 47], 40.00th=[ 48], 50.00th=[ 49], 60.00th=[ 50],
| 70.00th=[ 51], 80.00th=[ 53], 90.00th=[ 59], 95.00th=[ 68],
| 99.00th=[ 100], 99.50th=[ 112], 99.90th=[ 188], 99.95th=[ 241],
| 99.99th=[ 243]
bw ( KiB/s): min=196608, max=4227072, per=100.00%, avg=372189.87, stdev=417004.55, samples=120
iops : min= 192, max= 4128, avg=363.47, stdev=407.23, samples=120
lat (msec) : 2=0.05%, 4=10.04%, 10=4.50%, 20=2.55%, 50=51.94%
lat (msec) : 100=29.97%, 250=0.94%
cpu : usr=0.67%, sys=0.04%, ctx=10923, majf=0, minf=23
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.0%, 16=50.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,21824,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=363MiB/s (381MB/s), 363MiB/s-363MiB/s (381MB/s-381MB/s), io=21.3GiB (22.9GB), run=60043-60043msec


Reads:
root@pve:/hdd# fio --name=seqread --ioengine=posixaio --rw=read --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=2528MiB/s][r=2528 IOPS][eta 00m:00s]
seqread: (groupid=0, jobs=1): err= 0: pid=2464350: Fri Jan 19 14:26:44 2024
read: IOPS=2516, BW=2516MiB/s (2638MB/s)(147GiB/60007msec)
slat (nsec): min=50, max=59082, avg=112.61, stdev=170.25
clat (usec): min=5287, max=12052, avg=6357.24, stdev=291.67
lat (usec): min=5287, max=12052, avg=6357.35, stdev=291.68
clat percentiles (usec):
| 1.00th=[ 6194], 5.00th=[ 6194], 10.00th=[ 6194], 20.00th=[ 6194],
| 30.00th=[ 6259], 40.00th=[ 6259], 50.00th=[ 6259], 60.00th=[ 6259],
| 70.00th=[ 6259], 80.00th=[ 6325], 90.00th=[ 7046], 95.00th=[ 7111],
| 99.00th=[ 7177], 99.50th=[ 7177], 99.90th=[ 7308], 99.95th=[ 7504],
| 99.99th=[ 8160]
bw ( MiB/s): min= 2368, max= 2618, per=100.00%, avg=2517.13, stdev=48.36, samples=119
iops : min= 2368, max= 2618, avg=2517.13, stdev=48.34, samples=119
lat (msec) : 10=100.00%, 20=0.01%
cpu : usr=0.35%, sys=0.19%, ctx=75498, majf=0, minf=23
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.0%, 16=50.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.0%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=150990,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
READ: bw=2516MiB/s (2638MB/s), 2516MiB/s-2516MiB/s (2638MB/s-2638MB/s), io=147GiB (158GB), run=60007-60007msec
root@pve:/hdd#


Memory:
root@pve:/hdd# free -h
total used free shared buff/cache available
Mem: 125Gi 84Gi 30Gi 146Mi 11Gi 41Gi
Swap: 8.0Gi 57Mi 7.9Gi

ACK - yeah, Windows box and yes, best practices on drivers, etc all followed!

This is so fun!

1 Like

This might be tangential:

I recently had dog-slow, inconsistent performance with a 9300-16i off eBay, especially with writes. I tried cooling it down (damn, it was hot; I thought it was throttling at first) and learned about SSD DRAT and RZAT along the way (which was a problem, but not the whole problem, which made diagnosing super confusing). The 9300-8i I already had (from a different seller) worked just fine, but I couldn’t get the 9300-16i to work even after updating the firmware; it would just drop down to 1-100MB/s writes (usually about 40-90MB/s). I’m lucky I had a working 9300-8i to confirm nothing else was the problem.

People have told me there are a lot of counterfeit cards out there with most of the same markings and such, but they just crap out performance-wise, so I figured I had a malfunctioning or counterfeit one. I ordered a 9305-16i, got it in the mail today, and it works like a champ, just like the 9300-8i.

(Running TrueNAS Scale, Intel 13400; the main array is a RAIDZ1 of 4x 8TB 7200RPM drives, with reads and writes between 275-475MB/s on the non-problem card.)

1 Like

With the results of the fio tests you sent, the max read from the pool is 2638MB/s, which is great, but the writes we saw are only 381MB/s, so it is a RAID/pool issue.

As @Rescue7 mentioned, this could result from a hot/bad HBA card. Can you change the HBA card, or try benchmarking a single drive through the HBA and then directly connected to the motherboard? I want to see if the drive has the same performance through the motherboard and the HBA, or if they are different.
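If it helps, fio can also read straight off a raw disk without touching the data on it, which is a quick way to compare one drive on the HBA vs. the motherboard ports (/dev/sdX is a placeholder for whichever disk you pick, and --readonly keeps it safe):

fio --name=rawread --filename=/dev/sdX --readonly --rw=read --bs=1M --iodepth=16 --ioengine=libaio --direct=1 --runtime=60 --time_based

A write test against a raw pool member would destroy the pool, so for writes stick to a spare disk with a throwaway filesystem on it.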

I don’t think a SLOG is needed yet, as we would normally see the reads and writes around the same, and a SLOG would help more with lots of IOPS; the fio test I gave was just a large sequential transfer.
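If you ever want to see the kind of workload where a SLOG does earn its keep, a small-block sync-write test is the thing to compare (a sketch along the same lines as the tests above; job name and file are placeholders):

fio --name=syncwrite --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=1G --iodepth=16 --fsync=1 --filename=synctest --runtime=60 --time_based

The --fsync=1 forces a flush after every write, which is what a SLOG accelerates; the 1M sequential test barely touches that path.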

Here’s the data with a single disk attached (via the HBA) and mounted as ext4:
root@pve:/mnt/testdisk1# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=731396: Fri Jan 26 15:32:12 2024
write: IOPS=128, BW=129MiB/s (135MB/s)(10.0GiB/79466msec); 0 zone resets
slat (usec): min=3, max=100, avg=35.91, stdev=11.78
clat (usec): min=1379, max=73778k, avg=59204.96, stdev=1928045.39
lat (usec): min=1447, max=73778k, avg=59240.87, stdev=1928045.19
clat percentiles (msec):
| 1.00th=[ 9], 5.00th=[ 9], 10.00th=[ 9], 20.00th=[ 9],
| 30.00th=[ 9], 40.00th=[ 9], 50.00th=[ 9], 60.00th=[ 9],
| 70.00th=[ 9], 80.00th=[ 9], 90.00th=[ 10], 95.00th=[ 10],
| 99.00th=[ 10], 99.50th=[ 11], 99.90th=[ 32], 99.95th=[17113],
| 99.99th=[17113]
bw ( MiB/s): min= 700, max= 1842, per=100.00%, avg=1705.50, stdev=319.47, samples=12
iops : min= 700, max= 1842, avg=1705.50, stdev=319.47, samples=12
lat (msec) : 2=0.01%, 10=99.07%, 20=0.69%, 50=0.16%, >=2000=0.07%
cpu : usr=0.51%, sys=4.02%, ctx=231647, majf=0, minf=27
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.0%, 16=49.9%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10241,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=129MiB/s (135MB/s), 129MiB/s-129MiB/s (135MB/s-135MB/s), io=10.0GiB (10.7GB), run=79466-79466msec

Disk stats (read/write):
sda: ios=0/10351, merge=0/204, ticks=0/5706366, in_queue=5706366, util=92.52%
root@pve:/mnt/testdisk1# fio --name=seqread --ioengine=posixaio --rw=read --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=160MiB/s][r=160 IOPS][eta 00m:00s]
seqread: (groupid=0, jobs=1): err= 0: pid=731867: Fri Jan 26 15:34:13 2024
read: IOPS=145, BW=145MiB/s (152MB/s)(8720MiB/60079msec)
slat (nsec): min=60, max=156706, avg=775.00, stdev=1769.58
clat (msec): min=55, max=193, avg=110.18, stdev=23.16
lat (msec): min=55, max=193, avg=110.18, stdev=23.16
clat percentiles (msec):
| 1.00th=[ 91], 5.00th=[ 91], 10.00th=[ 92], 20.00th=[ 92],
| 30.00th=[ 99], 40.00th=[ 99], 50.00th=[ 102], 60.00th=[ 107],
| 70.00th=[ 110], 80.00th=[ 112], 90.00th=[ 155], 95.00th=[ 161],
| 99.00th=[ 180], 99.50th=[ 184], 99.90th=[ 194], 99.95th=[ 194],
| 99.99th=[ 194]
bw ( KiB/s): min=131072, max=194560, per=100.00%, avg=148701.87, stdev=15099.46, samples=120
iops : min= 128, max= 190, avg=145.22, stdev=14.75, samples=120
lat (msec) : 100=40.73%, 250=59.27%
cpu : usr=0.11%, sys=0.06%, ctx=4360, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.0%, 16=49.9%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=8720,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
READ: bw=145MiB/s (152MB/s), 145MiB/s-145MiB/s (152MB/s-152MB/s), io=8720MiB (9144MB), run=60079-60079msec

Disk stats (read/write):
sda: ios=13914/125, merge=0/12, ticks=116384/10832, in_queue=127215, util=99.76%

It could be some time before I can get up to where I store my HL15 to test directly against the motherboard. But I can replace the HBA card when I’m up there if you think it’s bad. Just a cheap $90 job off eBay :)

“LSI 9300-16i 16-Port 12Gb/s SAS/SATA HBA ZFS TrueNAS UnRAID IT Mode 16.00.12.00” on eBay is the exact item I purchased.

The single-drive test shows you are getting what a single HDD would get.

The HBA card could be bad or have a bad port, causing the slowness. I think swapping out the HBA might be a good next step.
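Before spending more money, it might also be worth confirming the card looks right from the OS side (the PCI address below is a placeholder; use whatever the first command reports):

lspci | grep -i sas                    # find the HBA and its PCI address
lspci -vv -s 01:00.0 | grep -i lnksta  # confirm it negotiated its full PCIe speed/width

If you have Broadcom’s sas3flash utility on hand, “sas3flash -list” will also report the firmware version, in case the card is running something ancient.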

I’ve replaced the HBA with a shiny brand-new one (same exact model), and here are my results:

Single Disk Write:
root@pve:/hdd1# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=147MiB/s][w=147 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=5184: Mon Feb 5 12:42:05 2024
write: IOPS=217, BW=218MiB/s (228MB/s)(12.8GiB/60094msec); 0 zone resets
slat (usec): min=2, max=155, avg=27.07, stdev= 8.36
clat (msec): min=2, max=153, avg=73.33, stdev=45.38
lat (msec): min=2, max=153, avg=73.36, stdev=45.38
clat percentiles (msec):
| 1.00th=[ 3], 5.00th=[ 3], 10.00th=[ 3], 20.00th=[ 6],
| 30.00th=[ 31], 40.00th=[ 100], 50.00th=[ 102], 60.00th=[ 104],
| 70.00th=[ 105], 80.00th=[ 107], 90.00th=[ 110], 95.00th=[ 113],
| 99.00th=[ 121], 99.50th=[ 127], 99.90th=[ 136], 99.95th=[ 136],
| 99.99th=[ 146]
bw ( KiB/s): min=131072, max=5294080, per=100.00%, avg=223206.84, stdev=495899.63, samples=120
iops : min= 128, max= 5170, avg=217.88, stdev=484.29, samples=120
lat (msec) : 4=19.07%, 10=5.40%, 20=3.64%, 50=3.79%, 100=14.04%
lat (msec) : 250=54.06%
cpu : usr=0.79%, sys=0.09%, ctx=6384, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.0%, 16=49.9%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,13088,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=218MiB/s (228MB/s), 218MiB/s-218MiB/s (228MB/s-228MB/s), io=12.8GiB (13.7GB), run=60094-60094msec

Single Disk Read:
fio --name=seqread --ioengine=posixaio --rw=read --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqread: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=2595MiB/s][r=2595 IOPS][eta 00m:00s]
seqread: (groupid=0, jobs=1): err= 0: pid=5661: Mon Feb 5 12:44:52 2024
read: IOPS=2595, BW=2596MiB/s (2722MB/s)(152GiB/60004msec)
slat (nsec): min=50, max=93793, avg=121.68, stdev=245.01
clat (usec): min=5893, max=12053, avg=6162.69, stdev=143.23
lat (usec): min=5893, max=12053, avg=6162.81, stdev=143.24
clat percentiles (usec):
| 1.00th=[ 6063], 5.00th=[ 6063], 10.00th=[ 6063], 20.00th=[ 6128],
| 30.00th=[ 6128], 40.00th=[ 6128], 50.00th=[ 6128], 60.00th=[ 6128],
| 70.00th=[ 6194], 80.00th=[ 6194], 90.00th=[ 6194], 95.00th=[ 6259],
| 99.00th=[ 6915], 99.50th=[ 6980], 99.90th=[ 7242], 99.95th=[ 7439],
| 99.99th=[ 8356]
bw ( MiB/s): min= 2450, max= 2624, per=100.00%, avg=2596.76, stdev=25.89, samples=119
iops : min= 2450, max= 2624, avg=2596.76, stdev=25.89, samples=119
lat (msec) : 10=100.00%, 20=0.01%
cpu : usr=0.32%, sys=0.23%, ctx=77876, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.0%, 16=50.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=155744,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
READ: bw=2596MiB/s (2722MB/s), 2596MiB/s-2596MiB/s (2722MB/s-2722MB/s), io=152GiB (163GB), run=60004-60004msec

Repeating with all 15 disks, RAID Z, bottom line only:
WRITE: bw=358MiB/s (376MB/s), 358MiB/s-358MiB/s (376MB/s-376MB/s), io=21.0GiB (22.6GB), run=60032-60032msec
READ: bw=2583MiB/s (2708MB/s), 2583MiB/s-2583MiB/s (2708MB/s-2708MB/s), io=151GiB (163GB), run=60007-60007msec

And in RAID10 with 14 of the 15 disks:
WRITE: bw=1065MiB/s (1117MB/s), 1065MiB/s-1065MiB/s (1117MB/s-1117MB/s), io=62.4GiB (67.0GB), run=60015-60015msec
READ: bw=2580MiB/s (2705MB/s), 2580MiB/s-2580MiB/s (2705MB/s-2705MB/s), io=151GiB (162GB), run=60005-60005msec

Can you try testing each drive individually to see if there is a bad drive causing the performance to be so bad?

Also, can you try turning the read cache off on ZFS and running the same tests again? I think the reads are so high because you are reading from the cache and not the disks.

Here’s the full output for each drive in seqwrite. They all look okay to me (shrug)? Also, how do I turn off ZFS read cache via CLI?

root@pve:/d1# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=145MiB/s][w=145 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=242162: Tue Feb 6 10:59:38 2024
write: IOPS=218, BW=218MiB/s (229MB/s)(12.8GiB/60070msec); 0 zone resets
slat (usec): min=2, max=150, avg=22.22, stdev= 9.26
clat (usec): min=1641, max=154572, avg=73232.76, stdev=45424.49
lat (usec): min=1644, max=154599, avg=73254.99, stdev=45429.52
clat percentiles (msec):
| 1.00th=[ 3], 5.00th=[ 3], 10.00th=[ 3], 20.00th=[ 5],
| 30.00th=[ 33], 40.00th=[ 99], 50.00th=[ 102], 60.00th=[ 103],
| 70.00th=[ 105], 80.00th=[ 107], 90.00th=[ 109], 95.00th=[ 112],
| 99.00th=[ 131], 99.50th=[ 136], 99.90th=[ 153], 99.95th=[ 155],
| 99.99th=[ 155]
bw ( KiB/s): min=106496, max=5570560, per=100.00%, avg=223509.94, stdev=515155.20, samples=120
iops : min= 104, max= 5440, avg=218.24, stdev=503.09, samples=120
lat (msec) : 2=0.10%, 4=18.99%, 10=5.40%, 20=3.53%, 50=3.72%
lat (msec) : 100=15.17%, 250=53.08%
cpu : usr=0.67%, sys=0.07%, ctx=6499, majf=0, minf=24
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.0%, 16=49.9%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,13104,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=218MiB/s (229MB/s), 218MiB/s-218MiB/s (229MB/s-229MB/s), io=12.8GiB (13.7GB), run=60070-60070msec


root@pve:/d2# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=133MiB/s][w=133 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=242553: Tue Feb 6 11:00:45 2024
write: IOPS=217, BW=218MiB/s (228MB/s)(12.8GiB/60092msec); 0 zone resets
slat (usec): min=3, max=187, avg=22.88, stdev=10.01
clat (usec): min=1421, max=165698, avg=73341.85, stdev=45984.71
lat (usec): min=1471, max=165724, avg=73364.73, stdev=45987.12
clat percentiles (usec):
| 1.00th=[ 1647], 5.00th=[ 1860], 10.00th=[ 2474], 20.00th=[ 4817],
| 30.00th=[ 31065], 40.00th=[ 95945], 50.00th=[ 99091], 60.00th=[102237],
| 70.00th=[106431], 80.00th=[108528], 90.00th=[112722], 95.00th=[117965],
| 99.00th=[130548], 99.50th=[135267], 99.90th=[143655], 99.95th=[145753],
| 99.99th=[158335]
bw ( KiB/s): min=126976, max=5898240, per=100.00%, avg=223236.08, stdev=538941.07, samples=120
iops : min= 124, max= 5760, avg=217.98, stdev=526.31, samples=120
lat (msec) : 2=7.30%, 4=11.90%, 10=5.38%, 20=3.55%, 50=3.67%
lat (msec) : 100=22.20%, 250=46.01%
cpu : usr=0.65%, sys=0.06%, ctx=6346, majf=0, minf=24
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.6%, 16=49.3%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,13088,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=218MiB/s (228MB/s), 218MiB/s-218MiB/s (228MB/s-228MB/s), io=12.8GiB (13.7GB), run=60092-60092msec


root@pve:/d3# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=144MiB/s][w=144 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=242972: Tue Feb 6 11:02:07 2024
write: IOPS=218, BW=218MiB/s (229MB/s)(12.8GiB/60103msec); 0 zone resets
slat (usec): min=2, max=180, avg=23.28, stdev=11.95
clat (usec): min=1410, max=201708, avg=73226.01, stdev=45515.44
lat (usec): min=1455, max=201735, avg=73249.28, stdev=45520.30
clat percentiles (usec):
| 1.00th=[ 1631], 5.00th=[ 1762], 10.00th=[ 1811], 20.00th=[ 4228],
| 30.00th=[ 33817], 40.00th=[100140], 50.00th=[101188], 60.00th=[103285],
| 70.00th=[104334], 80.00th=[105382], 90.00th=[108528], 95.00th=[110625],
| 99.00th=[126354], 99.50th=[129500], 99.90th=[139461], 99.95th=[139461],
| 99.99th=[193987]
bw ( KiB/s): min=131072, max=6256640, per=100.00%, avg=223580.19, stdev=565697.10, samples=120
iops : min= 128, max= 6110, avg=218.32, stdev=552.44, samples=120
lat (msec) : 2=15.58%, 4=4.30%, 10=3.77%, 20=4.12%, 50=3.64%
lat (msec) : 100=9.64%, 250=58.94%
cpu : usr=0.70%, sys=0.06%, ctx=6344, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.7%, 16=49.2%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.0%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,13116,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=218MiB/s (229MB/s), 218MiB/s-218MiB/s (229MB/s-229MB/s), io=12.8GiB (13.8GB), run=60103-60103msec


root@pve:/d4# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=138MiB/s][w=138 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=243337: Tue Feb 6 11:03:20 2024
write: IOPS=215, BW=216MiB/s (226MB/s)(12.7GiB/60103msec); 0 zone resets
slat (usec): min=3, max=191, avg=22.81, stdev=11.91
clat (usec): min=993, max=163993, avg=73998.26, stdev=46154.47
lat (usec): min=1026, max=164022, avg=74021.07, stdev=46158.82
clat percentiles (usec):
| 1.00th=[ 1680], 5.00th=[ 1811], 10.00th=[ 1876], 20.00th=[ 5800],
| 30.00th=[ 31327], 40.00th=[101188], 50.00th=[103285], 60.00th=[104334],
| 70.00th=[106431], 80.00th=[107480], 90.00th=[109577], 95.00th=[111674],
| 99.00th=[124257], 99.50th=[129500], 99.90th=[135267], 99.95th=[137364],
| 99.99th=[156238]
bw ( KiB/s): min=131072, max=5896192, per=100.00%, avg=221297.00, stdev=538103.72, samples=120
iops : min= 128, max= 5758, avg=216.06, stdev=525.50, samples=120
lat (usec) : 1000=0.01%
lat (msec) : 2=13.68%, 4=5.53%, 10=5.93%, 20=3.06%, 50=3.67%
lat (msec) : 100=5.66%, 250=62.47%
cpu : usr=0.68%, sys=0.05%, ctx=6307, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.5%, 16=49.4%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=4.1%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12978,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=216MiB/s (226MB/s), 216MiB/s-216MiB/s (226MB/s-226MB/s), io=12.7GiB (13.6GB), run=60103-60103msec


root@pve:/d5# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=157MiB/s][w=157 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=243810: Tue Feb 6 11:05:00 2024
write: IOPS=216, BW=216MiB/s (227MB/s)(12.7GiB/60072msec); 0 zone resets
slat (usec): min=3, max=149, avg=29.79, stdev=10.51
clat (usec): min=1418, max=187836, avg=73852.20, stdev=46210.44
lat (usec): min=1463, max=187876, avg=73881.99, stdev=46206.37
clat percentiles (usec):
| 1.00th=[ 1582], 5.00th=[ 1762], 10.00th=[ 1860], 20.00th=[ 4015],
| 30.00th=[ 32113], 40.00th=[100140], 50.00th=[102237], 60.00th=[104334],
| 70.00th=[105382], 80.00th=[107480], 90.00th=[109577], 95.00th=[111674],
| 99.00th=[123208], 99.50th=[126354], 99.90th=[183501], 99.95th=[187696],
| 99.99th=[187696]
bw ( KiB/s): min=110592, max=6230016, per=100.00%, avg=221603.09, stdev=564262.12, samples=120
iops : min= 108, max= 6084, avg=216.39, stdev=551.04, samples=120
lat (msec) : 2=14.84%, 4=5.14%, 10=5.03%, 20=2.95%, 50=3.66%
lat (msec) : 100=8.05%, 250=60.34%
cpu : usr=0.78%, sys=0.12%, ctx=6080, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.8%, 16=49.2%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=3.8%, 16=0.4%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12994,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=216MiB/s (227MB/s), 216MiB/s-216MiB/s (227MB/s-227MB/s), io=12.7GiB (13.6GB), run=60072-60072msec


root@pve:/d6# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=128MiB/s][w=128 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=244361: Tue Feb 6 11:06:58 2024
write: IOPS=215, BW=216MiB/s (226MB/s)(12.7GiB/60106msec); 0 zone resets
slat (usec): min=2, max=208, avg=27.82, stdev=11.62
clat (usec): min=491, max=210112, avg=74033.22, stdev=46430.98
lat (usec): min=510, max=210139, avg=74061.04, stdev=46428.34
clat percentiles (usec):
| 1.00th=[ 1565], 5.00th=[ 1745], 10.00th=[ 1860], 20.00th=[ 4015],
| 30.00th=[ 31851], 40.00th=[100140], 50.00th=[102237], 60.00th=[104334],
| 70.00th=[106431], 80.00th=[107480], 90.00th=[109577], 95.00th=[111674],
| 99.00th=[125305], 99.50th=[130548], 99.90th=[135267], 99.95th=[137364],
| 99.99th=[202376]
bw ( KiB/s): min=126976, max=6213632, per=100.00%, avg=221101.12, stdev=563476.63, samples=120
iops : min= 124, max= 6068, avg=215.88, stdev=550.28, samples=120
lat (usec) : 500=0.01%, 750=0.02%
lat (msec) : 2=15.72%, 4=4.24%, 10=5.06%, 20=3.33%, 50=3.21%
lat (msec) : 100=6.94%, 250=61.48%
cpu : usr=0.80%, sys=0.05%, ctx=5964, majf=0, minf=26
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=51.7%, 16=48.2%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12971,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=216MiB/s (226MB/s), 216MiB/s-216MiB/s (226MB/s-226MB/s), io=12.7GiB (13.6GB), run=60106-60106msec


root@pve:/d7# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=142MiB/s][w=142 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=244897: Tue Feb 6 11:08:51 2024
write: IOPS=213, BW=213MiB/s (223MB/s)(12.5GiB/60084msec); 0 zone resets
slat (usec): min=2, max=147, avg=22.61, stdev=10.52
clat (usec): min=1419, max=158951, avg=74984.16, stdev=47490.70
lat (usec): min=1422, max=158978, avg=75006.77, stdev=47495.44
clat percentiles (usec):
| 1.00th=[ 1680], 5.00th=[ 1778], 10.00th=[ 1827], 20.00th=[ 3752],
| 30.00th=[ 27132], 40.00th=[103285], 50.00th=[105382], 60.00th=[106431],
| 70.00th=[108528], 80.00th=[109577], 90.00th=[111674], 95.00th=[113771],
| 99.00th=[123208], 99.50th=[130548], 99.90th=[139461], 99.95th=[141558],
| 99.99th=[152044]
bw ( KiB/s): min=129024, max=6320128, per=100.00%, avg=218313.77, stdev=572139.45, samples=120
iops : min= 126, max= 6172, avg=213.18, stdev=558.73, samples=120
lat (msec) : 2=16.55%, 4=3.80%, 10=5.16%, 20=3.32%, 50=3.09%
lat (msec) : 100=3.91%, 250=64.18%
cpu : usr=0.65%, sys=0.09%, ctx=6220, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.5%, 16=49.4%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12800,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=213MiB/s (223MB/s), 213MiB/s-213MiB/s (223MB/s-223MB/s), io=12.5GiB (13.4GB), run=60084-60084msec


root@pve:/d8# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=143MiB/s][w=143 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=245547: Tue Feb 6 11:10:59 2024
write: IOPS=212, BW=213MiB/s (223MB/s)(12.5GiB/60107msec); 0 zone resets
slat (usec): min=2, max=140, avg=21.03, stdev=10.11
clat (usec): min=321, max=157313, avg=75147.69, stdev=47652.51
lat (usec): min=348, max=157355, avg=75168.72, stdev=47658.37
clat percentiles (usec):
| 1.00th=[ 1680], 5.00th=[ 1795], 10.00th=[ 1827], 20.00th=[ 3752],
| 30.00th=[ 29230], 40.00th=[101188], 50.00th=[104334], 60.00th=[106431],
| 70.00th=[108528], 80.00th=[110625], 90.00th=[112722], 95.00th=[115868],
| 99.00th=[129500], 99.50th=[137364], 99.90th=[156238], 99.95th=[156238],
| 99.99th=[156238]
bw ( KiB/s): min=110592, max=6324224, per=100.00%, avg=217881.42, stdev=572306.51, samples=120
iops : min= 108, max= 6176, avg=212.72, stdev=558.90, samples=120
lat (usec) : 500=0.01%
lat (msec) : 2=18.06%, 4=2.28%, 10=5.06%, 20=3.26%, 50=3.22%
lat (msec) : 100=5.92%, 250=62.20%
cpu : usr=0.58%, sys=0.11%, ctx=6260, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.3%, 16=49.6%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12783,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=213MiB/s (223MB/s), 213MiB/s-213MiB/s (223MB/s-223MB/s), io=12.5GiB (13.4GB), run=60107-60107msec


root@pve:/d9# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=144MiB/s][w=144 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=245879: Tue Feb 6 11:12:03 2024
write: IOPS=212, BW=213MiB/s (223MB/s)(12.5GiB/60110msec); 0 zone resets
slat (usec): min=3, max=164, avg=23.09, stdev=10.13
clat (usec): min=1496, max=203583, avg=75078.42, stdev=47402.52
lat (usec): min=1544, max=203616, avg=75101.50, stdev=47406.81
clat percentiles (usec):
| 1.00th=[ 1745], 5.00th=[ 2638], 10.00th=[ 2769], 20.00th=[ 4621],
| 30.00th=[ 25297], 40.00th=[102237], 50.00th=[105382], 60.00th=[106431],
| 70.00th=[108528], 80.00th=[110625], 90.00th=[112722], 95.00th=[113771],
| 99.00th=[124257], 99.50th=[132645], 99.90th=[149947], 99.95th=[149947],
| 99.99th=[196084]
bw ( KiB/s): min=130810, max=5505024, per=100.00%, avg=218088.14, stdev=511365.24, samples=120
iops : min= 127, max= 5376, avg=212.96, stdev=499.38, samples=120
lat (msec) : 2=4.10%, 4=14.40%, 10=6.63%, 20=3.72%, 50=3.51%
lat (msec) : 100=4.46%, 250=63.19%
cpu : usr=0.63%, sys=0.09%, ctx=6292, majf=0, minf=26
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.4%, 16=49.5%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.0%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12794,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=213MiB/s (223MB/s), 213MiB/s-213MiB/s (223MB/s-223MB/s), io=12.5GiB (13.4GB), run=60110-60110msec


root@pve:/d10# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=144MiB/s][w=144 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=246313: Tue Feb 6 11:13:25 2024
write: IOPS=212, BW=213MiB/s (223MB/s)(12.5GiB/60107msec); 0 zone resets
slat (usec): min=3, max=146, avg=27.17, stdev=11.22
clat (usec): min=1432, max=202319, avg=75156.64, stdev=47689.37
lat (usec): min=1488, max=202346, avg=75183.81, stdev=47687.83
clat percentiles (usec):
| 1.00th=[ 1598], 5.00th=[ 1762], 10.00th=[ 1876], 20.00th=[ 3752],
| 30.00th=[ 27657], 40.00th=[103285], 50.00th=[106431], 60.00th=[107480],
| 70.00th=[108528], 80.00th=[110625], 90.00th=[111674], 95.00th=[113771],
| 99.00th=[125305], 99.50th=[130548], 99.90th=[139461], 99.95th=[141558],
| 99.99th=[196084]
bw ( KiB/s): min=131072, max=6242107, per=100.00%, avg=217685.38, stdev=565461.29, samples=120
iops : min= 128, max= 6095, avg=212.50, stdev=552.15, samples=120
lat (msec) : 2=15.83%, 4=4.54%, 10=4.55%, 20=3.64%, 50=3.62%
lat (msec) : 100=3.92%, 250=63.90%
cpu : usr=0.68%, sys=0.14%, ctx=5895, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=51.7%, 16=48.3%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.0%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12778,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=213MiB/s (223MB/s), 213MiB/s-213MiB/s (223MB/s-223MB/s), io=12.5GiB (13.4GB), run=60107-60107msec


root@pve:/d11# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=144MiB/s][w=144 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=246771: Tue Feb 6 11:14:55 2024
write: IOPS=217, BW=218MiB/s (228MB/s)(12.8GiB/60106msec); 0 zone resets
slat (usec): min=2, max=175, avg=26.71, stdev=10.76
clat (usec): min=1441, max=205190, avg=73407.28, stdev=45749.71
lat (usec): min=1496, max=205207, avg=73433.99, stdev=45748.81
clat percentiles (usec):
| 1.00th=[ 1582], 5.00th=[ 1745], 10.00th=[ 1860], 20.00th=[ 4113],
| 30.00th=[ 34866], 40.00th=[100140], 50.00th=[102237], 60.00th=[103285],
| 70.00th=[104334], 80.00th=[105382], 90.00th=[107480], 95.00th=[109577],
| 99.00th=[127402], 99.50th=[132645], 99.90th=[139461], 99.95th=[143655],
| 99.99th=[198181]
bw ( KiB/s): min=126976, max=6225920, per=100.00%, avg=223004.97, stdev=564281.03, samples=120
iops : min= 124, max= 6080, avg=217.77, stdev=551.06, samples=120
lat (msec) : 2=15.76%, 4=4.12%, 10=4.98%, 20=3.26%, 50=3.25%
lat (msec) : 100=7.61%, 250=61.01%
cpu : usr=0.75%, sys=0.09%, ctx=6082, majf=0, minf=24
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=51.5%, 16=48.4%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.0%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,13082,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=218MiB/s (228MB/s), 218MiB/s-218MiB/s (228MB/s-228MB/s), io=12.8GiB (13.7GB), run=60106-60106msec


root@pve:/d12# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=141MiB/s][w=141 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=247088: Tue Feb 6 11:16:00 2024
write: IOPS=213, BW=214MiB/s (224MB/s)(12.6GiB/60128msec); 0 zone resets
slat (usec): min=3, max=159, avg=28.28, stdev=10.16
clat (usec): min=1467, max=178457, avg=74647.99, stdev=46810.07
lat (usec): min=1525, max=178469, avg=74676.26, stdev=46805.73
clat percentiles (usec):
| 1.00th=[ 1680], 5.00th=[ 1975], 10.00th=[ 2999], 20.00th=[ 5080],
| 30.00th=[ 28443], 40.00th=[101188], 50.00th=[103285], 60.00th=[105382],
| 70.00th=[106431], 80.00th=[108528], 90.00th=[112722], 95.00th=[115868],
| 99.00th=[127402], 99.50th=[135267], 99.90th=[152044], 99.95th=[152044],
| 99.99th=[170918]
bw ( KiB/s): min=104448, max=5341184, per=100.00%, avg=219264.62, stdev=499668.31, samples=120
iops : min= 102, max= 5216, avg=214.11, stdev=487.96, samples=120
lat (msec) : 2=5.20%, 4=11.96%, 10=7.70%, 20=3.62%, 50=3.58%
lat (msec) : 100=5.22%, 250=62.72%
cpu : usr=0.76%, sys=0.07%, ctx=6141, majf=0, minf=24
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.5%, 16=49.4%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12864,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=214MiB/s (224MB/s), 214MiB/s-214MiB/s (224MB/s-224MB/s), io=12.6GiB (13.5GB), run=60128-60128msec


root@pve:/d13# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=157MiB/s][w=157 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=247471: Tue Feb 6 11:17:12 2024
write: IOPS=208, BW=209MiB/s (219MB/s)(12.2GiB/60074msec); 0 zone resets
slat (usec): min=3, max=174, avg=25.63, stdev=10.62
clat (usec): min=1438, max=162970, avg=76595.70, stdev=49210.72
lat (usec): min=1501, max=162997, avg=76621.33, stdev=49211.38
clat percentiles (usec):
| 1.00th=[ 1614], 5.00th=[ 1795], 10.00th=[ 1893], 20.00th=[ 4752],
| 30.00th=[ 23200], 40.00th=[105382], 50.00th=[108528], 60.00th=[109577],
| 70.00th=[111674], 80.00th=[112722], 90.00th=[115868], 95.00th=[119014],
| 99.00th=[135267], 99.50th=[143655], 99.90th=[160433], 99.95th=[162530],
| 99.99th=[162530]
bw ( KiB/s): min=108544, max=5865472, per=100.00%, avg=213684.09, stdev=536419.15, samples=120
iops : min= 106, max= 5728, avg=208.66, stdev=523.85, samples=120
lat (msec) : 2=14.10%, 4=4.47%, 10=6.69%, 20=3.80%, 50=3.77%
lat (msec) : 100=3.53%, 250=63.65%
cpu : usr=0.68%, sys=0.09%, ctx=5910, majf=0, minf=25
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=51.2%, 16=48.7%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12528,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=209MiB/s (219MB/s), 209MiB/s-209MiB/s (219MB/s-219MB/s), io=12.2GiB (13.1GB), run=60074-60074msec


root@pve:/d14# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=144MiB/s][w=144 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=247906: Tue Feb 6 11:18:37 2024
write: IOPS=211, BW=211MiB/s (221MB/s)(12.4GiB/60051msec); 0 zone resets
slat (usec): min=2, max=149, avg=23.04, stdev=11.83
clat (usec): min=1453, max=149346, avg=75703.21, stdev=48189.86
lat (usec): min=1505, max=149374, avg=75726.25, stdev=48194.17
clat percentiles (usec):
| 1.00th=[ 1647], 5.00th=[ 1762], 10.00th=[ 1811], 20.00th=[ 3687],
| 30.00th=[ 28181], 40.00th=[105382], 50.00th=[107480], 60.00th=[108528],
| 70.00th=[109577], 80.00th=[110625], 90.00th=[112722], 95.00th=[114820],
| 99.00th=[124257], 99.50th=[132645], 99.90th=[141558], 99.95th=[141558],
| 99.99th=[143655]
bw ( KiB/s): min=124928, max=6276854, per=100.00%, avg=216732.86, stdev=570680.03, samples=119
iops : min= 122, max= 6129, avg=211.59, stdev=557.24, samples=119
lat (msec) : 2=16.45%, 4=4.10%, 10=5.03%, 20=3.33%, 50=3.39%
lat (msec) : 100=2.78%, 250=64.91%
cpu : usr=0.62%, sys=0.11%, ctx=6117, majf=0, minf=24
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.8%, 16=49.2%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12672,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=211MiB/s (221MB/s), 211MiB/s-211MiB/s (221MB/s-221MB/s), io=12.4GiB (13.3GB), run=60051-60051msec


root@pve:/d15# fio --name=seqwrite --ioengine=posixaio --rw=write --bs=1M --numjobs=1 --size=10G --iodepth=16 --filename=seqtest --runtime=60 --time_based
seqwrite: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=posixaio, iodepth=16
fio-3.33
Starting 1 process
seqwrite: Laying out IO file (1 file / 10240MiB)
Jobs: 1 (f=1): [W(1)][100.0%][w=153MiB/s][w=153 IOPS][eta 00m:00s]
seqwrite: (groupid=0, jobs=1): err= 0: pid=248315: Tue Feb 6 11:20:01 2024
write: IOPS=212, BW=213MiB/s (223MB/s)(12.5GiB/60108msec); 0 zone resets
slat (usec): min=2, max=151, avg=23.89, stdev=11.98
clat (usec): min=1442, max=150546, avg=75108.40, stdev=47642.38
lat (usec): min=1492, max=150574, avg=75132.30, stdev=47645.73
clat percentiles (usec):
| 1.00th=[ 1647], 5.00th=[ 1778], 10.00th=[ 1827], 20.00th=[ 3785],
| 30.00th=[ 28967], 40.00th=[ 98042], 50.00th=[103285], 60.00th=[105382],
| 70.00th=[108528], 80.00th=[111674], 90.00th=[115868], 95.00th=[116917],
| 99.00th=[127402], 99.50th=[135267], 99.90th=[143655], 99.95th=[145753],
| 99.99th=[145753]
bw ( KiB/s): min=102400, max=6258688, per=100.00%, avg=218045.62, stdev=566781.82, samples=120
iops : min= 100, max= 6112, avg=212.90, stdev=553.50, samples=120
lat (msec) : 2=16.08%, 4=4.24%, 10=5.15%, 20=3.18%, 50=3.25%
lat (msec) : 100=11.79%, 250=56.30%
cpu : usr=0.67%, sys=0.09%, ctx=6136, majf=0, minf=26
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=50.9%, 16=49.1%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.8%, 8=0.1%, 16=4.2%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,12784,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
WRITE: bw=213MiB/s (223MB/s), 213MiB/s-213MiB/s (223MB/s-223MB/s), io=12.5GiB (13.4GB), run=60108-60108msec

Looking at the results, the individual drives are all getting about the same in terms of reads and writes, so it’s probably the ZFS read cache bumping up the numbers.

You can run the fio test again with the “--direct=1” option, which will not use the ZFS cache.
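To answer your earlier question about doing it from the CLI: you can also tell ZFS not to cache file data for the test and then put it back afterwards (substitute your real pool/dataset name):

zfs set primarycache=metadata <poolname>
# ...run the read test...
zfs set primarycache=all <poolname>

That keeps metadata cached but forces data reads to actually hit the disks.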

Also, what is your ZFS pool setup like again? Is it just a RAIDZ2?

RAIDZ and mirror are the two most recent sets I’ve been doing. Here come the new numbers (shortly).