I have an interesting situation.
I am in the process of moving from multiple separate Synology NAS units to a new 45HomeLab HL15 (1.0) running TrueNAS SCALE 24.05.01.
ISSUE:
I started moving over 100TB of data from my Synology units to the new system, using NFSv4 mounts on the Synology side to mount the NFS shares exported by the HL15.
The data copied over without issue, no errors. Transfers ran at only ~111 MB/s because of the 1GbE connections on the Synology units.
I then started performing CRC checks of the files over Windows SMB, using two computers concurrently. My desktop and laptop each have only a 1GbE port, so the two machines together read at about 2.0 Gb/s during the CRC checks.
I began getting frequent disk errors on both the drives connected to the motherboard's SATA controller and the drives connected to the SAS controller. The TrueNAS reporting page shows that every time a drive timed out during SMB transfers, data transfer stopped for that pool.
The drive errors MOSTLY occur during SMB transfers, but they are not exclusive to SMB: I got one drive error over NFS while mapping the same dataset and performing CRCs on the same files, and even after I stopped using SMB shares on the Windows machine entirely and mounted my TrueNAS NFS share on Windows 11, I again got a single error.
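Since the errors surface during CRC passes over the network, one control I can run is the same integrity check locally on the TrueNAS box itself, taking SMB, NFS, and the network out of the loop entirely. A minimal sketch, assuming GNU coreutils (present on SCALE); the paths are placeholders, not my real dataset names:

```shell
#!/bin/sh
# Build a checksum manifest on the server, then re-verify it from any mount.
# Directory and manifest paths below are illustrative placeholders.

make_manifest() {   # make_manifest <dir> <manifest-file>
  # Hash every regular file under <dir>; -print0/-0 keeps odd filenames safe.
  ( cd "$1" && find . -type f -print0 | xargs -0 -r sha256sum ) > "$2"
}

verify_manifest() { # verify_manifest <dir> <manifest-file (absolute path)>
  # Re-read every file and compare against the stored hashes.
  ( cd "$1" && sha256sum -c --quiet "$2" )
}

# Example (run on the TrueNAS host):
#   make_manifest   /mnt/volume2/data /tmp/volume2.sha256
#   verify_manifest /mnt/volume2/data /tmp/volume2.sha256
```

Running the verify step once locally and then again over each client mount would separate on-disk problems from transport problems.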
I am not sure why I am getting these errors. I am leaning toward bad cables or a bad backplane.
This weekend I plan to move one of my pools (the 5x-drive pool that seemed to get the most frequent errors) to my Dell SAS expander, since the SSDs attached to it do not appear to have had any issues yet. This bypasses the HL15's factory cables and the backplane. If the errors go away, that would point to a cable or backplane problem. If the errors continue even on the SAS expander, that would point more toward a drive issue or a power supply issue (the SAS expander has a separate Corsair XM1000 power supply).
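Before moving anything, I can also snapshot SMART attribute 199 (UDMA_CRC_Error_Count) on each drive, since that counter increments on link-level corruption between drive and controller and is a fairly direct cable/backplane indicator. A sketch of the check; the device list is an example drawn from my lsscsi output, and smartctl ships with TrueNAS SCALE:

```shell
#!/bin/sh
# Pull the raw UDMA CRC error counter out of `smartctl -A` output.
# In the attribute table, field 2 is ATTRIBUTE_NAME and field 10 is RAW_VALUE.
crc_count() {
  awk '$2 == "UDMA_CRC_Error_Count" { print $10 }'
}

# On the live system (device names are examples):
#   for d in /dev/sdb /dev/sdh /dev/sdj /dev/sdk /dev/sdl; do
#     printf '%s: ' "$d"; smartctl -A "$d" | crc_count
#   done
```

If the counter climbs while the timeouts occur, the corruption is on the SATA link itself; if it stays flat through the errors, the link data path is probably clean and the problem is more likely elsewhere (drive, power, or controller).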
System:
HL15 (1.0) fully burnt-in
SuperMicro X11SPH-nCTPF
Corsair XM1000
Xeon Silver
128GB ECC RAM
Added cards:
Nvidia RTX A400
LSI 9400-8e tied to a Dell N4C2D JBOD expander board (24 internal / 12 external SAS2 6Gbps lanes)
3x Micron 1.92TB SSDs
LSI 9400-8i replacing the motherboard's integrated Intel 3008 controller; the 3008 is physically disabled by a motherboard jumper
4-port 1GbE NIC (Intel i350 chipset)
#########################################
HDD details
slots 10 through 15 [running through LSI 9400-8i]: WD Gold 18TB drives (WDC WD181KRYZ-01 Firmware: 1H01)
slots 1 through 5 [running through motherboard SATA controller]: WD Gold 18TB drives (WDC WD181KRYZ-01 Firmware: 1H01)
slot 6 [running through motherboard SATA controller]: WD purple 8TB drive (WDC WD82PURZ-85T Firmware: 0A82)
#########################################
root@truenas[~]# lsscsi
[0:0:0:0] disk ATA WDC WD181KRYZ-01 1H01 /dev/sda /volume2
[0:0:1:0] disk ATA WDC WD82PURZ-85T 0A82 /dev/sdg /volume5
[0:0:2:0] disk ATA WDC WD181KRYZ-01 1H01 /dev/sdi /volume2
[0:0:3:0] disk ATA WDC WD181KRYZ-01 1H01 /dev/sdc /volume2
[0:0:4:0] disk ATA WDC WD181KRYZ-01 1H01 /dev/sdd /volume2
[0:0:5:0] disk ATA WDC WD181KRYZ-01 1H01 /dev/sdf /volume2
[0:0:6:0] disk ATA WDC WD181KRYZ-01 1H01 /dev/sde /volume2
[0:0:7:0] enclosu BROADCOM VirtualSES 03 -
[3:0:0:0] enclosu AHCI SGPIO Enclosure 2.00 -
[4:0:0:0] disk ATA WDC WD181KRYZ-01 1H01 /dev/sdb /volume3
[5:0:0:0] disk ATA WDC WD181KRYZ-01 1H01 /dev/sdk /volume3
**[6:0:0:0] disk ATA WDC WD181KRYZ-01 1H01 /dev/sdl /volume3
[7:0:0:0] disk ATA WDC WD181KRYZ-01 1H01 /dev/sdh /volume3
**[8:0:0:0] disk ATA WDC WD181KRYZ-01 1H01 /dev/sdj /volume3
[12:0:0:0] enclosu AHCI SGPIO Enclosure 2.00 -
[13:0:0:0] disk ATA Micron_5400_MTFD U002 /dev/sdm /volume1
[13:0:1:0] disk ATA Micron_5400_MTFD U002 /dev/sdn /volume1
[13:0:2:0] disk ATA Micron_5400_MTFD U002 /dev/sdo /volume1
[13:0:3:0] enclosu Dell SAS EXP V0110 0500 -
[14:0:0:0] cd/dvd JetKVM Virtual Media /dev/sr0
[N:0:1:1] disk KINGSTON SNV3S1000G__1 /dev/nvme0n1
#########################################
root@truenas[~]# find -L /sys/bus/pci/devices/*/ata*/host*/target* -maxdepth 3 -name "sd*" 2>/dev/null | egrep block |egrep --colour '(ata[0-9]*)|(sd.*)'
/sys/bus/pci/devices/0000:00:17.0/ata3/host4/target4:0:0/4:0:0:0/block/sdb
/sys/bus/pci/devices/0000:00:17.0/ata4/host5/target5:0:0/5:0:0:0/block/sdk
/sys/bus/pci/devices/0000:00:17.0/ata5/host6/target6:0:0/6:0:0:0/block/sdl
/sys/bus/pci/devices/0000:00:17.0/ata6/host7/target7:0:0/7:0:0:0/block/sdh
/sys/bus/pci/devices/0000:00:17.0/ata7/host8/target8:0:0/8:0:0:0/block/sdj
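To tie a failing ata port back to a physical slot before pulling cables, the kernel device names above can be cross-referenced with drive serial numbers. A sketch; the parsing assumes the standard "Serial Number:" line that `smartctl -i` prints:

```shell
#!/bin/sh
# Extract the serial number from `smartctl -i` output so each /dev/sdX can be
# matched against the label on the physical drive.
serial_of() {
  awk -F: '/^Serial Number/ { gsub(/ /, "", $2); print $2 }'
}

# On the live system:
#   for d in /dev/sd?; do
#     printf '%s %s\n' "$d" "$(smartctl -i "$d" | serial_of)"
#   done
# /dev/disk/by-path/ also shows which controller and phy each disk hangs off:
#   ls -l /dev/disk/by-path/ | grep -v part
```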
#########################################
I have four pools (I manually added the pool names to the lsscsi output above):
volume1 is for my apps and uses the 3x SSDs in Z1
volume2 is for some of my data: 6x drives in Z1
volume3 is for the rest of my data: 5x drives in Z1
volume5 is a single drive for Frigate surveillance recording
root@truenas[~]# zpool status
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Sun Jul 6 03:45:06 2025
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0     0

errors: No known data errors

  pool: volume1
 state: ONLINE
  scan: scrub repaired 0B in 00:22:44 with 0 errors on Sun Jun 15 06:10:57 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        volume1                                   ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            4596200f-3a52-4c76-a9db-9cc9faabaa9b  ONLINE       0     0     0
            49834b66-f515-4880-8109-b1f947a3365f  ONLINE       0     0     0
            4efb7611-233a-4c98-88c8-e325e4018666  ONLINE       0     0     0

errors: No known data errors

  pool: volume2
 state: ONLINE
  scan: scrub repaired 0B in 07:12:40 with 0 errors on Sun Jun 15 13:24:50 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        volume2                                   ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            4c17130c-18e1-4039-a1c9-84746d532a64  ONLINE       0     0     0
            08d3a783-bd44-47ce-9fe2-c609d27b2068  ONLINE       0     0     0
            b3c7e3f4-86fe-45bd-8b10-c520ddf81377  ONLINE       0     0     0
            385b8d00-c5b3-4d72-a75d-3bffdcda5abe  ONLINE       0     0     0
            bcd7a35c-a74e-4dba-8033-912853232182  ONLINE       0     0     0
            c5d9c40a-0cee-480d-a1fd-170ae6efe486  ONLINE       0     0     0

errors: No known data errors

  pool: volume3
 state: ONLINE
config:

        NAME                                      STATE     READ WRITE CKSUM
        volume3                                   ONLINE       0     0     0
          raidz1-0                                ONLINE       0     0     0
            bc6cf883-dde3-4a60-9f74-8fc3f389faab  ONLINE       0     0     0
            8fe3c3d5-4213-4412-b162-081cd3580650  ONLINE       0     0     0
            d4140d27-be38-412f-843e-cf420187ee64  ONLINE       0     0     0
            004f830a-ac49-44e0-b0d3-d9f84a03288e  ONLINE       0     0     0
            1e1c7c7b-7c60-4602-8546-893c94f365bb  ONLINE       0     0     0

errors: No known data errors

  pool: volume5
 state: ONLINE
config:

        NAME                                      STATE     READ WRITE CKSUM
        volume5                                   ONLINE       0     0     0
          c7f8210f-a2a3-4e43-bf97-be50d99620da    ONLINE       0     0     0

errors: No known data errors
#########################################
root@truenas[~]# dmesg | grep ata5
[ 2.764142] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2.764597] ata5.00: ATA-11: WDC WD181KRYZ-01AGBB0, 01.01H01, max UDMA/133
[ 2.772909] ata5.00: 35156656128 sectors, multi 16: LBA48 NCQ (depth 32), AA
[ 2.774587] ata5.00: Features: NCQ-sndrcv NCQ-prio
[ 2.787591] ata5.00: configured for UDMA/133
[829544.395755] ata5.00: exception Emask 0x0 SAct 0x1880002 SErr 0x0 action 0x6 frozen
[829544.396278] ata5.00: failed command: READ FPDMA QUEUED
[829544.396760] ata5.00: cmd 60/00:08:f8:48:12/08:00:32:07:00/40 tag 1 ncq dma 1048576 in
[829544.397713] ata5.00: status: { DRDY }
[829544.398181] ata5.00: failed command: READ FPDMA QUEUED
[829544.398648] ata5.00: cmd 60/00:98:a8:34:0c/05:00:96:07:00/40 tag 19 ncq dma 655360 in
[829544.399597] ata5.00: status: { DRDY }
[829544.400086] ata5.00: failed command: READ FPDMA QUEUED
[829544.400566] ata5.00: cmd 60/40:b8:78:48:12/00:00:32:07:00/40 tag 23 ncq dma 32768 in
[829544.401580] ata5.00: status: { DRDY }
[829544.402080] ata5.00: failed command: READ FPDMA QUEUED
[829544.402575] ata5.00: cmd 60/40:c0:b8:48:12/00:00:32:07:00/40 tag 24 ncq dma 32768 in
[829544.403375] ata5.00: status: { DRDY }
[829544.403799] ata5: hard resetting link
[829544.718017] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[829544.780035] ata5.00: configured for UDMA/133
[829544.780058] ata5: EH complete
[841426.885067] ata5.00: exception Emask 0x0 SAct 0xc84 SErr 0x0 action 0x6 frozen
[841426.885789] ata5.00: failed command: READ FPDMA QUEUED
[841426.886513] ata5.00: cmd 60/40:10:88:0c:e2/05:00:b2:07:00/40 tag 2 ncq dma 688128 in
[841426.887991] ata5.00: status: { DRDY }
[841426.888743] ata5.00: failed command: READ FPDMA QUEUED
[841426.889397] ata5.00: cmd 60/40:38:b0:d7:18/00:00:d2:01:00/40 tag 7 ncq dma 32768 in
[841426.890218] ata5.00: status: { DRDY }
[841426.890626] ata5.00: failed command: READ FPDMA QUEUED
[841426.891026] ata5.00: cmd 60/40:50:30:d8:18/00:00:d2:01:00/40 tag 10 ncq dma 32768 in
[841426.891843] ata5.00: status: { DRDY }
[841426.892245] ata5.00: failed command: READ FPDMA QUEUED
[841426.892645] ata5.00: cmd 60/00:58:48:11:e2/08:00:b2:07:00/40 tag 11 ncq dma 1048576 in
[841426.893482] ata5.00: status: { DRDY }
[841426.893890] ata5: hard resetting link
[841427.203255] ata5: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[841427.260908] ata5.00: configured for UDMA/133
[841427.260954] ata5: EH complete
root@truenas[~]# dmesg | grep ata7
[ 2.454588] ata7: SATA max UDMA/133 abar m524288@0xaa700000 port 0xaa700300 irq 186 lpm-pol 0
[ 2.764061] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2.764516] ata7.00: ATA-11: WDC WD181KRYZ-01AGBB0, 01.01H01, max UDMA/133
[ 2.772888] ata7.00: 35156656128 sectors, multi 16: LBA48 NCQ (depth 32), AA
[ 2.774568] ata7.00: Features: NCQ-sndrcv NCQ-prio
[ 2.787810] ata7.00: configured for UDMA/133
[838055.874386] ata7.00: exception Emask 0x0 SAct 0x6018001f SErr 0x0 action 0x6 frozen
[838055.875473] ata7.00: failed command: READ FPDMA QUEUED
[838055.876346] ata7.00: cmd 60/80:00:10:be:4b/00:00:d4:07:00/40 tag 0 ncq dma 65536 in
[838055.877996] ata7.00: status: { DRDY }
[838055.878864] ata7.00: failed command: READ FPDMA QUEUED
[838055.879708] ata7.00: cmd 60/80:08:10:bd:4b/00:00:d4:07:00/40 tag 1 ncq dma 65536 in
[838055.881462] ata7.00: status: { DRDY }
[838055.882372] ata7.00: failed command: READ FPDMA QUEUED
[838055.883284] ata7.00: cmd 60/40:10:98:ac:4b/00:00:d4:07:00/40 tag 2 ncq dma 32768 in
[838055.885161] ata7.00: status: { DRDY }
[838055.886095] ata7.00: failed command: READ FPDMA QUEUED
[838055.887068] ata7.00: cmd 60/40:18:58:ac:4b/00:00:d4:07:00/40 tag 3 ncq dma 32768 in
[838055.889482] ata7.00: status: { DRDY }
[838055.890917] ata7.00: failed command: READ FPDMA QUEUED
[838055.891912] ata7.00: cmd 60/00:20:d0:c3:4b/01:00:d4:07:00/40 tag 4 ncq dma 131072 in
[838055.893768] ata7.00: status: { DRDY }
[838055.894705] ata7.00: failed command: READ FPDMA QUEUED
[838055.895508] ata7.00: cmd 60/80:98:58:af:4b/00:00:d4:07:00/40 tag 19 ncq dma 65536 in
[838055.897075] ata7.00: status: { DRDY }
[838055.897911] ata7.00: failed command: READ FPDMA QUEUED
[838055.898737] ata7.00: cmd 60/c0:a0:d0:b3:4b/00:00:d4:07:00/40 tag 20 ncq dma 98304 in
[838055.900235] ata7.00: status: { DRDY }
[838055.900947] ata7.00: failed command: READ FPDMA QUEUED
[838055.901662] ata7.00: cmd 60/00:e8:00:a1:08/04:00:fd:06:00/40 tag 29 ncq dma 524288 in
[838055.903131] ata7.00: status: { DRDY }
[838055.903847] ata7.00: failed command: READ FPDMA QUEUED
[838055.904500] ata7.00: cmd 60/40:f0:90:b6:4b/00:00:d4:07:00/40 tag 30 ncq dma 32768 in
[838055.905837] ata7.00: status: { DRDY }
[838055.906517] ata7: hard resetting link
[838056.216634] ata7: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[838056.293820] ata7.00: configured for UDMA/133
[838056.293845] ata7: EH complete
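Every failed command in the logs above is READ FPDMA QUEUED, i.e. an NCQ read, so one low-risk experiment before swapping hardware is to disable NCQ on a suspect drive and see whether the timeouts stop. This is a common libata troubleshooting step, not a fix; a sketch (device name is an example, run as root):

```shell
# Setting the SCSI queue depth to 1 disables NCQ for that drive at runtime.
echo 1 > /sys/block/sdl/device/queue_depth

# Confirm the current depth:
cat /sys/block/sdl/device/queue_depth

# To disable NCQ for specific ports persistently, libata accepts a kernel
# boot parameter, e.g.:  libata.force=5.00:noncq,7.00:noncq
```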
I am also getting power-on device reset errors for drives [0:0:2:0], [0:0:3:0], [0:0:4:0], and [0:0:5:0] which are different from the ata5 and ata7 errors above.
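To keep score while testing this weekend, the link resets can be tallied per ata port straight from dmesg. A quick sketch:

```shell
#!/bin/sh
# Count hard link resets per ata port. A port that dominates the tally points
# at that specific cable/backplane lane rather than the whole controller.
count_resets() {
  grep 'hard resetting link' | grep -Eo 'ata[0-9]+' | sort | uniq -c | sort -rn
}

# Live usage (as root):
#   dmesg | count_resets
```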