Yep, same issue. Here’s a sample:
Dec 26 11:23:22 arnold kernel: mpt3sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Dec 26 11:23:22 arnold kernel: sd 2:0:8:0: [sdf] tag#853 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
Dec 26 11:23:22 arnold kernel: sd 2:0:8:0: [sdf] tag#853 CDB: Write(16) 8a 00 00 00 00 00 01 c0 88 f8 00 00 01 40 00 00
Dec 26 11:23:22 arnold kernel: blk_update_request: I/O error, dev sdf, sector 29395192 op 0x1:(WRITE) flags 0x700 phys_seg 18 prio class 0
Dec 26 11:23:22 arnold kernel: zio pool=pool0 vdev=/dev/disk/by-vdev/1-15-part1 error=5 type=2 offset=15049289728 size=163840 flags=40080caa
Dec 26 11:24:29 arnold kernel: mpt3sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Dec 26 11:24:29 arnold kernel: sd 2:0:8:0: [sdf] tag#901 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
Dec 26 11:24:29 arnold kernel: sd 2:0:8:0: [sdf] tag#901 CDB: Write(16) 8a 00 00 00 00 00 04 17 fd 10 00 00 01 40 00 00
Dec 26 11:24:29 arnold kernel: blk_update_request: I/O error, dev sdf, sector 68680976 op 0x1:(WRITE) flags 0x700 phys_seg 12 prio class 0
Dec 26 11:24:29 arnold kernel: zio pool=pool0 vdev=/dev/disk/by-vdev/1-15-part1 error=5 type=2 offset=35163611136 size=163840 flags=40080caa
Dec 26 11:29:17 arnold kernel: mpt3sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Dec 26 11:29:17 arnold kernel: sd 2:0:9:0: [sda] tag#985 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
Dec 26 11:29:17 arnold kernel: sd 2:0:9:0: [sda] tag#985 CDB: Write(16) 8a 00 00 00 00 00 01 74 28 00 00 00 01 40 00 00
Dec 26 11:29:17 arnold kernel: blk_update_request: I/O error, dev sda, sector 24389632 op 0x1:(WRITE) flags 0x700 phys_seg 7 prio class 0
Dec 26 11:29:17 arnold kernel: zio pool=pool0 vdev=/dev/disk/by-vdev/1-11-part1 error=5 type=2 offset=12486443008 size=163840 flags=40080caa
Dec 26 11:29:18 arnold kernel: mpt3sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Dec 26 11:29:18 arnold kernel: sd 2:0:9:0: [sda] tag#965 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
Dec 26 11:29:18 arnold kernel: sd 2:0:9:0: [sda] tag#965 CDB: Write(16) 8a 00 00 00 00 00 01 78 dd 40 00 00 01 40 00 00
Dec 26 11:29:18 arnold kernel: blk_update_request: I/O error, dev sda, sector 24698176 op 0x1:(WRITE) flags 0x700 phys_seg 18 prio class 0
Dec 26 11:29:18 arnold kernel: zio pool=pool0 vdev=/dev/disk/by-vdev/1-11-part1 error=5 type=2 offset=12644417536 size=163840 flags=40080caa
Dec 26 11:29:21 arnold kernel: mpt3sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Dec 26 11:29:21 arnold kernel: sd 2:0:9:0: [sda] tag#1014 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
Dec 26 11:29:21 arnold kernel: sd 2:0:9:0: [sda] tag#1014 CDB: Write(16) 8a 00 00 00 00 00 01 83 bb 00 00 00 01 40 00 00
Dec 26 11:29:21 arnold kernel: blk_update_request: I/O error, dev sda, sector 25410304 op 0x1:(WRITE) flags 0x700 phys_seg 18 prio class 0
Dec 26 11:29:21 arnold kernel: zio pool=pool0 vdev=/dev/disk/by-vdev/1-11-part1 error=5 type=2 offset=13009027072 size=163840 flags=40080caa
Dec 26 11:29:22 arnold kernel: mpt3sas_cm0: log_info(0x31120303): originator(PL), code(0x12), sub_code(0x0303)
Dec 26 11:29:22 arnold kernel: sd 2:0:9:0: [sda] tag#968 FAILED Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK cmd_age=0s
Dec 26 11:29:22 arnold kernel: sd 2:0:9:0: [sda] tag#968 CDB: Write(16) 8a 00 00 00 00 00 01 87 ba 00 00 00 01 40 00 00
Dec 26 11:29:22 arnold kernel: blk_update_request: I/O error, dev sda, sector 25672192 op 0x1:(WRITE) flags 0x700 phys_seg 15 prio class 0
Dec 26 11:29:22 arnold kernel: zio pool=pool0 vdev=/dev/disk/by-vdev/1-11-part1 error=5 type=2 offset=13143113728 size=163840 flags=40080caa
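For anyone cross-checking these lines, here is a minimal Python sketch that decodes the two hex fields in the sample. It assumes the usual MPT `log_info` bit layout (4-bit bus type, 4-bit originator, 8-bit code, 16-bit sub-code) and the standard WRITE(16) CDB layout (8-byte LBA at bytes 2-9, 4-byte transfer length at bytes 10-13). The function names are mine, not from any library.

```python
def decode_log_info(value):
    """Split an MPT log_info word into the fields the driver prints (assumed layout)."""
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": (value >> 24) & 0xF,  # 1 is printed as "PL" in the log above
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


def decode_write16(cdb_hex):
    """Return (starting LBA, transfer length in blocks) from a WRITE(16) CDB."""
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")
    blocks = int.from_bytes(cdb[10:14], "big")
    return lba, blocks


LOGICAL_BLOCK = 512  # "Logical block size: 512 bytes" in the smartctl output

info = decode_log_info(0x31120303)
lba, blocks = decode_write16("8a 00 00 00 00 00 01 c0 88 f8 00 00 01 40 00 00")
size_bytes = blocks * LOGICAL_BLOCK

# Gap between the disk byte address and the zio offset from the first sdf error.
# A 1 MiB gap would be consistent with -part1 starting at sector 2048, but the
# partition table is not shown here, so treat that as an assumption.
gap = lba * LOGICAL_BLOCK - 15049289728

print(info)                     # originator 1, code 0x12, sub_code 0x0303
print(lba, blocks, size_bytes)  # 29395192 320 163840
print(gap)                      # 1048576
```

The decoded LBA matches the `sector 29395192` in the `blk_update_request` line, and 320 blocks of 512 bytes matches the `size=163840` in the zio line, so all five lines of each group describe the same failed write.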
SMART health status shows OK on both drives (no self-tests have been logged). Acknowledged that ZFS can expose issues that SMART may not.
# smartctl -x /dev/sdf
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.18.0-513.9.1.el8_9.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HGST
Product: HUH728080AL5200
Revision: A515
Compliance: SPC-4
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000cca23b025958
Serial number: 2EG191HJ
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Tue Dec 26 11:37:14 2023 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Read Cache is: Enabled
Writeback Cache is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 30 C
Drive Trip Temperature: 85 C
Manufactured in week 48 of year 2014
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 8
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 127690
Elements in grown defect list: 0
Vendor (Seagate Cache) information
Blocks sent to initiator = 620007469875200
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0      13536       1645.504           0
write:         0        0         0         0     301955       1839.323           0
verify:        0        0         0         0      94507          0.000           0
Non-medium error count: 0
No Self-tests have been logged
Background scan results log
Status: scan is active
Accumulated power on time, hours:minutes 19288:53 [1157333 minutes]
Number of background scans performed: 116, scan progress: 1.61%
Number of background medium scans performed: 116
Protocol Specific port log page for SAS SSP
relative target port id = 1
generation code = 1
number of phys = 1
phy identifier = 0
attached device type: SAS or SATA device
attached reason: loss of dword synchronization
reason: unknown
negotiated logical link rate: phy enabled; 12 Gbps
attached initiator port: ssp=1 stp=1 smp=1
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x5000cca23b025959
attached SAS address = 0x500304801cfed211
attached phy identifier = 6
Invalid DWORD count = 181
Running disparity error count = 173
Loss of DWORD synchronization = 12
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 181
Running disparity error count: 173
Loss of dword synchronization count: 12
Phy reset problem count: 0
relative target port id = 2
generation code = 1
number of phys = 1
phy identifier = 1
attached device type: no device attached
attached reason: unknown
reason: power on
negotiated logical link rate: phy enabled; unknown
attached initiator port: ssp=0 stp=0 smp=0
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x5000cca23b02595a
attached SAS address = 0x0
attached phy identifier = 0
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 0
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 0
Phy reset problem count: 0
# smartctl -x /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-4.18.0-513.9.1.el8_9.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HGST
Product: HUH728080AL5200
Revision: A515
Compliance: SPC-4
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000cca23b05d224
Serial number: 2EG367EJ
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Tue Dec 26 11:37:47 2023 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Read Cache is: Enabled
Writeback Cache is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Current Drive Temperature: 29 C
Drive Trip Temperature: 85 C
Manufactured in week 48 of year 2014
Specified cycle count over device lifetime: 50000
Accumulated start-stop cycles: 8
Specified load-unload count over device lifetime: 600000
Accumulated load-unload cycles: 127563
Elements in grown defect list: 0
Vendor (Seagate Cache) information
Blocks sent to initiator = 425305076400128
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0       7595       1612.876           0
write:         0        0         0         0     169386       1494.826           0
verify:        0        0         0         0      39699          0.000           0
Non-medium error count: 0
No Self-tests have been logged
Background scan results log
Status: scan is active
Accumulated power on time, hours:minutes 19289:05 [1157345 minutes]
Number of background scans performed: 116, scan progress: 0.65%
Number of background medium scans performed: 116
Protocol Specific port log page for SAS SSP
relative target port id = 1
generation code = 1
number of phys = 1
phy identifier = 0
attached device type: SAS or SATA device
attached reason: loss of dword synchronization
reason: unknown
negotiated logical link rate: phy enabled; 12 Gbps
attached initiator port: ssp=1 stp=1 smp=1
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x5000cca23b05d225
attached SAS address = 0x500304801cfed212
attached phy identifier = 2
Invalid DWORD count = 812
Running disparity error count = 661
Loss of DWORD synchronization = 15
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 812
Running disparity error count: 661
Loss of dword synchronization count: 15
Phy reset problem count: 0
relative target port id = 2
generation code = 1
number of phys = 1
phy identifier = 1
attached device type: no device attached
attached reason: unknown
reason: power on
negotiated logical link rate: phy enabled; unknown
attached initiator port: ssp=0 stp=0 smp=0
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x5000cca23b05d226
attached SAS address = 0x0
attached phy identifier = 0
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 0
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 0
Phy reset problem count: 0
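The part of this output that stands out to me is the SAS phy counters rather than the media counters. Below is a rough Python sketch for pulling them out of saved `smartctl -x` text so they can be compared across drives or over time. The function name and the choice of counters are my own. As far as I know these counters are cumulative, so a single snapshot may say less than the change between two snapshots.

```python
import re

# Counter names as they appear in the "Protocol Specific port log page" above.
PHY_COUNTERS = (
    "Invalid DWORD count",
    "Running disparity error count",
    "Loss of DWORD synchronization",
    "Phy reset problem",
)


def phy_error_counters(smartctl_text):
    """Return one dict of link error counters per 'relative target port id' section."""
    ports = []
    for raw in smartctl_text.splitlines():
        line = raw.strip()
        if line.startswith("relative target port id"):
            ports.append({})  # start a new port section
            continue
        if not ports:
            continue  # still in the part of the output before the port log page
        match = re.fullmatch(r"(.+?) = (\d+)", line)
        if match and match.group(1) in PHY_COUNTERS:
            ports[-1][match.group(1)] = int(match.group(2))
    return ports


# Excerpt copied from the /dev/sda output above.
sample = """\
relative target port id = 1
Invalid DWORD count = 812
Running disparity error count = 661
Loss of DWORD synchronization = 15
Phy reset problem = 0
relative target port id = 2
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 0
Phy reset problem = 0
"""

for index, counters in enumerate(phy_error_counters(sample), start=1):
    print(index, counters)
```

Port 1 is the connected port on both drives, and it is the only one with nonzero counts (181/173/12 on sdf, 812/661/15 on sda), while the grown defect list and uncorrected error totals are all zero.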
I had an issue with 1-15 before this and swapped the drive out, then noticed errors on both 1-11 and 1-15 this morning. So far the drive swap count is:
All drives are the same make and model: old HGST 8TB helium (He) drives from an OpenStack environment. They have some hours on them (about 19,289 power-on hours each, per the SMART output above), but they are nowhere near their EOL.
I have a ton of other disks I can swap in (2TB and 4TB WD Enterprise, a brand-new 8TB Seagate IronWolf, etc.), but at some point we’ll have to admit it’s not the drives.