I looked and couldn’t find either.
EDIT: Found the revision number: 1.01.
FYI - I’m still going strong with 0 errors running as SAS-2 6Gbps. I’m on the 6th Benchmark FIO test and I keep the head -c 1000G </dev/random> job going at the same time. Pool now has over 3TiB allocated.
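(For context, the exact Benchmark FIO parameters aren’t reproduced in this thread, so here’s a hypothetical fio job in the same spirit; the job name, sizes, and job count are placeholders, not the actual benchmark settings:)
# hypothetical sequential-write fio job against the pool; tune sizes/jobs to taste
fio --name=seqwrite --directory=/SEAGATE6Z2 --rw=write --bs=1M \
    --size=10G --numjobs=4 --ioengine=libaio --direct=1 --group_reporting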
hl15:~$ zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
SEAGATE6Z2 43.7T 3.38T 40.3T - - 0% 7% 1.00x ONLINE -
Every 2.0s: zpool status hl15: Wed Dec 27 19:18:58 2023
pool: SEAGATE6Z2
state: ONLINE
scan: resilvered 2.75M in 00:00:01 with 0 errors on Wed Dec 27 10:45:12 2023
config:
NAME          STATE     READ WRITE CKSUM
SEAGATE6Z2    ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    1-9       ONLINE       0     0     0
    1-10      ONLINE       0     0     0
    1-11      ONLINE       0     0     0
    1-13      ONLINE       0     0     0
    1-14      ONLINE       0     0     0
    1-15      ONLINE       0     0     0
errors: No known data errors
Is there a reason this wasn’t caught in the burn-in tests? Also, is one of you using SATA drives and still experiencing this issue?
I wonder if these bad disk errors I am getting are due to this as well.
How interesting …
FYI - Supermicro support has already responded. I just sent them a dump from Broadcom’s LSIget script.
I’ll let them know about the SAS2/6Gbps issue. I don’t know that I can pin my drives to SAS2 but I’ll check it out.
To be fair, burn-in tests are not all-encompassing. It’s more of the 80/20 rule, and it all depends on the test suite chosen. In fact, I would say burn-ins probably miss as much as they catch. I believe 45Drives has already said that although they support SAS in 7 slots, they are targeting SATA drives, so I wouldn’t be surprised if most tests involved SATA and not SAS drives.
As far as I know/remember, the errors have only happened on SAS drives, so I would wager your bad disk errors are probably unrelated, but anything is possible.
Try out the SeaChestUtilities I linked above. It looks like it might at least partially work on non-Seagate drives, as they have a --onlySeagate flag. I bet sg3-utils probably has a way to do it too.
SeaChest scripts don’t work, unfortunately.
# ./SeaChest_Configure_x86_64-alpine-linux-musl_static -d /dev/sg0 --phySpeed 3
==========================================================================================
SeaChest_Configure - Seagate drive utilities - NVMe Enabled
Copyright (c) 2014-2023 Seagate Technology LLC and/or its Affiliates, All Rights Reserved
SeaChest_Configure Version: 2.3.1-4_1_1 X86_64
Build Date: Mar 27 2023
Today: Wed Dec 27 19:07:34 2023 User: root
==========================================================================================
/dev/sg0 - HUH728080AL5200 - 2EG2J3MG - A515 - SCSI
Failed to set the PHY speed of the device.
I’ll see about sg3-utils.
storcli can do it at the controller level, but this controller doesn’t seem to support it.
# /opt/45drives/tools/storcli64 /c0/p0 show
CLI Version = 007.1017.0000.0000 May 10, 2019
Operating system = Linux 4.18.0-513.9.1.el8_9.x86_64
Controller = 0
Status = Success
Description = None
Test Link State :
===============
------------------------------------
PhyNo SAS Link Speed PCIe Link Rate
------------------------------------
0 12.0 Gbps N/A
------------------------------------
# /opt/45drives/tools/storcli64 /c0/p0 set linkspeed=6
CLI Version = 007.1017.0000.0000 May 10, 2019
Operating system = Linux 4.18.0-513.9.1.el8_9.x86_64
Controller = 0
Status = Failure
Description = Un-supported command
Womp womp.
Looks like lsiutil might be able to do it via option 12 in the main menu, according to the LSIUtil Configuration Utility on page 42.
EDIT: Link to several binaries of lsiutil
Looks like lsiutil can be hard to find. I was able to set the max PHY rate at the disk level with sdparm, though it’s a little … interesting … modifying hex values.
This appears to set the PHY to max 6Gbps:
sdparm -p 0x19,0x01 --set=0x29:4:1=0 -t sas -S /dev/sdX
sdparm -p 0x19,0x01 --set=0x59:4:1=0 -t sas -S /dev/sdX
While this sets it to 12Gbps:
sdparm -p 0x19,0x01 --set=0x29:4:1=1 -t sas -S /dev/sdX
sdparm -p 0x19,0x01 --set=0x59:4:1=1 -t sas -S /dev/sdX
It did work, as all disks are now pinned at 6Gbps, but I’m still unclear as to exactly what I modified.
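EDIT: For anyone curious, here’s my best reading of what those hex arguments actually touch, going by sdparm’s <byte>:<bit>:<width> syntax and the SAS mode page layout. Treat this as an educated guess, not gospel:
# -p 0x19,0x01 selects the Protocol Specific Port mode page, subpage 1
#   (PHY Control And Discovery) on a SAS drive.
# 0x29:4:1 = byte 0x29, start bit 4, width 1 bit. Byte 0x29 appears to hold the
#   PROGRAMMED MAXIMUM LINK RATE nibble for the first phy descriptor; 0x59 is
#   the same byte in the second phy's descriptor, 48 bytes (one descriptor) later.
# Bit 4 is the low bit of that nibble, so it toggles the rate code between
#   0xA (6Gbps) and 0xB (12Gbps).
sdparm -p 0x19,0x01 --set=0x29:4:1=0 -t sas -S /dev/sdX   # phy 0 -> 6Gbps
sdparm -p 0x19,0x01 --set=0x59:4:1=0 -t sas -S /dev/sdX   # phy 1 -> 6Gbps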
Great to hear! I’m still going with no errors, so let’s see how it works for you.
I’ve written about 500GB to the pool so far and no errors (except for 3 checksum errors on 2x drives) … I’ll probably jinx it. We’ll see!
I found v1.62 of lsiutil, but it has some dependency issues that lead me to believe it would be difficult to run no matter what. A lot of the notes, uploads, etc. that I see are 5+ years old. That PDF you linked was from 2006.
I found v1.70 but it wouldn’t run, either.
What a pain!
Weird - I just grabbed 1.7 from that GitHub and it ran just fine on Ubuntu 22.04.
This is a little easier to read … no hex values. Note: this works for HGST Ultrastar He8 drives. No idea what the same updates would look like for other makes/models.
Set PHY to SAS2/6Gbps:
sdparm -p pcd --set=PMALR=10 -t sas -S /dev/sdX
sdparm -p pcd --set=PMALR.1=10 -t sas -S /dev/sdX
Set PHY to SAS3/12Gbps:
sdparm -p pcd --set=PMALR=11 -t sas -S /dev/sdX
sdparm -p pcd --set=PMALR.1=11 -t sas -S /dev/sdX
Breakdown:
- -p pcd is the “page”, in this case the PHY Control and Discovery (PCD) page. Equivalent to -p 0x19,0x01 in hex.
- --set=PMALR=XX sets the Programmed Maximum Link Rate (PMALR). I believe there are two SAS channels, so there’s a PMALR.1 as well. Equivalent to --set=0x29:4:1=XX and --set=0x59:4:1=XX respectively.
- -t sas defines the transport protocol.
- -S tells sdparm to also save the update (so it persists across the next drive or system power-down/up) rather than just edit the current setting.
- /dev/sdX is the device being modified.
Here’s some output to show values, etc.
# sdparm -g PMALR -l /dev/sda
/dev/sda: HGST HUH728080AL5200 A515
PMALR 10 [cha: y, def: 10, sav: 10] Programmed maximum link rate
# sdparm -g PMALR.1 -l /dev/sda
/dev/sda: HGST HUH728080AL5200 A515
PMALR.1 10 [cha: y, def: 11, sav: 10] Programmed maximum link rate
You can see here the value 10 that we --set above. Note that the output is in this format:
<NAME> <CURRENT_VALUE> [<CHANGEABLE_Y/N>, <DEFAULT_VALUE>, <SAVED_VALUE>] <DESCRIPTION>
So for the above, the current value is 10, it can be changed, the default is 10 (or 11 for PMALR.1), and the saved value is 10.
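To check all of the pool disks in one go instead of one at a time, a simple loop works; the /dev/sd{a..f} glob is an assumption, so match it to your actual devices:
# show current/default/saved PMALR for both phys on each disk
for d in /dev/sd{a..f}; do
  echo "== $d =="
  sdparm -g PMALR -l "$d"
  sdparm -g PMALR.1 -l "$d"
done
Swap the -g PMALR for the --set=PMALR=10 ... -S form from above and the same loop pins every disk in one shot.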
Why is SAS2/6Gbps 10 and SAS3/12Gbps 11? I have no idea! They correspond to hex values 0x0A and 0x0B. I assume SAS/3Gbps would be 9, which would be hex value 0x09.
I’ll read sdparm(8) some more tomorrow.
Still no issues with the pool after pinning the disks to SAS2/6Gbps. Several TBs written so far.
EDIT: Some checksum errors just popped up, but no read/write errors.
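(For anyone following along: clearing the counters and scrubbing is the standard way to see whether checksum errors recur. Plain zpool commands, using my pool name:)
zpool clear SEAGATE6Z2       # reset the READ/WRITE/CKSUM counters
zpool scrub SEAGATE6Z2       # re-read everything to see if errors come back
zpool status -v SEAGATE6Z2   # -v lists any files affected by data errors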
Here is what 1.62 said:
LSI Logic MPT Configuration Utility, Version 1.62, January 14, 2009
modprobe: FATAL: Module mptctl not found in directory /lib/modules/4.18.0-513.9.1.el8_9.x86_64
/bin/mknod: /dev/mptctl: File exists
I couldn’t be arsed to track down mptctl or sort out what lsiutil.x86_64 wanted.
I just tried the 1.7 version a second time and realized my dumb ass downloaded the JSON manifest for the binaries and not the actual, raw file. Fixed my mistake and 1.7 works fine. I’ll see what I can see with it.
EDIT: Seems like it’ll work. Since the drives are set to SAS2 at the moment, I’ll leave the controller alone for now.
Alright, I’ve run the Benchmark FIO tests as originally outlined over 10x now with the new Seagate drives running at SAS-2 6Gbps speeds in a 6-wide Z2 pool, and we are still error-free and healthy! Throughout the tests, I also kicked off head -c 100G </dev/random> /SEAGATE6Z2/random-test and used Jellyfin to play back files from the zpool.
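For anyone replicating the write-pressure side of this, the idea is just to keep hammering the pool while fio runs. A minimal sketch of the loop (the path and size are from the command above; the health check is just a convenience I’ve added):
# keep rewriting a junk file with random data while fio runs elsewhere
while true; do
  head -c 100G </dev/random >/SEAGATE6Z2/random-test
  zpool status -x SEAGATE6Z2   # reports the pool as healthy unless errors appear
done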
Let’s see what results @pxpunx gets, but I feel good that my issue was that the system wasn’t able to reliably run SAS drives at 12Gbps. My opinion is this is due to marginal cables. As noted above, when I routed the cables out and away from other components, I still received errors, but they came at a much slower rate (~5 at a time) and didn’t FAULT the drive(s) in the pool right away as it did previously, with dozens of errors happening in quick succession. I’m still open to it being firmware or some other hardware issue, but I think it’s safe to say it’s not the drives.
@Hutch-45Drives I’ll email Corey again with these findings.
In the meantime, I’ll look to switch to using lsiutil to set the SAS controller to 6Gbps instead of setting the drives themselves via the SeaChest Utilities as I’ve done for testing here.
Every 2.0s: zpool status hl15: Thu Dec 28 07:11:24 2023
pool: SEAGATE6Z2
state: ONLINE
scan: scrub repaired 0B in 01:29:47 with 0 errors on Wed Dec 27 21:01:37 2023
config:
NAME          STATE     READ WRITE CKSUM
SEAGATE6Z2    ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    1-9       ONLINE       0     0     0
    1-10      ONLINE       0     0     0
    1-11      ONLINE       0     0     0
    1-13      ONLINE       0     0     0
    1-14      ONLINE       0     0     0
    1-15      ONLINE       0     0     0
errors: No known data errors
hl15:~# zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
SEAGATE6Z2 43.7T 3.38T 40.3T - - 0% 7% 1.00x ONLINE -
root@hl15:~# smartctl -x /dev/sg4
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-6.2.0-39-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST8000NM0075
Revision: E004
Compliance: SPC-4
User Capacity: 8,001,563,222,016 bytes [8.00 TB]
Logical block size: 512 bytes
Physical block size: 4096 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c50084dfd3b3
Serial number: ZA10PBP50000R619X3N4
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Thu Dec 28 07:12:13 2023 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
Read Cache is: Enabled
Writeback Cache is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK
Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature: 24 C
Drive Trip Temperature: 60 C
Manufactured in week 03 of year 2016
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 129
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 2561
Elements in grown defect list: 0
Vendor (Seagate Cache) information
Blocks sent to initiator = 3714011832
Blocks received from initiator = 645749496
Blocks read from cache and sent to initiator = 142591573
Number of read and write commands whose size <= segment size = 2331725
Number of read and write commands whose size > segment size = 104142
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 94.83
number of minutes until next internal SMART test = 57
Error counter log:
          Errors Corrected by           Total   Correction     Gigabytes    Total
              ECC          rereads/     errors   algorithm      processed    uncorrected
          fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   927337493        0         0  927337493          0       1901.478          0
write:          0        0         0          0          0       2530.309          0
verify:     46954        0         0      46954          0          0.096          0
Non-medium error count: 0
[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
No Self-tests have been logged
Background scan results log
Status: no scans active
Accumulated power on time, hours:minutes 94:50 [5690 minutes]
Number of background scans performed: 0, scan progress: 0.00%
Number of background medium scans performed: 0
Protocol Specific port log page for SAS SSP
relative target port id = 1
generation code = 0
number of phys = 1
phy identifier = 0
attached device type: SAS or SATA device
attached reason: unknown
reason: unknown
negotiated logical link rate: phy enabled; 6 Gbps
attached initiator port: ssp=1 stp=1 smp=1
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x5000c50084dfd3b1
attached SAS address = 0x500304801d01d40e
attached phy identifier = 2
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 1
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 1
Phy reset problem count: 0
relative target port id = 2
generation code = 0
number of phys = 1
phy identifier = 1
attached device type: no device attached
attached reason: unknown
reason: unknown
negotiated logical link rate: phy enabled; unknown
attached initiator port: ssp=0 stp=0 smp=0
attached target port: ssp=0 stp=0 smp=0
SAS address = 0x5000c50084dfd3b2
attached SAS address = 0x0
attached phy identifier = 0
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 0
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 0
Phy reset problem count: 0
Yeah, same … no errors since I pinned the drives to SAS2/6Gbps.
Remember, I didn’t have issues with a PCIe 9300-8i at SAS3/12Gbps speeds on the same backplane and cables. While I didn’t run exhaustive tests, that does suggest that the physical media is fine and it’s down to an issue with the onboard SAS3008.
In either case, down the rabbit hole of SAS2 vs SAS3, cable quality, etc.
I still have a case open with Supermicro. If someone were to diagnose an issue with the onboard controller, it would be them … or LSI/Broadcom.
Good to hear! I’m over 24 hours error-free now. Would it help if I also opened a case with Supermicro? I lean towards no, but if there’s something I can also try to supply to Supermicro through your ticket, let me know!
Also, I used lsiutil to set the speed to 6Gbps via the controller and put the drives back to auto-negotiate. I’ll work on a quick step-by-step with lsiutil. It’s not hard, but it takes a few steps.