LSI 9300 16i PCIe Bus Error: severity Correctable pre kernel load

Some background: I am setting up a TrueNAS using a Rosewill 12 bay case, B760M Ultra Gaming MB (small but what I had) BIOS v F7 (Sept 2025), and an LSI 9300-16i. The first LSI we bought was DOA, so we returned it. The second one was plug and play, no issues. But, after sitting on its own for 3 weeks, the functioning TrueNAS was missing from the network.

The problem: The TrueNAS is rebooted and we see an endless looping of this error (apologies for the potato quality image):

Removed the cables from the card, same issue. Removing the card, server boots normally. Card back in with no cables, same errors.

Booting into BIOS we can see the card is there, and the system information looks normal. There are two LSI SAS3 MPT Controller SAS3008 devices showing in the BIOS Settings (as I understand it, that is what the 16i is), and the information showing looks normal. The green LED on the card is also flashing at about 1 beat per minute, which according to one Art of Server video means the card is powering up and the firmware is running.

Given this was functioning normally prior to the current state, I am at a bit of a loss on how to troubleshoot this further. Or is it perhaps just dead?

So the good news is the errors all look correctable so things are generally working despite the problem at hand with the reboots. The errors are either physical or related to low level operations. A few things to try:

  1. Physically inspect the PCIe slot and the pins on the card for any problems
  2. Reset the BIOS settings to default
  3. Check the firmware version of the LSI card and Broadcom’s website for updates
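
Once the box can reach a shell with AER reporting still enabled, the kernel's per-device counters can show which correctable error types are piling up. Here is a rough Python sketch, assuming the kernel exposes an `aer_dev_correctable` file under the device's sysfs directory and that each line ends in a count. The PCI address is a placeholder (take the real one from `lspci`), and the function names are made up for illustration.

```python
from pathlib import Path


def parse_aer_counters(text):
    """Parse lines of the form '<error name> <count>' into a dict.

    The error name may contain spaces (for example 'Bad TLP'),
    so only the last whitespace-separated token is treated as the count.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0].strip()] = int(parts[1])
    return counters


def read_correctable(pci_addr, root="/sys/bus/pci/devices"):
    """Read the correctable AER counters for one device (path is an assumption)."""
    path = Path(root) / pci_addr / "aer_dev_correctable"
    return parse_aer_counters(path.read_text())


def nonzero(counters):
    """Keep only the error types that have actually fired."""
    return {name: count for name, count in counters.items() if count > 0}
```

Calling `nonzero(read_correctable("0000:01:00.0"))` (placeholder address) a few minutes apart shows whether the counts are still climbing.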

You mentioned Art of Server. Is that where you bought the HBA(s)? Remember buying these second hand, you don’t really know their provenance, usually things are ok, but cards can be abused. You may even be abusing the card now depending on airflow in your case, the 9300-16i runs pretty hot and needs decent cooling. You probably need to determine if the communication issues are due to some permanent damage or not.

  • Thermal Degradation: LSI 9300-series cards are enterprise hardware designed for servers with high-CFM fans forcing air directly through the heatsinks. Not sure which Rosewill case you have, but it might not have enough airflow. After 3 weeks of running at high temperatures, the PLX chip or SAS controllers may have started to fail, or thermal cycling caused micro-fractures in the BGA solder.
  • Signal Integrity / PCIe Gen Mismatch: Sometimes, modern motherboards (like the B760M) struggle to properly auto-negotiate PCIe generations with older enterprise cards. Small changes in temperature or electrical resistance can push a marginal PCIe Gen 3 signal out of spec, causing packet loss.
  • Card Creep/Sag: Heavy cables combined with heating/cooling cycles can cause the card to slightly unseat from the PCIe slot pins.

Beyond what Ryan mentioned, here are some other things to try:

  • Force PCIe Gen 3 in the BIOS: By default, the Gigabyte B760M motherboard tries to run that slot at Gen 4 or Gen 5. Go into the BIOS and manually set the PCIe slot hosting the LSI card to Gen 3 (or even Gen 2, just to see if the errors stop). This widens the signal margins and often cures AER floods instantly.
  • Add Active Cooling: If the card is still alive, it may need a dedicated fan. Try zip-tying a 40mm Noctua fan directly to the LSI heatsink.
  • Bypass AER to check functionality: To confirm the drives and TrueNAS are actually fine, interrupt the GRUB bootloader and add pci=noaer or pcie_aspm=off to the kernel boot parameters. This tells Linux to ignore the hardware retry noise. If the system boots and the storage pool imports, the issue is strictly PCIe communication logging and not dead SAS controllers.
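
To see whether a forced-generation setting in the BIOS actually took effect, the negotiated link speed can be read back from sysfs once the system is up. This is a minimal Python sketch, assuming the device directory has `current_link_speed` and `max_link_speed` files containing strings like `8.0 GT/s PCIe`. The PCI address is a placeholder and the helper names are hypothetical.

```python
from pathlib import Path

# Transfer rate in GT/s mapped to PCIe generation.
GEN_BY_GTS = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def pcie_gen(link_speed):
    """Map a link-speed string such as '8.0 GT/s PCIe' to a generation number.

    Returns None when the string cannot be interpreted.
    """
    try:
        gts = float(link_speed.split()[0])
    except (IndexError, ValueError):
        return None
    return GEN_BY_GTS.get(gts)


def link_report(pci_addr, root="/sys/bus/pci/devices"):
    """Read current and maximum link speed for one device (paths are assumptions)."""
    dev = Path(root) / pci_addr
    current = (dev / "current_link_speed").read_text().strip()
    maximum = (dev / "max_link_speed").read_text().strip()
    return {"current_gen": pcie_gen(current), "max_gen": pcie_gen(maximum)}
```

On a card like this with a PCIe switch on board, the link that matters is probably the one between the motherboard slot and the switch's upstream port, so it may be worth checking more than one address from `lspci`.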

Those cards were also first manufactured 14 years ago, so depending on when yours was born, repasting it might help.

Wow, this is my first post here, and the replies are really really helpful, thank you!

The card I currently have is from an Amazon reseller. I think I have learned my lesson after the first card was DOA and this one is now a bit weird. I have ordered a new card from The Art of Server (arriving in a couple of weeks), and they have already sent me a few tips via eBay chat, including being mindful of the heat issues with this card. The current card is not flashed with IT mode (does that make sense?), and I am unable to get a reading from the temp sensors. TrueNAS blocks apt install, so I am a tad stuck here, I think.

Ryan:

  • Card has been carefully inspected and reseated… several times, no change.
  • BIOS settings I mucked with and reset 2 times now, no change.
  • LSI firmware version - this is one area I have not looked at but will. Since I do have another LSI card coming, I feel comfortable attempting a firmware update even if I destroy the card. Best way to learn :slight_smile:

David:

  • The heat issue and thermal degradation sound reasonable in this case, as the card was fine for the first day or so, and at some point during its short uptime life, these messages started.
  • Signal Integrity, I did not know this, but it does make sense to me.
  • The cables are well supported so no weight there, but good tip for all my wiring. This is my first build in almost 20 years. One forgets a lot over time.
  • Force gen3 or gen2. Based on another thread maybe on Reddit, someone suggested this but I was not able to find the setting. I will look again.
  • Active cooling, definitely going to do this, I think I have a fan somewhere, and I will look into getting something beefier. The rack is where I can’t hear it, so noise levels should not be a problem.
  • Bypass AER - I did both of these suggestions and was able to get the machine booted and running! This is wonderful, thank you!

Would the manufacturing date appear on this card I wonder. I will check out of curiosity. Repasting is worth trying too.

Admittedly I am a bit surprised at the amount of heat this card puts off. Are there other better HBA cards, more modern, that would work and be less energy hogging? I am running all 12 drives through this.

Question: I don’t have extra power into the card from the PSU, doesn’t seem to change anything, is that recommended?

Question: Can the heat sink be changed for something bigger, or am I best not messing with that?

I will report back on my findings in case it helps someone. Thank you all so much for all the help. The response here has been fantastic!

If your goal is to use this with TrueNAS you will need to flash it to IT mode anyway. I don’t think that is related to the errors, the errors are at the physical PCIe layer level.

If I had known ordering a third card was on the table so soon, I would have suggested a 9305 instead of a 9300. The 9305 has a single SAS controller (vs two on the 9300) so is more power and heat efficient. Now, I run a couple of 9300-16is in my 45HL boxes and don’t have any issues, don’t have fans strapped to them and just use case airflow. So it’s not like cooling a server GPU, but just something to be aware of when having issues.

Not to be too much of a downer on the second hand market, but it is also worth noting that there are fake cards in the supply chain, that have undersize heat sinks. AoS charges a bit more, but will make sure you get a clean, flashed, tested card and not a fake.

I believe that connection is for if the PCIe slot isn’t providing enough power. We haven’t asked about your PSU, but I assume it is something decent, not a 250W SFF unit or something. Normally a PCIe slot should power the card just fine.

AFAIK there’s no standard hole positioning for the heat sinks. I’m not aware of anything third party. I wouldn’t mess with that.

I’m a bit out of my area of direct knowledge, but if that temporarily fixes the issue, I think it indicates the card itself (BGA Solder, SAS chip silicon) is fine. Here is what you can try in that scenario, but it is mostly cut/paste. If you can tell us the specific mobo maybe we can help with the BIOS settings, I wasn’t sure from your post what specific brand and model of B760M.

1. The BIOS Fix: Disable ASPM and Force Gen 3

The most common root cause of these correctable AER floods is an incompatibility with ASPM (Active State Power Management). ASPM is designed to save power by putting idle PCIe links to sleep. However, older enterprise LSI cards absolutely hate consumer motherboard ASPM implementations. The motherboard tries to put the PCIe switch to sleep, the LSI card doesn’t wake up in time, packets drop, and the AER loop begins.

To fix this permanently at the hardware level:

  • Disable ASPM / Native PCIE Enable: Go into the motherboard BIOS and look for settings related to PCIe Power Management, ASPM, or “Native PCIe Enable.” Force them to Disabled.
  • Force PCIe Gen 3: While in the BIOS, manually set that specific PCIe slot to Gen 3 instead of Auto.
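
After changing the BIOS settings, the kernel's view of ASPM can be checked from a shell. Below is a small Python sketch, assuming the policy is exposed at `/sys/module/pcie_aspm/parameters/policy` as a space-separated list with the active entry in square brackets. Treat the path and the function names as assumptions to verify on your own system.

```python
from pathlib import Path

# Assumed location of the ASPM policy list on a Linux system.
ASPM_POLICY_PATH = "/sys/module/pcie_aspm/parameters/policy"


def active_aspm_policy(policy_text):
    """Return the bracketed (active) entry from a string such as
    '[default] performance powersave powersupersave', or None if absent."""
    for token in policy_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def read_aspm_policy(path=ASPM_POLICY_PATH):
    """Read the policy file and return the active policy name."""
    return active_aspm_policy(Path(path).read_text())
```

If `read_aspm_policy()` still reports a power-saving policy after the BIOS change, the firmware setting may not be the one the OS honors, which is where the kernel parameter approach comes in.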

2. The Physical Fix: EMI Shielding and Slot Reseating

If disabling ASPM in the BIOS doesn’t stop the underlying noise, you might be dealing with physical signal degradation (EMI interference or poor contact).

  • Clean the Contacts: Take a pencil eraser and 99% isopropyl alcohol and clean the gold PCIe contacts on the card.
  • Change Slots: If possible, move the HBA to a slot further away from the GPU or power supply. The LSI’s unshielded PLX chip might be picking up electromagnetic interference from a neighboring component.

3. The Software Fix: Make the Boot Parameter Permanent

If the hardware and BIOS tweaks fail, but the card runs flawlessly under load with the errors suppressed, there is no harm in simply telling the Linux kernel to permanently ignore the noise.

In a TrueNAS SCALE environment, you shouldn’t edit the GRUB bootloader files directly via the command line, as system updates will overwrite those changes. Instead, make the kernel parameter permanent through the middleware:

  1. Navigate to the TrueNAS SCALE Web UI.
  2. Go to System Settings → Advanced.
  3. Locate the Kernel Command Line Arguments section.
  4. Add pcie_aspm=off (disables the problematic power management entirely) or pci=noaer (just mutes the reporting).
  5. Save and reboot.
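
After the reboot, it is worth confirming that the parameter actually reached the running kernel. Here is a minimal Python sketch that inspects the contents of `/proc/cmdline`. The helper name is hypothetical, and the comma handling assumes options may be combined in one token (for example `pci=nomsi,noaer`).

```python
from pathlib import Path


def has_kernel_arg(cmdline, arg):
    """Return True if `arg` (e.g. 'pci=noaer') is present on the command line.

    Values are compared per comma-separated item, so 'pci=noaer'
    also matches a token like 'pci=nomsi,noaer'.
    """
    key, _, value = arg.partition("=")
    for token in cmdline.split():
        token_key, _, token_value = token.partition("=")
        if token_key == key and value in token_value.split(","):
            return True
    return False


def check_running_kernel(args=("pci=noaer", "pcie_aspm=off")):
    """Report which of the given arguments the running kernel was booted with."""
    cmdline = Path("/proc/cmdline").read_text()
    return {arg: has_kernel_arg(cmdline, arg) for arg in args}
```

Running `check_running_kernel()` after an update is a quick way to notice if the setting was lost.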

I tend to be a bit impulsive with these things. Had I waited 10 minutes I would have possibly saved some hassles for myself here.

Not to be too much of a downer on the second hand market,

I am aware of the potential issues here. The card I have seems legit, and since it still works, I am thinking it is real. Fingers crossed. The next one will be real :slight_smile:

I believe that connection is for if the PCIe slot isn’t providing enough power

This makes sense; the MB I have appears to be doing fine with powering the card. I currently have 6 HDs running on the card without issue via TrueNAS. Which leads to my next question.

If your goal is to use this with TrueNAS you will need to flash it to IT mode anyway.

The card is working fine right now with the latest v of TrueNAS, what does IT mode gain? Apart from the ability to access the heat sensors perhaps?

The BIOS Fix: Disable ASPM and Force Gen 3

Thanks for the suggestions, I will dig into this over the weekend.

The Physical Fix: EMI Shielding and Slot Reseating

Only one slot on the MB I have and this is the only card currently installed.

The Software Fix: Make the Boot Parameter Permanent

This was the change I made to stop the noise. I added it to /etc/default/grub or thereabouts.

Thank you again for all the advice. I’ve been distracted with the day job, but I hope to get back to the homelab soon.

IT mode is necessary for ZFS to have true block level access to the drive. IR mode on an HBA means a software raid drive is being presented instead of the physical drive. Best case scenario you’re losing performance but I imagine ZFS errors will happen eventually.

IT mode means the LSI card is passing through the drives natively to the OS. This is what you want for ZFS to function properly.

IR mode (the alternative) means the LSI card is trying to do its own hardware RAID. You don’t want this as it will interfere with ZFS’s ability to do proper block parity striping and understand disk health.

Before we make assumptions, in TrueNAS go to System → Shell and run the following command:

sudo dmesg | grep -i mpt3sas

You should get some output that includes lines like this:

[    3.500036] mpt3sas_cm0: LSISAS3008: FWVersion(16.00.10.00), ChipRevision(0x02)
[    3.500039] mpt3sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
[    3.500954] mpt3sas_cm0: sending port enable !!

Yours will likely show this twice, once for each of the SAS3008 chips on the 9300-16i. Does it say Protocol=(Initiator,Target) or just Protocol=Initiator?
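
If you would rather not eyeball the log, the same fields can be pulled out with a few lines of Python. This is only a sketch, assuming the driver prints lines in the same shape as the sample above. The regular expressions and the function name are my own, not part of any LSI or TrueNAS tool.

```python
import re

# Patterns modeled on the sample mpt3sas log lines; real output may differ.
FW_RE = re.compile(r"(mpt3sas_cm\d+): (\w+): FWVersion\(([\d.]+)\)")
PROTO_RE = re.compile(
    r"(mpt3sas_cm\d+): Protocol=\(?([\w,]+)\)?, Capabilities=\(([^)]*)\)"
)


def summarize_mpt3sas(dmesg_text):
    """Collect chip, firmware version, protocol and capabilities per controller."""
    controllers = {}
    for line in dmesg_text.splitlines():
        match = FW_RE.search(line)
        if match:
            entry = controllers.setdefault(match.group(1), {})
            entry["chip"] = match.group(2)
            entry["firmware"] = match.group(3)
            continue
        match = PROTO_RE.search(line)
        if match:
            entry = controllers.setdefault(match.group(1), {})
            entry["protocol"] = match.group(2).split(",")
            entry["capabilities"] = match.group(3).split(",")
    return controllers
```

Feed it the text from `sudo dmesg` (for example via `subprocess.run`) and you get one entry per controller, which makes it easy to compare the firmware version and protocol on both chips.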

Output is:

[    4.209117] mpt3sas_cm0: LSISAS3008: FWVersion(07.00.01.00), ChipRevision(0x02)
[    4.209125] mpt3sas_cm0: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)

[    4.541116] mpt3sas_cm1: LSISAS3008: FWVersion(07.00.01.00), ChipRevision(0x02)
[    4.541123] mpt3sas_cm1: Protocol=(Initiator,Target), Capabilities=(TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)

Based on my research this says the card is in IT Mode. Is that correct?

I should ask, is there any way to install lsiutil on the TrueNAS box? Most of what I have read says no. apt is disabled, which means no apt install and no build tools. It would be nice to get the temp of each chipset.

Yep, it’s in IT mode, but the firmware is really old. You can download a Linux x64 binary of lsiutil and run it on TrueNAS. That’s what I’ve done in the past. I believe Broadcom has it on their website somewhere. Sadly they don’t necessarily make it easy to find. You might be better off finding a forum link to it.

I agree with Ryan, you have IT mode firmware, but if you intend to keep the card you want to upgrade it to 16.x. You can get 16.00.10.00, which is probably the version AoS will load on your new card, from Broadcom here:

Or there was a hotfix specifically for TrueNAS released as 16.00.12.00 here:

Thank you both. Next week I’ll work on upgrading the firmware. This weekend will be spent outdoors working on the farm.

If all else fails I can recommend the 9400, as long as you don’t need the dual SAS controller “feature” of the 9300-16i.

I also started with a 9300 in IT mode and with the TrueNAS patch. It would usually be fine, but I would get random SAS errors that were not related to my drives, and on reboots it was a coin toss whether my drives would show up or not. This didn’t appear to be because of the SAS expanders I was using, as I would get faults and errors from the second controller on the 9300, which only had drives attached directly.

After switching to a 9400, also from AoS I believe, all of my SAS issues just went away.
