Sunday, October 4, 2015

HP P410i - P420i Self-test failure lockup

Recently I was messing around with an HP DL360 G8. Everything looked fine until my RAID controller, a P420i, quit while I was installing a virtual machine in ESXi 6.0. When I rebooted, I got this nice message: 1783-slot 0 Drive Array Controller Failure. [Self-test failure (cmd=0h, err=00h, lockup=013:0h)]

At first I was kind of clueless. What to do? Was the fault in the RAID controller itself, or maybe in the cache module? After some Googling I found that most people with a similar problem had it because they were not using OEM disks (neither was I; I had replaced them with aftermarket Intel SSD DC S3500 SSDs). But after removing the disks and booting without them, the error persisted, so it had to be something else. Next I tried booting without the cache module, since many people had issues with that as well. Unfortunately, this didn't help either. Then I let the server sit without power for a while to drain any leftover charge from batteries or capacitors. When that didn't work, I disabled the P420i in the BIOS/UEFI, hoping that reinitialising the card this way might fix it... Wishful thinking; it didn't.

Then I got the idea to disconnect the cache module and, more importantly, the SAS cables on the motherboard, so the RAID controller would receive no signal at all. Hallelujah and praise the Flying Spaghetti Monster! It worked! In all my enthusiasm I powered down the server to re-attach the cache module and the SAS cables. But when I rebooted the server, the error message was back... Bummer! Since removing the cache module alone hadn't fixed it earlier, I figured it must have been the SAS cables. So I powered the machine down again and disconnected the SAS cables. I rebooted and kept my fingers crossed (a G8 takes a while to boot...). Yes! Again no error message. So I hit F8 to enter the adapter's setup utility. Whilst in setup, I reconnected the SAS cables to the motherboard, and one by one my SSDs reinitialised. When I clicked on the option to create a RAID array, there were my disks, and I successfully created a new array! After rebooting, the error message was gone and I was able to boot safely into ESXi and continue with my work!


I hope you guys find this useful, and as always... If you find another controller that this fix works for, leave a comment; you might help someone else with it too!

8 comments:

  1. Hey Yannick,

    Thanks for your post. I followed some of your tips (disconnecting the cache module + battery) and that helped us get our controller back in the air! An HP engineer is coming back to replace the motherboard tomorrow.

    Regards,

    Valentijn

  2. Another fix: re-seat the controller's cache DIMM!
    Gen6 HP server, P410i.

  3. I have an HP ProLiant ML370 G6 server. I got the message "1783-slot 0 Array Controller Failure. Self-test failure", with no other message. How do I solve this? Also, could you explain disconnecting the cache module + battery, with some pictures? My email address: jvanurajecil@gmail.com

  4. On my SE326M1 (a DL180 G6 with some extra features), I was able to bypass the fault by removing the drives before startup (this might be easier than removing the SAS cables, because those are not very accessible) and re-adding them while inside the ACU. The error reappears after the next reboot, though.

  5. Thanks a lot brother. I directly went ahead and replaced the SAS cable, awesome all back online and RAID was created. Thanks a TON! Long live!
