And who woulda thought… it mzrfzn figures!

The fates… the FATES. Is this Ironic? I dunno, you tell me after you read this.
So, we have a computer-based NAS serving what I call the “Extended Purl Media Drive”.

First, yes EPMD was on purpose and yes I picked the acronym and then worked backwards, based on my first NAS iteration, the Purl Media Cache. It was Extended and upgraded from a mere Cache to a persistent Drive. Sadly, it’s not Erick or Parrish, nor does it make dollars. It does gots to chill–because hot computers are quickly broken computers–but that’s not the point.

But I digress. Getting back to unfinished business, I had shuffled things around due to some trickling (both motherboard and storage) and had the EPMD set up in its own computer instead of being part of a MythTV system. Part of the raison d’être for the EPMD is for some of the various client (and server) systems to deposit backups. It is also one of the backup locations for all our camera images, from when the new Purl Handheld was days old through now. There’s also the fun stuff like being a holder of our Plex media library of shows and movies. Its shares are on the critical path for a few things around the house, too. Failure is not an option.

I was in the middle of trickling and moving on after having pushed tens of teebs around to upgrade the EPMD into a new, bigger software RAID-6 array. I was cruising right along working on the next server when I noticed it. A little indicator in my byobu status line indicating my RAID wasn’t healthy… wtF?!?!?!? I checked and 2 disks had disappeared! (This is why you go RAID-6, kids!) The array was degraded but operational–but any further boolsz-ery and POOF!
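For the curious, here is roughly what a "is my RAID healthy?" check boils down to. This is only a sketch and not what byobu actually runs: it assumes the text has the general shape of Linux's /proc/mdstat, where each array's status line carries an "[expected/active]" member count. The sample text, device names and block counts are made up, and `degraded_arrays` is a name I invented for illustration.

```python
import re

# Sample text in the general shape of /proc/mdstat on Linux.
# Device names and block counts are made up for illustration.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdf1[5] sde1[4] sdd1[3] sdc1[2]
      15627540480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/4] [__UUUU]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return {array_name: missing_member_count} for arrays missing members."""
    result = {}
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line like "md0 : active raid6 ...".
        header = re.match(r"^(md\S+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line carries "[expected/active]", e.g. "[6/4]".
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active < expected:
                result[current] = expected - active
            current = None
    return result


if __name__ == "__main__":
    # On a real Linux box you could pass in open("/proc/mdstat").read() instead.
    print(degraded_arrays(SAMPLE_MDSTAT))
```

An empty result means every array has all of its members. Anything else is your cue to go find out where your disks went.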

Thinking the drives might be okay (at least one of them), I rebooted and then added them back to the array, which safely rebuilt and we were back to being able to survive TWO drive failures. I went back to previous trickle work…. until it happened again days later.
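The RAID-6 arithmetic behind all this drama is simple enough to sketch. RAID-6 spends two drives' worth of space on parity, so it can lose any two members and keep going. The six-drive count is the EPMD's, but the 4 TB drive size below is purely a made-up number for illustration (I haven't said what size the real drives are), and both function names are invented.

```python
RAID6_PARITY_DRIVES = 2  # RAID-6 keeps two drives' worth of parity


def raid6_usable_tb(drive_count, drive_size_tb):
    """Usable capacity: everything except two drives' worth of parity."""
    return (drive_count - RAID6_PARITY_DRIVES) * drive_size_tb


def failures_left(missing_members):
    """How many MORE members the array can lose before the data is gone."""
    return RAID6_PARITY_DRIVES - missing_members


if __name__ == "__main__":
    # Six drives as in the EPMD; the 4 TB size is a hypothetical example.
    print(raid6_usable_tb(6, 4))   # usable TB with hypothetical 4 TB drives
    print(failures_left(0))        # healthy array
    print(failures_left(2))        # two disks gone: nothing left in reserve
```

With two members missing, `failures_left` hits zero, which is the "degraded but operational" state: still serving data, but one more disk away from POOF.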

This became the routine: I’d check the RAID daily, and a couple of times a week I’d see the 2 disks disappear, forcing me to reboot, re-add and rebuild. This was Not A Good Look™ for a server that is supposed to hold backups of other systems, in some cases the only backup (looking at you, “Windows 7 Backup”, which can only back up to one place at a time). So something needed to be done. I suspected that the trickled motherboard, which required a PCI-Express dual-port SATA add-in card to handle the 6 drives that comprised the EPMD, was somehow to blame, since the same case, power supply and mostly the same drives had been happily chugging along in the previous configuration on a motherboard with 6 on-board SATA ports. I then hatched an idea: I could do a trickle within a trickle and end up rotating that happy-chug board back to the EPMD–a bit of overkill (J-class MB+CPU combo vs i5), but still, reliability is of utmost importance.

So I did the eBay thing, got an “upgrade” CPU (and all the necessary fixins from Amazon) for our gaming computer, that trickled with great pain to our second gaming computer (how else is Anastasia going to roam West Virginia with me in Fallout 76?), to a virtualization host to….. Wait, you mean no disk disappear-y for 14 days on the EPMD? Hmm. Let’s just wait for that to happen before I swap the motherboards so I don’t have to reboot it unnecessarily (and deal with the hassle of it being off the network).

Well, fast forward (not that fast) to today and…. Still up! Literally the last disk go bye-bye was on the DAY I did a buy it now on eBay for the replacement CPU that would be used to eventually trickle the known-good motherboard into that system. Current uptime is 63 days and counting. It’s digitally sticking out its tongue at me. A virtual nanny nanny boo boo. A computer’s middle finger waving high in the air, in my face. It even talked a little sz (in a manner of speaking) by making it through both the March and April monthly checkarray that Debian does for software RAID. So lesson? None, but if I had to say something, then “Let your computers know that they can be replaced if they start boooooooooooolszng”. Threats of being shelved make a motherboard act right.
