Debugging Failing Hardware

I went on a debugging adventure this afternoon when an NVMe drive suddenly failed. Some of the steps here might help others debugging issues like this, or entertain you if you’ve tried to do the same as me.

I was under the impression that I could take a long weekend to mentally recover after a week of poor sleep. My desktop computer’s NVMe drive decided on the last day of my long weekend that this would be the perfect time to throw in the towel, after a couple of years of faithfully booting Windows and letting me play various video games in peace. This immediately derailed the last day of what would have been an unproductive, video-game-filled long weekend and turned it into an entire afternoon of debugging failing hardware.

Hardware debugging is often a frustrating task. I have done it plenty of times in a work context, and enough in a personal context on my desktop to know how painful it can get. Today’s adventures involved plenty of useful utilities and showed the benefits of dual-booting Linux. The following sections will give some idea of how you might diagnose these issues yourself. It might even help you fix some.

Dual Booting (an aside)

I’ve dual booted Windows and various flavors of Linux for at least 6 years now [1]. Dual booting is the easiest way I’ve found to make sure your computer can boot something. Any flavor of Linux (cough GNU/Linux cough) will do. My preference is anything Debian-based, for simplicity.

A common reason to not dual boot or use Linux at all is a lack of familiarity with it, or Linux being “too technical” or “for nerds” etc. My immediate counterpoint is that my mother now uses Linux Mint on her laptop and it runs perfectly fine. Although having a deeper level of technical expertise and familiarity with Linux is helpful, the operating system is pretty mature nowadays and realistically most people can use it day to day. If all you’re doing is browsing the internet anyways, it’s perfectly fine.

How to find “Hardware” on Linux

One extremely beneficial point of Linux machines is how straightforward it is to look under the hood and find hardware issues. I was having problems with my NVMe drive. This is a PCIe device, which you can just look for:

lspci # list pci devices
lsblk # list block devices

Looking at all PCIe devices gets noisy. PCIe stands for Peripheral Component Interconnect Express, and nearly every internal peripheral (graphics cards, network cards, NVMe drives) uses it these days. A working NVMe drive should show up as a block device with lsblk at the very least. In my case, this didn’t happen.
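To cut through the noise, the two listings can be narrowed to storage hardware. This is only a sketch: the function name only_nvme_controllers is my own invention, and it assumes lspci describes NVMe drives with the class name "Non-Volatile memory controller", which is what recent versions of pciutils print.

```shell
#!/bin/sh
# Sketch: narrow the noisy listings down to NVMe hardware only.

# Keep only NVMe controllers from lspci-style text on stdin.
# (Hypothetical helper; assumes the "Non-Volatile memory controller" class name.)
only_nvme_controllers() {
    grep -i 'non-volatile memory controller'
}

# PCIe level: is the controller on the bus at all?
if command -v lspci >/dev/null 2>&1; then
    lspci | only_nvme_controllers || echo "no NVMe controller found on the PCIe bus"
fi

# Block level: did the kernel create a disk for it?
# -d hides partitions, TRAN shows the transport (sata, nvme, usb, ...).
if command -v lsblk >/dev/null 2>&1; then
    lsblk -d -o NAME,MODEL,SIZE,TRAN
fi
```

If the controller shows up in lspci but no disk appears in lsblk, the problem is more likely above the hardware (driver or firmware). If it is missing from both, as it was for me, the drive is not being detected at all.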

I now needed to look one level deeper. It was possible that this was still a software issue, but this software issue was more likely to be with the drive firmware than Linux. At this point I was digging through kernel messages to see if the drivers had even discovered the device:

sudo dmesg | grep -i pcie # PCIe bus or link messages
sudo dmesg | grep -i nvme # anything from the NVMe driver
sudo dmesg | grep -i mount # filesystem mount errors

At least one of these three should have given me something. If it were a driver failure, searching for anything PCIe or NVMe related with grep -i pcie or grep -i nvme should have shown something about the drive. I was not so lucky. Mount errors were also possible, so I tried grep -i mount out of desperation more than anything. No dice. At this point I was left with the feeling that the drive was just dead after all. It was time for some hardware debugging.
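The three searches can also be folded into a single pass over the kernel log. This is a minimal sketch rather than what I ran at the time, and the function name storage_messages is made up for illustration.

```shell
#!/bin/sh
# Sketch: one case-insensitive search for all three keywords.

# Keep only lines that mention PCIe, NVMe, or mounts (text on stdin).
# (Hypothetical helper name.)
storage_messages() {
    grep -iE 'pcie|nvme|mount'
}

# Run this as root (for example under sudo) if the kernel log is restricted.
if command -v dmesg >/dev/null 2>&1; then
    dmesg 2>/dev/null | storage_messages \
        || echo "nothing storage-related found (or the kernel log could not be read)"
fi
```

On systems that keep a persistent systemd journal, journalctl -k -b -1 prints the kernel messages from the previous boot, which can be piped through the same filter to see what happened before the reboot.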

Turn it off and back on again, why not

As any IT professional with experience troubleshooting will tell you, the silver bullet solution in almost all cases is to turn something off and back on again. Rebooting fixes more problems than it ought to. In my case, I went with the nuclear reboot option:

  • Turn off the desktop from Linux (safely)
  • Wait until all lights on the computer are off
  • Unplug everything from the wall
  • Hold down the power button for 10-15 seconds
  • Take 5 minutes to wander around pondering why you’ve spent all this time for a Windows boot
  • Plug everything in again and power it back on

Yet again, after powering on, I was left with a boot screen that showed only a Linux drive as a bootable option. The BIOS settings showed that there was not even a recognized M.2 drive in the one slot my motherboard had. It was time for more hands-on fixes.

Find a Screwdriver

It was time for me to look at the actual desktop components. After finding nothing at all on the software side and a full reboot failing, there really were no other options left. Luckily for me, I have a reasonably accessible desktop case that makes getting to most drives only slightly painful. I was only two Phillips screws away from getting my NVMe drive off the motherboard and out where I could see it. After getting the desktop on its side and taking those tiny screws out, I was able to reseat the drive without too much of a hassle [2].

A minute later, once my desktop cables were reconnected, I was greeted by an unwelcome sight: exactly the same thing I had seen before. My NVMe drive was nowhere to be found. In spite of my time spent unplugging the drive, finding it to be entirely normal looking, and reseating it to make sure the pins (sockets?) were at least connected reasonably, I had what appeared to be a totally bricked drive on my hands.

Desperately try to repair an ~8 year old SSD?

My desktop has (had) 3 separate drives, one of which is an ancient 500GB SSD with a Windows 8 (yes, 8) installation that I last booted at some point in the early 2020s. It stopped booting long ago, and I have since used it as storage for some games that didn’t fit in the 1TB of previously quite nice NVMe storage. In my apparently delirious state after finding out my NVMe drive was no longer functional, I attempted to get my Windows 8 SSD back in working order.

It turns out that this task is far tougher than I ever would have imagined. I would not recommend ever doing this. If given the choice, simply reinstall Windows. I will give only a vague outline of what I attempted to do, as this took me well over 2 hours and led nowhere productive:

  • Used Ventoy to create a bootable USB with a Windows 10 ISO
  • Booted the ISO to get to a command prompt
  • Tried to fix the MBR drive that I had with bootrec /fixmbr, /fixboot, and /rebuildbcd, in that order. No luck
  • Attempted converting the drive from MBR to GPT with mbr2gpt /validate and then /convert, in that order. Still failed
  • Looked at fixing corrupted files with chkdsk. Found some, fixed them, still didn’t boot
  • Thought sfc /scannow would work, and it said it did. Apparently it was lying

Ultimately I’ve installed Windows 10 fresh on a replacement drive. If you ever find yourself attempting to repair a near decade old Windows 8 install I have one word of advice - don’t.


  1. Ubuntu, Arch, Artix, PopOS, and Manjaro to name a few. I currently run PopOS, but can recommend most Debian-based distros for general use and will probably write a post about distros at some point.

  2. It was actually quite the hassle, as I accidentally unscrewed one of the 4 screws which kept my CPU cooler in place. This caused uneven pressure on my CPU and the motherboard failed to boot. It was an easy fix, but this caused me immense concern for a few minutes where I deeply regretted ever touching any hardware in the first place.