Drive Failure and Recovery

Published on October 4, 2022 at 2:02 am by LEW

Introduction

Drive failure: if you have not been there, you do not want to go there. No matter how robust current drive technology is, at some point you will experience a drive failure. I happened to have an external backup drive fail recently, so I figured this was as good a time as any to explore the issue. This will be a short walk-through, from first noticing the problem, through checking the drive's condition, to recovering the data.

What to Do Now, Before it Happens

I am not going to belabor this point, but there are a few things you should be doing now, before you have a drive failure.

One of the best things you can do to protect yourself is to keep good, regular backups of all your data. In fact, you might want to have multiple backup copies, preferably not all stored in the same location.

Between other backup drives, online backups, and file copies still on devices, this drive failure was trivial. I basically just need to replace the drive. Without backups, at best it would have been a pain in the gluteus maximus, and at worst it could have been devastating.

Most modern drives are equipped with SMART (Self-Monitoring, Analysis and Reporting Technology). Another good preventive measure is to run SMART checks against all your drives periodically. This is one of the best ways to catch drive issues before they become a problem.

That's it, just a couple of quick mentions, as recovery is the topic of this post. But it does give me material for some additional future posts on backups and SMART technology.

Warning Signs

Since most people run Windows, I will focus on that Operating System (OS). Under Windows I currently use CrystalDiskInfo to check drive health via SMART data. I am not endorsing the program beyond saying it works for me. CrystalDiskInfo works with internal and external drives, and provides lots of information, probably more than you want.

Thankfully, there is a Blue/Yellow/Red status indicator in the upper left corner of the program window. I would still suggest learning to read some of the other parameters, rather than relying solely on this quick indicator.

For the external drive I am having issues with, I initially got errors trying to read files from it. Then Windows tried to unmount it, and that is where everything went bonkers.

I already knew this drive was having issues, as CrystalDiskInfo had reported it as Yellow (warning). Once it failed, I decided it would be a good subject for a post.

Troubleshooting and Pulling Data

The first step in troubleshooting was scanning some of the additional data provided by CrystalDiskInfo. Technically, if I reformat the drive and map out the bad blocks, it should still be usable. But that would entail some amount of risk. According to the SMART data, it has 7920 hours of operation (that is about 330 days, or a little under one year, of continuous running). It has been powered on (cold start) 1899 times. I have actually had the drive for about seven years, so it is well out of warranty. Its specifications are 1 TB, 5400 RPM, USB 3.
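To put those SMART numbers in perspective, here is a quick bit of arithmetic in Python. It only uses the two figures quoted above (7920 hours and 1899 power cycles); the variable names are my own.

```python
# Figures taken from the SMART data quoted above.
power_on_hours = 7920
power_cycles = 1899

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365

# Convert the running time into days and years of continuous operation.
days_running = power_on_hours / HOURS_PER_DAY
years_running = days_running / DAYS_PER_YEAR

# Average length of one session between a power-on and a power-off.
hours_per_cycle = power_on_hours / power_cycles

print(f"{days_running:.0f} days of continuous running")        # 330 days
print(f"{years_running:.2f} years of continuous running")      # 0.90 years
print(f"{hours_per_cycle:.1f} hours per power cycle on average")  # 4.2 hours
```

So over seven years of ownership the drive was only spinning for the equivalent of about eleven months, in sessions of roughly four hours each, which fits its role as a backup drive.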

The main problem with the drive is that the “Current Pending Sector Count” keeps climbing. This is a measure of the number of unstable sectors that are waiting to be remapped. If the count were constant, those sectors could be mapped out. But since it keeps increasing, it is time to replace the drive.
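The "constant versus climbing" test can be written down as a tiny Python check. This is only a sketch: it assumes you jot down the pending sector count by hand each time you look at the SMART data, and the readings in the example are made-up numbers, not values from my drive.

```python
def pending_sectors_rising(readings):
    """Return True if any reading is higher than the one before it.

    `readings` is a list of Current Pending Sector Count values,
    oldest first, noted down manually over time.
    """
    return any(later > earlier for earlier, later in zip(readings, readings[1:]))


# Hypothetical readings, oldest first.
stable_drive = [8, 8, 8, 8]
failing_drive = [8, 8, 16, 24]

print(pending_sectors_rising(stable_drive))   # False: count is holding steady
print(pending_sectors_rising(failing_drive))  # True: time to replace the drive
```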

After determining that the drive is failing, the next step is getting the data off it before it fails completely. How you do this depends on the state the partitions are in. In the best case they are still recognized, and you can simply copy the data to another drive.

In the worst case, the drive shows up as RAW or Unallocated, and it will take some additional measures to retrieve your data.

For the purposes of this post, I am assuming the latter. For data recovery from a bad partition, I use a program called photorec. It is part of the testdisk package, which is available for most operating systems. Note that on Windows photorec has both a command line and a GUI version.

Before running photorec, you should create a recovery folder on a different drive, one with enough free space. Regardless of which version you run, you need to select your drive, the file system, and where to save the recovered data. Once started, just let the program run.
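If you want to check the free space before you start, a few lines of Python will do it. This is a minimal sketch using only the standard library; the function name and the example path are my own inventions, and it assumes the recovery could need as much room as the whole failing drive.

```python
import shutil
from pathlib import Path


def has_room_for_recovery(folder, needed_bytes):
    """Create the recovery folder if needed and report whether the drive
    it lives on has at least `needed_bytes` free."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    free_bytes = shutil.disk_usage(folder).free
    return free_bytes >= needed_bytes


# Example use (hypothetical path, 1 TB failing drive):
#   ONE_TB = 1_000_000_000_000
#   if not has_room_for_recovery("E:/recovery", ONE_TB):
#       print("Not enough free space, pick another drive.")
```

Budgeting for the full size of the failing drive is a cautious assumption. The recovered data may turn out to be much smaller, but running out of space part way through a ten hour scan is not fun.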

To give you an idea of how long it will take, scanning the whole of my 1 TB 5400 RPM mechanical external USB 3 hard drive took approximately ten hours. This will vary from system to system. The point is that it could take a while.
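For a rough idea of how long your own scan might take, you can work out the average scan rate from my run and scale it. The sketch below assumes the 1 TB drive described earlier (taken as 1,000,000,000,000 bytes) and a ten hour scan; the function name is my own, and a failing drive with lots of bad sectors can easily be slower than this.

```python
def estimate_scan_hours(capacity_bytes, bytes_per_second):
    """Estimate the hours needed to scan a drive at a given average rate."""
    return capacity_bytes / bytes_per_second / 3600


ONE_TB = 1_000_000_000_000  # decimal terabyte, as drive makers count it
observed_hours = 10

# Average rate achieved during my scan, in bytes per second.
observed_rate = ONE_TB / (observed_hours * 3600)
print(f"Average rate: {observed_rate / 1_000_000:.1f} MB/s")  # 27.8 MB/s

# Same rate applied to a hypothetical 2 TB drive.
print(f"2 TB estimate: {estimate_scan_hours(2 * ONE_TB, observed_rate):.0f} hours")
```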

Sorting the Recovered Data

Pulling the recovery data was the easy part. The hard part is sorting it. This type of recovery goes block by block and pulls pretty much everything that has ever been stored on the drive and not yet been overwritten. Additionally, you will end up with multiple copies of some files. Be prepared.
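One way to deal with the multiple copies is to group files by a hash of their contents. The following is a sketch rather than a finished tool: the function names are my own, it uses only the Python standard library, and it only reports duplicates so that you can decide what to delete.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return a list of groups, each a list of paths with identical content."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]


# Example use (hypothetical folder name):
#   for group in find_duplicates("E:/recovery"):
#       print("Identical files:", [str(p) for p in group])
```

Hashing reads every recovered file in full, so expect this to take a while on a large recovery folder.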

This is a prime example of how deleting a file does not actually remove it from your drive.

I suggest you get organized. Scripts can make this easier, if you have the expertise to write them.

My process is to pull specific types of data into common folders. For example, I put all images (jpg, png, tif, etc.) in one folder, all audio files in another folder, all video files in their own folder, and so on. In the long run, this makes the final recovery much easier, because you are only dealing with one media type at a time instead of a folder with multiple media types in it. But do whatever works for you.
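Here is a sketch of that sorting process as a Python script. The category names and extension lists are just my own starting point, so adjust them to taste. It assumes the destination folder sits outside the folder being sorted, and it moves files rather than copying them.

```python
import shutil
from collections import Counter
from pathlib import Path

# Assumed categories; extend or change these to suit your own data.
CATEGORIES = {
    "images": {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif", ".bmp"},
    "audio": {".mp3", ".wav", ".flac", ".ogg", ".m4a"},
    "video": {".mp4", ".avi", ".mkv", ".mov", ".wmv"},
}


def category_for(path):
    """Pick a category folder name from the file extension."""
    extension = Path(path).suffix.lower()
    for name, extensions in CATEGORIES.items():
        if extension in extensions:
            return name
    return "other"


def sort_recovered(source, destination):
    """Move every file under `source` into destination/<category>/.

    Returns a dict mapping category name to number of files moved.
    """
    source, destination = Path(source), Path(destination)
    counts = Counter()
    # Build the list first so moving files does not disturb the walk.
    files = [p for p in sorted(source.rglob("*")) if p.is_file()]
    for path in files:
        category = category_for(path)
        folder = destination / category
        folder.mkdir(parents=True, exist_ok=True)
        # Avoid overwriting a file that has the same name.
        target = folder / path.name
        suffix_number = 1
        while target.exists():
            target = folder / f"{path.stem}_{suffix_number}{path.suffix}"
            suffix_number += 1
        shutil.move(str(path), str(target))
        counts[category] += 1
    return dict(counts)


# Example use (hypothetical folder names):
#   print(sort_recovered("E:/recovery", "E:/sorted"))
```

Anything with an extension the script does not recognize lands in an "other" folder, which you can then go through by hand or handle with more categories.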

Conclusion

Storage drive failures will occur; it is just a matter of when. It is best to prepare for them ahead of time with backups and drive monitoring. Regardless, if the drive is still functional, it is possible to recover data using a block by block method even when the partitions are no longer valid.

So be aware, be forewarned, and be ready!
