First things first: running badblocks in write mode (-w), as this guide does, will wipe the entire disk. Make sure there is no data on the disk before running it. Also, if you have multiple disks, make sure to select the correct one to avoid data loss.
Ever since I started using Unraid in 2016, the Unraid community has recommended pre-clearing (burn-in testing) a disk before adding it to the array. The reasoning is that it is better for a disk to fail during the test than to fail with user data on it. At my work, we had to RMA several brand-new disks because they were DOA.
There is a term for this: infant mortality. It means that hard drives are prone to fail during their first couple of months of use because of a manufacturing defect that the vendor did not detect. These defects can show up after a few weeks or months of use. Running a burn-in test therefore lowers the chance of putting such a drive into service. However, it does not eliminate the risk of ending up with a faulty disk.
From what I have read, some folks run their burn-in test a couple of times just to make sure. I ran mine only once; it is a risk I am willing to take for now. A 12TB Western Digital Easystore took ~164 hours over USB 3.0. Yes, I didn’t want to shuck it and then find out it was bad.
There is an easy way of doing this whole thing via this git repository, and you may need to install ksh. A badblocks instance tests only one disk at a time. To test several disks in parallel, you can open a new terminal and run another instance; I use tmux to accomplish this. Anyway, the manual process goes like this:
- Run SMART short test
- Run badblocks
- Run SMART long test
badblocks and the SMART tools (smartctl, from the smartmontools package) usually come with most distros. If they are missing, install them as follows.
# Badblock should already be installed, but if it is not
# To install badblocks
dnf install e2fsprogs # CentOS/Fedora/RHEL
apt install e2fsprogs # Debian
# To install SMART
dnf install smartmontools # CentOS/Fedora/RHEL
apt install smartmontools # Debian
To identify the target disk, we can use tools such as lsblk, fdisk and smartctl. We can go by the serial number or the device ID. Either way, the device needs to be identified. In my case, it is /dev/sde.
# lsblk
lsblk -d -o name,serial,type,tran
NAME SERIAL               TYPE TRAN
sda  DB82736F2414E        disk usb
sdb  5F903100152051619201 disk usb
sdc  YJK80ZDL             disk sas
sdd  LPF920ZL             disk sas
sde  WD-UJKEW0129914      disk sas
# fdisk
fdisk -l
...output omitted for brevity...
# SMART
smartctl -a /dev/sde | grep Serial
Serial Number: WD-UJKEW0129914
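If you would rather go from a serial number to a device path without eyeballing the table, the lookup can be scripted. This is only a minimal sketch: find_dev_by_serial is a hypothetical helper name, and it assumes two-column input in the form produced by lsblk -d -n -o name,serial (the -n flag suppresses the header line).

```shell
# Hypothetical helper: print the /dev path of the disk whose serial is $1.
# Expects "NAME SERIAL" lines on stdin, e.g. from: lsblk -d -n -o name,serial
find_dev_by_serial() {
    awk -v want="$1" '$2 == want { print "/dev/" $1 }'
}

# Usage (prints nothing if no disk has that serial):
#   lsblk -d -n -o name,serial | find_dev_by_serial WD-UJKEW0129914
```

Matching on the serial rather than the device name is the safer habit, since names like sde can change between reboots or when disks are replugged.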
If you are running SMART on a USB device as I did, you have to set the device type to sat; otherwise, SMART will complain that it cannot recognize the drive.
smartctl /dev/<device> -d sat
To run the SMART short test:
smartctl -t short /dev/sde -d sat
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.0-193.19.1.el8_2.x86_64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Tue Nov 3 07:44:48 2020
Use smartctl -X to abort test.
The SMART test runs in the background. To check the status of the SMART test:
# SMART test is still running
smartctl -a /dev/sde -d sat | grep execution
Self-test execution status: ( 241) Self-test routine in progress...
# SMART test completed
smartctl -a /dev/sde | grep execution
Self-test execution status: ( 0) The previous self-test routine completed
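Instead of re-running that status check by hand, the waiting can be scripted. The sketch below is an assumption-laden convenience, not part of smartmontools: selftest_running and wait_for_selftest are hypothetical names, the parsing assumes the status line reads "Self-test routine in progress" as in the output above, and -d sat is hard-coded because my drive sits behind a USB bridge.

```shell
# Succeeds if the `smartctl -a` output on stdin reports a self-test
# that is still in progress.
selftest_running() {
    grep 'Self-test execution status' | grep -q 'in progress'
}

# Poll device $1 every $2 seconds (default 60) until no self-test is running.
# Drop "-d sat" if the drive is not attached over USB.
wait_for_selftest() {
    dev=$1
    interval=${2:-60}
    while smartctl -a "$dev" -d sat | selftest_running; do
        sleep "$interval"
    done
}

# Usage:
#   wait_for_selftest /dev/sde 30
```

Note that the loop only tells you the test has stopped running, not that it passed; the result still has to be read from the SMART output.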
Now that we have identified the device and run the SMART short test, we need to run badblocks. Here are the parameters:
- -b specifies the block size in bytes. The default is 1024; a larger value can shorten the burn-in time.
- -c is the number of blocks tested at a time. The default is 64.
- -w uses the destructive write-mode test.
- -s shows progress.
- -v is verbose mode.
- -o is the output file for the list of bad blocks. Without -o, badblocks simply writes the list to STDOUT.
badblocks -b 32768 -c 512 -wsv /dev/sde -o wd-black-750g.txt
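Since a command like the one above destroys everything on the target, a small guard before launching it can catch the classic wrong-disk mistake. This is a rough sketch under my own assumptions: dev_is_mounted is a hypothetical helper, and it reads mount entries in /proc/mounts format from stdin. It uses a plain prefix match, so it can report a false positive for a longer device name that starts with the same characters, which errs on the safe side.

```shell
# Succeeds if any mount entry on stdin has a source device that starts
# with $1, i.e. the disk itself or one of its partitions
# (for example /dev/sde or /dev/sde1).
dev_is_mounted() {
    awk -v dev="$1" 'index($1, dev) == 1 { found = 1 } END { exit !found }'
}

# Usage: refuse to continue when the target is mounted.
#   if dev_is_mounted /dev/sde < /proc/mounts; then
#       echo "/dev/sde is mounted, refusing to run badblocks" >&2
#   fi
```

A disk that is not mounted can still hold data you care about, so this complements checking the serial number rather than replacing it.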
The burn-in test will go through the patterns of 0xAA, 0x55, 0xFF then 0x00 as shown below.
badblocks -b 32768 -c 512 /dev/sde -wsv -o wd-black-750g
Checking for bad blocks in read-write mode
From block 0 to 22892954
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found. (0/0/0 errors)
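Two quick sanity checks can be run against a log like the one above: that the block range really covers the whole disk, and that the output file stayed empty. The helpers below are hypothetical names of my own, and they rest on two assumptions: that badblocks divides the device size by the block size and drops any remainder, and that it creates the -o file even when it has nothing to write into it.

```shell
# Number of blocks covered for a device of $1 bytes with a block size
# of $2 bytes (integer division, any remainder is dropped).
block_count() {
    echo $(( $1 / $2 ))
}

# Succeeds only if the badblocks output file $1 exists and is empty,
# meaning no bad block numbers were written to it.
no_bad_blocks_logged() {
    [ -f "$1" ] && [ ! -s "$1" ]
}

# Usage:
#   block_count "$(blockdev --getsize64 /dev/sde)" 32768
#   no_bad_blocks_logged wd-black-750g && echo "no bad blocks logged"
```

For example, a device of 750156349440 bytes (a size I picked to line up with the log, not one taken from the drive) gives 22892955 blocks of 32768 bytes, which corresponds to "From block 0 to 22892954".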
Once badblocks is done and reports no errors, run the SMART long test:
smartctl -t long /dev/sde -d sat
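Once the long test has finished, its result can be read from the self-test log that smartctl -l selftest prints. The sketch below assumes the usual ATA log layout, where the newest entry is numbered "# 1" and a passing test reads "Completed without error"; latest_selftest_passed is a hypothetical helper name, and other drive types may format this log differently.

```shell
# Succeeds if the newest entry ("# 1") in `smartctl -l selftest` output
# on stdin reads "Completed without error".
latest_selftest_passed() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Usage (add -d sat for USB drives):
#   smartctl -l selftest /dev/sde -d sat | latest_selftest_passed \
#       && echo "long test passed"
```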
If everything looks good, the drive is a keeper. Otherwise, return it if it is still returnable, or RMA it.
Cheers!