gpt/mdadm/debian stable/failed drive replacement howto

It’s that time again. Last time was in 2012. Fast forward to 2021 and here we are again. This time it’s a little different: the drive has not failed yet, but it is showing signs of failure.

smartctl says it cannot read a few sectors.
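
To see the SMART details yourself (assuming the suspect drive is sdb; smartctl comes from the smartmontools package):

#Overall health verdict
smartctl -H /dev/sdb
#Full report; watch Reallocated_Sector_Ct and Current_Pending_Sector
smartctl -a /dev/sdb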

Warning: One thing you learn over many years of working with computers, servers, etc. is that you CANNOT ignore hardware failures. They will bite you back if you think you can leave them for a few extra days. My policy, and yours, should be: if you get a warning that something is wrong, you need to act. If you work for a business, that means you ship the new drive overnight. No excuses should be allowed in this regard.

With that hardware failure policy, you and your business have better chances.

Debian Stable.

Install gdisk 
aptitude install gdisk

Show details of the md0 array
mdadm --detail /dev/md0
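The --detail output shows the array state and which member, if any, has failed. A quick optional filter for the interesting lines:

mdadm --detail /dev/md0 | grep -E 'State|Failed'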
*2021 drive failing.
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[4] sda1[3] sdc1[1]
3907023872 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
**Note my 2012 failure was this:
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[1] sdd1[2]
3907028864 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]

Since our drive has not failed yet but will soon, we will mark it as failed ourselves.

mdadm /dev/md0 -f /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
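
To confirm the kernel now treats it as faulty, check /proc/mdstat; a faulty member is normally flagged with an (F) suffix next to its name:

cat /proc/mdstat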

If we didn’t fail it, we would get this error when trying to remove it in the next step:

mdadm /dev/md0 -r /dev/sdb1
mdadm: hot remove failed for /dev/sdb1: Device or resource busy


Let’s remove the drive from mdadm (note: if you don’t know whether it’s sdb1, you can run lsblk to confirm):

mdadm /dev/md0 -r /dev/sdb1
mdadm: hot removed /dev/sdb1 from /dev/md0
#We can see that mdadm now shows the drive as missing
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sda1[3] sdc1[1]
3907023872 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
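
Before powering off, note which physical drive you are about to pull. The serial number printed here should match the label on the drive itself:

smartctl -i /dev/sdb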


SHUT DOWN IF YOU NEED TO REPLACE THE DRIVE.

If you are putting the new drive in the same slot, it should come up with the same name, but we need to make sure. If the drive name changed, you would not want to be making partition changes on the wrong drive. MAKE SURE THE NEW DRIVE IS STILL sdb.
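
A quick sanity check; SIZE, MODEL and SERIAL should match the new drive, not the old one:

lsblk -o NAME,SIZE,MODEL,SERIAL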
Look at how the disk is structured and what partition table it has:
sgdisk -p /dev/sdb
Disk /dev/sdb: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 0ED13F81-6EEA-4E12-9F27-DD806CF1F09C
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 8-sector boundaries
Total free space is 0 sectors (0 bytes)

#sgdisk -R=/dev/TO_THIS_DISK /dev/FROM_THIS_DISK
sgdisk -R=/dev/sdb /dev/sda
#Give it a new GUID, since the option above clones the disk including the GUID
sgdisk -G /dev/sdb
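
To confirm the clone worked, print the new drive’s table again and compare it with the source disk; the partition layout should match, the GUIDs should not:

sgdisk -p /dev/sdb
sgdisk -p /dev/sda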

Now add the new drive back into md0
mdadm /dev/md0 -a /dev/sdb1


Check the status
cat /proc/mdstat 
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[4] sda1[3] sdc1[1]
3907023872 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
[>....................]  recovery =  0.0% (253788/1953511936) finish=384.8min speed=84596K/sec

#....a few minutes later

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[4] sda1[3] sdc1[1]
3907023872 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
[=>...................]  recovery =  7.4% (145761912/1953511936) finish=391.5min speed=76950K/sec
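
If you want to keep an eye on the rebuild without retyping the command, something like this works:

watch -n 60 cat /proc/mdstat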


Done. Check back in a few hours to see if it finished.
Keywords: fdisk, sdisk, sgdisk, gdisk, parted, gpt, mbr, raid5, mdadm, linux, debian, business, dell, hp, server, policy