{"id":656,"date":"2021-06-27T14:46:49","date_gmt":"2021-06-27T20:46:49","guid":{"rendered":"http:\/\/lucasmanual.com\/blog\/?p=656"},"modified":"2021-06-27T20:27:53","modified_gmt":"2021-06-28T02:27:53","slug":"gpt-mdadm-debian-stable-failed-drive-replacement-howto","status":"publish","type":"post","link":"https:\/\/lucasmanual.com\/blog\/gpt-mdadm-debian-stable-failed-drive-replacement-howto\/","title":{"rendered":"gpt\/mdadm\/debian stable\/failed drive replacement howto"},"content":{"rendered":"\n<p>It&#8217;s that time again. <a href=\"http:\/\/lucasmanual.com\/blog\/mdadm-raid5-how-to-replace-failed-drive-gpt-partition\/\">Last time was in 2012<\/a>. Fast forward to 2021, and here we are again. This time it&#8217;s a little different: the drive has not failed yet, but it shows signs of failure.<\/p>\n\n\n\n<p>smartctl says it cannot read a few sectors.<\/p>\n\n\n\n<p>Warning: One thing you learn over many years of working with computers, servers, etc., is that you CANNOT ignore hardware failures. They will bite you back if you think you can leave them for a few extra days. My policy, and yours, should be: if you get a warning that something is wrong, you need to act. If you work for a business, that means you ship the new drive overnight. No excuses should be allowed in this regard. 
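As a quick illustration, here is one way to spot the unreadable-sector warning in SMART data. The attribute lines below are made up for illustration; on a real system you would inspect the output of smartctl -a \/dev\/sdb (from the smartmontools package):

```shell
#!/bin/sh
# Hypothetical excerpt of `smartctl -a /dev/sdb` output from a failing drive.
# Non-zero Current_Pending_Sector / Offline_Uncorrectable counts are the
# classic "cannot read a few sectors" symptom described above.
sample='197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       8'

# Pull the raw value (last field) of the pending-sector attribute.
pending=$(printf '%s\n' "$sample" | awk '/Current_Pending_Sector/ {print $NF}')
if [ "$pending" -gt 0 ]; then
    # prints: WARNING: 8 pending sectors - order a replacement drive now
    echo "WARNING: $pending pending sectors - order a replacement drive now"
fi
```

Any non-zero count here means the drive is on its way out, which is exactly the "act now" situation the policy above is about.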
<br><\/p>\n\n\n\n<p>With that hardware failure policy, you and your business have a better chance.<br><\/p>\n\n\n\n<p><strong>Debian Stable.<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>Install gdisk<\/strong><br><code>aptitude install gdisk<br><\/code><br>Show details of array md0<br><code>mdadm --detail \/dev\/md0<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">#2021: the drive is failing.<br>cat \/proc\/mdstat <br> Personalities : [raid6] [raid5] [raid4] <br> md0 : active raid5 sdb1[4] sda1[3] sdc1[1]<br>       3907023872 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3\/3] [UUU]<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\"><code>#Note: my 2012 failure looked like this:<\/code><br><code>cat \/proc\/mdstat<br> Personalities : [raid6] [raid5] [raid4]<br> md0 : active raid5 sdc1[1] sdd1[2]<br>       3907028864 blocks level 5, 64k chunk, algorithm 2 [3\/2] [_UU]<\/code><\/pre>\n\n\n\n<p>Since our drive has not failed yet but soon will, we will mark it as failed.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">mdadm \/dev\/md0 -f \/dev\/sdb1<br>\nmdadm: set \/dev\/sdb1 faulty in \/dev\/md0<\/pre>\n\n\n\n<p>If we didn&#8217;t fail it first, we would get this error when trying to remove it in the next step:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">mdadm \/dev\/md0 -r \/dev\/sdb1<br>\nmdadm: hot remove failed for \/dev\/sdb1: Device or resource busy<\/pre>\n\n\n\n<p><br>Let&#8217;s remove the drive from mdadm. (Note: if you don&#8217;t know whether it&#8217;s sdb1, you can run lsblk to confirm.)<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">mdadm \/dev\/md0 -r \/dev\/sdb1<br> mdadm: hot removed \/dev\/sdb1 from \/dev\/md0<\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">#We can see mdadm now shows the drive missing<br>cat \/proc\/mdstat <br> Personalities : [raid6] [raid5] [raid4] <br> md0 : active raid5 sda1[3] sdc1[1]<br>       3907023872 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3\/2] [_UU]<\/pre>\n\n\n\n<p><strong>SHUT DOWN IF 
YOU NEED TO REPLACE THE DRIVE.<\/strong><\/p>\n\n\n\n<p><strong>If you are putting it in the same slot it should come back with the same name, but we need to make sure. If the drive name changed, you would not want to be making partition changes on the wrong drive. MAKE SURE THE NEW DRIVE IS STILL sdb.<\/strong><br><strong>Look at how the disk is structured and what partition table type it has<\/strong><br><code>sgdisk -p \/dev\/sdb<br> Disk \/dev\/sdb: 3907029168 sectors, 1.8 TiB<br> Logical sector size: 512 bytes<br> Disk identifier (GUID): 0ED13F81-6EEA-4E12-9F27-DD806CF1F09C<br> Partition table holds up to 128 entries<br> First usable sector is 34, last usable sector is 3907029134<br> Partitions will be aligned on 8-sector boundaries<br> Total free space is 0 sectors (0 bytes)<\/code><\/p>\n\n\n\n<p><strong>#sgdisk -R=\/dev\/TO_THIS_DISK \/dev\/FROM_THIS_DISK<\/strong><br><code>sgdisk -R=\/dev\/sdb \/dev\/sda<br>\n<strong>#Give it a new GUID, since the option above clones the disk including the GUID<\/strong><br>\nsgdisk -G \/dev\/sdb<\/code><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>Now re-add the drive to md0<\/strong><code><br> mdadm \/dev\/md0 -a \/dev\/sdb1<br><\/code><br><br><strong>Check the status<\/strong><\/pre>\n\n\n\n<pre class=\"wp-block-preformatted\">cat \/proc\/mdstat <br> Personalities : [raid6] [raid5] [raid4] <br> md0 : active raid5 sdd1[4] sda1[3] sdc1[1]<br>       3907023872 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3\/2] [_UU]<br>       [&gt;....................]  recovery =  0.0% (253788\/1953511936) finish=384.8min speed=84596K\/sec<br><br>#....a few minutes later<br><br>cat \/proc\/mdstat <br> Personalities : [raid6] [raid5] [raid4] <br> md0 : active raid5 sdd1[4] sda1[3] sdc1[1]<br>       3907023872 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3\/2] [_UU]<br>       [=&gt;...................]  recovery =  7.4% (145761912\/1953511936) finish=391.5min speed=76950K\/sec<\/pre>\n\n\n\n<p><br><strong>Done. 
Check back in a few hours to see if it finished.<\/strong><br>Keywords: fdisk, sdisk, sgdisk, gdisk, parted, gpt, mbr, raid5, mdadm, linux, debian, business, dell, hp, server, policy<\/p>\n","protected":false},"excerpt":{"rendered":"<p>It&#8217;s that time again. Last time was in 2012. Fast forward to 2021, and here we are again. This time it&#8217;s a little different: the drive has not failed yet, but it shows signs of failure. smartctl says it cannot read a few sectors. Warning: One thing you learn over many years of working with computers, servers, etc.,&hellip; <a class=\"more-link\" href=\"https:\/\/lucasmanual.com\/blog\/gpt-mdadm-debian-stable-failed-drive-replacement-howto\/\">Continue reading <span class=\"screen-reader-text\">gpt\/mdadm\/debian stable\/failed drive replacement howto<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,3,34,27,4,33],"tags":[39,25,37,26,36,35,38],"class_list":["post-656","post","type-post","status-publish","format-standard","hentry","category-corporate","category-debian","category-hardware","category-it-department","category-linux","category-policy","tag-business","tag-debian","tag-mdadm","tag-nvme","tag-sda","tag-sdb","tag-stable","entry"],"_links":{"self":[{"href":"https:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/posts\/656","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/
comments?post=656"}],"version-history":[{"count":2,"href":"https:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/posts\/656\/revisions"}],"predecessor-version":[{"id":659,"href":"https:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/posts\/656\/revisions\/659"}],"wp:attachment":[{"href":"https:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/media?parent=656"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/categories?post=656"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/tags?post=656"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}