Software RAID1 (starts degraded), Debian Etch 4.0, kernel 2.6.18, GRUB
Absolutely no warranty, use entirely at your own risk, the author accepts no responsibility. It is quite likely some statements in this document are inaccurate.
Oh joy, I have a need to create a RAID1 array on an SMTP/IMAP server, and because all of our company's email is stored on that server, I must not lose the data. I am going to use software RAID1 to mirror the hard drive and make a tape backup once each day. I experimented (on a test system) for days using bits and pieces of the many HOWTOs I found on the subject of RAID1 mirroring on Linux (and specifically Debian) and I found that there is a lot of stuff that did not work for me or is ancient history. I am not an expert on the subject by any means, but I have learned a few things:
If you use this document on anything but Debian Etch 4.0, you might lose all your data. If you don't understand the reason you are performing a step, you might lose all your data. If you don't know exactly what will happen when you perform a step, you might lose all your data. If you blindly copy and paste commands in this document to your shell, you might lose all your data. If you are not paying attention and are not free from distraction, you might lose all your data. If you have not first practiced on a test system, you might lose all your data. If you make typos, you might lose all your data. If you do not first back up your data, you might lose all your data. If you don't perform the steps in the proper order, you might lose all your data. If you become impatient, you might lose all your data. If you don't document how to repair your system in the event of a hard drive failure, you might lose all your data. Other than that, it's really pretty simple.
The easiest way to get RAID1 functionality is to configure it using the partman partition manager and mdcfg presented to you when you first install Debian. Of course this is only useful if you are building a new system. This document however is for a system that is currently up and running on a single drive and you wish to add a second drive to mirror the first. If you are building a new system and wish to configure your drives as a RAID1 array or are using the LILO bootloader, this document may not be for you. See http://www200.pair.com/mecham/raid/raid-index.html for other choices.
Raid 1 (on existing system)
What is needed
This document in itself is not designed to get RAID1 functional on your production computer. It is designed to get you comfortable with doing so in a test environment. The test environment you create should be as close as possible to the system you will eventually configure RAID on. A few of the steps I perform definitely should not be performed on a production system with data on it. We may do things to illustrate or prove a point; to educate us. When we finish training ourselves on a test system we should be confident enough to continue on to our production box.
My setup is on an i386 based machine. I am using the 2.6.18 kernel, the ext3 file system and the GRUB boot loader. I have not tried this with LVM.
There is a major problem with getting RAID1 to function. The software modules that are needed to read the data from the devices in the array need to be loaded at boot time, or the devices cannot be read. The problem is, these modules are not normally included in the boot ramdisk image (/boot/initrd.img-x.x.x). For our purposes we need two modules, 'md' (multi-disk) and 'raid1' (redundant array of inexpensive/independent disks, level 1). This is very similar to the problem Windows administrators face when dealing with device drivers for hard disk controllers that are not included with Windows. You have to install the device driver on the hard drive before you install the controller, or you cannot read your hard drive. While there is evidence it may be possible, I know of no straightforward way to include the needed modules in a series of boot floppies. The bottom line is, if you cannot get the modules loaded into the ramdisk, you may not be able to boot your machine from your hard drives. Also, it is not enough to get the modules into the boot ramdisk image. Doing so will get one or two RAID devices running but the remainder depend on additional RAID software that loads later on in the boot process. In addition, you must get your system configured in such a way that BOTH hard drives are bootable so you can boot your system from either drive if the other drive has failed, or is removed. I have to admit that I don't understand (whether Windows or Linux) how the disk is read if the software needed to read the disk is on the disk! It must be some special low-level boot magic.
I suggest reading: man initrd
I am going to talk about both SCSI and EIDE hard drives because I have tested this with both, but the examples will be EIDE. There are not a lot of differences. Simply substitute 'hda' with 'sda', 'hdc' with 'sdb' and such. Sorry, but I have not tested with SATA drives. You will need two identical hard drives to complete this project (in addition to the one currently installed in your system). They can actually be different (within reason) but if they differ in size, the smallest drive will dictate the size of the partitions, so the drives must be equal in size or larger than your current hard disk. Identical is better. Why two additional drives you say? I thought RAID1 only used two drives. True, but once you install one of the drives in your production system we want to have a spare drive available if one of the other two drives fails. That is the point isn't it? Besides, we are going to use the two spare drives as test drives prior to installing one of them in the production machine.
Name your hard drives so it is easier for me to refer to them. Actually label these names on them. Name one of them apple, and one of them pie. If one of the drives is smaller than the other, label the smaller of the two apple. For an EIDE system, one drive must be installed on the primary master drive connector and the other on the secondary master drive connector. I am going to refer to the EIDE drive that is connected to the primary master connector as being in the primary position. Linux should recognize this drive as /dev/hda. I am going to refer to the EIDE drive that is connected to the secondary master connector as being in the secondary position. Linux should recognize this drive as /dev/hdc.
/dev/hda is primary
/dev/hdc is secondary
For a SCSI system with both drives on one SCSI adapter, one drive should be configured as SCSI device id 0 (on my system removing all ID jumpers on the drive sets it to 0), and the other drive is typically configured as SCSI device id 1 (I add a jumper to the drive to set this). I am going to refer to the SCSI drive that is configured as SCSI device id 0 as being in the primary position. Linux should recognize this drive as /dev/sda. I am going to refer to the SCSI drive that is configured as SCSI device id 1 as being in the secondary position. Linux should recognize this drive as /dev/sdb. These statements assume both drives are installed. If only one drive is installed, it will be in the primary position and be recognized as /dev/sda regardless of the jumper setting.
For a SCSI system with each drive on separate SCSI adapters, both drives are typically configured as SCSI device id 0 (but I prefer to set the one on the second adapter as SCSI device id 1). You may need to determine which adapter is recognized first and which is recognized second. If the adapters are the same model by the same manufacturer this is a more difficult task. You may have to temporarily remove one of the drives to see which adapter is recognized first. Then you may want to label them. I am going to refer to the SCSI drive that is on the adapter that is recognized first as being in the primary position. Linux usually recognizes this drive as /dev/sda. I am going to refer to the SCSI drive that is on the adapter that is recognized second as being in the secondary position. Linux should recognize this drive as /dev/sdb. These statements assume both drives are installed. If only one drive is installed, it will be in the primary position and be recognized as /dev/sda regardless of which adapter it is installed on.
All the data on the two drives we use for testing will be erased. Any drive that may be used in the future to replace a drive that has failed MUST be clean. Any drive that has been used in a RAID array at any time in the past must also be cleaned. Let me explain why. Let's pretend for a moment that we used apple in a RAID array, then unplugged it and replaced it, then put it away for later use as an emergency replacement. A year from now one of our drives goes bad, so we shut the machine down and place apple in its place. Then we boot up and to our horror, the good drive has synced itself to the data stored on apple and not the other way around. To clean a drive, install it in a system by itself, and boot up using a DBAN disk. You can change the Method to Quick erase. This will write zeros to each bit on the disk. You should also have a rescue disk (like the Etch CD) available.
Install a cleaned apple in the primary position and leave pie out of the computer. A good place for the CDROM drive is the secondary slave EIDE interface. Boot up using the appropriate Debian installer media. At this time Etch is still 'testing', so look here for a CD: http://www.debian.org/devel/debian-installer/. Once Etch becomes stable, look here: http://www.debian.org/distrib/netinst.
HDD 1 Setup
This illustration shows what the end product of my test machine will look like. Some people use a separate /boot partition and I have only tested this setup with one in place. When installing Debian on your system you should set the partitions up in the same manner as your production box so you can gain experience with something closer to your setup. I will not detail installing Debian, but I will say that since this is just a test system you will only need to install the absolute minimum number of software packages. When using the partition manager you do not want to configure RAID in any way. You should install the GRUB boot loader. Please take notes on how your disk is configured:
device     mount  md-device  temp-mount  boot  partition-type
/dev/hdc1  /boot  /dev/md0   /mnt/md0    *     primary (100MB)
/dev/hdc5  swap   /dev/md1                     logical (1GB)
/dev/hdc6  /      /dev/md2   /mnt/md2          logical (remainder of disk)
Continue on with the Debian installer until you get to the point you can log in as root.
HDD 2 Setup
Now you can remove apple, install pie in its place and clean it per the instructions mentioned earlier (unless you have already done so). Then place apple back in the primary position and place pie in the secondary position and start up the computer.
Back up a few files. Your initrd.img may be a different version. If so, I suggest saving this document to your computer and doing a search and replace of the kernel version number:
cp /etc/fstab /etc/fstab-backup
cp /etc/mtab /etc/mtab-backup
cp /etc/initramfs-tools/modules /etc/initramfs-tools/modules-backup
cp /etc/modules /etc/modules-backup
cp /etc/initramfs-tools/initramfs.conf /etc/initramfs-tools/initramfs.conf-backup
cp /boot/grub/menu.lst /boot/grub/menu.lst-backup
cp /boot/initrd.img-2.6.18-4-686 /boot/initrd.img-2.6.18-4-686-backup
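If you would rather not type each cp by hand, the same backups can be made with a small helper. This is only a sketch: backup_file is a name I made up for illustration, and it assumes you want the same "-backup" suffix used above. It refuses to overwrite a backup that already exists, so running it a second time later in this procedure will not replace your pristine copies with modified files.

```shell
# Sketch only: backup_file is a hypothetical helper, not part of Debian.
# Copies FILE to FILE-backup, preserving mode and timestamps, and never
# overwrites a backup that already exists.
backup_file() {
    src=$1
    if [ ! -f "$src" ]; then
        echo "skip (not found): $src" >&2
        return 1
    fi
    if [ -e "$src-backup" ]; then
        echo "skip (backup exists): $src-backup" >&2
        return 1
    fi
    cp -p "$src" "$src-backup"
}
```

You could then loop over the files listed above, for example: for f in /etc/fstab /etc/mtab /etc/modules; do backup_file "$f"; done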
I personally need my vim, and we need to install our main program, mdadm (multi-disk administrator). Obviously you may choose to use a different editor:
apt-get update
apt-get install initramfs-tools mdadm
Accept the default answer (possibly 'all') when prompted.
- [Optional] You can install the vim editor with:
apt-get install vim
- Now check /etc/default/mdadm (the file where Debian keeps these mdadm settings). It should have these entries configured in a similar manner as shown:
INITRDSTART='all'
AUTOSTART=true
AUTOCHECK=true
START_DAEMON=true
DAEMON_OPTIONS="--syslog"
VERBOSE=false
USE_DEPRECATED_MDRUN=false
Now we will include the needed modules in the ramdisk image. Edit /etc/initramfs-tools/modules, the file the initramfs builder reads, and insert at the end of the list of modules:
md
raid1
raid5
raid0
Save and exit the file. This part is important to get right or our system will not boot up into the md devices. You need to copy all the modules listed in /etc/modules to /etc/initramfs-tools/modules that deal with our hard disk drives, motherboard chipset and raid (in the same order they are listed in /etc/modules). You would not need to include drivers (modules) that obviously deal with things like the CDROM drive or mouse. If you are not certain, then it is better to include it. If our hard drives are not recognized prior to our md devices, our system will not boot. You might see modules like ide-detect, ide-disk, ide-scsi and others. Essentially copy all the modules to /etc/initramfs-tools/modules, then remove any that do not pertain to our hard drives (psmouse, ide-cd):
grep -vE '^$|^#' /etc/modules >> /etc/initramfs-tools/modules
vim /etc/initramfs-tools/modules
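The hand editing can be partly automated with a filter like the one below. It is a rough sketch under my own assumptions: disk_modules is a made-up name, and the default exclusion list contains only the two examples named above (psmouse and ide-cd), so you would still have to review the result yourself. It drops comments and blank lines, drops the excluded modules, and removes repeats while keeping the original order.

```shell
# Sketch only: disk_modules is a hypothetical helper.
# Reads module lines on stdin and prints the ones worth keeping.
# Optional argument: space-separated module names to exclude.
disk_modules() {
    exclude=${1:-"psmouse ide-cd"}
    awk -v exclude="$exclude" '
        BEGIN {
            n = split(exclude, names, " ")
            for (i = 1; i <= n; i++) skip[names[i]] = 1
        }
        /^[ \t]*(#|$)/ { next }               # comment or blank line
        ($1 in skip) || seen[$1]++ { next }   # excluded or already printed
        { print }
    '
}
```

For example, cat /etc/initramfs-tools/modules /etc/modules | disk_modules > /tmp/modules.new would give you a candidate file to inspect before copying it into place.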
Once you have added (and possibly removed duplicated) modules there, save and exit the file. Now we make the new initrd.img. We actually end up doing this three or four different times during this setup (because our system will be going through changes):
If the above command fails with an error message something like:
/boot/initrd.img-2.6.18-4-686 has been altered. Cannot update.
then try to fix the situation with this command:
update-initramfs -k `uname -r` -t -u
In order to load the new image into memory, we must reboot:
When the system comes back up run cat /proc/mdstat to see if we now have a system capable of using a RAID array:
The resulting output on my machine:
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0] [linear] [multipath] [raid10]
unused devices: <none>
If [raid1] is not shown here then you are not loading the needed modules. You cannot continue past this point until you are.
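If you want to script this check rather than read the output by eye, something like the following should work. It is a minimal sketch: has_personality is a name I invented, and the optional second argument exists only so the function can be tried against a saved copy of the file.

```shell
# Sketch only: has_personality is a hypothetical helper.
# Succeeds if "[NAME]" appears on the Personalities line.
# Usage: has_personality raid1 [mdstat-file]
has_personality() {
    grep '^Personalities' "${2:-/proc/mdstat}" | grep -qF "[$1]"
}
```

Because the match includes the closing bracket, asking for raid1 will not be fooled by a line that only lists [raid10]. Typical use: has_personality raid1 || echo "raid1 module not loaded"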
Copy Partition Structure
Now we will copy the partition structure from apple to pie. This is one of those things you must be careful doing because this will destroy all data on the target disk. Since we have already cleaned the target disk we should not have to --force this to work (but if required, add --force to the end of the command). The target disk must be of equal or greater size than the source disk. Make sure the command reflects what you want to accomplish:
Run 'df' to first make sure we are currently using the disk you think we are:
Here we copy the structure from /dev/hda to /dev/hdc:
sfdisk -d /dev/hda | sfdisk /dev/hdc
Now we will use cfdisk to edit the partition table on pie (in the secondary position) and change the partition types to "Linux raid autodetect". This may also destroy all data on a disk so be careful you are editing the correct disk. To change the partition type, first use up and down arrows to select a partition, then left and right arrows to select [Type] from the menu. Press [Enter] to change the type. The type you want is FD (lower case is fine). Repeat for all partitions, then [Write] the changes, then [Quit]. Your original drive should have had a partition flagged as bootable that was copied to this target drive. Make sure you don't accidentally toggle this off.
My finished product:
                          cfdisk 2.12r

                      Disk Drive: /dev/hdc
                Size: 10005037056 bytes, 10.0 GB
      Heads: 16   Sectors per Track: 63   Cylinders: 19386

    Name    Flags    Part Type   FS Type                  [Label]   Size (MB)
 ------------------------------------------------------------------------------
    hdc1    Boot     Primary     Linux raid autodetect                 98.71*
    hdc5             Logical     Linux raid autodetect               1003.49*
    hdc6             Logical     Linux raid autodetect               8899.76*
At this point we will reboot again so our system properly recognizes the changes made to this disk:
Now we can start the process of creating the degraded array. We start by doing some additional cleaning of our drive in the secondary position (pie). This is to ensure there are no remnants from prior RAID installations. Zero the superblock for each of the partitions we configured as type "Linux raid autodetect":
mdadm --zero-superblock /dev/hdc1
mdadm --zero-superblock /dev/hdc5
mdadm --zero-superblock /dev/hdc6
Now we create md devices for each of our partitions, with one partition present on each md device and one partition missing. The ones that are missing are on our primary drive (apple). We can't add these to our array at this time because those partitions are currently in use and they are not the type of partition we want. The general plan is to create the RAID structure on the first RAID disk (pie), copy all the data from the original disk (apple) to that RAID disk, reboot to that degraded RAID disk, then reformat the original disk and add it to our RAID array (at which time the two disks will begin to synchronize). There are obvious risks in doing this and the process is prone to error. One thing that could be difficult to keep track of is: a number of files related to RAID must of course be on the RAID drive. When we boot to the RAID drive, it must be configured as a RAID drive. Some people first copy all the data from the original drive to the RAID drive, then modify the RAID related files on the RAID drive prior to rebooting into it. Then if they have problems and need to make changes to the system they often make the mistake of trying to fix the RAID related files by editing the files on the original drive. They get confused. I prefer to configure everything on the original drive and then copy the data over at the very last moment. If things get really ugly we can boot up with the rescue disk and make a few changes to the original disk to enable us to boot up into it (provided we have not reformatted it yet). Then we can make the necessary changes and copy the data over once again. Anyway, let's create the needed md devices. Edit as required and then run these one at a time:
mdadm --create /dev/md0 --level=1 --raid-disks=2 missing /dev/hdc1
mdadm --create /dev/md1 --level=1 --raid-disks=2 missing /dev/hdc5
mdadm --create /dev/md2 --level=1 --raid-disks=2 missing /dev/hdc6
Once again, run cat /proc/mdstat:
You should get something similar to this which displays the fact that one out of two disk devices are up [_U] for each of our md devices (and the other ones are missing). This is called degraded:
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0] [linear] [multipath] [raid10]
md2 : active raid1 hdc6
      8691008 blocks [2/1] [_U]

md1 : active raid1 hdc5
      979840 blocks [2/1] [_U]

md0 : active raid1 hdc1
      96256 blocks [2/1] [_U]
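The [2/1] counters are what tell you an array is degraded: the first number is how many devices the array wants and the second is how many are present. Here is a small sketch that lists the degraded arrays by comparing those two numbers. The name degraded_arrays is my own invention, and I am assuming the usual /proc/mdstat layout where the counter appears on or after the "mdN :" line.

```shell
# Sketch only: degraded_arrays is a hypothetical helper.
# Prints the name of every md device whose [wanted/present] counts differ.
# Usage: degraded_arrays [mdstat-file]
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }               # remember the array name
        match($0, /\[[0-9]+\/[0-9]+\]/) {         # e.g. [2/1]
            split(substr($0, RSTART + 1, RLENGTH - 2), c, "/")
            if (name != "" && c[1] != c[2]) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

At this stage of the procedure all three arrays should be listed. Once both drives are in sync it should print nothing.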
If your system does not show something with a similar structure then you must fix it before continuing. Now we create file systems on our md devices that match the file systems currently in use on our original devices. This also erases data on the target devices. I use ext3 and of course a swap partition:
mkfs.ext3 /dev/md0
mkswap /dev/md1
mkfs.ext3 /dev/md2
When the system boots up into our RAID system, it should automatically assemble at least one of the md devices we created (so we can start the boot process) but it may not assemble the rest. This could result in a failure to complete the boot process. This task of reassembling the remaining devices is handled by /etc/init.d/mdadm-raid. This init script uses the command mdadm -A -s -a which means: "automatically assemble all of our md devices using the information stored in /etc/mdadm/mdadm.conf". Well, we must update the information in mdadm.conf so it correctly reflects our current state (as shown by /proc/mdstat). To do so we start by making a copy of the original mdadm.conf that was created when we installed mdadm. We will use the copy as a basis for any new mdadm.conf we create. The original file (with comments removed) looks like this on my system:
DEVICE partitions
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
MAILADDR root
cp /etc/mdadm/mdadm.conf /etc/mdadm/mdadm.conf-original
Now we will populate mdadm.conf with information about our existing arrays:
mdadm --examine --scan >> /etc/mdadm/mdadm.conf
Now let's display the file this created (comments removed):
With comments removed it should show something like this (do not use these exact UUID numbers in your own file!):
DEVICE partitions
CREATE owner=root group=disk mode=0660 auto=yes
HOMEHOST <system>
MAILADDR root
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=2307ad50:dce81757:e5540b2b:ae2626a5
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=0cdcc3a4:d8f14699:e5540b2b:ae2626a5
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=60b4594a:14050c2f:e5540b2b:ae2626a5
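One thing to watch for: because the scan output is appended with >>, running that command twice leaves two ARRAY lines for each device in the file. The following sketch reports any md device that is defined more than once. The name duplicate_arrays is made up for illustration, and it assumes the ARRAY lines have the device name as their second field, as in the sample above.

```shell
# Sketch only: duplicate_arrays is a hypothetical helper.
# Prints each md device that has more than one ARRAY line.
# Usage: duplicate_arrays [mdadm.conf]
duplicate_arrays() {
    awk '$1 == "ARRAY" { count[$2]++ }
         END { for (dev in count) if (count[dev] > 1) print dev }' \
        "${1:-/etc/mdadm/mdadm.conf}" | sort
}
```

If it prints anything, I would start over from the saved mdadm.conf-original and append the scan output once.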
Our RAID system has changed, so once again we are going to make a new initrd.img (you may notice you no longer get mdadm error messages):
Before we attempt booting up into our md devices we are first going to do a test to ensure our md devices are assembled after a reboot and that they are mountable during the boot process. First we make a mount point for each of our devices (all except the swap partition):
mkdir /mnt/md0
mkdir /mnt/md2
Run 'free' and make a note of how much total swap space you have:
Then edit /etc/fstab and make some changes. At the bottom of the file insert directives to mount each of our md devices to the mount points we created (or swap). Here is a sample (edit as needed to reflect your system):
/dev/md0  /mnt/md0  ext3  defaults  0  0
/dev/md1  none      swap  sw        0  0
/dev/md2  /mnt/md2  ext3  defaults  0  0
With these lines added my /etc/fstab now looks like this:
proc       /proc           proc         defaults                    0  0
/dev/hda6  /               ext3         defaults,errors=remount-ro  0  1
/dev/hda1  /boot           ext3         defaults                    0  2
/dev/hda5  none            swap         sw                          0  0
/dev/hdd   /media/cdrom0   udf,iso9660  user,noauto                 0  0
/dev/fd0   /media/floppy0  auto         rw,user,noauto              0  0
/dev/md0   /mnt/md0        ext3         defaults                    0  0
/dev/md1   none            swap         sw                          0  0
/dev/md2   /mnt/md2        ext3         defaults                    0  0
Save and exit the file, then let's reboot and see if this works:
When the system comes up, run 'mount' to see if the devices were mounted. There is no point continuing past this point unless they were:
If you run free again, it should show the total swap space is twice the size it was before. Assuming you have a swap partition, you must get this working before you continue on:
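If you prefer a single number you can save and compare, the swap total can also be read from /proc/meminfo instead of the free output. This is just a sketch: swap_total_kb is my own name for it, and the optional argument is there only so it can be tried on a saved copy of the file.

```shell
# Sketch only: swap_total_kb is a hypothetical helper.
# Prints the SwapTotal figure (in kB) from /proc/meminfo.
# Usage: swap_total_kb [meminfo-file]
swap_total_kb() {
    awk '$1 == "SwapTotal:" { print $2 }' "${1:-/proc/meminfo}"
}
```

Save the figure before the reboot (swap_total_kb > /root/swap-before) and compare it afterwards. Expect roughly double rather than exactly double, since the md device is slightly smaller than the partition it sits on.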
Then run 'cat /proc/mdstat' again and verify all the md devices that used to be there are still there. There is no point continuing past this point unless they are:
OK. If everything is working (it must be working before you continue), now comes the scary part. Don't reboot until I tell you to. We are going to continue to configure RAID related files on our original drive, then we are going to copy all our data from the original devices to the md devices, then create a boot record on the secondary drive, then boot up using the md devices on the secondary drive instead of our original devices on the primary drive.
There are a few things to think about as far as the copy process goes. The machine should not be in the middle of processing stuff, so you should drop into single user mode and possibly disconnect the ethernet cable. You should also consider stopping processes that may be writing to the disk (like your MTA). Because you will be in single user mode you will run the actual copy commands at the console (not remotely). You should not delay once the data is transferred and it comes time to reboot. If you successfully boot up into the md devices then be aware that the data on the original drive will soon become stale, so in the event you need to boot back into the original drive be aware that you may lose data. Hopefully you will have no need to do that and your new RAID devices will hold current data from now on. If you are able to boot up using your md devices, the scary part is over (but opportunities to destroy your system still remain).
We are going to configure /etc/fstab and /etc/mtab to boot up into the md devices; we are going to create another initrd.img that knows about our md devices and we are going to tell grub to boot into our md devices. We will also configure grub to boot from our secondary drive. We will start by editing /etc/fstab again. We must remove (or comment out) the lines we added previously (they were just a test), then change the corresponding /dev/hda devices in /etc/fstab to /dev/md devices:
and modify it in a similar manner to this sample. Of course the mount points must correctly correspond to appropriate md devices. Refer to the notes you should have made. My finished /etc/fstab file will look like this:
proc      /proc           proc     defaults                    0  0
/dev/md2  /               ext3     defaults,errors=remount-ro  0  1
/dev/md0  /boot           ext3     defaults                    0  2
/dev/md1  none            swap     sw                          0  0
/dev/hdd  /media/cdrom0   iso9660  ro,user,noauto              0  0
/dev/fd0  /media/floppy0  auto     rw,user,noauto              0  0
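If you would like to preview the device substitutions before touching the real file, a filter along these lines can print what the edited fstab would look like. It is a sketch under my own assumptions: remap_fstab is an invented name, each argument is an OLD=NEW pair taken from the notes you made earlier, and only the first column of non-comment lines is rewritten. It does not remove the temporary /mnt/md lines for you.

```shell
# Sketch only: remap_fstab is a hypothetical helper.
# Reads an fstab on stdin and rewrites the device column.
# Usage: remap_fstab /dev/hda6=/dev/md2 /dev/hda1=/dev/md0 ... < fstab
remap_fstab() {
    awk -v pairs="$*" '
        BEGIN {
            n = split(pairs, p, " ")
            for (i = 1; i <= n; i++) {
                split(p[i], kv, "=")
                map[kv[1]] = kv[2]
            }
        }
        !/^[ \t]*#/ && ($1 in map) { sub(/[^ \t]+/, map[$1]) }   # swap the device name
        { print }
    '
}
```

On my layout the call would be remap_fstab /dev/hda6=/dev/md2 /dev/hda1=/dev/md0 /dev/hda5=/dev/md1 < /etc/fstab-backup, with the output reviewed by eye before it replaces anything.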
Save and exit the file. I also find it useful to edit /etc/mtab to reflect what our new system will look like:
Here is mine before the changes:
/dev/hda6 / ext3 rw,errors=remount-ro 0 0
[...other stuff...]
/dev/hda1 /boot ext3 rw 0 0
/dev/md0 /mnt/md0 ext3 rw 0 0
/dev/md2 /mnt/md2 ext3 rw 0 0
and after I edited it in the same manner I edited /etc/fstab (note that I removed /dev/md0 and /dev/md2) it shows:
/dev/md2 / ext3 rw,errors=remount-ro 0 0
[...other stuff...]
/dev/md0 /boot ext3 rw 0 0
Make sure there are no mistakes in /etc/fstab or /etc/mtab. Once again we would update our initrd.img:
Now we are going to update the GRUB menu. Edit grub's menu.lst:
We are going to add a new menu item that tells grub to boot from our secondary drive (grub refers to it as hd1). We will also add a fallback entry that (hopefully) will automatically choose the next item in the menu if the first item fails. So, just below "default 0", add this entry:
Make a duplicate of your existing top menu stanza, place the duplicate above the existing stanza and modify it in the same manner I have. I changed hd0 to hd1 and /dev/hda6 to /dev/md2. This example shows partition 0 is the partition flagged as bootable on my system. You can run something like 'fdisk -l /dev/hda' to determine which partition is bootable on your system but your original stanza will be correct. Remember that grub starts counting from zero:
title Debian GNU/Linux, kernel 2.6.18-4-686 RAID (hd1)
root (hd1,0)
kernel /vmlinuz-2.6.18-4-686 root=/dev/md2 ro
initrd /initrd.img-2.6.18-4-686
savedefault

title Debian GNU/Linux, kernel 2.6.18-4-686
root (hd0,0)
kernel /vmlinuz-2.6.18-4-686 root=/dev/hda6 ro
initrd /initrd.img-2.6.18-4-686
savedefault
Just a note: because we mount a /boot partition, you will not see the above entries in the form "/boot/initrd.img-2.6.18-4-686". If you do not mount a /boot partition, you will see the entries in that form.
If you have been following this HOWTO correctly, our md devices will still be mounted to the mount points we had in /etc/fstab when we booted up. If they are not mounted for some reason (shame on you, I told you not to reboot), you will need to remount them. For example: "mount /dev/md2 /mnt/md2", "mount /dev/md0 /mnt/md0". Now we are going to copy our data. In my case I want to copy all the data in the root partition to /mnt/md2, and all the data in the /boot partition to /mnt/md0. The copy from root to the md mount point is straightforward but other mount points such as /boot are not as straightforward. For those I first change to that directory, then use the period (.) to signify "here". In other words "copy from here to there" as opposed to "copy this to that". This prevents me from copying /boot to /mnt/md0 and ending up with a /mnt/md0/boot directory instead of a /mnt/md0 directory containing all the files in the /boot directory.
At the console get into single user mode:
then work on the copy process. All files on the disk need to get copied so use your head:
cp -dpRx / /mnt/md2
cd /boot
cp -dpRx . /mnt/md0
Run some tests and make sure the source and destination match for each mount point. Fix it if they don't:
ls -al /
ls -al /mnt/md2
ls -al /boot
ls -al /mnt/md0
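Comparing ls listings by eye only covers the top level of each directory. For a deeper check, a sketch like the one below lists every path that exists under one mount point but not the other. The name compare_trees is mine, and the approach is an assumption on my part: it compares file names only (not contents), and it stays on one filesystem in the same way cp -x does.

```shell
# Sketch only: compare_trees is a hypothetical helper.
# Prints diff output for paths present in one tree but not the other.
# Returns 0 when the two name lists match.
# Usage: compare_trees SOURCE_DIR DEST_DIR
compare_trees() {
    src_list=$(mktemp) && dst_list=$(mktemp) || return 2
    (cd "$1" && find . -xdev | sort) > "$src_list"
    (cd "$2" && find . -xdev | sort) > "$dst_list"
    status=0
    diff "$src_list" "$dst_list" || status=$?
    rm -f "$src_list" "$dst_list"
    return $status
}
```

compare_trees /boot /mnt/md0 should print nothing. For the root filesystem (compare_trees / /mnt/md2) expect a few harmless differences from files created after the copy, including the two temporary list files the function itself creates.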
On my system grub was booting off of hard disk 0, partition 0, and it was told root was mounted on /dev/hda6. Now I have instructed it to boot off of hard disk 1, partition 0, and I told it root is mounted on /dev/md2. Now I must install grub on hard disk 1, partition 0 (the secondary drive). Start the grub shell prompt:
at the grub> prompt enter these commands to install grub on both drives (edit partition number if needed):
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
Hopefully those commands succeeded. If there was a failure I'm not sure you should continue. OK. Now comes the butterflies in your stomach. Knock on wood, throw salt over your shoulder, rub your lucky rabbit's foot, cross your fingers.
If it crashes, don't freak out just yet. Read this. If it reboots, run df and check that it is in fact our md devices we are using. Run cat /proc/mdstat again and ensure all md devices are shown there. If all is well, we are no longer using the original drive. If all is not well, it must be fixed before we continue:
df
cat /proc/mdstat
My df looked like this:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md2               8554524    624688   7495288   8% /
tmpfs                   258344         0    258344   0% /lib/init/rw
udev                     10240        88     10152   1% /dev
tmpfs                   258344         0    258344   0% /dev/shm
/dev/md0                 93207     26191     62204  30% /boot
Running free should show that swap has returned to its normal size once again. OK. Now we will reformat the original drive (apple /dev/hda) and then add it to our array. I hope everything is working great so far and all our files were successfully copied because we now must destroy all data on the original drive. Run cfdisk on the original drive and (just as we did for our secondary drive) change the type of each partition to type "FD" (lower case is fine). This is the part where (if you are working on a production box) you should have a good backup of the drive because this will destroy all the data on the original disk:
Change all the partition types, then write and quit. Make sure you have not toggled off the boot flag.
Now we can add the partitions on /dev/hda to our RAID array. Edit this if necessary to suit your system. Do this one at a time:
mdadm --add /dev/md2 /dev/hda6
mdadm --add /dev/md0 /dev/hda1
mdadm --add /dev/md1 /dev/hda5
Now you will just have to WAIT until the disks synchronize. NEVER REBOOT while disks are synchronizing. You can monitor the progress with:
watch -n 6 cat /proc/mdstat
Mine looks like this after a while. Notice we now are using both drives and md2 has fully synced:
Personalities : [raid1] [raid6] [raid5] [raid4] [raid0]
md2 : active raid1 hda6 hdc6
      8691008 blocks [2/2] [UU]

md1 : active raid1 hda5 hdc5
      979840 blocks [2/1] [_U]
      [=>...................]  recovery =  8.3% (82368/979840) finish=0.9min speed=16473K/sec

md0 : active raid1 hda1 hdc1
      96256 blocks [2/1] [_U]
        resync=DELAYED

unused devices: <none>
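If you would rather not sit and watch, a script can wait until no array reports a recovery or resync (including a delayed one). This is a minimal sketch with invented names (sync_pending and wait_for_sync), and it assumes the words "recovery" and "resync" only show up in /proc/mdstat while a sync is running or queued, as in the output above.

```shell
# Sketch only: both function names are hypothetical.
# sync_pending succeeds while any array is rebuilding or waiting to.
# Usage: sync_pending [mdstat-file]
sync_pending() {
    grep -Eq 'recovery|resync' "${1:-/proc/mdstat}"
}

# Block until nothing is pending, polling every N seconds (default 30).
# Usage: wait_for_sync [mdstat-file] [seconds]
wait_for_sync() {
    while sync_pending "${1:-/proc/mdstat}"; do
        sleep "${2:-30}"
    done
}
```

For example, wait_for_sync && echo "all arrays in sync" returns only when the sync has finished, which is the point at which it is safe to carry on.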
Of course, it's [Ctrl]+c to cancel 'watch'. Once the sync has completed (and not until then), we need to tell mdadm.conf about our new drives and make another initrd.img (for the last time):
cp /etc/mdadm/mdadm.conf-original /etc/mdadm/mdadm.conf
mdadm --examine --scan >> /etc/mdadm/mdadm.conf
Note that there may be an alternate way (a nice script) to create a new mdadm.conf:
Either way, this should still show all our arrays are present:
For the last time:
We need to edit grub's menu.lst one last time. We are booting off of the secondary drive (and will continue to do so) but now if that should fail we want it to boot off the primary drive (now also configured as a RAID device). Make a copy of the first menu choice stanza, place it in the second position, and modify it in a manner similar to the provided sample:
title Debian GNU/Linux, kernel 2.6.18-4-686 RAID (hd1)
root (hd1,0)
kernel /vmlinuz-2.6.18-4-686 root=/dev/md2 ro
initrd /initrd.img-2.6.18-4-686
savedefault

title Debian GNU/Linux, kernel 2.6.18-4-686 RAID (hd0)
root (hd0,0)
kernel /vmlinuz-2.6.18-4-686 root=/dev/md2 ro
initrd /initrd.img-2.6.18-4-686
savedefault
While you are at it, modify the '# kopt=root=' line to reflect our current situation (I changed /hda6 to /md2). Don't remove the # in front of it, it has meaning. (double ## are comments in this special AUTOMAGIC section):
# kopt=root=/dev/md2 ro
Also, if everything is working properly you should remove the menu stanza that boots to a non-raid partition. You would corrupt your system if you were to boot up to something like /dev/hda6 and edit files on that drive.
Your system is complete. I would reboot one more time just to make sure it comes up. OK, now I'm going to simulate a failed drive. I don't recommend you try this (your system may explode), but at least you can learn from my system. I am carefully going to remove the power cable from the primary drive, apple. Once I do this, it will be "dirty" and should not be used again in this system without first being cleaned. This is what mdstat shows as a result: hda1 and hda5 still show they are up because we have not had any read/write operations on them recently, hda6 shows it has failed (Faulty).
md0 : active raid1 hda1 hdc1
      96256 blocks [2/2] [UU]
md1 : active raid1 hda5 hdc5
      979840 blocks [2/2] [UU]
md2 : active raid1 hda6(F) hdc6
      8691008 blocks [2/1] [_U]
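The (F) marker in the output above can also be picked out by a script instead of by eye. The following is only a minimal sketch and not part of the original procedure: list_faulty_members is a hypothetical helper name, and it assumes mdstat-style text in which failed members carry an (F) suffix, with or without the [n] slot numbers that newer kernels print.

```shell
# List array members flagged as failed, i.e. the entries that carry an
# "(F)" suffix in mdstat-style text. Prints "<array> <member>" per hit.
# $1 is the file to read; /proc/mdstat is used when it is omitted.
# (list_faulty_members is a hypothetical name, not an mdadm tool.)
list_faulty_members() {
    awk '
        # array header lines look like: "md2 : active raid1 hda6(F) hdc6"
        $2 == ":" && $1 ~ /^md/ {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    dev = $i
                    sub(/\(F\)$/, "", dev)       # hda6(F)  -> hda6
                    sub(/\[[0-9]+\]$/, "", dev)  # hda6[0]  -> hda6
                    print $1, dev
                }
            }
        }
    ' "${1:-/proc/mdstat}"
}
```

Run without arguments it reads /proc/mdstat. Remember that, as noted above, members on a dead drive only show (F) after the kernel has tried to read or write them, so an empty result does not prove that a drive is healthy.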
If your hardware supports hot swappable drives I think you should mark the remaining two devices faulty (since they actually are on a failed drive), then use mdadm to remove all three faulty devices from our array before inserting the new drive. You cannot use "mdadm --remove" on devices that are in use, so they need to be set as faulty first. You do not need to do this if you are going to power down the system and replace the drive with a clean drive. Make doubly sure you are failing the partitions on the drive that has failed!
Only needed if using hot-swap drives and you are not going to power down:
mdadm --set-faulty /dev/md0 /dev/hda1
mdadm --set-faulty /dev/md1 /dev/hda5
mdadm --remove /dev/md0 /dev/hda1
mdadm --remove /dev/md1 /dev/hda5
mdadm --remove /dev/md2 /dev/hda6
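Since failing the wrong partitions is the expensive mistake here, it can help to have the command list generated from mdstat and then reviewed before anything is run. The sketch below only prints commands and never executes them. print_fail_remove_cmds is a hypothetical helper name, and it assumes hdX/sdX style names where a partition is the disk name followed by digits.

```shell
# Print (do NOT run) the mdadm commands that would fail and remove every
# partition of one disk from the arrays it belongs to.
# $1: disk name without /dev/, e.g. hda
# $2: mdstat-style file to read (default /proc/mdstat)
# (print_fail_remove_cmds is a hypothetical name; review its output by hand.)
print_fail_remove_cmds() {
    [ -n "$1" ] || { echo "usage: print_fail_remove_cmds DISK [FILE]" >&2; return 1; }
    awk -v disk="$1" '
        $2 == ":" && $1 ~ /^md/ {
            for (i = 3; i <= NF; i++) {
                dev = $i
                faulty = (dev ~ /\(F\)$/)      # already marked failed?
                sub(/\([A-Z]\)$/, "", dev)     # drop (F) or (S)
                sub(/\[[0-9]+\]$/, "", dev)    # drop [n] slot number
                # keep only partitions of the requested disk: hda1, hda5, ...
                if (index(dev, disk) != 1) continue
                if (substr(dev, length(disk) + 1) !~ /^[0-9]+$/) continue
                if (!faulty) print "mdadm --set-faulty /dev/" $1 " /dev/" dev
                print "mdadm --remove /dev/" $1 " /dev/" dev
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

For the mdstat output shown earlier, "print_fail_remove_cmds hda" should produce the same five commands as above, grouped by array rather than by operation. Members already marked (F) get only a --remove line.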
Shut it down:
shutdown -h now
For consistency (and to keep my sanity) I always move the good drive to the primary position (if it is not already there) and place the new clean drive in the secondary position. We have shut down, so disconnect the good drive, clean apple, move pie (the good drive) into the primary position, place the cleaned apple in the secondary position and bring the system back up. When using SCSI drives all I have to do to swap the two SCSI drives is move the jumper from one drive to the other. OK, my system did boot up. First we see what's going on (cat /proc/mdstat). As you can see, hdc1, hdc5 and hdc6 are missing:
md0 : active raid1 hda1
      96256 blocks [2/1] [_U]
md1 : active raid1 hda5
      979840 blocks [2/1] [_U]
md2 : active raid1 hda6
      8691008 blocks [2/1] [_U]
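A degraded array like the ones above is recognisable by the underscore in its status field ([_U] instead of [UU]). As a rough sketch, assuming the usual mdstat layout where the status field is the last word on the "blocks" line, a small helper can list every array that is missing a member. list_degraded_arrays is a hypothetical name chosen for this example.

```shell
# Print "<array> <status>" for every array whose status field contains an
# underscore, i.e. an array that is running with a member missing.
# $1 is the file to read; /proc/mdstat is used when it is omitted.
# (list_degraded_arrays is a hypothetical helper name.)
list_degraded_arrays() {
    awk '
        # remember the array named on the most recent header line
        $2 == ":" && $1 ~ /^md/ { name = $1 }
        # the status field, e.g. [UU] or [_U], is the last word on its line
        name != "" && $NF ~ /^\[[U_]+\]$/ {
            if ($NF ~ /_/) print name, $NF
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

An empty result means every array listed in the file has all of its members present.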
We start by copying the partition structure from /dev/hda to /dev/hdc. We do this for what should now be an obvious reason: the secondary drive is empty, but it needs to have the same structure as the primary drive. If the disk was first cleaned, and is large enough, you should have no errors (but you may still have to --force it):
sfdisk -d /dev/hda | sfdisk /dev/hdc
We make sure the superblocks are zeroed out on the new drive (as always, be careful you do this to the correct drive). Edit as needed:
mdadm --zero-superblock /dev/hdc1
mdadm --zero-superblock /dev/hdc5
mdadm --zero-superblock /dev/hdc6
Now we add our three hdc partitions to the corresponding md's. Understand what you are doing here before you do it, edit as needed:
mdadm --add /dev/md0 /dev/hdc1
mdadm --add /dev/md1 /dev/hdc5
mdadm --add /dev/md2 /dev/hdc6
Watch them sync:
watch -n 6 cat /proc/mdstat
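If you would rather not sit and watch, the same check can be scripted. This is a minimal sketch under the assumption that an active or queued sync always shows up in mdstat as a line containing "recovery", "resync" or "reshape" (as in the sample output in this document). md_sync_active and wait_for_md_sync are hypothetical helper names.

```shell
# Succeed (exit status 0) while mdstat-style text still shows a recovery,
# resync or reshape that is running or queued (e.g. resync=DELAYED).
# $1 is the file to read; /proc/mdstat is used when it is omitted.
# (Both function names are hypothetical helpers, not mdadm commands.)
md_sync_active() {
    f="${1:-/proc/mdstat}"
    [ -r "$f" ] || { echo "cannot read $f" >&2; return 2; }
    grep -Eq 'recovery|resync|reshape' "$f"
}

# Poll /proc/mdstat every $1 seconds (default 30) and return as soon as
# md_sync_active stops reporting activity.
wait_for_md_sync() {
    while md_sync_active; do
        sleep "${1:-30}"
    done
}
```

A typical use would be "wait_for_md_sync 60 && cat /proc/mdstat", followed by a look at the output to confirm that every array shows [UU] before you continue.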
Once the recovery is complete (and not until then), create new boot records on both drives:
From the grub> prompt (edit partition number if needed):
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
We are working again.
You might want to reboot from the console to make sure you actually boot from the secondary drive.
You should never experiment with this next step on a production system because it will trash your array and you could lose data. If you need to prove to yourself that each drive will boot up when it is the only drive in the system, you could boot up using each drive with the other one missing. As soon as a drive boots up, log in and run 'shutdown -h now' to shut it back down. Then try the other drive. Then if you care in the least about the integrity of the data on the system you should clean one of the drives and install it just as you would if you were replacing a failed drive. It's not a good idea to fire up the system using both drives if each drive has been started independently. Also, in a RAID system it is a good idea to avoid kernel version upgrades (security upgrades should be performed of course).
References (alphabetical order). Not all of these are good, but all were interesting to me in one way or another.
Trust me, there are a lot more documents similar to these out there:
http://alioth.debian.org/project/showfiles.php?group_id=30283&release_id=288
http://deb.riseup.net/storage/software-raid/
http://forums.whirlpool.net.au/forum-replies-archive.cfm/471585.html
http://nepotismia.com/debian/raidinstall/
http://nst.sourceforge.net/nst/docs/user/ch14.html
http://piirakka.com/misc_help/Linux/raid_starts_degraded.txt
http://thegoldenear.org/toolbox/unices/server-setup-debian.html
http://togami.com/~warren/guides/remoteraidcrazies/
http://www.debian-administration.org/articles/238
http://www.debian-administration.org/users/philcore/weblog/4
http://www.doorbot.com/guides/linux/x86/grubraid/
http://www.epimetrics.com/topics/one-page?page_id=421&topic=Bit-head%20Stuff&page_topic_id=120
http://www.james.rcpt.to/programs/debian/raid1/
http://www.linuxjournal.com/article/5898
http://www.linuxsa.org.au/mailing-list/2003-07/1270.html
http://www.linux-sxs.org/storage/raid_setup.html
http://www.parisc-linux.org/faq/raidboot-howto.html
http://www.tldp.org/HOWTO/Software-RAID-HOWTO.html
http://trinityhome.org/misc/bootable-raid1.html
http://www.vermittlungsprovision.net/367.html
http://xtronics.com/reference/SATA-RAID-debian-for-2.6.html
Gary V mr88talent at yahoo dot com Last edited 25 FEB 2007
Raid 5 (via Debian installer)
Our team at LinuxForce recently put together a Debian server with LVM on a software RAID5 volume. This has been possible through complex installation procedures in the past, but today the Debian Etch installer is capable of handling such an installation if you follow the proper steps, which I outline in this article. Among other things, we needed the flexibility to write partition tables for Xen on the fly, dependability that would allow a generous replacement window when hard drives failed, and as little possibility of data loss and downtime through hard drive failure as possible.
Assumptions about our example for this article
1. Our partition table will be as follows:
* 1G / - root
* 1G /tmp - tmp
* 3G /home - home
* 3G /var - var
* 500M swap
2. Our system has four drives (for RAID: three active drives, one hot spare)
3. The hard drives are SATA or SCSI; if you're using IDE drives, keep in mind that all the sda1 references (for example) will be hda1.
4. Since this article is being written so close to the release of Etch as stable, there shouldn't be too many changes to these directions before release, but for reference we are using the Debian Etch RC1 installer on a netinst daily image downloaded March 13, 2007.
Before Installation
The first challenge was developing a partitioning scheme that would boot. lilo and grub cannot boot from RAID5, so an additional partition separate from the RAID5 had to be created for this purpose. There are many creative ways of doing this, but our solution was to create a 1 gig RAID1 root partition. Alternatively we could have created a smaller /boot partition.
Another consideration is where to put the swap partition. There is an argument for putting swap on its own partition outside of the LVM (for speed, mostly). We made the decision to simply keep it on LVM for ease of administration.
Make sure any hardware RAID is turned off in the BIOS; we want all disks to appear separately in the partitioning table.
Creating RAID Volumes
Start the Debian Etch netinst as with a normal install. At the Partition Disks menu choose Partitioning method: Manual
Delete any existing partitions so only "FREE SPACE" is listed.
Select the first drive and do the following:
* Create new partition
* New partition size: 1G (or the size you want root to be)
* Type for the new partition: Primary
* Location for the new partition: Beginning
* Use as: physical volume for RAID
* Done setting up the partition
This will create your RAID1 bootable section. Then:
* Create new partition
* New partition size: <the remaining space on the drive>
* Type for the new partition: Primary
* Location for the new partition: Beginning
* Use as: physical volume for RAID
* Done setting up the partition
This will create your RAID5. Partition the remaining three disks in the same way. Each partition on the drives will now show up as "K Raid".
Configure Software RAID
There is now an option to Configure Software RAID on your partitioning screen - select this. At the next screen say "Yes" to write the changes to the storage devices and configure RAID.
Choose: Create MD device
* Multidisk device type: RAID1
* Number of active devices for the RAID1 array: 3
* Number of spare devices for the RAID1 array: 1
* Now you will choose the devices to use. Since you created the 1G partitions first, you will want sda1, sdb1, sdc1
* The next screen will ask you which device you should use for the spare, choose sdd1
Now you will want to choose Create MD device again.
* Multidisk device type: RAID5
* Number of active devices for the RAID5 array: 3
* Number of spare devices for the RAID5 array: 1
* Now you will choose the devices to use. The first three should be the ones you wish to use as active.
* The next screen will ask you which device you should use for the spare, select the only remaining option which should be sdd2
Now: Finish
You will be sent back to the partitioning screen.
Partition RAID1
Select the partition on your newly created RAID1 volume; it will say "Use as: do not use". Select this line, hit enter and make it an ext3 / (root) partition.
Done setting up this partition
Create Physical LVM Volume
Select the partition on your newly created RAID5 volume; it will say "Use as: do not use". Select this line, hit enter and make it a "physical volume for LVM".
Done setting up the partition
Once this is completed the RAID5 volume will show up as partitioned as "K LVM".
Configure LVM
There is now an option to Configure the Logical Volume Manager on your partitioning screen - select this. At the next screen say "Yes" to write the changes to the storage devices and configure LVM.
In the LVM Configuration screen you will first need to Create Volume Group:
* We name this Group the same name as the server itself.
* Choose device - should only be one choice: /dev/md1
Now Create Logical Volumes:
* Select the volume group you just created (there should only be one option)
* Create your logical volumes, one for each partition being created and name them: servername_var, servername_tmp, servername_swp and servername_home (note: you can name these whatever you want)
* The size of these Logical Volumes is the size you want to make the actual partitions of /var /tmp swap and /home
Once you are finished and arrive back at the LVM configuration screen choose Finish
Partition LVM Volumes
Back at the partitioning screen you will see all the logical volumes in the partition table. Partition each of these as you normally would (when you go into each partition it will say "Use as: do not use" - select this line, hit enter to change). LV servername_var will be the entire /var partition, etc.
When partitioning is complete: Finish partitioning and write changes to disk.
Continue Debian installation as with a normal install.
Bootloader
When you get to the step where you need to install grub/lilo you want to install it on your RAID1 partition, md0.
The Debian installer should figure this out on its own and you can agree to the default, but keep this in mind if any problems arise when you complete the installation and reboot the machine.
Notes
Optional step: To make sure LVM doesn't get "confused" by the separate disks versus the RAID volume, we tell LVM to start only on the md1 block device. Edit /etc/lvm/lvm.conf and change the filter line to:
filter = [ "a|/dev/md1|", "r/.*/" ]
(and make sure you only have one filter line)
We leave this as an optional step because you may have other reasons for looking for other block devices.
Written by: Elizabeth Bevilacqua, System Administrator at LinuxForce
Acknowledgements: Stephen Gran, System Administrator at LinuxForce
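For readers who think in commands rather than installer screens, the choices made in the RAID 5 walkthrough above map roughly onto the mdadm and LVM commands printed by the sketch below. This is an illustration only: it is not what the installer literally runs, the function only prints and never executes, print_raid5_lvm_plan is a hypothetical name, and the device names and sizes are simply the ones assumed in the article (four disks sda to sdd, volume group named after the server).

```shell
# Print (do NOT run) commands roughly equivalent to the installer choices
# in the walkthrough: RAID1 on the first partitions, RAID5 on the second,
# then LVM on top of the RAID5 device.
# $1: volume group name (hypothetical default "servername")
print_raid5_lvm_plan() {
    vg="${1:-servername}"
    printf '%s\n' \
        "mdadm --create /dev/md0 --level=1 --raid-devices=3 --spare-devices=1 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1" \
        "mdadm --create /dev/md1 --level=5 --raid-devices=3 --spare-devices=1 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2" \
        "pvcreate /dev/md1" \
        "vgcreate $vg /dev/md1" \
        "lvcreate -L 1G -n ${vg}_tmp $vg" \
        "lvcreate -L 3G -n ${vg}_home $vg" \
        "lvcreate -L 3G -n ${vg}_var $vg" \
        "lvcreate -L 500M -n ${vg}_swp $vg"
}
```

Seeing the plan written out this way makes it easier to check later, with "cat /proc/mdstat" and the LVM reporting tools, that the installed system matches what you intended.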
- Examine drive:
mdadm --examine /dev/hda1
- Assemble array:
mdadm --assemble --auto=yes /dev/md0 /dev/hda1 /dev/hdb1 /dev/hdc1 /dev/hdd1
Install MBR on Disk 2
- With the first disk removed, the system would not boot (GRUB error 15).
- You have to install an MBR on the second disk. Here is how you do it:
- Log in in single-user mode and type in these commands:
grub --device-map=/boot/grub/device.map
(at this point you get a different prompt, grub>)
root (hd0,0)
setup (hd0)
root (hd1,0)
setup (hd1)
quit
- After you do this, you can take out the first hard drive and see if the system boots from the second drive.
Adding a new hard drive
When adding a new hard drive, make sure you use --add and not -add; -add adds the drive as a spare.
mdadm --add /dev/md0 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sde5
Promote spare to active
- If you try to promote a spare or add a drive and you get a "drive busy" error, one thing you can do is:
#This will list the details of the array
mdadm -D /dev/md0
#stop the array
mdadm --stop /dev/md0
#add all the drives, including the spare
mdadm --assemble --update=resync /dev/md0 /dev/sda5 /dev/sdb5 /dev/sdc5 /dev/sde5
#now watch
cat /proc/mdstat
- To install gdisk on Debian squeeze you first need to add the backports repository:
vi /etc/apt/sources.list
#Add below
deb http://backports.debian.org/debian-backports squeeze-backports main contrib non-free
- Install gdisk
aptitude install gdisk
Drive sdb1 failed
- Show details of partition
mdadm --detail /dev/md0
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1 sdd1
      3907028864 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]
- Drive sdb1 is already removed, but if you need to remove it manually you can:
mdadm /dev/md0 -r /dev/sdb1
- SHUT DOWN IF YOU NEED TO REPLACE THE DRIVE. MAKE SURE THE NEW DRIVE IS STILL sdb.
- Look how disk is structured and what partition type it has
sgdisk -p /dev/sdb
Disk /dev/sdb: 3907029168 sectors, 1.8 TiB
Logical sector size: 512 bytes
Disk identifier (GUID): 0ED13F81-6EEA-4E12-9F27-DD806CF1F09C
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 3907029134
Partitions will be aligned on 8-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
   1              34      3907029134   1.8 TiB     FD00
- Now copy the partition structure from the good disk to the new disk
#sgdisk -R=/dev/TO_THIS_DISK /dev/FROM_THIS_DISK
sgdisk -R=/dev/sdb /dev/sda
#Give the new disk a new GUID, since the option above clones the disk including its GUID
sgdisk -G /dev/sdb
- Now re-add the drive to md0
mdadm /dev/md0 -a /dev/sdb1
- Check the status
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1 sda1 sdc1
      3907028864 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]
      [>....................] recovery = 0.0% (124204/1953514432) finish=786.3min speed=41401K/sec
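With a rebuild that takes this many hours, it is handy to pull just the percentage and the estimated time out of mdstat, for example for a cron job or a log line. The helper below is a sketch only: md_recovery_progress is a hypothetical name, and it assumes the progress line has the "recovery = N% ... finish=Nmin" shape shown in the output above.

```shell
# Print "<array> <percent> <estimated time left>" for every array that has
# a recovery, resync or reshape running. Queued operations such as
# "resync=DELAYED" have no progress figures and are skipped.
# $1 is the file to read; /proc/mdstat is used when it is omitted.
# (md_recovery_progress is a hypothetical helper name.)
md_recovery_progress() {
    awk '
        # remember the array named on the most recent header line
        $2 == ":" && $1 ~ /^md/ { name = $1 }
        /(recovery|resync|reshape) = / {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) pct = $i                  # e.g. 0.0%
                if ($i ~ /^finish=/) eta = substr($i, 8) # strip "finish="
            }
            print name, pct, eta
        }
    ' "${1:-/proc/mdstat}"
}
```

Treat the time estimate as a rough guide; it is recalculated by the kernel as the rebuild speed changes.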