Could not find logical volume "VolGroup00" LVM over RAID recovery - FIXED RH/Fedora/CentOS/CloudLinux

Intro: This tutorial describes the local recovery procedure for RH-based servers running LVM over software RAID1 when the boot drive (the one holding the grub MBR) in the RAID1 array has failed.
It also provides a solution for this very annoying error when RAID is involved:
Scanning logical volumes
No volume groups found
Activating logical volumes
Volume group "VolGroup00" not found
Trying to resume from /dev/VolGroup00/LogVol01
Unable to access resume device /dev/VolGroup00/LogVol01
Creating root device
Mounting root filesystem
mount: could not find filesystem '/dev/root'
Setting up other filesystems
Setting up new root fs
setuproot: moving /dev failed: No such file or directory
no fstab.sys, mounting internal defaults
setuproot: error mounting /proc: No such file or directory
setuproot: error mounting /sys: No such file or directory
Switching to new root and running init
unmounting old /dev
unmounting old /proc
unmounting old /sys
switchroot: mount failed: No such file or directory
Kernel Panic - not syncing: attempted to kill init
Tools you will need: 1 new replacement HDD, 1 Slackware Linux bootable CD, 1 RH/CentOS/Fedora/CL bootable CD.
Problem description: when the boot disk of a RAID1 array fails on a system running LVM over software RAID1, the system will not boot from the backup drive, because grub is actually missing from its MBR and cannot simply be installed on the second drive. In the most common setup there is one /boot partition (md0, ext2, RAID1) and one or more other LVs mounted as the filesystem over RAID (md1 etc.; we will presume only md1 is involved).
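For orientation, the layout assumed throughout this tutorial looks roughly like this (a sketch only; device names and sizes are illustrative):
/dev/md0 - RAID1 of one small partition from each disk, ext2, mounted as /boot
/dev/md1 - RAID1 of the remaining partition from each disk, used as the LVM physical volume for VolGroup00 (LogVol00 holds the root filesystem, LogVol01 is typically swap, which is why the initrd tries to resume from it)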
Problem solution:
First you will have to get a working grub from the backup HDD. Because the backup HDD contains an LVM over RAID system, it will only be partially visible in recovery mode (only /proc and /sys). To be able to see the backup HDD in recovery mode you will have to pair it with another clean install of your RH distribution.
Pull out both the faulty HDD and the backup HDD from the server and install the new replacement HDD as /dev/sda (the first HDD). Make a simple, default minimal RH installation on the new HDD.
After the installation has completed, put in the backup HDD containing the data to be recovered as /dev/sdb (the second drive) and boot from the RH bootable CD in rescue (recovery) mode to reconstruct a valid grub loader. To do that, boot the RH bootable CD and enter rescue mode by typing at the boot prompt:
linux rescue
The recovery console will now detect both HDDs. The filesystem from /dev/sda will be mounted in /mnt/sysimage. Execute chroot by typing at the prompt:

chroot /mnt/sysimage

Start the grub shell with:
grub

Probing devices to guess BIOS drives. This may take a long time.

GNU GRUB version 0.97 (640K lower / 3072K upper memory)
[ Minimal BASH-like line editing is supported. For the first word, TAB lists possible command completions. Anywhere else TAB lists the possible completions of a device/filename. ]
Execute the following commands to reconstruct the grub loader:
grub> find /grub/grub.conf
(hd0,0)
(hd1,0)
grub> root (hd1,0)
Filesystem type is ext2fs, partition type 0x83
grub> setup (hd1)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... yes
Checking if "/grub/stage2" exists... yes
Checking if "/grub/e2fs_stage1_5" exists... yes
Running "embed /grub/e2fs_stage1_5 (hd1)"... 16 sectors are embedded.
succeeded
Running "install /grub/stage1 (hd1) (hd1)1+16 p (hd1,0)/grub/stage2 /grub/grub.conf"... succeeded
Done.
grub> quit
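If you are not sure which (hdX) corresponds to which physical disk, grub normally records the mapping it guessed in /boot/grub/device.map, which you can check from the same chroot (the output below is illustrative):
cat /boot/grub/device.map
(hd0)   /dev/sda
(hd1)   /dev/sdb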
You will now need to copy the bootloader and the partition scheme from /dev/sdb to /dev/sda.
Boot from the Slackware CD. You will now clone the backup drive onto the new drive. Assuming /dev/sda is the new drive and /dev/sdb is the RAID1 backup drive (containing the data to be recovered), execute the following command in the Slackware shell:
dd if=/dev/sdb of=/dev/sda conv=noerror
(approximate cloning time: 2 to 2.5 hours for every 250 GB of data, depending on drive speed; you can use the parameter bs=1024 for increased speed instead of the default block size of 512 bytes)
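For example, the same clone with a 1 KB block size (the dd option is bs; larger values such as bs=64K are also commonly used):
dd if=/dev/sdb of=/dev/sda bs=1024 conv=noerror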
In order to see the data that has been written by dd, open a new Slackware console (ALT+F2) and execute:
ps -a | grep dd
Get the PID of the dd process and execute:
watch -n 10 'kill -USR1 $PID_NUMBER'
In the original console (ALT+F1) you will get 10 second updates regarding the data read and written.
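If pgrep and watch are both available in the Slackware environment, the two steps can be combined into a single line (a convenience only, not part of the original procedure):
watch -n 10 'kill -USR1 $(pgrep -x dd)'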
The cloning process produces an exact duplicate of the backup filesystem and MBR. When it has finished you will have a failed RAID system but a working grub.
If you try to boot the default kernel at this point, you will get the same "Volume group "VolGroup00" not found" error and kernel panic shown at the beginning of this tutorial.
This is because the initrd no longer matches the system and needs to be rebuilt.
The rescue environments of RH-based bootable CDs lack LVM tools (they have disk checking, partitioning and RAID tools, but no LVM tools), so boot your system with the Slackware CD instead. After booting, first create a working directory:
mkdir /mnt2
Then check the state of the RAID arrays:
cat /proc/mdstat
You will notice that md1 is visible but not syncing (as /dev/sda1 is marked as failed), and that md0 is missing because it is not active. First re-sync md1:
mdadm --add /dev/md1 /dev/sda1
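You can follow the re-sync progress with (assuming watch is present in the Slackware environment; plain cat /proc/mdstat works too):
watch -n 5 cat /proc/mdstat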
Wait for the RAID to sync. When it is done, scan for logical volumes:
lvscan
inactive          '/dev/VolGroup00/LogVol00' [$SIZE GB] inherit
Activate the volume group by:
lvchange -a y /dev/VolGroup00
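Running lvscan again should now report the volume as ACTIVE, for example:
lvscan
ACTIVE            '/dev/VolGroup00/LogVol00' [$SIZE GB] inherit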
Mount the LV in the working directory:
mount /dev/VolGroup00/LogVol00 /mnt2
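A quick check that the right filesystem is there (you should see the usual root tree):
ls /mnt2
bin  boot  dev  etc  home  lib  ...  usr  var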
As the md0 ext2 RAID1 containing /boot now exists identically on both (cloned) drives, md0 is not active. You will have to start it manually the first time with just one drive, then add the second and wait for it to sync.
To do that execute the following procedure:
mdadm --examine --scan
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=a0e5e5ad:9f73d669:934cc421:335f7291
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=ae10742c:d7fb9523:a1fe81e6:ab1b4be8
Create the device:
mknod /dev/md0 b 9 0
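The major and minor numbers used above (9, 0) refer to the md driver; if in doubt, you can confirm the md major number with (the md driver must be loaded for it to appear):
grep -w md /proc/devices
9 md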
Assemble and start the array degraded, using the backup drive's partition that belongs to md0 (assumed here to be /dev/sdb2, since in this layout md1 uses the first partition of each disk; adjust the partition numbers to your own layout):
mdadm --assemble /dev/md0 /dev/sdb2
mdadm --run /dev/md0
Add the first drive's partition and wait for it to sync:
mdadm --add /dev/md0 /dev/sda2
To check the progress use:
cat /proc/mdstat
Now mount the /boot files in the correct /mnt2/boot directory so you can partially rebuild the initrd and get past the Volume group "VolGroup00" not found / Trying to resume from /dev/VolGroup00/LogVol01 error.
mount /dev/md0 /mnt2/boot
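A quick check that /boot is mounted correctly (the exact file names depend on your kernel version):
ls /mnt2/boot
config-2.6.18-...  grub  initrd-2.6.18-...img  System.map-2.6.18-...  vmlinuz-2.6.18-...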
Should you try to chroot and rebuild the initrd right away, you will find that it is not possible: you will get an "error opening /sys/block" error. The needed /sys is not available because the system has booted a different OS, and since you need to be chrooted to rebuild the initrd, you cannot reach the real /sys from inside the chroot. The solution is to copy the Slackware /sys into /mnt2/sys (this will produce plenty of errors, but do not mind them, you will get the files you need).
cp -Rp /sys/* /mnt2/sys/
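As an alternative to the copy (not part of the original procedure), a bind mount made from the Slackware shell before entering the chroot achieves the same result without the errors, if your environment supports it:
mount --bind /sys /mnt2/sys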
chroot /mnt2
cd /boot
Back up your initrd (replace $VERSION with the correct version of the initrd/kernel; as an example we will use initrd-2.6.18-374.3.1.el5.lve0.8.44PAE.img):
mv initrd-$VERSION.img initrd.bak
So in our case we will have:
mv initrd-2.6.18-374.3.1.el5.lve0.8.44PAE.img initrd.bak
Partially rebuild the initrd:
mkinitrd -v --force-lvm-probe --force-raid-probe /boot/initrd-2.6.18-374.3.1.el5.lve0.8.44PAE.img 2.6.18-374.3.1.el5.lve0.8.44PAE
Reboot the system and make sure /dev/sda is set as the first boot device.
If you try to boot the system with the new initrd, you will now get the following error:
mount: could not find filesystem '/dev/root'
setuproot: moving /dev failed: No such file or directory
setuproot: error mounting /proc: No such file or directory.
setuproot: error mounting /sys: No such file or directory.
switchroot: mount failed: No such file or directory.
This is because the initrd has still not been built with the right files, so you will have to rebuild it once again. To do that, boot the system using the RH CD in recovery mode. The system will now detect the logical volume groups AND the md0 RAID holding the /boot files.
Once booted, the filesystem will be mounted at /mnt/sysimage. Mount md0 and enter the chroot:
mount /dev/md0 /mnt/sysimage/boot
chroot /mnt/sysimage
cd /boot
Delete the current initrd:
rm -rf /boot/initrd-2.6.18-374.3.1.el5.lve0.8.44PAE.img
The PROPER (read: WORKING) way to rebuild the initrd is:
mkinitrd -v --force-lvm-probe --force-raid-probe --with=dm_mod --with=dm_crypt --with=aes-generic /boot/initrd-2.6.18-374.3.1.el5.lve0.8.44PAE.img 2.6.18-374.3.1.el5.lve0.8.44PAE
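Before rebooting you can sanity-check the new image; on CentOS 5-era systems the initrd is a gzip-compressed cpio archive, so its contents can be listed and you should see the raid1 and dm-mod modules (exact paths may vary):
zcat /boot/initrd-2.6.18-374.3.1.el5.lve0.8.44PAE.img | cpio -t | grep -E 'raid1|dm-mod'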
Reboot the system; the server will now boot your original system from the new drive.
