Created attachment 883446
first 64 bytes of disk after grub root setup procedure run
Description of problem:
I have a Dell R720xd system with an LSI 9207-8i HBA running IT (non-RAID) firmware. The first two SAS disks in the chassis are recognized by the RHEL6 server as /dev/sda and /dev/sdv. I kickstart the system with a mirrored MD /boot comprising /dev/sda1 and /dev/sdv1 (500 MB), and a mirrored MD root comprising /dev/sda2 and /dev/sdv2 (rest of disk). During kickstart, the grub bootloader is installed. According to /var/log/anaconda.program.log, grub is installed like this:
install --stage2=/boot/grub/stage2 /grub/stage1 d (hd0) /grub/stage2 p (hd0,0)/grub/grub.conf
After kickstart, I can boot the system from either disk.
If a disk fails and I replace it, I re-copy the partition table from the remaining good disk:
e.g. sfdisk -d /dev/sda | sfdisk --force /dev/sdv
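Before re-installing grub, it can help to confirm that the copied partition table really matches the good disk. The following is a minimal sketch, assuming a classic MBR (msdos) label and looking only at the four primary entries in sector 0; the function names are hypothetical and not part of sfdisk or grub.

```python
import struct

SECTOR_SIZE = 512
PART_TABLE_OFFSET = 446        # the four primary entries start here
ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"   # bytes 510-511 of a valid MBR


def parse_partition_table(sector):
    """Return (number, status, type, first_lba, sector_count) per used primary entry."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly one 512-byte sector")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for index in range(4):
        start = PART_TABLE_OFFSET + index * ENTRY_SIZE
        # Layout: status, 3 CHS bytes, type, 3 CHS bytes, first LBA, sector count.
        status, ptype, first_lba, count = struct.unpack(
            "<B3xB3xII", sector[start:start + ENTRY_SIZE]
        )
        if ptype != 0:  # type 0 marks an unused entry
            entries.append((index + 1, status, ptype, first_lba, count))
    return entries


def same_partition_table(sector_a, sector_b):
    """Compare only the partition entries, ignoring the boot code."""
    return parse_partition_table(sector_a) == parse_partition_table(sector_b)
```

To use it, read the first 512 bytes of each disk as root (for example `open("/dev/sda", "rb").read(512)`) and pass both to `same_partition_table`. Logical partitions inside an extended partition are not checked by this sketch.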
I then need to re-install grub on the new disk so that it's bootable. If I follow the common procedure of:
grub> root (hd0,0)
grub> setup (hd1)
... the grub boot loader is installed to the right disk. However, when I try to boot using that disk, the system hangs. There are no error messages.
The difference in grub bootloader installation between the Anaconda method and my secondary method (root/setup) is that Anaconda points stage1 directly at stage2 in /boot, whereas "setup (hd1)" embeds the stage1.5 loader. Not being an expert in grub, I don't see why "setup" can't simply install grub without embedding stage1.5, since stage1.5 is not needed (/boot is in the proper location for it to be accessed directly).
If I dd the original MBR that was installed by Anaconda to the replacement disk, then the system boots fine.
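The dd workaround can be pictured as transplanting the boot code from a saved copy of the working MBR while leaving the replacement disk's partition table alone. This is only a sketch under the assumption of a classic 512-byte MBR, where the partition table begins at byte 446; `transplant_boot_code` is a hypothetical helper, and the result would still have to be written back to the disk separately.

```python
SECTOR_SIZE = 512
BOOT_CODE_SIZE = 446  # bytes 0-445; the partition table and signature follow


def transplant_boot_code(source_mbr, target_mbr):
    """Return a sector with source_mbr's boot code and target_mbr's partition table.

    Note: bytes 440-445 (the optional disk signature area) are copied
    from the source along with the boot code.
    """
    if len(source_mbr) != SECTOR_SIZE or len(target_mbr) != SECTOR_SIZE:
        raise ValueError("both inputs must be exactly 512 bytes")
    return bytes(source_mbr[:BOOT_CODE_SIZE]) + bytes(target_mbr[BOOT_CODE_SIZE:])
```

When both disks already carry identical partition tables, as they do after the sfdisk copy, this gives the same result as copying the whole sector.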
NOTE: This server will be going into production within the next few weeks, after which I won't be able to test changes. I know this is unlikely to be dealt with before then. However, I'm reporting the bug now in the hopes that the problem will either be resolved in a future RHEL6 update, or it will help someone else to avoid the testing I've been through. I'm quite positive it's a grub bug.
Steps to Reproduce:
1. kickstart system - I can boot from either disk
2. manually fail MD RAID1
3. install replacement disk
4. copy partition table from existing disk
5. allow MD to rebuild
6. re-install grub using "root/setup" procedure.
7. attempt to boot from the disk - the system locks up
Additional info:
I suspect this is related: http://bugs.centos.org/print_bug_page.php?bug_id=1940
I'm not sure if this is related: https://www.illumos.org/issues/4659
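Attached is a copy of the start of the disk after the root/setup procedure, with stage1.5 installed (dd if=/dev/sdX of=/tmp/sdX count=64). Note that with dd's default 512-byte block size, count=64 captures the first 64 sectors rather than 64 bytes.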
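For anyone comparing such a dump against one taken from the disk that Anaconda set up, a sector-by-sector diff shows quickly whether only the MBR differs or whether the sectors after it (where "setup" embeds stage1.5) differ too. A minimal sketch, assuming two dumps of equal length taken with the same dd command; `differing_sectors` is a hypothetical helper name.

```python
def differing_sectors(dump_a, dump_b, sector_size=512):
    """Return the indexes of the sectors whose contents differ between two dumps."""
    if len(dump_a) != len(dump_b):
        raise ValueError("dumps must be the same length")
    return [
        offset // sector_size
        for offset in range(0, len(dump_a), sector_size)
        if dump_a[offset:offset + sector_size] != dump_b[offset:offset + sector_size]
    ]
```

Sector 0 is the MBR itself. Any other indexes in the result point at the area between the MBR and the first partition.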
Hi Jan.
I'm trying to find time to test this. I finally had a few minutes, but unfortunately the permissions on the files are not correct, so I get "forbidden" when trying to download.
Jason.
Hi Jan,
I spent a full work day debugging this issue. First, before trying the patched grub, I wanted to verify that I could replicate the issue without the original hardware. I kickstarted a virtual system with VirtualBox, and shared md for /boot and /. I tested various failover strategies, and each time I replaced the disk and re-installed grub, I was able to boot successfully. However, this wasn't using the LSI 9207-8i card that was installed in the original server. Since the Illumos bug report had also made reference to the problem occurring with disks connected to an LSI HBA, I got some physical server hardware close enough to the model where I had originally discovered the problem, and I installed an LSI 9207-8i there. I once again thoroughly tested failover. Fortunately, I was not able to make the problem occur. This means that somewhere along the line, the problem has been corrected. Sorry for my delay in testing.
Comment 9 David Kaspar // Dee'Kej
2016-05-12 15:56:33 UTC
Hello,
I just got to this BZ to investigate it more. But before I actually do, could you please tell me if this bug is still a problem for you?
Thank you!
David