Description of problem:
2.6.18 kernels have a known bug where a spurious BUG_ON in the softirq code can trigger with sufficiently large initrd ramdisks. This bug is fixed in 2.6.18.6 by this change:
commit da9aa2f4fdc179d71961b6f1562a2375998ce9d5
Author: Zachary Amsden <zach>
Date: Wed Dec 6 20:39:39 2006 -0800
    [PATCH] softirq: remove BUG_ONs which can incorrectly trigger
If the RHEL5 kernel does not have this change, it will be affected by the bug.
Version-Release number of selected component (if applicable):
How reproducible:
When running in a virtual machine (VMware, Xen, VirtualPC, and qemu are all affected) with a large enough ramdisk, 100% of the time. On native hardware, a very large compressed ramdisk is required, and the bug is not as reproducible: decompression is faster, so the interrupts required to trigger the bug do not always happen in the window where the spurious assertion can fire.
Steps to Reproduce:
1.
2.
3.
Actual results:
Kernel fails to boot.
Expected results:
Kernel should boot.
Additional info:
Hrm... Thus far, I'm unable to reproduce in a kvm i386 guest with a 52MB initrd and my copy of vmware 5.5 seems to hate 2.6.22 kernels. (This is on a core 2 quad system w/2GB of RAM). Does this trigger more readily on slower machines with less memory?
From Zach: This happens only in 2.6.16.5 and earlier kernels. If the base kernel has changed, the bug might not exist in RHEL anymore. (changing product back so Zach can edit the bug)
> This happens only in 2.6.16.5 and earlier kernels.
Zach, did you mean 2.6.18.5 and earlier there? Based on the patch, that would be my guess. We don't have this change in the RHEL5 kernel yet, so I'd like to continue to try to reproduce this, if possible.
Yes, I meant 2.6.18.5 and earlier. Attempting to reproduce it is likely to be a waste of time. It is hardware dependent, and fires only if you have a certain combination of drivers that conspire to schedule timeouts and fire softirqs at the right time. I don't know the exact kernel config which will trigger it, as this is a rather old bug. I believe we originally hit it on OpenSUSE, where it fired 100% of the time. Perhaps the RHEL configuration is different enough that it doesn't. So I can't be sure if the bug is present in the RHEL binary, but if it is in the sources, the fix is trivial.
I gave it a quick shot trying to reproduce the bug on vmware 5.5 on top of rhel5, without any success. Despite that, I'll post the patch for internal review for possible inclusion in rhel5, because we don't have this change, and it seems plausible someone else might be able to trigger it.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in 2.6.18-58.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
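For anyone triaging a machine against this report, the version facts stated in this thread can be turned into a quick check. This is only a rough sketch: the function name `may_be_affected` and both constants are made up for illustration, and the thresholds are assumptions taken from the comments above (upstream fix in 2.6.18.6, RHEL5 patch in 2.6.18-58.el5). It says nothing about whether a given config actually hits the BUG_ON, and errata builds on an older RHEL5 branch may carry different patches.

```python
import re

# Assumptions taken from this bug report, not from an authoritative list:
# upstream stable 2.6.18.6 carries the fix, and the RHEL5 patch is
# reported to be in build 2.6.18-58.el5.
UPSTREAM_FIXED_STABLE = 6
RHEL5_FIXED_RELEASE = 58


def may_be_affected(version: str) -> bool:
    """Return True if a 2.6.18-series kernel predates the fix.

    Accepts upstream strings such as "2.6.18.5" and RHEL5 strings such
    as "2.6.18-53.1.4.el5" or "2.6.18-92.el5xen" (as seen in `uname -r`).
    """
    # RHEL5 style: 2.6.18-<release>[.<minor>...].el5[flavour]
    rhel = re.fullmatch(r"2\.6\.18-(\d+)(?:\.\d+)*\.el5\w*", version)
    if rhel:
        return int(rhel.group(1)) < RHEL5_FIXED_RELEASE

    # Upstream style: 2.6.18 or 2.6.18.<stable>
    upstream = re.fullmatch(r"2\.6\.18(?:\.(\d+))?", version)
    if upstream:
        return int(upstream.group(1) or 0) < UPSTREAM_FIXED_STABLE

    # Other series are out of scope for this sketch.
    raise ValueError(f"not a 2.6.18-series version string: {version!r}")
```

For example, `may_be_affected("2.6.18-53.el5")` returns `True`, while `may_be_affected("2.6.18-58.el5")` returns `False`. Other kernel series raise `ValueError` on purpose, since this report only makes claims about 2.6.18.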
Greetings Red Hat Partner,
A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot1--available now on partners.redhat.com. Please test and confirm that your issue is fixed.
After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)
If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED.
If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you.
If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
Thank you
Greetings Red Hat Partner,
A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot3--available now on partners.redhat.com. Please test and confirm that your issue is fixed.
After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)
If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED.
If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you.
If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
Thank you
Greetings Red Hat Partner,
A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot4--available now on partners.redhat.com. Please test and confirm that your issue is fixed.
After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)
If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED.
If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you.
If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
Thank you
Greetings Red Hat Partner,
A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot6--available now on partners.redhat.com. We are nearing GA for 5.2 so please test and confirm that your issue is fixed ASAP.
After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)
If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED.
If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you.
If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager.
Thank you
I have no idea how to access partners.redhat.com and have no idea who my Partner Manager is, nor does anyone else here, so I am giving up on this bug. I assume the parties involved here did the right thing; removing a broken line of code is not very hard to do, so I am changing the status to VERIFIED.
Ugh. Pardon all the automated spam. I don't have a clue how to access partners.redhat.com myself either, but the relevant kernel bits are available from people.redhat.com, per comment #10, should you wish to poke at 'em. But yeah, removing a line of code isn't too complex, pretty sure we got it right... :)
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html