Bug 221554

Summary: Spurious BUG_ON causes kernel to not boot
Product: Red Hat Enterprise Linux 5
Reporter: Zachary Amsden <zach>
Component: kernel
Assignee: Jarod Wilson <jarod>
Status: CLOSED ERRATA
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5.1
CC: clalance, dzickus, jsethi, poelstra, srihan, xen-maint
Keywords: OtherQA
Hardware: i386
OS: Linux
Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Last Closed: 2008-05-21 14:40:32 UTC
Bug Blocks: 1300182

Description Zachary Amsden 2007-01-05 03:23:13 UTC
Description of problem:

2.6.18 kernels have a known bug where a spurious BUG_ON in the softirq code can
trigger with sufficiently large initrd ramdisks.

This bug is fixed in 2.6.18.6 by this change:

commit da9aa2f4fdc179d71961b6f1562a2375998ce9d5
Author: Zachary Amsden <zach>
Date:   Wed Dec 6 20:39:39 2006 -0800

    [PATCH] softirq: remove BUG_ONs which can incorrectly trigger

If the RHEL5 kernel does not have this change, it will be affected by the bug.

Version-Release number of selected component (if applicable):


How reproducible:

When running in a virtual machine (VMware, Xen, VirtualPC, qemu are all
affected), with a large enough ramdisk, 100% of the time.  On native hardware,
a very large compressed ramdisk is required, and the bug is not as reproducible
(due to faster decompression time, the interrupts required to trigger the bug do
not always happen in the window where the spurious assertion can fire).
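
The timing argument above can be illustrated with a toy model. This is only a sketch under simplifying assumptions that are not part of the report: interrupts are treated as strictly periodic, the vulnerable window is assumed to scale with decompression time, and the window is assumed to start at a random phase relative to the interrupt stream. The function name and the numbers are hypothetical.

```python
def hit_probability(window_ms: float, irq_period_ms: float) -> float:
    """Chance that at least one periodic interrupt lands inside the window.

    Toy model only: interrupts arrive every irq_period_ms, and the window
    of length window_ms starts at a uniformly random phase. None of these
    values are measured from a real kernel.
    """
    if window_ms <= 0 or irq_period_ms <= 0:
        raise ValueError("window and period must be positive")
    # A window at least one period long always contains an interrupt;
    # a shorter one contains an interrupt with probability window / period.
    return min(1.0, window_ms / irq_period_ms)


if __name__ == "__main__":
    # Slow decompression (e.g. in a virtual machine): long window.
    print(hit_probability(40.0, 10.0))
    # Fast decompression on native hardware: short window.
    print(hit_probability(2.5, 10.0))
```

Under these assumptions a long window is always hit, which matches the 100% reproduction rate in virtual machines, while a short window is hit only some of the time, which matches the less reliable reproduction on native hardware.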

Steps to Reproduce:
1.
2.
3.
  
Actual results:

Kernel fails to boot.

Expected results:

Kernel should boot.

Additional info:

Comment 3 Jarod Wilson 2007-08-16 20:47:34 UTC
Hrm... Thus far, I'm unable to reproduce in a kvm i386 guest with a 52MB initrd
and my copy of vmware 5.5 seems to hate 2.6.22 kernels. (This is on a core 2
quad system w/2GB of RAM).

Does this trigger more readily on slower machines with less memory?

Comment 4 Jarod Wilson 2007-08-16 20:58:47 UTC
From Zach:
This happens only in 2.6.16.5 and earlier kernels.  If the base kernel has
changed, the bug might not exist in RHEL anymore.

(changing product back so Zach can edit the bug)

Comment 5 Jarod Wilson 2007-08-16 21:00:26 UTC
> This happens only in 2.6.16.5 and earlier kernels.

Zach, did you mean 2.6.18.5 and earlier there? Based on the patch, that would be
my guess. We don't have this change in the RHEL5 kernel yet, so I'd like to
continue to try to reproduce this, if possible.

Comment 6 Zachary Amsden 2007-08-16 21:06:17 UTC
Yes, I meant 2.6.18.5 and earlier.

Attempting to reproduce it is likely to be a waste of time.  It is hardware
dependent, and fires only if you have a certain combination of drivers that
conspire to schedule timeouts and fire softirqs at the right time.

I don't know the exact kernel config which will trigger it, as this is a rather
old bug.  I believe we originally hit it on OpenSUSE, where it fired 100% of the
time.  Perhaps the RHEL configuration is different enough that it doesn't.

So I can't be sure if the bug is present in the RHEL binary, but if it is in the
sources, the fix is trivial.

Comment 7 Jarod Wilson 2007-09-20 22:19:57 UTC
I gave it a quick shot trying to reproduce the bug on vmware 5.5 on top of
rhel5, without any success. Despite that, I'll post the patch for internal
review for possible inclusion in rhel5, because we don't have this change, and
it seems plausible someone else might be able to trigger it.

Comment 8 RHEL Program Management 2007-11-20 05:06:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Don Zickus 2007-11-29 17:05:51 UTC
in 2.6.18-58.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
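
For anyone checking a machine against the versions named in this bug, the comparison might be sketched as follows. This is an illustrative helper, not a Red Hat tool. It assumes that upstream kernels from 2.6.18.6 onward carry the fix, that RHEL5 kernels from build 58 onward carry it, and that the RHEL5 build number alone decides, which ignores any backport to an earlier update stream. The function name and regular expressions are hypothetical.

```python
import re

UPSTREAM_FIXED = (2, 6, 18, 6)  # description: fixed in 2.6.18.6
RHEL5_FIXED_BUILD = 58          # comment 10: fix is in 2.6.18-58.el5


def likely_has_fix(release: str) -> bool:
    """Guess whether a kernel release string includes the softirq fix.

    Assumption: every later upstream version and every later el5 build
    keeps the fix. Raises ValueError for strings it cannot parse.
    """
    # RHEL5 style, e.g. "2.6.18-58.el5" or "2.6.18-53.1.4.el5".
    rhel = re.match(r"^2\.6\.18-(\d+)(?:\.\d+)*\.el5", release)
    if rhel:
        return int(rhel.group(1)) >= RHEL5_FIXED_BUILD

    # Upstream style, e.g. "2.6.18", "2.6.18.5" or "2.6.22".
    upstream = re.match(r"^(\d+)\.(\d+)\.(\d+)(?:\.(\d+))?$", release)
    if upstream:
        version = tuple(int(part or 0) for part in upstream.groups())
        return version >= UPSTREAM_FIXED

    raise ValueError(f"unrecognised kernel release: {release!r}")


if __name__ == "__main__":
    for rel in ("2.6.18.5", "2.6.18.6", "2.6.18-53.1.4.el5", "2.6.18-58.el5"):
        print(rel, likely_has_fix(rel))
```

The string to test would typically come from `uname -r` on the machine in question.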

Comment 12 John Poelstra 2008-03-21 03:59:18 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot1--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you

Comment 13 John Poelstra 2008-04-02 21:40:04 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot3--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.


Comment 14 John Poelstra 2008-04-09 22:45:45 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot4--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.


Comment 15 John Poelstra 2008-04-23 17:41:06 UTC
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot6--available now on partners.redhat.com.  

We are nearing GA for 5.2 so please test and confirm that your issue is fixed ASAP.


Comment 16 Zachary Amsden 2008-04-28 21:43:32 UTC
I have no idea how to access partners.redhat.com and have no idea who my Partner
Manager is, nor does anyone else here, so I am giving up on this bug.

I assume the parties involved here did the right thing. Removing a broken line
of code is not very hard to do, so I am changing the status to VERIFIED.

Comment 17 Jarod Wilson 2008-04-28 22:20:40 UTC
Ugh. Pardon all the automated spam. I don't have a clue how to access
partners.redhat.com myself either, but the relevant kernel bits are available
from people.redhat.com, per comment #10, should you wish to poke at 'em, but
yeah, removing a line of code isn't too complex, pretty sure we got it right... :)

Comment 19 errata-xmlrpc 2008-05-21 14:40:32 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html