
Bug 575309

Summary: Kernel panic - not syncing: IO-APIC + timer doesn't work!
Product: Red Hat Enterprise Linux 5
Reporter: Alok Kataria <akataria>
Component: kernel
Assignee: Prarit Bhargava <prarit>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: low
Version: 5.4.z
CC: cward, dhecht, garrett, jsavanyo, jwilson, kzhang, mjenner, tao, zamsden
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-01-13 21:19:49 UTC
Attachments:
Skip the timer_irq_works check when on VMware. (flags: none)

Description Alok Kataria 2010-03-19 23:49:41 UTC
Created attachment 401370
Skip the timer_irq_works check when on VMware.

Description of problem:

We are hitting the IO-APIC + timer bug (a.k.a. "pester mingo" on mainline) with the RHEL 5.4 32-bit kernel at bootup. The problem is specific to virtualization: in some cases the hypervisor can be de-scheduled while the kernel is in the timer_irq_works call, and as a result the TSC and jiffies values can go out of sync. It is more prominent on VMware because we enable LazyTimerEmulation mode for 32-bit kernels on the VMware platform with the 5.4 kernel (as part of PR 463573).

Since this problem is VMware-specific, I have added a condition to skip the timer_irq_works call when running on the VMware platform.
Please note that this problem is not prominent on the mainline kernel: the PIT is not used to drive timekeeping interrupts there, so we don't have to enter LazyTimerEmulation mode for correct timekeeping.

I am attaching a patch which skips the check when on VMware.
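
For readers without access to the attachment, here is a rough user-space sketch of the idea. It is not the attached patch. It assumes the check works roughly as it does in kernels of this era: wait about ten tick periods using a TSC-paced delay, then require that jiffies advanced by more than four. All names here (sim_jiffies, sim_delay, sim_timer_irq_works, sim_timer_check, on_vmware) are hypothetical stand-ins, and the threshold is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's jiffies counter. */
unsigned long sim_jiffies;

/*
 * Simulated delay. In the kernel the delay is paced by the TSC, while
 * jiffies only advances when timer interrupts are delivered. Here the
 * caller states how many ticks were actually delivered during the delay.
 */
void sim_delay(unsigned long ticks_delivered)
{
	sim_jiffies += ticks_delivered;
}

/*
 * Assumed shape of the check: after a delay of about ten tick periods,
 * expect jiffies to have advanced by more than four.
 */
bool sim_timer_irq_works(unsigned long ticks_delivered)
{
	unsigned long t1 = sim_jiffies;

	sim_delay(ticks_delivered);
	return sim_jiffies - t1 > 4;
}

/*
 * Hypothetical wrapper showing the proposed behaviour: when running on
 * VMware, skip the check and treat the timer as working.
 */
bool sim_timer_check(bool on_vmware, unsigned long ticks_delivered)
{
	if (on_vmware)
		return true;
	return sim_timer_irq_works(ticks_delivered);
}
```

In this model, a guest whose ticks are delivered late (say, 2 of the expected 10 during the delay) fails sim_timer_irq_works even though its timer is fine, which corresponds to the spurious panic. sim_timer_check with on_vmware set passes regardless.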

Comment 2 Zachary Amsden 2010-04-14 20:24:49 UTC
The patch is fine; it is definitely the easiest fix.

Comment 3 RHEL Program Management 2010-05-20 12:41:30 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Alok Kataria 2010-06-07 19:38:28 UTC
Is this patch committed for any update release yet?

Please let me know if you are waiting on any info from me.

Thanks.

Comment 7 Jarod Wilson 2010-07-23 15:28:29 UTC
in kernel-2.6.18-208.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 9 Alok Kataria 2010-08-19 18:03:16 UTC
Thanks for including the fix.

Comment 10 Zhang Kexin 2010-12-06 09:52:27 UTC
Hi Alok,
Could you please provide some guidance on how to reproduce the bug (setup notes or a known system/environment)? Or will you test this bug yourself?
Thanks a lot!

Comment 11 Alok Kataria 2011-01-06 20:11:34 UTC
Hi Zhang, 

Apologies for the delay. 

We were hitting this bug while running boot halt tests. I don't remember the frequency now, but it was not 100%.

In any case, I got hold of the kernel-2.6.18-236.el5.src.rpm and verified that the fix is included. You can close this as verified now.

Thanks.

Comment 14 errata-xmlrpc 2011-01-13 21:19:49 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html