Bug 500150

Summary: 2.6.18-145.el5.x86_64 kernel panic with Xorg
Product: Red Hat Enterprise Linux 5
Reporter: Jay Turner <jturner>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED CURRENTRELEASE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: urgent
Docs Contact:
Priority: high
Version: 5.3
CC: dzickus, srevivo
Target Milestone: beta
Keywords: Regression
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: 2.6.18-146.el5
Doc Type: Bug Fix
Last Closed: 2009-05-12 11:11:33 UTC
Type: ---

Description Jay Turner 2009-05-11 12:43:46 UTC
Description of problem:
Sadly, I can't get kdump to capture the eventual kernel panic.  I'm seeing problems with 2.6.18-145.el5.x86_64; after dropping back to -144.el5, everything is fine.  Nothing in the changelog obviously explains the problem.

Version-Release number of selected component (if applicable):
kernel-2.6.18-145.el5

How reproducible:
Always

Steps to Reproduce:
1. Boot with -145.el5 kernel
2. Pull up youtube.com and attempt to view a video
3. Also, I've been able to reproduce by hitting <alt>-t (which should simply expand the Tools menu item)
  
Actual results:
Hard lock, and eventually I get what appears to be a kernel panic (lights flashing on the keyboard).  For some reason kdump is not catching it (which might be a wholly different issue).

Comment 1 Jay Turner 2009-05-11 12:56:24 UTC
I'm able to reproduce this with firefox-3.0.10-1.el5.

Comment 2 Jay Turner 2009-05-11 14:09:10 UTC
I'm adding this information here as well, since it might be related.  On another box (x86_64 with an Intel Corporation 82G965 Integrated Graphics Controller (rev 02), running xorg-x11-drv-i810-1.6.5-11.fc7.x86_64 and xorg-x11-server-Xorg-1.1.1-48.52.el5.x86_64) I'm getting a kernel panic on boot when the system tries to exec the X server for gdm.  I'll put the resulting vmcore on a server for retrieval.  I'm also editing the summary of the bug.

Comment 3 Jay Turner 2009-05-11 14:10:23 UTC
The core can be found at http://cobalt.devel.redhat.com/vmcore/Xorg_vmcore

Comment 4 Jay Turner 2009-05-11 14:11:07 UTC
Also note that 2.6.18-144.el5.x86_64 is functioning quite well on the box described in comment 2.

Comment 5 Jay Turner 2009-05-11 21:07:00 UTC
Meant to post this earlier.  Backtrace:

PID: 3749   TASK: ffff81006134d0c0  CPU: 0   COMMAND: "Xorg"
 #0 [ffff81005d6f9b00] crash_kexec at ffffffff800ac052
 #1 [ffff81005d6f9bc0] __die at ffffffff8006617f
 #2 [ffff81005d6f9c00] do_page_fault at ffffffff80067dfa
 #3 [ffff81005d6f9cf0] error_exit at ffffffff8005ede9
    [exception RIP: __handle_mm_fault+784]
    RIP: ffffffff800091d6  RSP: ffff81005d6f9da8  RFLAGS: 00013202
    RAX: 0000000000000000  RBX: ffff81005dc92758  RCX: ffff810009000058
    RDX: ffff81005d6f9df8  RSI: 000000000005cf88  RDI: 00000000000001b0
    RBP: ffff810009b975c0   R8: 0000000000001f80   R9: 0000000000200000
    R10: 0000000000000000  R11: 000000364f87a3d0  R12: 00003ffffffff000
    R13: ffff81005dcc31e8  R14: ffff81007a4aa528  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #4 [ffff81005d6f9da0] __handle_mm_fault at ffffffff800090c0
 #5 [ffff81005d6f9e60] do_page_fault at ffffffff80067b78
 #6 [ffff81005d6f9f50] error_exit at ffffffff8005ede9
    RIP: 000000364f87aa5b  RSP: 00007fffd1996fd8  RFLAGS: 00013206
    RAX: 00002abadc06b000  RBX: 0000000000000000  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 00002abadc06b000
    RBP: 0000000000000002   R8: 0000000000001f80   R9: 0000000000200000
    R10: 0000000000000000  R11: 000000364f87a3d0  R12: 0000000014174b50
    R13: 00000000141747d0  R14: 0000000000000000  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0033  SS: 002b
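
For anyone reading the backtrace above without access to the vmcore, the start address of the faulting function can be recovered from the exception RIP and the symbol offset on the "exception RIP" line. The following is only a minimal sketch: the helper name `function_start` is hypothetical (it is not part of the crash utility), and it assumes the `+784` offset is printed in decimal.

```python
import re


def function_start(exception_line: str, rip_line: str) -> int:
    """Return the faulting function's start address (exception RIP minus offset).

    Assumes the offset after '+' is decimal and the RIP is 16 hex digits,
    matching the two lines quoted from the backtrace in this report.
    """
    offset = int(re.search(r"\+(\d+)\]", exception_line).group(1))
    rip = int(re.search(r"RIP:\s*([0-9a-f]{16})", rip_line).group(1), 16)
    return rip - offset


# The two lines below are copied from the backtrace in this comment.
exception_line = "[exception RIP: __handle_mm_fault+784]"
rip_line = "RIP: ffffffff800091d6  RSP: ffff81005d6f9da8  RFLAGS: 00013202"

start = function_start(exception_line, rip_line)
print(f"__handle_mm_fault starts at {start:016x}")
```

Under those assumptions the computed start address is ffffffff80008ec6; confirming it would still require the matching kernel debuginfo.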

Comment 6 Jay Turner 2009-05-12 11:10:43 UTC
I don't know whether -145.el5 experienced some problems in the build or whether some of the fixes included in -146.el5 resolved the issue (there was some thought that bug 476148 might be related), but -146.el5 is working perfectly well on both of my machines, so I'm closing out this issue.

Comment 7 Don Zickus 2009-05-12 14:35:55 UTC
The thinking is that the patch for bz 476148 caused the problem in -145.el5.  A follow-up patch for the same bz was meant to fix a panic in kvm.  According to the poster of the patch, that fix has most likely fixed this issue as well.
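
The build range reported in this bug (-144.el5 fine, -145.el5 panics, -146.el5 fixed) can be checked against a kernel release string such as the output of `uname -r`. This is a rough sketch only: the function name `is_affected` and the `AFFECTED_BUILDS` set are hypothetical, and the check covers only plain `2.6.18-N.el5` release strings of the kind named in this report.

```python
import re

# Builds reported as affected in this bug: only -145.el5.
AFFECTED_BUILDS = {145}


def is_affected(release: str) -> bool:
    """Return True if the release string is a 2.6.18-N.el5 build reported as affected."""
    match = re.match(r"2\.6\.18-(\d+)\.el5", release)
    if match is None:
        # Anything else (other base versions, other release formats) is out of scope.
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)) in AFFECTED_BUILDS


for release in ("2.6.18-144.el5.x86_64",
                "2.6.18-145.el5.x86_64",
                "2.6.18-146.el5.x86_64"):
    print(release, "affected" if is_affected(release) else "not reported as affected")
```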