Bug 156183 - Kernel 2.4.21-15.ELsmp panics at boot
Summary: Kernel 2.4.21-15.ELsmp panics at boot
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jim Paradis
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-04-28 09:21 UTC by Colin Leroy
Modified: 2013-08-06 01:14 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-01-03 22:28:11 UTC
Target Upstream Version:
Embargoed:



Description Colin Leroy 2005-04-28 09:21:57 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2

Description of problem:
RAMDISK: Compressed image found at block 0
VFS: Mounted root (ext2 filesystem).
Red Hat nash verpage_fault: wrong gs ffffffff805e5d80 expected ffffffff805e5d00
sUion 3.5.13 starnable to handle kernel paging requestting
Loading sc< a6>t SCviSIr tsuaulb saydsdterem ssdr 0iv00e0r0 R07efvbisffifon3a:8 14
00
rinting rip:
00000000040104ae
PML4 4e0b067 PGD 4e08067 P
MD 4e07067 PTE 80000000049ac065
Oops: 0007
CPU 1
Pid: 0, comm: swapper Not tainted
RIP: 0033:[<000000000040104a>]
RSP: 002b:0000007fbfff3930  EFLAGS: 00010207
RAX: 0000000000000011 RBX: 00000000fffffffd RCX: 00000000004141a0
RDX: 0000000000000000 RSI: 00000000004543db RDI: 0000007fbfff3980
RBP: 0000007fbfff3eb0 R08: fefefefefefefeff R09: ffffff0000000000
R10: 0000000000000069 R11: 0000000000000246 R12: 0000007fbfffc218
R13: 0000000000000002 R14: 0000000000000032 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffffffff805e5d00(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000007fbfff3a84 CR3: 0000000004d71000 CR4: 00000000000006e0

Call Trace:
Process swapper (pid: 0, stackpage=10037f3b000)

Kernel panic: Fatal exception
In idle task - not syncing

Version-Release number of selected component (if applicable):
kernel-2.4.21-15.ELsmp

How reproducible:
Always

Steps to Reproduce:
Boot 2.4.21-15.ELsmp.

Actual Results:  Every boot of the 2.4.21-15.ELsmp kernel crashes with a "page fault: wrong gs" error.


Additional info:

Booting 2.4.21-15.EL works fine, and SMP appears to be compiled into that kernel too (uname -a says SMP, and /proc/cpuinfo shows 4 CPUs). We have been able to work around the problem by getting the kernel sources, using the .config from 2.4.21-15.EL (_not_ smp), and changing EXTRAVERSION from custom to smp. We need this kernel version because Voltaire's closed-source InfiniBand drivers support only a very limited subset of kernels.

This happens on Dell PowerEdge SC1425 machines, which are dual-Xeon (64-bit) systems.

Comment 2 Steve Churchill 2005-06-09 19:45:17 UTC
I'm getting the same thing on a Dell 1850 using 2.4.21-20.ELsmp.

"page fault: wrong gs" - right after the scsi driver loads.

I also see SMP capabilities in the non-SMP kernel 2.4.21-20.EL.

-steve

Comment 3 Jim Paradis 2006-01-03 22:28:11 UTC
In your case, the 2.4.21-20.EL kernel is the correct one to use.

For RHEL3, there are three kernels distributed for 64-bit x86, e.g.:

    kernel-2.4.21-xx.EL.x86_64.rpm - for uniprocessor AMD64 systems
    kernel-smp-2.4.21-xx.EL.x86_64.rpm - for SMP AMD64 systems
    kernel-2.4.21-xx.EL.ia32e.rpm - for ALL 64-bit Intel ia32e systems

If you're running an Intel system, then *only* the single ia32e kernel should be
used: it covers both UP and SMP systems.  If you're running an AMD system, you
should be running the SMP or non-SMP version as appropriate.  The installer
automatically selects the proper kernel to install for your system type.
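The selection rule above can be sketched in a few lines of Python. This is only an illustration, not how the installer actually does it: the function names are hypothetical, the vendor is read from the `vendor_id` field of /proc/cpuinfo-style text, and the "xx" in the package names is the same placeholder used in the list above.

```python
def cpu_vendor(cpuinfo_text):
    """Return the first vendor_id value in /proc/cpuinfo-style text, or None."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def count_processors(cpuinfo_text):
    """Count the 'processor : N' entries (logical CPUs seen by the kernel)."""
    count = 0
    for line in cpuinfo_text.splitlines():
        key, sep, _ = line.partition(":")
        if sep and key.strip() == "processor":
            count += 1
    return count


def rhel3_x86_64_kernel(vendor, n_cpus):
    """Map CPU vendor and CPU count to the RHEL3 64-bit kernel package name.

    Hypothetical helper: it encodes only the rule stated in this comment.
    """
    if vendor == "GenuineIntel":
        # Intel ia32e: one kernel covers both UP and SMP systems.
        return "kernel-2.4.21-xx.EL.ia32e.rpm"
    if vendor == "AuthenticAMD":
        if n_cpus > 1:
            return "kernel-smp-2.4.21-xx.EL.x86_64.rpm"
        return "kernel-2.4.21-xx.EL.x86_64.rpm"
    raise ValueError("unrecognized CPU vendor: %r" % (vendor,))


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        text = f.read()
    print(rhel3_x86_64_kernel(cpu_vendor(text), count_processors(text)))
```

On a 4-CPU Intel machine like the one in this report, the sketch picks the ia32e package even though its name carries no "smp" label.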

Since the ia32e kernel works properly under SMP on your system, I do not
consider this a kernel bug.  My guess is that the reason you were trying the
"smp" kernel is that the third-party Infiniband drivers look for particular
kernel version strings, and are unaware of the fact that kernels not labeled
"smp" are valid for ia32e systems.

I'm closing this as NOTABUG, since I believe the issue is with the driver, not
the kernel.  This is something to take up with Voltaire.



