Bug 243312 - [RHEL5.1 IA64 Xen] kernel BUG at arch/ia64/kernel/irq_ia64.c:481!
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: ia64 Linux
Priority: medium, Severity: medium
Assigned To: Aron Griffis
QA Contact: Martin Jenner
Keywords: Regression
Depends On: 241674
Blocks:
 
Reported: 2007-06-08 11:02 EDT by Jarod Wilson
Modified: 2007-11-30 17:07 EST (History)

See Also:
Fixed In Version: kernel-xen-2.6.18-32.el5
Doc Type: Bug Fix
Last Closed: 2007-06-29 11:20:29 EDT

Attachments
Console boot log from failed xen 3.1-based kernel-xen boot (8.22 KB, text/plain)
2007-06-08 11:02 EDT, Jarod Wilson

Description Jarod Wilson 2007-06-08 11:02:51 EDT
New bug for the second crasher bug uncovered with the xen 3.1.0 rebase with the
getcpu patch removed from the build. See attached for full console boot log

+++ This bug was initially created as a clone of Bug #241674 +++

Description of problem:
Recent RHEL5 xen kernels fail to boot on at least some ia64 hardware that was
previously functional. An hp zx2000 that works with the 5.0 GA kernel encounters
"Unable to handle kernel paging request at virtual address 006eb92000000000"
followed by "Unable to handle kernel NULL pointer dereference (address
0000000000000000)" on boot with the -20 kernel. See attachment for full console
dump.

Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-20.el5 (ia64)

-- Additional comment from jwilson@redhat.com on 2007-05-29 10:31 EST --
Created an attachment (id=155593)
Console log from failed 2.6.18-20.el5xen ia64 boot


-- Additional comment from pm-rhel@redhat.com on 2007-05-29 10:44 EST --
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

-- Additional comment from jwilson@redhat.com on 2007-06-06 16:38 EST --
kernel-xen-2.6.18-17.el5.ia64 boots fine, kernel-xen-2.6.18-18.el5.ia64 blows up
similarly to -20 and -21. Next up, to look at the relevant changes between the
two...

-- Additional comment from jwilson@redhat.com on 2007-06-06 18:01 EST --
Best guess at possible culprits thus far:

[serial] panic in check_modem_status on 8250 (Norm Murray) [238394]
[misc] getcpu system call (luyu) [233046]
[mm] NULL current->mm dereference in grab_swap_token causes oops (Jerome Marchand) [231639]

I've got a test kernel based on -21 minus those three patches building now, will
see if my guesses hold any water in the morning...

-- Additional comment from pm-rhel@redhat.com on 2007-06-06 18:08 EST --
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

-- Additional comment from jwilson@redhat.com on 2007-06-07 10:17 EST --
Yep, it's definitely one of those three patches. Now on to figuring out exactly
which one...

-- Additional comment from jwilson@redhat.com on 2007-06-07 12:00 EST --
The culprit appears to be "[misc] getcpu system call (luyu) [233046]" (bug
233046). Which of course is a rather large patch, so it'll take some effort to
figure out exactly what the cause is within that patch...
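
One way to narrow down a short candidate list like this is to revert each patch on its own and retest. The following is only a minimal Python sketch of that idea, not the procedure actually used here: `isolate_culprits`, the `boots_without` callback, and the patch labels are hypothetical names, and `simulated_boot` is a stand-in that merely mimics the outcome reported in this comment instead of building and booting a real kernel.

```python
from typing import Callable, FrozenSet, Iterable, List


def isolate_culprits(
    candidates: Iterable[str],
    boots_without: Callable[[FrozenSet[str]], bool],
) -> List[str]:
    """Return the candidates whose removal alone lets the kernel boot.

    boots_without(reverted) is a hypothetical callback: it should build a
    kernel with the given patches reverted and report whether it boots.
    """
    candidates = list(candidates)
    # Sanity check: with every candidate reverted the kernel must boot,
    # otherwise the culprit is not in the candidate list at all.
    if not boots_without(frozenset(candidates)):
        return []
    # Revert one patch at a time and keep those that fix the boot.
    return [p for p in candidates if boots_without(frozenset([p]))]


# Illustrative labels built from the changelog entries quoted earlier.
CANDIDATES = ["serial-238394", "getcpu-233046", "mm-231639"]


def simulated_boot(reverted: FrozenSet[str]) -> bool:
    # Stand-in for a real build-and-boot cycle (assumption for the demo):
    # the kernel boots only when the getcpu patch is reverted.
    return "getcpu-233046" in reverted


if __name__ == "__main__":
    print(isolate_culprits(CANDIDATES, simulated_boot))
```

With three candidates this costs at most four test builds (one with all reverted, then one per patch); for a longer list, bisecting the list would need fewer builds.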

-- Additional comment from jwilson@redhat.com on 2007-06-07 17:41 EST --
It's definitely the getcpu syscall patch, but nothing obvious jumps out as being
the cause of the boot failures. Best guess is that the greatly increased size of
the syscall table may cause a page table overlap or some such thing that xen
doesn't handle cleanly. Punting back to Luming Yu who submitted the patch in the
first place... Any ideas?

-- Additional comment from jwilson@redhat.com on 2007-06-08 01:08 EST --
Fun. The xen 3.1.0 rebased bits from Gerd fail to boot even with that patch
removed, in a fairly similar-looking fashion (but definitely different --
there's an actual line that says "kernel BUG at
arch/ia64/kernel/irq_ia64.c:481!"). The following upstream changeset appears
potentially relevant to this one:

http://www.mail-archive.com/xen-ia64-devel@lists.xensource.com/msg05946.html

Doesn't apply cleanly at the moment, seems parts of it have already been
cherry-picked...

Should probably file a new bug for this new issue, shouldn't I?... (will do in
the morn)
Comment 1 Jarod Wilson 2007-06-08 11:02:51 EDT
Created attachment 156579 [details]
Console boot log from failed xen 3.1-based kernel-xen boot
Comment 2 RHEL Product and Program Management 2007-06-08 11:04:27 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 3 RHEL Product and Program Management 2007-06-08 11:07:09 EDT
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.
Comment 4 Jarod Wilson 2007-06-11 17:42:50 EDT
Kei, do you by chance have any ideas here? Is there a specific patch in
Fujitsu's patch series that might address this?
Comment 5 Keiichiro Tokunaga 2007-06-12 11:33:34 EDT
(In reply to comment #4)
> Kei, do you by chance have any ideas here? Is there a specific patch in
> Fujitsu's patch series that might address this?

(The same comment was also posted to bz241674.)

There is only one specific patch in my patch set posted to rh-kernel: BZ242989,
which changes the dom0 interface versions so that it can get along with the
xen-3.1 bits.

Compared to Gerd's original patch set, my patch set has some additional csets.
They are cset14347, 13429, and 13454.

I have not tried 2.6.18.el5.kraxel.4xen on PRIMEQUEST yet. I am willing to
try it, but I cannot use the box for now because its firmware is being
upgraded. I will update this bug with the results once that is done.
Comment 6 Jarod Wilson 2007-06-29 11:20:29 EDT
This was fixed in recent kernels, no longer seeing this issue with 2.6.18-29.el5
or so and later.
