Bug 499553 - Cannot generate proper stacktrace on xen-ia64
Summary: Cannot generate proper stacktrace on xen-ia64
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.4
Hardware: ia64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 5.6
Assignee: Andrew Jones
QA Contact: Boris Ranto
URL:
Whiteboard:
Depends On:
Blocks: 514489 557597
Reported: 2009-05-07 07:01 UTC by Daniel Kwon
Modified: 2018-11-14 18:24 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 20:48:06 UTC
Target Upstream Version:
Embargoed:


Attachments
patches to make XEN to show callstack properly (1.04 KB, application/octet-stream)
2009-05-07 07:01 UTC, Daniel Kwon
Patch 1/3, the kernel side of the stack unwinding (736 bytes, patch)
2009-05-07 07:13 UTC, Chris Lalancette
Patch 2/3, hypervisor patch to dump execution state (530 bytes, patch)
2009-05-07 07:13 UTC, Chris Lalancette
Patch 3/3, hypervisor patch to add unwind_info (394 bytes, patch)
2009-05-07 07:14 UTC, Chris Lalancette


Links
System: Red Hat Product Errata
ID: RHSA-2011:0017
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 10:37:42 UTC

Description Daniel Kwon 2009-05-07 07:01:18 UTC
Created attachment 342777
patches to make XEN to show callstack properly

Description of problem:
   Xen-ia64 cannot generate backtraces properly in some situations.
   When xen-ia64 hits BUG() or panic(), we cannot get a proper call trace.
   There are two causes:
   1. At some procedure entry points, unwind info is not generated properly.
   2. BUG() doesn't show the stack trace.

Version-Release number of selected component (if applicable):
   Red Hat Enterprise Linux Version Number: RHEL5
   Release Number: 5.3ga
   Architecture: ia64
   Kernel Version: kernel-xen-2.6.18-128
   Related Package Version:
   Related Middleware / Application:

How reproducible:
  Always

Steps to Reproduce:
   The following steps don't hit BUG(),
   but they are an easy way to produce the same stack trace output as hitting BUG().

   1. Connect from another machine through a serial cable.
   2. In minicom, type 'CTRL+a' three times to switch input to Xen.
   3. Type 'd'.
  
Actual results:
   (XEN) FIXME: implement ia64 dump_execution_state()

Expected results:
  Show a proper call trace,
  something like the following:

(XEN) *** Dumping CPU0 host state: ***
(XEN) FIXME: implement ia64 dump_execution_state()
(XEN) 
(XEN) Call Trace:
(XEN)  [<f0000000040b8610>] show_stack+0x80/0xa0
(XEN)                                 sp=f00000000411b970 bsp=f000000004115580
(XEN)  [<f00000000402cb90>] __dump_execstate+0x30/0x100
(XEN)                                 sp=f00000000411bb40 bsp=f000000004115568
(XEN)  [<f00000000402ccc0>] dump_registers+0x60/0x180
(XEN)                                 sp=f00000000411bb40 bsp=f000000004115540
(XEN)  [<f00000000402cfb0>] handle_keypress+0x1d0/0x210
(XEN)                                 sp=f00000000411bb40 bsp=f000000004115508
(XEN)  [<f000000004056840>] serial_rx+0x1a0/0x1c0
(XEN)                                 sp=f00000000411bb40 bsp=f0000000041154d8
(XEN)  [<f000000004059140>] serial_rx_interrupt+0x1a0/0x2b0
(XEN)                                 sp=f00000000411bb40 bsp=f0000000041154a0
(XEN)  [<f000000004057bc0>] ns16550_interrupt+0xa0/0xf0
(XEN)                                 sp=f00000000411bb50 bsp=f000000004115460
(XEN)  [<f000000004072ce0>] __do_IRQ+0x360/0x480
(XEN)                                 sp=f00000000411bb50 bsp=f000000004115400
(XEN)  [<f0000000040b1c30>] ia64_handle_irq+0xf0/0x180
(XEN)                                 sp=f00000000411bb50 bsp=f0000000041153b0
(XEN)  [<f0000000040b1400>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f00000000411bb50 bsp=f0000000041153b0
(XEN)  [<f00000000405e170>] startup_cpu_idle_loop+0x3c0/0x3e0
(XEN)                                 sp=f00000000411bd50 bsp=f000000004115320
(XEN)  [<f00000000408fa30>] start_kernel+0x1520/0x17d0
(XEN)                                 sp=f00000000411bdf0 bsp=f000000004115260
(XEN)  [<f000000004019d20>] _start+0x340/0x360
(XEN)                                 sp=f00000000411be00 bsp=f0000000041151c0
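For anyone comparing traces between kernels, the output format above is regular enough to parse mechanically. The following is only a minimal sketch, assuming the two-line frame layout shown in this report (an address/symbol line followed by an sp/bsp line); the function name parse_xen_call_trace and the dictionary keys are made up for illustration and are not part of Xen or any Red Hat tool.

```python
import re

# Frame line, e.g. "(XEN)  [<f0000000040b8610>] show_stack+0x80/0xa0"
FRAME_RE = re.compile(
    r"\[<(?P<ip>[0-9a-f]+)>\]\s+(?P<sym>[^\s+]+)"
    r"\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)
# Register line that follows each frame, e.g. "(XEN)   sp=... bsp=..."
REGS_RE = re.compile(r"\bsp=(?P<sp>[0-9a-f]+)\s+bsp=(?P<bsp>[0-9a-f]+)")


def parse_xen_call_trace(text):
    """Return a list of frames (hypothetical dict layout) from console text."""
    frames = []
    for line in text.splitlines():
        frame = FRAME_RE.search(line)
        if frame:
            frames.append({
                "ip": int(frame.group("ip"), 16),
                "symbol": frame.group("sym"),
                "offset": int(frame.group("off"), 16),
                "size": int(frame.group("size"), 16),
                "sp": None,
                "bsp": None,
            })
            continue
        regs = REGS_RE.search(line)
        if regs and frames:
            # Attach sp/bsp to the most recently seen frame.
            frames[-1]["sp"] = int(regs.group("sp"), 16)
            frames[-1]["bsp"] = int(regs.group("bsp"), 16)
    return frames


# First two frames of the expected output above, used as sample input.
SAMPLE = """\
(XEN) Call Trace:
(XEN)  [<f0000000040b8610>] show_stack+0x80/0xa0
(XEN)                                 sp=f00000000411b970 bsp=f000000004115580
(XEN)  [<f00000000402cb90>] __dump_execstate+0x30/0x100
(XEN)                                 sp=f00000000411bb40 bsp=f000000004115568
"""

frames = parse_xen_call_trace(SAMPLE)
for f in frames:
    print(f["symbol"], hex(f["ip"]), hex(f["sp"]), hex(f["bsp"]))
```

Comparing the list of "symbol" values from two captures is usually enough to see whether the unwinder got through the same call chain, since absolute addresses differ between builds.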


Additional info:
    The customer provided the following patches, which solve this problem:

   770: [IA64] add unwind info to xen_event_callback
   http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/d5c2e97b87ac
   18824:[IA64] make ia64 dump_execution_state() print stack trace for debugging.
   http://xenbits.xensource.com/xen-unstable.hg?rev/958942c44332
   Fix unwind info of fast_hypercall
   http://lists.xensource.com/archives/html/xen-ia64-devel/2009-04/msg00003.html

Comment 1 Chris Lalancette 2009-05-07 07:08:34 UTC
For RHEL-5 Xen bugs, please make sure the "Product" field is set to "Red Hat Enterprise Linux 5", otherwise there is a chance we will not see it.

Also, please do not attach tar files for patches, even if there is more than one patch.  It makes them difficult to look at and review.

I've fixed both of these issues on this bug.

Chris Lalancette

Comment 2 Chris Lalancette 2009-05-07 07:13:08 UTC
Created attachment 342778
Patch 1/3, the kernel side of the stack unwinding

Comment 3 Chris Lalancette 2009-05-07 07:13:53 UTC
Created attachment 342779
Patch 2/3, hypervisor patch to dump execution state

Comment 4 Chris Lalancette 2009-05-07 07:14:30 UTC
Created attachment 342780
Patch 3/3, hypervisor patch to add unwind_info

Comment 5 Chris Lalancette 2009-08-25 09:58:57 UTC
I've uploaded a test kernel that should have a fix for this problem here:

http://people.redhat.com/clalance/virttest/

Can the reporters who are having problems please download and try out this test kernel?

Thanks,
Chris Lalancette

Comment 6 Issue Tracker 2009-08-26 01:26:58 UTC
Event posted on 08-26-2009 10:26am JST by moshiro

Dear Chris,

Could you please provide the source package to Fujitsu as well? Fujitsu would
like to review the code too.

Best Regards,
Moritoshi


This event sent from IssueTracker by moshiro 
 issue 283498

Comment 7 Chris Lalancette 2009-08-26 07:20:43 UTC
(In reply to comment #6)
> Event posted on 08-26-2009 10:26am JST by moshiro
> 
> Dear Chris,
> 
> Could you please provide the source package to Fujitsu as well? Fujitsu would
> like to review the code too.

I've uploaded the .src.rpm to the same location.  The code should be the same as what was posted in this BZ.

Chris Lalancette

Comment 14 Chris Lalancette 2010-07-16 12:48:28 UTC
Yeah, this one just fell off of my radar.  The patches in this BZ are the latest, and should work fine.  Drew, do you mind posting them?

Thanks,
Chris Lalancette

Comment 15 Andrew Jones 2010-07-16 13:02:02 UTC
(In reply to comment #14)
> Yeah, this one just fell off of my radar.  The patches in this BZ are the
> latest, and should work fine.  Drew, do you mind posting them?
> 

No problem. I'll get them posted soon.

Drew

Comment 16 Larry Troan 2010-08-03 22:09:25 UTC
Looks like this is a go for 5.6. Setting BLOCKER=? until commented by devel.

Otherwise will miss 5.6.

Comment 17 RHEL Program Management 2010-08-04 12:10:06 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 18 Larry Troan 2010-08-06 11:01:29 UTC
Correction, setting exception awaiting ACK by QA

Comment 19 Andrew Jones 2010-08-16 14:16:35 UTC
I can go ahead and post this patchset, since it doesn't look risky and might bring some improvement. However, my testing didn't show a huge improvement. Before this patch, dumping registers from the Xen serial console still gave me backtraces and registers, but only when there was something to show (best done during a boot). After this patch I get the backtrace every time (due to the 2nd patch in the series), but otherwise the behavior seems to be the same as before (perhaps it now supports more, but I didn't test it further). Therefore, my question to Fujitsu is whether they truly find this necessary and useful. If not, I would drop it to avoid the code change.

Comment 20 Andrew Jones 2010-08-16 14:31:30 UTC
I went ahead and posted the patch set. It matches upstream and looks ok. I guess there's some improvement, even if small, so it should be worth the posting/reviewing time.

Drew

Comment 22 Jarod Wilson 2010-09-03 19:05:11 UTC
in kernel-2.6.18-215.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 23 Jarod Wilson 2010-09-03 20:46:42 UTC
Moving back to POST, the xen half of this patchset was missed in 215.

Comment 25 Jarod Wilson 2010-09-10 21:38:05 UTC
in kernel-2.6.18-219.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 27 Boris Ranto 2010-10-14 08:46:55 UTC
When I tried to verify this, I found that the "FIXME: implement ia64 dump_execution_state()" line is still printed even though the call trace is generated. I see that the patch does the same, but is this right? Isn't it already implemented now?

Comment 28 Andrew Jones 2010-10-14 11:44:05 UTC
We still don't get a register dump, since all we added was the calltrace. So I think the fixme is still technically correct, but it probably could have been changed to a code comment instead of a print message.

Comment 29 Boris Ranto 2010-10-14 12:06:21 UTC
OK, thanks for the answer. If anything is still missing there, then I guess it is ok to print the message. Moving to verified.

In 2.6.18-225, Call Trace is present:
[root@hp-sapphire-01 ~]# uname -a
Linux hp-sapphire-01.rhts.eng.bos.redhat.com 2.6.18-225.el5xen #1 SMP Mon Sep 27 10:57:15 EDT 2010 ia64 ia64 ia64 GNU/Linux
[root@hp-sapphire-01 ~]# (XEN) *** Serial input -> Xen (type 'CTRL-a' three times to switch input to DOM0).
(XEN) 'd' pressed -> dumping registers
(XEN) 
(XEN) *** Dumping CPU0 host state: ***
(XEN) FIXME: implement ia64 dump_execution_state()
(XEN) 
(XEN) Call Trace:
(XEN)  [<f0000000040c0530>] show_stack+0x80/0xa0
(XEN)                                 sp=f000000004127970 bsp=f000000004121580
(XEN)  [<f00000000402cab0>] __dump_execstate+0x30/0x100
(XEN)                                 sp=f000000004127b40 bsp=f000000004121568
(XEN)  [<f00000000402cbe0>] dump_registers+0x60/0x180
(XEN)                                 sp=f000000004127b40 bsp=f000000004121540
(XEN)  [<f00000000402ced0>] handle_keypress+0x1d0/0x210
(XEN)                                 sp=f000000004127b40 bsp=f000000004121508
(XEN)  [<f00000000405ad10>] serial_rx+0x1a0/0x1c0
(XEN)                                 sp=f000000004127b40 bsp=f0000000041214d8
(XEN)  [<f00000000405d330>] serial_rx_interrupt+0x1a0/0x2b0
(XEN)                                 sp=f000000004127b40 bsp=f0000000041214a0
(XEN)  [<f00000000405bf20>] ns16550_interrupt+0xa0/0xf0
(XEN)                                 sp=f000000004127b50 bsp=f000000004121460
(XEN)  [<f000000004078a40>] __do_IRQ+0x360/0x480
(XEN)                                 sp=f000000004127b50 bsp=f000000004121400
(XEN)  [<f0000000040b9b50>] ia64_handle_irq+0xf0/0x180
(XEN)                                 sp=f000000004127b50 bsp=f0000000041213b0
(XEN)  [<f0000000040b9320>] ia64_leave_kernel+0x0/0x300
(XEN)                                 sp=f000000004127b50 bsp=f0000000041213b0
(XEN)  [<f000000004062590>] startup_cpu_idle_loop+0x300/0x320
(XEN)                                 sp=f000000004127d50 bsp=f000000004121330
(XEN)  [<f000000004095c80>] start_kernel+0x1590/0x1840
(XEN)                                 sp=f000000004127df0 bsp=f000000004121270
(XEN)  [<f000000004019d20>] _start+0x340/0x360
(XEN)                                 sp=f000000004127e00 bsp=f0000000041211d0
(XEN) *** Dumping CPU0 guest state: ***
(XEN) No guest context (CPU is idle).
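The manual check above (a "Call Trace:" section following the host-state banner, with the FIXME line still present) could be automated against a captured serial log. This is a rough sketch only, assuming the log lines look like the ones in this comment; host_state_has_call_trace is a hypothetical helper name, and the BEFORE/AFTER strings are abbreviated from the outputs quoted in this bug.

```python
def host_state_has_call_trace(console_log):
    """Return True if a host-state dump contains 'Call Trace:' plus a frame."""
    in_host_state = False
    saw_header = False
    for raw in console_log.splitlines():
        line = raw.strip()
        if "Dumping CPU" in line and "host state" in line:
            # Start of a host-state dump; reset the header flag.
            in_host_state, saw_header = True, False
        elif "Dumping CPU" in line and "guest state" in line:
            # The guest-state section ends the host-state dump.
            in_host_state = False
        elif in_host_state and line.endswith("Call Trace:"):
            saw_header = True
        elif in_host_state and saw_header and "[<" in line and ">]" in line:
            # At least one "[<address>] symbol+off/size" frame was printed.
            return True
    return False


# Abbreviated pre-fix output: only the FIXME line follows the banner.
BEFORE = """\
(XEN) *** Dumping CPU0 host state: ***
(XEN) FIXME: implement ia64 dump_execution_state()
"""

# Abbreviated output from 2.6.18-225 as quoted above.
AFTER = """\
(XEN) *** Dumping CPU0 host state: ***
(XEN) FIXME: implement ia64 dump_execution_state()
(XEN)
(XEN) Call Trace:
(XEN)  [<f0000000040c0530>] show_stack+0x80/0xa0
(XEN)                                 sp=f000000004127970 bsp=f000000004121580
(XEN) *** Dumping CPU0 guest state: ***
(XEN) No guest context (CPU is idle).
"""

print("before fix:", host_state_has_call_trace(BEFORE))
print("after fix:", host_state_has_call_trace(AFTER))
```

The check deliberately ignores the FIXME line, since per comment 28 it is still expected to be printed until a register dump is implemented.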

Comment 33 errata-xmlrpc 2011-01-13 20:48:06 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

