Bug 249021 - [RHEL5] [AMTU] Hangs while running on kernel-xen
Summary: [RHEL5] [AMTU] Hangs while running on kernel-xen
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel-xen
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Xen Maintainance List
QA Contact: Martin Jenner
URL: http://rhts.lab.boston.redhat.com/cgi...
Whiteboard:
Depends On:
Blocks:
 
Reported: 2007-07-20 12:37 UTC by Jeff Burke
Modified: 2009-01-12 18:04 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-12 18:04:51 UTC
Target Upstream Version:
Embargoed:


Attachments
tries to cause a segfault (114 bytes, text/x-csrc)
2007-07-26 15:45 UTC, Don Zickus

Description Jeff Burke 2007-07-20 12:37:11 UTC
Description of problem:
 /usr/bin/amtu hangs while running in the domU or on a dom0. It runs fine on all
other kernels.

Version-Release number of selected component (if applicable):
 amtu-1.0.4-4

How reproducible:
 Always

Steps to Reproduce:
1. Install RHEL5 GA with xen. Run the /usr/bin/amtu application.
  
Actual results:
 System starts the test but just hangs.

Expected results:
 If for some reason the application can't support xen then it should detect that
and exit. It should not hang.
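
One way such a detect-and-exit check could look is sketched below. It is only an illustration, not part of amtu: the helper name `hypervisor_type_is` is hypothetical, and it assumes the running kernel exposes the hypervisor name in a sysfs file such as /sys/hypervisor/type (reading "xen" on Xen kernels), which has not been verified against the kernels in this report.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from amtu): compare the first line of
 * `path` with `expected`.
 * Returns 1 on a match, 0 on a mismatch, -1 if the file cannot be read.
 */
static int hypervisor_type_is(const char *path, const char *expected)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;                    /* file missing or unreadable */
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;                    /* empty file or read error */
    }
    fclose(f);

    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return strcmp(buf, expected) == 0;
}
```

A caller could skip the privileged-instruction test and exit with a clear message when `hypervisor_type_is("/sys/hypervisor/type", "xen")` returns 1, instead of starting a test that never finishes.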

Additional info:
 This is causing RHTS failures.

Comment 1 Steve Grubb 2007-07-20 12:46:12 UTC
This would indicate that the Xen kernel has a problem. Whoever maintains
that kernel should be cc'ed or this bz transferred to them.

Comment 2 Steve Grubb 2007-07-20 12:52:50 UTC
The test logs show this:

4gb seg fixup, process prelink (pid 19553), cs:ip 73:080a288c
printk: 338 messages suppressed.
4gb seg fixup, process prelink (pid 19553), cs:ip 73:08084338
printk: 20 messages suppressed.

Is prelink not working? Are there AVCs that denied prelink from running?

Do you have any idea where it's hanging? Is this prelink failure related in any way?

Comment 3 Jeff Burke 2007-07-20 14:15:05 UTC
Steve,
    I don't know if it is related. I think it would be best/fastest to bring up
a test system and duplicate the issue. It happens every time. You can use RHTS
and run the reserve workflow to get a test box if you don't have one available.


Comment 4 Steve Grubb 2007-07-20 17:03:44 UTC
I installed the xen kernel and set it to be default kernel on boot

[root@ibm-taroko ~]# /usr/bin/amtu
 Executing Memory Test...
 Memory Test SUCCESS!
 Executing Memory Separation Test...
 Memory Separation Test SUCCESS!
 Executing Network I/O Tests...
 Network I/O Controller Test SUCCESS!
 Executing I/O Controller - Disk Test...
 I/O Controller - Disk Test SUCCESS!
 Executing Supervisor Mode Instructions Test...
 Privileged Instruction Test SUCCESS!
[root@ibm-taroko ~]# uname -r
 2.6.18-8.el5xen

[root@ibm-taroko amtu]# make run
 No Build required using version packaged in distro
 ./runtest.sh
 ***** Starting the runtest.sh script *****
 ***** Current Running Kernel Package = kernel-xen-2.6.18-34.el5.x86_64 *****
 ***** Current Running AMTU Package = amtu-1.0.4-4.x86_64 *****
 ***** Current Running Distro = Red Hat Enterprise Linux Server release 5 *****
 ***** End of runtest.sh *****
    metric: 0
    Log: /tmp/tmp.Cc2746
    DMesg: /tmp/dmesg.log

I haven't tried a guest OS. Maybe the kernel got horked up from other tests and
that caused amtu to appear to fail?

Comment 5 Don Zickus 2007-07-26 15:45:18 UTC
Created attachment 160032
tries to cause a segfault

Comment 6 Don Zickus 2007-07-26 15:47:02 UTC
This problem only exists on 32-bit xen kernels (dom0 and domU).  I have narrowed
the problem to the kernel itself by running a simple testcase (stolen from the 
amtu source code).

The testcase just calls 'asm("HLT\n\t")' and expects a segfault but instead hangs.
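
A minimal sketch of such a testcase with a watchdog around it is shown below. This is not the attached file, whose contents are not reproduced here. The function name `run_hlt_probe` and the `PROBE_*` codes are hypothetical. The sketch assumes an x86 build and a POSIX system: the HLT runs in a forked child, so a hang shows up as a timeout in the parent instead of blocking the shell.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result codes; a positive return value is a signal number. */
#define PROBE_HUNG        (-1)  /* child still not reaped after the timeout */
#define PROBE_ERROR       (-2)  /* fork() or waitpid() failed */
#define PROBE_UNSUPPORTED (-3)  /* not compiled for x86 */
#define PROBE_NO_FAULT    (-4)  /* child exited normally: HLT did not trap */

/*
 * Execute HLT in a child process and report what happened to it.
 * HLT is privileged, so in user mode it should fault and the child
 * should be killed by SIGSEGV.
 */
static int run_hlt_probe(int timeout_sec)
{
#if defined(__i386__) || defined(__x86_64__)
    struct timespec tick = { 0, 100 * 1000 * 1000 };  /* 100 ms */
    pid_t pid = fork();

    if (pid < 0)
        return PROBE_ERROR;
    if (pid == 0) {
        __asm__ volatile("hlt");  /* expected to fault here */
        _exit(0);                 /* reached only if HLT did not trap */
    }

    /* Poll for the child so that a hang is detected, not inherited. */
    for (int i = 0; i < timeout_sec * 10; i++) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return WIFSIGNALED(status) ? WTERMSIG(status) : PROBE_NO_FAULT;
        if (r < 0)
            return PROBE_ERROR;
        nanosleep(&tick, NULL);
    }

    kill(pid, SIGKILL);           /* clean up the stuck child */
    waitpid(pid, NULL, 0);
    return PROBE_HUNG;
#else
    (void)timeout_sec;
    return PROBE_UNSUPPORTED;
#endif
}
```

On a kernel that handles the fault correctly, `run_hlt_probe(5)` should return `SIGSEGV`. The behaviour described in this comment would show up as `PROBE_HUNG`.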

The backtrace looks like this for process 'don':

don           T 00000640  2840 28197  27623                     (NOTLB)
       ea11dec0 00000282 5c6834fd 00000640 00000001 00000001 eab0d000 c0c33550
       6080c5bc 00000640 041890bf eab0d10c c10092c0 00000005 00101734 00000014
       c05fcf2f ea11dfbc 00030001 00000001 c042a916 00000000 ecb023c0 00000000
Call Trace:
 [<c05fcf2f>] do_general_protection+0x0/0x13b
 [<c042a916>] find_task_by_pid_type+0xa/0x12
 [<c0425cb7>] finish_stop+0x50/0x65
 [<c0426c25>] get_signal_to_deliver+0x2c1/0x3b0
 [<c05fcf2f>] do_general_protection+0x0/0x13b
 [<c040496a>] do_notify_resume+0x77/0x68e
 [<c0541355>] evtchn_do_upcall+0x64/0x9b
 [<c0405a51>] check_lazy_exec_limit+0x219/0x22a
 [<c05fcf76>] do_general_protection+0x47/0x13b
 [<c0405515>] hypervisor_callback+0x3d/0x48
 [<c05fcf2f>] do_general_protection+0x0/0x13b
 [<c04053f9>] work_notifysig+0x13/0x1a


Re-assigning this bz to the xen-kernel team.


Comment 8 RHEL Program Management 2008-03-11 19:43:57 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 9 RHEL Program Management 2008-06-09 22:01:31 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 10 Rik van Riel 2009-01-12 17:16:18 UTC
An execshield patch for 32-bit Xen went into the RHEL tree recently.  Can you verify whether this bug still happens with RHEL 5.3?

Comment 11 Jeff Burke 2009-01-12 17:57:14 UTC
Rik,
    It looks like it has been working OK for a while now. Actually, it is good with RHEL5.2. I think the issue is resolved.

   In case you are interested, here is a link to all the test passes in the last year on the machine that showed the issue.
http://tinyurl.com/amtu-failure-bz249021

   I believe we can close this as resolved.

Jeff

