Bug 199654

Summary: xen is very unstable, actions on domUs crash domain0
Product: [Fedora] Fedora
Reporter: Davyd <davyd>
Component: kernel-xen
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED UPSTREAM
QA Contact: Brian Brock <bbrock>
Severity: medium
Docs Contact:
Priority: medium
Version: 5
CC: bstein, ronny-rhbugzilla
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-10-26 20:20:47 UTC

Description Davyd 2006-07-21 04:31:49 UTC
Some actions on the domainU machines, especially shutting them down, cause the
entire machine to crash. Connecting to NFS servers via bridged ethernet
connections may also cause crashes (confirmation required).

Managed to get this stacktrace from dmesg:
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:436
invalid opcode: 0000 [1] SMP
CPU 3
Modules linked in: xt_physdev iptable_filter ip_tables x_tables bridge ipv6
ppdev autofs4 hidp nfs lockd nfs_acl rfcomm l2cap bluetooth sunrpc raid0 video
button battery acpi_memhotplug ac lp parport_pc parport ohci_hcd sg mx_driver(U)
mx_mcp(U) e100 mii tg3 i2c_amd756 hw_random i2c_amd8111 i2c_core dm_snapshot
dm_zero dm_mirror dm_mod ext3 jbd aic79xx scsi_transport_spi sata_sil libata
sd_mod scsi_mod
Pid: 17, comm: events/3 Tainted: P      2.6.17-1.2157_FC5xen0 #1
RIP: e030:[<ffffffff80281334>]
<ffffffff80281334>{xen_destroy_contiguous_region+1008}
RSP: e02b:ffff88000061fd88  EFLAGS: 00010082
RAX: 00000000ffffffea RBX: ffff8803e2a28000 RCX: ffffffffffffff01
RDX: 0000000000000000 RSI: 800000037ddc3067 RDI: ffff8803e2a29000
RBP: 0000000000000001 R08: 000000000037ddc3 R09: 0000000000000000
R10: 0000000000000001 R11: ffffffff8055f348 R12: ffff8803e2a29000
R13: 0000000000000000 R14: ffff88001874c980 R15: 0000000000000004
FS:  00002aaaaae0f6f0(0000) GS:ffffffff805d2180(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process events/3 (pid: 17, threadinfo ffff88000061e000, task ffff880000adc080)
Stack: ffff88000061fdd0 0000000000000001 0000000000000001 0000000000007ff0
       ffffffff8055f340 0000000000000002 0000000000000000 0000000000007ff0
       0000000000000000 000000000037ddc2
Call Trace: <ffffffff802be00e>{slab_destroy+60} <ffffffff802bee91>{cache_reap+412}
       <ffffffff802becf5>{cache_reap+0} <ffffffff80255162>{run_workqueue+159}
       <ffffffff80251940>{worker_thread+0} <ffffffff80251a30>{worker_thread+240}
      <ffffffff8028a307>{default_wake_function+0} <ffffffff80238752>{kthread+212}
       <ffffffff80267f36>{child_rip+8} <ffffffff8023867e>{kthread+0}
       <ffffffff80267f2e>{child_rip+0}

Code: 0f 0b 68 4e f7 46 80 c2 b4 01 48 b8 ff ff ff 7f ff ff ff ff
RIP <ffffffff80281334>{xen_destroy_contiguous_region+1008} RSP <ffff88000061fd88>
 <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1

Call Trace: <ffffffff80298253>{blocking_notifier_call_chain+31}
       <ffffffff80218677>{do_exit+32} <ffffffff80271754>{kernel_math_error+0}
       <ffffffff80272110>{do_invalid_op+163}
<ffffffff80281334>{xen_destroy_contiguous_region+1008}
       <ffffffff80407504>{rtnetlink_fill_ifinfo+1175}
<ffffffff80267c37>{error_exit+0}
       <ffffffff80281334>{xen_destroy_contiguous_region+1008}
       <ffffffff80281330>{xen_destroy_contiguous_region+1004}
       <ffffffff802be00e>{slab_destroy+60} <ffffffff802bee91>{cache_reap+412}
       <ffffffff802becf5>{cache_reap+0} <ffffffff80255162>{run_workqueue+159}
       <ffffffff80251940>{worker_thread+0} <ffffffff80251a30>{worker_thread+240}
      <ffffffff8028a307>{default_wake_function+0} <ffffffff80238752>{kthread+212}
       <ffffffff80267f36>{child_rip+8} <ffffffff8023867e>{kthread+0}
       <ffffffff80267f2e>{child_rip+0}
BUG: events/3/17, lock held at task exit time!
 [ffffffff804ce1a0] {cache_chain_mutex}
.. held by:          events/3:   17 [ffff880000adc080, 110]
... acquired at:               cache_reap+0x1a/0x1f6
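
For triage, the frames in traces like the ones above can be pulled out with a
short script. The following is a minimal sketch in Python, not part of the
original report. It assumes the old x86_64 oops layout seen here,
<address>{symbol+offset}, with a 16-digit hex address and a decimal offset.
The names FRAME_RE and parse_frames are hypothetical helpers introduced for
illustration.

```python
import re

# Assumption: frames look like <ffffffff802be00e>{slab_destroy+60},
# i.e. 16 hex digits for the address and a decimal offset into the symbol.
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{([A-Za-z0-9_.]+)\+(\d+)\}")


def parse_frames(text):
    """Return (address, symbol, offset) tuples in the order they appear."""
    return [
        (int(addr, 16), sym, int(off))
        for addr, sym, off in FRAME_RE.findall(text)
    ]


# First two lines of the first call trace in this report.
sample = """\
Call Trace: <ffffffff802be00e>{slab_destroy+60} <ffffffff802bee91>{cache_reap+412}
       <ffffffff802becf5>{cache_reap+0} <ffffffff80255162>{run_workqueue+159}
"""

for addr, sym, off in parse_frames(sample):
    print(f"{sym}+{off} at {addr:#x}")
```

Running the same function over the full dmesg output lists every frame,
including the repeated xen_destroy_contiguous_region entries, which makes it
easier to compare this oops against other reports.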

The machine is a dual Opteron 280 with a total of 4 cores. The domainUs are
running a recompiled FC5 kernel with the Xen PCI frontend turned on, but it is
not currently being used. The problems occur when running domain0 and a single
domainU.

Comment 1 Brian Stein 2006-10-26 20:20:47 UTC
Closing; please reopen if issue persists.