Bug 199654 - xen is very unstable, actions on domUs crash domain0
Summary: xen is very unstable, actions on domUs crash domain0
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel-xen
Version: 5
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Xen Maintainance List
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-07-21 04:31 UTC by Davyd
Modified: 2008-08-02 23:40 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-10-26 20:20:47 UTC
Type: ---
Embargoed:



Description Davyd 2006-07-21 04:31:49 UTC
Some actions on the domainU machines, especially shutting them down, cause the
entire machine to crash. Connecting to NFS servers via bridged ethernet
connections may also cause crashes (confirmation required).

Managed to get this stacktrace from dmesg:
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at arch/x86_64/mm/../../i386/mm/hypervisor.c:436
invalid opcode: 0000 [1] SMP
CPU 3
Modules linked in: xt_physdev iptable_filter ip_tables x_tables bridge ipv6
ppdev autofs4 hidp nfs lockd nfs_acl rfcomm l2cap bluetooth sunrpc raid0 video
button battery acpi_memhotplug ac lp parport_pc parport ohci_hcd sg mx_driver(U)
mx_mcp(U) e100 mii tg3 i2c_amd756 hw_random i2c_amd8111 i2c_core dm_snapshot
dm_zero dm_mirror dm_mod ext3 jbd aic79xx scsi_transport_spi sata_sil libata
sd_mod scsi_mod
Pid: 17, comm: events/3 Tainted: P      2.6.17-1.2157_FC5xen0 #1
RIP: e030:[<ffffffff80281334>]
<ffffffff80281334>{xen_destroy_contiguous_region+1008}
RSP: e02b:ffff88000061fd88  EFLAGS: 00010082
RAX: 00000000ffffffea RBX: ffff8803e2a28000 RCX: ffffffffffffff01
RDX: 0000000000000000 RSI: 800000037ddc3067 RDI: ffff8803e2a29000
RBP: 0000000000000001 R08: 000000000037ddc3 R09: 0000000000000000
R10: 0000000000000001 R11: ffffffff8055f348 R12: ffff8803e2a29000
R13: 0000000000000000 R14: ffff88001874c980 R15: 0000000000000004
FS:  00002aaaaae0f6f0(0000) GS:ffffffff805d2180(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process events/3 (pid: 17, threadinfo ffff88000061e000, task ffff880000adc080)
Stack: ffff88000061fdd0 0000000000000001 0000000000000001 0000000000007ff0
       ffffffff8055f340 0000000000000002 0000000000000000 0000000000007ff0
       0000000000000000 000000000037ddc2
Call Trace: <ffffffff802be00e>{slab_destroy+60} <ffffffff802bee91>{cache_reap+412}
       <ffffffff802becf5>{cache_reap+0} <ffffffff80255162>{run_workqueue+159}
       <ffffffff80251940>{worker_thread+0} <ffffffff80251a30>{worker_thread+240}
      <ffffffff8028a307>{default_wake_function+0} <ffffffff80238752>{kthread+212}
       <ffffffff80267f36>{child_rip+8} <ffffffff8023867e>{kthread+0}
       <ffffffff80267f2e>{child_rip+0}

Code: 0f 0b 68 4e f7 46 80 c2 b4 01 48 b8 ff ff ff 7f ff ff ff ff
RIP <ffffffff80281334>{xen_destroy_contiguous_region+1008} RSP <ffff88000061fd88>
 <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0, irqs_disabled():1

Call Trace: <ffffffff80298253>{blocking_notifier_call_chain+31}
       <ffffffff80218677>{do_exit+32} <ffffffff80271754>{kernel_math_error+0}
       <ffffffff80272110>{do_invalid_op+163}
<ffffffff80281334>{xen_destroy_contiguous_region+1008}
       <ffffffff80407504>{rtnetlink_fill_ifinfo+1175}
<ffffffff80267c37>{error_exit+0}
       <ffffffff80281334>{xen_destroy_contiguous_region+1008}
       <ffffffff80281330>{xen_destroy_contiguous_region+1004}
       <ffffffff802be00e>{slab_destroy+60} <ffffffff802bee91>{cache_reap+412}
       <ffffffff802becf5>{cache_reap+0} <ffffffff80255162>{run_workqueue+159}
       <ffffffff80251940>{worker_thread+0} <ffffffff80251a30>{worker_thread+240}
      <ffffffff8028a307>{default_wake_function+0} <ffffffff80238752>{kthread+212}
       <ffffffff80267f36>{child_rip+8} <ffffffff8023867e>{kthread+0}
       <ffffffff80267f2e>{child_rip+0}
BUG: events/3/17, lock held at task exit time!
 [ffffffff804ce1a0] {cache_chain_mutex}
.. held by:          events/3:   17 [ffff880000adc080, 110]
... acquired at:               cache_reap+0x1a/0x1f6
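
For anyone triaging the trace above, here is a minimal sketch (not part of the original report) that pulls the `<address>{symbol+offset}` frames out of the dmesg text. The names `Frame`, `parse_frames` and `function_start` are hypothetical helpers invented for this example. It assumes the offsets in this trace format are decimal, which the trace itself supports: `cache_reap+412` at `ffffffff802bee91` lies exactly 412 bytes after `cache_reap+0` at `ffffffff802becf5`.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    """One <address>{symbol+offset} entry from the oops (hypothetical helper type)."""
    address: int
    symbol: str
    offset: int


# Matches frames such as <ffffffff802be00e>{slab_destroy+60}.
# Assumption: the address is hexadecimal and the offset is decimal.
FRAME_RE = re.compile(r"<([0-9a-f]{16})>\{([A-Za-z_]\w*)\+(\d+)\}")


def parse_frames(text: str) -> List[Frame]:
    """Return every frame found in the text, in order of appearance."""
    return [
        Frame(int(addr, 16), symbol, int(offset))
        for addr, symbol, offset in FRAME_RE.findall(text)
    ]


def function_start(frame: Frame) -> int:
    """Address at which the frame's function begins (address minus offset)."""
    return frame.address - frame.offset


# Two lines copied from the first call trace in this report.
SAMPLE = """\
Call Trace: <ffffffff802be00e>{slab_destroy+60} <ffffffff802bee91>{cache_reap+412}
       <ffffffff802becf5>{cache_reap+0} <ffffffff80255162>{run_workqueue+159}
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        print(f"{frame.symbol:<16} +{frame.offset:<5} starts at {function_start(frame):#x}")
```

Feeding the whole dmesg capture to `parse_frames` instead of `SAMPLE` lists every frame, which makes it easier to see that both traces pass through `cache_reap`, `slab_destroy` and `xen_destroy_contiguous_region`.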

The machine is a dual Opteron 280 with a total of 4 cores. The domainUs are
running a recompiled FC5 kernel with the Xen PCI frontend enabled, though it is
not currently in use. The problems occur when running domain0 and a single domainU.

Comment 1 Brian Stein 2006-10-26 20:20:47 UTC
Closing; please reopen if issue persists.

