Description of problem: Running gdb in F-9 x86_64 guest on F-8 x86_64 host crashes guest. Nothing in xen guest console log. xend.log: [2008-08-07 16:06:38 29499] WARNING (XendDomainInfo:1203) Domain has crashed: name=xenf964 id=5. [2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:1821) XendDomainInfo.destroyDomain(5) [2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:1479) Removing vif/0 [2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:569) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0 [2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:1479) Removing vbd/51712 [2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:569) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51712 [2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:1479) Removing console/0 [2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:569) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0 [2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:106) XendDomainInfo.create_from_dict({'vcpus_params': {}, 'PV_args': '', 'features': '', 'cpus': [], 'paused': 0, 'domid': 5, 'shutdown': 0, 'VCPUs_live': 1, 'PV_bootloader': '/usr/bin/pygrub', 'actions_after_crash': 'restart', 'vbd_refs': ['c3b0b813-c876-d26b-af8a-6da40860c47c'], 'PV_ramdisk': '', 'is_control_domain': False, 'name_label': 'xenf964', 'VCPUs_at_startup': 1, 'HVM_boot_params': {}, 'platform': {}, 'cpu_weight': 256, 'console_refs': ['51e07efc-e718-3bb2-be7e-ab7d05b355b7'], 'online_vcpus': 1, 'cpu_cap': 0, 'blocked': 0, 'on_xend_stop': 'ignore', 'memory_static_min': 0, 'HVM_boot_policy': '', 'shutdown_reason': 3, 'VCPUs_max': 1, 'start_time': 1218146633.25, 'memory_static_max': 314572800, 'actions_after_shutdown': 'destroy', 'on_xend_start': 'ignore', 'crashed': 1, 'memory_dynamic_max': 314572800, 'actions_after_suspend': '', 'is_a_template': False, 'PV_bootloader_args': '', 'memory_dynamic_min': 314572800, 'uuid': 'd8e1c7ca-2d62-5f39-6f80-a476603a9970', 'PV_kernel': '', 'cpu_time': 12.434866100000001, 
'shadow_memory': 0, 'dying': 1, 'vcpu_avail': 1L, 'notes': {'FEATURES': '!writable_page_tables|pae_pgdir_above_4gb', 'VIRT_BASE': '18446744071562067968', 'GUEST_VERSION': '2.6', 'PADDR_OFFSET': '0', 'GUEST_OS': 'linux', 'HYPERCALL_PAGE': '18446744071568605184', 'LOADER': 'generic', 'ENTRY': '18446744071568281600', 'XEN_VERSION': 'xen-3.0'}, 'other_config': {}, 'running': 0, 'actions_after_reboot': 'restart', 'vif_refs': ['243e2340-21e3-051c-c510-aeed7ea54a6d'], 'vtpm_refs': [], 'security': None, 'devices': {'243e2340-21e3-051c-c510-aeed7ea54a6d': ('vif', {'bridge': 'eth0', 'mac': '40:00:00:00:00:05', 'script': 'vif-bridge', 'uuid': '243e2340-21e3-051c-c510-aeed7ea54a6d', 'backend': 0}), '51e07efc-e718-3bb2-be7e-ab7d05b355b7': ('console', {'protocol': 'vt100', 'location': '2', 'uuid': '51e07efc-e718-3bb2-be7e-ab7d05b355b7'}), 'c3b0b813-c876-d26b-af8a-6da40860c47c': ('vbd', {'uuid': 'c3b0b813-c876-d26b-af8a-6da40860c47c', 'bootable': 1, 'driver': 'paravirtualised', 'dev': 'xvda:disk', 'uname': 'file:/export/data1/xen_disks/xenf964', 'mode': 'w', 'backend': 0})}}) [2008-08-07 16:06:38 29499] ERROR (XendDomainInfo:111) Domain construction failed Traceback (most recent call last): File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 109, in create_from_dict vm.start() File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 428, in start raise XendError('VM already running') XendError: VM already running [2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:1802) XendDomainInfo.destroy: domid=5 [2008-08-07 16:06:38 29499] ERROR (XendDomainInfo:1369) Failed to restart domain 5. 
Traceback (most recent call last):
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 1355, in _restart
    self.info)
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomain.py", line 945, in domain_create_from_dict
    dominfo = XendDomainInfo.create_from_dict(config_dict)
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 109, in create_from_dict
    vm.start()
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 428, in start
    raise XendError('VM already running')
XendError: VM already running

Must restart xend to restart guest.

Version-Release number of selected component (if applicable):
host: 2.6.21.7-3.fc8xen
guest: 2.6.25.3-2.fc9.x86_64.xen

How reproducible:
Every time.

Steps to Reproduce:
1. gdb cmake
2. break cmFindLibraryCommand.cxx:93
3. run -DCMAKE_INSTALL_PREFIX:PATH=/usr -DCMAKE_INSTALL_LIBDIR:PATH=/usr/lib64 -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DLIB_SUFFIX=64 -DBUILD_SHARED_LIBS:BOOL=ON .. --debug-output --debug-trycompile -DENABLE_ada:BOOL=ON -DENABLE_d:BOOL=ON -DENABLE_ocaml:BOOL=ON -DENABLE_pdl:BOOL=ON -DHAVE_PTHREAD:BOOL=ON -DPL_FREETYPE_FONT_PATH:PATH=/usr/share/fonts/freefont -DPLD_aqt:BOOL=ON -DPLD_conex:BOOL=ON -DPLD_dg300:BOOL=ON -DPLD_imp:BOOL=ON -DPLD_linuxvga:BOOL=ON -DPLD_ljii:BOOL=ON -DPLD_ljiip:BOOL=ON -DPLD_mskermit:BOOL=ON -DPLD_ntk:BOOL=ON -DPLD_pstex:BOOL=ON -DPLD_svg:BOOL=ON -DPLD_tek4010:BOOL=ON -DPLD_tek4010f:BOOL=ON -DPLD_tek4107:BOOL=ON -DPLD_tek4107f:BOOL=ON -DPLD_versaterm:BOOL=ON -DPLD_vlt:BOOL=ON -DPLD_xterm:BOOL=ON -DPLD_wxwidgets:BOOL=ON -DBUILD_DOC:BOOL=ON -DBUILD_TEST:BOOL=ON
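For anyone triaging a similar xend.log, here is a minimal Python sketch that splits a flattened excerpt back into entries and keeps only the WARNING and ERROR ones. The entry layout ("[date time pid] LEVEL (module:line) message") is inferred from the excerpt in this report; the function names `split_entries` and `problems` are made up for illustration and are not part of xend.

```python
import re

# Entry header as it appears in the excerpt above, e.g.
#   [2008-08-07 16:06:38 29499] WARNING (XendDomainInfo:1203) Domain has crashed: ...
# The layout is an assumption drawn from this report only.
ENTRY = re.compile(
    r"\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) \((?P<module>\w+):(?P<line>\d+)\) "
)


def split_entries(text):
    """Split a flattened log excerpt into (time, level, module, message) tuples."""
    matches = list(ENTRY.finditer(text))
    entries = []
    for i, m in enumerate(matches):
        # A message runs until the next entry header (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        message = text[m.end():end].strip()
        entries.append((m["time"], m["level"], m["module"], message))
    return entries


def problems(text):
    """Keep only the WARNING and ERROR entries."""
    return [e for e in split_entries(text) if e[1] in ("WARNING", "ERROR")]


sample = (
    "[2008-08-07 16:06:38 29499] WARNING (XendDomainInfo:1203) "
    "Domain has crashed: name=xenf964 id=5. "
    "[2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:1821) "
    "XendDomainInfo.destroyDomain(5) "
    "[2008-08-07 16:06:38 29499] ERROR (XendDomainInfo:1369) "
    "Failed to restart domain 5."
)

for entry in problems(sample):
    print(entry)
```

On the full log this would surface the "Domain has crashed", "Domain construction failed" and "Failed to restart domain 5" entries without the DEBUG noise.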
Not much to go on here, I'm afraid ...

Try the 3.1.4 HV update I've just pushed out.

Try getting a stack trace of the guest by running this in the host:

  $> /usr/lib/xen/bin/xenctx -s System.map-2.6... <domid>

Does "xm dmesg" show anything?

How do you know the guest has crashed? Could it be the console has just hung, or ... ?
Okay, scratch the update part. Looks like the culprit may have been a failed attempt to install an F10 guest at the same time as I did the update. Reverting to 3.1.2-2 did not help.
Sorry, ignore last comment. Posting to the wrong bug.
Relevant xm dmesg seems to be:

(XEN) traps.c:1747:d2 Domain attempted WRMSR 00000000c0000082 from ffff828c:801eb000 to ffffffff:810102a0.
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 1 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.1.3 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: 0007:[<0000003b19e1dc3f>]
(XEN) RFLAGS: 0000003b19c0c731 CONTEXT: guest
(XEN) rax: 0000000000000000 rbx: 00002ab52e50f9e0 rcx: ffffffff8047af10
(XEN) rdx: 0000003b19c00158 rsi: 0000000000000006 rdi: 0000003b19e1dc40
(XEN) rbp: 00007fff7c597640 rsp: 000000000000e033 r8: 0000000000000000
(XEN) r9: 0000000000000001 r10: 0000000000000004 r11: 0000000000000246
(XEN) r12: 00002ab52e50f580 r13: 00007fff7c5fe268 r14: 000000000000000f
(XEN) r15: 00002ab52e50f000 cr0: 0000000080050033 cr4: 00000000000006f0
(XEN) cr3: 0000000006d53000 cr2: 00000000007ca290
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0247 cs: 0007
(XEN) Guest stack trace from rsp=000000000000e033:
(XEN) Fault while accessing guest memory.
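To make the WRMSR line in the xm dmesg output easier to read, here is a small Python sketch that decodes it. The MSR numbers are the architectural x86-64 syscall MSRs and the names are the ones Linux uses for them. Reading the "from"/"to" values as high:low 32-bit halves is an assumption based on how the line is printed, and `decode_wrmsr` is a made-up helper name. Note that the WRMSR line names d2 while the crash dump is for Domain 1, so this only decodes the message and does not show that the write caused the crash.

```python
import re

# Architectural x86-64 syscall MSRs, with the names Linux uses for them.
SYSCALL_MSRS = {
    0xC0000081: "MSR_STAR",
    0xC0000082: "MSR_LSTAR",
    0xC0000083: "MSR_CSTAR",
    0xC0000084: "MSR_SYSCALL_MASK",
}

# Layout taken from the xm dmesg line quoted in this report.
WRMSR = re.compile(
    r"d(?P<dom>\d+) Domain attempted WRMSR (?P<msr>[0-9a-f]+) "
    r"from (?P<old_hi>[0-9a-f]+):(?P<old_lo>[0-9a-f]+) "
    r"to (?P<new_hi>[0-9a-f]+):(?P<new_lo>[0-9a-f]+)"
)


def join_halves(hi, lo):
    """Combine two 32-bit hex halves into one 64-bit value (assumed hi:lo order)."""
    return (int(hi, 16) << 32) | int(lo, 16)


def decode_wrmsr(line):
    """Return a dict describing the attempted MSR write, or None if no match."""
    m = WRMSR.search(line)
    if m is None:
        return None
    msr = int(m["msr"], 16)
    return {
        "domain": int(m["dom"]),
        "msr": msr,
        "name": SYSCALL_MSRS.get(msr, "unknown"),
        "old": join_halves(m["old_hi"], m["old_lo"]),
        "new": join_halves(m["new_hi"], m["new_lo"]),
    }


line = ("(XEN) traps.c:1747:d2 Domain attempted WRMSR 00000000c0000082 "
        "from ffff828c:801eb000 to ffffffff:810102a0.")
info = decode_wrmsr(line)
print(info["name"], hex(info["old"]), hex(info["new"]))
```

MSR_LSTAR holds the 64-bit SYSCALL entry point, so under that reading the guest tried to replace the hypervisor's syscall entry address with one in its own kernel address range.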
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Moving to RHEL 5.2 since that is what I've moved my Dom0 host to. Still present with:

kernel-xen-2.6.18-105.el5
kernel-xen-2.6.25.3-2.fc9.x86_64

Guest console log shows:

Kernel BUG at ffffffff80465fc0 [verbose debug info unavailable]
invalid opcode: 0000 [1]
CPU 0
Modules linked in: sha256_generic aes_generic cbc dm_crypt crypto_blkcipher dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 msdos linear raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 xen_netfront xen_blkfront ipv6 iscsi_tcp libiscsi scsi_transport_iscsi scsi_mod ext2 ext3 jbd ext4dev jbd2 mbcache crc16 squashfs pcspkr edd loop nfs lockd nfs_acl sunrpc vfat fat cramfs
Pid: 4824, comm: ldconfig Not tainted 2.6.25-2.fc9.x86_64.xen #1
RIP: e030:[<ffffffff80465fc0>] [<ffffffff80465fc0>] xen_failsafe_callback+0x0/0x10
RSP: e02b:ffff880016e25e08 EFLAGS: 00010003
RAX: 0000000000000001 RBX: ffffffff80628410 RCX: ffffffff80465fc0
RDX: ffffffffff516000 RSI: 0000000000000004 RDI: 000000000ac4a000
RBP: ffff880016e25ea0 R08: ffffffff8063d730 R09: 0000000000000000
R10: 00000054bd94c057 R11: 0000000000000203 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffffffff805bf000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000cd1028 CR3: 000000000ac4a000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
Process ldconfig (pid: 4824, threadinfo ffff88000ba00000, task ffff88001aa48000)
Stack: ffffffff80627398 0000000000000000 0000000000000000 0000000000000000
 0000000000000063 0000000000000000 ffffffff8020b55e 000000010000e030
 0000000000000003 ffff880016e25e60 000000000000e02b ffff880016e25ea0
Call Trace:
Code: 0f 07 66 0f 1f 84 00 00 00 00 00 48 8b 0c 24 4c 8b 5c 24 08 48 83 c4 10 6a 00 50 48 8d 05 19 00 00 00 e9 54 fb ff ff 0f 1f 40 00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 fc 65 ff
RIP [<ffffffff80465fc0>] xen_failsafe_callback+0x0/0x10
RSP <ffff880016e25e08>
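As a cross-check on the oops above: the byte in angle brackets on the "Code:" line marks the instruction at RIP, and 0f 0b is the x86 UD2 instruction, which raises an invalid-opcode exception and is what the kernel's BUG() emits on x86. A minimal Python sketch for pulling those bytes out of a "Code:" line follows; `faulting_bytes` is a made-up helper name, and the sample is just the tail of the line quoted in this report.

```python
# UD2: the two-byte x86 instruction that deliberately raises #UD (invalid opcode).
UD2 = bytes([0x0F, 0x0B])


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the <xx>-marked byte of an oops Code: line.

    Returns None if the line carries no marked byte.
    """
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            # Strip the angle brackets from the marked byte, then take the
            # bytes that follow it to complete the instruction prefix.
            marked = [tok[1:-1]] + tokens[i + 1:i + count]
            return bytes(int(b, 16) for b in marked)
    return None


# Tail of the Code: line from the guest console log above.
code = "e9 54 fb ff ff 0f 1f 40 00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84"
print(faulting_bytes(code) == UD2)
```

That matches the "Kernel BUG at ... invalid opcode" header: the guest reached a deliberate BUG at xen_failsafe_callback+0x0 rather than executing corrupted code.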
Host is running kernel-xen-2.6.18-105.el5.

Orion: could you try to reproduce with RHEL 5.3?
Also, does the same bug happen in an F-10/F-11 guest, and/or a RHEL 5.3 guest? I would just like to know if this is limited to the F-9 xen kernel (which is somewhat of an odd beast), or is in some other guests as well. Thanks, Chris Lalancette
Dom0 is currently running:

kernel-xen-2.6.18-132.el5virttest10
xen-3.0.3-73.el5

F-10 guest doesn't seem to crash. Does seem to have some spurious SIGTRAPs though:

(gdb) run ..
Starting program: /usr/bin/cmake ..

Program received signal SIGTRAP, Trace/breakpoint trap.
0x000000323aa17807 in access () from /lib64/ld-linux-x86-64.so.2
(gdb) c
Continuing.

Program received signal SIGTRAP, Trace/breakpoint trap.
0x000000323aa17957 in munmap () from /lib64/ld-linux-x86-64.so.2

Hope to be able to get you 5.3 info next week....
Running a DomU Fedora rawhide kernel 2.6.29.1-46.fc11.x86_64 under Dom0 2.6.18-128.1.6.el5xen, I am seeing the same issues when running gdb: spurious SIGTRAP signals that make debugging really hard.

It also interferes with ltrace:

$ ltrace /bin/ls
unexpected breakpoint at 0x7f9a5fbc6bbf
unexpected breakpoint at 0x7f9a5fbb939f
unexpected breakpoint at 0x7f9a5fbbc9d3
[...]

And with systemtap user space probes:

$ sudo stap -e 'probe process("/bin/ls").function("*"){log(pp())}' &
$ ls
Trace/breakpoint trap
Jeremy Fitzhardinge posted an analysis and a possible fix: http://lkml.org/lkml/2009/3/29/317
Mark, Can you confirm whether that patch fixes the issue? Assuming it does, and that patch is queued for upstream (I'll take a look shortly), then we can do a backport to the appropriate kernels. Thanks, Chris Lalancette
I built a Fedora kernel from the F-10 rpm spec with the patch applied (2.6.29.1-15.mjw.x86_64) and can confirm that gdb, ltrace and systemtap uprobe support work with it running under 2.6.18-128.1.6.el5xen.

But during bootup I do get:

WARNING: at arch/x86/xen/enlighten.c:453 cvt_gate_to_trap+0x80/0xc0() (Not tainted)
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.29.1-15.mjw.x86_64 #1
Call Trace:
 [<ffffffff81048e84>] warn_slowpath+0xdb/0xfa
 [<ffffffff8100e8a1>] ? __xen_spin_lock+0xae/0xc1
 [<ffffffff8100e5d9>] ? xen_spin_unlock+0x11/0x2e
 [<ffffffff810993bc>] ? trace_hardirqs_off+0x9/0x20
 [<ffffffff810993bc>] ? trace_hardirqs_off+0x9/0x20
 [<ffffffff81365246>] ? _spin_unlock_irqrestore+0x27/0x3e
 [<ffffffff8104959d>] ? release_console_sem+0x1d4/0x1e0
 [<ffffffff81049af9>] ? vprintk+0x313/0x326
 [<ffffffff810993bc>] ? trace_hardirqs_off+0x9/0x20
 [<ffffffff81365246>] ? _spin_unlock_irqrestore+0x27/0x3e
 [<ffffffff8100c4d5>] ? get_phys_to_machine+0x1a/0x31
 [<ffffffff810993bc>] ? trace_hardirqs_off+0x9/0x20
 [<ffffffff8102ac6b>] ? pvclock_clocksource_read+0x42/0x7b
 [<ffffffff8102ac6b>] ? pvclock_clocksource_read+0x42/0x7b
 [<ffffffff81365890>] ? nmi+0x0/0x51
 [<ffffffff8100aa86>] cvt_gate_to_trap+0x80/0xc0
 [<ffffffff8100ab14>] xen_convert_trap_info+0x4e/0x7e
 [<ffffffff8100b731>] xen_load_idt+0x47/0x71
 [<ffffffff8135d8e3>] cpu_init+0xd6/0x331
 [<ffffffff8100b4cc>] ? xen_write_idt_entry+0x41/0xa5
 [<ffffffff8118679a>] ? generic_swap+0x0/0x1c
 [<ffffffff8117f418>] ? cmp_ex+0x0/0x15
 [<ffffffff815d2a08>] trap_init+0x1b5/0x1b7
 [<ffffffff815cbc06>] start_kernel+0x1f1/0x3c8
 [<ffffffff815cb2c3>] x86_64_start_reservations+0xae/0xb2
 [<ffffffff815d1c6c>] xen_start_kernel+0x584/0x593
OK, so it looks like it probably set up the alternate stacks for debug and int3 correctly, but it was also called for some other gate where val->ist wasn't 0, causing that WARN_ON() to fire. So the patch is in the right direction, but isn't 100% correct yet. We'll need to follow up with the upstream thread.

Chris Lalancette
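To illustrate the reasoning above, here is a toy Python model of the check being described. It is not the kernel code. It assumes the patch special-cases the debug and int3 gates by swapping in handlers that do not rely on an alternate (IST) stack, and warns for any other gate whose ist field is nonzero. The replacement handler names, `convert_gate` and the "other_handler" gate are all hypothetical and chosen only for this sketch.

```python
# Toy model only: the real logic is C code in arch/x86/xen/enlighten.c.
# Assumed mapping from IST-using handlers to IST-free replacements;
# the replacement names here are hypothetical.
IST_SUBSTITUTES = {
    "debug": "xen_debug",
    "int3": "xen_int3",
}


def convert_gate(handler, ist):
    """Model converting one IDT gate into a trap entry for the hypervisor.

    Returns (handler_to_register, warning), where warning is None unless a
    gate that was not special-cased still asks for an IST stack.
    """
    if handler in IST_SUBSTITUTES:
        # Known debugger traps: swap in the replacement, no warning.
        return IST_SUBSTITUTES[handler], None
    if ist != 0:
        # Anything else using IST is unexpected; this models the WARN_ON().
        return handler, "unexpected IST use by %s (ist=%d)" % (handler, ist)
    return handler, None


# A hypothetical gate table: (handler name, ist index).
gates = [("debug", 4), ("int3", 4), ("page_fault", 0), ("other_handler", 2)]
for name, ist in gates:
    print(name, "->", convert_gate(name, ist))
```

In this model the boot-time warning corresponds to the last entry: a gate that was not special-cased but still carries a nonzero ist, which is why the patch works for gdb yet still trips the check once during trap_init.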
I did post a followup here: http://thread.gmane.org/gmane.comp.emulators.xen.devel/63804/focus=64374
Unfortunately, no reply on the list. I am not subscribed though, so I check it via gmane from time to time. The patch does work, though, if one wants to use gdb (or some other debugging or tracing tool) in a Xen guest.
More testers and a reply upstream by Jeremy Fitzhardinge: http://article.gmane.org/gmane.comp.emulators.xen.devel/65181 The patches are here: http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=bb2105d9743a902fd6035ba6d72cef6cda871664 http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=19a1071c33c4aa7531ac5c5a634a12607aefa90a
Justin: another one worth backporting to F-11, maybe?
Seems so. Won't make the F-11 release, but we should get it into an update kernel.
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
kernel-2.6.30.5-43.fc11 has been submitted as an update for Fedora 11. http://admin.fedoraproject.org/updates/kernel-2.6.30.5-43.fc11
kernel-2.6.30.5-43.fc11 has been pushed to the Fedora 11 stable repository. If problems still persist, please make note of it in this bug report.