458385 – Running gdb in F-9 x86_64 guest on RHEL 5.2 x86_64 host crashes guest.

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	11
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Justin M. Forbes
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	329781 F11VirtTarget 492568 513462

Reported:	2008-08-07 22:18 UTC by Orion Poplawski
Modified:	2009-09-17 19:16 UTC
CC List:	11 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2009-09-17 19:16:15 UTC
Type:	---
Embargoed:
Dependent Products:

Description Orion Poplawski 2008-08-07 22:18:08 UTC

Description of problem:

Running gdb in F-9 x86_64 guest on F-8 x86_64 host crashes guest.  Nothing in xen guest console log.  xend.log:

[2008-08-07 16:06:38 29499] WARNING (XendDomainInfo:1203) Domain has crashed: name=xenf964 id=5.
[2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:1821) XendDomainInfo.destroyDomain(5)
[2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:1479) Removing vif/0
[2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:569) XendDomainInfo.destroyDevice: deviceClass = vif, device = vif/0
[2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:1479) Removing vbd/51712
[2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:569) XendDomainInfo.destroyDevice: deviceClass = vbd, device = vbd/51712
[2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:1479) Removing console/0
[2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:569) XendDomainInfo.destroyDevice: deviceClass = console, device = console/0
[2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:106) XendDomainInfo.create_from_dict({'vcpus_params': {}, 'PV_args': '', 'features': '', 'cpus': [], 'paused': 0, 'domid': 5, 'shutdown': 0, 'VCPUs_live': 1, 'PV_bootloader': '/usr/bin/pygrub', 'actions_after_crash': 'restart', 'vbd_refs': ['c3b0b813-c876-d26b-af8a-6da40860c47c'], 'PV_ramdisk': '', 'is_control_domain': False, 'name_label': 'xenf964', 'VCPUs_at_startup': 1, 'HVM_boot_params': {}, 'platform': {}, 'cpu_weight': 256, 'console_refs': ['51e07efc-e718-3bb2-be7e-ab7d05b355b7'], 'online_vcpus': 1, 'cpu_cap': 0, 'blocked': 0, 'on_xend_stop': 'ignore', 'memory_static_min': 0, 'HVM_boot_policy': '', 'shutdown_reason': 3, 'VCPUs_max': 1, 'start_time': 1218146633.25, 'memory_static_max': 314572800, 'actions_after_shutdown': 'destroy', 'on_xend_start': 'ignore', 'crashed': 1, 'memory_dynamic_max': 314572800, 'actions_after_suspend': '', 'is_a_template': False, 'PV_bootloader_args': '', 'memory_dynamic_min': 314572800, 'uuid': 'd8e1c7ca-2d62-5f39-6f80-a476603a9970', 'PV_kernel': '', 'cpu_time': 12.434866100000001, 'shadow_memory': 0, 'dying': 1, 'vcpu_avail': 1L, 'notes': {'FEATURES': '!writable_page_tables|pae_pgdir_above_4gb', 'VIRT_BASE': '18446744071562067968', 'GUEST_VERSION': '2.6', 'PADDR_OFFSET': '0', 'GUEST_OS': 'linux', 'HYPERCALL_PAGE': '18446744071568605184', 'LOADER': 'generic', 'ENTRY': '18446744071568281600', 'XEN_VERSION': 'xen-3.0'}, 'other_config': {}, 'running': 0, 'actions_after_reboot': 'restart', 'vif_refs': ['243e2340-21e3-051c-c510-aeed7ea54a6d'], 'vtpm_refs': [], 'security': None, 'devices': {'243e2340-21e3-051c-c510-aeed7ea54a6d': ('vif', {'bridge': 'eth0', 'mac': '40:00:00:00:00:05', 'script': 'vif-bridge', 'uuid': '243e2340-21e3-051c-c510-aeed7ea54a6d', 'backend': 0}), '51e07efc-e718-3bb2-be7e-ab7d05b355b7': ('console', {'protocol': 'vt100', 'location': '2', 'uuid': '51e07efc-e718-3bb2-be7e-ab7d05b355b7'}), 'c3b0b813-c876-d26b-af8a-6da40860c47c': ('vbd', {'uuid': 
'c3b0b813-c876-d26b-af8a-6da40860c47c', 'bootable': 1, 'driver': 'paravirtualised', 'dev': 'xvda:disk', 'uname': 'file:/export/data1/xen_disks/xenf964', 'mode': 'w', 'backend': 0})}})
[2008-08-07 16:06:38 29499] ERROR (XendDomainInfo:111) Domain construction failed
Traceback (most recent call last):
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 109, in create_from_dict
    vm.start()
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 428, in start
    raise XendError('VM already running')
XendError: VM already running
[2008-08-07 16:06:38 29499] DEBUG (XendDomainInfo:1802) XendDomainInfo.destroy: domid=5
[2008-08-07 16:06:38 29499] ERROR (XendDomainInfo:1369) Failed to restart domain 5.
Traceback (most recent call last):
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 1355, in _restart
    self.info)
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomain.py", line 945, in domain_create_from_dict
    dominfo = XendDomainInfo.create_from_dict(config_dict)
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 109, in create_from_dict
    vm.start()
  File "/usr/lib64/python2.5/site-packages/xen/xend/XendDomainInfo.py", line 428, in start
    raise XendError('VM already running')
XendError: VM already running

Must restart xend to restart guest.

Version-Release number of selected component (if applicable):
host: 2.6.21.7-3.fc8xen
guest: 2.6.25.3-2.fc9.x86_64.xen

How reproducible:
Every time.

Steps to Reproduce:
1. gdb cmake
2. break cmFindLibraryCommand.cxx:93
3. run -DCMAKE_INSTALL_PREFIX:PATH=/usr -DCMAKE_INSTALL_LIBDIR:PATH=/usr/lib64 -DINCLUDE_INSTALL_DIR:PATH=/usr/include -DLIB_INSTALL_DIR:PATH=/usr/lib64 -DSYSCONF_INSTALL_DIR:PATH=/etc -DSHARE_INSTALL_PREFIX:PATH=/usr/share -DLIB_SUFFIX=64 -DBUILD_SHARED_LIBS:BOOL=ON .. --debug-output --debug-trycompile -DENABLE_ada:BOOL=ON -DENABLE_d:BOOL=ON -DENABLE_ocaml:BOOL=ON -DENABLE_pdl:BOOL=ON -DHAVE_PTHREAD:BOOL=ON -DPL_FREETYPE_FONT_PATH:PATH=/usr/share/fonts/freefont -DPLD_aqt:BOOL=ON -DPLD_conex:BOOL=ON -DPLD_dg300:BOOL=ON -DPLD_imp:BOOL=ON -DPLD_linuxvga:BOOL=ON -DPLD_ljii:BOOL=ON -DPLD_ljiip:BOOL=ON -DPLD_mskermit:BOOL=ON -DPLD_ntk:BOOL=ON -DPLD_pstex:BOOL=ON -DPLD_svg:BOOL=ON -DPLD_tek4010:BOOL=ON -DPLD_tek4010f:BOOL=ON -DPLD_tek4107:BOOL=ON -DPLD_tek4107f:BOOL=ON -DPLD_versaterm:BOOL=ON -DPLD_vlt:BOOL=ON -DPLD_xterm:BOOL=ON -DPLD_wxwidgets:BOOL=ON -DBUILD_DOC:BOOL=ON -DBUILD_TEST:BOOL=ON

Comment 1 Mark McLoughlin 2008-08-08 07:04:55 UTC

Not much to go on here I'm afraid ...

Try the 3.1.4 HV update I've just pushed out

Try getting a stack trace of the guest by running this in the host:

  $> /usr/lib/xen/bin/xenctx -s System.map-2.6... <domid>

Does "xm dmesg" show anything?

How do you know the guest has crashed? Could it be the console has just hung, or ... ?

Comment 2 Orion Poplawski 2008-08-12 15:55:38 UTC

Okay, scratch the update part.  Looks like the culprit may have been a failed
attempt to install a F10 guest at the same time as I did the update.  Reverting
to 3.1.2-2 did not help.

Comment 3 Orion Poplawski 2008-08-12 16:03:41 UTC

Sorry, ignore last comment.  Posting to the wrong bug.

Comment 4 Orion Poplawski 2008-08-15 21:09:05 UTC

Relevant xm dmesg seems to be:

(XEN) traps.c:1747:d2 Domain attempted WRMSR 00000000c0000082 from ffff828c:801eb000 to ffffffff:810102a0.
(XEN) domain_crash_sync called from entry.S
(XEN) Domain 1 (vcpu#0) crashed on cpu#0:
(XEN) ----[ Xen-3.1.3  x86_64  debug=n  Not tainted ]----
(XEN) CPU:    0
(XEN) RIP:    0007:[<0000003b19e1dc3f>]
(XEN) RFLAGS: 0000003b19c0c731   CONTEXT: guest
(XEN) rax: 0000000000000000   rbx: 00002ab52e50f9e0   rcx: ffffffff8047af10
(XEN) rdx: 0000003b19c00158   rsi: 0000000000000006   rdi: 0000003b19e1dc40
(XEN) rbp: 00007fff7c597640   rsp: 000000000000e033   r8:  0000000000000000
(XEN) r9:  0000000000000001   r10: 0000000000000004   r11: 0000000000000246
(XEN) r12: 00002ab52e50f580   r13: 00007fff7c5fe268   r14: 000000000000000f
(XEN) r15: 00002ab52e50f000   cr0: 0000000080050033   cr4: 00000000000006f0
(XEN) cr3: 0000000006d53000   cr2: 00000000007ca290
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0247   cs: 0007
(XEN) Guest stack trace from rsp=000000000000e033:
(XEN)   Fault while accessing guest memory.
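
For anyone decoding the WRMSR line in the log above: MSR 0xc0000082 is LSTAR, the register that holds the 64-bit SYSCALL entry point, so the hypervisor appears to be rejecting a guest attempt to change it. The sketch below is a minimal parser for that one message. It assumes the "from" and "to" fields are printed as high:low 32-bit halves, which is a reading of the log format rather than something confirmed in this report, and the helper name decode_wrmsr is made up for illustration.

```python
import re

# Architectural numbers of the x86-64 SYSCALL-related MSRs.
MSR_NAMES = {
    0xC0000081: "MSR_STAR",
    0xC0000082: "MSR_LSTAR",
    0xC0000083: "MSR_CSTAR",
    0xC0000084: "MSR_SYSCALL_MASK",
}

# Assumption: values are logged as <high 32 bits>:<low 32 bits>.
WRMSR_RE = re.compile(
    r"Domain attempted WRMSR (?P<msr>[0-9a-f]+) "
    r"from (?P<old_hi>[0-9a-f]+):(?P<old_lo>[0-9a-f]+) "
    r"to (?P<new_hi>[0-9a-f]+):(?P<new_lo>[0-9a-f]+)"
)


def decode_wrmsr(line):
    """Hypothetical helper: decode one 'Domain attempted WRMSR' line."""
    m = WRMSR_RE.search(line)
    if m is None:
        return None
    msr = int(m.group("msr"), 16)
    old = (int(m.group("old_hi"), 16) << 32) | int(m.group("old_lo"), 16)
    new = (int(m.group("new_hi"), 16) << 32) | int(m.group("new_lo"), 16)
    return {
        "msr": msr,
        "name": MSR_NAMES.get(msr, "unknown"),
        "old": old,
        "new": new,
    }


line = ("(XEN) traps.c:1747:d2 Domain attempted WRMSR 00000000c0000082 "
        "from ffff828c:801eb000 to ffffffff:810102a0.")
info = decode_wrmsr(line)
print(info["name"], hex(info["old"]), hex(info["new"]))
```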

Comment 5 Bug Zapper 2008-11-26 11:04:43 UTC

This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Orion Poplawski 2008-12-19 18:38:19 UTC

Moving to RHEL 5.2 since that is what I've moved my Dom0 host to.  Still present with:

kernel-xen-2.6.18-105.el5

kernel-xen-2.6.25.3-2.fc9.x86_64

guest console log shows:

Kernel BUG at ffffffff80465fc0 [verbose debug info unavailable] 
invalid opcode: 0000 [1]                                       
CPU 0    
Modules linked in: sha256_generic aes_generic cbc dm_crypt crypto_blkcipher dm_emc dm_round_robin dm_multipath dm_snapshot dm_mirror dm_zero dm_mod xfs jfs reiserfs lock_nolock gfs2 msdos linear raid10 raid456 async_xor async_memcpy async_tx xor raid1 raid0 xen_netfront xen_blkfront ipv6 iscsi_tcp libiscsi scsi_transport_iscsi scsi_mod ext2 ext3 jbd ext4dev jbd2 mbcache crc16 squashfs pcspkr edd loop nfs lockd nfs_acl sunrpc vfat fat cramfs 
Pid: 4824, comm: ldconfig Not tainted 2.6.25-2.fc9.x86_64.xen #1 
RIP: e030:[<ffffffff80465fc0>]  [<ffffffff80465fc0>] xen_failsafe_callback+0x0/0x10                                                      
RSP: e02b:ffff880016e25e08  EFLAGS: 00010003           
RAX: 0000000000000001 RBX: ffffffff80628410 RCX: ffffffff80465fc0
RDX: ffffffffff516000 RSI: 0000000000000004 RDI: 000000000ac4a000
RBP: ffff880016e25ea0 R08: ffffffff8063d730 R09: 0000000000000000
R10: 00000054bd94c057 R11: 0000000000000203 R12: 0000000000000001 
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 
FS:  0000000000000000(0000) GS:ffffffff805bf000(0000) knlGS:0000000000000000 
CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033        
CR2: 0000000000cd1028 CR3: 000000000ac4a000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000000
Process ldconfig (pid: 4824, threadinfo ffff88000ba00000, task ffff88001aa48000)
Stack:  ffffffff80627398 0000000000000000 0000000000000000 0000000000000000 
 0000000000000063 0000000000000000 ffffffff8020b55e 000000010000e030       
 0000000000000003 ffff880016e25e60 000000000000e02b ffff880016e25ea0   
Call Trace:               

Code: 0f 07 66 0f 1f 84 00 00 00 00 00 48 8b 0c 24 4c 8b 5c 24 08 48 83 c4 10 6a 00 50 48 8d 05 19 00 00 00 e9 54 fb ff ff 0f 1f 40 00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 fc 65 ff        
RIP  [<ffffffff80465fc0>] xen_failsafe_callback+0x0/0x10
 RSP <ffff880016e25e08>
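
A note on reading the "Code:" line in the oops above: the byte in angle brackets is the one at RIP, and the pair 0f 0b encodes ud2, the instruction the kernel's BUG() uses on x86. That matches the "invalid opcode" and "Kernel BUG at" lines, and suggests the instruction at xen_failsafe_callback+0x0 is itself a deliberate trap. The following sketch shows one way to pull out the marked bytes; the function names are hypothetical and the parsing assumes the usual space-separated hex layout.

```python
UD2 = bytes([0x0F, 0x0B])  # x86 encoding of the ud2 instruction


def faulting_bytes(code_line, count=2):
    """Return `count` bytes starting at the <..>-marked byte of a 'Code:' line.

    Returns None when no marked byte is present.
    """
    tokens = code_line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            window = [tok.strip("<>")] + tokens[i + 1:i + count]
            return bytes(int(b, 16) for b in window)
    return None


def is_ud2(code_line):
    """True when the instruction at RIP starts with the ud2 opcode."""
    return faulting_bytes(code_line) == UD2


# Tail of the Code: line from the oops, shortened for the example.
sample = "Code: e9 54 fb ff ff 0f 1f 40 00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84"
print(is_ud2(sample))
```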

Comment 8 Mark McLoughlin 2009-03-20 17:41:18 UTC

Host is running kernel-xen-2.6.18-105.el5

Orion: could you try and reproduce with RHEL5.3 ?

Comment 9 Chris Lalancette 2009-03-21 09:53:05 UTC

Also, does the same bug happen in an F-10/F-11 guest, and/or a RHEL 5.3 guest?  I would just like to know if this is limited to the F-9 xen kernel (which is somewhat of an odd beast), or is in some other guests as well.

Thanks,
Chris Lalancette

Comment 10 Orion Poplawski 2009-03-24 22:34:36 UTC

Dom0 is currently running:

kernel-xen-2.6.18-132.el5virttest10
xen-3.0.3-73.el5

F-10 guest doesn't seem to crash.  Does seem to have some spurious SIGTRAPs though:

(gdb) run ..                                                                         
Starting program: /usr/bin/cmake ..                                                  

Program received signal SIGTRAP, Trace/breakpoint trap.
0x000000323aa17807 in access () from /lib64/ld-linux-x86-64.so.2
(gdb) c                                                                       
Continuing.                                                                   

Program received signal SIGTRAP, Trace/breakpoint trap.
0x000000323aa17957 in munmap () from /lib64/ld-linux-x86-64.so.2

Hope to be able to get you 5.3 info next week....

Comment 11 Mark Wielaard 2009-04-05 20:42:02 UTC

Running DomU fedora rawhide kernel 2.6.29.1-46.fc11.x86_64 under Dom0 2.6.18-128.1.6.el5xen I am seeing the same issues when running gdb. Spurious SIGTRAP signals that make debugging things really hard. It also interferes with ltrace:

$ ltrace /bin/ls
unexpected breakpoint at 0x7f9a5fbc6bbf
unexpected breakpoint at 0x7f9a5fbb939f
unexpected breakpoint at 0x7f9a5fbbc9d3
[...]

And with systemtap user space probes:

$ sudo stap -e 'probe process("/bin/ls").function("*"){log(pp())}' &
$ ls
Trace/breakpoint trap

Comment 12 Mark Wielaard 2009-04-05 21:39:04 UTC

Jeremy Fitzhardinge posted an analysis and a possible fix:
http://lkml.org/lkml/2009/3/29/317

Comment 13 Chris Lalancette 2009-04-06 07:00:45 UTC

Mark,
     Can you confirm whether that patch fixes the issue?  Assuming it does, and that patch is queued for upstream (I'll take a look shortly), then we can do a backport to the appropriate kernels.

Thanks,
Chris Lalancette

Comment 14 Mark Wielaard 2009-04-06 21:13:17 UTC

I built a Fedora kernel from the F-10 rpm spec with the patch applied (2.6.29.1-15.mjw.x86_64) and can confirm that gdb, ltrace and systemtap uprobe support work with it running under 2.6.18-128.1.6.el5xen.

But during bootup I do get:

WARNING: at arch/x86/xen/enlighten.c:453 cvt_gate_to_trap+0x80/0xc0() (Not tainted)
Modules linked in:
Pid: 0, comm: swapper Not tainted 2.6.29.1-15.mjw.x86_64 #1
Call Trace:
[<ffffffff81048e84>] warn_slowpath+0xdb/0xfa
[<ffffffff8100e8a1>] ? __xen_spin_lock+0xae/0xc1
[<ffffffff8100e5d9>] ? xen_spin_unlock+0x11/0x2e
[<ffffffff810993bc>] ? trace_hardirqs_off+0x9/0x20
[<ffffffff810993bc>] ? trace_hardirqs_off+0x9/0x20
[<ffffffff81365246>] ? _spin_unlock_irqrestore+0x27/0x3e
[<ffffffff8104959d>] ? release_console_sem+0x1d4/0x1e0
[<ffffffff81049af9>] ? vprintk+0x313/0x326
[<ffffffff810993bc>] ? trace_hardirqs_off+0x9/0x20
[<ffffffff81365246>] ? _spin_unlock_irqrestore+0x27/0x3e
[<ffffffff8100c4d5>] ? get_phys_to_machine+0x1a/0x31
[<ffffffff810993bc>] ? trace_hardirqs_off+0x9/0x20
[<ffffffff8102ac6b>] ? pvclock_clocksource_read+0x42/0x7b
[<ffffffff8102ac6b>] ? pvclock_clocksource_read+0x42/0x7b
[<ffffffff81365890>] ? nmi+0x0/0x51
[<ffffffff8100aa86>] cvt_gate_to_trap+0x80/0xc0
[<ffffffff8100ab14>] xen_convert_trap_info+0x4e/0x7e
[<ffffffff8100b731>] xen_load_idt+0x47/0x71
[<ffffffff8135d8e3>] cpu_init+0xd6/0x331
[<ffffffff8100b4cc>] ? xen_write_idt_entry+0x41/0xa5
[<ffffffff8118679a>] ? generic_swap+0x0/0x1c
[<ffffffff8117f418>] ? cmp_ex+0x0/0x15
[<ffffffff815d2a08>] trap_init+0x1b5/0x1b7
[<ffffffff815cbc06>] start_kernel+0x1f1/0x3c8
[<ffffffff815cb2c3>] x86_64_start_reservations+0xae/0xb2
[<ffffffff815d1c6c>] xen_start_kernel+0x584/0x593

Comment 15 Chris Lalancette 2009-04-07 06:47:20 UTC

OK, so it looks like it probably set up the alternate stacks for debug and int3, but it was also called with something else where the val->ist wasn't 0, causing that WARN_ON() to fire.  So the patch is in the right direction, but isn't 100% correct yet.  We'll need to follow up with the upstream thread.

Chris Lalancette
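
The behaviour described in this comment can be pictured with a toy model. This is not the kernel code or the posted patch, only a sketch of the logic as summarised above: gates for debug and int3 get special handling, and any other gate that still has a nonzero IST index trips the warning. The Gate class, the replacement names and the choice of nmi as the offending gate are all illustrative assumptions; the thread does not say which gate triggered the WARN_ON().

```python
from dataclasses import dataclass


@dataclass
class Gate:
    handler: str  # name of the trap handler the gate points at
    ist: int      # Interrupt Stack Table index; 0 means no alternate stack


# Illustrative mapping only: handlers that get a replacement entry point.
REPLACEMENTS = {"debug": "xen_debug", "int3": "xen_int3"}


def convert_gate(gate):
    """Toy model of the conversion step; returns (entry_point, warned)."""
    if gate.handler in REPLACEMENTS:
        # Known IST users are redirected, so no warning is needed.
        return REPLACEMENTS[gate.handler], False
    # Any other gate is expected to have ist == 0; warn if it does not.
    return gate.handler, gate.ist != 0


for g in (Gate("debug", 3), Gate("divide_error", 0), Gate("nmi", 2)):
    print(g.handler, convert_gate(g))
```

In this model the fix would amount to teaching the conversion about every remaining gate that uses an alternate stack, either by redirecting it or by skipping it.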

Comment 16 Mark Wielaard 2009-04-07 07:52:11 UTC

I did post a followup here:
http://thread.gmane.org/gmane.comp.emulators.xen.devel/63804/focus=64374

Comment 17 Mark Wielaard 2009-04-30 10:04:57 UTC

Unfortunately no reply on the list. I am not subscribed though, so I check it from gmane from time to time. The patch does work though if one wants to use gdb (or some other debugging or tracing tool) in a Xen guest.

Comment 18 Mark Wielaard 2009-05-11 09:15:31 UTC

More testers and a reply upstream by Jeremy Fitzhardinge:
http://article.gmane.org/gmane.comp.emulators.xen.devel/65181

The patches are here:
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=bb2105d9743a902fd6035ba6d72cef6cda871664
http://git.kernel.org/?p=linux/kernel/git/jeremy/xen.git;a=commit;h=19a1071c33c4aa7531ac5c5a634a12607aefa90a

Comment 22 Mark McLoughlin 2009-05-22 16:02:48 UTC

Justin: another one worth backporting to F-11, maybe?

Comment 23 Justin M. Forbes 2009-05-22 16:17:56 UTC

Seems so. Won't make the F-11 release, but we should get it into an update kernel.

Comment 24 Bug Zapper 2009-06-09 09:39:39 UTC

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 25 Fedora Update System 2009-08-28 22:28:09 UTC

kernel-2.6.30.5-43.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/kernel-2.6.30.5-43.fc11

Comment 26 Fedora Update System 2009-09-06 20:43:48 UTC

kernel-2.6.30.5-43.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.
