Description of problem: Cannot terminate the process when hang with running dettach device on Xen. Version-Release number of selected component (if applicable): rhel-5.4(Xen) libvirt 0.6.3-6.el5 How reproducible: Always Steps to Reproduce: [root@dhcp-66-70-85 ~]# virsh nodedev-list --tree [root@dhcp-66-70-85 ~]# virsh nodedev-dumpxml pci_10de_26c <device> <name>pci_10de_26c</name> <parent>computer</parent> <capability type='pci'> <domain>0</domain> <bus>0</bus> <slot>16</slot> <function>1</function> <product id='0x026c'>MCP51 High Definition Audio</product> <vendor id='0x10de'>nVidia Corporation</vendor> </capability> </device> [root@dhcp-66-70-85 ~]# virsh nodedev-dettach pci_10de_26c [root@dhcp-66-70-85 ~]# ps -ef | grep virsh root 4170 4123 0 10:56 pts/1 00:00:00 virsh nodedev-dettach pci_10de_26c root 4227 4209 0 10:56 pts/2 00:00:00 grep virsh [root@dhcp-66-70-85 ~]# kill -9 4170 [root@dhcp-66-70-85 ~]# ps -ef | grep virsh root 4170 4123 0 10:56 pts/1 00:00:00 virsh nodedev-dettach pci_10de_26c root 4230 4209 0 10:56 pts/2 00:00:00 grep virsh [root@dhcp-66-70-85 ~]# Actual results: Hang with the 'nodedev-dettach' command. Expected results: The node device can be dettached or give some errors. Additional info: No this issue on KVM.
Can you install the matching 'libvirt-debuginfo' RPM, and then when virsh hangs, attach to it with GDB and obtain a stack trace, eg # gdb `pgrep virsh` (gdb) backtrace
Can not obtain stack? [root@dhcp-66-70-85 ~]# pgrep virsh 4437 [root@dhcp-66-70-85 ~]# gdb `pgrep virsh` GNU gdb Fedora (6.8-35.el5) Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu"... 4437: No such file or directory. (gdb) backtrace No stack. (gdb)
I really hate GDB sometimes, try adding in the command name too when launching it, then it should give a backtrace, eg launch it with gdb /usr/bin/virsh `pgrep virsh`
Sometimes you have to use it, got the backtrace :) [root@dhcp-66-70-85 ~]# gdb /usr/bin/virsh `pgrep virsh` GNU gdb Fedora (6.8-35.el5) Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu"... Attaching to program: /usr/bin/virsh, process 4699 Reading symbols from /usr/lib64/libvirt.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libvirt.so.0.6.3.debug...done. done. Loaded symbols for /usr/lib64/libvirt.so.0 Reading symbols from /usr/lib64/libxml2.so.2...done. Loaded symbols for /usr/lib64/libxml2.so.2 Reading symbols from /usr/lib64/libz.so.1...done. Loaded symbols for /usr/lib64/libz.so.1 Reading symbols from /lib64/libm.so.6...done. Loaded symbols for /lib64/libm.so.6 Reading symbols from /lib64/libselinux.so.1...done. Loaded symbols for /lib64/libselinux.so.1 Reading symbols from /usr/lib64/libnuma.so.1...done. Loaded symbols for /usr/lib64/libnuma.so.1 Reading symbols from /usr/lib64/libgnutls.so.13...done. Loaded symbols for /usr/lib64/libgnutls.so.13 Reading symbols from /usr/lib64/libsasl2.so.2...done. Loaded symbols for /usr/lib64/libsasl2.so.2 Reading symbols from /usr/lib64/libxenstore.so.3.0...done. Loaded symbols for /usr/lib64/libxenstore.so.3.0 Reading symbols from /usr/lib64/libreadline.so.5...done. Loaded symbols for /usr/lib64/libreadline.so.5 Reading symbols from /usr/lib64/libncurses.so.5...done. Loaded symbols for /usr/lib64/libncurses.so.5 Reading symbols from /lib64/libpthread.so.0...done. [Thread debugging using libthread_db enabled] [New Thread 0x2b109df3cf20 (LWP 4699)] Loaded symbols for /lib64/libpthread.so.0 Reading symbols from /lib64/libc.so.6...done. Loaded symbols for /lib64/libc.so.6 Reading symbols from /lib64/libdl.so.2...done. Loaded symbols for /lib64/libdl.so.2 Reading symbols from /lib64/libsepol.so.1...done. Loaded symbols for /lib64/libsepol.so.1 Reading symbols from /lib64/ld-linux-x86-64.so.2...done. Loaded symbols for /lib64/ld-linux-x86-64.so.2 Reading symbols from /usr/lib64/libgcrypt.so.11...done. Loaded symbols for /usr/lib64/libgcrypt.so.11 Reading symbols from /usr/lib64/libgpg-error.so.0...done. Loaded symbols for /usr/lib64/libgpg-error.so.0 Reading symbols from /lib64/libresolv.so.2...done. Loaded symbols for /lib64/libresolv.so.2 Reading symbols from /lib64/libcrypt.so.1...done. Loaded symbols for /lib64/libcrypt.so.1 0x000000324d8cad5f in poll () from /lib64/libc.so.6 (gdb)
You seem to have missed off the actual stack trace in that last comment, only pasting the GDB startup-noise. Can you type 'backtrace' now you have a GDB prompt & provide the result here.
Oh, sorry, missing the backtrace info. (gdb) backtrace #0 0x000000324d8cad5f in poll () from /lib64/libc.so.6 #1 0x0000003260e48007 in call (conn=<value optimized out>, priv=<value optimized out>, flags=<value optimized out>, proc_nr=<value optimized out>, args_filter=<value optimized out>, args=<value optimized out>, ret_filter=Could not find the frame base for "call". ) at remote_internal.c:6407 #2 0x0000003260e48a58 in remoteNodeDeviceDettach (dev=<value optimized out>) at remote_internal.c:4931 #3 0x0000003260e243e8 in virNodeDeviceDettach (dev=<value optimized out>) at libvirt.c:7409 #4 0x000000000040ae85 in cmdNodeDeviceDettach (ctl=0x7fffa4f350a0, cmd=<value optimized out>) at virsh.c:4759 #5 0x0000000000410b40 in vshCommandRun (ctl=0x7fffa4f350a0, cmd=0x19bfa5a0) at virsh.c:6548 #6 0x0000000000411487 in main (argc=1, argv=0x7fffa4f35208) at virsh.c:7500 (gdb)
assigned for triage.
Hmm, that stack trace does not appear to be calling into the Xen driver at all all, rather using the remote driver (aka QEMU). Can you provide libvirt logging info # LIBVIRT_DEBUG=1 virsh nodedev-dettach pci_10de_26c And details of the host capabilities when on the Xen machine # virsh capabilities
Sorry, please ignore the Comment #9. [root@dhcp-66-70-85 ~]# LIBVIRT_DEBUG=1 virsh nodedev-dettach pci_10de_26c 13:24:43.534: debug : virInitialize:290 : register drivers 13:24:43.534: debug : virRegisterDriver:667 : registering Test as driver 0 13:24:43.534: debug : virRegisterNetworkDriver:567 : registering Test as network driver 0 13:24:43.534: debug : virRegisterStorageDriver:598 : registering Test as storage driver 0 13:24:43.534: debug : virRegisterDeviceMonitor:629 : registering Test as device driver 0 13:24:43.534: debug : xenHypervisorInit:1922 : Using new hypervisor call: 30001 13:24:43.534: debug : xenHypervisorInit:1991 : Using hypervisor call v2, sys ver3 dom ver5 13:24:43.534: debug : virRegisterDriver:667 : registering Xen as driver 1 13:24:43.534: debug : virRegisterDriver:667 : registering remote as driver 2 13:24:43.534: debug : virRegisterNetworkDriver:567 : registering remote as network driver 1 13:24:43.534: debug : virRegisterStorageDriver:598 : registering remote as storage driver 1 13:24:43.534: debug : virRegisterDeviceMonitor:629 : registering remote as device driver 1 13:24:43.534: debug : virConnectOpenAuth:1100 : name=(null), auth=0x326109b360, flags=0 13:24:43.534: debug : do_open:922 : no name, allowing driver auto-select 13:24:43.534: debug : do_open:930 : trying driver 0 (Test) ... 13:24:43.534: debug : do_open:936 : driver 0 Test returned DECLINED 13:24:43.534: debug : do_open:930 : trying driver 1 (Xen) ... 13:24:43.534: debug : xenUnifiedOpen:295 : Trying hypervisor sub-driver 13:24:43.534: debug : xenUnifiedOpen:298 : Activated hypervisor sub-driver 13:24:43.534: debug : xenUnifiedOpen:306 : Trying XenD sub-driver 13:24:43.536: debug : xenUnifiedOpen:309 : Activated XenD sub-driver 13:24:43.536: debug : xenUnifiedOpen:315 : Trying XM sub-driver 13:24:43.536: debug : xenUnifiedOpen:318 : Activated XM sub-driver 13:24:43.536: debug : xenUnifiedOpen:322 : Trying XS sub-driver 13:24:43.537: debug : xenStoreOpen:346 : Failed to add event handle, disabling events 13:24:43.537: debug : xenUnifiedOpen:325 : Activated XS sub-driver 13:24:43.543: debug : xenUnifiedOpen:361 : Trying Xen inotify sub-driver 13:24:43.543: debug : xenInotifyOpen:439 : Adding a watch on /etc/xen 13:24:43.543: debug : xenInotifyOpen:451 : Building initial config cache 13:24:43.543: debug : xenXMConfigCacheAddFile:387 : Adding file /etc/xen/foo1 13:24:43.543: debug : xenXMConfigCacheAddFile:465 : Added config foo1 /etc/xen/foo1 13:24:43.543: debug : xenXMConfigCacheAddFile:387 : Adding file /etc/xen/demo 13:24:43.543: debug : xenXMConfigCacheAddFile:465 : Added config demo /etc/xen/demo 13:24:43.543: debug : xenInotifyOpen:458 : Registering with event loop 13:24:43.543: debug : xenInotifyOpen:462 : Failed to add inotify handle, disabling events 13:24:43.543: debug : virConnectRef:1165 : conn=0x183e8160 refs=3 13:24:43.543: debug : xenUnifiedOpen:364 : Activated Xen inotify sub-driver 13:24:43.543: debug : do_open:936 : driver 1 Xen returned SUCCESS 13:24:43.543: debug : do_open:956 : network driver 0 Test returned DECLINED 13:24:43.543: debug : doRemoteOpen:511 : proceeding with name = xen:/// 13:24:43.543: debug : call:6543 : Doing call 66 (nil) 13:24:43.543: debug : call:6613 : We have the buck 66 0x2aaaaaaf0010 0x2aaaaaaf0010 13:24:43.544: debug : processCallRecvLen:6201 : Got length, now need 36 total (32 more) 13:24:43.544: debug : processCalls:6469 : Giving up the buck 66 0x2aaaaaaf0010 (nil) 13:24:43.544: debug : call:6644 : All done with our call 66 (nil) 0x2aaaaaaf0010 13:24:43.544: debug : call:6543 : Doing call 1 (nil) 13:24:43.544: debug : call:6613 : We have the buck 1 0x18412b60 0x18412b60 13:24:43.555: debug : processCallRecvLen:6201 : Got length, now need 28 total (24 more) 13:24:43.555: debug : processCalls:6469 : Giving up the buck 1 0x18412b60 (nil) 13:24:43.555: debug : call:6644 : All done with our call 1 (nil) 0x18412b60 13:24:43.555: debug : doRemoteOpen:822 : Adding Handler for remote events 13:24:43.555: debug : doRemoteOpen:829 : virEventAddHandle failed: No addHandleImpl defined. continuing without events. 13:24:43.555: debug : do_open:956 : network driver 1 remote returned SUCCESS 13:24:43.555: debug : do_open:978 : storage driver 0 Test returned DECLINED 13:24:43.555: debug : do_open:978 : storage driver 1 remote returned SUCCESS 13:24:43.555: debug : do_open:999 : node driver 0 Test returned DECLINED 13:24:43.555: debug : do_open:999 : node driver 1 remote returned SUCCESS 13:24:43.555: debug : virNodeDeviceLookupByName:7123 : conn=0x183e8160, name=0x183e5560 13:24:43.555: debug : call:6543 : Doing call 113 (nil) 13:24:43.556: debug : call:6613 : We have the buck 113 0x18412b60 0x18412b60 13:24:43.556: debug : processCallRecvLen:6201 : Got length, now need 44 total (40 more) 13:24:43.556: debug : processCalls:6469 : Giving up the buck 113 0x18412b60 (nil) 13:24:43.556: debug : call:6644 : All done with our call 113 (nil) 0x18412b60 13:24:43.556: debug : virNodeDeviceDettach:7398 : dev=0x18407960, conn=0x183e8160 13:24:43.556: debug : virNodeDeviceGetXMLDesc:7166 : dev=0x18407960, conn=0x183e8160 13:24:43.556: debug : call:6543 : Doing call 114 (nil) 13:24:43.556: debug : call:6613 : We have the buck 114 0x18412b60 0x18412b60 13:24:43.557: debug : processCallRecvLen:6201 : Got length, now need 352 total (348 more) 13:24:43.557: debug : processCalls:6469 : Giving up the buck 114 0x18412b60 (nil) 13:24:43.557: debug : call:6644 : All done with our call 114 (nil) 0x18412b60 13:24:43.557: debug : pciGetDevice:894 : 10de 026c 0000:00:10.1: initialized 13:24:43.557: debug : virRun:678 : /sbin/modprobe pciback [root@dhcp-66-70-85 ~]# virsh capabilities <capabilities> <host> <cpu> <arch>x86_64</arch> <features> <svm/> </features> </cpu> <migration_features> <live/> <uri_transports> <uri_transport>xenmigr</uri_transport> </uri_transports> </migration_features> <topology> <cells num='1'> <cell id='0'> <cpus num='2'> <cpu id='0'/> <cpu id='1'/> </cpus> </cell> </cells> </topology> </host> <guest> <os_type>xen</os_type> <arch name='x86_64'> <wordsize>64</wordsize> <emulator>/usr/lib64/xen/bin/qemu-dm</emulator> <machine>xenpv</machine> <domain type='xen'> </domain> </arch> </guest> <guest> <os_type>xen</os_type> <arch name='i686'> <wordsize>32</wordsize> <emulator>/usr/lib64/xen/bin/qemu-dm</emulator> <machine>xenpv</machine> <domain type='xen'> </domain> </arch> <features> <pae/> </features> </guest> <guest> <os_type>hvm</os_type> <arch name='i686'> <wordsize>32</wordsize> <emulator>/usr/lib64/xen/bin/qemu-dm</emulator> <loader>/usr/lib/xen/boot/hvmloader</loader> <machine>xenfv</machine> <domain type='xen'> </domain> </arch> <features> <pae/> <nonpae/> <acpi default='on' toggle='yes'/> <apic default='on' toggle='yes'/> </features> </guest> <guest> <os_type>hvm</os_type> <arch name='x86_64'> <wordsize>64</wordsize> <emulator>/usr/lib64/xen/bin/qemu-dm</emulator> <loader>/usr/lib/xen/boot/hvmloader</loader> <machine>xenfv</machine> <domain type='xen'> </domain> </arch> <features> <acpi default='on' toggle='yes'/> <apic default='on' toggle='yes'/> </features> </guest> </capabilities>
Wow, this is bizarre, the last thing printed is 13:24:43.557: debug : virRun:678 : /sbin/modprobe pciback Looking at the code the only way I see this hanging, is if modprobe never exits, which doesn't make much sense either. Can you run 'ps -auxwwf' and see if there are any 'modprobe' processes still running ?
It seems no 'modprobe' process was running. [root@dhcp-66-70-85 libvirt]# ps auxwwf | grep modprobe root 4896 0.0 0.0 61188 724 pts/2 S+ 21:58 0:00 \_ grep modprobe
I'm pretty much out of debugging ideas at this point and I can't reproduce the problem myself.
I retest this issue with the latest package 0.6.3-8 on rhel-5.4(2.6.18-153.el5xen), and also can not reproduce on my side. This could be closed, thanks! [root@dhcp-66-70-85 ~]# virsh nodedev-dettach pci_10de_26c Device pci_10de_26c dettached 22:07:16.720: debug : virNodeDeviceFree:7331 : dev=0x1332c640, conn=0x1330c160 22:07:16.720: debug : virUnrefNodeDevice:871 : unref dev 0x1332c640 pci_10de_26c 1 22:07:16.720: debug : virReleaseNodeDevice:834 : release dev 0x1332c640 pci_10de_26c 22:07:16.720: debug : virReleaseNodeDevice:845 : unref connection 0x1330c160 5 22:07:16.720: debug : virConnectClose:1118 : conn=0x1330c160 22:07:16.720: debug : call:6545 : Doing call 2 (nil) 22:07:16.720: debug : call:6615 : We have the buck 2 0x133380b0 0x133380b0 22:07:16.721: debug : processCallRecvLen:6203 : Got length, now need 28 total (24 more) 22:07:16.721: debug : processCalls:6471 : Giving up the buck 2 0x133380b0 (nil) 22:07:16.721: debug : call:6646 : All done with our call 2 (nil) 0x133380b0 22:07:16.722: debug : virUnrefConnect:210 : unref connection 0x1330c160 4 22:07:16.722: debug : virUnrefConnect:210 : unref connection 0x1330c160 3 22:07:16.723: debug : virUnrefConnect:210 : unref connection 0x1330c160 2 22:07:16.723: debug : virUnrefConnect:210 : unref connection 0x1330c160 1 22:07:16.723: debug : virReleaseConnect:171 : release connection 0x1330c160