Bug 504542 - Using 'virsh nodedev-dettach' will hang when dettach the audio device on Xen
Summary: Using 'virsh nodedev-dettach' will hang when dettach the audio device on Xen
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: libvirt
Version: 5.4
Hardware: All
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Daniel Berrangé
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-06-08 03:13 UTC by Nan Zhang
Modified: 2009-12-14 21:17 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-06-12 11:28:12 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Nan Zhang 2009-06-08 03:13:06 UTC
Description of problem:
Cannot terminate the process when hang with running dettach device on Xen.

Version-Release number of selected component (if applicable):
rhel-5.4(Xen)
libvirt 0.6.3-6.el5

How reproducible:
Always

Steps to Reproduce:
[root@dhcp-66-70-85 ~]# virsh nodedev-list --tree

[root@dhcp-66-70-85 ~]# virsh nodedev-dumpxml pci_10de_26c
<device>
  <name>pci_10de_26c</name>
  <parent>computer</parent>
  <capability type='pci'>
    <domain>0</domain>
    <bus>0</bus>
    <slot>16</slot>
    <function>1</function>
    <product id='0x026c'>MCP51 High Definition Audio</product>
    <vendor id='0x10de'>nVidia Corporation</vendor>
  </capability>
</device>


[root@dhcp-66-70-85 ~]# virsh nodedev-dettach pci_10de_26c

[root@dhcp-66-70-85 ~]# ps -ef | grep virsh
root      4170  4123  0 10:56 pts/1    00:00:00 virsh nodedev-dettach pci_10de_26c
root      4227  4209  0 10:56 pts/2    00:00:00 grep virsh
[root@dhcp-66-70-85 ~]# kill -9 4170
[root@dhcp-66-70-85 ~]# ps -ef | grep virsh
root      4170  4123  0 10:56 pts/1    00:00:00 virsh nodedev-dettach pci_10de_26c
root      4230  4209  0 10:56 pts/2    00:00:00 grep virsh
[root@dhcp-66-70-85 ~]#
  
Actual results:
Hang with the 'nodedev-dettach' command.

Expected results:
The node device can be dettached or give some errors.

Additional info:
No this issue on KVM.

Comment 1 Daniel Berrangé 2009-06-08 10:43:32 UTC
Can you install the matching 'libvirt-debuginfo' RPM, and then when virsh hangs, attach to it with GDB and obtain a stack trace, eg


  # gdb `pgrep virsh`
  (gdb) backtrace

Comment 2 Nan Zhang 2009-06-08 11:15:42 UTC
Can not obtain stack?

[root@dhcp-66-70-85 ~]# pgrep virsh
4437
[root@dhcp-66-70-85 ~]# gdb `pgrep virsh`
GNU gdb Fedora (6.8-35.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
4437: No such file or directory.
(gdb) backtrace
No stack.
(gdb)

Comment 3 Daniel Berrangé 2009-06-08 12:30:42 UTC
I really hate GDB sometimes, try adding in the command name too when launching it, then it should give a backtrace, eg launch it with


  gdb /usr/bin/virsh `pgrep virsh`

Comment 4 Nan Zhang 2009-06-08 13:13:34 UTC
Sometimes you have to use it, got the backtrace :)

[root@dhcp-66-70-85 ~]# gdb /usr/bin/virsh `pgrep virsh`
GNU gdb Fedora (6.8-35.el5)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Attaching to program: /usr/bin/virsh, process 4699
Reading symbols from /usr/lib64/libvirt.so.0...Reading symbols from /usr/lib/debug/usr/lib64/libvirt.so.0.6.3.debug...done.
done.
Loaded symbols for /usr/lib64/libvirt.so.0
Reading symbols from /usr/lib64/libxml2.so.2...done.
Loaded symbols for /usr/lib64/libxml2.so.2
Reading symbols from /usr/lib64/libz.so.1...done.
Loaded symbols for /usr/lib64/libz.so.1
Reading symbols from /lib64/libm.so.6...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libselinux.so.1...done.
Loaded symbols for /lib64/libselinux.so.1
Reading symbols from /usr/lib64/libnuma.so.1...done.
Loaded symbols for /usr/lib64/libnuma.so.1
Reading symbols from /usr/lib64/libgnutls.so.13...done.
Loaded symbols for /usr/lib64/libgnutls.so.13
Reading symbols from /usr/lib64/libsasl2.so.2...done.
Loaded symbols for /usr/lib64/libsasl2.so.2
Reading symbols from /usr/lib64/libxenstore.so.3.0...done.
Loaded symbols for /usr/lib64/libxenstore.so.3.0
Reading symbols from /usr/lib64/libreadline.so.5...done.
Loaded symbols for /usr/lib64/libreadline.so.5
Reading symbols from /usr/lib64/libncurses.so.5...done.
Loaded symbols for /usr/lib64/libncurses.so.5
Reading symbols from /lib64/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 0x2b109df3cf20 (LWP 4699)]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libc.so.6...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/libdl.so.2...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libsepol.so.1...done.
Loaded symbols for /lib64/libsepol.so.1
Reading symbols from /lib64/ld-linux-x86-64.so.2...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /usr/lib64/libgcrypt.so.11...done.
Loaded symbols for /usr/lib64/libgcrypt.so.11
Reading symbols from /usr/lib64/libgpg-error.so.0...done.
Loaded symbols for /usr/lib64/libgpg-error.so.0
Reading symbols from /lib64/libresolv.so.2...done.
Loaded symbols for /lib64/libresolv.so.2
Reading symbols from /lib64/libcrypt.so.1...done.
Loaded symbols for /lib64/libcrypt.so.1
0x000000324d8cad5f in poll () from /lib64/libc.so.6
(gdb)

Comment 5 Daniel Berrangé 2009-06-08 13:22:10 UTC
You seem to have missed off the actual stack trace in that last comment, only pasting the GDB startup-noise. Can you type 'backtrace' now you have a GDB prompt & provide the result here.

Comment 6 Nan Zhang 2009-06-08 13:27:38 UTC
Oh, sorry, missing the backtrace info.

(gdb) backtrace
#0  0x000000324d8cad5f in poll () from /lib64/libc.so.6
#1  0x0000003260e48007 in call (conn=<value optimized out>, priv=<value optimized out>, flags=<value optimized out>, 
    proc_nr=<value optimized out>, args_filter=<value optimized out>, args=<value optimized out>, ret_filter=Could not find the frame base for "call".
) at remote_internal.c:6407
#2  0x0000003260e48a58 in remoteNodeDeviceDettach (dev=<value optimized out>) at remote_internal.c:4931
#3  0x0000003260e243e8 in virNodeDeviceDettach (dev=<value optimized out>) at libvirt.c:7409
#4  0x000000000040ae85 in cmdNodeDeviceDettach (ctl=0x7fffa4f350a0, cmd=<value optimized out>) at virsh.c:4759
#5  0x0000000000410b40 in vshCommandRun (ctl=0x7fffa4f350a0, cmd=0x19bfa5a0) at virsh.c:6548
#6  0x0000000000411487 in main (argc=1, argv=0x7fffa4f35208) at virsh.c:7500
(gdb)

Comment 7 Bill Burns 2009-06-08 14:25:04 UTC
assigned for triage.

Comment 8 Daniel Berrangé 2009-06-08 14:40:05 UTC
Hmm, that stack trace does not appear to be calling into the Xen driver at all all, rather using the remote driver (aka QEMU).

Can you provide libvirt logging info

  # LIBVIRT_DEBUG=1 virsh nodedev-dettach pci_10de_26c


And details of the host capabilities when on the Xen machine

  # virsh capabilities

Comment 10 Nan Zhang 2009-06-09 02:22:36 UTC
Sorry, please ignore the Comment #9.

[root@dhcp-66-70-85 ~]# LIBVIRT_DEBUG=1 virsh nodedev-dettach pci_10de_26c
13:24:43.534: debug : virInitialize:290 : register drivers
13:24:43.534: debug : virRegisterDriver:667 : registering Test as driver 0
13:24:43.534: debug : virRegisterNetworkDriver:567 : registering Test as network driver 0
13:24:43.534: debug : virRegisterStorageDriver:598 : registering Test as storage driver 0
13:24:43.534: debug : virRegisterDeviceMonitor:629 : registering Test as device driver 0
13:24:43.534: debug : xenHypervisorInit:1922 : Using new hypervisor call: 30001

13:24:43.534: debug : xenHypervisorInit:1991 : Using hypervisor call v2, sys ver3 dom ver5

13:24:43.534: debug : virRegisterDriver:667 : registering Xen as driver 1
13:24:43.534: debug : virRegisterDriver:667 : registering remote as driver 2
13:24:43.534: debug : virRegisterNetworkDriver:567 : registering remote as network driver 1
13:24:43.534: debug : virRegisterStorageDriver:598 : registering remote as storage driver 1
13:24:43.534: debug : virRegisterDeviceMonitor:629 : registering remote as device driver 1
13:24:43.534: debug : virConnectOpenAuth:1100 : name=(null), auth=0x326109b360, flags=0
13:24:43.534: debug : do_open:922 : no name, allowing driver auto-select
13:24:43.534: debug : do_open:930 : trying driver 0 (Test) ...
13:24:43.534: debug : do_open:936 : driver 0 Test returned DECLINED
13:24:43.534: debug : do_open:930 : trying driver 1 (Xen) ...
13:24:43.534: debug : xenUnifiedOpen:295 : Trying hypervisor sub-driver
13:24:43.534: debug : xenUnifiedOpen:298 : Activated hypervisor sub-driver
13:24:43.534: debug : xenUnifiedOpen:306 : Trying XenD sub-driver
13:24:43.536: debug : xenUnifiedOpen:309 : Activated XenD sub-driver
13:24:43.536: debug : xenUnifiedOpen:315 : Trying XM sub-driver
13:24:43.536: debug : xenUnifiedOpen:318 : Activated XM sub-driver
13:24:43.536: debug : xenUnifiedOpen:322 : Trying XS sub-driver
13:24:43.537: debug : xenStoreOpen:346 : Failed to add event handle, disabling events

13:24:43.537: debug : xenUnifiedOpen:325 : Activated XS sub-driver
13:24:43.543: debug : xenUnifiedOpen:361 : Trying Xen inotify sub-driver
13:24:43.543: debug : xenInotifyOpen:439 : Adding a watch on /etc/xen
13:24:43.543: debug : xenInotifyOpen:451 : Building initial config cache
13:24:43.543: debug : xenXMConfigCacheAddFile:387 : Adding file /etc/xen/foo1
13:24:43.543: debug : xenXMConfigCacheAddFile:465 : Added config foo1 /etc/xen/foo1
13:24:43.543: debug : xenXMConfigCacheAddFile:387 : Adding file /etc/xen/demo
13:24:43.543: debug : xenXMConfigCacheAddFile:465 : Added config demo /etc/xen/demo
13:24:43.543: debug : xenInotifyOpen:458 : Registering with event loop
13:24:43.543: debug : xenInotifyOpen:462 : Failed to add inotify handle, disabling events
13:24:43.543: debug : virConnectRef:1165 : conn=0x183e8160 refs=3
13:24:43.543: debug : xenUnifiedOpen:364 : Activated Xen inotify sub-driver
13:24:43.543: debug : do_open:936 : driver 1 Xen returned SUCCESS
13:24:43.543: debug : do_open:956 : network driver 0 Test returned DECLINED
13:24:43.543: debug : doRemoteOpen:511 : proceeding with name = xen:///
13:24:43.543: debug : call:6543 : Doing call 66 (nil)
13:24:43.543: debug : call:6613 : We have the buck 66 0x2aaaaaaf0010 0x2aaaaaaf0010
13:24:43.544: debug : processCallRecvLen:6201 : Got length, now need 36 total (32 more)
13:24:43.544: debug : processCalls:6469 : Giving up the buck 66 0x2aaaaaaf0010 (nil)
13:24:43.544: debug : call:6644 : All done with our call 66 (nil) 0x2aaaaaaf0010
13:24:43.544: debug : call:6543 : Doing call 1 (nil)
13:24:43.544: debug : call:6613 : We have the buck 1 0x18412b60 0x18412b60
13:24:43.555: debug : processCallRecvLen:6201 : Got length, now need 28 total (24 more)
13:24:43.555: debug : processCalls:6469 : Giving up the buck 1 0x18412b60 (nil)
13:24:43.555: debug : call:6644 : All done with our call 1 (nil) 0x18412b60
13:24:43.555: debug : doRemoteOpen:822 : Adding Handler for remote events
13:24:43.555: debug : doRemoteOpen:829 : virEventAddHandle failed: No addHandleImpl defined. continuing without events.
13:24:43.555: debug : do_open:956 : network driver 1 remote returned SUCCESS
13:24:43.555: debug : do_open:978 : storage driver 0 Test returned DECLINED
13:24:43.555: debug : do_open:978 : storage driver 1 remote returned SUCCESS
13:24:43.555: debug : do_open:999 : node driver 0 Test returned DECLINED
13:24:43.555: debug : do_open:999 : node driver 1 remote returned SUCCESS
13:24:43.555: debug : virNodeDeviceLookupByName:7123 : conn=0x183e8160, name=0x183e5560
13:24:43.555: debug : call:6543 : Doing call 113 (nil)
13:24:43.556: debug : call:6613 : We have the buck 113 0x18412b60 0x18412b60
13:24:43.556: debug : processCallRecvLen:6201 : Got length, now need 44 total (40 more)
13:24:43.556: debug : processCalls:6469 : Giving up the buck 113 0x18412b60 (nil)
13:24:43.556: debug : call:6644 : All done with our call 113 (nil) 0x18412b60
13:24:43.556: debug : virNodeDeviceDettach:7398 : dev=0x18407960, conn=0x183e8160
13:24:43.556: debug : virNodeDeviceGetXMLDesc:7166 : dev=0x18407960, conn=0x183e8160
13:24:43.556: debug : call:6543 : Doing call 114 (nil)
13:24:43.556: debug : call:6613 : We have the buck 114 0x18412b60 0x18412b60
13:24:43.557: debug : processCallRecvLen:6201 : Got length, now need 352 total (348 more)
13:24:43.557: debug : processCalls:6469 : Giving up the buck 114 0x18412b60 (nil)
13:24:43.557: debug : call:6644 : All done with our call 114 (nil) 0x18412b60
13:24:43.557: debug : pciGetDevice:894 : 10de 026c 0000:00:10.1: initialized
13:24:43.557: debug : virRun:678 : /sbin/modprobe pciback


[root@dhcp-66-70-85 ~]# virsh capabilities 
<capabilities>

  <host>
    <cpu>
      <arch>x86_64</arch>
      <features>
        <svm/>
      </features>
    </cpu>
    <migration_features>
      <live/>
      <uri_transports>
        <uri_transport>xenmigr</uri_transport>
      </uri_transports>
    </migration_features>
    <topology>
      <cells num='1'>
        <cell id='0'>
          <cpus num='2'>
            <cpu id='0'/>
            <cpu id='1'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>

  <guest>
    <os_type>xen</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/lib64/xen/bin/qemu-dm</emulator>
      <machine>xenpv</machine>
      <domain type='xen'>
      </domain>
    </arch>
  </guest>

  <guest>
    <os_type>xen</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <emulator>/usr/lib64/xen/bin/qemu-dm</emulator>
      <machine>xenpv</machine>
      <domain type='xen'>
      </domain>
    </arch>
    <features>
      <pae/>
    </features>
  </guest>

  <guest>
    <os_type>hvm</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <emulator>/usr/lib64/xen/bin/qemu-dm</emulator>
      <loader>/usr/lib/xen/boot/hvmloader</loader>
      <machine>xenfv</machine>
      <domain type='xen'>
      </domain>
    </arch>
    <features>
      <pae/>
      <nonpae/>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='yes'/>
    </features>
  </guest>

  <guest>
    <os_type>hvm</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/lib64/xen/bin/qemu-dm</emulator>
      <loader>/usr/lib/xen/boot/hvmloader</loader>
      <machine>xenfv</machine>
      <domain type='xen'>
      </domain>
    </arch>
    <features>
      <acpi default='on' toggle='yes'/>
      <apic default='on' toggle='yes'/>
    </features>
  </guest>

</capabilities>

Comment 11 Daniel Berrangé 2009-06-09 09:27:44 UTC
Wow, this is bizarre, the last thing printed is

13:24:43.557: debug : virRun:678 : /sbin/modprobe pciback

Looking at the code the only way I see this hanging, is if modprobe never exits, which doesn't make much sense either. Can you run 'ps -auxwwf' and see if there are any 'modprobe' processes still running ?

Comment 12 Nan Zhang 2009-06-09 11:19:49 UTC
It seems no 'modprobe' process was running.

[root@dhcp-66-70-85 libvirt]# ps auxwwf | grep modprobe
root      4896  0.0  0.0  61188   724 pts/2    S+   21:58   0:00          \_ grep modprobe

Comment 13 Daniel Berrangé 2009-06-12 10:52:15 UTC
I'm pretty much out of debugging ideas at this point and I can't reproduce the problem myself.

Comment 14 Nan Zhang 2009-06-12 11:16:26 UTC
I retest this issue with the latest package 0.6.3-8 on rhel-5.4(2.6.18-153.el5xen), and also can not reproduce on my side. This could be closed, thanks!

[root@dhcp-66-70-85 ~]# virsh nodedev-dettach pci_10de_26c
Device pci_10de_26c dettached
22:07:16.720: debug : virNodeDeviceFree:7331 : dev=0x1332c640, conn=0x1330c160
22:07:16.720: debug : virUnrefNodeDevice:871 : unref dev 0x1332c640 pci_10de_26c 1
22:07:16.720: debug : virReleaseNodeDevice:834 : release dev 0x1332c640 pci_10de_26c
22:07:16.720: debug : virReleaseNodeDevice:845 : unref connection 0x1330c160 5

22:07:16.720: debug : virConnectClose:1118 : conn=0x1330c160
22:07:16.720: debug : call:6545 : Doing call 2 (nil)
22:07:16.720: debug : call:6615 : We have the buck 2 0x133380b0 0x133380b0
22:07:16.721: debug : processCallRecvLen:6203 : Got length, now need 28 total (24 more)
22:07:16.721: debug : processCalls:6471 : Giving up the buck 2 0x133380b0 (nil)
22:07:16.721: debug : call:6646 : All done with our call 2 (nil) 0x133380b0
22:07:16.722: debug : virUnrefConnect:210 : unref connection 0x1330c160 4
22:07:16.722: debug : virUnrefConnect:210 : unref connection 0x1330c160 3
22:07:16.723: debug : virUnrefConnect:210 : unref connection 0x1330c160 2
22:07:16.723: debug : virUnrefConnect:210 : unref connection 0x1330c160 1
22:07:16.723: debug : virReleaseConnect:171 : release connection 0x1330c160


Note You need to log in before you can comment on or make changes to this bug.