Bug 687993

Summary: libvirt leaks a pci configfd file into qemu on pci hostdev hotplug failure
Product: Red Hat Enterprise Linux 6 Reporter: Eric Blake <eblake>
Component: libvirtAssignee: Eric Blake <eblake>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.1CC: ajia, dallan, eblake, jdenemar, mzhan, syeghiay, yoyzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-0.8.7-14.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-05-19 13:29:09 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Eric Blake 2011-03-15 23:59:57 UTC
Description of problem:
In qemu_hotplug.c:qemuDomainAttachHostPciDevice, it is possible for libvirt to open a PCI device configfd file and pass that to qemu with SCM_RIGHTS in qemuMonitorSendFileHandle.  Normally, this fd is then used by qemu with the subsequent qemuMonitorAddDevice, and qemu closes the fd when the hostdev is later detached from the domain.  Therefore, libvirt is able to do managed hostdev, and guarantee that the host will not be using the device for as long as qemu owns the configfd.

However, if something goes wrong during the attach, then the fd remains open in qemu indefinitely.  At least any subsequent getfd monitor command with the same alias will overwrite the fd, so there is no risk of fd exhaustion in qemu by repeating a loop that triggers device attach failure.  However, because libvirt returns the managed PCI device back to the host on failure to hotplug, it means that the host can now be actively using the PCI device while qemu still owns the configfd, and that a bug in qemu could result in resetting a PCI device in use by the host.

Version-Release number of selected component (if applicable):
libvirt-0.8.7-12.el6

How reproducible:
not sure how easy it is to trigger hotplug failure in real life (perhaps by specifying XML with an impossible target PCI address?) - I triggered it under gdb.

Steps to Reproduce:
1. attach gdb to libvirtd, and put a breakpoint in qemu_hotplug.c:qemuDomainAttachHostPciDevice
2. using virsh or virt-manager, try to hot-plug a PCI hostdev device
3. in gdb, after the configfd is opened and passed to qemu, but before the device is hot-plugged, fake a failure.  I did this by stepping past the line of devstr = qemuBuildPCIHostdevDevStr, then issuing 'p devstr=""', then continuing
4. ls -l /proc/<pid-of-qemu>/fd shows that the fd file is still opened rw by qemu
  
Actual results:
step 3 reported hotplug failure
step 4 included this line
lrwx------. 1 root root 64 Mar 15 17:17 20 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:0a:0a.0/config


Expected results:
step 3 should report hotplug failure, but also issue the qemu closefd monitor command to clean up the config fd
step 4 should not have any config files for the aborted hostdev hotplug open

Additional info:

Comment 1 Eric Blake 2011-03-16 00:00:53 UTC
Fixed by this upstream patch:
https://www.redhat.com/archives/libvir-list/2011-March/msg00714.html

Comment 5 Alex Jia 2011-03-23 08:31:51 UTC
I can reproduce the bug on rhel6 with libvirt-0.8.7-12.el6.x86_64:

The following gdb debugging information:
......
(gdb) s
qemuDomainAttachHostPciDevice (driver=0x710ba0, vm=0x7695b0, hostdev=0x7fffdc010280, qemuCmdFlags=10214162292588674414)
    at qemu/qemu_hotplug.c:868
868             if (!virDomainObjIsActive(vm)) {
(gdb) s
874             if (!(devstr = qemuBuildPCIHostdevDevStr(hostdev, configfd_name,
(gdb) p devstr
$24 = 0x0
(gdb) p devstr=""
$25 = 0x7fffdc014520 ""
(gdb) s
qemuBuildPCIHostdevDevStr (dev=0x7fffdc010280, configfd=0x7fffdc00b7a0 "fd-hostdev0", qemuCmdFlags=10214162292588674414)
    at qemu/qemu_command.c:1960
......

And open another terminal to check whether config file exists:
# ps -ef|grep qemu
 qemu      7546     1  2 14:49 ?        00:00:56 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name vr-rhel6-x86_64-kvm -uuid fbe7bfbe-3e92-2e86-c678-9524e45eed8b -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vr-rhel6-x86_64-kvm.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot c -drive file=/var/lib/libvirt/images/vr-rhel6-x86_64-kvm,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=23,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:4f:83:c7,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device pci-assign,host=00:19.0,id=hostdev0,configfd=24,bus=pci.0,addr=0x6 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
 root     10994  6055  0 15:33 pts/3    00:00:00 grep qemu

# ls -l /proc/7546/fd | grep config
lrwx------. 1 qemu qemu 64 Mar 23 14:51 15 -> /sys/devices/pci0000:00/0000:00:19.0/config


The following testing bases on libvirt-0.8.7-14.el6.x86_64:
(2.6.32-120.el6.x86_64, qemu-kvm-0.12.1.2-2.152.el6.x86_64)

......
(gdb) s
850             if (!(devstr = qemuBuildPCIHostdevDevStr(hostdev, configfd_name,
(gdb) p devstr=""
$12 = 0x7fffd409f340 ""
(gdb) s
qemuBuildPCIHostdevDevStr (dev=0x7fffd40a0400, configfd=0x7fffd409f590 "fd-hostdev0", qemuCmdFlags=10214162292588674414)
    at qemu/qemu_command.c:1960
......
(gdb) finish
Run till exit from #0  virFree (ptrptr=0x7fffe97fa948) at util/memory.c:308
qemuDomainAttachHostPciDevice (driver=0x757da0, vm=0x766e00, hostdev=0x7fffd40a0400, qemuCmdFlags=10214162288428974080)
    at qemu/qemu_hotplug.c:878
878         VIR_FORCE_CLOSE(configfd);
(gdb) s
virClose (fdptr=0x7fffe97fa95c, preserve_errno=true) at util/files.c:32
32      {
(gdb) finish
Run till exit from #0  virClose (fdptr=0x7fffe97fa95c, preserve_errno=true) at util/files.c:32
0x00000000004661ba in qemuDomainAttachHostPciDevice (driver=0x757da0, vm=0x766e00, hostdev=0x7fffd40a0400,
    qemuCmdFlags=10214162288428974080) at qemu/qemu_hotplug.c:878
878         VIR_FORCE_CLOSE(configfd);
Value returned is $23 = 0
......

And open another terminal to check whether config file exists:
# ps -ef|grep qemu
qemu     15350     1  4 15:53 ?        00:00:26 /usr/libexec/qemu-kvm -S -M rhel6.1.0 -enable-kvm -m 512 -smp 1,sockets=1,cores=1,threads=1 -name vr-rhel6-x86_64-kvm -uuid fbe7bfbe-3e92-2e86-c678-9524e45eed8b -nodefconfig -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/vr-rhel6-x86_64-kvm.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -boot c -drive file=/var/lib/libvirt/images/vr-rhel6-x86_64-kvm,if=none,id=drive-ide0-0-0,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=22,id=hostnet0 -device e1000,netdev=hostnet0,id=net0,mac=52:54:00:4f:83:c7,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus -device AC97,id=sound0,bus=pci.0,addr=0x4 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
root     16261  6055  0 16:03 pts/3    00:00:00 grep qemu
You have new mail in /var/spool/mail/root
# ls -l /proc/15350/fd | grep config
# ls -l /proc/15350/fd
total 0
lrwx------. 1 qemu qemu 64 Mar 23 16:03 0 -> /dev/null
l-wx------. 1 qemu qemu 64 Mar 23 16:03 1 -> /var/log/libvirt/qemu/vr-rhel6-x86_64-kvm.log
lrwx------. 1 qemu qemu 64 Mar 23 16:03 10 -> anon_inode:[signalfd]
lrwx------. 1 qemu qemu 64 Mar 23 16:03 11 -> anon_inode:kvm-vcpu
lrwx------. 1 qemu qemu 64 Mar 23 16:03 12 -> socket:[82073]
lrwx------. 1 qemu qemu 64 Mar 23 16:03 13 -> socket:[82071]
lrwx------. 1 qemu qemu 64 Mar 23 16:03 14 -> anon_inode:[eventfd]
lrwx------. 1 qemu qemu 64 Mar 23 16:03 15 -> anon_inode:[eventfd]
lrwx------. 1 qemu qemu 64 Mar 23 16:03 16 -> anon_inode:[signalfd]
lrwx------. 1 qemu qemu 64 Mar 23 16:03 17 -> socket:[83536]
lrwx------. 1 qemu qemu 64 Mar 23 16:03 18 -> socket:[85334]
l-wx------. 1 qemu qemu 64 Mar 23 16:03 2 -> /var/log/libvirt/qemu/vr-rhel6-x86_64-kvm.log
lrwx------. 1 qemu qemu 64 Mar 23 16:03 22 -> /dev/net/tun
lrwx------. 1 qemu qemu 64 Mar 23 16:03 3 -> socket:[82055]
lrwx------. 1 qemu qemu 64 Mar 23 16:03 4 -> /dev/ptmx
lrwx------. 1 qemu qemu 64 Mar 23 16:03 5 -> /dev/kvm
lrwx------. 1 qemu qemu 64 Mar 23 16:03 6 -> anon_inode:kvm-vm
lr-x------. 1 qemu qemu 64 Mar 23 16:03 7 -> pipe:[82066]
l-wx------. 1 qemu qemu 64 Mar 23 16:03 8 -> pipe:[82066]
lrwx------. 1 qemu qemu 64 Mar 23 16:03 9 -> /var/lib/libvirt/images/vr-rhel6-x86_64-kvm

config file has been cleaned up.


However, I can't see any hotplug failure information, in the above 2 testing, I just catch the following error information:
VIR_WARN0("Unable to restore host device labelling on hotplug fail")

Comment 8 Alex Jia 2011-04-01 03:42:27 UTC
Hi Dave and Eric,
According to Eric advice, the test result should meet the following conditions:
1. should report hotplug failure
2. should issue the qemu closefd monitor command to clean up the config fd
3. should clean up config fd

For 1, I can catch 'VIR_WARN0("Unable to restore host device labelling on
hotplug fail")' by gdb, Is it a expected hotplug failure? and although I input 
'p devstr=" "' in gdb, finally, the NICs device still can be successfully
assigned to guest, it seems 'p devstr=" "' is invalid:
......
(gdb) s
qemuMonitorAddDeviceWithFd (mon=0x869960, devicestr=0x7fffdc001470
"pci-assign,host=00:19.0,id=hostdev0,configfd=fd-hostdev0,bus=pci.0,addr=0x6",
fd=26, fdname=0x7fffdc002530 "fd-hostdev0")
    at qemu/qemu_monitor.c:1925
......
(gdb) s
qemuMonitorJSONMakeCommand (cmdname=0x4bc02c "device_add") at
qemu/qemu_monitor_json.c:367
......

I think the correct result should be libvirt raises a hotplug failure
information, and don't assign the device to guest.

For 2, I guess the programming should call qemuMonitorCloseFileHandle function,
but I haven't found it in qemuDomainAttachHostPciDevice function, is it a
expected result?

For 3, It's fine for me.


Alex

Comment 9 Eric Blake 2011-04-21 22:49:44 UTC
(In reply to comment #5)
> I can reproduce the bug on rhel6 with libvirt-0.8.7-12.el6.x86_64:
> 
> The following gdb debugging information:
> ......
> (gdb) s
> qemuDomainAttachHostPciDevice (driver=0x710ba0, vm=0x7695b0,
> hostdev=0x7fffdc010280, qemuCmdFlags=10214162292588674414)
>     at qemu/qemu_hotplug.c:868
> 868             if (!virDomainObjIsActive(vm)) {
> (gdb) s
> 874             if (!(devstr = qemuBuildPCIHostdevDevStr(hostdev,
> configfd_name,
> (gdb) p devstr
> $24 = 0x0
> (gdb) p devstr=""
> $25 = 0x7fffdc014520 ""

Remember, you're debugging a binary compiled at -O2.  You can't set devstr to "" until after qemuBuildPCIHostdevDevStr has completed, or you will just lose your debug modification.

When I stepped through, I saw:

850	        if (!(devstr = qemuBuildPCIHostdevDevStr(hostdev, configfd_name,
(gdb) 
854	        qemuDomainObjEnterMonitorWithDriver(driver, vm);
(gdb) p devstr
$1 = 0x7f97c800b970 "pci-assign,host=0a:0a.0,id=hostdev0,configfd=fd-hostdev0,bus=pci.0,addr=0x7"
(gdb) p devstr=""
$2 = 0x7f97c8001e10 ""
(gdb) n
855	        ret = qemuMonitorAddDeviceWithFd(priv->mon, devstr,
(gdb) 
857	        qemuDomainObjExitMonitorWithDriver(driver, vm);
(gdb) p ret
$4 = -1

> # ls -l /proc/7546/fd | grep config
> lrwx------. 1 qemu qemu 64 Mar 23 14:51 15 ->
> /sys/devices/pci0000:00/0000:00:19.0/config

Your test still sees a config file because you didn't kill the devstr in the right place; it got overwritten and you ended up still passing a valid monitor command to qemu.

Comment 10 Alex Jia 2011-04-22 06:39:07 UTC
Hi Eric,
Thanks for your remember and very nice comment. now I can get above debugging information by gdb on rhel6.1 snapshot4(2.6.32-131.0.1.el6.x86_64) with libvirt-0.8.7-18.el6.x86_64, so change the bug status to VERIFIED.


Additional information:
......
(gdb) s
qemuMonitorJSONAddDevice (mon=0x23578e0, devicestr=0x7fa55002d380 "") at qemu/qemu_monitor_json.c:2360
2360	{
(gdb) fin
Run till exit from #0  qemuMonitorJSONAddDevice (mon=0x23578e0, devicestr=0x7fa55002d380 "") at qemu/qemu_monitor_json.c:2360
13:14:18.003: 30640: error : qemuMonitorJSONCheckError:317 : internal error unable to execute QEMU command 'device_add': Parameter 'driver' expects a driver name
0x000000000046affb in qemuMonitorAddDeviceWithFd (mon=0x23578e0, devicestr=0x7fa55002d380 "", fd=26, fdname=0x7fa55015f720 "fd-hostdev0") at qemu/qemu_monitor.c:1941
1941	        ret = qemuMonitorJSONAddDevice(mon, devicestr);
Value returned is $30 = -1
......
(gdb) fin
Run till exit from #0  qemuMonitorCloseFileHandle (mon=0x23578e0, fdname=0x7fa55015f720 "fd-hostdev0") at qemu/qemu_monitor.c:1703
0x000000000046b012 in qemuMonitorAddDeviceWithFd (mon=0x23578e0, devicestr=0x7fa55002d380 "", fd=26, fdname=0x7fa55015f720 "fd-hostdev0") at qemu/qemu_monitor.c:1946
1946	        if (qemuMonitorCloseFileHandle(mon, fdname) < 0)
Value returned is $32 = 0
......
(gdb) s
qemuDomainAttachHostPciDevice (driver=0x21f2090, vm=0x220d800, hostdev=0x7fa5501dec50, qemuCmdFlags=10214162292588674414) at qemu/qemu_hotplug.c:857
857	        qemuDomainObjExitMonitorWithDriver(driver, vm);
......
(gdb) s
857	        qemuDomainObjExitMonitorWithDriver(driver, vm);
(gdb) p ret
$35 = -1



# ls -l /proc/$(pidof qemu-kvm)/fd | grep config
# 

In here, config file indeed doesn't exist, and the PCI device hasn't been 
assign the guest.


Another question, whether libvirt need to automatically reattach the PCI device to host, I saw the PCI device is still put into pci-stub driver:

# readlink -f /sys/bus/pci/devices/0000\:00\:19.0/driver/
/sys/bus/pci/drivers/pci-stub

So, Users can't directly use the PCI device, they need to manually return the PCI device to host:

# virsh nodedev-reattach pci_0000_00_19_0
Device pci_0000_00_19_0 re-attached

# readlink -f /sys/bus/pci/devices/0000\:00\:19.0/driver/
/sys/bus/pci/drivers/e1000e


Alex

Comment 13 errata-xmlrpc 2011-05-19 13:29:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0596.html