Bug 827544

Summary: can't add new netdevs after most recently added netdev is detached
Product: Red Hat Enterprise Linux 6
Reporter: Laine Stump <laine>
Component: libvirt
Assignee: Laine Stump <laine>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 6.3
CC: acathrow, dallan, dyasny, dyuan, lersek, mpavlik, mzhan, rwu, whuang, ydu
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 846869
Environment:
Last Closed: 2012-07-25 17:47:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 846869    
Attachments:
Description: trace of json monitor commands sent to qemu by libvirt
Flags: none

Description Laine Stump 2012-06-01 18:07:13 UTC
Created attachment 588548
trace of json monitor commands sent to qemu by libvirt

Description of problem: If the most recently added network device (or the last <interface> listed in the domain config when none have been hotplugged) is detached from a running domain, all further attempts to attach a new network device will fail.


Version-Release number of selected component (if applicable):

  libvirt-0.9.10-21
  qemu-kvm-0.12.1.2-2.295

How reproducible: 100%


Steps to Reproduce:
1. start a guest, any guest
2. virsh attach-interface $guest network no-ip --model virtio \
         --mac 52:54:00:12:34:56
Interface attached successfully

3. virsh detach-interface $guest network --mac 52:54:00:12:34:56
Interface detached successfully
  
4.  virsh attach-interface $guest network no-ip --model virtio \
         --mac 52:54:00:12:34:56

Actual results:

error: Failed to attach interface
error: internal error unable to execute QEMU command 'device_add': Duplicate ID 'net1' for device

Expected results:

Interface attached successfully

Additional info:

It appears that qemu doesn't reuse device IDs (which libvirt calls "aliases"). According to the attached trace, libvirt is detaching the old device with id "net1", then attempting to add a new device with id "net1". Either qemu needs to recycle device IDs, or libvirt needs to never reissue the same ID within a particular domain. The latter would require storing the current "highest alias number" for each type of device in the active XML, so that it would survive restarts of libvirtd.
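The two allocation policies contrasted above can be sketched as a toy model. This is not libvirt's code: the function and class names are hypothetical, and the "highest in-use index plus one" policy is only an assumption inferred from the symptom (the ID is reissued when the most recently added device is detached).

```python
"""Toy model of two alias-allocation policies (hypothetical, not libvirt code)."""
import re


def next_alias_after_highest(in_use, prefix="net"):
    """Assumed current policy: highest index still in use, plus one."""
    pattern = re.compile(rf"{re.escape(prefix)}(\d+)")
    indexes = [int(m.group(1)) for m in map(pattern.fullmatch, in_use) if m]
    return f"{prefix}{max(indexes, default=-1) + 1}"


class MonotonicAliases:
    """Proposed policy: never reissue an alias within a domain.

    `highest` is the value that would have to be persisted (for example
    in the active XML) so that it survives a libvirtd restart.
    """

    def __init__(self, prefix="net", highest=-1):
        self.prefix = prefix
        self.highest = highest

    def allocate(self):
        self.highest += 1
        return f"{self.prefix}{self.highest}"


# Domain has net0 and net1; the most recently added device (net1) is detached.
in_use = {"net0", "net1"}
in_use.discard("net1")
reissued = next_alias_after_highest(in_use)   # net1 is handed out again

# With a persisted counter, net1 is never reissued.
fresh = MonotonicAliases(highest=1).allocate()
print(reissued, fresh)
```

Under the first policy, detaching a device in the middle of the range would not cause a reissue, which matches the bug summary mentioning only the most recently added netdev.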

Comment 1 Laszlo Ersek 2012-06-09 08:19:35 UTC
According to the trace, the netdev_del/netdev_add pair works just fine; indeed do_netdev_del() in the qemu source removes the id:

  int do_netdev_del(Monitor *mon, const QDict *qdict, QObject **ret_data)
  {
      const char *id = qdict_get_str(qdict, "id");
      VLANClientState *vc;

      vc = qemu_find_netdev(id);
      if (!vc) {
          qerror_report(QERR_DEVICE_NOT_FOUND, id);
          return -1;
      }
      qemu_del_vlan_client(vc);
      qemu_opts_del(qemu_opts_find(&qemu_netdev_opts, id)); /* HERE */
      return 0;
  }

do_device_add/do_device_del seem to work differently (device_add is the one failing in the trace).

do_device_del()
  qdev_unplug()
    dev->info->unplug(dev)

I didn't try to track the function pointer, but I guess it doesn't clean up the qemu options. However, do_device_add() checks for uniqueness:

do_device_add()
  qemu_opts_from_qdict()
    qemu_opts_create()
      qemu_opts_find()
      qerror_report(QERR_DUPLICATE_ID, ...) -- reports the cited error

Upstream qemu lacks do_device_del() completely; only an unused prototype exists in hw/qdev.h. Instead it has qmp_device_del() (and hmp_device_del(), which calls it), but the underlying qdev_unplug() doesn't seem to be very different. This could be a problem in upstream qemu as well (i.e. device_del not pruning the qemu_find_opts("device") QemuOptsList, that is, "qemu_device_opts").
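The asymmetry suspected in this comment can be illustrated with a small Python model. It is a sketch of the hypothesis only, not qemu code: `OptsList` merely stands in for a QemuOptsList, the id "hostnet1" is illustrative, and the no-op `device_del` encodes the guess that the device option entry is not pruned.

```python
"""Toy model of the ID bookkeeping hypothesised in this comment (not qemu code)."""


class DuplicateId(Exception):
    pass


class OptsList:
    """Stands in for a QemuOptsList: a collection of IDs that must be unique."""

    def __init__(self):
        self.ids = set()

    def create(self, dev_id):
        if dev_id in self.ids:
            raise DuplicateId(f"Duplicate ID '{dev_id}' for device")
        self.ids.add(dev_id)

    def delete(self, dev_id):
        self.ids.discard(dev_id)


netdev_opts = OptsList()
device_opts = OptsList()


def netdev_add(dev_id):
    netdev_opts.create(dev_id)


def netdev_del(dev_id):
    netdev_opts.delete(dev_id)  # mirrors the qemu_opts_del() call marked HERE


def device_add(dev_id):
    device_opts.create(dev_id)


def device_del(dev_id):
    # Hypothesis being modelled: unplug is requested, the entry is not pruned.
    pass


# The netdev pair can be repeated without complaint.
netdev_add("hostnet1")
netdev_del("hostnet1")
netdev_add("hostnet1")

# The device pair fails on the second add.
device_add("net1")
device_del("net1")
try:
    device_add("net1")
    error = None
except DuplicateId as exc:
    error = str(exc)
print(error)
```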

Comment 2 Laine Stump 2012-06-09 08:28:52 UTC
The problem is narrower than I initially thought. The guest where I saw the problem was RHEL5. I tried the same operation on RHEL6 and WinXP guests and they had no problem re-using the id of a previously detached device.

I also noticed that this guest doesn't show any signs that the device has been added even the first time - nothing in dmesg, no new device showing up in ifconfig -a.

I had thought that adding and removing devices was happening at a low enough level that it didn't matter what OS was running on the guest, or whether or not the guest acknowledged it. Is this not the case? Beyond that, is there some known issue with RHEL5 and hotplugging of network devices, or is my guest somehow broken?

Comment 3 Laine Stump 2012-07-25 17:47:44 UTC
From discussion with the qemu people, I've learned that qemu *does* recycle device alias names, but not until it receives confirmation from the guest that it really has released the hardware. I've also learned that PCI hotplug support is more or less non-existent in RHEL5. So my choice of test OS was unfortunate: qemu is waiting for the guest to release the hardware, but the guest doesn't even know that it had the device to begin with (and wouldn't know how to release it if it did).

On the other hand, when I used a RHEL6 guest, the device alias *is* properly recycled and can be re-used.
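The difference between the two guests can be sketched as follows. This is a toy model under the assumption that the ID stays reserved until the guest acknowledges the unplug; `ToyQemu`, `run_guest` and `reattach_works` are hypothetical names and do not correspond to qemu internals.

```python
"""Toy model: a device ID is freed only after the guest releases the device."""


class ToyQemu:
    def __init__(self, guest_supports_hotplug):
        self.guest_supports_hotplug = guest_supports_hotplug
        self.ids = set()        # IDs currently reserved
        self.pending = set()    # unplug requested, guest has not released yet

    def device_add(self, dev_id):
        if dev_id in self.ids:
            raise ValueError(f"Duplicate ID '{dev_id}' for device")
        self.ids.add(dev_id)

    def device_del(self, dev_id):
        """Request unplug; the ID stays reserved until the guest releases it."""
        self.pending.add(dev_id)

    def run_guest(self):
        """Let the guest process pending unplug requests, if it is able to."""
        if self.guest_supports_hotplug:
            self.ids -= self.pending
            self.pending.clear()


def reattach_works(guest_supports_hotplug):
    """Attach net1, detach it, let the guest run, then try to attach net1 again."""
    qemu = ToyQemu(guest_supports_hotplug)
    qemu.device_add("net1")
    qemu.device_del("net1")
    qemu.run_guest()
    try:
        qemu.device_add("net1")
    except ValueError:
        return False
    return True


print(reattach_works(True), reattach_works(False))
```

In this model a guest without hotplug support never releases the device, so the ID remains reserved indefinitely, which is the behaviour seen with the RHEL5 guest.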

Since the problem only manifests itself as an inability to hotplug any more devices, and only when using a guest that doesn't support hotplug anyway, this is (mostly) a non-issue.

The one remaining issue is an apparent race: libvirt makes the alias available for re-use immediately, while qemu won't recycle it until the guest has notified qemu that it's completely finished. Apparently that notification happens asynchronously to the detach command, and there is no way for libvirt to be notified of that event. Until that theoretical race is actually witnessed on a guest that properly supports hotplug, I think we can close this as NOTABUG.

Comment 4 Dave Allan 2012-07-25 18:22:39 UTC
(In reply to comment #3)
> The one issue is that there appears to be a race, since libvirt makes the
> alias available for re-use immediately, while qemu won't recycle it until
> the guest has notified qemu that it's completely finished, and apparently
> that happens asynchronous to the detach command, and there is no way for
> libvirt to be notified of that event. Until that theoretical race is
> actually witnessed on a guest that properly supports hotplug, I think we can
> close this as NOTABUG.

Can you open a BZ against qemu for an event in this case?  There are other hotplug cases which behave similarly, and qemu will be providing an event so that libvirt can tell when the operation's complete.
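If qemu did provide such an event, the management side could hold a detached alias back until the event arrives instead of freeing it immediately. A minimal sketch, assuming a hypothetical "removal complete" notification; `AliasTracker` and its methods are illustrative names, not libvirt or qemu API.

```python
"""Toy model: keep a detached alias unavailable until removal is confirmed."""


class AliasTracker:
    def __init__(self):
        self.in_use = set()
        self.pending_removal = set()

    def is_available(self, alias):
        return alias not in self.in_use and alias not in self.pending_removal

    def attach(self, alias):
        if not self.is_available(alias):
            raise RuntimeError(f"alias {alias} is not available yet")
        self.in_use.add(alias)

    def detach_requested(self, alias):
        """The detach command returned; the guest may not be finished yet."""
        self.in_use.remove(alias)
        self.pending_removal.add(alias)

    def on_removal_event(self, alias):
        """Hypothetical event handler: the device is really gone now."""
        self.pending_removal.discard(alias)


tracker = AliasTracker()
tracker.attach("net1")
tracker.detach_requested("net1")
available_before_event = tracker.is_available("net1")

tracker.on_removal_event("net1")
available_after_event = tracker.is_available("net1")
print(available_before_event, available_after_event)
```

The design choice here is to treat "detach requested" and "device removed" as separate states, which closes the window in which an alias could be reissued while the old ID is still reserved.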

Comment 5 Laine Stump 2012-08-05 23:34:52 UTC
*** Bug 844622 has been marked as a duplicate of this bug. ***