Created attachment 1577540 [details] ENGINE and VDSM logs Description of problem: ------------------------ Given I have a VM which is 'down' with a plugged virtIO interface bridged to the cluster management network When I start the VM via REST and perform a hot-unplug of the interface Then the unplug fails when VDSM throws the following Exception: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 124, in method ret = func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/API.py", line 417, in hotunplugNic return self.vm.hotunplugNic(params) File "<string>", line 2, in hotunplugNic File "/usr/lib/python2.7/site-packages/vdsm/common/api.py", line 101, in method return func(*args, **kwargs) File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 3325, in hotunplugNic nicParams = {'macAddr': nic.macAddr} AttributeError: macAddr Version-Release number of selected component (if applicable): ------------------------------------------------------------- RHV 4.3.4.2-0.1.el7 vdsm-4.30.17-1.el7ev.x86_64 How reproducible: ----------------- Only in automation testing Steps to Reproduce (in automation test): ----------------------------------------- 1. Provision a VM with a virtIO interface 2. Start the VM 3. HotUnplug the interface Actual results: ---------------- HotUnplug fails Expected results: ------------------ HotUnplug should succeed Additional info: ---------------- I believe there are problems here with Engine and VDSM both. On Engine side I observed in the logs that the XML sent to VDSM is the following: <?xml version="1.0" encoding="UTF-8"?><hotunplug> <devices> <interface> <alias name=""/> </interface> </devices> </hotunplug> Clearly there is something wrong here since the alias name is empty. On VDSM side looking at lib/vdsm/virt/vm.py 3005:9 it seems that VDSM is looking to receive the 'nic' object from the XML sent by engine, and does not defend itself (output a more concise error) in case there is no value as 'macAddr' (as is happening here) or possibly the 'nic' object is empty
I see the unplug request is <devices> <interface> <alias name=""/> </interface> </devices> </hotunplug> which is certainly wrong. Reassigning to network team
Adding some more details: ------------------------- In the test the exact flow is: ============================== 1. Create a MAC pool with range: 00:00:00:10:10:10 - 00:00:00:10:10:11 (exactly 2 MAC addresses). 2. Update host cluster to use the newly provisioned MAC pool. 3. Add an interface to the guest with ovirtmgmt network and MAC address 00:00:00:10:10:10 4. Create a snapshot of the guest and wait for it to be in status 'OK'. 5. Power up the guest and wait for it to be in status 'up' Now we send a request to engine to unplug the guest's interface as such: PUT /ovirt-engine/api/vms/<ID>/nics/<ID> <nic> <mac> <address>00:00:00:10:10:13</address> # intentional out-of-scope MAC </mac> <plugged>false</plugged> </nic> And that's when we come across the error: (and all of the above in the bug description) Status: 400 Reason: Bad Request Detail: [General Exception]
(In reply to msheena from comment #2) > Adding some more details: > ------------------------- > > In the test the exact flow is: > ============================== > 1. Create a MAC pool with range: 00:00:00:10:10:10 - 00:00:00:10:10:11 > (exactly 2 MAC addresses). > 2. Update host cluster to use the newly provisioned MAC pool. > 3. Add an interface to the guest with ovirtmgmt network and MAC address > 00:00:00:10:10:10 > 4. Create a snapshot of the guest and wait for it to be in status 'OK'. > 5. Power up the guest and wait for it to be in status 'up' > > Now we send a request to engine to unplug the guest's interface as such: > PUT /ovirt-engine/api/vms/<ID>/nics/<ID> > <nic> > <mac> > <address>00:00:00:10:10:13</address> # intentional out-of-scope MAC > </mac> > <plugged>false</plugged> > </nic> > > And that's when we come across the error: (and all of the above in the bug > description) > Status: 400 > Reason: Bad Request > Detail: [General Exception] Does this flow triggers the VDSM problem "AttributeError: macAddr", or is this flow handled inside Engine?
(In reply to Dominik Holler from comment #3) > (In reply to msheena from comment #2) > > Adding some more details: > > ------------------------- > > > > In the test the exact flow is: > > ============================== > > 1. Create a MAC pool with range: 00:00:00:10:10:10 - 00:00:00:10:10:11 > > (exactly 2 MAC addresses). > > 2. Update host cluster to use the newly provisioned MAC pool. > > 3. Add an interface to the guest with ovirtmgmt network and MAC address > > 00:00:00:10:10:10 > > 4. Create a snapshot of the guest and wait for it to be in status 'OK'. > > 5. Power up the guest and wait for it to be in status 'up' > > > > Now we send a request to engine to unplug the guest's interface as such: > > PUT /ovirt-engine/api/vms/<ID>/nics/<ID> > > <nic> > > <mac> > > <address>00:00:00:10:10:13</address> # intentional out-of-scope MAC > > </mac> > > <plugged>false</plugged> > > </nic> > > > > And that's when we come across the error: (and all of the above in the bug > > description) > > Status: 400 > > Reason: Bad Request > > Detail: [General Exception] > > Does this flow triggers the VDSM problem "AttributeError: macAddr", or is > this flow handled inside Engine? This is the flow of the test that in my opinion causes engine to send a missing xml. About VDSM this is due to what I wrote in the description where VDSM code simply does not handle empty objects, and throws this error instead of throwing something in the like of: "Hotunplug NIC failed - NIC not found"
(In reply to msheena from comment #4) > Created attachment 1577845 [details] > engine log in debug level Looks like the flow in this logfile is 1. libvirtXml, including the alias for the VM interface, is generated for the first VM and send to the host 2. the first VM is started 3. another VM is started the same way 4. VM monitoring fetches data of the second VM 5. XML for hotunplug on the first VM is generated without alias, because the alias is not yet in the db, because VMMonitoring is not yet triggered for the first VM. From my understanding, the events from the host ("processing event for host") are missing in the flow, which should trigger the VM monitoring for the first VM. Michal, what is your view on this logfile?
Seems 5 is wrong, the alias is defined at the VM start by LibvirtVmXmlBuilder::writeAlias() for each device (since 4.2). If the command would use the right alias then the operation should work even early before monitoring reports anything. Additionally, blocking hotplug/unplug early might make sense too. It's usually cooperative, and you may want to avoid it during bios or OS boot. In other places we have a crude but simple check on UP VM state.
(In reply to Michal Skrivanek from comment #7) > Seems 5 is wrong, the alias is defined at the VM start by > LibvirtVmXmlBuilder::writeAlias() for each device (since 4.2). > If the > command would use the right alias then the operation should work even early > before monitoring reports anything. Ack, this would work but might hide the problem that VMMonitoring is not working as expected. > Additionally, blocking hotplug/unplug early might make sense too. It's > usually cooperative, and you may want to avoid it during bios or OS boot. In > other places we have a crude but simple check on UP VM state. In this case, the VM is in state 'up', see line 18517.
the only interface in VM has alias <alias name='ua-838f52d8-6806-4189-8554-9e7635edd383'/> which means the device id is 838f52d8-6806-4189-8554-9e7635edd383. Later it is deviceId='b6ac415b-49f9-4ec0-bb6d-1304e644c3bf' which is trying to be unplugged. That's a different device then
(In reply to Michal Skrivanek from comment #9) > the only interface in VM There seems to be two VMs. > has alias <alias > name='ua-838f52d8-6806-4189-8554-9e7635edd383'/> which means the device id > is 838f52d8-6806-4189-8554-9e7635edd383. This is in the second VM mac_pool_vm_1. > Later it is > deviceId='b6ac415b-49f9-4ec0-bb6d-1304e644c3bf' which is trying to be > unplugged. That's a different device then This is in the first VM mac_pool_vm_0.
oh, I'm sorry! Yeah, that's correct and the command matches for mac_pool_vm_0. Still, doesn't change a thing that the problem is in ActivateDeactivateVmNicCommand most likely...you just shouldn't send an invalid(empty) device to unplug
(In reply to Michal Skrivanek from comment #11) > oh, I'm sorry! Yeah, that's correct and the command matches for > mac_pool_vm_0. > > Still, doesn't change a thing that the problem is in > ActivateDeactivateVmNicCommand most likely...you just shouldn't send an > invalid(empty) device to unplug Ack, I will provide a code change which extends the validation of ActivateDeactivateVmNicCommand. We can discuss on the code change about the error message. But never the less the reported flow should work, and the validation rule will change only the error message.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Moshe, can you please try to have a sleep before unplugging the vNIC as a workaround?
Hi Dominik, Unfortunately waiting will not solve the issue. Tried waiting for 60, 90 and 120 seconds without any success.
Could not reproduce manually on master in following cases: A. flow as in description of this bug: "Steps to Reproduce (in automation test): ----------------------------------------- 1. Provision a VM with a virtIO interface 2. Start the VM 3. HotUnplug the interface" tried without mac address, with mac address in range, with mac address out of range B. flow as in comment #1 1. Create a MAC pool with range: 00:00:00:10:10:10 - 00:00:00:10:10:11 (exactly 2 MAC addresses). 2. Update host cluster to use the newly provisioned MAC pool. 3. Add an interface to the guest with ovirtmgmt network and MAC address 00:00:00:10:10:10 4. Create a snapshot of the guest and wait for it to be in status 'OK'. 5. Power up the guest and wait for it to be in status 'up' Now we send a request to engine to unplug the guest's interface... tried with mac address out of range, in range, no mac address
*** Bug 1766666 has been marked as a duplicate of this bug. ***
Verified on =========== ovirt-engine-4.4.0-0.25.master.el8ev.noarch vdsm-4.40.5-1.el8ev.x86_64
This bugzilla is included in oVirt 4.4.0 release, published on May 20th 2020. Since the problem described in this bug report should be resolved in oVirt 4.4.0 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
Notes: Previously, if you hot-unplugged a virtual machine interface shortly after booting the virtual machine, the unplugging action failed with an error. Cause: happened because the vNIC alias was not set for hot unplug to accommodate old VMs which have been running in a user's environment with a libvirt alias. But for new VMs, the vNIC alias name is generated by the engine, sent to the host, reported back by VM monitoring, and only then set in the DB. So in some cases, the alias was not reported by host monitoring in time for the hot unplug to have it on hand, which caused an empty alias to be sent. Result: The missing alias name prevented VDSM from identifying the vNIC that required hot unplug, thereby failing the execution thereof. Fix: If no alias name exists in the Engine during hot unplug, the Engine generates it on the fly.