Bug 1766666
Summary: | [z-stream clone - 4.3.7] [REST] VM interface hot-unplug right after VM boot up fails over missing vnic alias name | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | RHV bug bot <rhv-bugzilla-bot> |
Component: | ovirt-engine | Assignee: | eraviv |
Status: | CLOSED ERRATA | QA Contact: | msheena |
Severity: | high | Docs Contact: | |
Priority: | high | ||
Version: | unspecified | CC: | bugs, dholler, eraviv, mburman, michal.skrivanek, nhalevy, pelauter, rbarry, Rhev-m-bugs |
Target Milestone: | ovirt-4.3.7 | Keywords: | Automation, Regression, ZStream |
Target Release: | 4.3.7 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | ovirt-engine-4.3.7.2 | Doc Type: | Bug Fix |
Doc Text: |
A missing alias name prevented the Virtual Desktop Server Manager from identifying the VNIC which required a hot unplug.
As a result, the hot unplug failed.
In this release, if an alias name is not defined in the RHV Manager, it will be generated on the fly, and the hot unplug will succeed.
|
Story Points: | --- |
Clone Of: | 1717390 | Environment: | |
Last Closed: | 2019-12-12 10:36:35 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1717390 | ||
Bug Blocks: |
Description
RHV bug bot
2019-10-29 15:46:05 UTC
I see the unplug request is

<hotunplug>
  <devices>
    <interface>
      <alias name=""/>
    </interface>
  </devices>
</hotunplug>

which is certainly wrong. Reassigning to network team.

(Originally by michal.skrivanek)

Adding some more details:
-------------------------

In the test the exact flow is:
==============================
1. Create a MAC pool with range: 00:00:00:10:10:10 - 00:00:00:10:10:11 (exactly 2 MAC addresses).
2. Update host cluster to use the newly provisioned MAC pool.
3. Add an interface to the guest with ovirtmgmt network and MAC address 00:00:00:10:10:10
4. Create a snapshot of the guest and wait for it to be in status 'OK'.
5. Power up the guest and wait for it to be in status 'up'

Now we send a request to engine to unplug the guest's interface as such:
PUT /ovirt-engine/api/vms/<ID>/nics/<ID>
<nic>
  <mac>
    <address>00:00:00:10:10:13</address>  # intentional out-of-scope MAC
  </mac>
  <plugged>false</plugged>
</nic>

And that's when we come across the error (and all of the above in the bug description):
Status: 400
Reason: Bad Request
Detail: [General Exception]

(Originally by Moshe Sheena)

(In reply to msheena from comment #2)

Does this flow trigger the VDSM problem "AttributeError: macAddr", or is this flow handled inside Engine?

(Originally by Dominik Holler)

(In reply to Dominik Holler from comment #3)
> Does this flow trigger the VDSM problem "AttributeError: macAddr", or is
> this flow handled inside Engine?

This is the flow of the test that in my opinion causes engine to send a missing xml.
About VDSM: this is due to what I wrote in the description, where VDSM code simply does not handle empty objects and throws this error instead of throwing something along the lines of: "Hotunplug NIC failed - NIC not found"

(Originally by Moshe Sheena)

(In reply to msheena from comment #4)
> Created attachment 1577845 [details]
> engine log in debug level

Looks like the flow in this logfile is
1. libvirtXml, including the alias for the VM interface, is generated for the first VM and sent to the host
2. the first VM is started
3. another VM is started the same way
4. VM monitoring fetches data of the second VM
5. XML for hotunplug on the first VM is generated without alias, because the alias is not yet in the db, because VMMonitoring is not yet triggered for the first VM.

From my understanding, the events from the host ("processing event for host") are missing in the flow, which should trigger the VM monitoring for the first VM.
Michal, what is your view on this logfile?

(Originally by Dominik Holler)

Seems 5 is wrong, the alias is defined at the VM start by LibvirtVmXmlBuilder::writeAlias() for each device (since 4.2).
If the command would use the right alias then the operation should work even early before monitoring reports anything.

Additionally, blocking hotplug/unplug early might make sense too. It's usually cooperative, and you may want to avoid it during bios or OS boot. In other places we have a crude but simple check on UP VM state.

(Originally by michal.skrivanek)

(In reply to Michal Skrivanek from comment #7)
> Seems 5 is wrong, the alias is defined at the VM start by
> LibvirtVmXmlBuilder::writeAlias() for each device (since 4.2).
> If the command would use the right alias then the operation should work
> even early before monitoring reports anything.

Ack, this would work but might hide the problem that VMMonitoring is not working as expected.

> Additionally, blocking hotplug/unplug early might make sense too. It's
> usually cooperative, and you may want to avoid it during bios or OS boot. In
> other places we have a crude but simple check on UP VM state.

In this case, the VM is in state 'up', see line 18517.

(Originally by Dominik Holler)

the only interface in VM has alias <alias name='ua-838f52d8-6806-4189-8554-9e7635edd383'/> which means the device id is 838f52d8-6806-4189-8554-9e7635edd383.
Later it is deviceId='b6ac415b-49f9-4ec0-bb6d-1304e644c3bf' which is trying to be unplugged. That's a different device then

(Originally by michal.skrivanek)

(In reply to Michal Skrivanek from comment #9)
> the only interface in VM

There seem to be two VMs.

> has alias <alias
> name='ua-838f52d8-6806-4189-8554-9e7635edd383'/> which means the device id
> is 838f52d8-6806-4189-8554-9e7635edd383.

This is in the second VM mac_pool_vm_1.

> Later it is
> deviceId='b6ac415b-49f9-4ec0-bb6d-1304e644c3bf' which is trying to be
> unplugged. That's a different device then

This is in the first VM mac_pool_vm_0.

(Originally by Dominik Holler)

oh, I'm sorry! Yeah, that's correct and the command matches for mac_pool_vm_0.

Still, doesn't change a thing that the problem is in ActivateDeactivateVmNicCommand most likely...you just shouldn't send an invalid (empty) device to unplug

(Originally by michal.skrivanek)

(In reply to Michal Skrivanek from comment #11)
> Still, doesn't change a thing that the problem is in
> ActivateDeactivateVmNicCommand most likely...you just shouldn't send an
> invalid (empty) device to unplug

Ack, I will provide a code change which extends the validation of ActivateDeactivateVmNicCommand.
We can discuss the error message on the code change.
But nevertheless the reported flow should work, and the validation rule will change only the error message.

(Originally by Dominik Holler)

This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

(Originally by pm-rhel)

Moshe, can you please try to have a sleep before unplugging the vNIC as a workaround?

(Originally by Dominik Holler)

Hi Dominik,
Unfortunately waiting will not solve the issue.
Tried waiting for 60, 90 and 120 seconds without any success.

(Originally by Moshe Sheena)

Could not reproduce manually on master in the following cases:

A. flow as in description of this bug:
"Steps to Reproduce (in automation test):
-----------------------------------------
1. Provision a VM with a virtIO interface
2. Start the VM
3. HotUnplug the interface"
tried without mac address, with mac address in range, with mac address out of range

B. flow as in comment #1
1. Create a MAC pool with range: 00:00:00:10:10:10 - 00:00:00:10:10:11 (exactly 2 MAC addresses).
2. Update host cluster to use the newly provisioned MAC pool.
3. Add an interface to the guest with ovirtmgmt network and MAC address 00:00:00:10:10:10
4. Create a snapshot of the guest and wait for it to be in status 'OK'.
5. Power up the guest and wait for it to be in status 'up'
Now we send a request to engine to unplug the guest's interface...
tried with mac address out of range, in range, no mac address

(Originally by Eitan Raviv)

This bug should have been the z-stream clone instead of the downstream clone.

*** This bug has been marked as a duplicate of bug 1717390 ***

this is zstream for 4.3.7

Verified with
=============
ovirt-engine-4.3.7.2-0.1.el7.noarch

Hi Eitan, can you please review this doc text as soon as possible as we need it for errata approval today?

A missing alias name prevented the Virtual Desktop Server Manager from identifying the VNIC which required a hot unplug.
As a result, the hot unplug failed.
In this release, if an alias name is not defined in the RHV Manager, it will be generated on the fly, and the hot unplug will succeed.
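
Editorial sketch of the behaviour the doc text describes: when no alias is stored for the vNIC, derive one instead of sending an empty `<alias name=""/>`. This is an illustration in Python, not the ovirt-engine change itself. The function names `resolve_alias` and `build_hotunplug_xml` are hypothetical, and the `ua-<device id>` pattern is assumed from the alias quoted earlier in this report (`ua-838f52d8-...` for device id `838f52d8-...`).

```python
import uuid
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: generated aliases follow the "ua-<device id>" pattern quoted in this report.
USER_ALIAS_PREFIX = "ua-"


def resolve_alias(stored_alias: Optional[str], device_id: str) -> str:
    """Return the stored alias, or derive one from the device id if it is missing or empty."""
    if stored_alias and stored_alias.strip():
        return stored_alias
    uuid.UUID(device_id)  # raises ValueError if device_id is not a valid UUID
    return USER_ALIAS_PREFIX + device_id


def build_hotunplug_xml(stored_alias: Optional[str], device_id: str) -> str:
    """Build a hotunplug document shaped like the one quoted in the description."""
    root = ET.Element("hotunplug")
    devices = ET.SubElement(root, "devices")
    interface = ET.SubElement(devices, "interface")
    ET.SubElement(interface, "alias", {"name": resolve_alias(stored_alias, device_id)})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Device id of the vNIC on mac_pool_vm_0, with no alias stored yet.
    print(build_hotunplug_xml(None, "b6ac415b-49f9-4ec0-bb6d-1304e644c3bf"))
```

With a fallback like this, the alias element is never empty, so the request no longer depends on VM monitoring having already written the alias to the database.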
ack

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4229
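
Editorial sketch for anyone re-running the reproduction flow from this report against a fixed build: the unplug request can be assembled as below. The endpoint path and the `<nic>`, `<mac>`, `<address>` and `<plugged>` elements are taken from the report. The function name `build_unplug_request` is hypothetical, and the code only builds the method, path and body; it does not send anything or handle authentication.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def build_unplug_request(vm_id: str, nic_id: str, mac: str) -> Tuple[str, str, str]:
    """Return (method, path, body) for the vNIC unplug request described in this report."""
    path = f"/ovirt-engine/api/vms/{vm_id}/nics/{nic_id}"
    nic = ET.Element("nic")
    mac_element = ET.SubElement(nic, "mac")
    ET.SubElement(mac_element, "address").text = mac
    # plugged=false asks the engine to hot-unplug the interface.
    ET.SubElement(nic, "plugged").text = "false"
    return "PUT", path, ET.tostring(nic, encoding="unicode")


if __name__ == "__main__":
    # VM_ID and NIC_ID are placeholders; the MAC is the out-of-range address from the test.
    method, path, body = build_unplug_request("VM_ID", "NIC_ID", "00:00:00:10:10:13")
    print(method, path)
    print(body)
```

The returned body can be passed to any HTTP client with a `Content-Type: application/xml` header.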