Bug 1561080

Summary: [downstream clone - 4.1.11] MacPool fails to initialize when it contains duplicates and user disallows duplicates
Product: Red Hat Enterprise Virtualization Manager
Reporter: RHV bug bot <rhv-bugzilla-bot>
Component: ovirt-engine
Assignee: eraviv
Status: CLOSED ERRATA
QA Contact: Michael Burman <mburman>
Severity: high
Docs Contact:
Priority: high
Version: 4.1.9
CC: bgraveno, danken, eraviv, gveitmic, lsurette, mburman, rbalakri, Rhev-m-bugs, srevivo, ykaul, ylavi
Target Milestone: ovirt-4.1.11
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
This update prevents the user from clearing the "Allow Duplicates" check box in the Edit MAC Address Pool dialog or via a corresponding REST request if duplicate MAC addresses exist.
Story Points: ---
Clone Of: 1554180
Environment:
Last Closed: 2018-04-24 15:30:28 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1554180, 1561865
Bug Blocks:

Description RHV bug bot 2018-03-27 15:07:13 UTC
+++ This bug is a downstream clone. The original bug is: +++
+++   bug 1554180 +++
======================================================================

Description of problem:

When a MacPool contains duplicate MACs and the user attempts to uncheck the "Allow Duplicates" option for the pool, the update fails with no clear indication in the GUI of why it failed. Even worse, the entire engine then starts misbehaving, because the MacPool is not reinitialized after it failed to initialize with "Allow Duplicates" disabled.

So: the engine is running fine and the MacPool does not allow duplicates. Then the user configures it to allow duplicates and adds a VM with a duplicate MAC. All fine.

Now the user tries to disable the "Allow Duplicates" option, and everything goes into limbo.

1) To start with, this is the message on the UI the user gets when attempting to disable the allow duplicate mac option:

"Error while executing action UpdateMacPool: Internal Engine Error"

In the engine logs, we get this, indicating that there are duplicate MACs. Why is the user not notified about this in the UI?

2018-03-12 11:21:57,383+10 ERROR [org.ovirt.engine.core.bll.UpdateMacPoolCommand] (default task-4) [b7b3a181-659b-411e-9da6-4b7ba16802e8] Command 'org.ovirt.engine.core.bll.UpdateMacPoolCommand' failed: EngineException: Unable to initialize MAC pool due to existing duplicates (Failed with error MAC_POOL_INITIALIZATION_FAILED and code 5010)

But here comes the worst part:

2) All sorts of operations start to fail, as the MacPool did not initialize again after the above and the mac pool cannot be found. AddVm, RemoveVm, AddNic, RemoveNic... all fail:

2018-03-12 11:36:31,896+10 ERROR [org.ovirt.engine.core.bll.RemoveVmCommand] (org.ovirt.thread.pool-6-thread-15) [97376231-9e23-4ca7-b349-e7d6dcdb7725] Exception: javax.ejb.EJBTransactionRolledbackException: Pool for id="0000002f-002f-002f-002f-000000000108" does not exist

3) And things get ugly: even if the user identifies a duplicate MAC, removing a VM/NIC now fails with confusing, unrelated messages. See what is displayed when a VM fails to be removed due to the missing MacPool.

2018-03-12 11:36:31,896+10 ERROR [org.ovirt.engine.core.bll.RemoveVmCommand] (org.ovirt.thread.pool-6-thread-15) [97376231-9e23-4ca7-b349-e7d6dcdb7725] Exception: javax.ejb.EJBTransactionRolledbackException: Pool for id="0000002f-002f-002f-002f-000000000108" does not exist

2018-03-12 11:36:31,905+10 INFO  [org.ovirt.engine.core.bll.RemoveVmCommand] (org.ovirt.thread.pool-6-thread-15) [97376231-9e23-4ca7-b349-e7d6dcdb7725] Command [id=c941804f-510e-4e4d-974a-c731a5c258de]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.VmDynamic; snapshot: EntityStatusSnapshot:{id='61537e85-1363-4ce6-b949-d57d9a886b49', status='Down'}.

2018-03-12 11:36:31,933+10 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-15) [97376231-9e23-4ca7-b349-e7d6dcdb7725] EVENT_ID: USER_REMOVE_VM_FINISHED_WITH_ILLEGAL_DISKS(172), Correlation ID: 97376231-9e23-4ca7-b349-e7d6dcdb7725, Job ID: 1a6c79a0-c1a3-4eaf-92fb-c09d579464da, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM germano-test5 has been removed, but the following disks could not be removed: <UNKNOWN>. These disks will appear in the main disks tab in illegal state, please remove manually when possible.

4) Even editing the MAC Pool fails with the same error as above (pool for the given id does not exist).

The average user will be completely lost as to what is going on.

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.9.2-0.1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Enable duplicate MACs on the Pool
2. Create a duplicate MAC.
3. Disable duplicate MACs on the Pool.
4. Try any flow that uses the MacPool.

Actual results:
MacPool fails to initialize, and the user has no hint as to what is going on. The engine misbehaves with no clear indication of the cause.

Expected results:
- If there are duplicate MACs, MacPool must initialize again after disabling duplicate macs, otherwise the whole engine is broken.
- Adequate message displayed to the user when disabling duplicate MACs for a Pool that has duplicate MACs.
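
The first expectation above can be illustrated with a minimal sketch, in Python rather than the engine's Java. The class and method names here (MacPool, PoolRegistry, MacPoolInitError) are hypothetical and are not the engine's real classes. The idea is only to build the replacement pool before discarding the old one, so that a failed initialization leaves the existing pool registered.

```python
class MacPoolInitError(Exception):
    """Raised when a pool cannot be built from the given MACs."""


class MacPool:
    """Hypothetical stand-in for a MAC pool; not the engine's real class."""

    def __init__(self, macs, allow_duplicates):
        # Refuse to initialize if duplicates exist but are not allowed.
        if not allow_duplicates and len(set(macs)) != len(macs):
            raise MacPoolInitError(
                "Unable to initialize MAC pool due to existing duplicates")
        self.macs = list(macs)
        self.allow_duplicates = allow_duplicates


class PoolRegistry:
    """Hypothetical id -> pool map used only for this illustration."""

    def __init__(self):
        self._pools = {}

    def put(self, pool_id, macs, allow_duplicates):
        self._pools[pool_id] = MacPool(macs, allow_duplicates)

    def get(self, pool_id):
        return self._pools[pool_id]

    def update(self, pool_id, allow_duplicates):
        old = self._pools[pool_id]
        # Build the replacement first. If this raises, the assignment below
        # never runs and the old pool stays registered under pool_id.
        new = MacPool(old.macs, allow_duplicates)
        self._pools[pool_id] = new
```

With this ordering, a rejected update still raises an error, but later lookups of the pool id keep working instead of failing with "does not exist".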

(Originally by Germano Veit Michel)

Comment 1 RHV bug bot 2018-03-27 15:07:22 UTC
Just to document here the way out of this:

1) reinitialize engine (thankfully allow duplicate macs is still enabled in the DB), so macpool will initialize fine if we restart the engine.

2) remove duplicate mac
To find them:
/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select vm_static.vm_name,mac_addr,creation_date from vm_interface,vm_static where vm_interface.vm_guid = vm_static.vm_guid and mac_addr in (select mac_addr from vm_interface group by mac_addr having (count(*) >1)) order by mac_addr;"

3) uncheck allow duplicates

(Originally by Germano Veit Michel)
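
For readers who want to see how the duplicate-finding query in comment 1 behaves, here is a small sketch that runs an equivalent query against an in-memory SQLite database. It makes several assumptions: the two tables are reduced to the columns the query joins on, the creation_date column is omitted, the sample rows are invented, and SQLite stands in for the engine's PostgreSQL database. It is a way to try the query logic, not a replacement for engine-psql.sh.

```python
import sqlite3

# Same logic as the query in comment 1, written with an explicit JOIN:
# list every VM whose MAC address appears more than once in vm_interface.
DUPLICATE_MACS_SQL = """
SELECT s.vm_name, i.mac_addr
FROM vm_interface AS i
JOIN vm_static AS s ON i.vm_guid = s.vm_guid
WHERE i.mac_addr IN (
    SELECT mac_addr FROM vm_interface
    GROUP BY mac_addr HAVING COUNT(*) > 1
)
ORDER BY i.mac_addr, s.vm_name
"""


def find_duplicate_macs(conn):
    """Return (vm_name, mac_addr) rows for MACs used by more than one NIC."""
    return conn.execute(DUPLICATE_MACS_SQL).fetchall()


def demo_connection():
    """Build a tiny in-memory database with simplified, made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE vm_static (vm_guid TEXT PRIMARY KEY, vm_name TEXT);
        CREATE TABLE vm_interface (vm_guid TEXT, mac_addr TEXT);
        INSERT INTO vm_static VALUES ('1', 'vm-a'), ('2', 'vm-b'), ('3', 'vm-c');
        INSERT INTO vm_interface VALUES
            ('1', '00:1a:4a:16:01:51'),
            ('2', '00:1a:4a:16:01:51'),
            ('3', '00:1a:4a:16:01:52');
    """)
    return conn


if __name__ == "__main__":
    for vm_name, mac in find_duplicate_macs(demo_connection()):
        print(vm_name, mac)
```

In the sample data only vm-a and vm-b share a MAC, so only those two rows are returned.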

Comment 4 RHV bug bot 2018-03-27 15:07:32 UTC
(In reply to Germano Veit Michel from comment #1)
> Just to document here the way out of this:
> 
> 1) reinitialize engine (thankfully allow duplicate macs is still enabled in
> the DB), so macpool will initialize fine if we restart the engine.
> 
> 2) remove duplicate mac
> To find them:
> /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select
> vm_static.vm_name,mac_addr,creation_date from vm_interface,vm_static where
> vm_interface.vm_guid = vm_static.vm_guid and mac_addr in (select mac_addr
> from vm_interface group by mac_addr having (count(*) >1)) order by mac_addr;"

Can you contribute this script? 


> 
> 3) uncheck allow duplicates

(Originally by Yaniv Kaul)

Comment 5 RHV bug bot 2018-03-27 15:07:36 UTC
(In reply to Yaniv Kaul from comment #3)
> Can you contribute this script?

This was already contributed in a much better form here as a tool to detect and fix duplicate macs: https://gerrit.ovirt.org/#/c/83415/
Looks like it may ship with 4.3 if it gets merged someday.

Or do you have something else in mind where such logic would be useful?

(Originally by Germano Veit Michel)

Comment 6 RHV bug bot 2018-03-27 15:07:40 UTC
(In reply to Germano Veit Michel from comment #4)
> This was already contributed in a much better form here as a tool to detect
> and fix duplicate macs: https://gerrit.ovirt.org/#/c/83415/
> Looks like it may ship with 4.3 if it gets merged someday.

We can merge and ship your script even earlier, but it needs to be Verified+1 and CI+1 (and make sure it is really shipped in ovirt-engine.rpm).

(Originally by danken)

Comment 8 Dan Kenigsberg 2018-03-27 15:14:59 UTC
*** Bug 1561081 has been marked as a duplicate of this bug. ***

Comment 10 Michael Burman 2018-04-12 15:24:03 UTC
As Eitan wrote in BZ 1554180, comment #10:

"http://gerrit.ovirt.org/89578 is not strictly required to make sure this bug does not reproduce. The 4.2.2-merged validation patch http://gerrit.ovirt.org/89513 is enough for that.

This bug can, and should be, tested by QA now."

Based on this, testing - 

Validation has been added for attempts to unset 'Allow Duplicates' when the MAC pool contains duplicate MACs.

'Error while executing action: Cannot edit MAC Pool. Cannot unset 'Allow Duplicates' when mac pool contains duplicate macs.'
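
The kind of check described above can be sketched as follows. This is an illustration in Python, not the code from the gerrit patch: the function names and parameters are hypothetical, and only the message text is taken from the error shown above. The point is that the request is rejected up front, before the pool is touched.

```python
from collections import Counter

# Message text copied from the validation error quoted in this comment.
CANNOT_UNSET_MSG = ("Cannot edit MAC Pool. Cannot unset 'Allow Duplicates' "
                    "when mac pool contains duplicate macs.")


def duplicate_macs(macs):
    """Return the sorted MACs that occur more than once (case-insensitive)."""
    counts = Counter(mac.lower() for mac in macs)
    return sorted(mac for mac, n in counts.items() if n > 1)


def validate_update(current_allow, requested_allow, macs_in_use):
    """Return an error message if the update must be rejected, else None.

    Hypothetical helper: only a change from allowed to disallowed needs the
    duplicate check; every other combination passes.
    """
    if current_allow and not requested_allow and duplicate_macs(macs_in_use):
        return CANNOT_UNSET_MSG
    return None
```

A caller would run validate_update before applying the change and show the returned message to the user instead of a generic internal error.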

Bug can't be reproduced.
No regression introduced.

After removing the duplicate MAC and unsetting 'Allow Duplicates', the engine behaves as expected.

Verified on - 4.1.11.1-0.1.el7

Eitan, I don't think that BZ 1561865 should be a blocker for this bug, and it is targeted only for 4.2 anyhow.
Also, I think that the summary/title of this bug and of BZ 1554180 should be changed to reflect the actual change that was made: the added validation of duplicate MACs in the pool.
Thanks,

Comment 16 errata-xmlrpc 2018-04-24 15:30:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1219

Comment 17 eraviv 2018-04-26 18:07:54 UTC
Added description in

Comment 18 eraviv 2018-04-26 18:10:01 UTC
Added description in doc text.

Comment 19 Franta Kust 2019-05-16 13:06:45 UTC
BZ<2>Jira Resync