Bug 1554180

Summary: [downstream clone 4.2.2] MacPool fails to initialize when it contains duplicates and user disallows duplicates
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-engine
Assignee: eraviv
Status: CLOSED ERRATA
QA Contact: Michael Burman <mburman>
Severity: high
Docs Contact:
Priority: high
Version: 4.1.9
CC: apinnick, danken, eraviv, gveitmic, lsurette, mburman, mkalinin, rbalakri, Rhev-m-bugs, srevivo, ykaul, ylavi
Target Milestone: ovirt-4.2.2
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text: This update prevents the user from clearing the "Allow Duplicates" check box in the Edit MAC Address Pool dialog or via a corresponding REST request if duplicate MAC addresses exist.
Story Points: ---
Clone Of:
: 1561080 1561081 1561865
Environment:
Last Closed: 2018-05-15 17:48:31 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1561080, 1561081, 1561865

Description Germano Veit Michel 2018-03-12 01:48:09 UTC
Description of problem:

When a MacPool contains duplicate MACs and the user attempts to uncheck the "Allow Duplicates" option for the pool, the update fails with no clear indication in the GUI of why it failed. Even worse, the entire engine then starts misbehaving, because the MacPool is not reinitialized after it failed to initialize with allow duplicates disabled.

So, the engine is running fine and the MacPool does not allow duplicates. Then the user configures it to allow duplicates and adds a VM with a duplicate MAC. All fine.

Now the user tries to disable the allow duplicates option, and everything goes into limbo.

1) To start with, this is the message on the UI the user gets when attempting to disable the allow duplicate mac option:

"Error while executing action UpdateMacPool: Internal Engine Error"

In the engine logs, we get this, indicating there are duplicate MACs. Why is the user not notified about this in the UI?

2018-03-12 11:21:57,383+10 ERROR [org.ovirt.engine.core.bll.UpdateMacPoolCommand] (default task-4) [b7b3a181-659b-411e-9da6-4b7ba16802e8] Command 'org.ovirt.engine.core.bll.UpdateMacPoolCommand' failed: EngineException: Unable to initialize MAC pool due to existing duplicates (Failed with error MAC_POOL_INITIALIZATION_FAILED and code 5010)

But here comes the worst part:

2) All sorts of operations start to fail, as the MacPool did not initialize again after the above and the mac pool cannot be found. AddVm, RemoveVm, AddNic, RemoveNic... all fail:

2018-03-12 11:36:31,896+10 ERROR [org.ovirt.engine.core.bll.RemoveVmCommand] (org.ovirt.thread.pool-6-thread-15) [97376231-9e23-4ca7-b349-e7d6dcdb7725] Exception: javax.ejb.EJBTransactionRolledbackException: Pool for id="0000002f-002f-002f-002f-000000000108" does not exist

3) And things get ugly: even if the user identifies a duplicate MAC, removing a VM/NIC now fails with confusing, unrelated messages. See what is displayed when a VM fails to remove due to the missing MacPool.

2018-03-12 11:36:31,896+10 ERROR [org.ovirt.engine.core.bll.RemoveVmCommand] (org.ovirt.thread.pool-6-thread-15) [97376231-9e23-4ca7-b349-e7d6dcdb7725] Exception: javax.ejb.EJBTransactionRolledbackException: Pool for id="0000002f-002f-002f-002f-000000000108" does not exist

2018-03-12 11:36:31,905+10 INFO  [org.ovirt.engine.core.bll.RemoveVmCommand] (org.ovirt.thread.pool-6-thread-15) [97376231-9e23-4ca7-b349-e7d6dcdb7725] Command [id=c941804f-510e-4e4d-974a-c731a5c258de]: Compensating CHANGED_STATUS_ONLY of org.ovirt.engine.core.common.businessentities.VmDynamic; snapshot: EntityStatusSnapshot:{id='61537e85-1363-4ce6-b949-d57d9a886b49', status='Down'}.

2018-03-12 11:36:31,933+10 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (org.ovirt.thread.pool-6-thread-15) [97376231-9e23-4ca7-b349-e7d6dcdb7725] EVENT_ID: USER_REMOVE_VM_FINISHED_WITH_ILLEGAL_DISKS(172), Correlation ID: 97376231-9e23-4ca7-b349-e7d6dcdb7725, Job ID: 1a6c79a0-c1a3-4eaf-92fb-c09d579464da, Call Stack: null, Custom ID: null, Custom Event ID: -1, Message: VM germano-test5 has been removed, but the following disks could not be removed: <UNKNOWN>. These disks will appear in the main disks tab in illegal state, please remove manually when possible.

4) Even editing the MAC Pool fails with the same error as above (pool ID does not exist).

The average user will be completely lost on what is going on.

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.9.2-0.1.el7.noarch

How reproducible:
Always

Steps to Reproduce:
1. Enable duplicate MACs on the Pool
2. Create a duplicate MAC
3. Disable duplicate MACs on the Pool.
4. Try any flow that uses the MacPool.

Actual results:
MacPool fails to initialize, user has no hints on what is going on. Engine is misbehaving with no clear indications.

Expected results:
- If there are duplicate MACs, MacPool must initialize again after disabling duplicate macs, otherwise the whole engine is broken.
- Adequate message displayed to the user when disabling duplicate MACs for a Pool that has duplicate MACs.
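
To illustrate the first expectation, here is a minimal sketch of how a failed re-initialization could leave no pool behind, and how building the replacement first avoids that. All names (`MacPool`, `MacPoolRegistry`, `update_unsafe`, `update_safe`) are hypothetical and are not taken from the engine code; the drop-then-rebuild order in `update_unsafe` is only an assumption about how such a state might arise.

```python
class MacPoolInitError(Exception):
    """Raised when a pool cannot be built from the MACs already in use."""


class MacPool:
    """Hypothetical stand-in for a MAC pool; not the engine's class."""

    def __init__(self, macs_in_use, allow_duplicates):
        if not allow_duplicates:
            seen = set()
            for mac in macs_in_use:
                key = mac.lower()
                if key in seen:
                    raise MacPoolInitError(f"duplicate MAC in use: {mac}")
                seen.add(key)
        self.macs_in_use = list(macs_in_use)
        self.allow_duplicates = allow_duplicates


class MacPoolRegistry:
    """Hypothetical in-memory map of pool id -> initialized pool."""

    def __init__(self):
        self._pools = {}

    def create(self, pool_id, macs_in_use, allow_duplicates):
        self._pools[pool_id] = MacPool(macs_in_use, allow_duplicates)

    def get(self, pool_id):
        if pool_id not in self._pools:
            raise KeyError(f'Pool for id="{pool_id}" does not exist')
        return self._pools[pool_id]

    def update_unsafe(self, pool_id, macs_in_use, allow_duplicates):
        # Assumed failure mode: the old pool is dropped before the new one
        # is built, so a failed build leaves the registry without a pool.
        del self._pools[pool_id]
        self._pools[pool_id] = MacPool(macs_in_use, allow_duplicates)

    def update_safe(self, pool_id, macs_in_use, allow_duplicates):
        # Build first; the old pool is replaced only if the build succeeds.
        replacement = MacPool(macs_in_use, allow_duplicates)
        self._pools[pool_id] = replacement
```

With `update_safe`, a rejected update raises an error but later lookups still find the previous pool, so unrelated flows such as removing a VM are not affected.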

Comment 1 Germano Veit Michel 2018-03-12 01:57:34 UTC
Just to document here the way out of this:

1) Restart the engine. Thankfully, allow duplicate MACs is still enabled in the DB, so the MacPool will initialize fine on restart.

2) Remove the duplicate MACs.
To find them:
/usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select vm_static.vm_name,mac_addr,creation_date from vm_interface,vm_static where vm_interface.vm_guid = vm_static.vm_guid and mac_addr in (select mac_addr from vm_interface group by mac_addr having (count(*) >1)) order by mac_addr;"

3) Uncheck allow duplicates.
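
For reference, the grouping that the query in step 2 performs can be sketched in Python. This is only an illustration: `find_duplicate_macs` is a hypothetical helper, the input is assumed to be `(vm_name, mac_addr)` pairs already fetched from `vm_interface` joined with `vm_static`, and the lower-casing of MACs is an assumption that the SQL above does not make.

```python
from collections import defaultdict


def find_duplicate_macs(rows):
    """Return {mac: [vm_names]} for every MAC used by more than one NIC.

    rows: iterable of (vm_name, mac_addr) pairs (assumed input shape).
    """
    by_mac = defaultdict(list)
    for vm_name, mac_addr in rows:
        # Assumption: treat MACs case-insensitively.
        by_mac[mac_addr.lower()].append(vm_name)
    # Keep only MACs that appear more than once, ordered by MAC.
    return {
        mac: sorted(names)
        for mac, names in sorted(by_mac.items())
        if len(names) > 1
    }


if __name__ == "__main__":
    sample = [
        ("vm-a", "00:1a:4a:16:01:51"),
        ("vm-b", "00:1a:4a:16:01:52"),
        ("vm-c", "00:1A:4A:16:01:51"),
    ]
    for mac, vms in find_duplicate_macs(sample).items():
        print(mac, vms)
```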

Comment 3 Yaniv Kaul 2018-03-12 05:26:16 UTC
(In reply to Germano Veit Michel from comment #1)
> Just to document here the way out of this:
> 
> 1) reinitialize engine (thankfully allow duplicate macs is still enabled in
> the DB), so macpool will initialize fine if we restart the engine.
> 
> 2) remove duplicate mac
> To find them:
> /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select
> vm_static.vm_name,mac_addr,creation_date from vm_interface,vm_static where
> vm_interface.vm_guid = vm_static.vm_guid and mac_addr in (select mac_addr
> from vm_interface group by mac_addr having (count(*) >1)) order by mac_addr;"

Can you contribute this script? 


> 
> 3) uncheck allow duplicates

Comment 4 Germano Veit Michel 2018-03-12 05:49:19 UTC
(In reply to Yaniv Kaul from comment #3)
> Can you contribute this script?

This was already contributed in a much better form here as a tool to detect and fix duplicate macs: https://gerrit.ovirt.org/#/c/83415/
Looks like it may ship with 4.3 if it gets merged someday.

Or do you have something else in mind where such logic would be useful?

Comment 5 Dan Kenigsberg 2018-03-12 10:19:51 UTC
(In reply to Germano Veit Michel from comment #4)
> This was already contributed in a much better form here as a tool to detect
> and fix duplicate macs: https://gerrit.ovirt.org/#/c/83415/
> Looks like it may ship with 4.3 if it gets merged someday.

We can merge and ship your script even earlier, but it needs to be Verified+1 and CI+1 (and we need to make sure it is really shipped in ovirt-engine.rpm).

Comment 9 Sandro Bonazzola 2018-04-05 13:17:49 UTC
Moving back to POST status since referenced patch https://gerrit.ovirt.org/#/c/89578/ has not been merged yet.
Since 4.2.2 has already been released, please re-target to 4.2.3 or later.

Comment 10 eraviv 2018-04-08 07:40:52 UTC
http://gerrit.ovirt.org/89578 is not strictly required to make sure this bug does not reproduce. The validation patch http://gerrit.ovirt.org/89513, already merged in 4.2.2, is enough for that.

This bug can, and should be, tested by QA now.

Comment 11 Michael Burman 2018-04-08 08:21:05 UTC
Hi,
What about a fix for 4.1? It was originally reported for 4.1.

Comment 12 eraviv 2018-04-08 08:32:29 UTC
BZ1561080 is for 4.1

Comment 13 Michael Burman 2018-04-08 09:00:17 UTC
Validation has been added for attempts to unset 'Allow Duplicates' when the MAC pool contains duplicate MACs.

'Error while executing action: Cannot edit MAC Pool. Cannot unset 'Allow Duplicates' when mac pool contains duplicate macs.'

Bug can't be reproduced.

After removing the duplicate MAC and unsetting 'Allow Duplicates', the engine behaves as expected.

Verified on - 4.2.2.6-0.1.el7
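
The validation behavior verified above can be sketched as a check that runs before any pool state is touched. This is an illustration only: `validate_update` and its parameters are hypothetical names, not the engine's validator API. Only the error text is taken from the message quoted in this comment.

```python
from collections import Counter

# Message text as reported in this bug; everything else here is hypothetical.
CANNOT_UNSET_ALLOW_DUPLICATES = (
    "Cannot edit MAC Pool. Cannot unset 'Allow Duplicates' "
    "when mac pool contains duplicate macs."
)


def validate_update(current_allow_duplicates, requested_allow_duplicates,
                    macs_in_use):
    """Return a list of validation errors; an empty list means the update may proceed."""
    unsetting = current_allow_duplicates and not requested_allow_duplicates
    if not unsetting:
        # Nothing to check unless the flag is being switched off.
        return []
    counts = Counter(mac.lower() for mac in macs_in_use)
    if any(n > 1 for n in counts.values()):
        return [CANNOT_UNSET_ALLOW_DUPLICATES]
    return []
```

Because the check returns an error instead of attempting the update, the existing pool is never torn down when duplicates are present.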

Comment 17 Michael Burman 2018-05-01 10:34:15 UTC
Eitan, I would like to change the summary of this bug to reflect its actual fix, which is validation of duplicate MACs. What do you think?

Comment 20 eraviv 2018-05-02 04:33:10 UTC
Avital: 
a sibling bug BZ1561080 has passed doc text review by Byron Gravenorst. The doc text should be the same here.

Michael:
agreed, but then let's change its siblings (1561865 1561080) as well for consistency and searchability

Thanks

Comment 24 errata-xmlrpc 2018-05-15 17:48:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:1488

Comment 25 Franta Kust 2019-05-16 13:09:14 UTC
BZ<2>Jira Resync