Description of problem: We have a report that the following BZ was not fixed by its 4.0.7 clone: BZ1400043 - [Vm Pool] VMs are created with duplicate MAC addresses. The first try with the new version (4.0.7) resulted in 12 VMs with duplicate MACs. Version-Release number of selected component (if applicable): rhevm-4.0.7.4-0.1.el7ev.noarch
Upgrade to 4.0.7 was from 4.0.6
One interesting thing is that they have 5-6 VM pools. Could this increase the probability of hitting the bug? It seems very easy to hit in that environment.
A possible reproduction of the bug - Prerequisite: make sure the MAC pool used by the DC doesn't allow duplicates.
1. Create a template with one vNIC ('tmp1').
2. Create a VM pool ('pool') from 'tmp1' with 2 VMs ('pool-1' and 'pool-2'). Set the number of prestarted VMs to 2.
3. Wait for the VMs to be up.
4. Unplug the NIC from VM 'pool-1' (let's call its current MAC address 'x'). Change its MAC address (new MAC 'y'). Plug it back in.
5. Add a vNIC to VM 'pool-2' and set its MAC address to 'x' (the old MAC address of the vNIC we unplugged and plugged).
6. Stop VM 'pool-1'.
Result - both VMs 'pool-1' and 'pool-2' have a vNIC with MAC 'x'.
Explanation of what causes the bug - when stopping a VM that was started by the pool, the original snapshot (taken before the run) is restored. The MACs of the vNICs in the original snapshot are added to the MAC pool using 'forceAdd', which ignores whether the MAC is already in the pool. So if a MAC in the original snapshot has been taken by another VM, we end up with duplicate MACs.
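The 'forceAdd' behavior described above can be sketched with a toy Python model (hypothetical names throughout; the actual ovirt-engine code is Java, and `MacPool`, `add`, and `force_add` here are illustrative only):

```python
from collections import Counter

class MacPool:
    """Toy MAC pool: a normal add rejects duplicates, forceAdd does not."""

    def __init__(self):
        self.in_use = Counter()  # MAC -> number of vNICs holding it

    def add(self, mac):
        # Normal allocation path: refuse a MAC that is already taken.
        if self.in_use[mac]:
            raise ValueError(f"MAC {mac} already in use")
        self.in_use[mac] += 1

    def force_add(self, mac):
        # Snapshot-restore path: registers the MAC unconditionally,
        # even if another vNIC already holds it.
        self.in_use[mac] += 1


pool = MacPool()
pool.add("x")        # step 5: 'pool-2' now holds MAC 'x'
pool.force_add("x")  # step 6: restoring 'pool-1's snapshot re-adds 'x'
# Two vNICs now hold MAC 'x' -> the duplicate seen in the report.
```

With the normal `add` the second allocation would raise; `force_add` silently leaves the same MAC counted twice, which is the duplicate-MAC state the reproduction steps end in.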
Latest logs after a new test (with the snapshot-related errors fixed) no longer show the problem. I believe we were hitting the scenario Alona described, as the MAC pool was close to being exhausted, so the chances of another VM taking the MAC from the original snapshot were quite high.
This failed QA; the result is the same as it was before the fix. We don't reserve the origin MAC address for a stateless VM, and because of that we can end up with duplicate MAC addresses when the stateless VM is shut down.
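A minimal sketch of the fix direction implied here, assuming a hypothetical claim/release API (the real engine code is Java; the MAC values are made up): while the stateless VM runs with a changed MAC, its origin MAC must stay reserved so no other VM can take it before the pre-run snapshot is restored.

```python
class MacPool:
    """Toy pool: a reserved MAC cannot be claimed again until released."""

    def __init__(self):
        self.reserved = set()

    def claim(self, mac):
        # Returns False instead of handing out an already-reserved MAC.
        if mac in self.reserved:
            return False
        self.reserved.add(mac)
        return True

    def release(self, mac):
        self.reserved.discard(mac)


pool = MacPool()
origin_mac = "00:1a:4a:16:01:51"   # hypothetical address
pool.claim(origin_mac)             # the stateless VM's vNIC

# The VM starts stateless and its vNIC gets a different runtime MAC.
runtime_mac = "00:1a:4a:16:01:52"  # hypothetical address
pool.claim(runtime_mac)

# Fixed behavior: origin_mac is NOT released during the run, so another
# VM asking for it is refused.
assert pool.claim(origin_mac) is False

# Shutdown: the pre-run snapshot is restored without a conflict.
pool.release(runtime_mac)
```

The buggy behavior corresponds to releasing `origin_mac` at VM start: another VM could then claim it, and the snapshot restore at shutdown would force it back in, producing the duplicate.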
Rumor has it that the currently merged code is ready for QA.
ovirt-engine-4.2.0-0.0.master.20170926175518.git0d20200.el7.centos.noarch
Verified on - 4.2.0-0.0.master.20170927183005.git49790b2.el7.centos
Summary and results:
Stateless scenarios - PASS
Stateful/snapshot scenarios - PASS
Regression - all new regression bugs caused by the fix for this report have been verified
Tier 2 - PASS
MAC pool per cluster - no regression in the feature
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Open patch attached] For more info please contact: rhv-devops
INFO: Bug status (VERIFIED) wasn't changed but the following should be fixed: [Open patch attached] For more info please contact: rhv-devops
Dan, this bug is verified but has an open patch attached: either drop the attached patch or move this bug back to POST.
This BZ has been (ab)used for examples of how to work around it. They were merged 3 weeks ago, so it is not clear to me which patches you are referring to.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1488
Jira Resync