Bug 1492577 - overlap between MAC pools causes annoying "mac already in use" errors
Summary: overlap between MAC pools causes annoying "mac already in use" errors
Keywords:
Status: CLOSED DUPLICATE of bug 1593800
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 4.2.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: medium
Target Milestone: ovirt-4.3.0
Target Release: ---
Assignee: eraviv
QA Contact: Meni Yakove
URL:
Whiteboard:
Depends On:
Blocks: 1537414
 
Reported: 2017-09-18 08:36 UTC by Michael Burman
Modified: 2018-08-08 08:49 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-08 08:49:58 UTC
oVirt Team: Network
Embargoed:
ylavi: ovirt-4.3+
rule-engine: blocker+



Description Michael Burman 2017-09-18 08:36:02 UTC
Description of problem:
MAC pool per cluster behaves as MAC pool per DC. 

If 2 clusters in the DC use the same MAC pool range, the engine considers MAC addresses taken in cluster1 as also taken in cluster2. For example, if MAC 00:00:00:00:00:20 is used by a VM in cluster1 and we try to add the same MAC address to a VM in cluster2, we fail with:
"VM2:
MAC Address 00:00:00:00:00:20 is already in use. It can be used either by some nic of VM or by nic in one of snapshots in system."

- This behavior has existed since the feature was introduced and is not new, but we should make users/admins aware of it
- The fix is complicated and risky, and we are aware of it
- Should a release note be added?
- Dan, if the summary doesn't fit, feel free to change it

Version-Release number of selected component (if applicable):
4.2.0-0.0.master.20170917124606.gita804ef7.el7.centos

How reproducible:
100%

- Create 2 clusters in the DC, CL1 and CL2
- Create custom MAC pool range called 'mac1' with range 00:00:00:00:00:20-00:00:00:00:00:21 and set for both clusters.
- Create 2 VMs, each VM in a different cluster VM1-CL1 and VM2-CL2
- Add a vNIC to each VM
- Each VM got 00:00:00:00:00:20
- Add a new vNIC to VM1 in CL1 - it now has 2 vNICs, with MAC addresses 00:00:00:00:00:20 and 00:00:00:00:00:21.
- Try to add a vNIC to VM2 in CL2 - failed: MAC address 00:00:00:00:00:21 is already in use.

Comment 1 Michael Burman 2017-09-18 08:39:05 UTC
Correction - 

- Create 2 clusters in the DC, CL1 and CL2
- Create custom MAC pool range called 'mac1' with range 00:00:00:00:00:20-00:00:00:00:00:21 and set for both clusters.
- Create 2 VMs, each VM in a different cluster VM1-CL1 and VM2-CL2
- Add a vNIC to each VM
- VM1 got 00:00:00:00:00:20 and VM2 got 00:00:00:00:00:21
- Try to add a vNIC to VM1 in CL1 - failed: MAC address 00:00:00:00:00:21 is already in use.
- Try to add a vNIC to VM2 in CL2 - failed: MAC address 00:00:00:00:00:20 is already in use.

Comment 2 Dan Kenigsberg 2017-09-18 10:41:36 UTC
(In reply to Michael Burman from comment #1)
> Correction - 
> 
> - Create 2 clusters in the DC, CL1 and CL2
> - Create custom MAC pool range called 'mac1' with range
> 00:00:00:00:00:20-00:00:00:00:00:21 and set for both clusters.

Are you using a single mac pool for both clusters? if so, what you describe is the expected behavior. Please check what happens when you create TWO mac pools (with overlapping ranges), attaching each to its own cluster.

Comment 3 Michael Burman 2017-09-18 10:53:35 UTC
(In reply to Dan Kenigsberg from comment #2)
> (In reply to Michael Burman from comment #1)
> > Correction - 
> > 
> > - Create 2 clusters in the DC, CL1 and CL2
> > - Create custom MAC pool range called 'mac1' with range
> > 00:00:00:00:00:20-00:00:00:00:00:21 and set for both clusters.
> 
> Are you using a single mac pool for both clusters? if so, what you describe
> is the expected behavior. Please check what happens when you create TWO mac
> pools (with overlapping ranges), attaching each to its own cluster.

What do you mean it's the expected behavior?? So what is the meaning of this feature, exactly? This is NOT the expected behavior.

Comment 4 Michael Burman 2017-09-18 12:10:24 UTC
And creating TWO mac pools (with overlapping ranges) and attaching each to its own cluster gives the same result.
The engine tries to re-assign the same MAC addresses and fails with "MAC already in use".

Comment 5 Martin Mucha 2017-09-19 15:05:47 UTC
We did not read the description correctly.
Two mac pools with overlapping ranges do not provide any guarantee.
One mac pool used in two clusters should not return the same mac twice, even if duplicates are allowed. This is a problem we have to investigate.

Comment 8 Martin Mucha 2017-10-23 09:27:47 UTC
TLDR: 
1. The UI is severely broken: it does not show a reality consistent with the DB and presents nonsense. Debug this bug using the DB. See details below if you want.
2. Burman is somewhat correct in his claims; however, it has nothing to do with mac pools. Jump to the paragraph marked with (*******) for details.
3. Please ask the testers testing the VM tab to re-verify my claims about the broken UI.


———
During an extended period of testing, I did not find anything wrong with mac pools. I did find a lot of errors in the UI, though. And I did find one MAYBE unwanted behavior; I don't know.

Let me define, to the best of my understanding, how the system should behave:
• one mac pool, used by 2 clusters: the pool behaves like a fridge; it's shared by all members, and whoever comes first takes an item. Therefore, with 2 VMs, each from a different cluster, both on the same mac pool, no two VMs should get the same mac.
• two mac pools, used by 2 clusters: just as before, only now we have 2 identical fridges. Therefore, we should see the same allocations in the same order in both clusters. If VM1 got mac A, then VM2 in the other cluster, accessing a different but completely identical mac pool, should get the same mac A (see the sketch below).
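
To make the fridge analogy concrete, here is a minimal, hypothetical Java sketch (the class and method names are illustrative and are not the actual ovirt-engine mac pool implementation): two separate pools configured with an identical range each allocate independently, so they hand out the same addresses in the same order.

import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a MAC pool; not the real ovirt-engine code.
class SimpleMacPool {
    private final long from;
    private final long to;
    private final Set<Long> allocated = new HashSet<>();

    SimpleMacPool(String fromMac, String toMac) {
        this.from = macToLong(fromMac);
        this.to = macToLong(toMac);
    }

    // Hands out the lowest free address in the range, or throws when exhausted.
    synchronized String allocate() {
        for (long mac = from; mac <= to; mac++) {
            if (allocated.add(mac)) {
                return longToMac(mac);
            }
        }
        throw new IllegalStateException("MAC pool exhausted");
    }

    static long macToLong(String mac) {
        return Long.parseLong(mac.replace(":", ""), 16);
    }

    static String longToMac(long value) {
        StringBuilder sb = new StringBuilder();
        for (int shift = 40; shift >= 0; shift -= 8) {
            if (sb.length() > 0) {
                sb.append(':');
            }
            sb.append(String.format("%02x", (value >> shift) & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Two *separate* pools with an identical range ("two identical fridges").
        SimpleMacPool poolCl1 = new SimpleMacPool("00:00:00:00:00:20", "00:00:00:00:00:21");
        SimpleMacPool poolCl2 = new SimpleMacPool("00:00:00:00:00:20", "00:00:00:00:00:21");

        // Each pool allocates independently, so both hand out 00:00:00:00:00:20 first.
        System.out.println("CL1 first allocation: " + poolCl1.allocate());
        System.out.println("CL2 first allocation: " + poolCl2.allocate());
        // A single shared pool (one fridge) would instead return :20 and then :21.
    }
}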

As mentioned above, I did not see any wrongdoing by the mac pools. 

(*******)
When we test the scenario with 2 different but identically configured mac pools, I expected that there would be duplicate allocations; not within one mac pool, but across multiple pools. This is blocked by validation. We have a validation constraining that, regardless of mac pools and of who is using the mac, there cannot be two or more plugged nics with the same mac. It has nothing to do with mac pools, though. Please decide on this behavior; the relevant code is in: org.ovirt.engine.core.bll.network.VmInterfaceManager#existsPluggedInterfaceWithSameMac
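
For context, a minimal sketch of what such a system-wide check could look like, using simplified stand-in types (VmNic, VmNicDao) that are assumptions for illustration, not the actual ovirt-engine classes; the real logic lives in the method referenced above.

import java.util.Collection;

// Illustrative sketch only; the real check is
// org.ovirt.engine.core.bll.network.VmInterfaceManager#existsPluggedInterfaceWithSameMac.
class VmNic {
    String macAddress;
    boolean plugged;
}

interface VmNicDao {
    // Assumed: returns every NIC in the whole system carrying this MAC,
    // regardless of which cluster or mac pool it belongs to.
    Collection<VmNic> getAllForMacAddress(String mac);
}

class MacInUseValidator {
    private final VmNicDao vmNicDao;

    MacInUseValidator(VmNicDao vmNicDao) {
        this.vmNicDao = vmNicDao;
    }

    // System-wide check: any *plugged* NIC anywhere with the same MAC blocks the
    // new assignment, which is what defeats per-cluster (per-pool) scoping.
    boolean existsPluggedInterfaceWithSameMac(String mac) {
        return vmNicDao.getAllForMacAddress(mac).stream()
                .anyMatch(nic -> nic.plugged);
    }

    public static void main(String[] args) {
        VmNic nic = new VmNic();
        nic.macAddress = "00:00:00:00:00:20";
        nic.plugged = true;
        // In-memory stand-in for the DAO: one plugged NIC exists somewhere in the system.
        VmNicDao dao = mac -> {
            if (mac.equals(nic.macAddress)) {
                return java.util.Collections.singletonList(nic);
            }
            return java.util.Collections.<VmNic>emptyList();
        };
        MacInUseValidator validator = new MacInUseValidator(dao);
        System.out.println(validator.existsPluggedInterfaceWithSameMac("00:00:00:00:00:20")); // true
    }
}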


Now, about the UI:
• when I'm creating a new VM, at the very bottom there is nic creation. If I select the ovirt-mgmt network and then set it back to blank, I end up with a new vm WITH a nic.
• when I do not touch the nic combo at all, the result is randomized. Sometimes the system always creates a nic, another day it never creates a nic (as it should, at least as I'd expect).
• If you happen to use the UI while it behaves as "creating a VM without a nic", then you create 2 VMs, each without any NIC. Then select 1 VM and add 1 nic to it. Go to the db and verify it: 1 nic. Go back to the UI and check both VMs' nics. Based on what I see now, both of them have 1 nic. Meaning, the UI is not displaying reality. Even more: I created 2 vms, each with a multitude of nics. Random subsets of the created nics were reported for each VM after a very short period of testing. I deleted both VMs and created empty VMs, as described, with the same names. There were 0 nics in the DB, but both VMs reported 2 nics. A page refresh fixed that; one nic addition broke the UI again.
--> these are just things I see when using the UI. I did not find the actual cause, and I wasn't searching for it, as this is not part of this bug. The individual claims need not be true; something else may cause these failures. For me it was sufficient to verify that I cannot trust what the UI is presenting.

◦ unrelated error: multiselection of VMs is not possible without opening the VM detail. After the detail is opened while trying to multiselect, remove does not remove the individual VM, which it should since we're in the VM detail, but all selected VMs.

Comment 9 Dan Kenigsberg 2017-10-23 10:54:54 UTC
Dominik, it seems to me that existsPluggedInterfaceWithSameMac is an arcane system-wide validation, defying the purpose of MAC pools, but you have (relatively recently) used it when fixing bug 1404130. Could you see if it can be removed?

Comment 10 Dominik Holler 2017-11-03 16:08:36 UTC
Unfortunately, this issue has a long history.
In the beginning, there was https://bugzilla.redhat.com/show_bug.cgi?id=873338 , which is the reason to introduce existsPluggedInterfaceWithSameMac.
But then, there was https://bugzilla.redhat.com/show_bug.cgi?id=1212461 .
This was fixed by https://gerrit.ovirt.org/#/c/40052/ .
But this fix introduced https://bugzilla.redhat.com/show_bug.cgi?id=1266172 .
So https://gerrit.ovirt.org/#/c/40052/ was reverted by https://gerrit.ovirt.org/#/c/46704/ .
Later an incarnation of the initial bug popped up as https://bugzilla.redhat.com/show_bug.cgi?id=1404130 .

I have to dive deeper into the relation between snapshots and mac pools to decide how this issue can be fixed.

Comment 11 Dan Kenigsberg 2017-11-09 20:57:22 UTC
Moving to Leon, since his new test suite makes it possible to write proper test coverage, making sure we do not reintroduce a known bug again.

Comment 12 Edward Haas 2018-02-22 10:34:18 UTC
Let's get back to the requirements and design stage: why would anyone want to have overlapping ranges between mac pools?
MAC addresses are, by definition, unique across all domains (vlans, networks). Opening it up here makes no sense and requires a really good reason.

Overlapping ranges should be blocked.
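
If overlapping ranges were to be blocked, the check itself is straightforward. A minimal sketch, assuming MAC ranges are given as inclusive from/to strings; the helper below is illustrative and not an existing ovirt-engine API. Two inclusive ranges overlap exactly when the later start is not past the earlier end.

// Illustrative helper, not part of ovirt-engine: decides whether two inclusive
// MAC ranges overlap by comparing them as 48-bit integers.
final class MacRangeOverlap {

    private MacRangeOverlap() {
    }

    static long macToLong(String mac) {
        return Long.parseLong(mac.replace(":", ""), 16);
    }

    // Ranges [aFrom, aTo] and [bFrom, bTo] overlap iff max(aFrom, bFrom) <= min(aTo, bTo).
    static boolean overlaps(String aFrom, String aTo, String bFrom, String bTo) {
        long start = Math.max(macToLong(aFrom), macToLong(bFrom));
        long end = Math.min(macToLong(aTo), macToLong(bTo));
        return start <= end;
    }

    public static void main(String[] args) {
        // The ranges from this bug's reproduction steps overlap completely:
        System.out.println(overlaps(
                "00:00:00:00:00:20", "00:00:00:00:00:21",
                "00:00:00:00:00:20", "00:00:00:00:00:21")); // true
        // Disjoint ranges do not:
        System.out.println(overlaps(
                "00:00:00:00:00:20", "00:00:00:00:00:21",
                "00:00:00:00:00:30", "00:00:00:00:00:31")); // false
    }
}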

Comment 13 Michael Burman 2018-05-01 06:13:31 UTC
Doesn't exist in 4.1, only 4.2

Comment 14 Red Hat Bugzilla Rules Engine 2018-05-01 06:25:34 UTC
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.

Comment 15 Michael Burman 2018-05-01 14:32:29 UTC
(In reply to Michael Burman from comment #13)
> Doesn't exist in 4.1, only 4.2

Alona managed to reproduce on 4.1, removing the regression flag.

Comment 16 eraviv 2018-08-08 08:49:58 UTC

*** This bug has been marked as a duplicate of bug 1593800 ***

