Bug 1462198

Summary: Cluster assigning MACs from a different cluster's MAC pool
Product: [oVirt] ovirt-engine
Component: Backend.Core
Version: 4.1.1.8
Status: CLOSED INSUFFICIENT_DATA
Severity: low
Priority: medium
Target Milestone: ovirt-4.2.0
Hardware: Unspecified
OS: Unspecified
Reporter: nicolas
Assignee: Martin Mucha <mmucha>
QA Contact: Meni Yakove <myakove>
CC: bugs, danken, ylavi
Flags: rule-engine: ovirt-4.2+
oVirt Team: Network
Type: Bug
Last Closed: 2017-11-09 13:41:28 UTC
Attachments: MAC addresses result

Description nicolas 2017-06-16 12:13:04 UTC
Description of problem:

This bug was opened after BZ #1459143 was closed.

We created a second cluster so our students could create their machines. As stated in the mentioned bug, there are 2 DCs, each with its own MAC pool:

Default:
00:1a:4a:4d:cc:00 - 00:1a:4a:4d:cc:ff
00:1a:4a:4d:dd:00 - 00:1a:4a:4d:dd:ff
00:1a:4a:97:5f:00 - 00:1a:4a:97:5f:ff
00:1a:4a:97:6e:00 - 00:1a:4a:97:6f:ff

Adicional-DOCINT2:
00:1a:4a:97:ee:00 - 00:1a:4a:97:ee:ff

The problem is that VMs from cluster 2, which uses Adicional-DOCINT2 as its MAC pool, are taking MACs from ranges in Default's MAC pool.

I understand the same MAC can exist in two different clusters, but these machines are brand new and they shouldn't be taking MAC addresses from another cluster's MAC pool.

This is not an isolated case: in the other cluster (VDI; you can download the ovirt-engine log from the other BZ or ask for a new one) all of the VMs have MACs from the first cluster, which causes a high number of MAC collisions and connectivity problems.

As we know this can be hard to debug, we have some credentials created specifically for these problems; just ask for them so you can try to reproduce the problem on our infrastructure.

Version-Release number of selected component (if applicable):

4.1.1.8

How reproducible:

Not sure under what circumstances this problem happens.

Comment 1 Martin Mucha 2017-06-23 12:40:04 UTC
This is extremely unlikely to happen; it would mean that some code related to cluster A holds the ID of cluster B and uses that ID to get the related MAC pool instead of the one it should use...

1. Can you give us examples of MACs which were wrongly allocated from different ranges?
2. Do you know what operation led to this state? What asked for the MAC allocation?

Comment 2 nicolas 2017-06-23 13:13:46 UTC
Created attachment 1291046 [details]
MAC addresses result

1) I have plenty of them. I ran the Python code below to extract MAC addresses on the VDI cluster (which uses the Adicional-DOCINT2 pool) and I'm attaching the result. Note that ONLY MAC addresses of the form 00:1a:4a:97:ee:XX should be valid according to the configuration.

  # conn is an ovirtsdk4.Connection; vms_serv = conn.system_service().vms_service()
  for vm in vms_serv.list(search='Datacenter=VDI'):
      nics = conn.follow_link(vm.nics)   # resolve the VM's NICs link
      for nic in nics:
          mac = nic.mac.address
          print("VM: %s: %s" % (vm.name, mac))

2) All machines are new machines based on a Windows template. The funny thing is that the base machine (from which the template was created) seems to have a correct MAC address. Every instantiated machine has 2 interfaces, yet the template only has one, from which I assume that the students manually added one interface. Another thing you can see in the attached file is that most of the machines have MAC addresses from an incorrect range, both for the "template" interface and for the one that was added manually.

I tried to reproduce the issue by creating a new machine based on this template, but I couldn't. I tried both from the adminportal and the userportal, and this time it seems to be assigning valid MAC addresses; however, as you can see, there are lots of invalid addresses. As far as I know, no engine update has been applied from the time these machines were created up until now.

If you need to debug something on our platform, we can provide you with some credentials so you can try it yourself if you wish.

Comment 3 Martin Mucha 2017-06-30 10:37:04 UTC
Foreword: I'm sorry, I'm not that familiar with Python and the Python API; maybe others will help and correct me if needed.

OK, I took your attachment and did the following (strip everything but the MAC, drop the last octet, and count occurrences per prefix):

cat MAC-addresses.txt | sed "s/.* //" | sed "s/:..$//" | sort | uniq -c
     12 00:1a:4a:4d:cc
     11 00:1a:4a:4d:dd
      7 00:1a:4a:97:5f
     59 00:1a:4a:97:6e
    208 00:1a:4a:97:6f
     21 00:1a:4a:97:ee

which shows that the range

00:1a:4a:97:6e:00 - 00:1a:4a:97:6f:ff

which belongs to the Default MAC pool, is used much more often. This is a little bit odd, because after patch

I00f4ebf8371c1a0e531baf7ef764c99d0be63ab2

was merged (it is contained in your version), all ranges should be visited equally often when serving MAC requests. Let's assume that all ranges are part of one MAC pool (to simulate that 'stealing' MACs is possible). Even then, the range 00:1a:4a:97:6e:00 - 00:1a:4a:97:6f:ff should serve roughly the same amount of MACs as the other ranges. But here that does not hold; this range holds 267 (59 + 208) while the second most frequent range holds only 21.
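
For illustration, a minimal sketch (my own simplified model, not the actual engine code) of the round-robin range selection the patch is described to guarantee; under it, 300 requests spread evenly over 4 ranges instead of piling up in one:

  # Simplified model: a cursor rotates over the ranges, one allocation per step.
  ranges = ['00:1a:4a:4d:cc', '00:1a:4a:4d:dd', '00:1a:4a:97:5f', '00:1a:4a:97:6e-6f']
  counts = {r: 0 for r in ranges}
  cursor = 0
  for _ in range(300):                     # 300 simulated MAC requests
      counts[ranges[cursor]] += 1          # take the next free MAC from this range
      cursor = (cursor + 1) % len(ranges)  # advance to the next range
  print(counts)                            # each range serves 75 MACs, not 267 vs 21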

From this I'd conclude that the machines from the report either:
a) were created when the MAC pool settings were different, or
b) had their MAC addresses assigned manually, not by the engine.

———

I'll try to debug creating new VMs from a template into a different cluster, as I'm not sure what will happen.

===

(note to self)
Found a bug in:
I00f4ebf8371c1a0e531baf7ef764c99d0be63ab2
While checking for free MACs in a pool, the last-used-range cursor advances, leading to a situation where first all odd (or even) ranges are emptied, and only then do we proceed to the remaining ones. The check for a full MAC pool must not be considered an operation that 'uses' a MAC range.
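
A minimal sketch of that bug (a hypothetical Python model of the description above, not the engine's actual code), showing how a fullness check that advances the shared cursor drains every other range first:

  class MacPool:
      def __init__(self, ranges):
          self.ranges = [list(r) for r in ranges]  # each entry: list of free MACs
          self.cursor = 0

      def _advance(self):
          self.cursor = (self.cursor + 1) % len(self.ranges)

      def is_full(self):
          self._advance()   # BUG: a read-only check moves the shared cursor
          return not any(self.ranges)

      def allocate(self):
          for _ in range(len(self.ranges)):
              self._advance()
              if self.ranges[self.cursor]:
                  return self.ranges[self.cursor].pop()
          raise RuntimeError('pool exhausted')

  # Interleaving one fullness check per allocation advances the cursor by 2 each
  # round, so even-indexed ranges are emptied first and odd ones only afterwards.
  pool = MacPool([['%02x' % m for m in range(4)] for _ in range(4)])
  while not pool.is_full():
      pool.allocate()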

Comment 4 Martin Mucha 2017-06-30 10:57:22 UTC
1. I created a new VM in the Default cluster with one NIC, letting the engine decide which MAC to use. I verified that the selected MAC (00:1a:4a:16:01:03) belongs to the Default cluster's MAC pool (00:1a:4a:16:01:00-00:1a:4a:16:01:01, 00:1a:4a:16:01:02-00:1a:4a:16:01:03).

2. I created another MAC pool with the range 00:1a:4a:16:02:01-00:1a:4a:16:02:ff.

3. I made a template out of the VM (from step 1), using Virtual Machines -> Make Template. The UI does not show a MAC when examining the created template.

4. I created a new cluster and assigned the newly created MAC pool (from step 2) to it.

5. I created a new VM using this template (from step 3), via Templates -> New VM. I selected the newly created cluster (from step 4), named it, and asked for an extra vNIC, as your students theoretically did.

Results: several attempts all created VMs residing in the cluster created in step 4, using MACs belonging to the relevant MAC pool (created in step 2), and not from the one belonging to the Default cluster.

So, creating a VM from a template does not reproduce your issue, WHILE TESTING ON MASTER. I'll retest on your version specifically, but there's a very slim chance of a different outcome: there is almost no code touching MAC pools between the versions, and what little there is is almost certainly unrelated.
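
For what it's worth, step 5 can also be scripted; a minimal sketch assuming the Python ovirtsdk4 API, with placeholder URL, credentials, template and cluster names:

  import ovirtsdk4 as sdk
  import ovirtsdk4.types as types

  conn = sdk.Connection(url='https://engine.example.com/ovirt-engine/api',
                        username='admin@internal', password='...',
                        insecure=True)  # skip CA verification for a test setup
  vms_service = conn.system_service().vms_service()
  # Create a VM from the template into the new cluster; the engine should pick
  # a MAC from the MAC pool assigned to that cluster.
  vm = vms_service.add(types.Vm(
      name='repro-from-template',
      cluster=types.Cluster(name='new-cluster'),
      template=types.Template(name='template-from-step-3'),
  ))
  conn.close()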

Comment 5 nicolas 2017-06-30 11:15:02 UTC
For the record, regarding comment #3: manually assigned MACs are ruled out in our case. I talked to the teacher and he told me how they created their machines (basically cloning), and no NIC MACs were touched manually.

However, I cannot rule out case a (different pools at machine creation time). It's quite possible that some of us created an additional MAC pool on the second cluster (VDI).

Comment 6 Martin Mucha 2017-06-30 12:50:43 UTC
Same behavior in 4.1.1.8.

So to recap, I cannot reproduce the claim that creating a VM from a template (into a different cluster) creates invalid MACs.

So if you say that no MACs were added manually, it most probably must have been comment #3, conclusion a. If you change the MAC pool settings related to a cluster, already assigned MACs are not reallocated (because the VMs can be running, and reallocation could cause issues).

I'd ask you to fix all invalid MACs (or at least be able to identify them somehow in the future) and to monitor whether a new invalid MAC gets assigned, and ideally note how it was created, via which action.
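
To identify them, a minimal sketch building on the snippet from comment 2 (conn is the same ovirtsdk4.Connection; VALID_RANGES, mac_to_int and in_ranges are my own hypothetical helpers):

  VALID_RANGES = [('00:1a:4a:97:ee:00', '00:1a:4a:97:ee:ff')]  # Adicional-DOCINT2

  def mac_to_int(mac):
      return int(mac.replace(':', ''), 16)

  def in_ranges(mac, ranges):
      value = mac_to_int(mac)
      return any(mac_to_int(lo) <= value <= mac_to_int(hi) for lo, hi in ranges)

  # Flag every NIC in the VDI datacenter whose MAC is outside the expected pool.
  vms_serv = conn.system_service().vms_service()
  for vm in vms_serv.list(search='Datacenter=VDI'):
      for nic in conn.follow_link(vm.nics):
          if nic.mac and not in_ranges(nic.mac.address, VALID_RANGES):
              print("INVALID VM: %s NIC: %s MAC: %s" % (vm.name, nic.name, nic.mac.address))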

So far I have tested everything I was able to come up with, without any luck finding the error. For the one problem I did find (which should not affect your case), I have already provided a fix.

Comment 7 nicolas 2017-06-30 13:03:48 UTC
OK, I'll monitor it again. We will probably be able to check whether this happens again in a few months, because the course will start again, with new machines, new students and so on.

Thanks.

Comment 8 Martin Mucha 2017-07-02 19:38:58 UTC
OK, if you have new info, please let me know. But please remember: if you just tell me "it happened again", it won't help us (me) much. I need some certainty: we had this environment, we did this, and this was the result. I did all I could come up with but did not find any issue; if it happens again, I need a narrower area to search. Thanks for your understanding.

Comment 9 Dan Kenigsberg 2017-07-03 10:33:26 UTC
Ok, I'll lower the severity and postpone it to 4.2.0