Bug 1293931 - Duplication of MAC addresses discovered after upgrade (no duplication was allowed)
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Network
Version: 3.6.1.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-3.6.5
Assignee: Martin Mucha
QA Contact: Michael Burman
URL:
Whiteboard: network
Depends On:
Blocks:
 
Reported: 2015-12-23 15:06 UTC by Michael Burman
Modified: 2016-02-01 14:33 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-01 14:33:10 UTC
oVirt Team: Network
danken: ovirt-3.6.z?
mburman: planning_ack?
mburman: devel_ack?
mburman: testing_ack?


Attachments
All_engine_logs (5.93 MB, application/x-gzip)
2015-12-23 15:06 UTC, Michael Burman

Description Michael Burman 2015-12-23 15:06:20 UTC
Created attachment 1108976
All_engine_logs

Description of problem:
Duplication of MAC addresses discovered after upgrade (no duplication was allowed).

A MAC address duplication was discovered after upgrading engine from 3.5.7 to 3.6.1.3 

There are no steps to reproduce and we can't understand how this happened. 

I can describe the state before and after the upgrade (I am not sure the problem is related to the upgrade).

Note, it is a mixed QE upgrade environment.
   
- I created 2 VMs with 1 vNIC each on engine 3.5.7, from the default MAC pool range.
  The VMs were given the MAC addresses 00:1a:4a:ef:de:05 and 00:1a:4a:ef:de:06.
- No changes were made to the MAC pool range via engine-config before the upgrade; it had the default values.
- Upgraded the engine from 3.5.7 to 3.6.1.3.
- Another member of QE created a VM pool of 10 VMs on the upgraded engine, based on a template from the 3.5.7 engine (1 vNIC).
- The VM pool was created successfully with 10 VMs and 1 vNIC per VM.
- The first 2 VMs got the same MAC addresses as my VMs:
  00:1a:4a:ef:de:05 and 00:1a:4a:ef:de:06.
- The duplication was visible in the DB as well.
- The other 8 VMs got fresh MAC addresses from the MAC pool range.
- Together with DEV, we tried to understand how this happened. We checked whether the engine considers my MACs as taken, and it does.
- We don't know how to reproduce it. Maybe it is a race, and the engine had not finished initialization at the point when the VM pool was created on 3.6.

- For the engine logs:
Relevant VMs are : 
'vm-n1' - 00:1a:4a:ef:de:05
'vm-n2' - 00:1a:4a:ef:de:06

VMs that got duplicated MAC addresses from the VM pool:
VM pool - 'vam'
VMs : 
'vam1' got 00:1a:4a:ef:de:05
'vam2' got 00:1a:4a:ef:de:06 
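
For anyone who wants to check a DB for the same symptom, a grouping query along the following lines should list MACs used by more than one vNIC. This is only a sketch run against an in-memory SQLite table: the real table is vm_interface in the engine's PostgreSQL DB, the vm_name column is a simplification made up for this example, and the 'vam3' row is invented to show a non-duplicated MAC.

```python
import sqlite3

# Toy stand-in for the engine DB; only mac_addr mirrors a real column name,
# vm_name is a hypothetical simplification for readability.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vm_interface (vm_name TEXT, mac_addr TEXT)")
conn.executemany(
    "INSERT INTO vm_interface VALUES (?, ?)",
    [
        ("vm-n1", "00:1a:4a:ef:de:05"),
        ("vm-n2", "00:1a:4a:ef:de:06"),
        ("vam1", "00:1a:4a:ef:de:05"),
        ("vam2", "00:1a:4a:ef:de:06"),
        ("vam3", "00:1a:4a:ef:de:07"),  # invented row, not from the report
    ],
)

# Every MAC that appears on more than one vNIC, with its usage count.
duplicates = conn.execute(
    """
    SELECT mac_addr, COUNT(*) AS uses
    FROM vm_interface
    GROUP BY mac_addr
    HAVING COUNT(*) > 1
    ORDER BY mac_addr
    """
).fetchall()
conn.close()

for mac, uses in duplicates:
    print(f"{mac} is used by {uses} vNICs")
```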

Version-Release number of selected component (if applicable):
3.5.7 >> 3.6.1.3

Comment 1 Dan Kenigsberg 2015-12-24 07:08:39 UTC
From the description it seems that 3.5.7-allocated MACs were completely ignored on upgrade. If this reproduces, its severity is much worse.

Can you reproduce this (allocate macs in 3.5.7, upgrade to 3.6.1, verify that preallocated macs are acknowledged as such)?

Comment 2 Michael Burman 2015-12-24 13:15:22 UTC
Hi Dan,

Unfortunately, I can't reproduce this report.

Comment 3 Martin Mucha 2016-01-04 14:04:03 UTC
I'll try to add what I know about MAC pools. I don't know what a "mixed QE upgrade environment" is or how it could be related, so I'll talk only about a production environment.

MAC pools were, and still are, in-memory objects only. A MAC pool's content does not have its own DB storage. In 3.5, MAC pool settings were stored in the vdc_options table; 3 options were stored there: MacPoolRanges, MaxMacsCountInPool and AllowDuplicateMacAddresses (all constants taken from ConfigValues.java). During engine startup, one global MAC pool was created using these options. Then, as part of global MAC pool initialization, all VmNics were queried from the DB and their MAC addresses were registered in this pool. So if you (somehow) created a VmNic record outside of the engine, a duplicate could easily be created.
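
The startup sequence described above can be sketched roughly as follows. This is a toy model, not ovirt-engine code: the class and function names are hypothetical, the range is invented so that it starts at the reported MACs, and MaxMacsCountInPool is not modelled. It only shows why a MAC that is never registered at startup looks free to the pool.

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def int_to_mac(value):
    """Convert an integer back to colon-separated lowercase hex."""
    raw = f"{value:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


class GlobalMacPool:
    """Hypothetical in-memory pool; nothing here is persisted."""

    def __init__(self, ranges, allow_duplicates=False):
        self.ranges = [(mac_to_int(lo), mac_to_int(hi)) for lo, hi in ranges]
        self.allow_duplicates = allow_duplicates
        self.usage = {}  # MAC as int -> number of vNICs using it

    def register_existing(self, mac):
        """Record a MAC that already belongs to a VmNic row."""
        key = mac_to_int(mac)
        self.usage[key] = self.usage.get(key, 0) + 1

    def allocate(self):
        """Hand out the lowest MAC in the ranges that is not yet in use."""
        for lo, hi in self.ranges:
            for candidate in range(lo, hi + 1):
                if candidate not in self.usage:
                    self.usage[candidate] = 1
                    return int_to_mac(candidate)
        raise RuntimeError("MAC pool exhausted")


def start_engine(options, vm_nic_macs):
    """Build the pool from options, then register every known vNIC MAC."""
    pool = GlobalMacPool(options["MacPoolRanges"],
                         options["AllowDuplicateMacAddresses"])
    for mac in vm_nic_macs:
        pool.register_existing(mac)
    return pool


options = {
    # Invented range for illustration; not the real default.
    "MacPoolRanges": [("00:1a:4a:ef:de:05", "00:1a:4a:ef:de:14")],
    "AllowDuplicateMacAddresses": False,
}
existing = ["00:1a:4a:ef:de:05", "00:1a:4a:ef:de:06"]

# Normal start: existing MACs are registered, so they are skipped.
pool = start_engine(options, existing)
fresh = [pool.allocate() for _ in range(3)]

# If the startup query returned nothing, the same MACs look free.
empty_pool = start_engine(options, [])
clash = [empty_pool.allocate() for _ in range(2)]
```

In this model `fresh` never overlaps `existing`, while `clash` repeats both existing addresses, which is the symptom reported here.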

Between 3.5 and 3.6 there were some changes. The upgrade process takes the three options mentioned above and creates one record in the mac_pool table, naming that pool 'Default'. This pool aggregates 0..N ranges in the mac_pool_ranges table, which are parsed from the MacPoolRanges option. The rest remains the same: the pool is initialized on engine startup, and its content is initialized from VmNic records.

Some comments:
• There is *no* manipulation of VmNics / MAC addresses during upgrade.
• Even if duplication is allowed, when asked for *some* MAC address, the pool will never return an already used MAC address.
• To get an already used MAC address from the MAC pool, the client [of the MAC pool] must specifically ask for that address. What happens next depends on several conditions. An already used MAC may be returned if duplicates are allowed OR [in the logical-operator sense] if the 'forceAddMac' method of the MAC pool is used to register the MAC address. This method does not check the duplicate setting and does not care about duplicates. It is called (among other usages) from ImportVmCommandBase when getParameters().isImportAsNewEntity() is false. So if your colleague created the VM this way, it is 'normal' that there is a duplicate.
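
The three ways of obtaining a MAC described in these comments can be sketched as below. Only the name forceAddMac comes from the engine; `allocate_new_mac`, `add_mac` and the class itself are hypothetical names for this illustration, and the real implementation may differ in detail.

```python
class MacPoolSketch:
    """Hypothetical model of the three acquisition paths."""

    def __init__(self, pool_macs, allow_duplicates=False):
        self.pool_macs = list(pool_macs)
        self.allow_duplicates = allow_duplicates
        self.used = {}  # MAC -> usage count

    def allocate_new_mac(self):
        """Ask for *some* MAC: only unused addresses are ever returned,
        whatever the duplicate setting is."""
        for mac in self.pool_macs:
            if mac not in self.used:
                self.used[mac] = 1
                return mac
        raise RuntimeError("no free MAC left")

    def add_mac(self, mac):
        """Ask for a *specific* MAC: a used address is accepted only
        when duplicates are allowed."""
        if mac in self.used and not self.allow_duplicates:
            return False
        self.used[mac] = self.used.get(mac, 0) + 1
        return True

    def force_add_mac(self, mac):
        """Modelled on forceAddMac: registers the MAC without looking
        at the duplicate setting."""
        self.used[mac] = self.used.get(mac, 0) + 1


macs = ["00:1a:4a:ef:de:05", "00:1a:4a:ef:de:06"]

strict = MacPoolSketch(macs, allow_duplicates=False)
first = strict.allocate_new_mac()      # takes the first free MAC
refused = strict.add_mac(first)        # same MAC again, duplicates off
strict.force_add_mac(first)            # bypasses the check
forced_count = strict.used[first]

lenient = MacPoolSketch(macs, allow_duplicates=True)
lenient.add_mac(macs[0])
accepted = lenient.add_mac(macs[0])    # duplicates on, so accepted
second = lenient.allocate_new_mac()    # still skips the used MAC
```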

Comment 4 Martin Mucha 2016-01-05 10:05:58 UTC
Elaborating some more: I've checked how MACs are obtained when creating a VM from a pool, and the 'possibly problematic' method mentioned above is not used. That means that for a MAC to be used again, the pool must have thought that this MAC was not in use. Since you created 2 VMs in the 'old' engine, then turned it off and upgraded, any errors that might have occurred before are not that important, because the pool was reinitialized on the next engine start. And because you can see the MACs assigned to those 2 VMs, there probably wasn't any error. All MACs initialized on engine start are obtained using the GetAllMacsByMacPoolId DB stored procedure. So if they were not initialized on application startup, the following DB query probably did not return them:

SELECT
  mac_addr
FROM  vm_interface
WHERE  
  EXISTS(
    SELECT 1
    FROM  
      vm_static
    JOIN  vds_groups ON vm_static.vds_group_id = vds_groups.vds_group_id
    WHERE 
      vds_groups.storage_pool_id IN (
        SELECT 
          sp.id
        FROM 
          storage_pool sp
        WHERE 
          sp.mac_pool_id = v_id
      )
      AND vm_static.vm_guid = vm_interface.vm_guid
  );

where 'v_id' is the ID of a MAC pool (there should be only one, named 'Default', after upgrade).

———
So the probable explanations are:
a) the default MAC pool was not created during upgrade (see 03_06_0090_add_mac_pool_ranges_to_storage_pool.sql)

b) your datacenter wasn't updated to reference the default MAC pool during upgrade (see 03_06_0090_add_mac_pool_ranges_to_storage_pool.sql)

c) both options above are wrong and the given DB query simply did not return the MAC addresses of those VMs, because of some error in that query that I'm not aware of.

Can any of these make sense?
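
To illustrate hypothesis b), the query can be run against a toy copy of the four tables it touches. This is a sketch under several assumptions: it uses in-memory SQLite instead of the engine's PostgreSQL DB, the tables contain only the columns named in the query, all IDs ('dc-1', 'cluster-1', 'default-pool') are made up, 'v_id' is replaced by a bound parameter, and an ORDER BY is added for stable output.

```python
import sqlite3

# The startup query from above, with v_id as a bound parameter.
QUERY = """
SELECT mac_addr
FROM vm_interface
WHERE EXISTS (
    SELECT 1
    FROM vm_static
    JOIN vds_groups ON vm_static.vds_group_id = vds_groups.vds_group_id
    WHERE vds_groups.storage_pool_id IN (
        SELECT sp.id FROM storage_pool sp WHERE sp.mac_pool_id = ?
    )
    AND vm_static.vm_guid = vm_interface.vm_guid
)
ORDER BY mac_addr
"""

# Minimal, hypothetical schema: only the columns the query uses.
SCHEMA_AND_DATA = """
CREATE TABLE storage_pool (id TEXT, mac_pool_id TEXT);
CREATE TABLE vds_groups (vds_group_id TEXT, storage_pool_id TEXT);
CREATE TABLE vm_static (vm_guid TEXT, vds_group_id TEXT);
CREATE TABLE vm_interface (vm_guid TEXT, mac_addr TEXT);
INSERT INTO vds_groups VALUES ('cluster-1', 'dc-1');
INSERT INTO vm_static VALUES ('vm-n1', 'cluster-1');
INSERT INTO vm_static VALUES ('vm-n2', 'cluster-1');
INSERT INTO vm_interface VALUES ('vm-n1', '00:1a:4a:ef:de:05');
INSERT INTO vm_interface VALUES ('vm-n2', '00:1a:4a:ef:de:06');
"""


def macs_seen_at_startup(datacenter_mac_pool_id):
    """Return the MACs the query yields for pool 'default-pool' when the
    datacenter row references the given mac_pool_id (None means NULL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    conn.execute("INSERT INTO storage_pool VALUES (?, ?)",
                 ("dc-1", datacenter_mac_pool_id))
    rows = conn.execute(QUERY, ("default-pool",)).fetchall()
    conn.close()
    return [row[0] for row in rows]


linked = macs_seen_at_startup("default-pool")  # datacenter references the pool
unlinked = macs_seen_at_startup(None)          # hypothesis b): reference missing

print("linked datacenter:", linked)
print("unlinked datacenter:", unlinked)
```

In the toy data, a datacenter whose mac_pool_id was never set yields no MACs at all, so the pool would treat both addresses as free. Whether that is what happened on the affected engine cannot be told from this sketch.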

Comment 5 Michael Burman 2016-01-05 10:48:12 UTC
Hi Martin, I don't know, but my 2 VMs that were created in the 'old' 3.5.7 engine were ON (they were not turned off) during the upgrade process.

Comment 6 Martin Mucha 2016-01-05 11:12:27 UTC
Those VMs should be out of play; their state/content shouldn't make a difference. The MAC pool works only with data in the engine DB. If you still have access to the engine exhibiting this error, please check whether a) or b) applies. The query I posted seems OK to me, so if neither a) nor b) applies and the query is valid, I'm out of ideas (so far).

Comment 7 Martin Mucha 2016-01-07 10:21:50 UTC
Also: I know you said in the initial description that it did not happen, but are you really sure that your colleague did not create another MAC pool, for example? If he had created another MAC pool with the same ranges (or ranges overlapping another pool), this scenario could easily happen ...
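
A minimal sketch of why overlapping pools would produce exactly this symptom, assuming each pool tracks its usage independently of the others. The helper name and the 4-address range are invented for the example.

```python
def next_free(pool_range, used):
    """Return the first MAC in pool_range that this pool has not handed out."""
    for mac in pool_range:
        if mac not in used:
            used.add(mac)
            return mac
    raise RuntimeError("no free MAC left")


# Invented range shared by both pools.
shared_range = [f"00:1a:4a:ef:de:{i:02x}" for i in range(0x05, 0x09)]

# Each pool keeps its own bookkeeping and knows nothing about the other.
used_by_default_pool = set()
used_by_second_pool = set()

vm_n1 = next_free(shared_range, used_by_default_pool)
vm_n2 = next_free(shared_range, used_by_default_pool)
vam1 = next_free(shared_range, used_by_second_pool)
vam2 = next_free(shared_range, used_by_second_pool)

print(vm_n1, vam1)  # same address handed out by two different pools
```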

Comment 8 Michael Burman 2016-01-11 07:15:59 UTC
(In reply to Martin Mucha from comment #6)
> Those VMs should be out of play; their state/content shouldn't make a
> difference. The MAC pool works only with data in the engine DB. If you still
> have access to the engine exhibiting this error, please check whether a) or
> b) applies. The query I posted seems OK to me, so if neither a) nor b)
> applies and the query is valid, I'm out of ideas (so far).

Hi Martin,

I don't have access to the engine that exhibited this error, and I couldn't reproduce this report. (Maybe we have a snapshot of the engine from before the upgrade, not sure. If we have, I will let you know.)

No one has created another MAC pool; the default one (from 3.5) is the only one that exists.

Comment 9 Dan Kenigsberg 2016-02-01 14:33:10 UTC
This smells serious, but it does not reproduce and we fail to understand where it could come from. Please reopen when another instance of the bug shows up.

