Created attachment 1233037 [details]
engine log

Description of problem:
Failed to import a VM because the engine complains that the MAC address is already in use, while no MACs are actually in use. It is not possible to import a VM from a data domain, because the engine thinks the VM's MAC address is in use on the destination cluster, while there are no VMs there at all.

2016-12-18 09:56:48,979+02 WARN [org.ovirt.engine.core.bll.exportimport.ImportVmFromConfigurationCommand] (default task-13) [5c48763e-973e-4a1f-b8ed-bb6f3d4ffafc] Validation of action 'ImportVmFromConfiguration' failed for user admin@internal-authz. Reasons: VAR__ACTION__IMPORT,VAR__TYPE__VM,NETWORK_MAC_ADDRESS_IN_USE_DETAILED,$NETWORK_MAC_ADDRESS_IN_USE_DETAILED_LIST 00:00:00:00:00:21,$NETWORK_MAC_ADDRESS_IN_USE_DETAILED_LIST_COUNTER 1
2016-12-18 09:56:48,981+02 INFO [org.ovirt.engine.core.bll.exportimport.ImportVmFromConfigurationCommand] (default task-13) [5c48763e-973e-4a1f-b8ed-bb6f3d4ffafc] Lock freed to object 'EngineLock:{exclusiveLocks='[a8f359d8-5cc8-40df-bfbc-886d7c453eea=<VM, ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName t2>, t2=<VM_NAME, ACTION_TYPE_FAILED_NAME_ALREADY_USED>]', sharedLocks='[a8f359d8-5cc8-40df-bfbc-886d7c453eea=<REMOTE_VM, ACTION_TYPE_FAILED_VM_IS_BEING_IMPORTED$VmName t2>]'}'

Maybe related to the latest changes to the import VM feature...

Version-Release number of selected component (if applicable):
4.1.0-0.2.master.20161216212250.gitc040969.el7.centos

How reproducible:
100%

Steps to Reproduce:
1. Create a VM with a disk and a network interface.
2. Put the storage domain into maintenance mode, detach it from the DC and remove it (the VM should be removed as well).
3. Import the data domain back into the DC and try to import the VM back.

Actual results:
Fails with "Error while executing action: t2: MAC Address is already in use: 00:00:00:00:00:21." For some reason the engine complains that the MAC is in use, but there are no VMs in the destination cluster.

Expected results:
The import should work as expected.
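For context, the failing check in the log is the import-time validation that asks the mac-pool whether the MAC carried in the VM's OVF is free. A minimal, self-contained sketch of that kind of check (hypothetical names, not the engine's actual code): it fails as soon as the pool still counts the MAC as allocated, regardless of whether any VM on the cluster actually uses it.

import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of an import-time MAC validation; names are illustrative only.
class ImportMacValidationSketch {

    // MACs the pool currently considers allocated (leaked entries stay here).
    static final Set<String> ALLOCATED_MACS = new HashSet<>();

    // Returns a failure message like the one in the log, or null if the MAC is free.
    static String validateImportedMac(String macFromOvf) {
        if (ALLOCATED_MACS.contains(macFromOvf)) {
            return "MAC Address is already in use: " + macFromOvf;
        }
        return null;
    }

    public static void main(String[] args) {
        // Simulate the leak: the MAC was never freed when the domain was removed.
        ALLOCATED_MACS.add("00:00:00:00:00:21");
        System.out.println(validateImportedMac("00:00:00:00:00:21"));
    }
}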
Looks like detaching/removing a storage domain doesn't remove the MACs of its VMs (the VMs whose disks reside on that storage domain and are removed together with it) from the mac-pool.

@see StorageHandlingCommandBase.removeEntitiesFromStorageDomain, which uses
@see Remove_Entities_From_storage_domain in storages_sp.sql (stored procedure)

IMHO the bug has existed since 3.6 and could be reproduced by the following scenario (in 3.6 & 4.0):
* Create a mac-pool of n MACs.
* Create a VM with n MACs and a disk.
* Remove the storage domain that contains the VM's disk.
* Attach the storage domain back and try to import the VM back.

The result would be: no free MACs available.
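To make the suspected gap concrete, here is a minimal sketch (all type and method names below are hypothetical stand-ins, not the real oVirt API) of the kind of cleanup step this analysis says is missing when VMs are dropped together with a storage domain:

import java.util.List;

// Hypothetical stand-ins for the engine's mac-pool and vNIC entities;
// they only illustrate the missing cleanup, not the actual engine code.
interface MacPool {
    void freeMac(String mac);   // return a MAC to the pool
}

class VmNic {
    final String macAddress;
    VmNic(String macAddress) { this.macAddress = macAddress; }
}

class StorageDomainRemovalSketch {

    // Per the analysis, removing the VM entities along with the storage domain
    // happens via the Remove_Entities_From_storage_domain stored procedure, so
    // the mac-pool never sees the MACs being released. The loop below is the
    // step that appears to be missing.
    static void removeVmsOfStorageDomain(List<List<VmNic>> nicsPerVm, MacPool macPool) {
        for (List<VmNic> nics : nicsPerVm) {
            for (VmNic nic : nics) {
                macPool.freeMac(nic.macAddress);   // release before the DB rows go away
            }
            // ... then remove the VM rows from the database as today.
        }
    }
}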
The last step in the (proposed for 3.6 & 4.0) scenario could be replaced by creating a VM + vNIC.
Sounds like a reasonable analysis. However, making sure the mac pool thingy is managed correctly is definitely the network team's domain. Dan - if Yevgeny can't or won't fix this, please assign to someone who can.
I understand that Yevgeni and Maor have taken care of the issue together.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
This fix didn't make it into the current 4.1 downstream build 4.1.0-0.3.beta2.el7.
Verified on - 4.1.0-0.4.master.20170104122005.git51b1bcf.el7.centos
How could it be that I see this bug again on the latest tested 4.1 upstream version?? 4.1.0-0.4.master.20170105161132.gitf4e2c11.el7.centos
Michael, does this reproduce as often (100%) as before? Maybe something else leaks mac addresses?
(In reply to Michael Burman from comment #8)
> How could it be that I see this bug again on the latest tested 4.1 upstream
> version??
> 4.1.0-0.4.master.20170105161132.gitf4e2c11.el7.centos

Hi Michael,

Can you please attach the engine logs?
No, only sometimes. It happened to me a few times this morning; since then I can't reproduce it. Maybe something else leaks MAC addresses, I really don't know. Attaching a log, but I don't have a reproduction from the last few hours.
Created attachment 1238325 [details] new engine log_
Attaching a new log.

We have a MAC leakage for sure, but I can't understand what triggers it. I have a situation in which no MACs are in use on the destination cluster, but the engine complains that no MACs are left in the pool when trying to import the VM. When I add a vNIC to a VM on the destination cluster, the vNIC is added successfully from the pool, but when I try to import a VM into this cluster it complains that all MACs are in use.
Created attachment 1238574 [details] mac leakage
Ok, so I managed to reproduce it with the steps below. I'm still not sure exactly what the trigger is, but I think we should consider re-opening this bug, as the MAC addresses are not freed when a storage domain is detached in this scenario:

1) Create two MAC pool ranges, 2 MACs in each.
2) Create a VM with 1 vNIC from pool1.
3) Edit the cluster and switch to the second MAC pool (pool2).
4) Add a second vNIC to the VM from pool2.
5) Detach the storage domain, remove it and import it back.
6) One MAC should be warned about as out of range. Re-assign it and import the VM.
7) Detach the storage domain again and re-import.

Result: The MAC addresses did not get freed and are reported as in use on the destination, but they should be free. Importing the VM fails.

It is 100% reproducible with the above steps.
Once a storage domain is detached, the engine resolves the VM's MAC addresses based on the VM's cluster. Since this cluster's MAC pool was changed, the engine was not able to fetch the pool the MACs came from and free them. Dan and Yevgeny, is it possible to block updating a cluster's MAC pool while there are VMs on that cluster that hold MAC addresses?
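As a rough illustration of that lookup problem (hypothetical names, not the engine's real code): if the release path resolves the pool from the VM's current cluster, then once the cluster has been switched to a different pool, the pool that still holds the allocation is never asked to free the MAC.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the lookup-by-cluster problem described above.
class MacReleaseByClusterSketch {

    interface MacPool { void freeMac(String mac); }

    // cluster id -> its *current* mac pool
    static final Map<String, MacPool> POOL_OF_CLUSTER = new HashMap<>();

    // Release path at detach time: the pool is resolved from the VM's cluster.
    static void freeVmMacs(String vmClusterId, Iterable<String> vmMacs) {
        MacPool pool = POOL_OF_CLUSTER.get(vmClusterId);
        // If the cluster was switched to another pool after the MACs were
        // allocated, this is no longer the pool that holds the allocations;
        // the old pool is never told to free them, so the addresses leak.
        for (String mac : vmMacs) {
            pool.freeMac(mac);
        }
    }
}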
Hi,

Ok, so the scenario for the MAC leakage is simple (and isn't related to updating the MAC pool ranges in the cluster). A MAC address that has been re-assigned during the import of a VM from a data domain, no matter for what reason (MAC in use or out of range), won't get freed the next time the domain is detached.

So it actually looks like another scenario not covered by BZ 1405761, and I believe we should track it here: 'MACs that have been re-assigned are not freed when detaching a data domain.' This is 100% reproducible.

Please decide how you would like to continue, and sorry for the noise.
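A rough replay of the reported sequence in the same hypothetical terms as the sketches above; only the order of pool operations is taken from the comment, and how the engine actually loses track of the re-assigned MAC is exactly what still needs to be pinned down.

import java.util.HashSet;
import java.util.Set;

// Hypothetical replay of the reported sequence; not the engine's real code.
class ReassignedMacLeakSketch {

    static final Set<String> ALLOCATED_MACS = new HashSet<>();

    public static void main(String[] args) {
        // Import: the original MAC is rejected (in use or out of range),
        // so a different MAC is assigned to the vNIC and allocated in the pool.
        String reassignedMac = "00:00:00:00:00:22";
        ALLOCATED_MACS.add(reassignedMac);

        // Detach of the data domain: per the report, the re-assigned MAC is
        // NOT released here, i.e. nothing removes it from the pool's bookkeeping.

        // The next import attempt then sees the MAC as still in use:
        System.out.println("in use after detach: " + ALLOCATED_MACS.contains(reassignedMac));
    }
}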
We decided to track the new issue in a new bug - BZ 1411780