Bug 1637078
Summary: | [downstream clone - 4.2.7] Snapshot deletion fails with "MaxNumOfVmSockets has no value for version" | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | RHV bug bot <rhv-bugzilla-bot> |
Component: | ovirt-engine | Assignee: | Eyal Shenitzky <eshenitz> |
Status: | CLOSED ERRATA | QA Contact: | Elad <ebenahar> |
Severity: | urgent | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.2.3 | CC: | apinnick, eedri, eshenitz, gveitmic, gwatson, hchatter, lsurette, lveyde, michal.skrivanek, mkalinin, mwest, nashok, ratamir, Rhev-m-bugs, sirao, srevivo, tnisan, ycui |
Target Milestone: | ovirt-4.2.7 | Keywords: | ZStream |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | vdsm v4.20.43 | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | 1628150 | Environment: | |
Last Closed: | 2018-11-05 15:03:18 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1628150 | ||
Bug Blocks: |
Description
RHV bug bot
2018-10-08 14:49:01 UTC
What does "engine-config -g MaxNumOfVmSockets" return?

(Originally by michal.skrivanek)

Hello Michal,

Thanks for your input on this.

(In reply to Michal Skrivanek from comment #8)
> What does "engine-config -g MaxNumOfVmSockets" return?

~~~~
MaxNumOfCpuPerSocket: 16 version: 3.6
MaxNumOfCpuPerSocket: 16 version: 4.0
MaxNumOfCpuPerSocket: 254 version: 4.1
MaxNumOfCpuPerSocket: 254 version: 4.2
MaxNumOfThreadsPerCpu: 8 version: 3.6
MaxNumOfThreadsPerCpu: 8 version: 4.0
MaxNumOfThreadsPerCpu: 8 version: 4.1
MaxNumOfThreadsPerCpu: 8 version: 4.2
MaxNumOfVmCpus: 240 version: 3.6
MaxNumOfVmCpus: 240 version: 4.0
MaxNumOfVmCpus: 288 version: 4.1
MaxNumOfVmCpus: 384 version: 4.2
MaxNumOfVmSockets: 16 version: 3.6
MaxNumOfVmSockets: 16 version: 4.0
MaxNumOfVmSockets: 16 version: 4.1
MaxNumOfVmSockets: 16 version: 4.2
~~~~

Regards,
Siddhant Rao

(Originally by Siddhant Rao)

Thanks, that looks good. And the snapshot is surely from 3.6 or newer? Is it possible it was created in 3.5 or earlier (even the one you're trying to delete)?

(Originally by michal.skrivanek)

Hello Michal,

Apparently yes, the VM was cloned from a 3.5 environment. Let me know your input.

Regards,
Siddhant Rao

(Originally by Siddhant Rao)

VM - ok. Can you please check its current cluster level (both the cluster setting and any potential VM-level override)? Is the snapshot being deleted from 3.5? Can you get the XML from the DB and check which cluster version it has? Is there only one snapshot (so the data is being merged into the current running config), or is it being merged with another snapshot? If it's the latter, can you also check that snapshot's cluster level?

(Originally by michal.skrivanek)

Hello Michal,

The issue is reproducible in my test environment. It happens for any VM whose snapshot was taken in 3.5 or earlier when that snapshot is deleted in a 4.2 environment.

Steps to reproduce:
1. Create a VM snapshot in a 3.5 environment.
2. Export the VM to an export domain and import it into 4.2.
3. Try to do a live merge of the snapshot. It fails with the error below.

===
2018-09-13 06:18:12,285-04 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [de1edc42-c863-4780-b23e-0003db4a1066] Failed invoking callback end method 'onSucceeded' for command 'd326f470-04c9-4265-9c41-4bd5ad114b94' with exception 'MaxNumOfVmSockets has no value for version: ', the callback is marked for end method retries but max number of retries have been attempted. The command will be marked as Failed.
2018-09-13 06:18:12,285-04 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [de1edc42-c863-4780-b23e-0003db4a1066] Error invoking callback method 'onSucceeded' for 'SUCCEEDED' command 'd326f470-04c9-4265-9c41-4bd5ad114b94'
2018-09-13 06:18:12,286-04 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [de1edc42-c863-4780-b23e-0003db4a1066] Exception: java.lang.IllegalArgumentException: MaxNumOfVmSockets has no value for version:
===

The 3.5 snapshot does not have "ClusterCompatibilityVersion" in its OVF, so "version" is an empty string here:

~~~~
public <T> T getValue(ConfigValues name, String version) {
    Map<String, T> values = getValuesForAllVersions(name);
    if (valueExists(name, version)) {
        return values.get(version);
    }
    // DBConfigUtils.java:65, the top frame of the stack trace below
    throw new IllegalArgumentException(name.toString() + " has no value for version: " + version);
}
~~~~

The error message also shows an empty string for the version.
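The failing lookup can be reduced to a small standalone sketch. It mirrors the `getValue` excerpt above using a plain `Map` keyed by version string. The class `ConfigLookupSketch`, its method signature, and the hard-coded values (copied from the `engine-config` output earlier in this bug) are illustrative assumptions, not the engine's actual `DBConfigUtils` code.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for DBConfigUtils.getValue(); not engine code.
class ConfigLookupSketch {
    // Per-version values of MaxNumOfVmSockets, as reported by engine-config in this bug.
    // There is deliberately no entry for "" (or for 3.5 and earlier).
    static final Map<String, Integer> MAX_NUM_OF_VM_SOCKETS =
            Map.of("3.6", 16, "4.0", 16, "4.1", 16, "4.2", 16);

    // Mirrors the excerpt: return the value if the version key exists, otherwise throw.
    static int getValue(String name, Map<String, Integer> values, String version) {
        if (values.containsKey(version)) {
            return values.get(version);
        }
        throw new IllegalArgumentException(name + " has no value for version: " + version);
    }

    public static void main(String[] args) {
        // An OVF written by 3.6+ carries ClusterCompatibilityVersion, so the lookup succeeds.
        System.out.println(getValue("MaxNumOfVmSockets", MAX_NUM_OF_VM_SOCKETS, "3.6"));

        // A 3.5 snapshot OVF has no ClusterCompatibilityVersion, so the version is empty.
        try {
            getValue("MaxNumOfVmSockets", MAX_NUM_OF_VM_SOCKETS, "");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running it prints `16` for the 3.6 lookup and then the same `MaxNumOfVmSockets has no value for version: ` message that appears in the engine log, because an empty version string never matches a configured key.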
===
2018-09-13 06:18:12,286-04 ERROR [org.ovirt.engine.core.bll.tasks.CommandCallbacksPoller] (EE-ManagedThreadFactory-engineScheduled-Thread-37) [de1edc42-c863-4780-b23e-0003db4a1066] Exception: java.lang.IllegalArgumentException: MaxNumOfVmSockets has no value for version:
    at org.ovirt.engine.core.dal.dbbroker.generic.DBConfigUtils.getValue(DBConfigUtils.java:65) [dal.jar:]
    at org.ovirt.engine.core.common.config.Config.getValue(Config.java:28) [common.jar:]
===

If I manually edit the vm_configuration of the snapshot and add "<ClusterCompatibilityVersion>3.6</ClusterCompatibilityVersion>" to the XML before the live merge, everything works. So the issue is caused by the absence of "<ClusterCompatibilityVersion>" in the 3.5 snapshot's VM XML. I can confirm that the customer's XML also doesn't have "<ClusterCompatibilityVersion>" and that it is from 3.5 (the XML has ovf:version="3.5.0.0").

(Originally by Nijin Ashok)

yes, that is correct and expected (at least from code POV:). Anything with <=3.5 will fail in 4.0+ because we stopped supporting previous clusters in 4.0.
Now why is it touched when that snapshot is being deleted I do not know, that needs analysis from Storage whether it's really required or it can be removed. Generally the code shouldn't be attempting to write a 3.5 OVF because that version is no longer supported.
Tal?

(Originally by michal.skrivanek)

Ala, any idea why we touch the snapshot while deleting a snapshot after live merge?

(Originally by Tal Nisan)

I added this request to the upgrade helper: https://bugzilla.redhat.com/show_bug.cgi?id=1631896

(Originally by Marina Kalinin)

(In reply to Michal Skrivanek from comment #17)
> yes, that is correct and expected (at least from code POV:). Anything with
> <=3.5 will fail in 4.0+ because we stopped supporting previous clusters in
> 4.0.
> Now why is it touched when that snapshot is being deleted I do not know,
> that needs analysis from Storage whether it's really required or it can be
> removed.
The OVF update should occur because the VM images were changed.

> Generally the code shouldn't be attempting to write a 3.5 OVF
> because that version is no longer supported.
> Tal?

The problem is why the field is missing/contains an empty string after the environment was updated to 4.2 / the 3.5 VM was imported to a 4.2 environment. It doesn't seem like a storage issue.

(Originally by Eyal Shenitzky)

(In reply to Eyal Shenitzky from comment #20)
> (In reply to Michal Skrivanek from comment #17)
> > yes, that is correct and expected (at least from code POV:). Anything with
> > <=3.5 will fail in 4.0+ because we stopped supporting previous clusters in
> > 4.0.
> > Now why is it touched when that snapshot is being deleted I do not know,
> > that needs analysis from Storage whether it's really required or it can be
> > removed.
>
> The OVF update should occur because the VM images were changed.

we should touch the VM - the current version of it - which is 4.2. AFAICT that works fine.
But we should not touch the 3.5 OVF from the snapshot because it's not supported anymore and any attempt to produce 3.5 OVF will fail.

> > Generally the code shouldn't be attempting to write a 3.5 OVF
> > because that version is no longer supported.
> > Tal?
>
> The problem is why the field is missing/contains empty string after the
> environment was updated to 4.2 / the 3.5 VM was imported to a 4.2
> environment.

The VM itself (the current version) should be updated just fine, but no one updates or touches past snapshots.
<3.6 didn't have the ClusterCompatibilityVersion field at all.

(Originally by michal.skrivanek)

> we should touch the VM - the current version of it - which is 4.2. AFAICT
> that works fine.
> But we should not touch the 3.5 OVF from the snapshot because it's not
> supported anymore and any attempt to produce 3.5 OVF will fail.
When performing a change in the VM, like removing/adding an image, we should update the OVF of the VM. It can be done immediately, as in live merge, or automatically after some period of time by the OVF update mechanism. If we do not do so, the VM will not have an up-to-date backup, and this will affect different flows. For example, in case of disaster recovery, the VM you try to recreate will be out of date.

> The VM itself (the current version) should be updated just fine, but no one
> updates or touches past snapshots.
> <3.6 didn't have the ClusterCompatibilityVersion field at all.

Maybe we should consider filling those gaps synthetically.

(Originally by Eyal Shenitzky)

Tested the following:
- Created a VM in a 3.5 DC (3.6 env)
- Created a snapshot
- Exported the VM
- Detached the export domain and attached it to a 4.2 DC (4.2 env)
- Imported the VM
- Live merged the snapshot

Live merge succeeded.

Used:

3.6 setup (tested on 3.5 DC):
rhevm-3.6.13.4-0.1.el6.noarch
vdsm-4.16.38-1.el6ev.x86_64
libvirt-0.10.2-62.el6.x86_64

4.2 setup:
ovirt-engine-4.2.7.3-0.0.master.20181012152958.gitfc1595b.el7.noarch
vdsm-4.20.42-4.git43e2555.el7.x86_64
libvirt-4.5.0-10.el7.x86_64

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:
[Project 'ovirt-engine'/Component 'vdsm' mismatch]
For more info please contact: rhv-devops

Tal, which component / Errata should this bug be added to? The attached fixes are from the engine repo, but 'Fixed In Version' points to VDSM. This currently blocks adding the bug to the Errata.

Moving to ON_QA in the meantime so as not to block QE.

QE verification bot: the bug was verified upstream

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3480
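Eyal's suggestion earlier in this bug, filling the missing ClusterCompatibilityVersion synthetically, could take roughly the shape below: parse the snapshot OVF and, if `ClusterCompatibilityVersion` is absent or empty, return a supplied fallback instead of handing an empty string to the config lookup. This is a hypothetical illustration only. The class and method names, the choice of fallback (here the `3.6` value used in the manual vm_configuration workaround), and the trimmed-down sample XML are assumptions, and it does not describe the fix that was actually shipped for this bug.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper illustrating a synthetic fallback; not part of ovirt-engine.
class OvfVersionSketch {
    // Returns the OVF's ClusterCompatibilityVersion, or the given fallback when the
    // element is missing or empty (as in OVFs written by 3.5 and earlier).
    static String clusterVersionOrFallback(String ovfXml, String fallback) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(ovfXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse OVF", e);
        }
        // The default factory is not namespace-aware, so this matches the element by its plain name.
        NodeList nodes = doc.getElementsByTagName("ClusterCompatibilityVersion");
        if (nodes.getLength() > 0) {
            String value = nodes.item(0).getTextContent().trim();
            if (!value.isEmpty()) {
                return value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Trimmed-down, hypothetical OVF fragments; real snapshot OVFs carry far more data.
        String legacy = "<ovf:Envelope xmlns:ovf=\"http://schemas.dmtf.org/ovf/envelope/1/\""
                + " ovf:version=\"3.5.0.0\"><Content/></ovf:Envelope>";
        String current = "<ovf:Envelope xmlns:ovf=\"http://schemas.dmtf.org/ovf/envelope/1/\""
                + " ovf:version=\"4.2.0.0\"><Content>"
                + "<ClusterCompatibilityVersion>4.2</ClusterCompatibilityVersion>"
                + "</Content></ovf:Envelope>";
        System.out.println(clusterVersionOrFallback(legacy, "3.6"));  // prints 3.6
        System.out.println(clusterVersionOrFallback(current, "3.6")); // prints 4.2
    }
}
```

Returning a fallback rather than throwing keeps the lookup from failing on legacy snapshots, but which version is the right synthetic value (the oldest supported level, or the VM's current cluster level) is a design decision this sketch does not settle.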