Bug 1649685

Summary: After increase of ClusterCompatibilityVersion, an additional API change will persist CustomCompatibilityVersion set to the previous ClusterCompatibilityVersion
Product: Red Hat Enterprise Virtualization Manager Reporter: Steffen Froemer <sfroemer>
Component: ovirt-engine Assignee: Shmuel Melamud <smelamud>
Status: CLOSED ERRATA QA Contact: Liran Rotenberg <lrotenbe>
Severity: high Docs Contact:
Priority: high    
Version: 4.2.0 CC: fgarciad, mavital, michal.skrivanek, omachace, pdwyer, rbarry, Rhev-m-bugs, sfroemer, smelamud
Target Milestone: ovirt-4.3.0 Keywords: Rebase, ZStream
Target Release: --- Flags: lrotenbe: testing_plan_complete+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-engine-4.3.0_rc Doc Type: Bug Fix
Doc Text:
Previously, when a running VM was updated through the REST API and the next_run parameter was not set to true, the NEXT_RUN snapshot was completely ignored. This caused an unintended change of the custom compatibility version of a running VM if the VM was updated through the REST API just after an upgrade of the cluster's compatibility version. This behaviour was changed, and now the custom compatibility version of a running VM is left intact after updating it through the REST API, regardless of the next_run parameter.
Story Points: ---
Clone Of:
: 1662921 (view as bug list) Environment:
Last Closed: 2019-05-08 12:38:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Virt RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1662921    

Description Steffen Froemer 2018-11-14 08:53:07 UTC
Description of problem:
If the ClusterCompatibilityVersion is increased and the virtual machine is not rebooted immediately, an additional configuration change will persist the CustomCompatibilityVersion.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install RHV Cluster in Version 4.1 and start a VM in this cluster
2. Increase ClusterCompatibilityVersion to 4.2. The VM will be marked with an outstanding config change and CustomCompatibilityVersion is set on the active configuration [1]
3. Make an additional configuration change through API. For example, iothreads=1 [2]
4. Check results [3]

Actual results:
The previous CustomCompatibilityVersion will be persisted and prevent the virtual machine from restarting.

Expected results:
The CustomCompatibilityVersion holding the old version should not be persisted in the NEXT_RUN configuration.

Additional info:

[1]:
[root@rhhi ~]# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select num_of_io_threads,custom_compatibility_version from vm_static where vm_name='faye-pulanco.crazy.lab'"
 num_of_io_threads | custom_compatibility_version
-------------------+------------------------------
                 0 | 4.1
(1 row)

[root@rhhi ~]# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -t -c "select vm_configuration from snapshots where snapshot_type='NEXT_RUN' and vm_id=(select vm_guid from vm_static where vm_name='faye-pulanco.crazy.lab');"| cut -c 2- | xmllint --format - | grep CompatibilityVersion
    <ClusterCompatibilityVersion>4.2</ClusterCompatibilityVersion>

[2]:
[root@rhhi ansible]# ansible localhost -m ovirt_vms -e @cred.yml --args='auth={{ ovirt_auth }} name=faye-pulanco.crazy.lab io_threads=1'


[3]:
[root@rhhi ansible]# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select num_of_io_threads,custom_compatibility_version from vm_static where vm_name='faye-pulanco.crazy.lab'"
 num_of_io_threads | custom_compatibility_version
-------------------+------------------------------
                 0 | 4.1
(1 row)

[root@rhhi ansible]# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -t -c "select vm_configuration from snapshots where snapshot_type='NEXT_RUN' and vm_id=(select vm_guid from vm_static where vm_name='faye-pulanco.crazy.lab');"| cut -c 2- | xmllint --format - | grep CompatibilityVersion
    <CustomCompatibilityVersion>4.1</CustomCompatibilityVersion>
    <ClusterCompatibilityVersion>4.2</ClusterCompatibilityVersion>
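
The grep check in [1] and [3] can also be scripted. The following is only a minimal sketch: the helper name compatibility_versions and the SAMPLE document are made up for illustration, and the real vm_configuration is a much larger OVF document, so the structure of SAMPLE is an assumption. It collects every element whose name ends in CompatibilityVersion and reports whether a CustomCompatibilityVersion is present in the NEXT_RUN configuration.

```python
import xml.etree.ElementTree as ET


def compatibility_versions(vm_configuration_xml):
    """Return {element name: text} for all *CompatibilityVersion elements."""
    found = {}
    for element in ET.fromstring(vm_configuration_xml).iter():
        # Strip an optional "{namespace}" prefix from the tag name.
        name = element.tag.rsplit("}", 1)[-1]
        if name.endswith("CompatibilityVersion"):
            found[name] = (element.text or "").strip()
    return found


# Hypothetical, heavily shortened stand-in for the NEXT_RUN vm_configuration.
SAMPLE = """<Content>
  <CustomCompatibilityVersion>4.1</CustomCompatibilityVersion>
  <ClusterCompatibilityVersion>4.2</ClusterCompatibilityVersion>
</Content>"""

versions = compatibility_versions(SAMPLE)
# True means the old version was written into NEXT_RUN, as in result [3].
leaked = "CustomCompatibilityVersion" in versions
```

With the values shown in [3], leaked is true; with the output shown in [1], it would be false.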

Comment 1 Michal Skrivanek 2018-11-14 09:17:04 UTC
Seems the ansible fix of bug 1639894 wasn't entirely complete? Ondro?

It took me some time to understand:) But the problem is that after step 2 the temporary CustomCompatibilityVersion is written to the NEXT_RUN, which then causes problems when the changes are applied on reboot.

The intended behavior is that during step 2 the NEXT_RUN is created (copied from current, and run through UpdateVmCommand to match the new cluster parameters) and the current running config is modified to CustomCompatibilityVersion=<old cluster version> and it's otherwise unchanged. That is then thrown away on reboot.

Comment 2 Ondra Machacek 2018-11-14 12:55:48 UTC
Well, I'm not sure I understand. The fix in bug 1639894 is about the check of whether the update should be done, not about doing the actual update. The PR that solves it is here:

  https://github.com/ansible/ansible/pull/48286/

So if you want to update the VM and keep the old next_run configuration, you should pass next_run=true; if you want to remove all previous configuration and apply the update to the current configuration, just pass next_run=False.
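
For reference, a rough sketch of what such an update looks like at the REST level. The engine URL and VM id are placeholders, the helper name build_vm_update is made up, authentication is left out, and the request is only built, not sent. It assumes next_run is passed as a query parameter on the PUT to the VM resource and that the I/O thread count lives under io/threads in the VM representation.

```python
import urllib.request


def build_vm_update(base_url, vm_id, io_threads, next_run):
    """Build (but do not send) a PUT request that updates a VM's I/O threads."""
    # next_run is passed as a query parameter on the update request.
    flag = "true" if next_run else "false"
    url = f"{base_url}/vms/{vm_id}?next_run={flag}"
    body = f"<vm><io><threads>{io_threads}</threads></io></vm>".encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/xml", "Accept": "application/xml"},
    )


# Placeholder engine address and VM id, for illustration only.
req = build_vm_update(
    "https://engine.example.com/ovirt-engine/api", "123", 1, next_run=True
)
```

Passing next_run=False to the helper produces the same request with next_run=false in the URL.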

Comment 3 Michal Skrivanek 2018-11-14 13:50:35 UTC
ah, it makes sense, actually.
when the first change is done (by cluster update) it stores CustomCompatibilityVersion to the current config which is supposed to be overwritten later when next_run is applied.
But if you make another permanent configuration change in the meantime it creates a new next_run which now contains everything from the current config - i.e. also CustomCompatibilityVersion. 
And then it's all wrong, as on reboot it will get applied and become permanent.
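
The sequence can be illustrated with a toy model. This is not engine code: the dictionaries, field names and functions below are invented for illustration, and update_respecting_next_run is only one plausible reading of the intended behaviour, in which an existing NEXT_RUN is used as the base for further changes.

```python
import copy


def new_vm():
    """A running VM in a 4.1 cluster with no pending NEXT_RUN configuration."""
    current = {
        "cluster_version": "4.1",
        "custom_compatibility_version": None,
        "io_threads": 0,
    }
    return {"current": current, "next_run": None}


def upgrade_cluster(vm, old_version, new_version):
    """Step 2: the cluster is upgraded while the VM keeps running."""
    vm["current"]["cluster_version"] = new_version
    # NEXT_RUN is copied from the current config, without any custom version.
    vm["next_run"] = copy.deepcopy(vm["current"])
    # The running config is temporarily pinned to the old version until reboot.
    vm["current"]["custom_compatibility_version"] = old_version


def update_ignoring_next_run(vm, **changes):
    """Reported behaviour: NEXT_RUN is rebuilt from the current config."""
    vm["next_run"] = {**copy.deepcopy(vm["current"]), **changes}


def update_respecting_next_run(vm, **changes):
    """Assumed intended behaviour: an existing NEXT_RUN is kept and extended."""
    base = vm["next_run"] if vm["next_run"] is not None else vm["current"]
    vm["next_run"] = {**copy.deepcopy(base), **changes}


buggy = new_vm()
upgrade_cluster(buggy, "4.1", "4.2")
update_ignoring_next_run(buggy, io_threads=1)

fixed = new_vm()
upgrade_cluster(fixed, "4.1", "4.2")
update_respecting_next_run(fixed, io_threads=1)
```

In the first case the temporary custom version 4.1 ends up in next_run and would become permanent on reboot; in the second case next_run carries only the io_threads change on top of the 4.2 configuration.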

Comment 5 Shmuel Melamud 2018-12-12 10:07:16 UTC
(In reply to Steffen Froemer from comment #0)
> [root@rhhi ansible]# ansible localhost -m ovirt_vms -e @cred.yml
> --args='auth={{ ovirt_auth }} name=faye-pulanco.crazy.lab io_threads=1'

Did you try to execute update with next_run=1 parameter? Does it work properly in this situation?

Shmuel

Comment 6 Steffen Froemer 2018-12-20 10:21:35 UTC
(In reply to Shmuel Melamud from comment #5)
> Did you try to execute update with next_run=1 parameter? Does it work
> properly in this situation?
> 

Yes, after using the ovirt_vm module, which is not yet generally available, I was able to update the system without causing an issue.
But I don't consider this a solution for the initial problem.

How can a customer be sure that they need to use this flag for a configuration change via the API?
Imagine that different teams are responsible for managing the cluster compatibility version and the configuration of virtual machines. In big old companies, those teams may not talk to each other, so they would not know about each other's changes. In the end, this problem would occur again.

Are there plans to cover this?

Comment 7 Shmuel Melamud 2018-12-20 12:26:01 UTC
(In reply to Steffen Froemer from comment #6)
> Are there plans to cover this?

Yes, I've already posted the patch.

Comment 10 Michal Skrivanek 2019-01-02 12:25:19 UTC
Shmuel, can you add proper doc text please?

Comment 13 Liran Rotenberg 2019-01-16 19:33:38 UTC
Verified on:
ovirt-engine-4.3.0-0.8.rc2.el7.noarch

Steps:
1. Install RHV Cluster in Version 4.2 and start a VM in this cluster
2. Increase ClusterCompatibilityVersion to 4.3. The VM will be marked with an outstanding config change and CustomCompatibilityVersion is set on the active configuration [1]
3. Make an additional configuration change through API. For example, iothreads=1 [2]
4. Check results [3]

Results:

In [1] and [2]:

# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select num_of_io_threads,custom_compatibility_version from vm_static where vm_name='test'"
 num_of_io_threads | custom_compatibility_version 
-------------------+------------------------------
                 0 | 
(1 row)

# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -t -c "select vm_configuration from snapshots where snapshot_type='NEXT_RUN' and vm_id=(select vm_guid from vm_static where vm_name='test');"| cut -c 2- | xmllint --format - | grep CompatibilityVersion
    <CustomCompatibilityVersion>4.3</CustomCompatibilityVersion>
    <ClusterCompatibilityVersion>4.2</ClusterCompatibilityVersion>

In [3]:
# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -c "select num_of_io_threads,custom_compatibility_version from vm_static where vm_name='test'"
 num_of_io_threads | custom_compatibility_version 
-------------------+------------------------------
                 1 | 4.3
(1 row)

# /usr/share/ovirt-engine/dbscripts/engine-psql.sh -t -c "select vm_configuration from snapshots where snapshot_type='NEXT_RUN' and vm_id=(select vm_guid from vm_static where vm_name='test');"| cut -c 2- | xmllint --format - | grep CompatibilityVersion
-:2: parser error : Start tag expected, '<' not found

The CustomCompatibilityVersion was not persisted with the old version in the NEXT_RUN configuration. The changes (iothreads and compatibility version) were set correctly.

Comment 15 errata-xmlrpc 2019-05-08 12:38:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1085