Bug 1710740
| Summary: | [downstream clone - 4.3.5] Do not change DC level if there are VMs running/paused with older CL. | | |
| --- | --- | --- | --- |
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | RHV bug bot <rhv-bugzilla-bot> |
| Component: | ovirt-engine | Assignee: | shani <sleviim> |
| Status: | CLOSED ERRATA | QA Contact: | Polina <pagranat> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.2.6 | CC: | aefrat, emarcus, frolland, gveitmic, klaas, michal.skrivanek, mkalinin, pagranat, rbarry, Rhev-m-bugs, sleviim, tnisan |
| Target Milestone: | ovirt-4.3.4 | Keywords: | ZStream |
| Target Release: | 4.3.1 | Flags: | lsvaty: testing_plan_complete- |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-engine-4.3.4.1 | Doc Type: | Bug Fix |
| Doc Text: | Updating the Data Center level while a virtual machine was suspended resulted in the virtual machine not resuming activity after the update. With this release, the suspended virtual machine must be resumed before the Data Center level is updated; otherwise, the operation fails. | | |
| Story Points: | --- | | |
| Clone Of: | 1693813 | Environment: | |
| Last Closed: | 2019-06-20 14:48:33 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1693813 | | |
| Bug Blocks: | | | |
Description
RHV bug bot
2019-05-16 08:18:17 UTC
There was an older issue about customcompatibilityversion which is still present in 4.2.6, but I'd like to be clear: the cluster was not updated while these were paused, correct? If not, please update to the latest 4.2. (Originally by Ryan Barry)

Hi,
yes; the cluster was upgraded a while back (December). A lot of the VMs were not rebooted because usually there is no need to do that asap (and the docs don't suggest I would need to do that asap). Also, the 4.2.7/8 release notes do not show that there is a fix for a problem of this magnitude... But let's sum this up:
1) the VMs should have resumed, even if they are still running with 4.1 compatibility because they have not been rebooted yet
2) they haven't because of a known bug in the 4.2.6 manager that is not mentioned in the release notes?
3) Could you point me to the BZ that shows this problem?
4) Bonus question: could I have resumed them with virsh on the hypervisors?
Greetings
Klaas (Originally by klaas)

(In reply to Ryan Barry from comment #3)
> There was an older issue about customcompatibilityversion which is still
> present in 4.2.6, but I'd like to be clear: the cluster was not updated while
> these were paused, correct?
>
> If not, please update to the latest 4.2

It's the DC version, not the cluster version, that fails the validation. We do not have custom DC version support. Generally, an upgrade of the DC should be prevented if there are VMs running at earlier cluster levels. It's likely that they upgraded the DC level while there were VMs running after a cluster update to 4.2 (i.e. with a temporary 4.1 custom level). IMHO we shouldn't allow a DC upgrade while there are VMs running (including paused) with CL < DC (including custom level override). DC upgrade validation is Storage; Tal, can you comment on what the desired behavior is around DC level upgrades? (Originally by michal.skrivanek)

(In reply to Klaas Demter from comment #4)
> Hi,
> yes; cluster was upgraded a while back (December). A lot of the VMs were not
> rebooted because usually there is no need to do that asap (and docs don't
> suggest I would need to do that asap).

The wording changed in 4.2 to make it a bit clearer:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/upgrade_guide/changing_the_cluster_compatibility_version_3-6_local_db

> Also 4.2.7/8 release notes do not
> show that there is a fix for a problem of this magnitude...
>
> But let's sum this up:
>
> 1) the VMs should have resumed; even if they are still running with 4.1
> compatibility because they have not been rebooted yet

No, because apparently you updated the DC in the meantime. VMs in CL 4.1 are not supported to run in a 4.2 DC.

> 2) they haven't because of a known bug in 4.2.6 manager that is not mentioned in release notes?

No, because of a missing validation in the DC update, it seems. I would swear there was a bug about that but can't find it now. Tal?

> 3) Could you point me to the bz that shows this problem?
> 4) Bonus question: Could I have resumed them with virsh on the hypervisors?

Likely yes. It's already an unsupported situation because the DC is already at 4.2. There's no difference between running and unpausing (it's still the same qemu process - not to be confused with suspend/resume), so it would very likely be fine to "cont" it via virsh. (Originally by michal.skrivanek)
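As an illustration of the "cont" via virsh suggestion above, here is a minimal sketch using libvirt-python to unpause a guest directly on the hypervisor; the domain name 'my-vm' is a placeholder, and doing this bypasses the Manager entirely, so it only makes sense in the already-unsupported situation described above.

```python
# Minimal sketch, not a supported procedure: unpause a paused qemu domain
# directly on the host, the programmatic equivalent of `virsh resume`
# ("cont"). The domain name is a placeholder.
import libvirt

conn = libvirt.open('qemu:///system')        # local hypervisor connection
try:
    dom = conn.lookupByName('my-vm')         # placeholder domain name
    state, _reason = dom.state()             # (state, reason) pair
    if state == libvirt.VIR_DOMAIN_PAUSED:
        dom.resume()                         # same qemu process keeps running
        print('resumed', dom.name())
    else:
        print('not paused, current state:', state)
finally:
    conn.close()
```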
Okay, so the problem is that I can upgrade a DC even if there are still hosts running on a lower compatibility version inside the data center -- can't you check for that, or at least warn about it? I have to admit I have read the docs multiple times and that was not clear to me. (Originally by klaas)

After reading the current docs again, I would still argue they do not explicitly say that I need to reboot all VMs before changing the DC compatibility version:

https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/upgrade_guide/changing_the_cluster_compatibility_version_3-6_local_db
"After you update the cluster’s compatibility version, you must update the cluster compatibility version of all running or suspended virtual machines by restarting them from within the Manager, or using the REST API, instead of within the guest operating system. Virtual machines will continue to run in the previous cluster compatibility level until they are restarted. Those virtual machines that require a restart are marked with the pending changes icon ( pendingchanges ). You cannot change the cluster compatibility version of a virtual machine snapshot that is in preview; you must first commit or undo the preview."
"Once you have updated the compatibility version of all clusters in a data center, you can then change the compatibility version of the data center itself."

This states I must update them; it does not say I need to do that immediately or before upgrading the DC.

https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/upgrade_guide/changing_the_data_center_compatibility_version_3-6_local_db
"To change the data center compatibility version, you must have first updated all the clusters in your data center to a level that supports your desired compatibility level."

Also no word about the need to update the VMs before doing this.

Side note: "you must update the cluster compatibility version of all running or suspended virtual machines by restarting them from within the Manager, or using the REST API, instead of within the guest operating system" - this should be obsolete on all systems that have guest agents installed since 4.2; the reboot should be noticed and transformed into a cold reboot (https://bugzilla.redhat.com/show_bug.cgi?id=1512619).

Greetings
Klaas (Originally by klaas)
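The documentation step quoted above (restarting running or suspended VMs "from within the Manager, or using the REST API") can also be scripted. A minimal sketch, assuming the oVirt Python SDK (ovirtsdk4); the engine URL, credentials and cluster name are placeholders, and on some versions a full shutdown and start may be needed instead of a reboot for the pending (next-run) configuration to apply.

```python
# Minimal sketch: restart VMs from the Manager side so they pick up the new
# cluster compatibility level. Engine URL, credentials and the cluster name
# are placeholders; a shutdown + start may be required on versions where
# reboot() does not apply the next-run configuration.
import ovirtsdk4 as sdk

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',   # placeholder
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)
try:
    vms_service = connection.system_service().vms_service()
    for vm in vms_service.list(search='cluster=Default and status=up'):
        if vm.next_run_configuration_exists:   # the "pending changes" icon in the UI
            print('restarting', vm.name)
            vms_service.vm_service(vm.id).reboot()
finally:
    connection.close()
```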
(In reply to Klaas Demter from comment #8)
> after reading the current docs again I would still argue it does not
> explicitly say that I need to reboot all VMs before changing the DC
> compatibility version:
>
> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/upgrade_guide/changing_the_cluster_compatibility_version_3-6_local_db
> "After you update the cluster’s compatibility version, you must update the
> cluster compatibility version of all running or suspended virtual machines
> by restarting them from within the Manager, or using the REST API, instead
> of within the guest operating system. Virtual machines will continue to run
> in the previous cluster compatibility level until they are restarted. Those
> virtual machines that require a restart are marked with the pending changes
> icon ( pendingchanges ). You cannot change the cluster compatibility version
> of a virtual machine snapshot that is in preview; you must first commit or
> undo the preview."

"you must update the cluster compatibility version of all running or suspended virtual machines by restarting them from within the Manager"

What is unclear about this? Maybe we need a docs update.

> "Once you have updated the compatibility version of all clusters in a data
> center, you can then change the compatibility version of the data center
> itself."
>
> This states I must update them; it does not say I need to do that
> immediately or before upgrading the DC.

From just below in your comment: "to change the DC compatibility version, you must have first..." So, yes, you need to do that before upgrading the DC.

> https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/html/upgrade_guide/changing_the_data_center_compatibility_version_3-6_local_db
> "To change the data center compatibility version, you must have first
> updated all the clusters in your data center to a level that supports your
> desired compatibility level."
>
> also no word about the need to update the VMs before doing this.

It was in the first part of your comment. Specifically, that all running or suspended VMs need to be rebooted, and they may also need configuration updates.

> Side note: "you must update the cluster compatibility version of all running
> or suspended virtual machines by restarting them from within the Manager, or
> using the REST API, instead of within the guest operating system" this
> should be obsolete on all systems that have guest agents installed since
> 4.2; the reboot should be noticed and transformed to a cold reboot
> (https://bugzilla.redhat.com/show_bug.cgi?id=1512619)
>
> Greetings
> Klaas

Ultimately, the bug here seems to be that it was possible to initiate a DC-level update without following the steps above. That paused VMs fail to come back up (and fail validation) is a side effect of this. That's expected behavior, but it's unexpected that a VM would fall through this gap.

I would have sworn there was another bug around DC upgrades also, but these may also be relevant:
https://bugzilla.redhat.com/show_bug.cgi?id=1649685
https://bugzilla.redhat.com/show_bug.cgi?id=1662921

In either case, if configuration updates were performed over the API, it _may_ have kept one of these on an older version. But, in general, the failure to resume here is probably NOTABUG. Instead, it should have failed validation on the DC upgrade. What's the expected behavior here? (Originally by Ryan Barry)
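The missing validation described above ("it should have failed validation on the DC upgrade") can be approximated as an admin-side pre-check. A minimal sketch with the oVirt Python SDK (ovirtsdk4); the target level and connection details are placeholders, and the comparison is illustrative rather than the engine's actual validation logic.

```python
# Minimal sketch: list VMs that are up/paused/suspended and whose effective
# compatibility level (custom override if set, else the cluster level) is
# below the level you intend to raise the data center to. Illustrative only;
# not the engine's actual validation.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

TARGET = (4, 3)  # placeholder target DC level

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',   # placeholder
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)
try:
    system = connection.system_service()
    blocking = []
    for vm in system.vms_service().list():
        if vm.status not in (types.VmStatus.UP,
                             types.VmStatus.PAUSED,
                             types.VmStatus.SUSPENDED):
            continue
        # effective level: per-VM custom override if set, else the cluster level
        version = vm.custom_compatibility_version
        if version is None:
            version = connection.follow_link(vm.cluster).version
        if (version.major, version.minor) < TARGET:
            blocking.append((vm.name, version.major, version.minor))
    for name, major, minor in blocking:
        print('VM %s still runs at compatibility %d.%d' % (name, major, minor))
finally:
    connection.close()
```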
(In reply to Ryan Barry from comment #9)
[...]
> "you must update the cluster compatibility version of all running or
> suspended virtual machines by restarting them from within the Manager"
>
> What is unclear about this? Maybe we need a docs update.

It does not say this is a prerequisite for continuing, as it does with "change compatibility of the cluster", so I assumed that is not immediately needed.

[..]
> Ultimately, the bug here seems to be that it was possible to initiate a
> DC-level update without following the steps above. That paused VMs fail to
> come back up (and fail validation) is a side effect of this. That's expected
> behavior, but it's unexpected that a VM would fall through this gap.

I fully agree with this assessment; the DC upgrade should not be possible, and the error is just a result of it being possible.

> I would have sworn there was another bug around DC upgrades also, but these
> may also be relevant:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1649685
> https://bugzilla.redhat.com/show_bug.cgi?id=1662921
>
> In either case, if configuration updates were performed over the API, it
> _may_ have kept one of these on an older version. But, in general, the
> failure to resume here is probably NOTABUG. Instead, it should have failed
> validation on the DC upgrade. What's the expected behavior here?

I do not perform changes via the API; everything is done by RHV-M itself, and my changes come through the web UI for now. This bug can either be closed as NOTABUG or transformed into "DC upgrade should not be possible if VMs still have an older cluster compatibility version". (Originally by klaas)

Tal, thoughts on the final part of this? Neither Michal nor I can find an appropriate bug, but this should definitely be blocked. (Originally by Ryan Barry)

Tal? (Originally by Ryan Barry)

There's also a typo in ACTION_TYPE_FAILED_VM_COMATIBILITY_VERSION_NOT_SUPPORTED: COMATIBILITY -> COMPATIBILITY. (Originally by Sandro Bonazzola)

Shani, please check the discussion on rhev-tech about upgrading CL. What should we do about paused VMs? (Originally by Fred Rolland)

(In reply to Fred Rolland from comment #17)
> Shani, please check the discussion on rhev-tech about upgrading CL.
> What should we do about paused VMs?

We did PowerOff -> PowerOn, but as comment #6 suggests, maybe you can use virsh to resume the VMs. (Originally by klaas)

WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason: [Found non-acked flags: '{'rhevm-4.3.z': '?'}', ] For more info please contact: rhv-devops

Verified on ovirt-engine-4.4.0-0.0.master.20190509133331.gitb9d2a1e.el7.noarch. The scenario is:
1. Create a DC with an 'old' version (4.1/4.2).
2. Create a cluster with a 4.1/4.2 version.
3. Create a host on the DC and create a VM on the cluster.
4. Run the VM and suspend it. Also tried pausing the VM by blocking storage on the host, which causes an I/O-error pause.
5. Upgrade the cluster to a newer version (the VM is still paused). Tried the following updates: 4.1 -> 4.2 -> 4.3 -> 4.4; 4.1 -> 4.3.
6. Try to update the DC. Run the suspended VM. For the paused VM, delete the blocking rule and see that the VM is running again after the DC is updated.

Avihai, can you ack?

(In reply to Fred Rolland from comment #22)
> Avihai, can you ack?

Looks like Polina already did the QE work, and the scenario looks virt-ish. Polina, can you ack it (as you already tested it)?

Verified according to https://bugzilla.redhat.com/show_bug.cgi?id=1710740#c21

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:1566

sync2jira
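The last step of the verification scenario above (trying to raise the DC level while a VM is still suspended or paused on an older cluster level, and expecting the operation to fail) could also be driven through the REST API. A minimal sketch with the oVirt Python SDK (ovirtsdk4); the DC name, target version and connection details are placeholders.

```python
# Minimal sketch: attempt the data center compatibility update and expect it
# to be rejected while suspended/paused VMs with an older cluster level exist.
# DC name, target version and connection details are placeholders.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',   # placeholder
    username='admin@internal',
    password='password',
    ca_file='ca.pem',
)
try:
    dcs_service = connection.system_service().data_centers_service()
    dc = dcs_service.list(search='name=mydc')[0]          # placeholder DC name
    try:
        dcs_service.data_center_service(dc.id).update(
            types.DataCenter(version=types.Version(major=4, minor=3)),
        )
        print('DC level updated')
    except sdk.Error as e:
        # With the fix in place, this rejection is the expected outcome.
        print('DC update rejected:', e)
finally:
    connection.close()
```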