Description of problem:
We recently updated from 4.0 to 4.1-beta, and after trying to switch the cluster to 4.1 I get this warning (FYI the actual issue is in the second popup):
~~~
Change Cluster Compatibility Version

All running VMs will be temporarily reconfigured to use the previous cluster compatibility version and marked pending configuration change. In order to change the cluster compatibility version of the VM to a new version, the VM needs to be manually shut down and restarted.
There are 55 running VM(s) affected by this change.

Are you sure you want to change the Cluster Compatibility Version?
~~~
Clicking [OK] then brings up the second popup:
~~~
Operation Canceled

Error while executing action: Cannot edit Cluster. Maximum memory (24576MB) cannot exceed platform limit (20480MB)
~~~
This is an unfortunate surprise :/

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.0.4-0.1.el7.noarch

How reproducible:
just happens

Steps to Reproduce:
1. have an old setup (ours was 3.5 -> 3.6 -> 4.0)
2. upgrade to 4.1-beta
3. switch the cluster from 4.0 to 4.1

Actual results:
not possible to upgrade the cluster compat level to 4.1

Expected results:
should work

Additional info:
If there's an issue, the message should at least be clearer. FYI, before the error (which also appears in the Events main-tab), there's a huge number of VM reconfiguration events, e.g.:

  VM lbednar-rhset1 configuration was updated by system.

So the action, even though it ended in "Operation Canceled", was messing with the current configuration, i.e. it updated the configuration of VMs. Strange, shouldn't some precheck be done before performing the action?
engine=# select vm_name,os,max_memory_size_mb from vms where max_memory_size_mb > 20480 order by max_memory_size_mb desc;
     vm_name      | os | max_memory_size_mb
------------------+----+--------------------
 brq-openldap     |  0 |            4194304
 brq-rhosci       | 19 |            4194304
 om-ovirt         | 24 |            4194304
 om-openstack     |  5 |            4194304
 om-wgt           | 24 |            4194304
 selenium         | 19 |            4194304
 brq-ipa          | 19 |            4194304
 om-ad-child2     | 25 |            4194304
 brq-w2k8r2       | 17 |            4194304
 brq-w2k12r2      | 25 |            4194304
 lbednar-rhset1   | 19 |            4194304
 HostedEngine     |  5 |              65536
 brq-dev          |  5 |              49152
 lleistne-engine1 |  0 |              32768
 jboss-eap-qe01   | 19 |              32768
 ps-ovirt         | 19 |              32768
 lleistne-edb     | 24 |              32768
 cfme-552         | 24 |              32768
 pn-win8.1        | 21 |              32768
 pk-e5            | 24 |              32000
 ps-rh6           | 19 |              28672
 rhci-cfme-wk1    | 18 |              24576
 rhci-cfme-candu  | 18 |              24576
 brq-update       | 19 |              24480
 mo-update        | 19 |              24480
 pbal-engine36    | 19 |              20516
 pbal-engine      | 19 |              20516
 gr-rhev35_1      | 19 |              20516
 gr-rhev35        | 19 |              20516
 selenium-nodes   | 19 |              20516
(30 rows)

os = '18' = RHEL6 32bit.
(In reply to Jiri Belka from comment #2)
> engine=# select vm_name,os,max_memory_size_mb from vms where
> max_memory_size_mb > 20480 order by max_memory_size_mb desc;
>      vm_name      | os | max_memory_size_mb
> ------------------+----+--------------------
...
>  rhci-cfme-wk1    | 18 |              24576
>  rhci-cfme-candu  | 18 |              24576
...
> (30 rows)
>
> os = '18' = RHEL6 32bit.

After switching those VMs to 64bit, the cluster compat level bump was successful.
I suppose those VMs were actually imported. Can you confirm how and when they were created? The limit didn't change for quite some time, so it likely bypassed the checks back then.
(In reply to Michal Skrivanek from comment #4)
> I suppose those VMs were actually imported. Can you confirm how and when
> they were created? The limit didn't change for quite some time, so it
> likely bypassed the checks back then.

Our env has a long history; it used to be - iirc - 3.0, and in 3.5 it was migrated to SHE.

engine=# select vm_name,max_memory_size_mb,vmt_name,creation_date,vmt_creation_date,last_start_time,last_stop_time from vms where vm_name = 'rhci-cfme-wk1';
-[ RECORD 1 ]------+---------------------------
vm_name            | rhci-cfme-wk1
max_memory_size_mb | 24576
vmt_name           | Blank
creation_date      | 2014-11-18 11:10:27.635-05
vmt_creation_date  | 2008-03-31 18:00:00-04
last_start_time    | 2015-08-09 23:39:32.339-04
last_stop_time     | 2015-12-17 10:55:48.59-05

I have no other info about the history of those VMs.
The check for max mem was added in ~3.5, so if the VM was imported before that, it is possible it has been wrong all that time.

The name "rhci-cfme" indicates it might be an external OVF imported into oVirt, and that the settings were wrong in that OVF. That is quite likely, because there were (and I think still are) quite a few differences between the image CFME produces and what we are expecting.

This should not happen for oVirt/RHV exported VMs.

Improved logging in these cases is being tracked as bug 1418641.

*** This bug has been marked as a duplicate of bug 1418641 ***
(In reply to Michal Skrivanek from comment #6)
> The check for max mem was added in ~3.5, so if the VM was imported before
> that, it is possible it has been wrong all that time.
> The name "rhci-cfme" indicates it might be an external OVF imported into
> oVirt, and that the settings were wrong in that OVF. That is quite likely,
> because there were (and I think still are) quite a few differences between
> the image CFME produces and what we are expecting.
> This should not happen for oVirt/RHV exported VMs.

I have to reopen this issue, as I just stumbled upon it on a different QE production system. Since CFME deployment on a RHEV system is based on an imported VM, I expect we will keep hitting this if a fix isn't provided.

engine=# select vm_name,os from vms where os = 18;
       vm_name        | os
----------------------+----
 rhci-cfme-prod-wk1   | 18
 rhci-cfme-prod-candu | 18
 rhci-cfme-prod-db    | 18
(3 rows)
Gil - can you verify whether the CFME OVF has wrong XML?
Michal - I believe we need to address this somehow during upgrade and fix it.
Since 4.1.1 it should start complaining during import if the value is not correct: https://gerrit.ovirt.org/#/c/69741/

But if the VM was imported before that, the value may be incorrect. Doing some magic fixes of the wrong values does not sound too good to me. We should rather implement bug 1418641, to give the user a better chance to understand what is wrong during the update and a chance to solve it...
We're not fixing old imports; without knowing when exactly it was imported and what kind of OVF was used, we can't really do much. It is an invalid VM, and that happened already during the import. I believe currently available CFME templates do not have this 32bit OS anymore, so it is not a problem anymore.

I'll keep the bug open for a while for further thoughts then, but this is nothing Urgent/Urgent, it's not a Regression, and I will close this again if no further comments are received.
(In reply to Michal Skrivanek from comment #10)
> We're not fixing old imports; without knowing when exactly it was imported
> and what kind of OVF was used, we can't really do much. It is an invalid VM,
> and that happened already during the import. I believe currently available
> CFME templates do not have this 32bit OS anymore, so it is not a problem
> anymore.
>
> I'll keep the bug open for a while for further thoughts then, but this is
> nothing Urgent/Urgent, it's not a Regression, and I will close this again if
> no further comments are received.

engine-setup already does a couple of checks; what about adding this to those checks, so there's a warning before users upgrade?
(In reply to Jiri Belka from comment #11)
> engine-setup already does a couple of checks; what about adding this to
> those checks, so there's a warning before users upgrade?

Agreed. I feel it's better to fail the upgrade on this than to fail a 'day 2 operation'.
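For illustration only, a precheck along the lines proposed above could start from a query like the one in comment #2. This is just a sketch against the engine DB, not actual engine-setup code: the 20480 MB limit and os id 18 are taken from the error message and comments above, and a real check would need to read the per-OS memory limits from the engine's osinfo configuration rather than hard-code them.

```sql
-- Hypothetical precheck sketch, NOT actual engine-setup code.
-- List VMs whose configured max memory exceeds the platform limit
-- reported in the error (20480 MB for the 32bit os id 18 seen above);
-- a real implementation would take the limit per OS from osinfo.
select vm_name, os, max_memory_size_mb
  from vms
 where os = 18                       -- 32bit OS id from comment #2
   and max_memory_size_mb > 20480    -- limit from the error message
 order by max_memory_size_mb desc;
```

If this returned any rows, engine-setup could warn the admin before the upgrade instead of letting the cluster compat bump fail later.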
(In reply to Yaniv Kaul from comment #12)
> Agreed. I feel it's better to fail the upgrade on this than to fail a
> 'day 2 operation'.

I don't think engine-setup is a good place for this. There are lots of checks which need to be done on a VM, and they live in UpdateVmCommand.validate() - I would not try to re-implement them in SQL scripts on update, because:
- we will never be able to implement all of them and keep them up-to-date in the long run
- the flow for the user would be: turn the engine off, run engine-setup, look at the failed VMs, turn the engine on, fix the VMs, turn the engine off, run engine-setup again. Not a great user experience, especially if the second engine-setup run fails on some other check again.

I think the biggest issue is the incorrect reporting and the insufficient help given to the user on how to solve the issue. That should be addressed here:
https://bugzilla.redhat.com/show_bug.cgi?id=1418641#c5
OK, marking it as a dup of bug 1418641 - it is a generic solution for all these kinds of problems.

*** This bug has been marked as a duplicate of bug 1418641 ***