Description of problem:

For an upgrade from an environment having a 3.6 engine with 3.5 cluster compat level and EL6 (i.e. 3.5) vdsm hosts to 4.x, we have found a path which would include adding a 3.6 NGN (i.e. EL7-based) hypervisor (NGN, not legacy hypervisor) into this 3.5 cluster, then with the help of the InClusterUpgrade trick we would migrate VMs from the EL6 hosts to the NGN (EL7)...

...but there is the following issue. I could not migrate the VMs from the 3.6 NGN, which is happily SPM now, because there seems to be a bad version comparison (maybe NGN was not considered at the time of 3.6). The flow was (migration, of course):

~~~
Operation Canceled
Error while executing action: upgrade1: Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details: The host dell-r210ii-03 did not satisfy internal filter InClusterUpgrade because its OS version is too old, found RHEL.
~~~

Huh? What is it talking about?

# grep -i 'runs.*too.*old' /var/log/ovirt-engine/engine.log | tail -n1
2017-02-20 18:27:18,384 DEBUG [org.ovirt.engine.core.bll.scheduling.policyunits.CpuLevelFilterPolicyUnit] (ajp-/127.0.0.1:8702-5) [1a0ed059] Host a0a854d1-55ba-4918-aa6d-1b9bf79c0edf runs a too old OS version. Found RHEL - 3.6 - 0.2.el7
                                                                                                                                                                                                    ^^^^^^^ what is this?

# rpm -qa \*release\*
redhat-release-virtualization-host-content-3.6-0.2.el7.x86_64
redhat-release-virtualization-host-3.6-0.2.el7.x86_64
                                   ^^^^^^^ same pattern

A comparison with a plain EL7 vdsm host:

# rpm -qa redhat-release-server
redhat-release-server-7.3-7.el7.x86_64
                      ^^^^^

It seems the code could be matching 'redhat-release-.*' to $major.$minor-$something.$elvariant, where the '$something.$elvariant' "format style" is something the code did not expect.

This impacts a flow we are considering as a migration path from a 3.5 EL6 environment to 4.x.

Version-Release number of selected component (if applicable):
rhevm-3.6.10.1-0.1.el6.noarch

How reproducible:
just happens

Steps to Reproduce:
1. 3.6 engine, 3.5 cluster with an EL6 host
2. add a 3.6 NGN
3. migrate VMs from EL6 to EL7 while utilizing the InClusterUpgrade policy

Actual results:
engine detects the 3.6 NGN as an old host; bad version comparison

Expected results:
should work, the 3.6 NGN is for sure newer than a 3.5 vdsm

Additional info:
see the docs for InClusterUpgrade in the Upgrade Guide
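To illustrate the failure mode described above: the NGN's redhat-release package carries the *product* version (3.6), not the underlying RHEL version (7.x), so any numeric OS-version comparison ranks the NGN below the EL6 hosts it is meant to replace. This is a minimal sketch of such a comparison, not the actual engine filter code:

```python
# Hypothetical sketch of the version-comparison pitfall; the real
# InClusterUpgrade filter lives in ovirt-engine, these helpers are
# illustrative only.

def parse_version(ver):
    """Turn '3.6' or '7.3' into a comparable tuple of ints."""
    return tuple(int(p) for p in ver.split("."))

def newer_or_equal(host_ver, reference_ver):
    return parse_version(host_ver) >= parse_version(reference_ver)

# An EL6 legacy host reports its real OS version and passes:
assert newer_or_equal("6.8", "6.8")
# The NGN reports the RHV product version, so it looks "too old":
assert not newer_or_equal("3.6", "6.8")
# Had it reported the real EL version, it would pass:
assert newer_or_equal("7.3", "6.8")
```

This matches the engine log above: "Found RHEL - 3.6 - 0.2.el7" is the product version string being fed into an OS-version comparison.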
Stopping and starting the VMs on the 3.6 NGN works fine, but this defeats our goal anyway, i.e. migration without big impact.
The version "RHEL - 3.6 - 0.2.el7" comes directly from host.getHostOs(). Can you execute `vdsClient -s 0 getVdsCaps` and check what is reported there?
Created attachment 1256004 [details] vdsClient -s 0 getVdsCaps
Yep, as I assumed:

operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '3.6'}

This does not look right, and RHEL 3.6 is really lower than RHEL 6 as far as the scheduler is concerned :)

So we need to send this bug to whoever is responsible for the host version detection.
mperina, isn't it a dup of another ngn-related bug we've discussed?
(In reply to Dan Kenigsberg from comment #8)
> mperina, isn't it a dup of another ngn-related bug we've discussed?

Yes, from the description above it seems to me it's caused by BZ1368364. This is the first time we see that there are worse consequences than displaying a bad version in webadmin ...
Created attachment 1256766 [details] caps.py
Hi Jiri,

Could you please test the caps.py attached in your NGN 3.6?

I would recommend the following:

- Grab current operatingSystem data
# vdsClient -s 0 getVdsCaps | grep operatingSystem

- Backup current caps.py
# cp /usr/share/vdsm/caps.py /usr/share/vdsm/caps.py.bkp

- Replace caps.py
# cp /path/new/caps.py /usr/share/vdsm/caps.py

- Restart vdsmd
# systemctl restart vdsmd.service

- Grab new operatingSystem and compare with the old one
# vdsClient -s 0 getVdsCaps | grep operatingSystem

- Try again the VM migration

Thanks!
(In reply to Douglas Schilling Landgraf from comment #11)
> Hi Jiri,
>
> Could you please test the caps.py attached in your ngn 3.6?
>
> I would recommend the following:
>
> - Grab current operatingSystem data
> # vdsClient -s 0 getVdsCaps | grep operatingSystem

# vdsClient -s 0 getVdsCaps | grep operatingSystem
	operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '3.6'}

> - Backup current caps.py
> # cp /usr/share/vdsm/caps.py /usr/share/vdsm/caps.py.bkp
>
> - Replace caps.py
> # cp /path/new/caps.py /usr/share/vdsm/caps.py
>
> - Restart vdsmd
> # systemctl restart vdsmd.service
>
> - Grab new operatingSystem and compare with the old one
> # vdsClient -s 0 getVdsCaps | grep operatingSystem

# vdsClient -s 0 getVdsCaps | grep operatingSystem
	operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '7.3'}

> - Try again the vm migration

works fine.

> Thanks!
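With the replacement caps.py, the host now reports the underlying EL version (7.3) instead of the product version (3.6). A minimal sketch of the kind of change that produces this, assuming the fix reads the version from /etc/os-release (the actual vdsm patch may differ, and the function name here is illustrative, not vdsm's API):

```python
# Hypothetical sketch: on NGN, derive the OS version from
# /etc/os-release (VERSION_ID) rather than from the
# redhat-release-virtualization-host package, whose version field
# carries the RHV product version (3.6), not the RHEL version.

def os_version_from_os_release(path="/etc/os-release"):
    """Parse KEY=VALUE pairs from an os-release file and return VERSION_ID."""
    data = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if "=" in line and not line.startswith("#"):
                key, _, value = line.partition("=")
                data[key] = value.strip('"')
    return data.get("VERSION_ID", "unknown")
```

On an EL7-based NGN, an os-release file containing VERSION_ID="7.3" would then yield the '7.3' seen in the new getVdsCaps output.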
ok, based on #13

redhat-virtualization-host-image-update-placeholder-3.6-0.2.el7.noarch
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:1211
Had this problem too.

First I installed the latest of the 4.0 series, 4.0.7. But then I checked in your document that a 7.3-based host is not supported with a 3.5 cluster (see https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Installation_Guide/Host_Compatibility_Matrix.html, table 8.1). So I installed a 4.0 hypervisor first, and got this weird problem because it presented itself as version "4.0".

This quick hack solved the problem:

  caps['operatingSystem'] = osinfo.version()
+ caps['operatingSystem']['version'] = "7.2"

I do understand that you there at RH are protecting people from themselves, which makes me frustrated, because I have been doing this work since before most of you there were even born. (pun intended) This problem should have been fixed a long, long time ago.
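The hack quoted above can be shown in isolation. This sketch only mirrors the effect of the one-line override on the caps dict reported by getVdsCaps; the surrounding vdsm code (and the osinfo call) is assumed, not reproduced:

```python
# Illustrative only: the dict shape matches the getVdsCaps
# 'operatingSystem' output shown earlier in this bug; the values
# here are the ones the commenter's host would have reported.

caps = {'operatingSystem': {'name': 'RHEL',
                            'release': '0.2.el7',
                            'version': '4.0'}}  # product version, wrong for scheduling

# The hack: overwrite the reported version with the real EL version
# so the engine's OS-version comparison passes.
caps['operatingSystem']['version'] = "7.2"
```

Like the caps.py fix above, this works because the engine only sees whatever version string the host chooses to report.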