Bug 1425174 - Can't migrate VM in 3.5 cluster (3.6 engine) to 3.6 NGN EL7 (bad version comparison)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.6.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ovirt-3.6.11
Target Release: ---
Assignee: Douglas Schilling Landgraf
QA Contact: Jiri Belka
URL:
Whiteboard:
Depends On: 1368364
Blocks:
Reported: 2017-02-20 17:41 UTC by Jiri Belka
Modified: 2019-04-28 13:28 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-09 17:04:32 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:


Attachments
vdsClient -s 0 getVdsCaps (13.14 KB, text/plain)
2017-02-21 09:03 UTC, Jiri Belka
caps.py (28.02 KB, text/plain)
2017-02-23 05:17 UTC, Douglas Schilling Landgraf


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1368364 | 0 | medium | CLOSED | Reported Node version and release are incorrect - RHEL version should be reported | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHBA-2017:1211 | 0 | normal | SHIPPED_LIVE | vdsm 3.6.11 bug fix and enhancement update | 2017-05-09 21:03:19 UTC
oVirt gerrit | 73297 | 0 | ovirt-3.6 | MERGED | ngn: grab OS version according with /etc/os-release | 2017-03-01 08:49:55 UTC
oVirt gerrit | 73298 | 0 | ovirt-3.6 | MERGED | vdsm: adding handling for NGN in osinfo.py | 2017-03-01 08:49:40 UTC
oVirt gerrit | 73403 | 0 | master | ABANDONED | osinfo: show variant name in osname for ngn hosts | 2017-03-08 14:04:36 UTC

Internal Links: 1368364

Description Jiri Belka 2017-02-20 17:41:42 UTC
Description of problem:

For an upgrade to 4.x from an environment with a 3.6 engine and a 3.5 cluster compatibility level with EL6 (i.e. 3.5) vdsm hosts, we have found a path which would include adding a 3.6 NGN (i.e. EL7-based) hypervisor (NGN, not the legacy hypervisor) into this 3.5 cluster. Then, with the help of the InClusterUpgrade trick, we would migrate VMs from the EL6 hosts to NGN (EL7)...

...but there is the following issue. I could not migrate the VMs to the 3.6 NGN host, which is happily SPM now, because there seems to be a bad version comparison (maybe we were thinking about NGN at the time of 3.6):

The flow was (migration of course):

~~~
Operation Canceled

Error while executing action: 

upgrade1:
Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:
The host dell-r210ii-03 did not satisfy internal filter InClusterUpgrade because its OS version is too old, found RHEL.
~~~

Huh? What is it talking about?

# grep -i 'runs.*too.*old' /var/log/ovirt-engine/engine.log   | tail -n1
2017-02-20 18:27:18,384 DEBUG [org.ovirt.engine.core.bll.scheduling.policyunits.CpuLevelFilterPolicyUnit] (ajp-/127.0.0.1:8702-5) [1a0ed059] Host a0a854d1-55ba-4918-aa6d-1b9bf79c0edf runs a too old OS version. Found RHEL - 3.6 - 0.2.el7
                                              ^^^^^^^ what is this?

# rpm -qa \*release\*
redhat-release-virtualization-host-content-3.6-0.2.el7.x86_64
redhat-release-virtualization-host-3.6-0.2.el7.x86_64
                                       ^^^^^^^ same pattern

A comparison with plain EL7 vdsm host:

# rpm -qa redhat-release-server
redhat-release-server-7.3-7.el7.x86_64
                          ^^^^^ 
It seems code could be matching 'redhat-release-.*' to $major.$minor-$something.$elvariant where '$something.$elvariant' "format style" is something the code did not expect.
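
To make the pattern above concrete, here is a minimal Python sketch that splits the two package names from this report into name, version, release and arch. The regular expression and the helper name split_nvra are assumptions made for illustration only; this is not the code vdsm or the engine actually uses.

```python
import re

# Hypothetical pattern for "<name>-<version>-<release>.<arch>".
# Version and release contain no hyphens; arch contains no dots.
_NVRA = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)


def split_nvra(package):
    """Split an RPM-style package string into its four parts."""
    match = _NVRA.match(package)
    if match is None:
        raise ValueError("unexpected package format: %r" % package)
    return match.groupdict()


if __name__ == "__main__":
    for pkg in (
        "redhat-release-virtualization-host-3.6-0.2.el7.x86_64",
        "redhat-release-server-7.3-7.el7.x86_64",
    ):
        parts = split_nvra(pkg)
        # The NGN release package yields version 3.6 (the product version),
        # while the plain EL7 package yields 7.3 (the OS version).
        print(parts["name"], parts["version"], parts["release"])
```

Both names parse cleanly, which suggests the trouble is not the format itself: the version field of the NGN release package carries the product version (3.6) rather than the OS version.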

This impacts a flow we are thinking of delivering as a way to migrate from a 3.5 EL6 to a 4.x environment.

Version-Release number of selected component (if applicable):
rhevm-3.6.10.1-0.1.el6.noarch

How reproducible:
just happens

Steps to Reproduce:
1. 3.6 engine, 3.5 cluster with an EL6 host
2. add 3.6 NGN
3. migrate VMs from EL6 to EL7 while utilizing InClusterUpgrade policy

Actual results:
engine detects 3.6 NGN as an old host, bad version comparison

Expected results:
should work, 3.6 NGN is for sure newer than 3.5 vdsm

Additional info:
see docs for InClusterUpgrade in Upgrade Guide

Comment 3 Jiri Belka 2017-02-20 17:48:59 UTC
Stopping and starting the VMs on 3.6 NGN works fine but this ruins our goal anyway, ie. migration without big impact.

Comment 4 Martin Sivák 2017-02-21 08:46:21 UTC
The version "RHEL - 3.6 - 0.2.el7" comes directly from host.getHostOs(). Can you execute `vdsClient -s 0 getVdsCaps` and check what is reported there?

Comment 5 Jiri Belka 2017-02-21 09:03:26 UTC
Created attachment 1256004 [details]
vdsClient -s 0 getVdsCaps

Comment 7 Martin Sivák 2017-02-21 13:38:46 UTC
Yep, as I assumed:

operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '3.6'}

This does not look right and RHEL 3.6 is really lower than RHEL 6 as far as scheduler is concerned :) So we need to send this bug to whoever is responsible for the host version detection.
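
A rough Python sketch of why a reported version of 3.6 sorts below 6 in a numeric comparison. The engine's scheduler is not written in Python, and the helper names version_key and is_older are hypothetical; this only illustrates the comparison, not the actual InClusterUpgrade filter.

```python
def version_key(version):
    """Turn a dotted version such as '7.3' into a tuple of ints, e.g. (7, 3)."""
    return tuple(int(part) for part in version.split("."))


def is_older(candidate, baseline):
    """Return True when candidate sorts strictly before baseline."""
    return version_key(candidate) < version_key(baseline)


if __name__ == "__main__":
    # Version reported by the NGN host before the fix.
    print(is_older("3.6", "6"))
    # Version a plain EL7 host would report.
    print(is_older("7.3", "6"))
```

With the product version 3.6 reported in place of the OS version, the host looks older than any EL6 host, which is consistent with the filter rejecting it.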

Comment 8 Dan Kenigsberg 2017-02-21 13:51:10 UTC
mperina, isn't it a dup of another ngn-related bug we've discussed?

Comment 9 Martin Perina 2017-02-21 13:57:59 UTC
(In reply to Dan Kenigsberg from comment #8)
> mperina, isn't it a dup of another ngn-related bug we've discussed?

Yes, from the description above it seems to me it's caused by BZ1368364. This is the 1st time we see that there are worse consequences than displaying bad version in webadmin ...

Comment 10 Douglas Schilling Landgraf 2017-02-23 05:17:34 UTC
Created attachment 1256766 [details]
caps.py

Comment 11 Douglas Schilling Landgraf 2017-02-23 05:23:27 UTC
Hi Jiri, 
 
Could you please test the caps.py attached in your ngn 3.6?

I would recommend the following:

  - Grab current operatingSystem data
    # vdsClient -s 0 getVdsCaps | grep operatingSystem

  - Backup current caps.py 
    # cp /usr/share/vdsm/caps.py /usr/share/vdsm/caps.py.bkp

  - Replace caps.py
    # cp /path/new/caps.py /usr/share/vdsm/caps.py

  - Restart vdsmd
    # systemctl restart vdsmd.service

  - Grab new operatingSystem and compare with the old one
    # vdsClient -s 0 getVdsCaps | grep operatingSystem

  - Try again the vm migration

Thanks!

Comment 12 Jiri Belka 2017-02-27 13:39:19 UTC
(In reply to Douglas Schilling Landgraf from comment #11)
> Hi Jiri, 
>  
> Could you please test the caps.py attached in your ngn 3.6?
> 
> I would recommend the following:
> 
>   - Grab current operatingSystem data
>     # vdsClient -s 0 getVdsCaps | grep operatingSystem

# vdsClient -s 0 getVdsCaps | grep operatingSystem
        operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '3.6'}

>   - Backup current caps.py 
>     # cp /usr/share/vdsm/caps.py /usr/share/vdsm/caps.py.bkp
> 
>   - Replace caps.py
>     # cp /path/new/caps.py /usr/share/vdsm/caps.py
> 
>   - Restart vdsmd
>     # systemctl restart vdsmd.service
> 
>   - Grab new operatingSystem and compare with the old one
>     # vdsClient -s 0 getVdsCaps | grep operatingSystem

# vdsClient -s 0 getVdsCaps | grep operatingSystem
        operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '7.3'}

>   - Try again the vm migration

works fine.

> 
> Thanks!
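
The before/after comparison from the steps above could be scripted roughly as follows. This is a sketch that assumes the grep output is a single line of the form operatingSystem = {...} with a Python-style dict literal, as pasted in this comment; the helper names parse_operating_system and changed_fields are hypothetical.

```python
import ast


def parse_operating_system(line):
    """Parse one 'operatingSystem = {...}' line into a dict."""
    key, _, value = line.partition("=")
    if key.strip() != "operatingSystem":
        raise ValueError("not an operatingSystem line: %r" % line)
    # literal_eval only accepts literals, so it does not execute code.
    return ast.literal_eval(value.strip())


def changed_fields(before, after):
    """Map each field that differs to its (before, after) pair."""
    keys = sorted(set(before) | set(after))
    return {
        k: (before.get(k), after.get(k))
        for k in keys
        if before.get(k) != after.get(k)
    }


if __name__ == "__main__":
    old = parse_operating_system(
        "        operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '3.6'}"
    )
    new = parse_operating_system(
        "        operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '7.3'}"
    )
    print(changed_fields(old, new))
```

For the two outputs pasted above, only the version field differs (3.6 before, 7.3 after); name and release are unchanged.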

Comment 15 Jiri Belka 2017-04-03 12:42:19 UTC
ok, based on #13

redhat-virtualization-host-image-update-placeholder-3.6-0.2.el7.noarch

Comment 17 errata-xmlrpc 2017-05-09 17:04:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1211

Comment 18 Ari Lemmke 2017-05-12 09:09:19 UTC
Had this problem too.

First installed the latest 4.0 series like 4.0.7

But then checked in your document that 7.3 based is not supported with 3.5 cluster
(see https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Installation_Guide/Host_Compatibility_Matrix.html table 8.1)

So installed first 4.0 hypervisor. And got this weird problem because it presented itself as version "4.0".

This quick hack solved the problem:

      caps['operatingSystem'] = osinfo.version()
+     caps['operatingSystem']['version'] = "7.2"

I do understand that you there in RH are protecting people from themselves which makes me frustrated because have been doing this work before most of you there have even been born. (pun intended)

This problem should have been fixed long long time ago.

