1425174 – Can't migrate VM in 3.5 cluster (3.6 engine) to 3.6 NGN EL7 (bad version comparison)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	vdsm
Sub Component:
Version:	3.6.10
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	urgent
Target Milestone:	ovirt-3.6.11
Target Release:	---
Assignee:	Douglas Schilling Landgraf
QA Contact:	Jiri Belka
Docs Contact:
URL:
Whiteboard:
Depends On:	1368364
Blocks:

Reported:	2017-02-20 17:41 UTC by Jiri Belka
Modified:	2019-04-28 13:28 UTC
CC List:	14 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2017-05-09 17:04:32 UTC
oVirt Team:	Node
Target Upstream Version:
Embargoed:

Attachments
vdsClient -s 0 getVdsCaps (13.14 KB, text/plain) 2017-02-21 09:03 UTC, Jiri Belka
caps.py (28.02 KB, text/plain) 2017-02-23 05:17 UTC, Douglas Schilling Landgraf

Links
System	ID	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	1368364	medium	CLOSED	Reported Node version and release are incorrect - RHEL version should be reported	2021-02-22 00:41:40 UTC
Red Hat Product Errata	RHBA-2017:1211	normal	SHIPPED_LIVE	vdsm 3.6.11 bug fix and enhancement update	2017-05-09 21:03:19 UTC
oVirt gerrit	73297	ovirt-3.6	MERGED	ngn: grab OS version according with /etc/os-release	2017-03-01 08:49:55 UTC
oVirt gerrit	73298	ovirt-3.6	MERGED	vdsm: adding handling for NGN in osinfo.py	2017-03-01 08:49:40 UTC
oVirt gerrit	73403	master	ABANDONED	osinfo: show variant name in osname for ngn hosts	2017-03-08 14:04:36 UTC

Internal Links: 1368364

Description Jiri Belka 2017-02-20 17:41:42 UTC

Description of problem:

For the upgrade to 4.x from an environment with a 3.6 engine, a 3.5 cluster compatibility level and EL6 (i.e. 3.5) vdsm hosts, we have found a path that involves adding a 3.6 NGN (i.e. EL7-based) hypervisor (NGN, not the legacy hypervisor) to this 3.5 cluster. Then, with the help of the InClusterUpgrade trick, we would migrate the VMs from the EL6 hosts to NGN (EL7)...

...but there is the following issue. I could not migrate the VMs to the 3.6 NGN host, which is happily SPM now, because there seems to be a bad version comparison (maybe we weren't thinking about NGN at the time of 3.6):

The flow was (migration of course):

~~~
Operation Canceled

Error while executing action: 

upgrade1:
Cannot migrate VM. There is no host that satisfies current scheduling constraints. See below for details:
The host dell-r210ii-03 did not satisfy internal filter InClusterUpgrade because its OS version is too old, found RHEL.
~~~

Huh? What is it talking about?

# grep -i 'runs.*too.*old' /var/log/ovirt-engine/engine.log   | tail -n1
2017-02-20 18:27:18,384 DEBUG [org.ovirt.engine.core.bll.scheduling.policyunits.CpuLevelFilterPolicyUnit] (ajp-/127.0.0.1:8702-5) [1a0ed059] Host a0a854d1-55ba-4918-aa6d-1b9bf79c0edf runs a too old OS version. Found RHEL - 3.6 - 0.2.el7
                                              ^^^^^^^ what is this?

# rpm -qa \*release\*
redhat-release-virtualization-host-content-3.6-0.2.el7.x86_64
redhat-release-virtualization-host-3.6-0.2.el7.x86_64
                                       ^^^^^^^ same pattern

A comparison with plain EL7 vdsm host:

# rpm -qa redhat-release-server
redhat-release-server-7.3-7.el7.x86_64
                          ^^^^^ 
It seems the code could be matching 'redhat-release-.*' against $major.$minor-$something.$elvariant, where the '$something.$elvariant' format is something the code did not expect.
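
A minimal sketch of the suspected failure mode, not vdsm's actual code: if the version is taken from the release package's name-version-release string, the NGN package yields the product version (3.6) where redhat-release-server yields the OS version (7.3). The helper name `split_nvr` is hypothetical, and the assumption is that version and release are the last two hyphen-separated fields before the architecture.

```python
def split_nvr(package):
    """Split 'name-version-release.arch' into (name, version, release).

    Hypothetical helper for illustration; not taken from vdsm.
    """
    # Drop the trailing architecture (.x86_64, .noarch, ...).
    nvr, _, _arch = package.rpartition(".")
    # Version and release are the last two hyphen-separated fields.
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release


# Package names copied from the rpm -qa output above.
ngn = split_nvr("redhat-release-virtualization-host-3.6-0.2.el7.x86_64")
el7 = split_nvr("redhat-release-server-7.3-7.el7.x86_64")
print(ngn)
print(el7)
```

With this split, the NGN package reports version '3.6' and release '0.2.el7', which is the same pair that shows up in the engine log.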

This impacts a flow we are thinking of delivering as a way to migrate from a 3.5 EL6 environment to 4.x.

Version-Release number of selected component (if applicable):
rhevm-3.6.10.1-0.1.el6.noarch

How reproducible:
just happens

Steps to Reproduce:
1. 3.6 engine, 3.5 cluster with an EL6 host
2. add 3.6 NGN
3. migrate VMs from EL6 to EL7 while utilizing InClusterUpgrade policy

Actual results:
engine detects 3.6 NGN as an old host, bad version comparison

Expected results:
should work, 3.6 NGN is for sure newer than 3.5 vdsm

Additional info:
see docs for InClusterUpgrade in Upgrade Guide

Comment 3 Jiri Belka 2017-02-20 17:48:59 UTC

Stopping and starting the VMs on 3.6 NGN works fine, but this defeats our goal anyway, i.e. migration without a big impact.

Comment 4 Martin Sivák 2017-02-21 08:46:21 UTC

The version "RHEL - 3.6 - 0.2.el7" comes directly from host.getHostOs(). Can you execute `vdsClient -s 0 getVdsCaps` and check what is reported there?

Comment 5 Jiri Belka 2017-02-21 09:03:26 UTC

Created attachment 1256004 [details]
vdsClient -s 0 getVdsCaps

Comment 7 Martin Sivák 2017-02-21 13:38:46 UTC

Yep, as I assumed:

operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '3.6'}

This does not look right, and RHEL 3.6 really is lower than RHEL 6 as far as the scheduler is concerned :) So we need to send this bug to whoever is responsible for the host version detection.
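
To illustrate why the scheduler rejects the host, here is a rough sketch of a major-version check on the string from the engine log. It is not the engine's actual filter code (which lives in ovirt-engine); the function names and the minimum of 6 (the EL6 hosts already in the cluster) are assumptions for the example.

```python
def os_major(host_os):
    """Extract the major version from a 'NAME - VERSION - RELEASE' string."""
    _name, version, _release = [part.strip() for part in host_os.split(" - ")]
    return int(version.split(".")[0])


def is_too_old(host_os, minimum_major):
    """Illustrative check only; names and logic are assumptions."""
    return os_major(host_os) < minimum_major


# String taken from the engine log in the description: major version 3.
rejected = is_too_old("RHEL - 3.6 - 0.2.el7", 6)
# The same host if it reported the underlying OS version instead.
accepted = is_too_old("RHEL - 7.3 - 0.2.el7", 6)
print(rejected, accepted)
```

Any check of this kind sees 3 < 6 and treats the NGN host as older than EL6, even though the 3.6 is a product version and not an OS version.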

Comment 8 Dan Kenigsberg 2017-02-21 13:51:10 UTC

mperina, isn't it a dup of another ngn-related bug we've discussed?

Comment 9 Martin Perina 2017-02-21 13:57:59 UTC

(In reply to Dan Kenigsberg from comment #8)
> mperina, isn't it a dup of another ngn-related bug we've discussed?

Yes, from the description above it seems to me it's caused by BZ1368364. This is the first time we have seen consequences worse than a bad version being displayed in webadmin ...

Comment 10 Douglas Schilling Landgraf 2017-02-23 05:17:34 UTC

Created attachment 1256766 [details]
caps.py

Comment 11 Douglas Schilling Landgraf 2017-02-23 05:23:27 UTC

Hi Jiri, 
 
Could you please test the caps.py attached in your ngn 3.6?

I would recommend the following:

  - Grab current operatingSystem data
    # vdsClient -s 0 getVdsCaps | grep operatingSystem

  - Backup current caps.py 
    # cp /usr/share/vdsm/caps.py /usr/share/vdsm/caps.py.bkp

  - Replace caps.py
    # cp /path/new/caps.py /usr/share/vdsm/caps.py

  - Restart vdsmd 
    # systemctl restart vdsmd.service

  - Grab new operatingSystem and compare with the old one
    # vdsClient -s 0 getVdsCaps | grep operatingSystem

  - Try again the vm migration

Thanks!
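
For the before/after comparison in the steps above, a small sketch like the following could be used on the two captured lines. It assumes the `operatingSystem = {...}` line format shown in comment 7; the function names are hypothetical, and the "after" line in the usage is a hand-edited copy for illustration only.

```python
import ast


def parse_operating_system(line):
    """Parse an 'operatingSystem = {...}' line into a dict."""
    _key, _, value = line.partition("=")
    # The value is a Python-style dict literal, so literal_eval is enough.
    return ast.literal_eval(value.strip())


def changed_fields(before, after):
    """Return {field: (old, new)} for every field whose value differs."""
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(set(before) | set(after))
        if before.get(key) != after.get(key)
    }


# 'before' is the line from comment 7; 'after' is an illustrative edit of it.
before = parse_operating_system(
    "operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '3.6'}"
)
after = parse_operating_system(
    "operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '7.3'}"
)
print(changed_fields(before, after))
```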

Comment 12 Jiri Belka 2017-02-27 13:39:19 UTC

(In reply to Douglas Schilling Landgraf from comment #11)
> Hi Jiri, 
>  
> Could you please test the caps.py attached in your ngn 3.6?
> 
> I would recommend the following:
> 
>   - Grab current operatingSystem data
>     # vdsClient -s 0 getVdsCaps | grep operatingSystem

# vdsClient -s 0 getVdsCaps | grep operatingSystem
        operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '3.6'}

>   - Backup current caps.py 
>     # cp /usr/share/vdsm/caps.py /usr/share/vdsm/caps.py.bkp
> 
>   - Replace caps.py
>     # cp /path/new/caps.py /usr/share/vdsm/caps.py
> 
>   - Restart vdsmd 
>     # systemctl restart vdsmd.service
> 
>   - Grab new operatingSystem and compare with the old one
>     # vdsClient -s 0 getVdsCaps | grep operatingSystem

# vdsClient -s 0 getVdsCaps | grep operatingSystem
        operatingSystem = {'name': 'RHEL', 'release': '0.2.el7', 'version': '7.3'}

>   - Try again the vm migration

works fine.

> 
> Thanks!
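
The linked gerrit change 73297 is titled "ngn: grab OS version according with /etc/os-release". The sketch below shows what reading a version from os-release-formatted text can look like; it is not the merged patch. The function name is hypothetical and the sample text is made up for the example, not copied from an NGN host.

```python
def parse_os_release(text):
    """Parse KEY=value lines in os-release format into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Values may be wrapped in single or double quotes.
        fields[key] = value.strip().strip("\"'")
    return fields


# Made-up sample content (assumption), in the format of /etc/os-release.
SAMPLE = 'NAME="Red Hat Enterprise Linux"\n# a comment\nVERSION_ID="7.3"\n'

info = parse_os_release(SAMPLE)
print(info["VERSION_ID"])
```

On a real host the text would come from reading /etc/os-release, and the `VERSION_ID` field would supply the OS version in place of the release package's product version.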

Comment 15 Jiri Belka 2017-04-03 12:42:19 UTC

ok, based on #13

redhat-virtualization-host-image-update-placeholder-3.6-0.2.el7.noarch

Comment 17 errata-xmlrpc 2017-05-09 17:04:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:1211

Comment 18 Ari Lemmke 2017-05-12 09:09:19 UTC

Had this problem too.

First installed the latest 4.0 series like 4.0.7

But then checked in your document that 7.3 based is not supported with 3.5 cluster
(see https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html/Installation_Guide/Host_Compatibility_Matrix.html table 8.1)

So installed first 4.0 hypervisor. And got this weird problem because it presented itself as version "4.0".

This quick hack solved the problem:

      caps['operatingSystem'] = osinfo.version()
+     caps['operatingSystem']['version'] = "7.2"

I do understand that you there at RH are protecting people from themselves, which frustrates me because I have been doing this work since before most of you there were even born. (pun intended)

This problem should have been fixed a long, long time ago.
