Bug 1472812

Summary: [3.6.z-async] Running engine-host-update.py does not work with RHEVH hosts
Product: Red Hat Enterprise Virtualization Manager Reporter: Lukas Svaty <lsvaty>
Component: ovirt-engine Assignee: Lev Veyde <lveyde>
Status: CLOSED ERRATA QA Contact: Jiri Belka <jbelka>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.6.10 CC: amarchuk, apinnick, bugs, cshao, danken, jbelka, lsurette, lsvaty, mkalinin, rbalakri, Rhev-m-bugs, srevivo, ykaul, ylavi
Target Milestone: ovirt-3.6.z-async Keywords: ZStream
Target Release: ---   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1503447 (view as bug list) Environment:
Last Closed: 2017-11-27 13:34:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1503447    

Description Lukas Svaty 2017-07-19 12:44:00 UTC
Description of problem:
Running engine-host-update.py against RHVH hosts reinstalls them but does not upgrade them.

Version-Release number of selected component (if applicable):
ovirt-engine-4.1.4.2-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run `./engine-host-update.py --insecure --engine=localhost --username=admin@internal --password=mypass --host=rhevh`

Actual results:
RHVH host is reinstalled. Updates still available.

Expected results:
Host is upgraded properly.

Comment 1 Yaniv Lavi 2017-07-31 09:11:17 UTC
Which version of the RHVH host are you trying to update?

Comment 2 Lukas Svaty 2017-07-31 09:47:11 UTC
It was an upgrade from 4.1.3 to a 4.1.4 candidate.
However, the problem is not in the host but in the utility. The current flow is:

1. Deactivate host
2. Reinstall host
3. Activate host

Reinstall host uses this code:

host.install(
    ovirtsdk.xml.params.Action(
        ssh=ovirtsdk.xml.params.SSH(
            authentication_method='publickey'
        ),
        host=ovirtsdk.xml.params.Host(override_iptables=True),
    )
)

My wild guess:
This method reinstalls the current image:
a) for RHEL hosts it installs the packages and redeploys vdsm/libvirt..., which is correct
b) for RHEVH I believe it just reinstalls the current image, even when a new image is available. For a new image to be installed, upgrade should be used, not install/reinstall.

Comment 3 Yaniv Lavi 2017-08-07 09:06:07 UTC
Do we want to fix this?
When is the Ansible to do this planned to be released?

Comment 4 Dan Kenigsberg 2017-08-21 12:32:58 UTC
(In reply to Yaniv Lavi (Dary) from comment #3)
> Do we want to fix this?

I do. Most of our install base is using RHVH.

> When is the Ansible to do this planned to be released?

4.1.6, according to Bug 1473535

Comment 5 Yaniv Lavi 2017-08-30 09:11:08 UTC
Making this work with the vintage node is more critical, than RHVH.

Comment 6 Jiri Belka 2017-08-31 09:39:42 UTC
(In reply to Yaniv Lavi (Dary) from comment #5)
> Making this work with the vintage node is more critical, than RHVH.

I talked to mperina@ and he clarified how 'install' works.

1. first, the 'host.install' action checks what the host type is:
   - if EL host, it _only_ installs packages which are defined in
     host-deploy code
   - if node (and legacy?), it supposes all packages are already available

2. even "installing" packages on an EL host does _NOT_ update all packages. The
   packages to update are defined in the DB (PackageNamesForCheckUpdate), thus a
   simple 'host.install' won't update all packages listed in
   PackageNamesForCheckUpdate.

3. for node/ngn, I suppose 'host.install' does not touch any packaging and
   upgrade-manager updates 'ovirt-node-ng-image-update'
   ('OvirtNodePackageNamesForCheckUpdate' in DB) to update node/ngn.

Thus, to update node/ngn (and to correct vds_type if it has been wrong), it would need to reinstall ('host.install') and tell upgrade-manager to upgrade it as well.

Comment 7 Lev Veyde 2017-09-26 14:34:06 UTC
Hi Lukas,

Can you please test my latest patch to see if that solves the issue for you?

Comment 8 Lukas Svaty 2017-09-27 07:58:57 UTC
Moving needinfo to Jirka

Comment 9 Jiri Belka 2017-10-03 15:25:09 UTC
- Do not reinstall rhevh (legacy); it causes confusing messages in engine events. rhevh (legacy) is distributed as an iso, so please move 'if vdsType in ('rhev-h', 'RHEV_H'):' a little bit up.

Host dell-r210ii-13 installation in progress . Failed to install fluentd packages.Please check the log for details.

Host dell-r210ii-13 installation in progress . Vintage node, skipping kernel arguments..

Host dell-r210ii-13 installation in progress . Cannot validate host name settings, reason: resolved host does not match any of the local addresses.

Comment 10 Jiri Belka 2017-10-03 15:32:29 UTC
...
Performing RHEVH (Legacy) upgrade...
	Installing........................
	Rebooting............................................................*.*.*.*.*Error: RuntimeError('Unable to complete the reinstall operational, host is in mode: non_responsive',)

IMO it should make another attempt to get the host status; quite often the host is briefly in a non-responsive state after an upgrade.
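
The retry suggested here could look roughly like the sketch below. It is not the script's actual code: `wait_for_host_up`, `get_status`, and the retry and delay values are all hypothetical, and `get_status` stands in for whatever call the script uses to read the host state from the engine.

```python
import time


def wait_for_host_up(get_status, retries=5, delay=10.0, sleep=time.sleep):
    """Poll the host status, tolerating a transient 'non_responsive' state.

    get_status is a hypothetical callable returning the current host state
    as a string (for example 'up', 'installing', 'non_responsive').
    """
    status = None
    for attempt in range(retries):
        status = get_status()
        if status == 'up':
            return status
        # The host is not up yet: wait and ask again instead of failing
        # on the first unexpected state.
        if attempt < retries - 1:
            sleep(delay)
    raise RuntimeError('host did not come up, last state: %s' % status)


if __name__ == '__main__':
    # Simulated status sequence: briefly non-responsive after the reboot.
    states = iter(['non_responsive', 'non_responsive', 'up'])
    print(wait_for_host_up(lambda: next(states), delay=0))
```

The point of the design is only that a single 'non_responsive' reading right after the reboot is not treated as fatal; the error is raised once all attempts are used up.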

Comment 11 Jiri Belka 2017-10-04 08:05:44 UTC
This works incorrectly for RHVH (NGN, aka ovirt-node).

It does host.install, which is useless and does not update anything.

            if vdsType in ('rhev-h', 'RHEV_H'):
                print('Performing RHEVH (Legacy) upgrade...')
                upgradeRHevhhost(api, name)
            verifyHost(api, name)

but...

Processing Host: dell-r210ii-04
Type: ovirt_node
        Moving host to the maintenance.
        Host moved to maintenance.
        Installing........
        ^^^ why?

        Installed.
        ^^^ not upgraded anything!

        Activating host..
        Host activated.
Requering the host type, type: ovirt_node
        Verifying that host stays up..................
        Verified.
Closed connection.

Comment #6 was ignored here. Summary:

- for rhel7 hosts, host.install is ok
- for rhevh legacy, host.upgrade should be used, no host.install at all!
- for rhvh/ovirt-node (ngn), probably host.upgrade as well, this time without an iso

The current implementation is bogus.
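
The per-type summary above can be sketched as a small dispatch function. This is only an illustration under assumptions: `choose_action` is a hypothetical helper, not part of engine-host-update.py, and the type strings are the ones that appear in this report ('rhel', 'rhev-h', 'RHEV_H', 'ovirt_node'). It does not call the engine; it only returns which action would be chosen.

```python
def choose_action(vds_type):
    """Return (action, needs_iso) for a host type, following the summary.

    Hypothetical helper: 'action' is the name of the engine operation
    to use, 'needs_iso' says whether an ISO image is expected.
    """
    if vds_type == 'rhel':
        # Plain EL host: reinstalling through host-deploy is fine.
        return ('install', False)
    if vds_type in ('rhev-h', 'RHEV_H'):
        # Legacy RHEV-H is distributed as an ISO: upgrade, never install.
        return ('upgrade', True)
    if vds_type == 'ovirt_node':
        # NGN: probably upgrade as well, this time without an ISO.
        return ('upgrade', False)
    # Refuse to guess for types this report does not mention.
    raise ValueError('unknown host type: %r' % (vds_type,))


if __name__ == '__main__':
    for host_type in ('rhel', 'rhev-h', 'ovirt_node'):
        print(host_type, choose_action(host_type))
```

Raising on an unknown type, rather than falling back to an install, avoids repeating the behaviour reported here, where a host was silently reinstalled without being upgraded.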

Comment 12 Lev Veyde 2017-10-16 17:18:12 UTC
Fixed the issues for RHEVH legacy, working to fix the issues for the oVirt NGN as well.

Comment 13 Yaniv Lavi 2017-10-18 08:48:56 UTC
This helper script needs to be updated in the KBase, but doesn't require a backport to the 3.6.z codebase.

Comment 14 Jiri Belka 2017-10-23 07:38:04 UTC
(In reply to Lev Veyde from comment #12)
> Fixed the issues for RHEVH legacy, working to fix the issues for the oVirt
> NGN as well.

Works fine now for RHEVH-legacy.

Comment 15 Jiri Belka 2017-10-23 09:13:18 UTC
(In reply to Jiri Belka from comment #11)
> this works incorrectly for RHVH (ngn or aka ovirt-node).
> 
> it does host.install, that's useless and does not update anything.
> 
>            if vdsType in ('rhev-h', 'RHEV_H'):
>                print('Performing RHEVH (Legacy) upgrade...')
>                upgradeRHevhhost(api, name)
>            verifyHost(api, name)
> 
> but...
> 
> Processing Host: dell-r210ii-04
> Type: ovirt_node

        ^^^ this was obviously tested with NGN on 4.1

>         Moving host to the maintenance.
>         Host moved to maintenance.
>         Installing........
>         ^^^ why?
> 
>         Installed.
>         ^^^ not upgraded anything!
> 
>         Activating host..
>         Host activated.
> Requering the host type, type: ovirt_node
>         Verifying that host stays up..................
>         Verified.
> Closed connection.
> 
> [...]
> - for rhvh/ovirt-node (ngn), host.upgrade probably with this time without iso

IMO, we cannot do anything about NGN on a 3.6 engine, as the 3.6 engine does _NOT_ know anything about NGN. See the output below from a 3.6 engine with a 3.6 NGN host:

...
Cluster Default contains the following hosts: ['10-37-137-130']
Processing Host: 10-37-137-130
Type: rhel
...

The type is 'rhel' because the 3.6 engine does not know NGN as a vds_type at all.

engine=# select vds_name,vds_type,pretty_name from vds;
   vds_name    | vds_type |               pretty_name               
---------------+----------+-----------------------------------------
 10-37-137-130 |        0 | Red Hat Virtualization Host 3.6 (el7.3)

I'm not sure if we should care about NGN on a 3.6 engine (and this script is not to be used on 4.x, as we have ovirt ansible roles there). The problem is that we have no way to distinguish RHEL from NGN via the ovirt API. We can do that only from the DB.

Thus, there should either be a decision to convert this script to run _only_ on the engine VM and use a DB query to add support for NGN, or a decision not to care about NGN on 3.6 at all and to document this behavior.
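
For illustration only, the DB-based option could classify rows of the query shown above along these lines. Everything here is an assumption: `classify_host` is a hypothetical helper, and matching on a pretty_name that starts with 'Red Hat Virtualization Host' is a heuristic derived from the single row in this comment, not a documented engine rule. The sketch works on already-fetched values and does not connect to the database.

```python
# Assumed marker, taken from the single pretty_name value shown above.
NGN_PREFIX = 'Red Hat Virtualization Host'


def classify_host(vds_type, pretty_name):
    """Guess the host flavour from (vds_type, pretty_name) of the vds view.

    Assumption: on a 3.6 engine an NGN host is stored with vds_type 0,
    the same value as a RHEL host, so only pretty_name can tell them
    apart. Other vds_type values are not covered by this report.
    """
    if vds_type != 0:
        return 'unknown'
    if pretty_name and pretty_name.startswith(NGN_PREFIX):
        return 'ngn'
    return 'rhel'


if __name__ == '__main__':
    # Values copied from the query output in this comment.
    print(classify_host(0, 'Red Hat Virtualization Host 3.6 (el7.3)'))
```

A script using this would have to run where the engine database is reachable, which is the trade-off described above.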

Comment 16 Yaniv Lavi 2017-10-23 12:32:27 UTC
(In reply to Jiri Belka from comment #15)
> 
> Thus, there should either be a decission to convert this script to run
> _only_ on engine VM and to use DB query to add support for NGN, or not to
> care about NGN on 3.6 at all and document this behavior.

We do not care about 3.6 NGN, please document.

Comment 18 Jiri Belka 2017-11-13 15:40:42 UTC
ok, rhevm-backend-3.6.12.2-0.1.el6.noarch

...
Processing Host: dell-r210ii-13.example.com
Type: rhev-h
        Performing oVirt Node/RHEVH (Legacy) upgrade...
        Installing..........................
        Rebooting...........................................................*.*.*.*.*.*.
        Installed.
        Verifying that host stays up..................
        Verified.
Closed connection.

Comment 23 errata-xmlrpc 2017-11-27 13:34:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:3262