Description of problem:

Host update of a 3.6 NGN host in a 4.0 engine fails because package ovirt-imageio-daemon cannot be found. The issue seems to originate in the fact that the 4.0 engine doesn't know the 3.6 NGN is not a plain EL7 host:

...
2017-03-10 17:11:43 DEBUG otopi.context context.dumpEnvironment:770 ENV OMGMT_PACKAGES/packages=list:'['libvirt-daemon-config-nwfilter', 'ovirt-imageio-daemon', 'ioprocess', 'mom', 'python-ioprocess', 'qemu-kvm', 'ovirt-vmconsole-host', 'libvirt-daemon-kvm', 'qemu-img', 'ovirt-imageio-common', 'ovirt-vmconsole', 'vdsm-cli', 'libvirt-client', 'sanlock', 'libvirt-python', 'lvm2', 'libvirt-lock-sanlock', 'vdsm']'
...
2017-03-10 17:11:43 ERROR otopi.plugins.ovirt_host_mgmt.packages.update update.error:102 Yum: Cannot queue package ovirt-imageio-daemon: Package ovirt-imageio-daemon cannot be found
2017-03-10 17:11:43 INFO otopi.plugins.ovirt_host_mgmt.packages.update update.info:98 Yum: Performing yum transaction rollback
2017-03-10 17:11:43 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-cX1TVARXAP/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-cX1TVARXAP/otopi-plugins/ovirt-host-mgmt/packages/update.py", line 110, in _packagesCheck
    packages=self.environment[omgmt.PackagesEnv.PACKAGES],
  File "/tmp/ovirt-cX1TVARXAP/pythonlib/otopi/miniyum.py", line 851, in install
    **kwargs
  File "/tmp/ovirt-cX1TVARXAP/pythonlib/otopi/miniyum.py", line 500, in _queue
    package=package,
RuntimeError: Package ovirt-imageio-daemon cannot be found
2017-03-10 17:11:43 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Package installation': Package ovirt-imageio-daemon cannot be found

[root@10-34-62-43 ~]# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="7.3"
VERSION_ID="7.3"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Red Hat Virtualization Host"
VARIANT_ID="ovirt-node"
PRETTY_NAME="Red Hat Virtualization Host 3.6 (el7.3)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.3:GA:hypervisor"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
# FIXME
REDHAT_BUGZILLA_PRODUCT="Red Hat Virtualization"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.3
REDHAT_SUPPORT_PRODUCT="Red Hat Virtualization"
REDHAT_SUPPORT_PRODUCT_VERSION=7.3

[root@10-34-62-43 ~]# yum update
Loaded plugins: imgbased-warning, product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Warning: yum operations are not persisted across upgrades!
Resolving Dependencies
--> Running transaction check
---> Package redhat-virtualization-host-image-update.noarch 0:4.0-20170307.1.el7_3 will be obsoleting
---> Package redhat-virtualization-host-image-update-placeholder.noarch 0:3.6-0.2.el7 will be obsoleted
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
 Package                                  Arch    Version               Repository  Size
================================================================================
Installing:
 redhat-virtualization-host-image-update  noarch  4.0-20170307.1.el7_3  4.0.7-6     571 M
     replacing  redhat-virtualization-host-image-update-placeholder.noarch 3.6-0.2.el7

Transaction Summary
================================================================================
Install  1 Package

Total download size: 571 M

Version-Release number of selected component (if applicable):
otopi-1.6.0-1.el7ev.noarch
ovirt-host-deploy-1.5.5-1.el7ev.noarch
ovirt-engine-backend-4.0.7.4-0.1.el7ev.noarch
redhat-release-virtualization-host-3.6-0.2.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Add a 3.6 NGN host into a 4.0 engine with a 3.6 cluster.
2. Wait for the host update manager check.

Actual results:
Failure; no updates are listed even though a hypervisor image-update rpm is ready to be installed.

Expected results:
Available updates should be detected and highlighted correctly.

Additional info:
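To illustrate the failure mode, here is a minimal sketch of what happens when the engine queues its fixed EL7 package set against a 3.6 NGN host. The names and structure are hypothetical, not the actual otopi/ovirt-host-mgmt code; only the error text mirrors the log above.

```python
# Hypothetical sketch: the 4.0 engine queues a fixed package set for what it
# believes is an EL7 host; ovirt-imageio-daemon only exists for 4.0+ hosts.
PACKAGES_4_0 = ['mom', 'vdsm', 'ovirt-imageio-daemon', 'ovirt-imageio-common']

def queue_packages(available, requested):
    """Mimic miniyum's behaviour: fail on the first missing package."""
    for pkg in requested:
        if pkg not in available:
            raise RuntimeError('Package %s cannot be found' % pkg)
    return list(requested)

# A 3.6 NGN channel carries the image-update rpm, not individual packages,
# so queuing the per-package 4.0 list fails:
available_on_36_ngn = {'mom', 'vdsm'}
try:
    queue_packages(available_on_36_ngn, PACKAGES_4_0)
except RuntimeError as e:
    print(e)  # Package ovirt-imageio-daemon cannot be found
```

The point of the sketch: the failure is not a repo problem on the host but the engine choosing the wrong package list for the host's type.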
This seems to be a general issue: host-deploy incorrectly detects whether the host is NGN or EL7:

...
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: OMGMT_PACKAGES/packages
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE libvirt-daemon-config-nwfilter
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE ovirt-imageio-daemon
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE ioprocess
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE mom
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE python-ioprocess
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE qemu-kvm
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE ovirt-vmconsole-host
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE libvirt-daemon-kvm
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE qemu-img
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE ovirt-imageio-common
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE ovirt-vmconsole
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE vdsm-cli
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE libvirt-client
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE sanlock
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE libvirt-python
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE lvm2
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE libvirt-lock-sanlock
2017-03-10 18:14:09 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE vdsm
...

Am I right?
I'd say this is caused by an error we made in BZ1344020, where we missed that the imageio packages are not available in 3.6. We fixed that in 4.1.1 via BZ1418757, but the fix was not backported to 4.0.z.
(In reply to Martin Perina from comment #3)
> I'd say that this is caused by error we have made in BZ1344020, where we
> missed that imageio packages are not available in 3.6. We have fixed that in
> 4.1.1 by BZ1418757, but this was not backported to 4.0.z

I checked the engine's db, and it seems that Jiri's claim is correct:

engine=# select vds_type, vds_name from vds;
 vds_type |    vds_name
----------+----------------
        0 | dell-r210ii-13
        0 | slot-5d

So it appears that the hosts are considered "rhel" and not RHV-H. However, I couldn't find in engine.log nor in the relevant host-deploy log file where those hosts were added to the system. Couldn't it be that those hosts were originally added as RHEL and at some point were re-installed as RHV-H?
Please attach the engine.log from when those hosts were originally added to the engine, and the relevant host-deploy files for those installations.
(In reply to Moti Asayag from comment #5)
> Please attach the engine.log where those hosts were originally added to the
> engine and the relevant host-deploy files for those installation.

Isn't it part of the BZ's attachments?
(In reply to Jiri Belka from comment #6)
> (In reply to Moti Asayag from comment #5)
> > Please attach the engine.log where those hosts were originally added to the
> > engine and the relevant host-deploy files for those installation.
>
> It is not part of the BZ's attachment?

No, the attachment contains only ovirt-host-mgmt-XYZ.log, which covers the upgrade process. There is also ovirt-host-deploy-XYZ.log, which covers host install/reinstall, and that is where the detected host type is determined. Since this bug deals with host OS type detection, we need to see how and what was detected during the host installation, which is recorded in ovirt-host-deploy-XYZ.log.
Adding a 3.6 NGN into a 4.0 engine works fine; even the host upgrade check works.

engine=# select vds_name,pretty_name,vds_type from vds;
-[ RECORD 1 ]----------------------------------------
vds_name    | dell-r210ii-13
pretty_name | Red Hat Virtualization Host 3.6 (el7.3)
vds_type    | 1

So I suppose the problem lies in the historical flow: the 3.6 NGN was in fact added while the engine was still 3.6, and the engine was then updated to 4.0. I'll redo 3.6 NGN into a 3.6 engine.
So here it is: a 3.6 NGN in a 3.6 engine with a 3.5 cluster.

engine=# select vds_name,vds_type from vds;
    vds_name    | vds_type
----------------+----------
 dell-r210ii-13 |        0
(1 row)

engine=# \q
-bash-4.1$ logout
[root@jbelka-vm2 yum.repos.d]# grep -i ovirt-node /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-20170314135651-10.34.62.205-6af41e3d.log
[root@jbelka-vm2 yum.repos.d]# rpm -q rhevm
rhevm-3.6.10.2-0.2.el6.noarch

(Note the grep finds no match, i.e. host-deploy never recorded ovirt-node.)
Jiri, iiuc this is true for any version of ngn that is added to rhv-m-3.6. Right?
(In reply to Dan Kenigsberg from comment #16)
> Jiri, iiuc this is true for any version of ngn that is added to rhv-m-3.6.
> Right?

1 - 4.0 NGN, 2 - 3.6 NGN (ovirt-host-deploy-1.4.1-1.el6ev.noarch); thus both are detected as '0' - EL.

engine=# select vds_name,vds_type,rpm_version,supported_cluster_levels,supported_engines from vds;
-[ RECORD 1 ]------------+---------------------
vds_name                 | dell-r210ii-03
vds_type                 | 0
rpm_version              | vdsm-4.18.24-3.el7ev
supported_cluster_levels | 3.5,3.6,4.0
supported_engines        | 3.6,4.0
-[ RECORD 2 ]------------+---------------------
vds_name                 | dell-r210ii-13
vds_type                 | 0
rpm_version              | vdsm-4.17.38-1.el7ev
supported_cluster_levels | 3.4,3.5,3.6
supported_engines        | 3.4,3.5,3.6
ovirt-engine-3.6 and ovirt-host-deploy-1.4 don't detect the NGN as ovirt-node. It is considered a RHEL host, hence the '0' type, which represents a RHEL host.

The notion of ovirt-node was introduced in ovirt-engine-4.0, where the host type for NGN/ovirt-node is '1' (and for ovirt-node-vintage it is '2').

In order to set the correct type for the ovirt-node, it should be re-installed, so that ovirt-host-deploy detects its proper type and persists it to the engine's db. Another alternative is removing and re-adding the same host.

I'd suggest closing this bug as CLOSED WONTFIX and adding a release note: upgrade to ovirt-engine-4.0 and reinstall any ovirt-node (NGN) host which was added in ovirt-engine-3.6.
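For reference, the type codes described above can be summarized in a small sketch. The names are illustrative; the authoritative mapping lives in the engine's own host-type enum.

```python
# Sketch of the engine's host-type codes as described in this comment
# (names are illustrative; the engine defines the real mapping).
VDS_TYPE = {
    0: 'rhel',                # plain EL host
    1: 'ovirt-node',          # NGN, introduced in ovirt-engine-4.0
    2: 'ovirt-node-vintage',  # the older node image
}

def needs_reinstall(vds_type, variant_id):
    """A host recorded as RHEL (0) whose os-release says ovirt-node was
    added by a 3.6 engine and must be reinstalled to fix its type."""
    return vds_type == 0 and variant_id == 'ovirt-node'

print(needs_reinstall(0, 'ovirt-node'))  # True: the situation in this bug
```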
On the 4.0 engine:

engine=# select vds_name,vds_type,rpm_version,supported_cluster_levels,supported_engines from vds;
    vds_name    | vds_type |     rpm_version      | supported_cluster_levels | supported_engines
----------------+----------+----------------------+--------------------------+-------------------
 dell-r210ii-13 |        0 | vdsm-4.17.38-1.el7ev | 3.4,3.5,3.6              | 3.4,3.5,3.6
 dell-r210ii-03 |        0 | vdsm-4.18.24-3.el7ev | 3.5,3.6,4.0              | 3.6,4.0
(2 rows)

After doing 'Reinstall' of dell-r210ii-03:

engine=# select vds_name,vds_type,rpm_version,supported_cluster_levels,supported_engines from vds;
    vds_name    | vds_type |     rpm_version      | supported_cluster_levels | supported_engines
----------------+----------+----------------------+--------------------------+-------------------
 dell-r210ii-13 |        0 | vdsm-4.17.38-1.el7ev | 3.4,3.5,3.6              | 3.4,3.5,3.6
 dell-r210ii-03 |        0 | vdsm-4.18.24-3.el7ev | 3.5,3.6,4.0              | 3.6,4.0
(2 rows)

So reinstall doesn't fix vds_type :/
See comment #19 and the attachment, please.
(In reply to Jiri Belka from comment #19)
> on 4.0 engine:

ovirt-host-deploy-1.5.3-1.el7ev.noarch
rhevm-4.0.6.3-0.1.el7ev.noarch
The host wasn't detected as ovirt-node by host-deploy; here is the relevant part from the host-deploy log:

2017-03-14 16:29:40 DEBUG otopi.context context.dumpEnvironment:770 ENV VDSM/ovirt-legacy-node=bool:'False'
2017-03-14 16:29:40 DEBUG otopi.context context.dumpEnvironment:770 ENV VDSM/ovirt-node=bool:'False'
...
2017-03-14 16:29:41 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### Customization phase, use 'install' to proceed
2017-03-14 16:29:41 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ### COMMAND>
2017-03-14 16:29:41 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QHidden: FALSE
2017-03-14 16:29:41 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***Q:STRING CUSTOMIZATION_COMMAND
2017-03-14 16:29:41 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QEnd: CUSTOMIZATION_COMMAND
2017-03-14 16:29:41 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:RECEIVE env-get -k VDSM/ovirt-node
2017-03-14 16:29:41 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ***D:VALUE VDSM/ovirt-node=bool:False
2017-03-14 16:29:41 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND **%QStart: CUSTOMIZATION_COMMAND
2017-03-14 16:29:41 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND ###

However, despite not being reported as ovirt-node, the host contains the required information to be detected as ovirt-node:

[root@dell-xxx ~]# grep VARIANT_ID /etc/os-release
VARIANT_ID="ovirt-node"

which complies with /usr/share/ovirt-host-deploy/plugins/ovirt-host-deploy/node/detect.py:

    odeploycons.VdsmEnv.OVIRT_NODE,
    (
        self.hasconf(odeploycons.FileLocations.OVIRT_NODE_OS_FILE,
                     odeploycons.FileLocations.OVIRT_NODE_VARIANT_KEY,
                     odeploycons.FileLocations.OVIRT_NODE_VARIANT_VAL)
    )

The installed version is: ovirt-host-deploy-1.5.3-1.el7ev.noarch

Sandro, could you have a look?
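The check detect.py performs boils down to reading VARIANT_ID from /etc/os-release and comparing it to "ovirt-node". A standalone sketch of that detection (my own code, not the detect.py implementation; only the file format and key/value come from the log above):

```python
def parse_os_release(text):
    """Parse /etc/os-release style KEY="value" lines into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue  # skip blanks, comments (e.g. "# FIXME"), junk
        key, _, value = line.partition('=')
        conf[key] = value.strip().strip('"')
    return conf

def is_ovirt_node(os_release_text):
    # Mirrors the intent of detect.py's hasconf(OVIRT_NODE_OS_FILE,
    # OVIRT_NODE_VARIANT_KEY, OVIRT_NODE_VARIANT_VAL) check.
    return parse_os_release(os_release_text).get('VARIANT_ID') == 'ovirt-node'

sample = 'NAME="Red Hat Enterprise Linux"\nVARIANT_ID="ovirt-node"\n'
print(is_ovirt_node(sample))  # True
```

Given that the host's os-release does carry VARIANT_ID="ovirt-node", this check should have fired, which is why the bool:'False' in the log is suspicious.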
Same with the 4.0.7 host-deploy: ovirt-host-deploy-1.5.5-1.el7ev.noarch.
(In reply to Moti Asayag from comment #18) > The ovirt-engine-3.6 and ovirt-host-deploy-1.4 don't detect the NGN as > ovirt-node. It is being considered as RHEL host, therefore the '0' type > which represents RHEL host. > > The notation of ovirt-node was introduced in ovirt-engine-4.0, where the > host type for the NGN/ovirt-node is '1' (and for the ovirt-node-vintange is > '2'). > > In order to set the correct type for the ovirt-node, it should be > re-installed. > So ovirt-host-deploy will detect the proper type of it and > will persist it to the engine's db. Another alternative is removing and > adding the same host. These are not real options for huge real-life setups. We decided to take el7-rhv-h-3.6-ngn out of tech preview in order to avoid spurious host upgrades for customers. 3.6 is intended to be the RHV-M version that can manage both 3.y and 4.y, and this bug puts a big hurdle to that. What is the cost of backporting NGN detection to 3.6.11?
> These are not real options for huge real-life setups. We decided to take > el7-rhv-h-3.6-ngn out of tech preview in order to avoid spurious host > upgrades for customers. 3.6 is intended to be the RHV-M version that can > manage both 3.y and 4.y, and this bug puts a big hurdle to that. > Btw, from the beginning we said that upgrade is not supported and you'll have to re-install your NGN. I never heard that has changed. The purpose was to let people play with it and provide feedback in 3.6.
Please ignore the info about dell-r210ii-03; I made a mistake and confused that host with dell-r210ii-04, so dell-r210ii-03 was actually a real EL7 host :/ I'll redo dell-r210ii-04 (4.0 NGN) in the 3.6 engine and the 4.0 engine (reinstall).
(In reply to Oved Ourfali from comment #26) > > These are not real options for huge real-life setups. We decided to take > > el7-rhv-h-3.6-ngn out of tech preview in order to avoid spurious host > > upgrades for customers. 3.6 is intended to be the RHV-M version that can > > manage both 3.y and 4.y, and this bug puts a big hurdle to that. > > > > Btw, from the beginning we said that upgrade is not supported and you'll > have to re-install your NGN. I never heard that has changed. The purpose was > to let people play with it and provide feedback in 3.6. I recall that 3.6-ngn is in tech preview. But we would like to make it fully supportable (see bug 1421098) since 3.6 hosts can be managed by both ancient and new Engines, and can serve as a stepping stone for upgrades. Hence my question about the cost of backport.
Backporting the NGN support to 3.6 is costly and risky: there is an entire topic branch [1] plus a set of follow-up patches/bugs fixed afterwards as a result of those patches, which aren't listed and cannot easily be tracked. In addition, we'd have to verify that the backported ovirt-host-deploy changes, which were written to work with engine-4.x, don't break any compatibility with engine-3.6.

[1] https://gerrit.ovirt.org/#/q/topic:NGN
And how about making Vdsm report in getCaps whether it is NGN, and having the Engine update the vds_type field accordingly? Would this solve our issue? Is it as safe and simple as it seems to me?
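The engine-side half of that proposal could look roughly like this. This is purely illustrative; the capability key ('ovirt_node') and the function are assumptions, not the real VDSM getCaps schema or engine code:

```python
def updated_vds_type(current_type, caps):
    """If VDSM reports it runs on NGN, correct a stale 'rhel' (0) record
    to 'ovirt-node' (1); leave every other record untouched.
    'ovirt_node' is a hypothetical getCaps key used for illustration."""
    if current_type == 0 and caps.get('ovirt_node', False):
        return 1
    return current_type

# e.g. a host added by a 3.6 engine (recorded as 0) now reporting NGN:
print(updated_vds_type(0, {'ovirt_node': True}))   # 1
print(updated_vds_type(0, {'ovirt_node': False}))  # 0
```

The appeal of this shape is that it is idempotent and only ever promotes a misrecorded RHEL host, so it cannot clobber a correctly typed record.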
Jiri, just to confirm, I'd like to ensure that 2 scenarios work:

(A)
1. Install 3.6 NGN (in a 3.6 environment)
2. Upgrade Engine to 4.1
3. Reinstall the host (from engine)

- Is the host now recognized correctly as RHVH?

(B)
1. Install 3.6 NGN (in a 3.6 environment)
2. Upgrade the host to NGN 4.1 (from within the host of course)
3. Upgrade Engine to 4.1
4. Reinstall the host (from engine)

- Is the host now recognized correctly as RHVH?
(In reply to Yaniv Kaul from comment #40)
> Jiri, just to confirm, I'd like to ensure that 2 scenarios work:
> (A)
> 1. Install 3.6 NGN (in a 3.6 environment)
> 2. Upgrade Engine to 4.1
> 3. Reinstall the host (from engine)
>
> - Is the host now recognized correctly as RHVH?

It fails because of the 'collectd issue' (again)...

2017-04-21 12:47:52 ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:85 Yum Cannot queue package collectd: Package collectd cannot be found
2017-04-21 12:47:52 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-iOzUcZ31yl/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-iOzUcZ31yl/otopi-plugins/ovirt-host-deploy/collectd/packages.py", line 53, in _packages
    'collectd-write_http',
  File "/tmp/ovirt-iOzUcZ31yl/otopi-plugins/otopi/packagers/yumpackager.py", line 307, in installUpdate
    ignoreErrors=ignoreErrors
  File "/tmp/ovirt-iOzUcZ31yl/pythonlib/otopi/miniyum.py", line 883, in installUpdate
    **kwargs
  File "/tmp/ovirt-iOzUcZ31yl/pythonlib/otopi/miniyum.py", line 500, in _queue
    package=package,
RuntimeError: Package collectd cannot be found

See https://bugzilla.redhat.com/show_bug.cgi?id=1444450

> (B)
> 1. Install 3.6 NGN (in a 3.6 environment)
> 2. Upgrade the host to NGN 4.1 (from within the host of course)
> 3. Upgrade Engine to 4.1
> 4. Reinstall the host (from engine)
>
> - Is the host now recognized correctly as RHVH?

Works fine.
> > 1. Install 3.6 NGN (in a 3.6 environment)
> > 2. Upgrade Engine to 4.1
> > 3. Reinstall the host (from engine)
> >
> > - Is the host now recognized correctly as RHVH?
>
> It fails because of 'collectd issue' (again)...
>
> 2017-04-21 12:47:52 ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:85 Yum Cannot queue package collectd: Package collectd cannot be found
> 2017-04-21 12:47:52 DEBUG otopi.context context._executeMethod:142 method exception
> Traceback (most recent call last):
>   File "/tmp/ovirt-iOzUcZ31yl/pythonlib/otopi/context.py", line 132, in _executeMethod
>     method['method']()
>   File "/tmp/ovirt-iOzUcZ31yl/otopi-plugins/ovirt-host-deploy/collectd/packages.py", line 53, in _packages
>     'collectd-write_http',
>   File "/tmp/ovirt-iOzUcZ31yl/otopi-plugins/otopi/packagers/yumpackager.py", line 307, in installUpdate
>     ignoreErrors=ignoreErrors
>   File "/tmp/ovirt-iOzUcZ31yl/pythonlib/otopi/miniyum.py", line 883, in installUpdate
>     **kwargs
>   File "/tmp/ovirt-iOzUcZ31yl/pythonlib/otopi/miniyum.py", line 500, in _queue
>     package=package,
> RuntimeError: Package collectd cannot be found
>
> See https://bugzilla.redhat.com/show_bug.cgi?id=1444450

If the db upgrade scripts corrected vds_type, we wouldn't hit this collectd issue at all, as Reinstall wouldn't be needed: https://bugzilla.redhat.com/show_bug.cgi?id=1445297
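The bug 1445297 approach, fixing vds_type in a db upgrade script, could be sketched as deriving the type from the stored pretty_name. This is a hypothetical sketch of the decision logic only; the real upgrade script, its column choice, and its matching rule may differ:

```python
def corrected_vds_type(vds_type, pretty_name):
    """Upgrade-script logic sketch: rows typed as RHEL (0) whose pretty_name
    identifies an RHV-H/NGN image are retyped as ovirt-node (1)."""
    if vds_type == 0 and pretty_name and 'Virtualization Host' in pretty_name:
        return 1
    return vds_type

# Applied to rows like the ones seen earlier in this bug:
rows = [
    ('dell-r210ii-13', 0, 'Red Hat Virtualization Host 3.6 (el7.3)'),
    ('slot-5d', 0, 'Red Hat Enterprise Linux'),
]
for name, vtype, pretty in rows:
    print(name, corrected_vds_type(vtype, pretty))
# dell-r210ii-13 1
# slot-5d 0
```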
Bug 1445297 is a good-enough workaround for me. Users would have to upgrade the Engine first to 4.1.3; only then would their 3.6.11 NGN hosts be recognized as such and become upgradable (rather than requiring re-installation).