Description of problem:
When the RHVH upgrade fails, RHVM does not capture the failure information and still shows "Upgrade was successful and host will be rebooted.".

Version-Release number of selected component (if applicable):
RHVM: 4.4.5.5-0.13.el8ev
RHVH: redhat-virtualization-host-4.4.5-20210215.0.el8_3

How reproducible:
100%

Steps to Reproduce:
1. Install RHVH-4.4-20210202.0-RHVH-x86_64-dvd1.iso
2. Add the host to RHVM
3. Log in to RHVH, set up local repos pointing to "redhat-virtualization-host-4.4.5-20210215.0.el8_3"
4. Create an LV with the LV name of the latest build (so that imgbased will fail during LV creation):
   # lvcreate -V 2G --thin -n rhvh-4.4.5.3-0.20210215.0+1 {vg-name}/{pool-name}
   # lvs
5. Upgrade the host via RHVM
6. Watch the host status and /var/log/imgbased.log

Actual results:
1. The host upgrade failed in imgbased.log, which is what we expect.
2. However, RHVM shows "Upgrade was successful and host will be rebooted." and the host status then becomes "NonResponsive".

Expected results:
When an RHVH upgrade fails in imgbased, the failure should be reflected as an error event in the RHVM portal.
Additional info:
The error QE triggered in /var/log/imgbased.log is as follows:
~~~~
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/tmp.xIXTME6RBg/usr/lib/python3.6/site-packages/imgbased/__main__.py", line 53, in <module>
    CliApplication()
  File "/tmp/tmp.xIXTME6RBg/usr/lib/python3.6/site-packages/imgbased/__init__.py", line 82, in CliApplication
    app.hooks.emit("post-arg-parse", args)
  File "/tmp/tmp.xIXTME6RBg/usr/lib/python3.6/site-packages/imgbased/hooks.py", line 120, in emit
    cb(self.context, *args)
  File "/tmp/tmp.xIXTME6RBg/usr/lib/python3.6/site-packages/imgbased/plugins/update.py", line 75, in post_argparse
    six.reraise(*exc_info)
  File "/tmp/tmp.xIXTME6RBg/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
    raise value
  File "/tmp/tmp.xIXTME6RBg/usr/lib/python3.6/site-packages/imgbased/plugins/update.py", line 66, in post_argparse
    base, _ = LiveimgExtractor(app.imgbase).extract(args.FILENAME)
  File "/tmp/tmp.xIXTME6RBg/usr/lib/python3.6/site-packages/imgbased/plugins/update.py", line 148, in extract
    "%s" % size, nvr)
  File "/tmp/tmp.xIXTME6RBg/usr/lib/python3.6/site-packages/imgbased/plugins/update.py", line 128, in add_base_with_tree
    new_layer_lv = self.imgbase.add_layer(new_base)
  File "/tmp/tmp.xIXTME6RBg/usr/lib/python3.6/site-packages/imgbased/imgbase.py", line 209, in add_layer
    new_lv = self._add_lvm_snapshot(prev_lv, new_layer.lv_name)
  File "/tmp/tmp.xIXTME6RBg/usr/lib/python3.6/site-packages/imgbased/imgbase.py", line 230, in _add_lvm_snapshot
    raise RuntimeError("Failed to create a new layer")
RuntimeError: Failed to create a new layer
~~~~
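The failure mode behind this traceback can be sketched as follows: imgbased derives the new layer's LV name from the incoming build's NVR, and snapshot creation fails when an LV with that name already exists (which is exactly what step 4 of the reproducer sets up). A minimal Python sketch of that collision, with the hypothetical helper `add_layer` standing in for the real imgbased/LVM calls (the actual code shells out to lvcreate):

```python
# Hypothetical simplification of imgbased's layer creation. The real
# imgbase.add_layer() invokes lvcreate; lvcreate exits non-zero when the
# target LV name already exists, and imgbased wraps that as a RuntimeError.
def add_layer(existing_lvs, new_layer_name):
    """Create a new thin-snapshot LV name; raise on a name collision."""
    if new_layer_name in existing_lvs:
        # Mirrors imgbase._add_lvm_snapshot() from the traceback above:
        raise RuntimeError("Failed to create a new layer")
    existing_lvs.add(new_layer_name)
    return new_layer_name

# Step 4 of the reproducer pre-creates the LV the upgrade will want:
lvs = {"rhvh-4.4.5.3-0.20210215.0+1"}
try:
    add_layer(lvs, "rhvh-4.4.5.3-0.20210215.0+1")
except RuntimeError as e:
    print(e)  # prints: Failed to create a new layer
```

This is only an illustration of why the pre-created LV makes the upgrade fail; the engine-side bug in this report is that the failure is not propagated to RHVM.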
Created attachment 1758335 [details] engine logs from /var/log/ovirt-engine/host-deploy
Created attachment 1760325 [details] /var/log/ovirt-engine
Created attachment 1760327 [details] host logs from /var/log
It seems that some tasks, e.g. 'Delete yum_updates file from host' but also 'Prepare NGN host for upgrade', are called more than once. Dana, can you please check why? Perhaps some things ran in parallel unintentionally, including checking for '/var/imgbased/.image-updated' before 'yum update' finished. Not sure.
Please attach all of /var/log from the engine. Thanks.
Created attachment 1760564 [details] Part 1 of the engine log /var/log

The compressed size of /var/log exceeds Bugzilla's upper limit, so I divided /var/log into two parts: Part 1 does not contain /var/log/ovirt-engine; /var/log/ovirt-engine is in Part 2.
Created attachment 1760565 [details] Part 2 of the engine log /var/log
Thanks! Restoring needinfo on Dana.
This issue has an open bug and is currently under investigation https://bugzilla.redhat.com/show_bug.cgi?id=1917707
Thanks. Closing as duplicate for now. Please reopen if needed. *** This bug has been marked as a duplicate of bug 1917707 ***
Reopening: bug 1917707 seems to be caused by an ansible-runner issue, which causes events to appear several times in the audit log.
In my previous investigation (leading to comment 6), it seemed like imgbased and the %post of redhat-virtualization-host-image-update worked as expected, and the bug (not noticing that they failed) happened in the ansible code calling them. So moving to Dana for now.
While looking at the host deploy log I noticed two things:

1. Whereas the audit log shows some tasks several times (comment 13), in the host upgrade log the tasks 'Check if image-updated file exists', 'Verify image was updated successfully', and 'Configure LVM filter' are listed twice, and the outputs of 'Configure LVM filter' indicate that these are separate runs:
   first run output contains: "stdout" : "ok: [10.73.73.91]"
   second run output contains: "stdout" : "Analyzing host...\nLVM filter is already configured for Vdsm"

2. The task 'Verify image was updated successfully' (https://github.com/oVirt/ovirt-engine/blob/a65cf0eae8858ab2278c3f537dc427e3ff20eba7/packaging/ansible-runner-service-project/project/roles/ovirt-host-upgrade/tasks/main.yml#L64), which was expected to fail, was skipped, although in the ansible-runner-service log the values of its parameters are (also visible in the host upgrade log):
   node_host: [2021-02-20 19:36:52,487] {'node_host': True}
   image_pkg_updated: [2021-02-20 19:38:37,088] {'image_pkg_updated': True}
   image_updated_file: [2021-02-20 19:38:37,806] 'res': {'changed': False, 'stat': {'exists': False}, 'invocation': {'module_args': {'path': '/var/imgbased/.image-updated',
   host_deploy_cluster_version was reported to be 4.5

I'm looking more into Ansible Runner to see what might cause [1]. Pengshan, is it possible to re-run with a file that should be replaced in the engine, so that I can see whether there was a change in any of these params?
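For reference, given the parameter values above, the verify task should have run and failed. A hedged Python sketch of that expectation (the gating combination below is an assumption modeled on the ovirt-host-upgrade role linked above, not its verbatim logic):

```python
# Assumed gating of the 'Verify image was updated successfully' task:
# run the check on NGN hosts where the image package was updated.
def should_run_verify(node_host, image_pkg_updated):
    return bool(node_host) and bool(image_pkg_updated)

def verify_image_updated(image_updated_file_exists):
    # The task is expected to fail the upgrade when
    # /var/imgbased/.image-updated is absent after the update.
    return "ok" if image_updated_file_exists else "failed"

# Values observed in the ansible-runner-service log:
run = should_run_verify(node_host=True, image_pkg_updated=True)
result = verify_image_updated(image_updated_file_exists=False)
print(run, result)  # prints: True failed
```

So with node_host=True, image_pkg_updated=True, and stat exists=False, the task should have run and failed the upgrade, yet the host upgrade log shows it was skipped; that mismatch is what points at Ansible Runner.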
(In reply to Dana from comment #15)
> Pengshan, is it possible to re-run with a file that should be replaced in
> the engine so that I can see whether there was a change in any of these
> params?

Yes, I can re-run; please tell me how to replace the file.
Created attachment 1770191 [details] logs of /var/log/ovirt-engine after replacing main.yml logs of /var/log/ovirt-engine after replacing /usr/share/ovirt-engine/ansible-runner-service-project/project/roles/ovirt-host-upgrade/tasks/main.yml
Pending on the new build of ovirt-engine-4.4.6.4 to verify this bug.
QE verified this bug.

Verified version:
RHVM: 4.4.6.5-0.17.el8ev
RHVH: redhat-virtualization-host-4.4.6-20210426.0.el8_4

Steps to Reproduce:
1. Install redhat-virtualization-host-4.4.5-20210330.0.el8_3
2. Add the host to RHVM
3. Log in to RHVH, set up local repos pointing to "redhat-virtualization-host-4.4.6-20210426.0.el8_4"
4. Create an LV with the LV name of the latest build (so that imgbased will fail during LV creation):
   # lvcreate -V 2G --thin -n rhvh-4.4.6.1-0.20210426.0+1 {vg-name}/{pool-name}
   # lvs
5. Upgrade the host via RHVM
6. Watch the host status and /var/log/imgbased.log

Actual results:
1. The host upgrade failed in imgbased.log, which is what we expect.
2. The host status changed to "InstallFailed" on RHVM. However, in "Events" there is no related error message of "Failed to upgrade Host xxx".
Will "Actual results 2" affect the functionality?
1. What do you see in 'Events'?
2. In 'Hosts', what is the host's status? Does it show 'installation failed'?
Sorry, I made a mistake in comment 20. I verified this bug again.

Verified version:
RHVM: 4.4.6.5-0.17.el8ev
RHVH: redhat-virtualization-host-4.4.6-20210426.0.el8_4

Steps to Reproduce:
1. Install redhat-virtualization-host-4.4.5-20210330.0.el8_3
2. Add the host to RHVM
3. Log in to RHVH, set up local repos pointing to "redhat-virtualization-host-4.4.6-20210426.0.el8_4"
4. Create an LV with the LV name of the latest build (so that imgbased will fail during LV creation):
   # lvcreate -V 2G --thin -n rhvh-4.4.6.1-0.20210426.0+1 {vg-name}/{pool-name}
   # lvs
5. Upgrade the host via RHVM
6. Watch the host status and /var/log/imgbased.log

Actual results:
1. The host upgrade failed in imgbased.log, which is what we expect.
2. The host status changed to "InstallFailed" on RHVM.
3. In "Events", there is a related error message: "Failed to upgrade Host 0427-03 (User: admin@internal-authz).".

So moving the bug status to "VERIFIED".
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: RHV Manager security update (ovirt-engine) [ovirt-4.4.6]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2179