Bug 1496371

Summary: Errors in imgbased are not considered by the manager while upgrading RHV-H
Product: Red Hat Enterprise Virtualization Manager
Reporter: nijin ashok <nashok>
Component: ovirt-node-ng
Assignee: Yuval Turgeman <yturgema>
Status: CLOSED ERRATA
QA Contact: Huijuan Zhao <huzhao>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.6
CC: bgraveno, cshao, dfediuck, dguo, huzhao, jiawu, mgoldboi, qiyuan, rbarry, sbonazzo, weiwang, yaniwang, ycui, yturgema, yzhao
Target Milestone: ovirt-4.1.7
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
This update resolves an issue where an RPM installation finishes successfully (rc=0) even though the postscript failed. The 'set -e' command is now used to fail the RPM installation process when the post install script fails.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-11-07 17:30:17 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description: Comment 8: All logs from host
Flags: none

Description nijin ashok 2017-09-27 08:25:35 UTC
Description of problem:

The manager only checks whether installation of the package "redhat-virtualization-host-image-update" succeeds; it does not consider failures in imgbased. The package installation completes successfully even if the imgbased process fails for some reason. As a result, the manager treats the upgrade as successful, reports "upgrade was completed successfully" to the user in the event tab, and even reboots the host. However, the host is still using the old layer.
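
The masking effect can be reproduced with plain shell. This is a minimal sketch, not the actual %post scriptlet of the package: the scriptlet body is invented for illustration, and `failing_step` is a hypothetical stand-in for the imgbased call. Without `set -e`, a shell script exits with the status of its last command, so a caller that only inspects the exit status never sees the earlier failure.

```shell
#!/bin/sh
# Hypothetical stand-in for a %post scriptlet body; NOT the real package scriptlet.
# "failing_step" simulates the imgbased step failing in the middle of the script.
scriptlet='
failing_step() { return 1; }
failing_step
echo "post-install finished"
'

# Run the body the way a caller that only inspects the exit status would.
rc=0
output=$(sh -c "$scriptlet") || rc=$?

echo "scriptlet output: $output"
echo "scriptlet exit status: $rc"
```

The failing step runs, but the trailing `echo` succeeds, so the overall exit status is 0 and the caller has no reason to report an error.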

Version-Release number of selected component (if applicable):

rhevm-4.1.6.2-0.1.el7.noarch

How reproducible:

100%

Steps to Reproduce:

1. Make the imgbased upgrade process fail in some way. I created an LV with the same name that the upgrade script would create, so that the script fails while creating the LV. For the customer, the upgrade was failing with an I/O error during the imgbased copy operation.

2. RHV-M shows "upgrade was completed successfully" although the upgrade failed.

Actual results:

RHV-M is not showing the correct status of the upgrade.

Expected results:

RHV-M should capture the correct message and show it to the user if the upgrade failed.

Additional info:

Comment 1 Ryan Barry 2017-09-27 11:41:38 UTC
We can intentionally fail the %post scriptlet if something goes wrong, but I suspect that RHVM will not capture the complete output from imgbased-copy-bootfiles, since that is not visible from yum

RHVM only knows whether yum completed. It does not know the inner workings of imgbased/rhvh.

However, in cases where manual intervention is necessary (duplicate LV names or I/O errors), logging into the system to get logs will be required in any case.

Comment 3 Yuval Turgeman 2017-09-27 13:58:46 UTC
Running a similar flow with `set -e` in %post sets the status in the UI to "Install Failed", and looking in ovirt-host-deploy logs, you can see something like:
"Yum Script sink: warning: %post(pkgname...) scriptlet failed, exit status 1"

Is that an acceptable fix (it's already like that in upstream btw) ?
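
For illustration, the effect of `set -e` can be sketched in plain shell. This is not the real %post scriptlet: the body and the `failing_step` function are hypothetical. The only real data is the log line, which is copied from the ovirt-host-deploy message quoted above. With `set -e`, the script aborts at the first failing command and returns its non-zero status, which a caller can then detect either from the exit status or from the warning in the log.

```shell
#!/bin/sh
# Hypothetical scriptlet body with "set -e"; NOT the real package scriptlet.
# "failing_step" simulates the imgbased step failing.
scriptlet='
set -e
failing_step() { return 1; }
failing_step
echo "post-install finished"
'

# Capture the exit status without aborting this outer script.
rc=0
output=$(sh -c "$scriptlet") || rc=$?
echo "exit status with set -e: $rc"

# Log line quoted from the comment above; scan it for the scriptlet failure warning.
log_line='Yum Script sink: warning: %post(pkgname...) scriptlet failed, exit status 1'
if printf '%s\n' "$log_line" | grep -q 'scriptlet failed, exit status'; then
    failed=yes
else
    failed=no
fi
echo "failure reported in log: $failed"
```

Here the trailing `echo` never runs, so the script prints nothing and exits with status 1 instead of 0.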

Comment 4 Ryan Barry 2017-09-27 14:07:48 UTC
That definitely works for me

Comment 9 Huijuan Zhao 2017-10-02 10:35:06 UTC
Created attachment 1333179
Comment 8: All logs from host

Comment 12 Sandro Bonazzola 2017-10-16 06:29:47 UTC
Moving back to assigned according to comment #11

Comment 13 Sandro Bonazzola 2017-10-16 07:48:44 UTC
Moving back to modified after discussing the status with Yuval

Comment 15 Huijuan Zhao 2017-10-16 12:27:27 UTC
According to comment 11, this bug is fixed in rhvh-4.1-0.20171002.0.
A separate bug, Bug 1502681, will track the new issue from comment 11: the upgrade from a non-nist system to another nist system failed after an upgrade from a non-nist system to a nist system.

So I will verify this bug.

Comment 18 errata-xmlrpc 2017-11-07 17:30:17 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:3140