Bug 1573334

Summary: RHV-H update to latest version fails on RHV 4.1 due to yum transaction failure
Product: Red Hat Enterprise Virtualization Manager
Reporter: Robert McSwain <rmcswain>
Component: imgbased
Assignee: Ryan Barry <rbarry>
Status: CLOSED ERRATA
QA Contact: Yaning Wang <yaniwang>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.10
CC: cshao, dfediuck, huzhao, inetkach, jiaczhan, kshukla, lsurette, mkalinin, obockows, pstehlik, qiyuan, rbarry, rmcswain, sasundar, srevivo, weiwang, yaniwang, ycui, ykaul, yzhao
Target Milestone: ovirt-4.2.3-1
Keywords: Rebase, ZStream
Target Release: ---
Flags: lsvaty: testing_plan_complete-
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: imgbased-1.0.17
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-11 06:56:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1582433

Description Robert McSwain 2018-04-30 21:14:56 UTC
Description of problem:
The customer has repeatedly tried to update an RHV-H host via the GUI; after those failures, they ran a yum update on the RHV-H node itself. The Manager portal reported that the install failed. The node is part of a replicated GlusterFS storage pool.

Failures in the logs show... 
warning: %post(redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch) scriptlet failed, exit status 1

2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update.noarch 0:4.1-20180410.1.el7_5 - u
2018-04-17 09:58:37 ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:85 Yum Non-fatal POSTIN scriptlet failure in rpm package redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum updated: 2/2: redhat-virtualization-host-image-update
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Script sink: D: ========== +++ redhat-virtualization-host-image-update-4.1-20180314.0.el7_4 noarch-linux 0x0
D:     erase: redhat-virtualization-host-image-update-4.1-20180314.0.el7_4 has 3 files
D: erase      100644  1 (   0,   0)   235 /usr/share/redhat-virtualization-host/image/redhat-virtualization-host-4.1-20180314.0.el7_4.squashfs.img.meta
D: erase      100644  1 (   0,   0)635158528 /usr/share/redhat-virtualization-host/image/redhat-virtualization-host-4.1-20180314.0.el7_4.squashfs.img
D: skip       040755  2 (   0,   0)  4096 /usr/share/redhat-virtualization-host/image

2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update-4.1-20180314.0.el7_4.noarch
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Verify: 1/2: redhat-virtualization-host-image-update.noarch 0:4.1-20180410.1.el7_5 - u
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Verify: 2/2: redhat-virtualization-host-image-update.noarch 0:4.1-20180314.0.el7_4 - ud
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Transaction processed
2018-04-17 09:58:37 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-tnFFlJNVEl/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-tnFFlJNVEl/otopi-plugins/otopi/packagers/yumpackager.py", line 261, in _packages
    self._miniyum.processTransaction()
  File "/tmp/ovirt-tnFFlJNVEl/pythonlib/otopi/miniyum.py", line 1050, in processTransaction
    _('One or more elements within Yum transaction failed')
RuntimeError: One or more elements within Yum transaction failed
2018-04-17 09:58:37 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Package installation': One or more elements within Yum transaction failed
2018-04-17 09:58:37 DEBUG otopi.transaction transaction.abort:119 aborting 'Yum Transaction'
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Performing yum transaction rollback


Version-Release number of selected component (if applicable):
rhevm-4.1.10.3-0.1.el7.noarch                               
redhat-virtualization-host-image-update-4.1-20180314.0.el7_4.noarch, as well as the latest RHV-H

How reproducible:
Unknown

Actual results:
The upgrade fails in %post with the yum error "One or more elements within Yum transaction failed"; the transaction then aborts and rolls back.

Expected results:
The upgrade to the latest redhat-virtualization-host-image-update package completes without error.

Additional info:
Attachments to be linked in private

Comment 2 Ryan Barry 2018-05-01 13:52:25 UTC
This is actually a failure case we haven't seen before.

Is LVM ok on this system? lvmdiskscan shows LVs, but the sosreport shows nothing under:

# cat sos_commands/lvm2/lvs_-a_-o_lv_tags_devices_--config_global_locking_type_0 
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
#

imgbased is very dependent on LVM.
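
For comparison, the same thing can be checked on the live host (the field list below is just illustrative):

# lvs -a -o lv_name,vg_name,lv_tags,devices

On a working RHV-H host this lists the rhvh base and layer LVs along with whatever tags imgbased has set; empty output, as in the sosreport, points at the same LVM problem.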

Here's what I'm seeing from the logs --

There have been a number of failed upgrades. Those logs are gone, so I can't tell what happened there. What's happening now is:

- imgbased believes the running layer is rhvh-4.1-0-20171101.0+1 (possibly due to LVM problems)
- While updating, it tries to read fstab from /dev/rhvh/rhvh-4.1-0-20171101.0+1. fstab on that layer does not have /var (maybe it was never migrated due to a previously failed upgrade?), so imgbased falls back to /etc/systemd/system/var.mount, which doesn't exist, because /var is actually handled through fstab (see the sketch after this list)
- Since that check fails, imgbased cannot ensure the partition layout is NIST 800-53 compliant, and the upgrade fails
- Successive upgrades then fail because the new LV is already there
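
As an illustration only (the paths and LV name are taken from this report; adjust them for whatever layer imgbased thinks is current), the check described above can be reproduced by hand along these lines:

# mount -o ro /dev/rhvh/rhvh-4.1-0-20171101.0+1 /mnt
# awk '$2 == "/var"' /mnt/etc/fstab          # empty output = no /var entry in this layer's fstab
# ls /mnt/etc/systemd/system/var.mount       # "No such file" = no var.mount unit either
# umount /mnt

If both checks come up empty on the layer imgbased believes is current, it hits exactly the failure above.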

What I would ask so we can find a root cause is:

- The output of `imgbase layer --current`
- The output of `lvs -o lv_name,tags`
- The above after 'vgchange -ay --select vg_tags = imgbased:vg'
- Remove all failed upgrade LVs (something basically like: for lv in $(lvs --noheadings -o lv_name rhvh); do echo "$lv" | grep -q "$(imgbase layer --current | sed -e 's/+1$//')" || lvremove rhvh/"$lv"; done -- see the expanded, dry-run version after this list)
- See what `imgbase layer --current` says now
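
To make the cleanup in step 4 a bit safer, here is the same idea expanded into a sketch (assumptions: the VG is named rhvh and failed-upgrade LVs follow the rhvh-4.* naming seen in this report; it only echoes the lvremove commands so the list can be reviewed before anything is deleted):

current=$(imgbase layer --current | sed -e 's/+1$//')
for lv in $(lvs --noheadings -o lv_name rhvh); do
    case "$lv" in
        rhvh-4.*)
            # keep the current base and its +1 layer, echo a removal for everything else
            echo "$lv" | grep -q "$current" || echo lvremove rhvh/"$lv"
            ;;
    esac
done

Drop the second echo only once the output lists exactly the LVs you want gone.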

If it correctly points to 20170706, please re-try the upgrade.

Unfortunately, I cannot say how it got to the current state, but it definitely looks like LVM is not ok on the system.

Ultimately, this comes from:

2018-04-14 12:12:12,008 [DEBUG] (MainThread) Fetching image for '/'
2018-04-14 12:12:12,008 [DEBUG] (MainThread) Calling binary: (['findmnt', '--noheadings', '-o', 'SOURCE', '/'],) {}
2018-04-14 12:12:12,008 [DEBUG] (MainThread) Calling: (['findmnt', '--noheadings', '-o', 'SOURCE', '/'],) {'close_fds': True, 'stderr': -2}
2018-04-14 12:12:12,016 [DEBUG] (MainThread) Returned: /dev/mapper/rhvh-rhvh--4.1--0.20170706.0+1
2018-04-14 12:12:12,017 [DEBUG] (MainThread) Found '/dev/mapper/rhvh-rhvh--4.1--0.20170706.0+1'

But later, LVM appears to go haywire. A patch is up to work around this.
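
As a quick cross-check (purely illustrative), the device findmnt reports for / should match the layer imgbased reports as current:

# findmnt --noheadings -o SOURCE /
# imgbase layer --current

In this report findmnt returned the 20170706.0+1 device, while imgbased went on to treat 20171101.0+1 as the running layer -- that disagreement is what leads to the failure above.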

Comment 3 Ryan Barry 2018-05-16 14:06:50 UTC
*** Bug 1578857 has been marked as a duplicate of this bug. ***

Comment 5 Ryan Barry 2018-05-29 14:16:15 UTC
*** Bug 1583700 has been marked as a duplicate of this bug. ***

Comment 18 Ryan Barry 2018-06-05 09:34:12 UTC
Reproducing this requires an RHHI environment with custom LVM filtering.

In general, RHVH tries to ensure that all RHVH LVs are activated before starting an upgrade. However, an upgrade that failed for some other reason can leave behind an activated LV that contains no actual upgrade data.
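
A leftover LV like that can usually be spotted before retrying the upgrade. The commands below are illustrative only (the VG tag is taken from comment 2):

# vgchange -ay --select 'vg_tags = imgbased:vg'
# lvs -o lv_name,lv_attr,lv_tags rhvh

An 'a' in the fifth character of lv_attr means the LV is active; an active rhvh-4.* LV for a version that never finished installing is the stale one, and the cleanup sketch in comment 2 applies to it.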

Neither engineering nor Virt QE has a reproducer, and the patch was written on the basis of log output.

Comment 20 Ryan Barry 2018-06-07 12:16:53 UTC
VERIFIED on the basis of logs and patch review.

If this is encountered again, please re-open.

Comment 22 errata-xmlrpc 2018-06-11 06:56:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1820

Comment 23 Franta Kust 2019-05-16 13:04:34 UTC
BZ<2>Jira Resync

Comment 24 Daniel Gur 2019-08-28 13:12:16 UTC
sync2jira

Comment 25 Daniel Gur 2019-08-28 13:16:28 UTC
sync2jira