Bug 1573334 - RHV-H update to latest version fails on RHV 4.1 due to yum transaction failure
Summary: RHV-H update to latest version fails on RHV 4.1 due to yum transaction failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: imgbased
Version: 4.1.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.3-1
Assignee: Ryan Barry
QA Contact: Yaning Wang
URL:
Whiteboard:
Duplicates: 1578857 1583700
Depends On:
Blocks: imgbased-1.0.17
 
Reported: 2018-04-30 21:14 UTC by Robert McSwain
Modified: 2021-09-09 13:54 UTC
CC List: 20 users

Fixed In Version: imgbased-1.0.17
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-11 06:56:53 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:1820 0 None None None 2018-06-11 06:57:31 UTC
oVirt gerrit 90803 0 master MERGED osupdater: activate our VGs earlier 2020-09-14 12:52:44 UTC
oVirt gerrit 91527 0 ovirt-4.2 MERGED osupdater: activate our VGs earlier 2020-09-14 12:52:43 UTC

Description Robert McSwain 2018-04-30 21:14:56 UTC
Description of problem:
The customer tried to update an RHV-H host via the GUI; after repeated failures, they performed a yum update on the RHVH node itself. The Manager portal reported that the installation failed. The node is part of a replicated GlusterFS storage pool.

Failures in the logs show... 
warning: %post(redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch) scriptlet failed, exit status 1

2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update.noarch 0:4.1-20180410.1.el7_5 - u
2018-04-17 09:58:37 ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:85 Yum Non-fatal POSTIN scriptlet failure in rpm package redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum updated: 2/2: redhat-virtualization-host-image-update
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Script sink: D: ========== +++ redhat-virtualization-host-image-update-4.1-20180314.0.el7_4 noarch-linux 0x0
D:     erase: redhat-virtualization-host-image-update-4.1-20180314.0.el7_4 has 3 files
D: erase      100644  1 (   0,   0)   235 /usr/share/redhat-virtualization-host/image/redhat-virtualization-host-4.1-20180314.0.el7_4.squashfs.img.meta
D: erase      100644  1 (   0,   0)635158528 /usr/share/redhat-virtualization-host/image/redhat-virtualization-host-4.1-20180314.0.el7_4.squashfs.img
D: skip       040755  2 (   0,   0)  4096 /usr/share/redhat-virtualization-host/image

2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update-4.1-20180314.0.el7_4.noarch
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Verify: 1/2: redhat-virtualization-host-image-update.noarch 0:4.1-20180410.1.el7_5 - u
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Verify: 2/2: redhat-virtualization-host-image-update.noarch 0:4.1-20180314.0.el7_4 - ud
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Transaction processed
2018-04-17 09:58:37 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-tnFFlJNVEl/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-tnFFlJNVEl/otopi-plugins/otopi/packagers/yumpackager.py", line 261, in _packages
    self._miniyum.processTransaction()
  File "/tmp/ovirt-tnFFlJNVEl/pythonlib/otopi/miniyum.py", line 1050, in processTransaction
    _('One or more elements within Yum transaction failed')
RuntimeError: One or more elements within Yum transaction failed
2018-04-17 09:58:37 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Package installation': One or more elements within Yum transaction failed
2018-04-17 09:58:37 DEBUG otopi.transaction transaction.abort:119 aborting 'Yum Transaction'
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Performing yum transaction rollback


Version-Release number of selected component (if applicable):
rhevm-4.1.10.3-0.1.el7.noarch                               
redhat-virtualization-host-image-update-4.1-20180314.0.el7_4.noarch as well as latest RHV-H

How reproducible:
Unknown

Actual results:
The upgrade fails in %post with the yum error "One or more elements within Yum transaction failed", then aborts and rolls back.

Expected results:
Upgrade completes without error to the latest redhat-virtualization-host-image-update package.

Additional info:
Attachments to be linked in private

Comment 2 Ryan Barry 2018-05-01 13:52:25 UTC
This is actually a failure case we haven't seen before.

Is LVM ok on this system? lvmdiskscan shows LVs, but the sosreport shows nothing under:

# cat sos_commands/lvm2/lvs_-a_-o_lv_tags_devices_--config_global_locking_type_0 
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
#

imgbased is very dependent on LVM.
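
For anyone hitting something similar, a quick LVM sanity check directly on the host looks like this (a generic sketch; 'rhvh' is the default RHV-H VG name and may differ):

# Basic LVM health check, run as root on the host
pvs -o pv_name,vg_name,pv_size
vgs -o vg_name,vg_tags
lvs -o lv_name,lv_tags,lv_active rhvh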

Here's what I'm seeing from the logs --

There have been a number of failed upgrades. Those logs are gone, so I can't tell what happened there. What's happening now is:

- imgbased believes the running layer is rhvh-4.1-0-20171101.0+1 (possibly due to LVM problems)
- While updating, it tries to read fstab from /dev/rhvh/rhvh-4.1-0-20171101.0+1. fstab on that layer does not have /var (maybe it was never migrated because of a previously failed upgrade?), so we look for /etc/systemd/system/var.mount, which doesn't exist, because /var is actually defined in fstab on the running system (see the sketch after this list)
- Since that fails, we can't ensure the partition layout is NIST 800-53 compliant, and the upgrade fails
- Successive upgrades then fail because the new LV is already there
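
For reference, a rough way to check how /var is defined on that older layer (a sketch only: the layer name is the one from this report, /mnt/layer is an arbitrary mount point, and the LV has to exist and be activatable):

# Activate and mount the old layer read-only
lvchange -ay rhvh/rhvh-4.1-0-20171101.0+1
mkdir -p /mnt/layer
mount -o ro /dev/rhvh/rhvh-4.1-0-20171101.0+1 /mnt/layer
# Does that layer's fstab carry /var, or is there a var.mount unit instead?
grep -E '[[:space:]]/var[[:space:]]' /mnt/layer/etc/fstab || echo "no /var entry in this layer's fstab"
ls /mnt/layer/etc/systemd/system/var.mount 2>/dev/null || echo "no var.mount unit either"
umount /mnt/layer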

What I would ask so we can find a root cause is:

- The output of `imgbase layer --current`
- The output of `lvs -o lv_name,tags`
- The above again after `vgchange -ay --select 'vg_tags = imgbased:vg'`
- Remove all failed upgrade LVs, with something basically like `for lv in $(lvs --noheadings -o lv_name); do echo $lv | grep -q "$(imgbase layer --current | sed -e 's/+1$//')" || lvremove rhvh/$lv; done` (a dry-run version is sketched after this list)
- See what `imgbase layer --current` says now
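
Taken together, a dry-run version of the cleanup above might look like the following (a sketch only: it prints what would be removed so the list can be reviewed first, assumes the default 'rhvh' VG, and, like the loop above, will also list non-layer LVs such as var, swap, or the thin pool, which must not be removed):

# Activate the imgbased-tagged VG(s) so all layer LVs are visible
vgchange -ay --select 'vg_tags = imgbased:vg'
# Strip the "+1" suffix from the current layer name
current=$(imgbase layer --current | sed -e 's/+1$//')
# Print (do not remove) every LV whose name does not match the current layer
for lv in $(lvs --noheadings -o lv_name rhvh); do
    echo "$lv" | grep -q "$current" || echo "would remove rhvh/$lv"
done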

If it correctly points to 20170706, please re-try the upgrade.

Unfortunately, I cannot say how it got to the current state, but it definitely looks like LVM is not ok on the system.

Ultimately, this comes from:

2018-04-14 12:12:12,008 [DEBUG] (MainThread) Fetching image for '/'
2018-04-14 12:12:12,008 [DEBUG] (MainThread) Calling binary: (['findmnt', '--noheadings', '-o', 'SOURCE', '/'],) {}
2018-04-14 12:12:12,008 [DEBUG] (MainThread) Calling: (['findmnt', '--noheadings', '-o', 'SOURCE', '/'],) {'close_fds': True, 'stderr': -2}
2018-04-14 12:12:12,016 [DEBUG] (MainThread) Returned: /dev/mapper/rhvh-rhvh--4.1--0.20170706.0+1
2018-04-14 12:12:12,017 [DEBUG] (MainThread) Found '/dev/mapper/rhvh-rhvh--4.1--0.20170706.0+1'

But later, LVM appears to go haywire. A patch is up to work around this.
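
For comparison, the same check can be repeated by hand to see whether the mounted root and imgbased agree on the current layer (findmnt is the same call shown in the log above, and `imgbase layer --current` is the check requested earlier; in this report the two layers disagreed, 20170706 vs 20171101):

# What is actually mounted at / (the same call imgbased makes)
findmnt --noheadings -o SOURCE /
# What imgbase reports as the current layer
imgbase layer --current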

Comment 3 Ryan Barry 2018-05-16 14:06:50 UTC
*** Bug 1578857 has been marked as a duplicate of this bug. ***

Comment 5 Ryan Barry 2018-05-29 14:16:15 UTC
*** Bug 1583700 has been marked as a duplicate of this bug. ***

Comment 18 Ryan Barry 2018-06-05 09:34:12 UTC
Reproducing this requires a RHHI environment, with custom LVM filtering.

In general, RHVH makes an attempt to ensure all RHVH LVs are activated before starting upgrades. However, an upgrade that failed for other reasons can leave behind an activated LV with no actual upgrade data in it.
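
A rough pre-flight check before retrying an update on such a host (a sketch; the config paths and the 'rhvh' VG name are the usual defaults and may differ in an RHHI setup with custom filtering):

# Show any LVM filter that could hide the RHV-H devices
grep -E 'global_filter|^[[:space:]]*filter' /etc/lvm/lvm.conf /etc/lvm/lvmlocal.conf 2>/dev/null
# Make sure the RHV-H VG and all of its LVs are active before updating
vgchange -ay rhvh
lvs -o lv_name,lv_active rhvh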

Neither engineering nor Virt QE has a reproducer, and the patch was written on the basis of log output.

Comment 20 Ryan Barry 2018-06-07 12:16:53 UTC
VERIFIED on the basis of logs and patch review.

If this is encountered again, please re-open.

Comment 22 errata-xmlrpc 2018-06-11 06:56:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1820

Comment 23 Franta Kust 2019-05-16 13:04:34 UTC
BZ<2>Jira Resync

Comment 24 Daniel Gur 2019-08-28 13:12:16 UTC
sync2jira

Comment 25 Daniel Gur 2019-08-28 13:16:28 UTC
sync2jira

