Bug 1573334 - RHV-H update to latest version fails on RHV 4.1 due to yum transaction failure
Summary: RHV-H update to latest version fails on RHV 4.1 due to yum transaction failure
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: imgbased
Version: 4.1.10
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.3-1
Assignee: Ryan Barry
QA Contact: Yaning Wang
URL:
Whiteboard:
Duplicates: 1578857 1583700
Depends On:
Blocks: imgbased-1.0.17
 
Reported: 2018-04-30 21:14 UTC by Robert McSwain
Modified: 2021-09-09 13:54 UTC
CC List: 20 users

Fixed In Version: imgbased-1.0.17
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-11 06:56:53 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:
lsvaty: testing_plan_complete-




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:1820 0 None None None 2018-06-11 06:57:31 UTC
oVirt gerrit 90803 0 master MERGED osupdater: activate our VGs earlier 2020-09-14 12:52:44 UTC
oVirt gerrit 91527 0 ovirt-4.2 MERGED osupdater: activate our VGs earlier 2020-09-14 12:52:43 UTC

Description Robert McSwain 2018-04-30 21:14:56 UTC
Description of problem:
The customer tried to update an RHV-H host via the GUI; after repeated failures, they performed a yum update on the RHVH node itself. The Manager portal reported that the installation failed. The node is part of a replicated GlusterFS storage pool.

Failures in the logs show... 
warning: %post(redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch) scriptlet failed, exit status 1

2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update.noarch 0:4.1-20180410.1.el7_5 - u
2018-04-17 09:58:37 ERROR otopi.plugins.otopi.packagers.yumpackager yumpackager.error:85 Yum Non-fatal POSTIN scriptlet failure in rpm package redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum updated: 2/2: redhat-virtualization-host-image-update
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Script sink: D: ========== +++ redhat-virtualization-host-image-update-4.1-20180314.0.el7_4 noarch-linux 0x0
D:     erase: redhat-virtualization-host-image-update-4.1-20180314.0.el7_4 has 3 files
D: erase      100644  1 (   0,   0)   235 /usr/share/redhat-virtualization-host/image/redhat-virtualization-host-4.1-20180314.0.el7_4.squashfs.img.meta
D: erase      100644  1 (   0,   0)635158528 /usr/share/redhat-virtualization-host/image/redhat-virtualization-host-4.1-20180314.0.el7_4.squashfs.img
D: skip       040755  2 (   0,   0)  4096 /usr/share/redhat-virtualization-host/image

2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Done: redhat-virtualization-host-image-update-4.1-20180314.0.el7_4.noarch
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Verify: 1/2: redhat-virtualization-host-image-update.noarch 0:4.1-20180410.1.el7_5 - u
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Verify: 2/2: redhat-virtualization-host-image-update.noarch 0:4.1-20180314.0.el7_4 - ud
2018-04-17 09:58:37 DEBUG otopi.plugins.otopi.packagers.yumpackager yumpackager.verbose:76 Yum Transaction processed
2018-04-17 09:58:37 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-tnFFlJNVEl/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-tnFFlJNVEl/otopi-plugins/otopi/packagers/yumpackager.py", line 261, in _packages
    self._miniyum.processTransaction()
  File "/tmp/ovirt-tnFFlJNVEl/pythonlib/otopi/miniyum.py", line 1050, in processTransaction
    _('One or more elements within Yum transaction failed')
RuntimeError: One or more elements within Yum transaction failed
2018-04-17 09:58:37 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Package installation': One or more elements within Yum transaction failed
2018-04-17 09:58:37 DEBUG otopi.transaction transaction.abort:119 aborting 'Yum Transaction'
2018-04-17 09:58:37 INFO otopi.plugins.otopi.packagers.yumpackager yumpackager.info:80 Yum Performing yum transaction rollback


Version-Release number of selected component (if applicable):
rhevm-4.1.10.3-0.1.el7.noarch                               
redhat-virtualization-host-image-update-4.1-20180314.0.el7_4.noarch as well as latest RHV-H

How reproducible:
Unknown

Actual results:
The upgrade fails in %post with the yum error "One or more elements within Yum transaction failed", then aborts and rolls back.

Expected results:
Upgrade completes without error to the latest redhat-virtualization-host-image-update package.

Additional info:
Attachments to be linked in private

Comment 2 Ryan Barry 2018-05-01 13:52:25 UTC
This is actually a failure case we haven't seen before.

Is LVM ok on this system? lvmdiskscan shows LVs, but the sosreport shows nothing under:

# cat sos_commands/lvm2/lvs_-a_-o_lv_tags_devices_--config_global_locking_type_0 
  WARNING: Locking disabled. Be careful! This could corrupt your metadata.
#

imgbased is very dependent on LVM.
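
For anyone hitting something similar, a quick LVM sanity check directly on the host looks like this (a generic sketch; 'rhvh' is the default RHV-H VG name and may differ):

# Basic LVM health check, run as root on the host
pvs -o pv_name,vg_name,pv_size
vgs -o vg_name,vg_tags
lvs -o lv_name,lv_tags,lv_active rhvh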

Here's what I'm seeing from the logs --

There have been a number of failed upgrades. Those logs are gone, so I can't tell what happened there. What's happening now is:

- imgbased believes the running layer is rhvh-4.1-0-20171101.0+1 (possibly due to LVM problems)
- While updating, it tries to read fstab from /dev/rhvh/rhvh-4.1-0-20171101.0+1. fstab on that layer does not have /var (maybe it was never migrated because of a previously failed upgrade?), so we look for /etc/systemd/system/var.mount, which doesn't exist, because /var is actually defined in fstab on the running system (see the sketch after this list)
- Since that fails, we can't ensure the partition layout is NIST 800-53 compliant, and the upgrade fails
- Successive upgrades then fail because the new LV is already there
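
For reference, a rough way to check how /var is defined on that older layer (a sketch only: the layer name is the one from this report, /mnt/layer is an arbitrary mount point, and the LV has to exist and be activatable):

# Activate and mount the old layer read-only
lvchange -ay rhvh/rhvh-4.1-0-20171101.0+1
mkdir -p /mnt/layer
mount -o ro /dev/rhvh/rhvh-4.1-0-20171101.0+1 /mnt/layer
# Does that layer's fstab carry /var, or is there a var.mount unit instead?
grep -E '[[:space:]]/var[[:space:]]' /mnt/layer/etc/fstab || echo "no /var entry in this layer's fstab"
ls /mnt/layer/etc/systemd/system/var.mount 2>/dev/null || echo "no var.mount unit either"
umount /mnt/layer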

What I would ask so we can find a root cause is:

- The output of `imgbase layer --current`
- The output of `lvs -o lv_name,tags`
- The above again after `vgchange -ay --select 'vg_tags = imgbased:vg'`
- Remove all failed upgrade LVs, with something basically like `for lv in $(lvs --noheadings -o lv_name); do echo $lv | grep -q "$(imgbase layer --current | sed -e 's/+1$//')" || lvremove rhvh/$lv; done` (a dry-run version is sketched after this list)
- See what `imgbase layer --current` says now
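
Taken together, a dry-run version of the cleanup above might look like the following (a sketch only: it prints what would be removed so the list can be reviewed first, assumes the default 'rhvh' VG, and, like the loop above, will also list non-layer LVs such as var, swap, or the thin pool, which must not be removed):

# Activate the imgbased-tagged VG(s) so all layer LVs are visible
vgchange -ay --select 'vg_tags = imgbased:vg'
# Strip the "+1" suffix from the current layer name
current=$(imgbase layer --current | sed -e 's/+1$//')
# Print (do not remove) every LV whose name does not match the current layer
for lv in $(lvs --noheadings -o lv_name rhvh); do
    echo "$lv" | grep -q "$current" || echo "would remove rhvh/$lv"
done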

If it correctly points to 20170706, please re-try the upgrade.

Unfortunately, I cannot say how it got to the current state, but it definitely looks like LVM is not ok on the system.

Ultimately, this comes from:

2018-04-14 12:12:12,008 [DEBUG] (MainThread) Fetching image for '/'
2018-04-14 12:12:12,008 [DEBUG] (MainThread) Calling binary: (['findmnt', '--noheadings', '-o', 'SOURCE', '/'],) {}
2018-04-14 12:12:12,008 [DEBUG] (MainThread) Calling: (['findmnt', '--noheadings', '-o', 'SOURCE', '/'],) {'close_fds': True, 'stderr': -2}
2018-04-14 12:12:12,016 [DEBUG] (MainThread) Returned: /dev/mapper/rhvh-rhvh--4.1--0.20170706.0+1
2018-04-14 12:12:12,017 [DEBUG] (MainThread) Found '/dev/mapper/rhvh-rhvh--4.1--0.20170706.0+1'

But later, LVM appears to go haywire. A patch is up to work around this.
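
For comparison, the same check can be repeated by hand to see whether the mounted root and imgbased agree on the current layer (findmnt is the same call shown in the log above, and `imgbase layer --current` is the check requested earlier; in this report the two layers disagreed, 20170706 vs 20171101):

# What is actually mounted at / (the same call imgbased makes)
findmnt --noheadings -o SOURCE /
# What imgbase reports as the current layer
imgbase layer --current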

Comment 3 Ryan Barry 2018-05-16 14:06:50 UTC
*** Bug 1578857 has been marked as a duplicate of this bug. ***

Comment 5 Ryan Barry 2018-05-29 14:16:15 UTC
*** Bug 1583700 has been marked as a duplicate of this bug. ***

Comment 18 Ryan Barry 2018-06-05 09:34:12 UTC
Reproducing this requires a RHHI environment, with custom LVM filtering.

In general, RHVH makes an attempt to ensure all RHVH LVs are activated before starting upgrades. However, an upgrade that failed for other reasons can leave behind an activated LV with no actual upgrade data in it.
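
A rough pre-flight check before retrying an update on such a host (a sketch; the config paths and the 'rhvh' VG name are the usual defaults and may differ in an RHHI setup with custom filtering):

# Show any LVM filter that could hide the RHV-H devices
grep -E 'global_filter|^[[:space:]]*filter' /etc/lvm/lvm.conf /etc/lvm/lvmlocal.conf 2>/dev/null
# Make sure the RHV-H VG and all of its LVs are active before updating
vgchange -ay rhvh
lvs -o lv_name,lv_active rhvh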

Neither engineering nor Virt QE has a reproducer, and the patch was written on the basis of log output.

Comment 20 Ryan Barry 2018-06-07 12:16:53 UTC
VERIFIED on the basis of logs and patch review.

If this is encountered again, please re-open.

Comment 22 errata-xmlrpc 2018-06-11 06:56:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1820

Comment 23 Franta Kust 2019-05-16 13:04:34 UTC
BZ<2>Jira Resync

Comment 24 Daniel Gur 2019-08-28 13:12:16 UTC
sync2jira

Comment 25 Daniel Gur 2019-08-28 13:16:28 UTC
sync2jira

