I've upgraded an oVirt node from 4.1.8 to 4.2.3.1 and it failed during the %post part:

  Installing : ovirt-node-ng-image-update-4.2.3.1-1.el7.noarch          1/1
warning: %post(ovirt-node-ng-image-update-4.2.3.1-1.el7.noarch) scriptlet failed, exit status 1
Non-fatal POSTIN scriptlet failure in rpm package ovirt-node-ng-image-update-4.2.3.1-1.el7.noarch
  Verifying : ovirt-node-ng-image-update-4.2.3.1-1.el7.noarch          1/1

Installed:
  ovirt-node-ng-image-update.noarch 0:4.2.3.1-1.el7

Complete!

imgbase log: https://gist.github.com/sandersr/8ab1a0048ab8ceb94a3c1f1934ab6962

It looks like this layer has been removed: ovirt-node-ng-4.1.8-0.20171211.0. Even though it no longer exists, the server still boots fine into the 4.1.8 image!

lvdisplay: https://gist.github.com/sandersr/0879b0bfa52051a7b8c955f0254362ec

# imgbase w
You are on ovirt-node-ng-4.1.8-0.20171211.0+1

# imgbase layout
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/imgbased/__main__.py", line 53, in <module>
    CliApplication()
  File "/usr/lib/python2.7/site-packages/imgbased/__init__.py", line 82, in CliApplication
    app.hooks.emit("post-arg-parse", args)
  File "/usr/lib/python2.7/site-packages/imgbased/hooks.py", line 120, in emit
    cb(self.context, *args)
  File "/usr/lib/python2.7/site-packages/imgbased/plugins/core.py", line 182, in post_argparse
    print(layout.dumps())
  File "/usr/lib/python2.7/site-packages/imgbased/plugins/core.py", line 210, in dumps
    return self.app.imgbase.layout()
  File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 154, in layout
    return self.naming.layout()
  File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 109, in layout
    tree = self.tree(lvs)
  File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 224, in tree
    bases[img.base.nvr].layers.append(img)
KeyError: <NVR ovirt-node-ng-4.1.8-0.20171211.0 />

I cannot reinstall ovirt-node-ng-image-update-4.2.3.1-1.el7.noarch because imgbase fails with:

...
2018-06-01 13:28:51,489 [DEBUG] (MainThread) Exception!
  Using default stripesize 64.00 KiB.
  Logical Volume "ovirt-node-ng-4.2.3.1-0.20180530.0" already exists in volume group "onn"
...
subprocess.CalledProcessError: Command '['lvcreate', '--thin', '--virtualsize', u'155508015104B', '--name', 'ovirt-node-ng-4.2.3.1-0.20180530.0', u'onn/pool00']' returned non-zero exit status 5

Is there a way to forcefully remove the 4.1.8 layer? (imgbase doesn't provide a --force switch, so maybe there is a manual way.)

Shouldn't the %post script perform a cleanup before trying to create the LV? A simple if-exists-remove would fix this part.

Since the "Non-fatal POSTIN scriptlet failure" is non-fatal, it's easy to overlook that there is a problem at all (chances are this problem was already present when I upgraded to 4.1.9, but I didn't notice it at the time). The failed scriptlet also leaves the installation unfinished (no kernel files, no configuration copy, etc.), so booting into the new image is not easy.

What's the best way forward to recover this node without a complete re-installation?
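For illustration, here's a minimal sketch of the if-exists-remove guard I mean, using the VG and LV names from the failing lvcreate above. This is not the actual imgbased %post code, just the shape of the check:

  #!/bin/bash
  # Hypothetical cleanup guard; names taken from the error above.
  VG=onn
  LV=ovirt-node-ng-4.2.3.1-0.20180530.0

  # Remove any leftover LV from a previously failed run, so the
  # subsequent lvcreate doesn't exit with status 5.
  if lvs "${VG}/${LV}" >/dev/null 2>&1; then
      lvremove -f "${VG}/${LV}"
  fi

  lvcreate --thin --virtualsize 155508015104B --name "${LV}" "${VG}/pool00"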
I managed to recover my system by analysing the debug log from imgbase and replaying its actions to recreate the missing LV. Pasting here for reference only, in case someone else hits a similar problem. Chances are not all the steps are needed, but this worked for me:

# lvcreate --thin --virtualsize 155508015104B --name ovirt-node-ng-4.1.8-0.20171211.0 onn/pool00
  Using default stripesize 64.00 KiB.
  WARNING: Sum of all thin volume sizes (<1.02 TiB) exceeds the size of thin pool onn/pool00 and the size of whole volume group (220.00 GiB)!
  For thin pool auto extension activation/thin_pool_autoextend_threshold should be below 100.
  Logical volume "ovirt-node-ng-4.1.8-0.20171211.0" created.

# lvchange --addtag imgbased:base onn/ovirt-node-ng-4.1.8-0.20171211.0
  Logical volume onn/ovirt-node-ng-4.1.8-0.20171211.0 changed.

# lvchange --permission r onn/ovirt-node-ng-4.1.8-0.20171211.0
  Logical volume onn/ovirt-node-ng-4.1.8-0.20171211.0 changed.

# lvchange --setactivationskip y onn/ovirt-node-ng-4.1.8-0.20171211.0
  Logical volume onn/ovirt-node-ng-4.1.8-0.20171211.0 changed.

# lvchange --activate n onn/ovirt-node-ng-4.1.8-0.20171211.0

# lvchange --permission rw onn/ovirt-node-ng-4.1.8-0.20171211.0
  Logical volume onn/ovirt-node-ng-4.1.8-0.20171211.0 changed.

# lvchange --activate y onn/ovirt-node-ng-4.1.8-0.20171211.0 --ignoreactivationskip

# mkfs.ext4 -E discard /dev/onn/ovirt-node-ng-4.1.8-0.20171211.0
mke2fs 1.42.9 (28-Dec-2013)
Discarding device blocks: done
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=16 blocks, Stripe width=16 blocks
9494528 inodes, 37965824 blocks
1898291 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2187329536
1159 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000, 7962624, 11239424, 20480000, 23887872

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

# lvchange --permission r onn/ovirt-node-ng-4.1.8-0.20171211.0
  Logical volume onn/ovirt-node-ng-4.1.8-0.20171211.0 changed.

# lvchange --setactivationskip y onn/ovirt-node-ng-4.1.8-0.20171211.0
  Logical volume onn/ovirt-node-ng-4.1.8-0.20171211.0 changed.

# lvchange --activate n onn/ovirt-node-ng-4.1.8-0.20171211.0

# imgbase layout
ovirt-node-ng-4.1.8-0.20171211.0
 +- ovirt-node-ng-4.1.8-0.20171211.0+1
ovirt-node-ng-4.1.9-0.20180124.0
 +- ovirt-node-ng-4.1.9-0.20180124.0+1
ovirt-node-ng-4.2.3.1-0.20180530.0
 +- ovirt-node-ng-4.2.3.1-0.20180530.0+1

# lvremove onn/ovirt-node-ng-4.2.3.1-0.20180530.0+1
Do you really want to remove active logical volume onn/ovirt-node-ng-4.2.3.1-0.20180530.0+1? [y/n]: y
  Logical volume "ovirt-node-ng-4.2.3.1-0.20180530.0+1" successfully removed

# lvremove onn/ovirt-node-ng-4.2.3.1-0.20180530.0
Do you really want to remove active logical volume onn/ovirt-node-ng-4.2.3.1-0.20180530.0? [y/n]: y
  Logical volume "ovirt-node-ng-4.2.3.1-0.20180530.0" successfully removed

# yum reinstall ovirt-node-ng-image-update -y
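For anyone who needs it, here is the same recovery condensed into one script. This is only a sketch replaying the commands above: the VG/LV names and the virtual size are from MY node; take yours from your own imgbase debug log and lvs output before running anything.

  #!/bin/bash
  # Replay of the recovery steps above; not a supported tool.
  VG=onn
  BASE=ovirt-node-ng-4.1.8-0.20171211.0
  SIZE=155508015104B    # from the failing lvcreate in the imgbase log

  # 1. Recreate the missing base layer and tag it so imgbase's naming
  #    code can key layers against it again (the KeyError above).
  lvcreate --thin --virtualsize "${SIZE}" --name "${BASE}" "${VG}/pool00"
  lvchange --addtag imgbased:base "${VG}/${BASE}"

  # 2. Put the LV into the state imgbase keeps bases in:
  #    read-only, activation skipped, deactivated.
  lvchange --permission r "${VG}/${BASE}"
  lvchange --setactivationskip y "${VG}/${BASE}"
  lvchange --activate n "${VG}/${BASE}"

  # 3. Temporarily reactivate it read-write to create a filesystem.
  lvchange --permission rw "${VG}/${BASE}"
  lvchange --activate y --ignoreactivationskip "${VG}/${BASE}"
  mkfs.ext4 -E discard "/dev/${VG}/${BASE}"

  # 4. Return it to the read-only, skipped, deactivated base state.
  lvchange --permission r "${VG}/${BASE}"
  lvchange --setactivationskip y "${VG}/${BASE}"
  lvchange --activate n "${VG}/${BASE}"

  # 5. Drop the half-created 4.2.3.1 LVs and reinstall the update.
  lvremove -f "${VG}/ovirt-node-ng-4.2.3.1-0.20180530.0+1"
  lvremove -f "${VG}/ovirt-node-ng-4.2.3.1-0.20180530.0"
  yum reinstall ovirt-node-ng-image-update -y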
I'm glad you were able to resolve this, but any idea how the system got into this state in the first place? This is a completely new report to me, and I've never seen anything like it... It looks like one of the LVs was removed but LVM still had it cached somewhere. Closing for now, since you worked around it, but I'll still respond to comments...
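If anyone hits something similar and wants to check the stale-cache theory, a few stock LVM commands can compare the cached view against the on-disk metadata. Nothing here is imgbase-specific, and it assumes the EL7-era lvmetad setup oVirt node ships with:

  # List the LVs and their tags as imgbase's naming code sees them.
  lvs -o lv_name,lv_tags onn

  # Rescan devices and repopulate the lvmetad cache from disk,
  # dropping any stale cached state.
  pvscan --cache

  # Read metadata straight from disk, bypassing lvmetad, for comparison.
  lvs --config 'global { use_lvmetad = 0 }' onn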