Created attachment 1303571 [details]
/var/log/*, /tmp/*, sosreport

Description of problem:
After installation finished, there is no RHVH boot entry. It is still possible to log into the system, and the node status is:

node status: OK
See `nodectl check` for more information

Running 'nodectl check' gives:

Status: FAILED
Bootloader ... FAILED - It looks like there are no valid bootloader entries. Please ensure this is fixed before rebooting.
Layer boot entries ... FAILED - No bootloader entries which point to imgbased layers
Valid boot entries ... FAILED - No valid boot entries for imgbased layers or non-imgbased layers
Mount points ... FAILED - This can happen if the installation was performed incorrectly
Separate /var ... OK
Discard is used ... FAILED - 'discard' mount option was not added or got removed
Basic storage ... OK
Initialized VG ... OK
Initialized Thin Pool ... OK
Initialized LVs ... OK
Thin storage ... OK
Checking available space in thinpool ... OK
Checking thinpool auto-extend ... OK
vdsmd ... OK

ks-script-*.log under /var/log/anaconda/ contains this error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 1468, in thread_group_handler
    t.join_with_exceptions()
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 1460, in join_with_exceptions
    raise exc[1]
BootSetupError
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 1468, in thread_group_handler
    t.join_with_exceptions()
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 1460, in join_with_exceptions
    raise exc[1]
SystemExit: 1
Status: Failure
Reason: Trying to create a manageable base from '/'
Initial base will be <Base rhvh-4.1-0.20170721.0 [] />
Initial layer will be <Layer rhvh-4.1-0.20170721.0+1 />
Creating an initial base <Base rhvh-4.1-0.20170721.0 [] /> for <LV 'rhvh_dhcp-10-111/root' />
Creating initial layer <Layer rhvh-4.1-0.20170721.0+1 /> for initial base
Adding a new layer after <Base rhvh-4.1-0.20170721.0 [] />
Adding a new layer after <Base rhvh-4.1-0.20170721.0 [] />
New layer will be: <Layer rhvh-4.1-0.20170721.0+1 />
Verifying stream compatability
Migrating /etc (from <LV 'rhvh_dhcp-10-111/rhvh-4.1-0.20170721.0+1' />)
Migrating /root
Syncing systemd levels
Inspecting if the layer contains OS data
Adjusting mount and boot related points
Failed to update OS
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/imgbased/plugins/osupdater.py", line 126, in thread_boot_migrator
    adjust_mounts_and_boot(imgbase, new_lv, previous_layer_lv)
  File "/usr/lib/python2.7/site-packages/imgbased/plugins/osupdater.py", line 632, in adjust_mounts_and_boot
    log.debug("Old def grub: %s" % old_grub_append)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 233, in __exit__
    self.mp.umount()
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 210, in umount
    self.run.call(["umount", self.target])
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 365, in call
    stdout = call(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 147, in call
    return subprocess.check_output(*args, **kwargs).strip()
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['umount', u'/tmp/mnt.ZOg64']' returned non-zero exit status 32
Failed to migrate etc
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/imgbased/plugins/osupdater.py", line 117, in on_new_layer
    thread_group_handler(threads)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 1471, in thread_group_handler
    sys.exit(1)
SystemExit: 1

This issue is not 100% reproducible: some installations via the Anaconda GUI succeed while others fail, and some installations with auto partitioning succeed while others fail.

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.1-20170721.0
imgbased-0.9.34-0.1.el7ev.noarch

How reproducible:
80%

Steps to Reproduce:
1. Install redhat-virtualization-host-4.1-20170721.0
2. After installation finishes, log into the system
3. Run 'nodectl check'
4. Check ks-script-*.log under /var/log/anaconda/

Actual results:
1. After step 2, there is no RHVH boot entry, but it is still possible to log into the system
2. After step 3, the result of 'nodectl check' is as above
3. After step 4, ks-script-*.log contains the errors shown above

Expected results:
1. After step 2, there is an RHVH boot entry
2. After step 3, the result of 'nodectl check' is OK with no FAILED entries.

Additional info:
I'm trying to reproduce this. I wonder why Anaconda did not show that an exception was thrown and fail, though. It's very hard to diagnose without the complete imgbased log...
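For triaging logs like the ks-script-*.log excerpt in the description, a minimal sketch that pulls the final exception line out of each Python traceback might look like the following. It is not part of imgbased or nodectl; the function names `final_exception_lines` and `scan_logs` are hypothetical, and it assumes the log keeps standard traceback indentation (indented frames, unindented exception line).

```python
import glob


def final_exception_lines(log_text):
    """Return the closing exception line of each traceback in log_text."""
    results = []
    in_traceback = False
    for line in log_text.splitlines():
        if line.startswith("Traceback (most recent call last):"):
            in_traceback = True
            continue
        # Frames and source lines are indented; the first unindented,
        # non-empty line after the header is the exception itself.
        if in_traceback and line.strip() and not line.startswith(" "):
            results.append(line.strip())
            in_traceback = False
    return results


def scan_logs(pattern="/var/log/anaconda/ks-script-*.log"):
    """Map each matching log file to the exception lines found in it."""
    found = {}
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as handle:
            lines = final_exception_lines(handle.read())
        if lines:
            found[path] = lines
    return found
```

Calling `scan_logs()` on an installed host returns a dict keyed by log path, which gives a quick view of which exceptions a failed `nodectl init` run left behind.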
I've spent a couple of hours on this without finding a reproducer. There's a patch attached which will hopefully resolve it, but can you please provide detailed steps for a scenario which fails, so the fix can be verified?
The reproduction rate is lower with the latest ISO "RHVH-4.1-20170723.1-RHVH-x86_64-dvd1.iso"; previously QE used PXE installation with the old profile made from "RHVH-4.1-20170718.2-RHVH-x86_64-dvd1.iso".

The issue can still be reproduced on my local testing machine using the latest PXE profile made from "RHVH-4.1-20170723.1-RHVH-x86_64-dvd1.iso", so I can use this machine to verify the modification.

The steps are:
1. Install redhat-virtualization-host-4.1-20170721.0 via PXE with the latest profile made from "RHVH-4.1-20170723.1-RHVH-x86_64-dvd1.iso". The kickstart file is:

liveimg --url=URL/to/redhat-virtualization-host-4.1-20170721.0.x86_64.liveimg.squashfs
autopart --type=thinp
%post --erroronfail
nodectl init
%end

2. Choose auto partitioning in the Anaconda GUI, and complete the other mandatory steps.
3. Reboot at the end of installation.
4. Log into the system and run 'nodectl check'.

It is best to verify this with a new PXE profile made from the latest ISO containing the modification, or to install directly from that ISO; the steps are then the same as steps 2 to 4.
Unfortunately, that's what I tested with yesterday. I also tested all day today in a loop, and was unable to reproduce it. The test systems for bz#1474268 did allow me to look at systems where this failed, though.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Verify Versions:
RHVH-4.1-20170730.1-RHVH-x86_64-dvd1.iso
redhat-virtualization-host-4.1-20170728.0
imgbased-0.9.36-0.1.el7ev.noarch

Verify Steps and Results:
1. Install RHVH via the latest ISO on the machine which failed to install "RHVH-4.1-20170723.1-RHVH-x86_64-dvd1.iso" last time.
2. Choose auto partitioning, and finish other needed steps.
3. Reboot at the end of installation.
4. Log into system, run 'nodectl check'

[root@dhcp-10-111 ~]# nodectl check
Status: OK
Bootloader ... OK
Layer boot entries ... OK
Valid boot entries ... OK
Mount points ... OK
Separate /var ... OK
Discard is used ... OK
Basic storage ... OK
Initialized VG ... OK
Initialized Thin Pool ... OK
Initialized LVs ... OK
Thin storage ... OK
Checking available space in thinpool ... OK
Checking thinpool auto-extend ... OK
vdsmd ... OK

The issue described in comment #0 also hasn't appeared on other machines when installing with the latest RHVH build, so set the status to VERIFIED.
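When the verification in step 4 is repeated across many installs, the 'nodectl check' text can be screened automatically. The sketch below is an assumption-laden helper, not a nodectl interface: `failed_checks` is a hypothetical name, and the line shape "<check> ... <OK|FAILED>[ - <detail>]" is inferred only from the output pasted in this report.

```python
import re

# Assumed line shape, inferred from the output quoted in this report:
#   "<check name> ... <OK|FAILED>[ - <detail>]"
CHECK_LINE = re.compile(
    r"^\s*(?P<name>.+?) \.\.\. (?P<status>OK|FAILED)(?: - (?P<detail>.*))?$"
)


def failed_checks(output):
    """Return (name, detail) for every check line reported as FAILED."""
    failed = []
    for line in output.splitlines():
        match = CHECK_LINE.match(line)
        if match and match.group("status") == "FAILED":
            failed.append((match.group("name"), match.group("detail") or ""))
    return failed
```

An empty list means no individual check failed; the overall "Status:" line is deliberately ignored because it does not follow the per-check shape. The input text could be captured by running `nodectl check` on the host and passing its standard output to the function.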