Bug 1474296 - imgbased init sometimes fails during installation
Status: CLOSED CURRENTRELEASE
Product: ovirt-node
Classification: oVirt
Component: Installation & Update
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.1.4
Target Release: 4.1
Assigned To: Ryan Barry
QA Contact: Qin Yuan
Keywords: Regression
Reported: 2017-07-24 06:39 EDT by Qin Yuan
Modified: 2017-08-23 04:04 EDT
Fixed In Version: imgbased-0.9.35-0.1.el7ev
Last Closed: 2017-08-23 04:04:15 EDT
Type: Bug
oVirt Team: Node
rule-engine: ovirt-4.1+
rule-engine: blocker+
cshao: testing_ack+


Attachments
/var/log/*, /tmp/*, sosreport (9.17 MB, application/x-gzip), 2017-07-24 06:39 EDT, Qin Yuan


External Trackers
Tracker       ID     Priority            Status  Summary                                      Last Updated
oVirt gerrit  79775  None                None    None                                         2017-07-24 14:59 EDT
oVirt gerrit  79811  master              MERGED  nodectl: catch the overall status when JSON  2017-07-26 14:58 EDT
oVirt gerrit  79860  ovirt-4.1           MERGED  nodectl: catch the overall status when JSON  2017-07-26 15:00 EDT
oVirt gerrit  79861  ovirt-4.1-pre       MERGED  nodectl: catch the overall status when JSON  2017-07-26 15:00 EDT
oVirt gerrit  79862  ovirt-4.1-snapshot  MERGED  nodectl: catch the overall status when JSON  2017-07-26 15:00 EDT

Description Qin Yuan 2017-07-24 06:39:51 EDT
Created attachment 1303571
/var/log/*, /tmp/*, sosreport

Description of problem:
After the installation finishes, there is no RHVH boot entry. It is still possible to log into the system, where the node status is reported as:

  node status: OK
  See `nodectl check` for more information

Running 'nodectl check' gives:

Status: FAILED
Bootloader ... FAILED - It looks like there are no valid bootloader entries. Please ensure this is fixed before rebooting.
  Layer boot entries ... FAILED - No bootloader entries which point to imgbased layers
  Valid boot entries ... FAILED - No valid boot entries for imgbased layers or non-imgbased layers
Mount points ... FAILED - This can happen if the installation was performed incorrectly
  Separate /var ... OK
  Discard is used ... FAILED - 'discard' mount option was not added or got removed
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK
vdsmd ... OK

The ks-script-*.log under /var/log/anaconda/ contains this error:

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 1468, in thread_group_handler
    t.join_with_exceptions()
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 1460, in join_with_exceptions
    raise exc[1]
BootSetupError
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 1468, in thread_group_handler
    t.join_with_exceptions()
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 1460, in join_with_exceptions
    raise exc[1]
SystemExit: 1
Status: Failure
  Reason:
    Trying to create a manageable base from '/'
    Initial base will be <Base rhvh-4.1-0.20170721.0 [] />
    Initial layer will be <Layer rhvh-4.1-0.20170721.0+1 />
    Creating an initial base <Base rhvh-4.1-0.20170721.0 [] /> for <LV 'rhvh_dhcp-10-111/root' />
    Creating initial layer <Layer rhvh-4.1-0.20170721.0+1 /> for initial base
    Adding a new layer after <Base rhvh-4.1-0.20170721.0 [] />
    Adding a new layer after <Base rhvh-4.1-0.20170721.0 [] />
    New layer will be: <Layer rhvh-4.1-0.20170721.0+1 />
    Verifying stream compatability
    Migrating /etc (from <LV 'rhvh_dhcp-10-111/rhvh-4.1-0.20170721.0+1' />)
    Migrating /root
    Syncing systemd levels
    Inspecting if the layer contains OS data
    Adjusting mount and boot related points
    Failed to update OS
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/imgbased/plugins/osupdater.py", line 126, in thread_boot_migrator
        adjust_mounts_and_boot(imgbase, new_lv, previous_layer_lv)
      File "/usr/lib/python2.7/site-packages/imgbased/plugins/osupdater.py", line 632, in adjust_mounts_and_boot
        log.debug("Old def grub: %s" % old_grub_append)
      File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 233, in __exit__
        self.mp.umount()
      File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 210, in umount
        self.run.call(["umount", self.target])
      File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 365, in call
        stdout = call(*args, **kwargs)
      File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 147, in call
        return subprocess.check_output(*args, **kwargs).strip()
      File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
        raise CalledProcessError(retcode, cmd, output=output)
    CalledProcessError: Command '['umount', u'/tmp/mnt.ZOg64']' returned non-zero exit status 32
    Failed to migrate etc
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/imgbased/plugins/osupdater.py", line 117, in on_new_layer
        thread_group_handler(threads)
      File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 1471, in thread_group_handler
        sys.exit(1)
    SystemExit: 1
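
The two short tracebacks at the top of this log come from a worker thread's exception being re-raised in the caller when the thread is joined. A minimal Python 3 sketch of that pattern follows. It is an illustration, not the imgbased code: join_with_exceptions mirrors the name in the traceback, but its body, the ExceptionThread class, and run_thread_group are assumptions.

```python
import sys
import threading


class ExceptionThread(threading.Thread):
    """Thread that remembers an exception raised by its target."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.exc_info = None

    def run(self):
        try:
            super().run()
        except BaseException:
            # Keep (type, value, traceback) so the caller can re-raise later.
            self.exc_info = sys.exc_info()

    def join_with_exceptions(self, timeout=None):
        """Join the thread, then re-raise whatever its target raised."""
        self.join(timeout)
        if self.exc_info is not None:
            raise self.exc_info[1]


def run_thread_group(threads):
    """Start all threads, join them, and return the exceptions they raised."""
    for thread in threads:
        thread.start()
    errors = []
    for thread in threads:
        try:
            thread.join_with_exceptions()
        except Exception as exc:
            errors.append(exc)
    return errors


class BootSetupError(Exception):
    """Stand-in for the exception type named in the traceback."""


def failing_step():
    raise BootSetupError("umount failed")


def passing_step():
    pass


errors = run_thread_group([
    ExceptionThread(target=failing_step),
    ExceptionThread(target=passing_step),
])
print(errors)
```

In the traceback, the group handler calls sys.exit(1) once a thread has failed; the sketch returns the collected exceptions instead, so the caller decides how to react.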


This issue is not reproducible 100% of the time: some installations via the Anaconda GUI succeed while others fail, and some installations using auto partitioning succeed while others fail.


Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.1-20170721.0
imgbased-0.9.34-0.1.el7ev.noarch


How reproducible:
80%


Steps to Reproduce:
1. Install redhat-virtualization-host-4.1-20170721.0
2. After the installation finishes, log into the system
3. Run 'nodectl check'
4. Check ks-script-*.log under /var/log/anaconda/
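
Step 4 can be scripted. The sketch below scans the kickstart script logs for Python tracebacks and for a failure status, stripping any ANSI colour codes first. find_failures is a hypothetical helper; the directory and file pattern are taken from this report, and the markers it searches for are assumed from the log excerpt in the description.

```python
import glob
import os
import re

# ANSI colour sequences (ESC[31m and similar) that installer logs may contain.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")

TRACEBACK_MARKER = "Traceback (most recent call last)"
FAILURE_MARKER = "Status: Failure"


def find_failures(log_dir="/var/log/anaconda"):
    """Map each ks-script-*.log path to the suspicious lines found in it."""
    failures = {}
    for path in sorted(glob.glob(os.path.join(log_dir, "ks-script-*.log"))):
        hits = []
        with open(path, errors="replace") as handle:
            for number, raw in enumerate(handle, start=1):
                line = ANSI_ESCAPE.sub("", raw).strip()
                if line.startswith(TRACEBACK_MARKER) or line == FAILURE_MARKER:
                    hits.append((number, line))
        if hits:
            failures[path] = hits
    return failures


if __name__ == "__main__":
    for path, hits in find_failures().items():
        print(path)
        for number, line in hits:
            print("  line %d: %s" % (number, line))
```

An empty result only means that none of the assumed markers were found; it does not prove that the installation succeeded.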


Actual results:
1. After step 2, there is no RHVH boot entry, but it is still possible to log into the system.
2. After step 3, 'nodectl check' reports the failures shown above.
3. After step 4, ks-script-*.log contains the error shown above.


Expected results:
1. After step 2, there is an RHVH boot entry.
2. After step 3, 'nodectl check' reports OK with no FAILED items.


Additional info:
Comment 1 Ryan Barry 2017-07-24 08:23:58 EDT
I'm trying to reproduce this.

I wonder why Anaconda did not show that an exception was thrown and fail, though. It's very hard to diagnose without the complete imgbased log...
Comment 2 Ryan Barry 2017-07-24 14:57:28 EDT
I've spent a couple hours without a reproducer.

There's a patch attached which will hopefully resolve this, but can you please provide detailed steps for a scenario which fails so it can be verified?
Comment 3 Qin Yuan 2017-07-25 06:52:29 EDT
The reproduction rate is lower with the latest ISO, "RHVH-4.1-20170723.1-RHVH-x86_64-dvd1.iso". Previously, QE used PXE installation with the old profile made from "RHVH-4.1-20170718.2-RHVH-x86_64-dvd1.iso".

This issue can still be reproduced on my local test machine using the latest PXE profile made from "RHVH-4.1-20170723.1-RHVH-x86_64-dvd1.iso", so I can use this machine to verify the fix. The steps are:

1. Install redhat-virtualization-host-4.1-20170721.0 via PXE with the latest profile made from "RHVH-4.1-20170723.1-RHVH-x86_64-dvd1.iso", using the following kickstart file:

liveimg --url=URL/to/redhat-virtualization-host-4.1-20170721.0.x86_64.liveimg.squashfs

autopart --type=thinp

%post --erroronfail
nodectl init
%end

2. Choose auto partitioning in the Anaconda GUI and complete the other mandatory steps.
3. Reboot at the end of installation.
4. Log into the system and run 'nodectl check'.

It would be better to verify this with a new PXE profile made from the latest ISO containing the fix, or simply to install from that ISO directly; the steps are then the same as steps 2 to 4.
Comment 4 Ryan Barry 2017-07-25 17:38:31 EDT
Unfortunately, that's what I tested with yesterday. I also tested all day today in a loop and was unable to reproduce...

The test systems for bz#1474268 allowed me to look at systems where this failed, though.
Comment 5 Red Hat Bugzilla Rules Engine 2017-07-27 07:25:29 EDT
This bug report has Keywords: Regression or TestBlocker.
Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Comment 7 Qin Yuan 2017-08-01 03:45:09 EDT
Verified Versions:
RHVH-4.1-20170730.1-RHVH-x86_64-dvd1.iso
redhat-virtualization-host-4.1-20170728.0
imgbased-0.9.36-0.1.el7ev.noarch

Verification Steps and Results:
1. Install RHVH from the latest ISO on the machine that previously failed to install "RHVH-4.1-20170723.1-RHVH-x86_64-dvd1.iso".
2. Choose auto partitioning and complete the other required steps.
3. Reboot at the end of installation.
4. Log into the system and run 'nodectl check':
[root@dhcp-10-111 ~]# nodectl check
Status: OK
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK
vdsmd ... OK

The issue described in comment #0 has also not appeared on other machines when installing with the latest RHVH build, so I am setting the status to VERIFIED.