Description of problem:
RHVH won't start after reboot.

Version-Release number of selected component (if applicable):
We have been chasing this issue for a while now; the current version is RHVH-4.1-20170914.1.

How reproducible:
In our automation it happens a lot; Virt QE didn't see it at all.

Steps to Reproduce:
1. Install RHVH
2. Cycle multiple times until it reproduces

Actual results:
The host won't start; it is stuck after the "Probing EDD" step.

Expected results:
Host should start.

Additional info:
According to Yuval T, this happens on cold reboot and is related to the imgbase-copy-bootfiles script, but I'll let him add his input.
Yuval got some results this morning, though I'll let him chime in. It seems that systemd may be killing the script. I wonder if we can simply set TimeoutSec=30.
imgbase-copy-bootfiles copies the kernel and initrd on shutdown from /boot to /boot/rhvh..., and while this copy is in progress systemd kills the unit's processes (cp), leaving a partial initrd (or kernel) file under /boot/rhvh and making the system unbootable. Something like the following:

# ls -l /boot/initramfs-3.10.0-693.2.2.el7.x86_64.img /boot/rhvh-4.1-0.20170914.0+1/initramfs-3.10.0-693.2.2.el7.x86_64.img
-rw-------. 1 root root 59685039 Sep 19 16:32 /boot/initramfs-3.10.0-693.2.2.el7.x86_64.img
-rw-------. 1 root root 59685039 Sep 19 16:50 /boot/rhvh-4.1-0.20170914.0+1/initramfs-3.10.0-693.2.2.el7.x86_64.img

# /usr/sbin/imgbase-copy-bootfiles shutdown & while [ 1 ]; do killall -9 cp; done 2>/dev/null
<ctrl-c>

# ls -l /boot/initramfs-3.10.0-693.2.2.el7.x86_64.img /boot/rhvh-4.1-0.20170914.0+1/initramfs-3.10.0-693.2.2.el7.x86_64.img
-rw-------. 1 root root 59685039 Sep 19 16:32 /boot/initramfs-3.10.0-693.2.2.el7.x86_64.img
-rw-------. 1 root root 0 Sep 19 16:54 /boot/rhvh-4.1-0.20170914.0+1/initramfs-3.10.0-693.2.2.el7.x86_64.img

Adding KillMode=none to the imgbase-copy-bootfiles service unit seems to solve this.
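For illustration only: a plain cp truncates the destination before writing, which is why a killed cp leaves a 0-byte (or partial) file. A common way to make such a copy robust against being killed is to copy into a temporary file in the destination directory and rename it into place afterwards. The sketch below is not the patch that was merged for this bug and not the actual imgbase-copy-bootfiles code; the function name atomic_copy and the demo file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from imgbase-copy-bootfiles.
# Copies $1 to $2 so that $2 is always either the old file or a complete
# new one, even if the copy is killed halfway through.
atomic_copy() {
    src=$1
    dst=$2
    # Create the temp file next to the destination so that the final mv is
    # a rename within one filesystem rather than a second copy.
    tmp=$(mktemp "$(dirname "$dst")/.tmp.XXXXXX") || return 1
    if cp -- "$src" "$tmp" && sync; then
        mv -f -- "$tmp" "$dst"
    else
        # Copy failed: drop the partial temp file, leave $dst untouched.
        rm -f -- "$tmp"
        return 1
    fi
}

# Small demo in a scratch directory (file names are made up).
workdir=$(mktemp -d)
printf 'old image\n' > "$workdir/dst.img"
printf 'new image\n' > "$workdir/src.img"
atomic_copy "$workdir/src.img" "$workdir/dst.img"
cat "$workdir/dst.img"
```

If the script is killed while cp is running, a stray .tmp.* file may be left behind, but the destination still holds the previous complete copy, so the host would boot from the old file instead of a truncated one.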
Hi, can we somehow apply this change in the kickstart template in Foreman, or should we wait for a new build where it will be fixed? Thanks, Petr
(In reply to Petr Balogh from comment #3)
> Hi,
>
> can we somehow use/change this in kickstart template in foreman? Or we
> should wait for new build where it will be fixed?
>
> Thanks, Petr

I think it's very rare, and the fix is not in the kickstart but in the system itself. Check out the patch; it should solve this issue.
I tried to reproduce this issue:
1. Installed RHVH-4.1-20170914.1-RHVH-x86_64-dvd1.iso on a Dell PowerEdge R730 many times.
2. Cold booted the Dell PowerEdge R730 installed with RHVH-4.1-20170914.1 many times.

The issue didn't occur. Nelly, could you help to verify this bug?
Should be in oVirt 4.1.7 RC3
It is stuck again on a host when trying to install rhvh-4.1-0.20171012.0. I tried to install on 3 hosts, and 2 out of 3 installed successfully.
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Created attachment 1339618
journalctl.dump

journalctl dump attached
Ryan, I attached the journalctl output.
Reducing severity as it doesn't reproduce on many hosts.
Re-targeting to 4.1.8, as this is not a 4.1.7 blocker.
I think this one is fixed; can we close it?
I believe so, we haven't seen it for a while now.
Yuval can probably add more details, but it looks like only SIGTERM was handled, while in the PM tests a different signal is sent (SIGKILL?), causing the issue to reproduce in these tests.
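As a generic illustration of why handling only SIGTERM does not cover this case: a process can install a handler for SIGTERM, but SIGKILL cannot be caught, blocked, or ignored, so no cleanup code runs. The sketch below is not the imgbase-copy-bootfiles handler; the child commands and file names are made up for the demo.

```shell
#!/bin/sh
# Illustrative only: two identical children, one sent SIGTERM, one SIGKILL.
out_term=$(mktemp)
out_kill=$(mktemp)
child='trap "echo term-handled; exit 0" TERM; while :; do sleep 1; done'

sh -c "$child" > "$out_term" &
term_pid=$!
sh -c "$child" > "$out_kill" &
kill_pid=$!

sleep 1                    # give both children time to install the trap

kill -TERM "$term_pid"     # the trap runs, the child exits cleanly
kill -KILL "$kill_pid"     # cannot be trapped, the child dies immediately

wait "$term_pid"; term_status=$?
wait "$kill_pid"; kill_status=$?

echo "SIGTERM child: status=$term_status output='$(cat "$out_term")'"
echo "SIGKILL child: status=$kill_status output='$(cat "$out_kill")'"
```

The SIGTERM child gets to run its handler and write its output; the SIGKILL child leaves no output and reports a signal-death exit status. Any protection against SIGKILL therefore has to come from outside the process, for example from how the service unit is configured or from writing files in a way that tolerates interruption.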
(In reply to Nelly Credi from comment #18)
> yuval can probably add more details,
> but looks like only SIGTERM was handled, while in PM tests a different
> signal is sent (SIGKILL?) causing the issue to reproduce in these tests

Nelly, following the latest iterations around this bug, do you know how often it reproduces and on what percentage of the systems? We would like to understand the current status here. Thanks.
At the moment it reproduces during some SLA PM tests (AFAIK it happens every time). Once it happens, the host cannot recover, so it causes more failures in other tests.
Hi Tareq, can you help to verify this bug, as we cannot reproduce it on our machines? The latest 4.2 ISO containing the new patch is RHVH-4.2-20180128.0-RHVH-x86_64-dvd1.iso. (The 4.1 ISO RHVH-4.1-20180128.0-RHVH-x86_64-dvd1.iso also contains the new patch.)
It didn't reproduce with RHVH-4.2-20180203.0-RHVH-x86_64-dvd1.iso.
Verifying this bug according to #c25.
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.