Bug 1095140
Summary: | [3.5_7.0] configure kdump fail via local | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | haiyang,dong <hadong> | ||||||
Component: | ovirt-node | Assignee: | Ryan Barry <rbarry> | ||||||
Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> | ||||||
Severity: | urgent | Docs Contact: | |||||||
Priority: | urgent | ||||||||
Version: | 3.5.0 | CC: | aberezin, aburden, bhe, cshao, dfediuck, dyoung, fdeutsch, gklein, gouyang, hadong, huiwa, iheim, juwu, leiwang, michal.skrivanek, ruyang, yaniwang, ycui, ylavi | ||||||
Target Milestone: | --- | Keywords: | Regression | ||||||
Target Release: | 3.5.0 | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | node | ||||||||
Fixed In Version: | ovirt-node-3.2.1-5.el6 ovirt-node-3.2.1-5.el7 | Doc Type: | Known Issue | ||||||
Doc Text: |
Local configurations of kdump now work as expected in Red Hat Enterprise Virtualization Hypervisor 7.
|
Story Points: | --- | ||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2015-02-11 20:56:49 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | Node | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | 1097453, 1181285 | ||||||||
Bug Blocks: | 1094719, 1164308, 1164311 | ||||||||
Attachments: |
|
Moving to 3.5 as 7.0 is affected.

This is a mass change moving bugs with merged patches into MODIFIED. Please correct the state if you think the move was not justified.

Test version:
rhev-hypervisor7-7.0-20140807.0.iso
ovirt-node-3.1.0-0.6.20140731git2c8e71f.el7.noarch

Configuring kdump still fails via local/ssh/nfs:

[root@dhcp-9-33 admin]# service kdump restart
Redirecting to /bin/systemctl restart kdump.service
Job for kdump.service failed. See 'systemctl status kdump.service' and 'journalctl -xn' for details.

[root@dhcp-9-33 admin]# service kdump status
Redirecting to /bin/systemctl status kdump.service
kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; disabled)
   Active: failed (Result: exit-code) since Tue 2014-08-12 06:17:07 UTC; 16s ago
  Process: 2667 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 2667 (code=exited, status=1/FAILURE)

Aug 12 06:17:06 dhcp-9-33.nay.redhat.com systemd[1]: Starting Crash recovery kernel arming...
Aug 12 06:17:07 dhcp-9-33.nay.redhat.com kdumpctl[2667]: Error: /boot-kdump/vmlinuz-3.10.0-123.6.3.el7.x86_64 not found.
Aug 12 06:17:07 dhcp-9-33.nay.redhat.com kdumpctl[2667]: Starting kdump: [FAILED]
Aug 12 06:17:07 dhcp-9-33.nay.redhat.com systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Aug 12 06:17:07 dhcp-9-33.nay.redhat.com systemd[1]: Failed to start Crash recovery kernel arming.
Aug 12 06:17:07 dhcp-9-33.nay.redhat.com systemd[1]: Unit kdump.service entered failed state.

So this bug needs to be reassigned.

We're waiting on bz#1097453 to resolve another problem blocking kdump, though the linked patch is also necessary. Setting this back to MODIFIED, as our part is fixed and we are just waiting for a platform fix.

Test version:
rhev-hypervisor7-7.0-20140926.0.iso
ovirt-node-3.1.0-0.17.20140925git29c3403.el7.noarch

This issue still exists in rhev-hypervisor7-7.0-20140926.0.iso, and the kdump service is not up by default. Changing the status from ON_QA to ASSIGNED.

Additional info for comment 7: the bz #1097453 mentioned in comment 5 was cloned to z-stream bz #1130112 (CLOSED ERRATA), so we need to check the node side.

Test version:
rhev-hypervisor7-7.0-20141006.0.el7ev
ovirt-node-3.1.0-0.20.20141006gitc421e04.el7.noarch

Test steps:
1. Configuring kdump via local succeeded, but after triggering a kernel dump with 'echo c > /proc/sysrq-trigger' in the shell, the system halted and did not reboot automatically. After a manual reboot, there were no dump files in /data/core/.
2. Configuring kdump via ssh FAILED:
   An error occurred: KDump configuration failed, location unreachable. Previous configuration was restored.
3. Configuring kdump via nfs FAILED:
   An error occurred: KDump configuration failed, location unreachable. Previous configuration was restored.

So this bug needs to be reassigned.

The patches never got merged. Not sure how this made it to MODIFIED. Moving back to POST.

Doc text added for the beta 5 release as per engineering request. Please update the doc text for GA, or simply set the 'requires_release_note' flag to - so the doc text is excluded from the GA release notes. If you don't set the 'requires_release_note' flag to - for GA and the doc text gets pulled into the GA release notes, I will not go into every bug to manually remove the text. Beware of the consequences.

Test version:
rhev-hypervisor7-7.0-20150106.0.el7ev
ovirt-node-3.1.0-0.40.20150105git69f34a6.el7.noarch

Test steps:
1. Configuring kdump via local succeeded, but after triggering a kernel dump with 'echo c > /proc/sysrq-trigger' in the shell, the system halted and did not reboot automatically. After a manual reboot, there were no dump files in /data/core/.
2. Configuring kdump via ssh succeeded, but after triggering a kernel dump with 'echo c > /proc/sysrq-trigger' in the shell, the system halted and did not reboot automatically. After a manual reboot, there were no dump files in /var/crash/ on the ssh server.
3. Configuring kdump via nfs FAILED:
   An error occurred: KDump configuration failed, location unreachable. Previous configuration was restored.

So this bug needs to be reassigned.

To justify the 3.5 blocker: is this critical functionality? Is this really an Urgent severity?

We should at least be capable of configuring kdump to store the core dumps locally.

This bug is too broad: it covers failed configuration using ssh, nfs, and local as a target. To get a more detailed view we should split the bug up, because technically it looks like there are different causes for the failures. I'd suggest keeping this bug for the local configuration and opening new bugs to cover ssh and nfs. This will also help to identify the bugs we see as blockers. For me, only the failed local configuration is a blocker.

The symptoms in comment 18 look like they are caused by bug 1175967.

http://gerrit.ovirt.org/36312 is fixing the local configuration

> 3. configure kdump via nfs -FAILED
>
> An error occurred:
>
> KDump configuration failed, location unreachable. Previous configuration was
> restored.
>
> so this bug needs to be reassigned
I'm not able to reproduce this; kdump configuration over NFS on rhev-hypervisor7-7.0-20150106.0.el7ev succeeds when I test.
Can you please provide logs and exact steps to reproduce, preferably with the TUI in debug mode?
Clarification: the configuration is correct now, but the core dump still fails because kdump does not recognize the root=live:... kernel argument. From the node side we are good for now; we just need to teach kdump how to work with node.

Wang - There appear to be a couple of problems here, some we can work around, but at least one that I'm not sure how to work around.

mkdumprd on el7 sets --hostonly by default, with no apparent way to override it, which leaves dmsquash-live out of dracut (rhev-h runs as a squashfs image, and we need this; otherwise dracut doesn't know how to handle root=live...).

If I remove this by hand so an initramfs which includes dmsquash-live is built, and then trigger a dump, it waits for an incredibly long time unpacking the initramfs. Long enough that it would appear to be a hang, while taking 100% CPU for the duration. The longest I've waited for this is 30 minutes, and it's never proceeded beyond that point (despite bumping the memory available to the VM).

What could be happening here? Any suggestions for debugging this? The kdump kernel appears to be re-execed with mostly the same options, but neither "verbose" nor any other options I've tried (including "debug") show me anything useful. Any suggestions?

[CCing other kdump developers here]

Hi, Ryan

If I understand correctly, you're using a particular /etc/kdump.conf to enable the kdump service in the rhev-h environment. We set --hostonly because we need to keep the initrd image as minimal as possible.

The first problem is that dmsquash-live isn't included by default in "hostonly" mode. You can resolve this by editing /etc/kdump.conf as follows:

# vim /etc/kdump.conf
dracut_args --add dmsquash-live

The next problem is the kernel hang at unpacking the initramfs. Could you attach the console log? How do you configure crashkernel=X? How large is your custom kdump initrd?
If you remove '--hostonly', I'm afraid the initramfs size will bloat and the kdump kernel may hang at unpacking the initramfs because memory runs out.

Thanks
WANG Chao

(In reply to Fabian Deutsch from comment #21)
> This bug is too broad: it covers failed configuration using ssh, nfs
> and local as a target.
>
> To get a more detailed view we should split the bug up, because technically
> it looks like there are different causes for the failures.
>
> I'd suggest to keep this bug for local configuration, and open new bugs to
> cover ssh and nfs.

New bug to cover kdump configuration via ssh failing: https://bugzilla.redhat.com/show_bug.cgi?id=1180371
New bug to cover kdump configuration via nfs failing: https://bugzilla.redhat.com/show_bug.cgi?id=1180377

> This will also help to identify the bugs we see as blockers.
>
> For me, only the failed local configuration is a blocker.

(In reply to WANG Chao from comment #27)
> [CCing other kdump developers here]
>
> Hi, Ryan
>
> If I understand correctly, you're using a particular /etc/kdump.conf to
> enable the kdump service in the rhev-h environment. We set --hostonly
> because we need to keep the initrd image as minimal as possible.

The kdump.conf included in rhev-h is pretty stock, at least. The only real difference is that we're dumping to an LV instead of /var/crash, but that's never been a problem before (in EL6). We don't make a ton of changes.

> The first problem is that dmsquash-live isn't included by default in
> "hostonly" mode. You can resolve this by editing /etc/kdump.conf as follows:
>
> # vim /etc/kdump.conf
> dracut_args --add dmsquash-live

Unfortunately, to my knowledge, dmsquash-live and hostonly are orthogonal. Even adding it to dracut_args (or dracut.conf), dracut (via mkdumprd) spits out messages indicating that it can't be found or included unless hostonly is removed. If you're aware of a way to run in hostonly mode *and* include dmsquash-live, I would be happy to do that.
In theory, we could patch out the module-setup check for dmsquash-live or file an RFE allowing for this.

> The next problem is the kernel hang at unpacking the initramfs. Could you
> attach the console log? How do you configure crashkernel=X? How large is
> your custom kdump initrd?

It's 18M with --hostonly. 18M if I remove the check for hostonly from dmsquash-live. 33M without --hostonly. We don't touch crashkernel, so it's likely the default for EL7.

> If you remove '--hostonly', I'm afraid the initramfs size will bloat and
> the kdump kernel may hang at unpacking the initramfs because memory runs
> out.

This looks like it's correct, though surprising. Using an initramfs with --hostonly *and* with dmsquash-live (by patching the check) successfully unpacks the initramfs and works. Thanks for the help.

Just a note, we are setting crashkernel=128M on the kernel command line.

Created attachment 978298 [details]
console output from failed kdump
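WANG Chao's suggested workaround can be captured directly in /etc/kdump.conf. A minimal sketch, with the caveat that the `path` value below is illustrative only (this thread notes that rhev-h actually dumps to a logical volume rather than /var/crash):

```
# /etc/kdump.conf -- sketch of the workaround discussed in this thread.

# Pull dmsquash-live into the kdump initramfs so dracut can handle the
# root=live:... argument that rhev-h boots with:
dracut_args --add dmsquash-live

# Local dump target (illustrative; rhev-h uses an LV, not /var/crash):
path /var/crash
```

Note that, per the discussion above, this alone was not sufficient on the affected builds: mkdumprd's hardwired --hostonly still excluded dmsquash-live until the module check was patched.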
Test version:
rhev-hypervisor7-7.0-20150114.0.iso
ovirt-node-3.2.1-4.el7.noarch

Test steps:
1. Install rhev-hypervisor7-7.0-20150114.0.iso
2. Configure the network.
3. Configure kdump using NFS/SSH/Local under the Kernel Dump page.
4. Check the kdump status.
5. Trigger a kernel dump with 'echo c > /proc/sysrq-trigger' in the shell.

Test result:
After step 4, the kdump status was up. After step 5, kdump worked well: the system rebooted automatically, and after the reboot succeeded, dump files were found in /data/core/ on the rhevh host. So this bug has been fixed; once the status changes to ON_QA, I will verify this bug.

According to comment 33, clearing this needinfo.

Update to comment 33: test step 3 should read "Configure kdump using Local under the Kernel Dump page" — it did not cover the nfs/ssh types.

Test version:
rhev-hypervisor7-7.0-20150123.2.iso
ovirt-node-3.2.1-6.el7.noarch

Test steps:
1. Install rhev-hypervisor7-7.0-20150123.2.iso
2. Configure the network.
3. Configure kdump using Local under the Kernel Dump page.
4. Check the kdump status.
5. Trigger a kernel dump with 'echo c > /proc/sysrq-trigger' in the shell.

Test result:
After step 4, the kdump status was up. After step 5, kdump worked well: the system rebooted automatically, and after the reboot succeeded, dump files were found in /data/core/ on the rhevh host. So this bug has been fixed; changing the status to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0160.html
Created attachment 893151 [details]
Screenshot of the error screen

Description of problem:
After configuring the network, kdump was configured using these methods: NFS, SSH, Local. Configuring kdump fails both with a remote kdump server (NFS/SSH) and with Local.

Version-Release number of selected component (if applicable):
rhevh-7.0-20140424.0.iso
ovirt-node-3.1.0-0.2.20140424gitbfdfc00.el7

How reproducible:
100%

Steps to Reproduce:
1. Install rhevh-7.0-20140424.0.iso
2. Configure the network.
3. Configure kdump using NFS/SSH/Local under the Kernel Dump page.
4. Check the kdump status.

Actual results:
1. kdump fails to work using a remote kdump server (NFS/SSH) and Local.
2. After applying the configuration, a popup appears: "KDump configuration failed, location unreachable" (error occurred screen.png)

Expected results:
kdump works using a remote kdump server (NFS/SSH) and Local.

Additional info:
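The verification flow repeated throughout this bug (arm the crash kernel, then force a crash and check for a vmcore) can be sketched as a small script. This is a sketch only: run the real flow solely on a disposable test host, since the sysrq trigger deliberately crashes the kernel. The guard at the top makes it a harmless no-op on machines that are not set up for kdump.

```shell
#!/bin/sh
# Sketch of the kdump verification flow from this bug's test plans.
# WARNING: on a real kdump-capable host this crashes the kernel on purpose.
set -u

verify_kdump_local() {
    # Step 4 of the test plan: make sure the crash kernel is armed.
    systemctl restart kdump.service || return 1
    systemctl is-active kdump.service || return 1
    # Step 5: force a crash. The host should reboot on its own and leave
    # a vmcore under /data/core/ (the local dump target on rhev-h).
    echo c > /proc/sysrq-trigger
}

# Guard: skip entirely unless we are root on a host with kdump tooling.
if [ "$(id -u)" -ne 0 ] || ! command -v kdumpctl >/dev/null 2>&1; then
    KDUMP_CHECK_MSG="skipped: not a kdump-capable host"
    echo "$KDUMP_CHECK_MSG"
else
    KDUMP_CHECK_MSG="ran"
    verify_kdump_local
fi
```

After the automatic reboot, the test plans in this bug check for dump files under /data/core/ for the Local target (or /var/crash/ on the remote server for the SSH target).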