Created attachment 978006 [details]
attached var logs for rhev-hypervisor7-7.0-20150106.0.el7ev

Description of problem:
After configuring the network, configure kdump over SSH. Although configuring kdump via SSH succeeds, triggering a kernel dump with the command '$>echo c > /proc/sysrq-trigger' in a shell halts the system, and it does not reboot automatically. After a manual reboot, there are no dump files in /var/crash/ on the SSH server.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.0-20150106.0.el7ev
ovirt-node-3.1.0-0.40.20150105git69f34a6.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install rhev-hypervisor7-7.0-20150106.0.el7ev
2. Configure network.
3. Configure kdump by using SSH under Kernel Dump Page.
4. Check kdump status
5. Trigger kernel dump with command '$>echo c > /proc/sysrq-trigger' in shell

Actual results:
1. After step 4, configuring kdump via SSH succeeds.
2. After step 5, the system halts and does not reboot automatically. After a manual reboot, there are no dump files in /var/crash/ on the SSH server.

Expected results:
kdump works with a remote SSH kdump server.

Additional info:
Created attachment 978007 [details] attached sosreport logs for rhev-hypervisor7-7.0-20150106.0.el7ev
Fabian - I'm testing various kdump configs while I'm at it. Since I'd previously assumed that the kdump service itself would report success or failure, I didn't actually try dumping, but there appears to be another problem. I've already opened bz#1139298, but it's not enough for kdumpctl to start successfully. The "ssh" keyword in kdump.conf appears to also be parsed by something in the initramfs, and renaming it to "net" does not leave a dump, even though the service is configured properly. A Z-stream clone of bz#1139298 will be required if we want kdump over SSH to work in EL7.0.
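For anyone reproducing this, a quick way to see which dump-target keyword actually ended up in kdump.conf is sketched below. It is only an illustration: the function name kdump_target_keyword is made up for this comment, and it looks only at the two keywords discussed here ("ssh" and "net"), skipping commented-out lines.

```shell
#!/bin/sh
# Hypothetical helper: print the dump-target keyword(s) found in a
# kdump.conf-style file. Only "ssh" and "net" are matched, since those are
# the two keywords discussed in this bug. Commented-out lines do not match
# because their first field starts with "#".
kdump_target_keyword() {
    conf="${1:-/etc/kdump.conf}"
    awk '$1 == "ssh" || $1 == "net" { print $1 }' "$conf"
}
```

Running `kdump_target_keyword /etc/kdump.conf` on the host shows whether the configuration was written with "ssh" or rewritten to "net".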
Moving to 3.5.1, since we will focus on getting local kdump to work on RHEV-H 7. Adding flag to add release note on this for 3.5.0.
Dong, could you please check the state of kdump configuration in RHEV-H from 3.5.1? I see that all except patch 36743 were merged into the stable branch, and thus I see a good chance that it is working.
Test version:
RHEV-H 7.1 for RHEV 3.5.1-2
rhev-hypervisor7-7.1-20150512.1
ovirt-node-3.2.2-3.el7.noarch

Test Steps:
1. Install rhevh-7.1-20150512.1.el7ev.iso
2. Configure network.
3. Configure kdump by using SSH under Kernel Dump Page.
4. Check kdump status
5. Trigger kernel dump with command '$>echo c > /proc/sysrq-trigger' in shell

Test results:
1. After step 4, configuring kdump via SSH succeeds, and kdump.conf is as follows:
..
default reboot
net root.8.137

but the kdump status is failed:

[root@dhcp-8-115 admin]# service kdump status -l
Redirecting to /bin/systemctl status -l kdump.service
kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; disabled)
   Active: failed (Result: exit-code) since Wed 2015-06-03 09:24:45 UTC; 19min ago
  Process: 1954 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 1954 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/kdump.service

Jun 03 09:24:26 localhost kdumpctl[1954]: /usr/lib/dracut/modules.d/50drm/module-setup.sh: line 26: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/gpu/drm/qxl/qxl.ko: No such file or directory
Jun 03 09:24:26 localhost dracut[4549]: *** Including module: dm ***
Jun 03 09:24:26 localhost dracut[4549]: Skipping udev rule: 64-device-mapper.rules
Jun 03 09:24:26 localhost dracut[4549]: Skipping udev rule: 60-persistent-storage-dm.rules
Jun 03 09:24:26 localhost dracut[4549]: Skipping udev rule: 55-dm.rules
Jun 03 09:24:26 localhost dracut[4549]: *** Including module: dmsquash-live ***
Jun 03 09:24:27 localhost dracut[4549]: *** Including module: kernel-modules ***
Jun 03 09:24:45 localhost systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Jun 03 09:24:45 localhost systemd[1]: Failed to start Crash recovery kernel arming.
Jun 03 09:24:45 localhost systemd[1]: Unit kdump.service entered failed state.

2. After step 5, the system reboots automatically. After the reboot, there are no dump files in /var/crash/ on the SSH server.
I don't see a good reason why this is failing. Restarting kdump after these steps immediately succeeds, and it works on a reboot. There's nothing in the output to indicate why it's failing. I'm going to go through the steps kdump.service takes by hand, I guess.
The output of "service" is misleading. "service" and "kdumpctl" don't agree on what's happening. On a fresh install, before configuring ssh, "service kdump status" shows it failed. "kdumpctl status" shows "kdump is operational". Restarting kdump (or ovirt-kdump) makes these agree.

I'm inclined to believe that it's a service ordering problem with ovirt-kdump, though I don't know what kdump is depending on earlier in the boot process that's making it fail. This probably also means that the default local kdump configuration is broken in 7.1, and I'll submit a patch that adds the same dependencies regular kdump.service has to ovirt-kdump.service. I'll also make sure that the kdump service gets restarted even if kdumpctl exists, since it doesn't propagate changes.

Part of the reason the dump is not succeeding is that the core_collector options are not set correctly. This is because https://gerrit.ovirt.org/#/c/36743/ was not pulled into ovirt-3.5, and it needs to be.

The other reason is the bugs listed above (no augeas support for "ssh"). This works on EL7 now, but not EL6. I'll add a temporary code path to sub it out on EL7 until the EL6 Z-stream (which is verified, but waiting for release) gets pushed.
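As a rough way to check the core_collector point by hand, the sketch below tests whether the core_collector line passes -F to makedumpfile. This rests on an assumption that is not stated in this bug: that the missing option is the flattened-format flag (-F), which makedumpfile needs when its output is streamed over SSH. The function name core_collector_is_flattened is hypothetical, and I have not checked what patch 36743 actually sets.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if the core_collector line in a
# kdump.conf-style file passes -F to makedumpfile, 1 otherwise.
# A file with no core_collector line at all also returns 1.
# Assumption: -F (flattened format) is the option that matters for SSH targets.
core_collector_is_flattened() {
    conf="${1:-/etc/kdump.conf}"
    awk '
        $1 == "core_collector" {
            for (i = 2; i <= NF; i++) if ($i == "-F") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$conf"
}
```

For example, `core_collector_is_flattened /etc/kdump.conf || echo "core_collector lacks -F"` flags a configuration that would not stream a flattened dump.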
It looks like the issue described in this bug is related to clean setups where the target folder given to kdump does not exist yet on the server side. To me this is not too urgent a matter, as kdump behaves the same way as on RHEL.
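If the missing target folder is indeed the cause, creating it on the server before triggering a dump should work around it. A minimal sketch, assuming key-based SSH login to the dump server already works; the function name ensure_remote_crash_dir and the SSH_BIN override are made up for illustration and are not part of kdump or ovirt-node:

```shell
#!/bin/sh
# Hypothetical helper: create the dump directory on the SSH server if it
# does not exist yet. SSH_BIN is an illustrative override so the command
# can be dry-run (for example SSH_BIN=echo); it defaults to plain ssh.
ensure_remote_crash_dir() {
    target="$1"                 # user@host, as on the kdump.conf target line
    path="${2:-/var/crash}"     # directory that kdump will write into
    "${SSH_BIN:-ssh}" -o BatchMode=yes "$target" "mkdir -p '$path'"
}
```

Setting `SSH_BIN=echo` before calling `ensure_remote_crash_dir user@host /var/crash` prints the ssh arguments instead of connecting, which is handy for checking the command first.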
This is an automated message.

oVirt 3.6.0 RC3 has been released and GA is targeted to next week, Nov 4th 2015.
Please review this bug and if not a blocker, please postpone to a later release.
All bugs not postponed on GA release will be automatically re-targeted to
- 3.6.1 if severity >= high
- 4.0 if severity < high
Test version:
rhev-hypervisor7-7.2-20151112.1.el7ev
ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch

Test steps:
1. Install rhev-hypervisor7-7.2-20151112.1.el7ev
2. Configure network
3. Configure kdump by using SSH under Kernel Dump Page
4. Check kdump status
5. Trigger kernel dump with command '$>echo c > /proc/sysrq-trigger' in shell
6. Check the kdump file on the SSH server

Test result:
1. RHEV-H rebooted successfully.
2. After step 6, the dump file is on the remote SSH server.

So this issue is fixed in ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch. Changing the status to verified.
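Step 6 can be scripted on the SSH server along these lines. This is a sketch under assumptions: the function name list_vmcores is hypothetical, and it assumes the dump lands somewhere below /var/crash in a file whose name starts with "vmcore".

```shell
#!/bin/sh
# Hypothetical helper: list dump files below a crash directory.
# Assumption: dump files have names starting with "vmcore".
# Prints nothing if the directory is empty or missing.
list_vmcores() {
    crash_dir="${1:-/var/crash}"
    find "$crash_dir" -type f -name 'vmcore*' 2>/dev/null | sort
}
```

An empty result from `list_vmcores /var/crash` after a triggered crash corresponds to the "no dump files" outcome reported earlier in this bug.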
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0378.html