Bug 1180371

Summary: [3.5_7.0] Missing dump files in /var/crash/ on the SSH server after configuring kdump via SSH
Product: Red Hat Enterprise Virtualization Manager
Reporter: haiyang,dong <hadong>
Component: ovirt-node
Assignee: Fabian Deutsch <fdeutsch>
Status: CLOSED ERRATA
QA Contact: wanghui <huiwa>
Severity: high
Docs Contact:
Priority: medium
Version: 3.5.0
CC: cshao, fdeutsch, gklein, gouyang, huiwa, leiwang, lsurette, rbarry, ycui, ykaul
Target Milestone: ovirt-3.6.0-rc3
Target Release: 3.6.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-node-3.6.1-5.0.el7ev
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-03-09 14:24:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1139298
Bug Blocks:
Attachments:
  attached var logs for rhev-hypervisor7-7.0-20150106.0.el7ev (flags: none)
  attached sosreport logs for rhev-hypervisor7-7.0-20150106.0.el7ev (flags: none)

Description haiyang,dong 2015-01-09 02:51:23 UTC
Created attachment 978006 [details]
attached var logs for rhev-hypervisor7-7.0-20150106.0.el7ev

Description of problem:
After configuring the network, kdump was configured via SSH.
Although configuring kdump via SSH succeeded, triggering a kernel dump with '$>echo c > /proc/sysrq-trigger' in the shell left the system halted without an automatic reboot, and after a manual reboot there were no dump files in /var/crash/ on the SSH server.
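
For reference, a minimal /etc/kdump.conf for an SSH dump target on RHEL 7 looks roughly like the sketch below; the user, host, and key path are placeholders rather than values taken from this report:

  # /etc/kdump.conf (sketch, placeholder values)
  ssh kdump@192.0.2.10                  # dump target reached over SSH
  sshkey /root/.ssh/kdump_id_rsa        # key pushed to the server with "kdumpctl propagate"
  path /var/crash                       # directory on the remote host
  core_collector makedumpfile -F -l --message-level 1 -d 31   # -F (flattened format) is needed for SSH targets
  default reboot                        # reboot the machine once the dump has been written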

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.0-20150106.0.el7ev
ovirt-node-3.1.0-0.40.20150105git69f34a6.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install rhev-hypervisor7-7.0-20150106.0.el7ev
2. Configure the network.
3. Configure kdump by using SSH on the Kernel Dump page.
4. Check the kdump status.
5. Trigger a kernel dump with '$>echo c > /proc/sysrq-trigger' in the shell (see the note after these steps).
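
Some testing guides pair the trigger in step 5 with explicitly enabling SysRq first; a minimal sketch (the first line may be unnecessary depending on the default sysctl):

  echo 1 > /proc/sys/kernel/sysrq   # enable SysRq functions if they are not already enabled
  echo c > /proc/sysrq-trigger      # force a kernel crash so the kdump kernel takes over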

Actual results:
1. After step 4, configuring kdump via SSH succeeds.
2. After step 5, the system halts and does not reboot automatically. After a manual reboot, there are no dump files in /var/crash/ on the SSH server.

Expected results:
kdump works with a remote kdump target reached over SSH.
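
A rough sketch of what verification looks like on the server side, assuming the default "path /var/crash" and the usual <client-ip>-<timestamp> directory layout that kexec-tools uses for remote dumps:

  # on the SSH server, after the client has crashed and rebooted
  ls -l /var/crash/
  # expect a directory such as <client-ip>-<date>/ containing a vmcore file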

Additional info:

Comment 1 haiyang,dong 2015-01-09 02:57:24 UTC
Created attachment 978007 [details]
attached sosreport logs for rhev-hypervisor7-7.0-20150106.0.el7ev

Comment 2 Ryan Barry 2015-01-09 18:08:35 UTC
Fabian -

I'm testing various kdump configs while I'm at it.

Since I'd previously assumed that the kdump service itself would report success or failure, I didn't actually try dumping, but there appears to be another problem.

I've already opened bz#1139298, but it's not enough for kdumpctl to start successfully. The "ssh" keyword in kdump.conf appears to also be parsed by something in the initramfs, and renaming it to "net" does not leave a dump, even though the service is configured properly.

A Z-stream clone of bz#1139298 will be required if we want kdump over SSH to work on EL7.0.
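
For context, the two kdump.conf spellings being discussed differ between releases; a hedged sketch based on how the comments describe it (the host is a placeholder):

  # EL6-style directive, which the existing augeas lens understands:
  net root@kdump-server.example.com
  # EL7-style directive, which the EL7 initramfs-side parsing expects:
  ssh root@kdump-server.example.com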

Comment 3 Yaniv Lavi 2015-01-12 14:47:03 UTC
Moving to 3.5.1, since we will focus on getting local kdump to work on RHEV-H 7.
Adding a flag to add a release note on this for 3.5.0.

Comment 4 Fabian Deutsch 2015-06-03 08:12:14 UTC
Dong, could you please check the state of kdump configuration in RHEV-H from 3.5.1?
I see that all patches except 36743 were merged into the stable branch, so there is a good chance that it is working.

Comment 5 haiyang,dong 2015-06-03 09:55:20 UTC
Test version:
RHEV-H 7.1 for RHEV 3.5.1-2 rhev-hypervisor7-7.1-20150512.1
ovirt-node-3.2.2-3.el7.noarch

Test Steps:
1. Install rhevh-7.1-20150512.1.el7ev.iso
2. Configure the network.
3. Configure kdump by using SSH on the Kernel Dump page.
4. Check the kdump status.
5. Trigger a kernel dump with '$>echo c > /proc/sysrq-trigger' in the shell.

Test results:
1. After step 4, configuring kdump via SSH succeeded and kdump.conf contained the following:
..
default reboot
net root.8.137
However, the kdump service was in a failed state:
[root@dhcp-8-115 admin]# service kdump status -l
Redirecting to /bin/systemctl status  -l kdump.service
kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; disabled)
   Active: failed (Result: exit-code) since Wed 2015-06-03 09:24:45 UTC; 19min ago
  Process: 1954 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 1954 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/kdump.service

Jun 03 09:24:26 localhost kdumpctl[1954]: /usr/lib/dracut/modules.d/50drm/module-setup.sh: line 26: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/gpu/drm/qxl/qxl.ko: No such file or directory
Jun 03 09:24:26 localhost dracut[4549]: *** Including module: dm ***
Jun 03 09:24:26 localhost dracut[4549]: Skipping udev rule: 64-device-mapper.rules
Jun 03 09:24:26 localhost dracut[4549]: Skipping udev rule: 60-persistent-storage-dm.rules
Jun 03 09:24:26 localhost dracut[4549]: Skipping udev rule: 55-dm.rules
Jun 03 09:24:26 localhost dracut[4549]: *** Including module: dmsquash-live ***
Jun 03 09:24:27 localhost dracut[4549]: *** Including module: kernel-modules ***
Jun 03 09:24:45 localhost systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Jun 03 09:24:45 localhost systemd[1]: Failed to start Crash recovery kernel arming.
Jun 03 09:24:45 localhost systemd[1]: Unit kdump.service entered failed state.

2. After step 5, the system rebooted automatically, but after the reboot there were no dump files in /var/crash/ on the SSH server.

Comment 6 Ryan Barry 2015-06-03 14:51:31 UTC
I don't see a good reason why this is failing. Restarting kdump after these steps immediately succeeds, and it works after a reboot. There's nothing in the output to indicate why it's failing, so I'm going to go through the steps kdump.service takes by hand.
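
For anyone reproducing this, the restart-and-recheck sequence being described amounts to roughly:

  systemctl restart kdump.service
  systemctl status -l kdump.service   # what "service kdump status -l" redirects to
  kdumpctl status                     # should then report that kdump is operational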

Comment 7 Ryan Barry 2015-06-03 17:36:35 UTC
The output of "service" is misleading.

"service" and "kdumpctl" don't agree on what's happening. On a fresh install, before configuring ssh, "service kdump status" shows it failed. "kdumpctl status" shows "kdump is operational". Restarting kdump (or ovirt-kdump) makes these agree. I'm inclined to believe that it's a service ordering problem with ovirt-kdump, though I don't know what kdump is depending on earlier in the boot process that's making it fail.

This probably also means that the default local kdump configuration is broken in 7.1. I'll submit a patch that adds the same dependencies the regular kdump.service has to ovirt-kdump.service.
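
A drop-in of the kind being proposed could look roughly like the sketch below; the dependency list is an assumption mirroring what kdump.service itself declares on EL7, not the final patch:

  # /etc/systemd/system/ovirt-kdump.service.d/ordering.conf (hypothetical)
  [Unit]
  After=network.target network-online.target remote-fs.target basic.target

A "systemctl daemon-reload" is then needed for the drop-in to take effect.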

I'll also make sure that the kdump service gets restarted even if kdumpctl exists, since it doesn't propagate changes.

Part of the reason the dump is not succeeding is that the core_collector options are not set correctly. This is because https://gerrit.ovirt.org/#/c/36743/ was not pulled into ovirt-3.5, and it needs to be.

The other reason is the bug listed above (no augeas support for "ssh"). This works on EL7 now, but not on EL6. I'll add a temporary code path to substitute it on EL7 until the EL6 Z-stream (which is verified, but waiting for release) gets pushed.

Comment 9 Fabian Deutsch 2015-09-15 10:11:34 UTC
It looks like the issue described in this bug is related to clean setups where the target folder given to kdump does not exist yet on the server side.

To me this is not an urgent matter, as kdump behaves the same way as it does on RHEL.
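
If that is the cause, a workaround sketch would be to prepare the target directory and SSH key before arming kdump; the host and path below are placeholders:

  # on the SSH server
  mkdir -p /var/crash
  # on the RHEV-H host, push the key referenced in kdump.conf and re-arm the service
  kdumpctl propagate
  systemctl restart kdump.service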

Comment 10 Sandro Bonazzola 2015-10-26 12:32:42 UTC
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and, if it is not a blocker, postpone it to a later release.
All bugs not postponed by the GA release will be automatically re-targeted to:

- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 11 wanghui 2015-11-24 03:26:01 UTC
Test version:
rhev-hypervisor7-7.2-20151112.1.el7ev
ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch

Test steps:
1. Install rhev-hypervisor7-7.2-20151112.1.el7ev
2. Configure the network.
3. Configure kdump by using SSH on the Kernel Dump page.
4. Check the kdump status.
5. Trigger a kernel dump with '$>echo c > /proc/sysrq-trigger' in the shell.
6. Check for the dump file on the SSH server.

Test result:
1. RHEV-H rebooted successfully.
2. After step 6, the dump file is present on the remote SSH server.

So this issue is fixed in ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch. Changing the status to VERIFIED.

Comment 13 errata-xmlrpc 2016-03-09 14:24:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0378.html