Bug 1180371 - [3.5_7.0] Miss dump files in /var/crash/ of ssh server after configure kdump via ssh
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-3.6.0-rc3
Target Release: 3.6.0
Assignee: Fabian Deutsch
QA Contact: wanghui
URL:
Whiteboard:
Depends On: 1139298
Blocks:
Reported: 2015-01-09 02:51 UTC by haiyang,dong
Modified: 2016-03-09 14:24 UTC (History)

Fixed In Version: ovirt-node-3.6.1-5.0.el7ev
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-03-09 14:24:56 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:


Attachments
attached var logs for rhev-hypervisor7-7.0-20150106.0.el7ev (1.26 MB, application/x-tar)
2015-01-09 02:51 UTC, haiyang,dong
attached sosreport logs for rhev-hypervisor7-7.0-20150106.0.el7ev (4.82 MB, application/x-xz)
2015-01-09 02:57 UTC, haiyang,dong


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2016:0378 | 0 | normal | SHIPPED_LIVE | ovirt-node bug fix and enhancement update for RHEV 3.6 | 2016-03-09 19:06:36 UTC
oVirt gerrit | 36743 | 0 | master | MERGED | Use explicit core_collector options | Never
oVirt gerrit | 36744 | 0 | master | MERGED | Include dmsquash-live in kdump's dracut | Never
oVirt gerrit | 36819 | 0 | ovirt-3.5 | MERGED | Include dmsquash-live in kdump's dracut | Never
oVirt gerrit | 36821 | 0 | master | MERGED | Bump crashkernel size to 256M on EL7 | Never
oVirt gerrit | 36853 | 0 | master | MERGED | Leave mkdumprd alone. Patch 90dmsquash-live instead | Never
oVirt gerrit | 36866 | 0 | ovirt-3.5 | MERGED | Bump crashkernel size to 256M on EL7 | Never
oVirt gerrit | 36867 | 0 | ovirt-3.5 | MERGED | Leave mkdumprd alone. Patch 90dmsquash-live instead | Never
oVirt gerrit | 41909 | 0 | master | MERGED | ovirt-kdump should match the dependencies of regular kdump | Never
oVirt gerrit | 41910 | 0 | master | MERGED | Always restart the kdump service, even if kdumpctl is present | Never
oVirt gerrit | 41911 | 0 | master | MERGED | Use "ssh" instead of "net" in the augeas template on EL7 | Never
oVirt gerrit | 46108 | 0 | None | None | None | Never

Description haiyang,dong 2015-01-09 02:51:23 UTC
Created attachment 978006 [details]
attached var logs for rhev-hypervisor7-7.0-20150106.0.el7ev

Description of problem:
After configuring the network, configure kdump to use SSH.
Configuring kdump via SSH reports success,
but after triggering a kernel dump with the command 'echo c > /proc/sysrq-trigger' in a shell,
the system halts and does not reboot automatically. After a manual reboot, there are no dump files in /var/crash/ on the SSH server.

Version-Release number of selected component (if applicable):
rhev-hypervisor7-7.0-20150106.0.el7ev
ovirt-node-3.1.0-0.40.20150105git69f34a6.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install rhev-hypervisor7-7.0-20150106.0.el7ev.
2. Configure the network.
3. Configure kdump to use SSH on the Kernel Dump page.
4. Check the kdump status.
5. Trigger a kernel dump with the command 'echo c > /proc/sysrq-trigger' in a shell.

Actual results:
1. After step 4, configuring kdump via SSH succeeds.
2. After step 5, the system halts and does not reboot automatically. After a manual reboot, there are no dump files in /var/crash/ on the SSH server.

Expected results:
kdump works with a remote kdump server over SSH.

Additional info:

Comment 1 haiyang,dong 2015-01-09 02:57:24 UTC
Created attachment 978007 [details]
attached sosreport logs for rhev-hypervisor7-7.0-20150106.0.el7ev

Comment 2 Ryan Barry 2015-01-09 18:08:35 UTC
Fabian -

I'm testing various kdump configs as long as I'm on it.

Since I'd previously assumed that the kdump service itself would report success or failure, I didn't actually try dumping, but there appears to be another problem.

I've already opened bz#1139298, but it's not enough for kdumpctl to start successfully. The "ssh" keyword in kdump.conf appears to also be parsed by something in the initramfs, and renaming it to "net" does not leave a dump, even though the service is configured properly.

A Z-stream clone of bz#1139298 will be required if we want kdump over SSH to work in EL7.0
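
A minimal sketch of the keyword difference described above. It assumes that the EL6 tooling expects "net user@host" while EL7 kdump expects "ssh user@host" in kdump.conf. The helper names are hypothetical and are not part of ovirt-node; the actual fix was made in the augeas template (gerrit 41911).

```python
# Hypothetical helpers, not ovirt-node code.
# Assumption: EL6 uses "net user@host" and EL7 uses "ssh user@host"
# for an SSH dump target in kdump.conf.

def ssh_target_keyword(el_major):
    """Return the kdump.conf keyword for an SSH target on this EL major version."""
    return "ssh" if el_major >= 7 else "net"


def rewrite_ssh_target(conf_text, el_major):
    """Rewrite uncommented SSH target lines to use the keyword for el_major."""
    keyword = ssh_target_keyword(el_major)
    out = []
    for line in conf_text.splitlines():
        parts = line.split(None, 1)
        # Only touch "net"/"ssh" directives whose value looks like user@host.
        # NFS-style values (host:/path) and commented lines are left alone.
        if len(parts) == 2 and parts[0] in ("net", "ssh") and "@" in parts[1]:
            out.append("%s %s" % (keyword, parts[1].strip()))
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

For example, rewrite_ssh_target("net root@kdump.example.com\n", 7) yields an "ssh root@kdump.example.com" line; the host name here is a placeholder.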

Comment 3 Yaniv Lavi 2015-01-12 14:47:03 UTC
Moving to 3.5.1, since we will focus on getting local kdump to work on RHEV-H 7.
Adding flag to add release note on this for 3.5.0.

Comment 4 Fabian Deutsch 2015-06-03 08:12:14 UTC
Dong, could you please check the state of kdump configuration in RHEV-H from 3.5.1?
I see that all except patch 36743 were merged into the stable branch, and thus I see a good chance that it is working.

Comment 5 haiyang,dong 2015-06-03 09:55:20 UTC
Test version:
RHEV-H 7.1 for RHEV 3.5.1-2 rhev-hypervisor7-7.1-20150512.1
ovirt-node-3.2.2-3.el7.noarch

Test Steps:
1. Install rhevh-7.1-20150512.1.el7ev.iso.
2. Configure the network.
3. Configure kdump to use SSH on the Kernel Dump page.
4. Check the kdump status.
5. Trigger a kernel dump with the command 'echo c > /proc/sysrq-trigger' in a shell.

Test results:
1. After step 4, configuring kdump via SSH succeeded, and kdump.conf contains the following:
..
default reboot
net root.8.137
However, the kdump service status was "failed":
[root@dhcp-8-115 admin]# service kdump status -l
Redirecting to /bin/systemctl status  -l kdump.service
kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; disabled)
   Active: failed (Result: exit-code) since Wed 2015-06-03 09:24:45 UTC; 19min ago
  Process: 1954 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 1954 (code=exited, status=1/FAILURE)
   CGroup: /system.slice/kdump.service

Jun 03 09:24:26 localhost kdumpctl[1954]: /usr/lib/dracut/modules.d/50drm/module-setup.sh: line 26: /lib/modules/3.10.0-229.1.2.el7.x86_64//kernel/drivers/gpu/drm/qxl/qxl.ko: No such file or directory
Jun 03 09:24:26 localhost dracut[4549]: *** Including module: dm ***
Jun 03 09:24:26 localhost dracut[4549]: Skipping udev rule: 64-device-mapper.rules
Jun 03 09:24:26 localhost dracut[4549]: Skipping udev rule: 60-persistent-storage-dm.rules
Jun 03 09:24:26 localhost dracut[4549]: Skipping udev rule: 55-dm.rules
Jun 03 09:24:26 localhost dracut[4549]: *** Including module: dmsquash-live ***
Jun 03 09:24:27 localhost dracut[4549]: *** Including module: kernel-modules ***
Jun 03 09:24:45 localhost systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Jun 03 09:24:45 localhost systemd[1]: Failed to start Crash recovery kernel arming.
Jun 03 09:24:45 localhost systemd[1]: Unit kdump.service entered failed state.

2. After step 5, the system rebooted automatically. After the reboot, there were no dump files in /var/crash/ on the SSH server.

Comment 6 Ryan Barry 2015-06-03 14:51:31 UTC
I don't see a good reason why this is failing. Restarting kdump after these steps immediately succeeds, and it works on a reboot. There's nothing in the output to indicate why it's failing. I'm going to go through the steps kdump.service takes by hand, I guess.

Comment 7 Ryan Barry 2015-06-03 17:36:35 UTC
The output of "service" is misleading.

"service" and "kdumpctl" don't agree on what's happening. On a fresh install, before configuring ssh, "service kdump status" shows it failed. "kdumpctl status" shows "kdump is operational". Restarting kdump (or ovirt-kdump) makes these agree. I'm inclined to believe that it's a service ordering problem with ovirt-kdump, though I don't know what kdump is depending on earlier in the boot process that's making it fail.

This probably also means that the default local kdump configuration is broken in 7.1, and I'll submit a patch that adds the same dependencies regular kdump.service has to ovirt-kdump.service

I'll also make sure that the kdump service gets restarted even if kdumpctl exists, since it doesn't propagate changes.
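
A rough sketch of how the disagreement between the two tools could be detected. It assumes the systemd state comes from `systemctl is-active kdump.service` and that `kdumpctl status` prints a line containing "is operational" when kdump is armed, as quoted above. The function names are hypothetical.

```python
import subprocess


def kdump_state_mismatch(systemd_state, kdumpctl_output):
    """Return True when systemd and kdumpctl disagree about kdump."""
    systemd_ok = systemd_state.strip() == "active"
    # "Kdump is not operational" does not contain "is operational",
    # so a plain substring test is enough here.
    kdumpctl_ok = "is operational" in kdumpctl_output.lower()
    return systemd_ok != kdumpctl_ok


def read_states():
    """Collect both views from a live host (needs systemd and kexec-tools)."""
    def run(argv):
        # Merge stderr into stdout because the status text may go to either.
        result = subprocess.run(argv, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True)
        return result.stdout
    return (run(["systemctl", "is-active", "kdump.service"]),
            run(["kdumpctl", "status"]))
```

On a host, kdump_state_mismatch(*read_states()) returning True would correspond to the fresh-install situation described in this comment.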

Partially, the dump is not succeeding because the core_collector options are not set correctly. This is because https://gerrit.ovirt.org/#/c/36743/ was not pulled into ovirt-3.5, and it needs to be.
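
The content of that patch is not reproduced here. As an illustration only, the sketch below checks a kdump.conf for the combination that upstream kdump documentation warns about: an ssh target whose makedumpfile core_collector lacks -F (flattened output). Flagging a missing core_collector line is an assumption based on the patch title ("Use explicit core_collector options"); the function name is hypothetical.

```python
import shlex


def core_collector_issues(conf_text):
    """Return a list of suspected core_collector problems for ssh targets."""
    directives = {}
    for line in conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split(None, 1)
        directives[parts[0]] = parts[1] if len(parts) == 2 else ""

    issues = []
    if "ssh" in directives:
        collector = directives.get("core_collector")
        if collector is None:
            issues.append("ssh target without an explicit core_collector line")
        else:
            args = shlex.split(collector)
            # Only a standalone -F is recognised; combined short flags
            # such as -Fl are not handled by this sketch.
            if args and args[0] == "makedumpfile" and "-F" not in args:
                issues.append("makedumpfile is missing -F for an ssh target")
    return issues
```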

The other reason is because of the bugs listed above (no augeas support for "ssh"). This works on EL7 now, but not EL6. I'll add a temporary code path to sub it out on EL7 until the EL6 Z-stream (which is verified, but waiting for release) gets pushed.

Comment 9 Fabian Deutsch 2015-09-15 10:11:34 UTC
It looks like the issue described in this bug is related to clean setups where the target folder given to kdump does not exist yet on the server side.

To me this is not too urgent a matter, as kdump is behaving the same way as on RHEL.
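
One possible way to prepare a clean server, sketched under the assumption that the dump directory is /var/crash as in the original report and that the SSH target has the user@host form. The helper is hypothetical and only builds the command; it does not run it.

```python
import shlex


def remote_mkdir_command(ssh_target, dump_dir="/var/crash"):
    """Build the argv for creating dump_dir on the SSH dump server."""
    # The remote side passes this string through a shell, so quote the path
    # to guard against spaces or shell metacharacters.
    return ["ssh", ssh_target, "mkdir -p -- " + shlex.quote(dump_dir)]
```

The resulting list can be passed to subprocess.run() on the hypervisor before kdump is (re)started.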

Comment 10 Sandro Bonazzola 2015-10-26 12:32:42 UTC
This is an automated message. oVirt 3.6.0 RC3 has been released and GA is targeted for next week, Nov 4th 2015.
Please review this bug and if not a blocker, please postpone to a later release.
All bugs not postponed on GA release will be automatically re-targeted to

- 3.6.1 if severity >= high
- 4.0 if severity < high

Comment 11 wanghui 2015-11-24 03:26:01 UTC
Test version:
rhev-hypervisor7-7.2-20151112.1.el7ev
ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch

Test steps:
1. Install rhev-hypervisor7-7.2-20151112.1.el7ev.
2. Configure the network.
3. Configure kdump to use SSH on the Kernel Dump page.
4. Check the kdump status.
5. Trigger a kernel dump with the command 'echo c > /proc/sysrq-trigger' in a shell.
6. Check for the kdump file on the SSH server.

Test result:
1. RHEV-H rebooted successfully.
2. After step 6, the dump file is present on the remote SSH server.

So this issue is fixed in ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch. Changing the status to VERIFIED.

Comment 13 errata-xmlrpc 2016-03-09 14:24:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0378.html

