Bug 1095140

Summary: [3.5_7.0] configure kdump fail via local
Product: Red Hat Enterprise Virtualization Manager
Reporter: haiyang,dong <hadong>
Component: ovirt-node
Assignee: Ryan Barry <rbarry>
Status: CLOSED ERRATA
QA Contact: Virtualization Bugs <virt-bugs>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 3.5.0
CC: aberezin, aburden, bhe, cshao, dfediuck, dyoung, fdeutsch, gklein, gouyang, hadong, huiwa, iheim, juwu, leiwang, michal.skrivanek, ruyang, yaniwang, ycui, ylavi
Target Milestone: ---
Keywords: Regression
Target Release: 3.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard: node
Fixed In Version: ovirt-node-3.2.1-5.el6 ovirt-node-3.2.1-5.el7
Doc Type: Known Issue
Doc Text:
Local configurations of kdump now work as expected in Red Hat Enterprise Virtualization Hypervisor 7.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-02-11 20:56:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1097453, 1181285
Bug Blocks: 1094719, 1164308, 1164311
Attachments:
Description | Flags
attached Screenshot for error occurred screen | none
console output from failed kdump | none

Description haiyang,dong 2014-05-07 08:10:07 UTC
Created attachment 893151 [details]
attached Screenshot for error occurred screen

Description of problem:
After configuring the network, configure kdump using each of these methods: NFS, SSH, Local.
Configuring kdump fails for the remote kdump servers (NFS/SSH) and for Local.

Version-Release number of selected component (if applicable):
rhevh-7.0-20140424.0.iso 
ovirt-node-3.1.0-0.2.20140424gitbfdfc00.el7

How reproducible:
100%

Steps to Reproduce:
1. Install rhevh-7.0-20140424.0.iso.
2. Configure the network.
3. Configure kdump using NFS/SSH/Local on the Kernel Dump page.
4. Check the kdump status.

Actual results:
1. kdump fails to work with a remote kdump server (NFS/SSH) and with Local.
2. After applying the configuration, a dialog pops up: "KDump configuration failed, location unreachable" (error occurred screen.png).

Expected results:
kdump works with a remote kdump server (NFS/SSH) and with Local.

Additional info:

Comment 2 Fabian Deutsch 2014-07-14 14:21:33 UTC
Moving to 3.5 as 7.0 is affected.

Comment 3 Fabian Deutsch 2014-07-24 16:02:58 UTC
This is a mass change, moving bugs of merged patches into MODIFIED.

Please correct the state, if you think that the move was not justified.

Comment 4 haiyang,dong 2014-08-12 06:23:59 UTC
Test version:
rhev-hypervisor7-7.0-20140807.0.iso
ovirt-node-3.1.0-0.6.20140731git2c8e71f.el7.noarch

Configuring kdump still fails via local/ssh/nfs:

[root@dhcp-9-33 admin]# service kdump restart
Redirecting to /bin/systemctl restart  kdump.service
Job for kdump.service failed. See 'systemctl status kdump.service' and 'journalctl -xn' for details.
[root@dhcp-9-33 admin]# service kdump status
Redirecting to /bin/systemctl status  kdump.service
kdump.service - Crash recovery kernel arming
   Loaded: loaded (/usr/lib/systemd/system/kdump.service; disabled)
   Active: failed (Result: exit-code) since Tue 2014-08-12 06:17:07 UTC; 16s ago
  Process: 2667 ExecStart=/usr/bin/kdumpctl start (code=exited, status=1/FAILURE)
 Main PID: 2667 (code=exited, status=1/FAILURE)

Aug 12 06:17:06 dhcp-9-33.nay.redhat.com systemd[1]: Starting Crash recovery kernel arming...
Aug 12 06:17:07 dhcp-9-33.nay.redhat.com kdumpctl[2667]: Error: /boot-kdump/vmlinuz-3.10.0-123.6.3.el7.x86_64 not found.
Aug 12 06:17:07 dhcp-9-33.nay.redhat.com kdumpctl[2667]: Starting kdump: [FAILED]
Aug 12 06:17:07 dhcp-9-33.nay.redhat.com systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Aug 12 06:17:07 dhcp-9-33.nay.redhat.com systemd[1]: Failed to start Crash recovery kernel arming.
Aug 12 06:17:07 dhcp-9-33.nay.redhat.com systemd[1]: Unit kdump.service entered failed state.

So this bug needs to be reassigned.
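
The journal output above shows kdumpctl failing because a kernel image under /boot-kdump was not found. A minimal sketch of a pre-check for that condition follows. The helper name check_kdump_kernel is hypothetical (it is not part of kdumpctl), and the path layout <boot_dir>/vmlinuz-<kernel_version> is an assumption inferred only from the error line in the log.

```shell
#!/bin/sh
# Hypothetical helper, not part of kdumpctl: report whether the kernel image
# named in the error message exists.
# Assumption: the image lives at <boot_dir>/vmlinuz-<kernel_version>,
# which is inferred from the log line only.
check_kdump_kernel() {
    boot_dir=$1
    kver=$2
    image="$boot_dir/vmlinuz-$kver"
    if [ -f "$image" ]; then
        echo "found: $image"
        return 0
    fi
    echo "missing: $image"
    return 1
}

# Example: check the running kernel against the directory from the log.
check_kdump_kernel /boot-kdump "$(uname -r)" || true
```

On a host showing the failure above, the example call would be expected to print a "missing:" line for the same path that kdumpctl reports.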

Comment 5 Ryan Barry 2014-08-12 13:05:40 UTC
We're waiting on bz#1097453 in order to resolve another problem blocking kdump, though the linked patch is also necessary.

Comment 6 Fabian Deutsch 2014-08-13 13:46:32 UTC
Setting this back to modified as our part is fixed, and we are just waiting for a platform fix.

Comment 7 wanghui 2014-09-28 06:08:25 UTC
Test version:
rhev-hypervisor7-7.0-20140926.0.iso
ovirt-node-3.1.0-0.17.20140925git29c3403.el7.noarch

This issue still exists in rhev-hypervisor7-7.0-20140926.0.iso, and the kdump service is not up by default. Changing the status from ON_QA to ASSIGNED.

Comment 8 Ying Cui 2014-09-28 06:20:16 UTC
Additional info for comment 7: bz #1097453 from comment 5 was cloned to z-stream bz #1130112 (CLOSED ERRATA),
so we need to check the node side.

Comment 9 haiyang,dong 2014-10-22 09:40:51 UTC
Test version:
rhev-hypervisor7-7.0-20141006.0.el7ev
ovirt-node-3.1.0-0.20.20141006gitc421e04.el7.noarch

Test steps:
1. Configuring kdump via local succeeds, but after triggering a kernel dump with the command '$>echo c > /proc/sysrq-trigger' in a shell,
the system halts and does not reboot automatically. After a manual reboot, there are no dump files in /data/core/.

2. Configure kdump via ssh - FAILED

An error occurred:
KDump configuration failed, location unreachable. Previous configuration was restored.

3. Configure kdump via nfs - FAILED

An error occurred:
KDump configuration failed, location unreachable. Previous configuration was restored.

So this bug needs to be assigned again.

Comment 10 Ryan Barry 2014-10-22 14:12:40 UTC
The patches never got merged. Not sure how this made it to MODIFIED. Moving back to POST.

Comment 11 Julie 2014-11-26 08:41:29 UTC
Doc text added for beta 5 release as per engineering request. Please update the doc text for GA or just simply set the 'requires_release_note' flag to - and the doc text would be excluded from the GA release notes. 

If you don't set the 'requires_release_note' flag to - for GA, and the doc text gets pulled in for GA release notes, I will not go into every bug to manually remove the text. Beware of the consequences.

Comment 18 haiyang,dong 2015-01-08 11:55:50 UTC
Test version:
rhev-hypervisor7-7.0-20150106.0.el7ev
ovirt-node-3.1.0-0.40.20150105git69f34a6.el7.noarch

Test steps:
1. Configuring kdump via local succeeds, but after triggering a kernel dump with the command '$>echo c > /proc/sysrq-trigger' in a shell,
the system halts and does not reboot automatically. After a manual reboot, there are no dump files in /data/core/.

2. Configuring kdump via ssh succeeds, but after triggering a kernel dump with the command '$>echo c > /proc/sysrq-trigger' in a shell,
the system halts and does not reboot automatically. After a manual reboot, there are no dump files in /var/crash/ on the ssh server.

3. Configure kdump via nfs - FAILED

An error occurred:
KDump configuration failed, location unreachable. Previous configuration was restored.

So this bug needs to be assigned again.

Comment 19 Michal Skrivanek 2015-01-08 15:02:33 UTC
to justify 3.5 blocker....
is this a critical functionality?
is this really an Urgent severity?

Comment 20 Fabian Deutsch 2015-01-08 15:15:26 UTC
We should at least be capable of configuring kdump to store the core dumps locally.

Comment 21 Fabian Deutsch 2015-01-08 15:25:49 UTC
This bug is too broad: it covers failed configuration with ssh, nfs, and local as the target.

To get a more detailed view we should split the bug up, because technically it looks like there are different causes for the failures.

I'd suggest to keep this bug for local configuration, and open new bugs to cover ssh and nfs.

This will also help to identify the bugs we see as a blocker.

For me only the failed local configuration is a blocker.

Comment 22 Fabian Deutsch 2015-01-08 15:27:20 UTC
The symptoms in comment 18 look like they are caused by bug 1175967.

Comment 23 Fabian Deutsch 2015-01-08 16:05:34 UTC
http://gerrit.ovirt.org/36312 is fixing the local configuration

Comment 24 Ryan Barry 2015-01-08 18:25:15 UTC
> 3.  configure kdump via nfs -FAILED
> 
> An error occurred:                                                          
> 
> KDump configuration failed, location unreachable. Previous configuration was
> restored.  
> 
> So this bug needs to be assigned again.

I'm not able to reproduce this; kdump configuration over NFS on rhev-hypervisor7-7.0-20150106.0.el7ev succeeds when I test.

Can you please provide logs and exact steps to reproduce, preferably with the TUI in debug mode?

Comment 25 Fabian Deutsch 2015-01-08 20:15:15 UTC
Clarification: The configuration is correct now. But the core dump still fails, because kdump does not recognize the root=live:… kernel argument.

From the node side we are good for now; we just need to teach kdump how to work with node.
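
To illustrate the root=live:… point above, here is a minimal sketch that extracts the root= argument from a kernel command line and tests for the live: prefix. The helper names cmdline_value and is_live_root are hypothetical, and the sample command line (including the value EXAMPLE) is made up for illustration; the real argument is elided in this report and is not reproduced here.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY= found in a kernel command
# line string. Returns non-zero if the key is absent.
cmdline_value() {
    key=$1
    set -f                      # disable globbing while the string is split
    for word in $2; do
        case $word in
            "$key"=*)
                set +f
                echo "${word#"$key"=}"
                return 0
                ;;
        esac
    done
    set +f
    return 1
}

# Hypothetical helper: succeed if root= starts with "live:".
is_live_root() {
    case $(cmdline_value root "$1") in
        live:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Made-up sample command line; on a real host one could pass
# "$(cat /proc/cmdline)" instead.
sample='ro root=live:EXAMPLE quiet'
if is_live_root "$sample"; then
    echo "live root: $(cmdline_value root "$sample")"
fi
```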

Comment 26 Ryan Barry 2015-01-08 23:28:57 UTC
Wang -

There appear to be a couple of problems here, some we can work around, but at least one that I'm not sure how to work around.

mkdumprd on el7 sets --hostonly by default, with no apparent way to override it, which leaves dmsquash-live out of dracut (rhev-h runs as a squashfs image, and we need this, otherwise dracut doesn't know how to handle root=live...).

If I remove this by hand so that an initramfs which includes dmsquash-live is built, and then trigger a dump, it waits for an incredibly long time unpacking the initramfs. Long enough that it would appear to be a hang, while taking 100% CPU for the duration.

The longest I've waited for this is 30 minutes, and it's never proceeded beyond that point (despite bumping the memory available to the VM). What could be happening here? Any suggestions for debugging this? The kdump kernel appears to be re-execed with mostly the same options, but neither "verbose" nor any other options I've tried (including "debug") show me anything useful.

Any suggestions?

Comment 27 WANG Chao 2015-01-09 03:09:07 UTC
[CCing other kdump developers here]

Hi, Ryan

If I understand correctly, you're using a particular /etc/kdump.conf to enable the kdump service in the rhev-h environment. We set --hostonly because we need to keep the initrd image as small as possible.

The first problem is that dmsquash-live isn't included by default in "hostonly" mode. You can resolve this by editing /etc/kdump.conf as follows:

# vim /etc/kdump.conf
dracut_args --add dmsquash-live

The next problem is the kernel hanging while unpacking the initramfs. Could you attach the console log? How do you configure crashkernel=X? How large is your custom kdump initrd?

If you remove '--hostonly', I'm afraid the initramfs will bloat and the kdump kernel may hang while unpacking the initramfs because it runs out of memory.

Thanks
WANG Chao
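
A minimal sketch of applying the kdump.conf change suggested above in an idempotent way. The helper name ensure_dracut_module is hypothetical, it assumes a single dracut_args line whose existing arguments should be kept, and it assumes the module name contains no regular-expression metacharacters. The demo runs against a temporary file rather than the real /etc/kdump.conf. Whether dracut then accepts the module in hostonly mode is a separate question that this sketch does not address.

```shell
#!/bin/sh
# Hypothetical helper: make sure FILE has a dracut_args line that adds MODULE.
# Assumptions: at most one dracut_args line; existing arguments are kept and
# "--add MODULE" is appended to them.
ensure_dracut_module() {
    conf=$1
    module=$2
    # Nothing to do if the module is already added.
    if grep -Eq "^dracut_args[[:space:]].*--add[[:space:]]+$module([[:space:]]|\$)" "$conf"; then
        return 0
    fi
    if grep -q '^dracut_args[[:space:]]' "$conf"; then
        # Append to the existing dracut_args line via a temporary copy.
        sed "s/^dracut_args[[:space:]].*/& --add $module/" "$conf" > "$conf.tmp" &&
            mv "$conf.tmp" "$conf"
    else
        printf 'dracut_args --add %s\n' "$module" >> "$conf"
    fi
}

# Demo on a scratch file standing in for /etc/kdump.conf.
demo_conf=$(mktemp)
printf 'path /var/crash\n' > "$demo_conf"
ensure_dracut_module "$demo_conf" dmsquash-live
cat "$demo_conf"
rm -f "$demo_conf"
```

After a real edit, the kdump service would need to be restarted (as with 'service kdump restart' earlier in this report) before the change can take effect.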

Comment 28 haiyang,dong 2015-01-09 03:31:18 UTC
(In reply to Fabian Deutsch from comment #21)
> This bug is too broad: it covers failed configuration with ssh, nfs, and
> local as the target.
> 
> To get a more detailed view we should split the bug up, because technically
> it looks like there are different causes for the failures.
> 
> I'd suggest to keep this bug for local configuration, and open new bugs to
> cover ssh and nfs.

New bug covering the failed kdump configuration via ssh:
https://bugzilla.redhat.com/show_bug.cgi?id=1180371

New bug covering the failed kdump configuration via nfs:
https://bugzilla.redhat.com/show_bug.cgi?id=1180377

> 
> This will also help to identify the bugs we see as a blocker.
> 
> For me only the failed local configuration is a blocker.

Comment 29 Ryan Barry 2015-01-09 03:43:02 UTC
(In reply to WANG Chao from comment #27)
> [CCing other kdump developers here]
> 
> Hi, Ryan
> 
> If I understand correctly, you're using a particular /etc/kdump.conf to
> enable the kdump service in the rhev-h environment. We set --hostonly because we
> need to keep the initrd image as small as possible.
The kdump.conf included in rhev-h is pretty stock, at least. The only real difference is that we're dumping to a LV instead of /var/crash, but that's never been a problem before (in EL6). We don't make a ton of changes.
> 
> The first problem is that dmsquash-live isn't included by default in "hostonly"
> mode. You can resolve this by editing /etc/kdump.conf as follows:
> 
> # vim /etc/kdump.conf
> dracut_args --add dmsquash-live
Unfortunately, to my knowledge, dmsquash-live and hostonly are mutually exclusive. Even when adding it to dracut_args (or dracut.conf), dracut (via mkdumprd) spits out messages indicating that it can't be found or included unless hostonly is removed.

If you're aware of a way to run in hostonly mode *and* include dmsquash-live, I would be happy to do that.

In theory, we could patch out module-setup for dmsquash-live or file an RFE allowing for this.
> 
> The next problem is the kernel hanging while unpacking the initramfs. Could you
> attach the console log? How do you configure crashkernel=X? How large is your
> custom kdump initrd?
It's 18M with --hostonly. 

18M if I remove the check for hostonly from dmsquash-live. 

33M without --hostonly.

We don't touch crashkernel, so it's likely the default for EL7.
> 
> If you remove '--hostonly', I'm afraid the initramfs will bloat and the
> kdump kernel may hang while unpacking the initramfs because it runs out of
> memory.
This looks like it's correct, though surprising.

Using an initramfs with --hostonly *and* with dmsquash-live (by patching the check) successfully unpacks the initramfs and works.

Thanks for the help.

Comment 30 Fabian Deutsch 2015-01-09 09:04:04 UTC
Just a note, we are setting crashkernel=128M on the kernel commandline.
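
As a rough, illustrative calculation only (not a statement about how much memory the kdump kernel actually needs), the compressed image sizes reported in comment 29 can be compared with the crashkernel=128M reservation noted above. The helper names to_mib and initrd_share are made up for this sketch, and the figures are the compressed sizes; the unpacked initramfs occupies more memory than these numbers suggest.

```shell
#!/bin/sh
# Hypothetical helper: convert a size such as 18M or 128M to MiB.
# Only the M suffix is handled, because it is the only unit used in this thread.
to_mib() {
    case $1 in
        *M) echo "${1%M}" ;;
        *) echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the percentage of the crashkernel reservation
# taken by the compressed image (integer division, rounded down).
initrd_share() {
    initrd=$(to_mib "$1") || return 1
    reserved=$(to_mib "$2") || return 1
    echo $(( initrd * 100 / reserved ))
}

initrd_share 18M 128M   # with --hostonly: prints 14
initrd_share 33M 128M   # without --hostonly: prints 25
```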

Comment 31 Ryan Barry 2015-01-09 17:20:38 UTC
Created attachment 978298 [details]
console output from failed kdump

Comment 33 haiyang,dong 2015-01-15 03:03:28 UTC
Test version:
rhev-hypervisor7-7.0-20150114.0.iso
ovirt-node-3.2.1-4.el7.noarch

Test steps:
1. Install rhev-hypervisor7-7.0-20150114.0.iso.
2. Configure the network.
3. Configure kdump using NFS/SSH/Local on the Kernel Dump page.
4. Check the kdump status.
5. Trigger a kernel dump with the command '$>echo c > /proc/sysrq-trigger' in a shell.

Test result:
After step 4, the kdump status was up.
After step 5, kdump worked well: the system rebooted automatically, and after a successful reboot the dump files were found in /data/core/ on the rhevh host.

So this bug has been fixed. After the status changes to "ON_QA", I will verify this bug.

Comment 34 haiyang,dong 2015-01-15 05:30:51 UTC
Per comment 33, clearing this needinfo.

Comment 35 haiyang,dong 2015-01-15 05:44:34 UTC
Correction to comment 33: test step 3 should read "Configure kdump by using Local under Kernel Dump Page"; it did not include the nfs/ssh types.

Comment 36 haiyang,dong 2015-01-28 09:32:52 UTC
Test version:
rhev-hypervisor7-7.0-20150123.2.iso
ovirt-node-3.2.1-6.el7.noarch

Test steps:
1. Install rhev-hypervisor7-7.0-20150123.2.iso.
2. Configure the network.
3. Configure kdump using Local on the Kernel Dump page.
4. Check the kdump status.
5. Trigger a kernel dump with the command '$>echo c > /proc/sysrq-trigger' in a shell.

Test result:
After step 4, the kdump status was up.
After step 5, kdump worked well: the system rebooted automatically, and after a successful reboot the dump files were found in /data/core/ on the rhevh host.

So this bug has been fixed; changed the status to "VERIFIED".
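
The last verification step (looking for dump files after the reboot) could be scripted roughly as in the sketch below. The helper names find_dumps and has_dumps are hypothetical, and the file name pattern vmcore* is an assumption, since the report only says "dump files"; adjust the pattern if the files are named differently. The crash trigger itself is deliberately left out of the script.

```shell
#!/bin/sh
# Hypothetical helper: list dump files below a directory.
# Assumption: dump files have names starting with "vmcore".
find_dumps() {
    find "$1" -type f -name 'vmcore*' 2>/dev/null
}

# Hypothetical helper: succeed if at least one dump file exists.
has_dumps() {
    [ -n "$(find_dumps "$1")" ]
}

# Post-reboot check against the local dump location used in this report.
if has_dumps /data/core; then
    echo "dump files present in /data/core"
else
    echo "no dump files in /data/core"
fi
```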

Comment 38 errata-xmlrpc 2015-02-11 20:56:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2015-0160.html