Bug 269281
| Field | Value |
|---|---|
| Summary | kdump to remote dump server doesn't transfer anything |
| Product | Red Hat Enterprise Linux 5 |
| Component | kexec-tools |
| Version | 5.0 |
| Hardware | i386 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | medium |
| Reporter | Maarten Broekman <maarten> |
| Assignee | Neil Horman <nhorman> |
| Fixed In Version | 2.6.18-8.1.8 |
| Doc Type | Bug Fix |
| Last Closed | 2007-09-04 17:16:48 UTC |

Description
Maarten Broekman
2007-08-30 20:56:47 UTC
Can you please send in:

a binary tcpdump taken from the server during the client's dump process
the kdump initrd you are using

Also, while I look those over, can you add this command:

    default shell

to the end of your /etc/kdump.conf file, restart the service, and crash the system? That should place you at a shell prompt in the initramfs after you try to transfer the vmcore to your ssh server. From there you can try to manually ssh over to the remote system and record any errors that you get in the attempt.

Thanks!

Created attachment 183461 [details]
initrd
Created attachment 183481 [details]
tcpdump
tcpdump from before the crash through the reboot and including a service kdump
propagate after the fact (just to show some traffic).
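
The kdump.conf change requested above (switching the failure action from `default reboot` to `default shell`) can be sketched as a small shell helper. This is only an illustration, not part of kexec-tools: the function name `set_kdump_default` is hypothetical, and it assumes GNU `sed` with `-i` is available on the production system.

```shell
# set_kdump_default FILE ACTION  (hypothetical helper, not shipped with kexec-tools)
# Rewrites an existing "default ..." directive in FILE, or appends one if none exists.
set_kdump_default() {
    conf="$1"
    action="$2"
    if grep -q '^default[[:space:]]' "$conf"; then
        # A directive is already present (e.g. "default reboot"): replace it in place.
        sed -i "s/^default[[:space:]].*/default $action/" "$conf"
    else
        # No directive yet: add it to the end of the file.
        printf 'default %s\n' "$action" >> "$conf"
    fi
}
```

On a test system this would be called as `set_kdump_default /etc/kdump.conf shell`, followed by `service kdump restart`. A crash can then usually be forced with `echo c > /proc/sysrq-trigger`, assuming sysrq is enabled.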
I changed 'default reboot' to 'default shell' and I'm trying to determine why I can't type anything when in the crash shell (HP iLO 2 console interface).

Seems like this is a problem with the iLO 2 console interface (case opened with HP on this). I set up a serial console port and I can get a shell prompt. If I try to ssh or ping my dump server, I get nothing.

    mapping eth0 to eth0
    route: resolving dev
    Saving to remote location netdump.40.81
    1+0 records in
    1+0 records out
    lost connection
    dropping to initramfs shell
    exiting this shell will reboot your system
    root:/> ping 10.105.40.81
    PING 10.105.40.81 (10.105.40.81): 56 data bytes

Nothing happens for several minutes and then the system reboots.

I tried this on a different system (HP BL25p G1 vs BL465c G1). The BL25p worked fine. It still shows the same "1+0 records in / out" messages, but then it actually starts transferring data (there's no indication on the console that it is transferring, however). The BL465c has multiple NICs and it appears that the problem may be related to which NIC gets picked for the transfer. How can I check / change that?

On the BL465c G1, I checked the network settings being used in the crash kernel and the settings used by the regular kernel. Both kernels are using the same network interface, but the regular kernel is able to talk over it while the crash kernel isn't.

Are you using the same NIC driver on both systems? IIRC we had a tg3 problem with some chip variants that caused problems in resetting the NIC when the module was re-inserted on a kdump boot. You may want to try booting with the RHEL5.1 beta kernel, as we incorporated a tg3 update to correct that problem.

I tried the -36.el5PAE kernel from the RHEL5 Beta channel (Red Hat Enterprise Linux (v. 5 for 32-bit x86) Beta). I downloaded the kernel, kernel-devel, kernel-headers, kernel-PAE, and kernel-PAE-devel RPMs. I was unable to get the corresponding debuginfo packages, as the links from RHN to the debuginfo site seemed to be incorrect. I installed all 5 RPMs and rebooted. The system booted fine. I double-checked that kdump was operational (it was). Checked kdump.conf. Crashed the system. The system panicked at this point. See new attachment.

Created attachment 183721 [details]
kernel-panic
kernel panic on crash after upgrading to 2.6.18-36.el5PAE
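
For the earlier question in this thread about whether both systems use the same NIC driver, one possible check is to read the driver symlink that sysfs exposes for an interface. The sketch below is an assumption-laden illustration: `nic_driver` is a hypothetical name, the optional second argument exists only so the function can be pointed at a test directory, and `readlink` may not be present in a minimal kdump initramfs shell. Where it is installed, `ethtool -i eth0` reports the same information.

```shell
# nic_driver IFACE [SYSFS_ROOT]  (hypothetical helper)
# Prints the kernel driver bound to IFACE, e.g. "tg3"; returns 1 if it cannot be determined.
nic_driver() {
    iface="$1"
    sysroot="${2:-/sys}"
    link="$sysroot/class/net/$iface/device/driver"
    # The "driver" entry is a symlink pointing at the driver's sysfs directory.
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}
```

Running `nic_driver eth0` on both the BL25p and the BL465c (and, if possible, from the crash shell) would show whether the same driver is in play in each case.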
Sorry, you need to add reset_devices to KEXEC_COMMANDLINE_APPEND in /etc/sysconfig/kdump.

It dumped over the network, but the system rebooted after 10 minutes, so it didn't copy the entire dump (still have a vmcore-incomplete).

That could be any number of things. Did you get an error message on the serial console prior to reboot? If so, what was it? If you didn't get any error and the system just seemed to spontaneously reboot, that could be an iLO issue. Do you normally use any health monitoring modules from HP? Or do you have any system activity monitor configured in iLO? It could be considering the system hung during the kdump period and winding up NMI-ing the box inappropriately. If you can disable iLO completely and use a plain serial console to test with, you should be able to confirm this.

As you suspected, ASR was enabled and that was the cause of the reboot. With ASR disabled, everything is working as expected now.

Looks like this is resolved with 5.1.
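
The `reset_devices` change suggested earlier in this thread can be sketched as an idempotent edit of a sysconfig-style file. This is a minimal sketch under assumptions: `append_kernel_arg` is a hypothetical helper, it expects the variable to be written as `NAME="value"` on a single line, and the variable name is passed in as a parameter. The thread calls it `KEXEC_COMMANDLINE_APPEND`, while stock files may name it `KDUMP_COMMANDLINE_APPEND`, so check which one your /etc/sysconfig/kdump actually uses.

```shell
# append_kernel_arg FILE VAR ARG  (hypothetical helper)
# Adds ARG to the double-quoted value of VAR in FILE unless it is already there.
append_kernel_arg() {
    file="$1"
    var="$2"
    arg="$3"
    # Extract the current value between the quotes (empty if VAR is not set in FILE).
    current=$(sed -n "s/^${var}=\"\(.*\)\"\$/\1/p" "$file")
    case " $current " in
        *" $arg "*) return 0 ;;   # already present, nothing to do
    esac
    if grep -q "^${var}=\"" "$file"; then
        # Append ARG inside the existing quoted value.
        sed -i "s/^${var}=\"\(.*\)\"\$/${var}=\"\1 ${arg}\"/" "$file"
    else
        # VAR is not defined in this form: add a new assignment at the end.
        printf '%s="%s"\n' "$var" "$arg" >> "$file"
    fi
}
```

After an edit such as `append_kernel_arg /etc/sysconfig/kdump KDUMP_COMMANDLINE_APPEND reset_devices`, the kdump service needs a restart so the crash kernel is reloaded with the new command line.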