Bug 758705 - kdump over network hangs
Summary: kdump over network hangs
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 6.2
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 696653
Reported: 2011-11-30 14:34 UTC by Martin Wilck
Modified: 2011-12-19 09:36 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-19 09:36:05 UTC
Target Upstream Version:


Attachments
sosreport (1.86 MB, application/x-xz)
2011-11-30 14:35 UTC, Martin Wilck
serial log of kdump attempt (80.21 KB, text/plain)
2011-11-30 14:37 UTC, Martin Wilck

Description Martin Wilck 2011-11-30 14:34:19 UTC
Description of problem:
On a PRIMERGY RX600S4, we observe that kdump over NFS fails.

Version-Release number of selected component (if applicable):

kernel 2.6.32-220.el6
kexec-tools-2.0.0-209.el6

How reproducible:
always

Steps to Reproduce:
1. Configure kdump to dump to an NFS target
2. Trigger a crash dump via Alt-SysRq
  
Actual results:
See description


Expected results:
kdump successfully written


Additional info:
kdump on local disk succeeds

Comment 1 Martin Wilck 2011-11-30 14:35:00 UTC
Created attachment 538584
sosreport

Comment 2 Martin Wilck 2011-11-30 14:37:58 UTC
Created attachment 538589
serial log of kdump attempt

You can see that the last messages displayed are from FS-Cache and USB devices.

FS-Cache: Netfs 'nfs' registered for caching
input: Avocent FSC A3C40047297 as /devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.0/input/input5
generic-usb 0003:0624:0327.0003: input,hidraw2: USB HID v1.10 Keyboard [Avocent FSC A3C40047297] on usb-0000:00:1d.1-1/input0
input: Avocent FSC A3C40047297 as /devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.1/input/input6
generic-usb 0003:0624:0327.0004: input,hidraw3: USB HID v1.10 Mouse [Avocent FSC A3C40047297] on usb-0000:00:1d.1-1/input1

Comment 4 Cong Wang 2011-12-01 08:12:00 UTC
Hello,

I have two questions:
1. In the first kernel's log, which messages follow the ones you quoted?
2. Does this only happen when you dump over NFS?

Comment 5 Martin Wilck 2011-12-01 14:42:01 UTC
1. See dmesg file in sosreport:

input: Avocent FSC A3C40047297 as /devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.1/input/input6
generic-usb 0003:0624:0327.0004: input,hidraw3: USB HID v1.10 Mouse [Avocent FSC A3C40047297] on usb-0000:00:1d.1-1/input1
  alloc irq_desc for 40 on node -1
  alloc kstat_irqs on node -1
lpfc 0000:64:00.0: irq 40 for MSI/MSI-X
ata_piix 0000:00:1f.2: version 2.13
ata_piix 0000:00:1f.2: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ata_piix 0000:00:1f.2: setting latency timer to 64
scsi3 : ata_piix
scsi4 : ata_piix
ata1: SATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0x1880 irq 14
ata2: SATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0x1888 irq 15
lpfc 0000:64:00.0: 0:1303 Link Up Event x1 received Data: x1 x1 x8 x2 x0 x0 0
lpfc 0000:64:00.0: 0:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
lpfc 0000:64:00.0: 0:(0):2858 FLOGI failure Status:x3/x18 TMO:x0

2. Yes, so I'm told (dumping over ssh still needs to be checked; dumping to local disk works fine).

I was just told that the NFS dump worked on another similar machine. The difference between the "good" and "bad" case is that in the "bad" case, a number of additional controllers were in the system:

1 Emulex LPe 1150
1 LSI SAS 8880 EM2
1 Intel Pro 1000 PT Quad Port
1 Intel 10 GB XF SR LAN Controller

That, together with the information under 1. above, makes me think that lpfc may be involved. Another possible cause is the presence of the additional LAN controllers, which may be causing confusion about which LAN interface to use.
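
The lookup done under 1. above (find the last message the kdump kernel printed before hanging, then read what the first kernel printed right after it) can be sketched in Python. This is only an illustration, not a tool used in this bug: the function name messages_after, the SAMPLE constant, and the substring match are assumptions, and the sample lines are copied from the dmesg excerpt in this comment.

```python
def messages_after(first_kernel_log, last_seen, count=5):
    """Return up to `count` log lines that follow the first line containing `last_seen`.

    `first_kernel_log` is the text of a successful boot log (e.g. dmesg from the
    first kernel); `last_seen` is a substring of the last message printed by the
    hanging kdump kernel. Hypothetical helper, for illustration only.
    """
    # Drop blank lines and surrounding whitespace so indented lines compare cleanly.
    lines = [line.strip() for line in first_kernel_log.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        if last_seen in line:
            return lines[index + 1:index + 1 + count]
    # The marker message was not found in the reference log.
    return []


# Lines copied from the dmesg excerpt quoted in this comment.
SAMPLE = """\
generic-usb 0003:0624:0327.0004: input,hidraw3: USB HID v1.10 Mouse [Avocent FSC A3C40047297] on usb-0000:00:1d.1-1/input1
  alloc irq_desc for 40 on node -1
  alloc kstat_irqs on node -1
lpfc 0000:64:00.0: irq 40 for MSI/MSI-X
ata_piix 0000:00:1f.2: version 2.13
"""
```

Calling messages_after(SAMPLE, "USB HID v1.10 Mouse", count=3) returns the two "alloc" lines and the lpfc MSI/MSI-X line, which is why lpfc is a suspect here. The driver initialized next in a normal boot is only a hint, though, not proof that it caused the hang.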

Comment 8 Martin Wilck 2011-12-05 11:00:26 UTC
The problem also occurs with kdump over ssh.

Comment 9 Martin Wilck 2011-12-09 17:59:40 UTC
I have made several kdump attempts on the machine said to be affected, and all worked fine. Hang on.

Comment 10 Jeff Layton 2011-12-14 12:22:18 UTC
Given that this also occurs with kdump over ssh, I'm going to declare this "not an NFS bug" and reset the owner back to kernel-mgr.

Also cc'ing Rob Evers since he seems to have done some work recently on the lpfc driver.

Comment 11 Martin Wilck 2011-12-19 09:36:05 UTC
The problem isn't reproducible any more. I am sorry for bothering you.

