Bug 758705

Summary: kdump over network hangs
Product: Red Hat Enterprise Linux 6
Reporter: Martin Wilck <martin.wilck>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED NOTABUG
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: unspecified
Version: 6.2
CC: amwang, gasmith, jlayton, revers, rwheeler, steved
Target Milestone: rc
Target Release: 6.2
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-12-19 09:36:05 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 696653
Attachments:
Description                    Flags
sosreport                      none
serial log of kdump attempt    none

Description Martin Wilck 2011-11-30 14:34:19 UTC
Description of problem:
On a PRIMERGY RX600S4, we observe that kdump over NFS fails.

Version-Release number of selected component (if applicable):

kernel 2.6.32-220.el6
kexec-tools-2.0.0-209.el6

How reproducible:
always

Steps to Reproduce:
1. Configure kdump to dump via NFS.
2. Trigger a crash dump via Alt-SysRq.
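
The two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the reporter's actual setup. The server name and export path are hypothetical. The `net` directive is assumed to be the way kexec-tools of this RHEL 6 era names a network dump target (later releases use separate `nfs` and `ssh` directives). The `core_collector` line is a typical makedumpfile invocation, not taken from the report. Writing `c` to /proc/sysrq-trigger is the software equivalent of the Alt-SysRq crash key.

```python
from pathlib import Path


def render_kdump_conf(nfs_target: str, path: str = "/var/crash") -> str:
    """Return kdump.conf text that sends the vmcore to an NFS export.

    nfs_target has the form "host:/export". The `net` directive is an
    assumption about the kexec-tools version in use (see lead-in).
    """
    lines = [
        f"net {nfs_target}",
        f"path {path}",
        # Typical collector settings (assumption): zlib compression,
        # dump level 31 to skip pages that are not needed for analysis.
        "core_collector makedumpfile -c --message-level 1 -d 31",
    ]
    return "\n".join(lines) + "\n"


def trigger_crash(sysrq_trigger: Path = Path("/proc/sysrq-trigger")) -> None:
    """Crash the running kernel immediately. Requires root. Not called here."""
    sysrq_trigger.write_text("c")


if __name__ == "__main__":
    # Hypothetical server and export; this only prints the config text.
    print(render_kdump_conf("nfs.example.com:/export/crash"), end="")
```

The rendered text would go into /etc/kdump.conf, followed by a restart of the kdump service so that the kdump initrd is rebuilt. `trigger_crash()` is deliberately never called by the script, because it takes the machine down at once.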
  
Actual results:
kdump hangs and no dump is written (see description).


Expected results:
The kdump is written successfully.


Additional info:
kdump to a local disk succeeds.

Comment 1 Martin Wilck 2011-11-30 14:35:00 UTC
Created attachment 538584
sosreport

Comment 2 Martin Wilck 2011-11-30 14:37:58 UTC
Created attachment 538589
serial log of kdump attempt

You can see that the last messages displayed are from FS-Cache and USB devices.

FS-Cache: Netfs 'nfs' registered for caching
input: Avocent FSC A3C40047297 as /devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.0/input/input5
generic-usb 0003:0624:0327.0003: input,hidraw2: USB HID v1.10 Keyboard [Avocent FSC A3C40047297] on usb-0000:00:1d.1-1/input0
input: Avocent FSC A3C40047297 as /devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.1/input/input6
generic-usb 0003:0624:0327.0004: input,hidraw3: USB HID v1.10 Mouse [Avocent FSC A3C40047297] on usb-0000:00:1d.1-1/input1

Comment 4 Cong Wang 2011-12-01 08:12:00 UTC
Hello,

I have two questions:
1. Which messages follow these in the first kernel's log?
2. Does this only happen when you dump over NFS?

Comment 5 Martin Wilck 2011-12-01 14:42:01 UTC
1. See dmesg file in sosreport:

input: Avocent FSC A3C40047297 as /devices/pci0000:00/0000:00:1d.1/usb3/3-1/3-1:1.1/input/input6
generic-usb 0003:0624:0327.0004: input,hidraw3: USB HID v1.10 Mouse [Avocent FSC A3C40047297] on usb-0000:00:1d.1-1/input1
  alloc irq_desc for 40 on node -1
  alloc kstat_irqs on node -1
lpfc 0000:64:00.0: irq 40 for MSI/MSI-X
ata_piix 0000:00:1f.2: version 2.13
ata_piix 0000:00:1f.2: PCI INT A -> GSI 17 (level, low) -> IRQ 17
ata_piix 0000:00:1f.2: MAP [ P0 P2 P1 P3 ]
ata_piix 0000:00:1f.2: setting latency timer to 64
scsi3 : ata_piix
scsi4 : ata_piix
ata1: SATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0x1880 irq 14
ata2: SATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0x1888 irq 15
lpfc 0000:64:00.0: 0:1303 Link Up Event x1 received Data: x1 x1 x8 x2 x0 x0 0
lpfc 0000:64:00.0: 0:(0):2858 FLOGI failure Status:x3/x18 TMO:x0
lpfc 0000:64:00.0: 0:(0):2858 FLOGI failure Status:x3/x18 TMO:x0

2. Yes, so I'm told (ssh is still to be clarified; dumping to a local disk works fine).

I was just told that the NFS dump worked on another similar machine. The difference between the "good" and "bad" case is that in the "bad" case, a number of additional controllers were in the system:

1 Emulex LPe 1150
1 LSI SAS 8880 EM2
1 Intel Pro 1000 PT Quad Port
1 Intel 10 GB XF SR LAN Controller

That, together with the information under 1. above, makes me think that lpfc may be involved. Another possible cause is the additional LAN controllers, which may be causing confusion about which LAN interface to use.
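
The lpfc suspicion rests on the FLOGI failure lines in the dmesg excerpt. A rough way to check a log for such lines is sketched below in Python. The function name and the plain "fail" substring match are assumptions made for illustration, not part of any existing tool, and the sample lines are copied from the excerpt above.

```python
from collections import Counter


def failures_by_driver(lines):
    """Count log lines containing "fail", keyed by the line's first token.

    The first whitespace-separated token is treated as the driver name,
    which matches lines such as "lpfc 0000:64:00.0: ...".
    """
    counts = Counter()
    for line in lines:
        text = line.strip()
        if "fail" not in text.lower():
            continue
        driver = text.split()[0].rstrip(":")
        counts[driver] += 1
    return counts


# Lines taken from the dmesg excerpt in this comment.
SAMPLE = [
    "lpfc 0000:64:00.0: irq 40 for MSI/MSI-X",
    "ata_piix 0000:00:1f.2: PCI INT A -> GSI 17 (level, low) -> IRQ 17",
    "lpfc 0000:64:00.0: 0:1303 Link Up Event x1 received Data: x1 x1 x8 x2 x0 x0 0",
    "lpfc 0000:64:00.0: 0:(0):2858 FLOGI failure Status:x3/x18 TMO:x0",
    "lpfc 0000:64:00.0: 0:(0):2858 FLOGI failure Status:x3/x18 TMO:x0",
]

if __name__ == "__main__":
    print(dict(failures_by_driver(SAMPLE)))  # prints {'lpfc': 2}
```

A count like this only shows which driver logged failures. It does not show that the driver caused the hang.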

Comment 8 Martin Wilck 2011-12-05 11:00:26 UTC
The problem also occurs with kdump over ssh.

Comment 9 Martin Wilck 2011-12-09 17:59:40 UTC
I have made several kdump attempts on the machine that is said to be affected, and all worked fine. Please hold on.

Comment 10 Jeff Layton 2011-12-14 12:22:18 UTC
Given that this also occurs with kdump over ssh, I'm going to declare this "not an NFS bug" and reset the owner back to kernel-mgr.

Also cc'ing Rob Evers since he seems to have done some work recently on the lpfc driver.

Comment 11 Martin Wilck 2011-12-19 09:36:05 UTC
The problem isn't reproducible any more. I am sorry for bothering you.