Bug 738163

Summary: [kdump] be2net 0000:04:00.0: mccq poll timed out
Product: Red Hat Enterprise Linux 6
Reporter: Chao Ye <cye>
Component: kernel
Assignee: Mike Christie <mchristi>
Status: CLOSED ERRATA
QA Contact: Caspar Zhang <czhang>
Severity: high
Docs Contact:
Priority: high
Version: 6.2
CC: arozansk, be2net-dev, coughlan, czhang, ivecera, jayamohan.kallickal, jeder, laurie.barry, mchristi, qcai
Target Milestone: rc
Keywords: Regression
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-2.6.32-209.el6
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 744343 1020632
Environment:
Last Closed: 2011-12-06 14:11:54 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:    
Bug Blocks: 744343, 748554, 1020632    
Attachments:
Description                               Flags
Kdump with be2iscsi loaded                none
Kdump without be2iscsi                    none
Fix for kdump failure                     none
The patch that was submitted to kernel    none

Description Chao Ye 2011-09-14 08:08:09 UTC
Description of problem:
Kdump failed with be2net when using NFS as dump target:
========================================================
Scanning logical volumes 
  Reading all physical volumes.  This may take a while... 
  Found volume group "vg_hpbl465cg701" using metadata type lvm2 
Activating logical volumes 
  3 logical volume(s) in volume group "vg_hpbl465cg701" now active 
Free memory/Total memory (free %): 76872 / 119672 ( 64.2356 ) 
mapping eth0 to eth0 
be2net 0000:04:00.0: mccq poll timed out 
<=============================Hang

Version-Release number of selected component (if applicable):
RHEL6.2-20110907.1+kernel-2.6.32-196+kexec-tools-2.0.0-199

How reproducible:
100%

Steps to Reproduce:
1. Set up kdump
2. Configure the kdump target as NFS
3. Trigger a crash
  
Actual results:
Hang

Expected results:
vmcore saved

Additional info:
https://beaker.engineering.redhat.com/recipes/268566
https://beaker.engineering.redhat.com/recipes/268568
https://beaker.engineering.redhat.com/recipes/268288

Comment 3 Qian Cai 2011-09-21 11:37:23 UTC
From the developer: please check whether the following works as a workaround.
1) boot the system
2) rmmod be2iscsi
3) touch /etc/kdump.conf ; service kdump restart
4) perform crash
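
The steps above can be sketched as a small shell script. This is only a sketch, assuming a RHEL 6 host and root privileges. The comment does not say how the crash is performed, so the SysRq trigger used for step 4 is an assumption. The `DRY_RUN` variable and the function names `run` and `kdump_workaround` are hypothetical helpers, not part of kexec-tools. Touching /etc/kdump.conf is presumably there so that the kdump service regenerates its initrd on restart.

```shell
#!/bin/sh
# Sketch of the workaround from comment 3.
# DRY_RUN, run and kdump_workaround are illustrative names only.
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

kdump_workaround() {
    run rmmod be2iscsi                        # step 2: unload the iSCSI driver
    run touch /etc/kdump.conf                 # step 3: mark the config as changed
    run service kdump restart                 #         and restart the kdump service
    run sh -c 'echo c > /proc/sysrq-trigger'  # step 4: ASSUMED crash trigger (SysRq 'c')
}
```

Calling `kdump_workaround` with the default `DRY_RUN=1` only prints the four commands. Setting `DRY_RUN=0` runs them and crashes the machine at the last step.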

Comment 4 Ivan Vecera 2011-09-21 13:18:32 UTC
Created attachment 524195 [details]
Kdump with be2iscsi loaded

This is the case where the kdump image is created with be2iscsi loaded, so the kdump image also loads be2iscsi.

Result: Kernel hangs

Comment 5 Ivan Vecera 2011-09-21 13:20:22 UTC
Created attachment 524196 [details]
Kdump without be2iscsi

This is the case where the kdump image is created with be2iscsi unloaded, so the kdump image does not load be2iscsi.

Result: Image successfully created
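
To tell which of the two cases a given kdump image falls into, one could list the contents of the kdump initrd and look for the module. The sketch below makes two assumptions that are not stated in this report: the image lives at /boot/initrd-<kernel version>kdump.img, and it is a gzip-compressed cpio archive. The function names are hypothetical.

```shell
# Sketch: does an initrd file listing contain a given kernel module?
# listing_has_module reads a file listing (one path per line) on stdin.
listing_has_module() {
    if grep -q "/$1\.ko"; then
        echo "$1: present"
    else
        echo "$1: absent"
    fi
}

# Hypothetical wrapper. The default image path and the gzip+cpio format are
# ASSUMPTIONS about a RHEL 6 kdump initrd and may differ on other setups.
kdump_initrd_has_module() {
    img="${2:-/boot/initrd-$(uname -r)kdump.img}"
    zcat "$img" | cpio -t 2>/dev/null | listing_has_module "$1"
}
```

For example, `kdump_initrd_has_module be2iscsi` reports whether the image built for the running kernel would load be2iscsi in the kdump environment.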

Comment 6 Ivan Vecera 2011-09-21 13:28:57 UTC
CCing Mike, as the problem appears to be in be2iscsi.

Comment 7 Mike Christie 2011-09-21 20:50:19 UTC
Jay, this was supposed to be fixed with the patches from here https://bugzilla.redhat.com/show_bug.cgi?id=688076 right? But it looks like the kernel in this bz 196 has the patches (688076 indicates the patches went into 195).

Comment 8 jayamohank 2011-09-22 01:29:02 UTC
Yes and I had verified it works.

I am wondering if there can be anything to do with NFS because there is a kernel crash in be2iscsi and Redhat was able to save the vmcore

https://bugzilla.redhat.com/show_bug.cgi?id=738934

Comment 9 Ivan Vecera 2011-09-22 11:38:38 UTC
(In reply to comment #7)
> Jay, this was supposed to be fixed with the patches from here
> https://bugzilla.redhat.com/show_bug.cgi?id=688076 right? But it looks like the
> kernel in this bz 196 has the patches (688076 indicates the patches went into
> 195).

This is a kdump issue, not a kexec one. In the kexec case be2iscsi_shutdown is called, but in the kdump case it is not, so the patches in bug #688076 cannot fix this issue.

Comment 10 Ivan Vecera 2011-09-22 11:41:04 UTC
(In reply to comment #8)
> Yes and I had verified it works.
> 
> I am wondering if there can be anything to do with NFS because there is a
> kernel crash in be2iscsi and Redhat was able to save the vmcore
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=738934

I don't know, but the vmcore was probably saved through a different NIC...

Comment 11 Ivan Vecera 2011-09-22 14:26:33 UTC
After several rounds of testing I found that be2iscsi can be loaded in the kdump initrd, but it has to be loaded *BEFORE* be2net. When be2iscsi is loaded in the kdump environment it is in "crashdump mode" (according to the source code) and calls the function beiscsi_pci_soft_reset.
Jay, is it possible that after this reset the already initialized NIC part of the adapter becomes unresponsive?

Comment 12 jayamohank 2011-09-22 16:40:19 UTC
Great find. It is possible. I will look into the NIC part and get back to you.

Comment 13 jayamohank 2011-09-22 23:01:31 UTC
This is happening because I am resetting the whole chip in kdump mode, so be2net is left holding on to resources the chip no longer recognizes.

I have reworked the code to do a Function Level Reset, so only a particular function is reset. I am compiling it now; I will try it out and update.

Comment 14 jayamohank 2011-09-30 23:28:30 UTC
Created attachment 525853 [details]
Fix for kdump failure

This patch should fix the issue.

It now does a function-level reset instead of a chip reset when in crash dump mode.
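
As a side note on reset scope: this report does not say whether the patch relies on the PCIe Function Level Reset capability or on a firmware command that resets a single function. For the PCIe capability only, a rough check is to look for `FLReset+` in the Device Capabilities shown by `lspci -vv`. The sketch below assumes a pciutils version that prints that flag. The function names are hypothetical, and 0000:04:00.0 is the be2net function from the log above, since the address of the be2iscsi function is not given here.

```shell
# Sketch: report whether a PCI function advertises PCIe Function Level Reset.
# parse_flr reads `lspci -vv` text on stdin.
parse_flr() {
    # The DevCap section shows "FLReset+" when FLR is advertised, "FLReset-" otherwise
    if grep -q 'FLReset+'; then
        echo "supported"
    else
        echo "not advertised"
    fi
}

# Hypothetical wrapper. $1 is a PCI address such as 0000:04:00.0.
# Root is usually needed for lspci to show the full capability list.
has_flr() {
    lspci -s "$1" -vv | parse_flr
}
```

For example, `has_flr 0000:04:00.0` prints `supported` or `not advertised` for that function.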

Comment 15 Ivan Vecera 2011-10-03 13:28:41 UTC
Jay, good work... your patch fixes this issue. I'm reassigning this BZ to Mike as the patch is for be2iscsi and not be2net.

Comment 16 jayamohank 2011-10-06 00:53:08 UTC
Created attachment 526597 [details]
The patch that was submitted to kernel

This is the patch I submitted to the kernel list

Comment 20 Aristeu Rozanski 2011-10-13 15:21:57 UTC
Patch(es) available on kernel-2.6.32-209.el6

Comment 27 errata-xmlrpc 2011-12-06 14:11:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html