Bug 738163 - [kdump] be2net 0000:04:00.0: mccq poll timed out
Summary: [kdump] be2net 0000:04:00.0: mccq poll timed out
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Mike Christie
QA Contact: Caspar Zhang
URL:
Whiteboard:
Depends On:
Blocks: 744343 748554 1020632
 
Reported: 2011-09-14 08:08 UTC by Chao Ye
Modified: 2018-11-28 20:09 UTC

Fixed In Version: kernel-2.6.32-209.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 744343 1020632 (view as bug list)
Environment:
Last Closed: 2011-12-06 14:11:54 UTC


Attachments
Kdump with be2iscsi loaded (20.40 KB, text/plain)
2011-09-21 13:18 UTC, Ivan Vecera
Kdump without be2iscsi (19.17 KB, text/plain)
2011-09-21 13:20 UTC, Ivan Vecera
Fix for kdump failure (2.76 KB, patch)
2011-09-30 23:28 UTC, jayamohank
The patch that was submitted to kernel (2.76 KB, patch)
2011-10-06 00:53 UTC, jayamohank


Links
System: Red Hat Product Errata
ID: RHSA-2011:1530
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update
Last Updated: 2011-12-06 01:45:35 UTC

Description Chao Ye 2011-09-14 08:08:09 UTC
Description of problem:
Kdump fails with be2net when NFS is used as the dump target:
========================================================
Scanning logical volumes 
  Reading all physical volumes.  This may take a while... 
  Found volume group "vg_hpbl465cg701" using metadata type lvm2 
Activating logical volumes 
  3 logical volume(s) in volume group "vg_hpbl465cg701" now active 
Free memory/Total memory (free %): 76872 / 119672 ( 64.2356 ) 
mapping eth0 to eth0 
be2net 0000:04:00.0: mccq poll timed out 
<=============================Hang

Version-Release number of selected component (if applicable):
RHEL6.2-20110907.1+kernel-2.6.32-196+kexec-tools-2.0.0-199

How reproducible:
100%

Steps to Reproduce:
1. Set up kdump
2. Configure the kdump target as NFS
3. Trigger a crash
  
Actual results:
Hang

Expected results:
vmcore saved

Additional info:
https://beaker.engineering.redhat.com/recipes/268566
https://beaker.engineering.redhat.com/recipes/268568
https://beaker.engineering.redhat.com/recipes/268288

Comment 3 Qian Cai 2011-09-21 11:37:23 UTC
From the developer: please check whether the following works as a workaround.
1) boot the system
2) rmmod be2iscsi
3) touch /etc/kdump.conf ; service kdump restart
4) perform crash
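
The steps above could be scripted roughly as follows. This is a minimal sketch, not something posted in this bug: the `run` helper and the `DRY_RUN` variable are made up for illustration, and `echo c > /proc/sysrq-trigger` is only an assumed way to "perform crash" (the comment does not say which method was used). By default the script just prints the commands.

```shell
#!/bin/sh
# Sketch of the workaround steps. Defaults to a dry run that only
# prints the commands; set DRY_RUN=0 (as root) to actually run them.
DRY_RUN="${DRY_RUN:-1}"

# run: hypothetical helper; prints the command and executes it only if DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$*"
    fi
}

workaround() {
    # Step 2: unload be2iscsi so that it is not loaded when the kdump image is built.
    run "rmmod be2iscsi"
    # Step 3: touch the config and restart the kdump service
    # (presumably so that the kdump initrd is regenerated).
    run "touch /etc/kdump.conf"
    run "service kdump restart"
    # Step 4: crash the kernel (assumed method: magic SysRq 'c').
    run "echo c > /proc/sysrq-trigger"
}

workaround
```

Running it with `DRY_RUN=0` will crash the machine at the last step, so it is only meant for a test system that already has kdump configured.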

Comment 4 Ivan Vecera 2011-09-21 13:18:32 UTC
Created attachment 524195 [details]
Kdump with be2iscsi loaded

This is the situation where the kdump image is created with be2iscsi loaded, so the kdump image also loads be2iscsi.

Result: Kernel hangs

Comment 5 Ivan Vecera 2011-09-21 13:20:22 UTC
Created attachment 524196 [details]
Kdump without be2iscsi

This is the situation where the kdump image is created with be2iscsi unloaded, so the kdump image does not load be2iscsi.

Result: Image successfully created

Comment 6 Ivan Vecera 2011-09-21 13:28:57 UTC
CCing Mike, as the problem appears to be in be2iscsi.

Comment 7 Mike Christie 2011-09-21 20:50:19 UTC
Jay, this was supposed to be fixed by the patches from https://bugzilla.redhat.com/show_bug.cgi?id=688076, right? But it looks like the kernel in this BZ (-196) already has the patches (688076 indicates the patches went into -195).

Comment 8 jayamohank 2011-09-22 01:29:02 UTC
Yes, and I had verified that it works.

I am wondering whether this has something to do with NFS, because there is a kernel crash in be2iscsi for which Red Hat was able to save the vmcore:

https://bugzilla.redhat.com/show_bug.cgi?id=738934

Comment 9 Ivan Vecera 2011-09-22 11:38:38 UTC
(In reply to comment #7)
> Jay, this was supposed to be fixed with the patches from here
> https://bugzilla.redhat.com/show_bug.cgi?id=688076 right? But it looks like the
> kernel in this bz 196 has the patches (688076 indicates the patches went into
> 195).

This is a kdump issue, not a kexec one. In the kexec case be2iscsi_shutdown is called, but in the kdump case it is not, so the patches in bug #688076 cannot fix this issue.

Comment 10 Ivan Vecera 2011-09-22 11:41:04 UTC
(In reply to comment #8)
> Yes and I had verified it works.
> 
> I am wondering if there can be anything to do with NFS because there is a
> kernel crash in be2iscsi and Redhat was able to save the vmcore
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=738934

I don't know, but the vmcore was probably saved through a different NIC...

Comment 11 Ivan Vecera 2011-09-22 14:26:33 UTC
After several rounds of testing I found that be2iscsi can be loaded in the kdump initrd, but it has to be loaded *BEFORE* be2net. When be2iscsi is loaded in the kdump environment it is in "crashdump mode" (according to the source code) and calls the function beiscsi_pci_soft_reset.
Jay, is it possible that after this reset the already initialized NIC part of the adapter becomes unresponsive?

Comment 12 jayamohank 2011-09-22 16:40:19 UTC
Great find. It is possible; I will look into the NIC part and get back to you.

Comment 13 jayamohank 2011-09-22 23:01:31 UTC
This is happening because I am resetting the chip in kdump mode, so be2net is left holding on to resources the chip no longer recognizes.

I have reworked the code to do a Function Level Reset, so only a particular function is reset. I am compiling it now; I will try it out and update.

Comment 14 jayamohank 2011-09-30 23:28:30 UTC
Created attachment 525853 [details]
Fix for kdump failure

This patch should fix the issue.

It now does a function level reset instead of a chip reset when in crash dump mode.

Comment 15 Ivan Vecera 2011-10-03 13:28:41 UTC
Jay, good work... your patch fixes this issue. I'm reassigning this BZ to Mike as the patch is for be2iscsi and not be2net.

Comment 16 jayamohank 2011-10-06 00:53:08 UTC
Created attachment 526597 [details]
The patch that was submitted to kernel

This is the patch I submitted to the kernel list

Comment 20 Aristeu Rozanski 2011-10-13 15:21:57 UTC
Patch(es) available on kernel-2.6.32-209.el6
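
To check whether a given machine is already running a build at or past that one, something like the following could be used. It is a rough sketch under stated assumptions: `has_fix` is a hypothetical helper, it only looks at the build number after the first dash in `uname -r`, and it treats any kernel outside the 2.6.32 series as "unknown" instead of trying to judge it.

```shell
#!/bin/sh
# has_fix: succeeds if a kernel release string is a 2.6.32 build
# numbered 209 or higher. Hypothetical helper for illustration only.
has_fix() {
    kver="$1"
    base="${kver%%-*}"      # e.g. "2.6.32"
    rel="${kver#*-}"        # e.g. "196.el6.x86_64"
    build="${rel%%.*}"      # e.g. "196"
    # Not the kernel series this bug is about: report "unknown".
    [ "$base" = "2.6.32" ] || return 2
    case "$build" in
        ''|*[!0-9]*) return 2 ;;   # unexpected release format
    esac
    [ "$build" -ge 209 ]
}

if has_fix "$(uname -r)"; then
    echo "running kernel is at or past build 209"
else
    echo "running kernel predates build 209 or is not a 2.6.32 build"
fi
```

The build number alone says nothing about kernels rebuilt by third parties, so the advisory's own package list remains the authoritative reference.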

Comment 27 errata-xmlrpc 2011-12-06 14:11:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2011-1530.html

