Bug 719820

Summary: Netdump sometimes hangs up on RHEL4.8 guest which uses e1000 emulation device.
Product: Red Hat Enterprise Linux 4 Reporter: Mark Wu <dwu>
Component: kernel Assignee: jason wang <jasowang>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high Docs Contact:
Priority: urgent    
Version: 4.9 CC: atzhang, bsarathy, byount, dhoward, fan-wxa, jpallich, juzhang, kzhang, mkenneth, mkhusid, moshiro, myamazak, nmurray, rhod, sforsber, tburke, virt-maint, yoguma
Target Milestone: rc   
Target Release: 4.9   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: kernel-2.6.9-104.EL Doc Type: Bug Fix
Doc Text:
A race condition between the e1000 irq handler and e1000 netpoll handler could lead to a deadlock while executing the netdump utility. The disable_irq_nosync() function is now called to eliminate the potential deadlock situation.
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-05-29 10:32:00 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 658636, 721216, 724925, 727267, 747123, 756082    
Attachments:
Description                                                 Flags
tcpdump captured from the tap interface of guest on host    none
strace log of qemu-kvm                                      none

Comment 20 RHEL Program Management 2011-12-27 05:29:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 21 jason wang 2012-01-09 03:25:24 UTC
*** Bug 724925 has been marked as a duplicate of this bug. ***

Comment 28 Weibing Zhang 2012-03-16 15:48:36 UTC
Reproduced on RHEL6.1 with kernel-2.6.32-131.el6.

[root@hp-dl585g7-02 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 4     test2                          running
 5     test                           running
[root@localhost ~]# ethtool -i eth0
driver: e1000
version: 7.3.20-k3-NAPI
firmware-version: N/A
bus-info: 0000:00:03.0

Server Configuration:
Verify that the netdump server is installed.
Change the password for the "netdump" user: passwd netdump
Enable the netdump server: chkconfig netdump-server on
Start the netdump server: service netdump-server start
Client Configuration:
Verify that the netdump client is installed.
Edit /etc/sysconfig/netdump and add the following line:
NETDUMPADDR=10.66.86.164 
Enter the following command and give the netdump password when prompted: service netdump propagate
Enable the netdump client: chkconfig netdump on
Start the netdump client: service netdump start
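
The NETDUMPADDR edit in the client steps above could be scripted roughly as follows. This is only a sketch: set_netdumpaddr is a hypothetical helper name (it is not part of the netdump package), and it assumes the file uses plain KEY=value lines.

```shell
#!/bin/sh
# Sketch only: set_netdumpaddr is a hypothetical helper, not shipped with netdump.
# Usage: set_netdumpaddr <config-file> <server-ip>
set_netdumpaddr() {
    conf_file="$1"
    server_ip="$2"
    if grep -q '^NETDUMPADDR=' "$conf_file" 2>/dev/null; then
        # An assignment already exists: rewrite it via a temporary file.
        tmp_file="$conf_file.tmp.$$"
        sed "s/^NETDUMPADDR=.*/NETDUMPADDR=$server_ip/" "$conf_file" > "$tmp_file" &&
            mv "$tmp_file" "$conf_file"
    else
        # No assignment yet: append one.
        printf 'NETDUMPADDR=%s\n' "$server_ip" >> "$conf_file"
    fi
}
```

On the client this would be called as: set_netdumpaddr /etc/sysconfig/netdump 10.66.86.164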

Add "echo c > /proc/sysrq-trigger" to /etc/rc.local so that the guest "test" crashes and reboots in a loop.
Got 1 hang in 80 runs, and the vmcore was not collected as expected.
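
A minimal sketch of how the trigger line could be added to rc.local without duplicating it on repeated setup. The helper name add_crash_trigger is hypothetical and the file path is passed as an argument; it is not necessarily how the line was added for this test.

```shell
#!/bin/sh
# Sketch only: add_crash_trigger is a hypothetical helper name.
# Usage: add_crash_trigger <rc-file>
add_crash_trigger() {
    rc_file="$1"
    trigger='echo c > /proc/sysrq-trigger'
    # -x matches the whole line, -F treats the trigger as a fixed string.
    if ! grep -qxF "$trigger" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$trigger" >> "$rc_file"
    fi
}
```

On the guest this would be called as: add_crash_trigger /etc/rc.local
Note that the guest then crashes on every boot until the line is removed again.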

set qa_ack+.

Comment 31 Eliska Slobodova 2012-05-22 16:35:49 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A race condition between the e1000 irq handler and e1000 netpoll handler could lead to a deadlock while executing the netdump utility. The disable_irq_nosync() function is now called to eliminate the potential deadlock situation.

Comment 34 errata-xmlrpc 2012-05-29 10:32:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0695.html