Bug 684268

Summary: virtio_net: missing schedule on oom [rhel-6.0.z]
Product: Red Hat Enterprise Linux 6 Reporter: RHEL Program Management <pm-rhel>
Component: kernel Assignee: Frantisek Hrbata <fhrbata>
Status: CLOSED ERRATA QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: unspecified Docs Contact:
Priority: urgent    
Version: 6.0 CC: arozansk, chayang, dhoward, jasowang, khong, mst, ndai, plyons, pm-eus, tburke
Target Milestone: rc Keywords: Triaged, ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-2.6.32-71.21.1.el6 Doc Type: Bug Fix
Doc Text:
Intensive resource usage on a guest led to a networking failure on that guest: packets could no longer be received. The failure occurred when a DMA (Direct Memory Access) ring was consumed before NAPI (New API; an interface for networking devices that makes use of interrupt mitigation techniques) was enabled, which resulted in a failure to receive the next interrupt request. The regular interrupt handler was not affected in this situation because it can process packets in-place; however, the OOM (Out Of Memory) handler did not detect this situation, and networking failed. With this update, NAPI is scheduled after each napi_enable operation; thus, networking no longer fails under these circumstances.
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-04-08 02:59:20 UTC Type: ---
Bug Depends On: 676579    
Bug Blocks:    

Description RHEL Program Management 2011-03-11 16:20:48 UTC
This bug has been copied from bug #676579 and has been proposed
to be backported to 6.0 z-stream (EUS).

Comment 4 Chao Yang 2011-03-28 02:15:53 UTC
Reproduced on a RHEL 6.0 guest with kernel 2.6.32-71.el6.x86_64.
Steps:
1) Boot the guest with 512 MB of memory and a virtio-net device; ping a remote host to confirm that the network works.
2) Run netserver inside the guest.
3) On the host, launch 2000 netperf clients in the background to stress netserver:
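#!/bin/sh
# Stress netserver in the guest with 2000 concurrent netperf clients.
# guest_ip must be set in the environment to the guest's IP address.
ip=$guest_ip
i=0
while [ "$i" -lt 2000 ]
do
    # -H: target host, -l: test length in seconds
    netperf -H "$ip" -l 300 &
    i=$((i + 1))
    echo "launch Client-No.$i"
done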
4) Ping the guest.
Actual Result: the guest loses network connectivity and can no longer ping the remote host.
CLI:
/usr/libexec/qemu-kvm -M rhel6.0.0 -enable-kvm -m 512 -smp 2 -name rhel6.0 -uuid `uuidgen` -rtc base=localtime,clock=vm,driftfix=slew -no-kvm-pit-reinjection -boot c -drive file=/root/RHEL-Server-6.0-64.qcow2,if=none,id=drive-virtio-0-0,media=disk,format=qcow2,cache=none -device virtio-blk-pci,drive=drive-virtio-0-0,id=virt0-0-0 -netdev tap,id=hostnet1 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:40:01:31:e3 -usb -device usb-tablet,id=input1 -vnc :0 -monitor stdio -balloon none
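
The ping check in step 4 can be made repeatable with a small helper run on the host while the netperf clients are active. This is only a sketch and not part of the original test: the function name check_guest, the PING_CMD override, and the default of 3 probes are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether the guest still answers ICMP echo.
# PING_CMD is an assumed override hook so the probe command can be swapped
# out; it defaults to the system ping.
check_guest() {
    host=$1
    count=${2:-3}
    # -c: number of echo requests to send before ping exits
    if ${PING_CMD:-ping} -c "$count" "$host" >/dev/null 2>&1; then
        echo "guest $host reachable"
        return 0
    fi
    echo "guest $host NOT reachable"
    return 1
}
```

With guest_ip set as in the reproducer script, calling check_guest "$guest_ip" before and after the stress run gives a simple pass/fail signal for the actual result.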

-------------------------------------------------------------------------
Verified on guest kernel-2.6.32-71.23.1.el6.x86_64.rpm with the same steps and CLI as above: after stressing netserver, the network in the guest still works and the guest can ping the remote host.
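
Whether a given guest already carries the fix can be estimated by comparing its kernel release against the Fixed In Version field (kernel-2.6.32-71.21.1.el6). The following is a minimal sketch, assuming release strings of the form 2.6.32-<release>.el6.<arch>; the helper names are hypothetical, and this is not a general RPM version comparison.

```shell
#!/bin/sh
# Hypothetical helpers; they only understand RHEL 6.0 kernel release strings
# such as 2.6.32-71.23.1.el6.x86_64 (assumed format).

# Print the dotted numeric release, e.g. "71.23.1", or "71" for the GA kernel.
el6_release() {
    r=${1#kernel-}
    r=${r#2.6.32-}
    echo "${r%%.el6*}"
}

# Succeed if dotted number $1 >= $2, comparing field by field.
# A missing field counts as 0.
release_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}
        y=${b%%.*}
        [ "${x:-0}" -gt "${y:-0}" ] && return 0
        [ "${x:-0}" -lt "${y:-0}" ] && return 1
        # Drop the leading field from each side.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Succeed if the given kernel release is at or above 2.6.32-71.21.1.el6.
has_fix() {
    release_ge "$(el6_release "$1")" "71.21.1"
}
```

Inside the guest, has_fix "$(uname -r)" succeeds for the verified kernel 2.6.32-71.23.1.el6.x86_64 and fails for the kernel used in the reproduction, 2.6.32-71.el6.x86_64.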

Comment 5 errata-xmlrpc 2011-04-08 02:59:20 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0421.html

Comment 6 Martin Prpič 2011-04-12 12:49:27 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Intensive resource usage on a guest led to a networking failure on that guest: packets could no longer be received. The failure occurred when a DMA (Direct Memory Access) ring was consumed before NAPI (New API; an interface for networking devices that makes use of interrupt mitigation techniques) was enabled, which resulted in a failure to receive the next interrupt request. The regular interrupt handler was not affected in this situation because it can process packets in-place; however, the OOM (Out Of Memory) handler did not detect this situation, and networking failed. With this update, NAPI is scheduled after each napi_enable operation; thus, networking no longer fails under these circumstances.