Bug 782850

Summary: Networking from VM stalls after a few hours of runtime
Product: Red Hat Enterprise Linux 6 Reporter: anna.fischer
Component: qemu-kvm Assignee: jason wang <jasowang>
Status: CLOSED WORKSFORME QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: unspecified    
Version: 6.0 CC: acathrow, bsarathy, chayang, juzhang, michen, mkenneth, mst, rhod, virt-maint, wjlhlq
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-10-18 09:39:59 UTC Type: ---
Regression: --- Mount Type: ---

Description anna.fischer 2012-01-18 16:50:36 UTC
Description of problem:
After my VM has been running for a few hours, or sometimes days, the network suddenly seems to hang. Inside the guest OS I can see the TX packet counters increasing, but on the host OS the RX packet counters on the VIF do not increase at all. This happens with both the e1000 and virtio-net NIC models. I'm using a standard tun/tap interface on the host, connected to a standard Linux bridge.
I think this must be an issue inside qemu-kvm, as it looks like packets are stuck in the QEMU I/O path. Alternatively, they could be stuck in the tun/tap driver.
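
The counter comparison described above can be sketched in Python. This is a minimal illustration, not part of the original report: it assumes a Linux system that exposes per-interface statistics under /sys/class/net, and the interface name "vnet0" in the usage note is a hypothetical name for the host tap device.

```python
from pathlib import Path


def read_packets(iface, counter, root="/sys/class/net"):
    """Return one packet counter (e.g. "rx_packets") for an interface.

    root is a parameter only so the function can be pointed at a test
    directory; on a real host the default sysfs path is used.
    """
    path = Path(root) / iface / "statistics" / counter
    return int(path.read_text().strip())
```

On the host, read_packets("vnet0", "rx_packets") would give the RX counter of the tap device; inside the guest, read_packets("eth0", "tx_packets") would give the matching TX counter. If the guest value keeps growing while the host value stays flat, that matches the symptom described here.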

Version-Release number of selected component (if applicable):
qemu-kvm-0.12.1.2
host kernel is 2.6.32-71.el6 (64-bit)
guest kernel is 2.6.32-71.29.1

How reproducible:
Run a virtual machine with a virtio-net or e1000 NIC for hours or a few days. It does not need to carry much network traffic. After some time, outgoing networking from the VM stalls: packets are still received by the VM, but it cannot send any.
We can consistently reproduce the issue.

Steps to Reproduce:
1. Run a virtual machine with a virtio-net or e1000 NIC for hours or a few days.
2. Monitor outgoing network traffic from VM.
3. After a while you should see the network hanging.
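
Since the stall can take days to appear, the monitoring in step 2 is easier to automate than to watch by hand. The following Python sketch is one possible way to flag the stall from periodic counter samples. The function name, the sample layout, and the threshold of three intervals are assumptions made for illustration, not anything taken from this report.

```python
def find_stall(samples, min_intervals=3):
    """Return the index of the sample at which a TX stall is confirmed.

    samples is a list of (guest_tx_packets, host_rx_packets) tuples taken
    at regular intervals. A stall is reported once the guest TX counter
    has increased while the host RX counter stayed flat for
    min_intervals consecutive intervals. Returns None if no stall is seen.
    """
    run = 0
    for i in range(1, len(samples)):
        tx_delta = samples[i][0] - samples[i - 1][0]
        rx_delta = samples[i][1] - samples[i - 1][1]
        if tx_delta > 0 and rx_delta == 0:
            run += 1
            if run >= min_intervals:
                return i
        else:
            # Either the guest sent nothing or the host saw traffic:
            # not the symptom we are looking for, so start over.
            run = 0
    return None
```

Requiring several consecutive flat intervals avoids false alarms when the guest is simply idle or a single sample is taken between a send and its arrival on the host side.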
  
Actual results:
Outgoing networking from VM stops working. Incoming traffic still works.

Expected results:
Outgoing networking should keep working. We are not even stressing the network much at all.

Additional info:

Comment 2 jason wang 2012-02-23 10:55:29 UTC
(In reply to comment #0)

Hi, in order to locate the cause, could you please try to test with the newest RHEL6 with vhost_net enabled?

Thanks
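
For anyone following along, a quick way to check whether the vhost_net module is loaded on the host is to look for it in /proc/modules. The Python sketch below is only an illustration: the function name is made up, and it assumes vhost_net is built as a loadable module (a built-in driver would not be listed in /proc/modules).

```python
def vhost_net_loaded(modules_text):
    """Return True if vhost_net appears in the given /proc/modules text.

    Each line of /proc/modules starts with the module name followed by
    whitespace-separated fields, so only the first field is compared.
    """
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "vhost_net":
            return True
    return False


def check_host(path="/proc/modules"):
    """Convenience wrapper that reads the real file on a Linux host."""
    with open(path) as f:
        return vhost_net_loaded(f.read())
```

A loaded module alone does not prove that a given guest uses it; the guest's tap backend also has to be started with vhost enabled.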

Comment 3 Ronen Hod 2012-03-12 07:02:06 UTC
No reply from the reporter. It is becoming too late for RHEL6.3, moving to 6.4.

Comment 4 ang639 2012-05-23 03:49:45 UTC
I am facing the same problem. My guess is that using virtio (net) and virtio-blk at the same time triggers it.
My environment:
Host PC: CentOS 6.2
Guest PC: CentOS 6.2, virtio (net), virtio-blk (disk)

I found another thread discussing the same problem:
http://bugs.centos.org/view.php?id=5526

Comment 5 ang639 2012-05-23 03:56:00 UTC
(In reply to comment #4)

We created almost 70 guest VMs, and the network failure occurred in 6 of them within two weeks.
We have two eth interfaces, and it only happens on one of them.
Among the affected VMs, some can only receive packets and some can only send packets. Restarting the network makes it work again.

Thanks.

Comment 6 Ronen Hod 2012-05-23 05:51:41 UTC
Dear ang639,

Thank you for taking the time to enter a bug report with us. We do appreciate the feedback and look to use reports such as this to guide our efforts at improving our products. That being said, this bug tracking system is not a mechanism for getting support, and as such we are not able to make any guarantees as to the timeliness or suitability of a resolution.
 
If this issue is critical or in any way time sensitive, please raise a ticket through your regular Red Hat support channels to make certain that it gets the proper attention and prioritization to assure a timely resolution. 
 
For information on how to contact the Red Hat production support team, please see:
https://www.redhat.com/support/process/production/#howto

Regards, Ronen.

Comment 13 ang639 2012-07-09 00:59:57 UTC
Using vhost resolves this problem.
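
To confirm that a running guest really uses vhost, one option is to inspect the QEMU command line for tap backends started with vhost=on or with a vhostfd= descriptor (libvirt usually passes the latter). The Python sketch below is an illustration under those assumptions. The function name and the example arguments are hypothetical, and it only recognizes the "on" spelling of the vhost option.

```python
def netdevs_using_vhost(argv):
    """Map each tap -netdev id in a QEMU argument list to True/False.

    True means the backend was given vhost=on or a vhostfd= option.
    Backends without an id option are reported under the key "?".
    """
    result = {}
    for flag, value in zip(argv, argv[1:]):
        if flag != "-netdev" or not value.startswith("tap"):
            continue
        # Options after the backend type are comma-separated key=value pairs.
        opts = dict(o.split("=", 1) for o in value.split(",")[1:] if "=" in o)
        result[opts.get("id", "?")] = (
            opts.get("vhost") == "on" or "vhostfd" in opts
        )
    return result
```

On a Linux host the argument list of a running process can be obtained by reading /proc/<pid>/cmdline and splitting it on NUL bytes.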