Summary: Lost the network in a KVM VM on top of 5.4
Product: Red Hat Enterprise Linux 5
Reporter: Herbert Xu <herbert.xu>
Component: kernel
Assignee: Herbert Xu <herbert.xu>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Version: 5.4
CC: bruno.cornec, cward, david.jericho, herbert.xu, jean-marc.andre, khong, llim, markmc, mwagner, nsprei, orenault, riek, syeghiay, tburke, todayyang, virt-maint, ykaul
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
: 589766 589897 (view as bug list)
Environment:
Last Closed: 2010-03-30 07:15:57 UTC
Type: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 524651
Bug Blocks: 528898, 589766, 589897
Description Herbert Xu 2010-01-10 11:26:38 UTC
+++ This bug was initially created as a clone of Bug #524651 +++

Created an attachment (id=361965) SOSreport Hyperviseur

Description of problem:
HP is trying their application (OCMP) within a 5.4 VM using KVM. The application uses UDP as the network protocol. They are using virtio for the network driver. Unfortunately, after a while (e.g. 10 min) when the network is heavily stressed, the network drops. There are no logs available within the VM or the hypervisor. If you try to restart the network, it will fail. The only way to get the network back is to remove the module (virtio_net) and reload it.

Version-Release number of selected component (if applicable):
The hypervisor is RHEL 5.4, fully updated. The VM is RHEL 5.3 (we have tried both the 5.4 and 5.3 kernels and got the same behaviour).

How reproducible:
Always

Steps to Reproduce:
1. Start the VM
2. Stress test the VM and wait
3.

Actual results:
The network stops.

Expected results:
The network should be able to carry on.

Additional info:
I have grabbed an sosreport for the hypervisor and the VM. The hypervisor is tmobilehv1, the VM is tmobileocmp1.

--- Additional comment from firstname.lastname@example.org on 2009-09-21 11:49:17 EDT ---

Created an attachment (id=361966) SOSreport VM

--- Additional comment from email@example.com on 2009-09-21 12:19:21 EDT ---

Here's the similar report from upstream: http://firstname.lastname@example.org/msg06774.html

The closest I got to figuring it out was here: http://email@example.com/msg07006.html

Does doing this in the guest fix the issue too?

$> ip link set eth0 down
$> ip link set eth0 up

--- Additional comment from firstname.lastname@example.org on 2009-09-22 10:08:40 EDT ---

We tried /etc/init.d/network restart without effect. However, rmmod virtio_net and then restarting the network works. tcpdump in the guest shows nothing. We also have overruns on the interface. We will try your command to see what happens when the driver is hung.
It really seems the virtio_net driver is completely out of order at that moment. We are trying to reproduce the issue in a less application-dependent context. If you want us to try a debug kernel or a newer one, let us know.

--- Additional comment from email@example.com on 2009-09-22 13:18:03 EDT ---

We moved the virtual machines to another server and installed a 4-port Intel Gigabit network card (e1000e module). Still using the virtio network driver, global network performance seems really better and network bandwidth more stable. That was not the case with the previous setup using the bnx2x module (a burst of packets one second and almost nothing the next). But we still lose network connectivity with the VM, even though the load was far greater and the test lasted almost an hour this time.

On the failed VM, the behavior is also different: it is not possible to ping the VM, but multicast and broadcast packets are received. I also tried to ping an external server. I received "Destination Host unreachable" for some time and then a very strange message: "ping: sendmsg: No buffer space available"

The suggested commands did not restore network connectivity:

$> ip link set eth0 down
$> ip link set eth0 up

--- Additional comment from firstname.lastname@example.org on 2009-09-29 13:22:08 EDT ---

Created an attachment (id=363042) strace ping command

The result of an strace command when a VM's network hangs. If I increase the values in /proc/sys/net/core/wmem_*, the 'No buffer space available' message disappears for some time (the time it takes the buffer to fill up again, I guess) and then comes back.

--- Additional comment from email@example.com on 2009-10-05 09:26:24 EDT ---

Some questions:

- How is networking configured in the host? e.g. in the sosreport, I don't see anything bridging the guest to the 10.3.248.0/21 network
- How is the guest launched? e.g. 'virsh dumpxml $guest' and the contents of /var/log/libvirt/qemu/$guest.log
- Have you confirmed this only happens with virtio, e.g. have you tried model=e1000?
- The "No buffer space" message is from running ping in the guest? I *think* that can be ignored as merely a symptom of the virtio interface not sending any packets
- /proc/net/snmp in the host and guest might be interesting. As might 'tc -s qdisc' in the host

--- Additional comment from firstname.lastname@example.org on 2009-10-05 09:45:02 EDT ---

It would be useful to strace qemu to see if we're hitting the limit on the tun socket. Thanks!

--- Additional comment from email@example.com on 2009-10-05 11:42:11 EDT ---

(In reply to comment #6)
> - How is networking configured in the host? e.g. in the sosreport, I don't
> see anything bridging the guest to the 10.3.248.0/21 network

ocmp1 is a bridge. eth7 is connected to that bridge and to 10.3.248.0/21. The guest is also connected to that bridge.

> - How is the guest launched? e.g. 'virsh dumpxml $guest' and the contents
> of /var/log/libvirt/qemu/$guest.log

[root@tmobilehv ~]# virsh dumpxml OCMP1
<domain type='kvm'>
  <name>OCMP1</name>
  <uuid>85fda6b8-3f79-f403-471c-8c3c860da2ba</uuid>
  <memory>8388608</memory>
  <currentMemory>8388608</currentMemory>
  <vcpu>8</vcpu>
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/OCMP1.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <mac address='54:52:00:22:60:30'/>
      <source bridge='ocmp1'/>
      <model type='virtio'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
  </devices>
</domain>

I cannot provide /var/log/libvirt/qemu/$guest.log right now. The guest has been restarted since then. I'll provide it later when it hangs again.

> - Have you confirmed this only happens with virtio, e.g. have you tried
> model=e1000?

It does not happen with the e1000 driver. We ran the exact same test and the VM was still up after 2 days.

> - The "No buffer space" message is from running ping in the guest? I *think*
> that can be ignored as merely a symptom of the virtio interface not sending
> any packets
>
> - /proc/net/snmp in the host and guest might be interesting. As might
> 'tc -s qdisc' in the host

Same as for /var/log/libvirt/qemu/$guest.log. I'll provide them when it hangs again.

--- Additional comment from firstname.lastname@example.org on 2009-10-05 11:52:49 EDT ---

(In reply to comment #7)
> It would be useful to strace qemu to see if we're hitting the limit on the tun
> socket. Thanks!
Do you think we can limit strace to *network* system calls? The average network traffic is around 30 MB/s. The capture file will be huge.

--- Additional comment from email@example.com on 2009-10-05 20:02:41 EDT ---

Well, you can limit it to read/write/select. But since strace only shows the first few bytes of each call, 30 MB/s shouldn't be an issue.

--- Additional comment from firstname.lastname@example.org on 2009-10-30 09:34:33 EDT ---

It hung again. Here is the content of the guest /proc/net/snmp:

Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 2 64 2972397 0 1 0 0 0 2964429 2578209 1645 0 0 8912 4456 0 100 34 288
Icmp: InMsgs InErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps
Icmp: 387 140 387 0 0 0 0 0 0 0 0 0 0 703 0 694 0 0 0 0 9 0 0 0 0 0
IcmpMsg: InType3 OutType3 OutType8
IcmpMsg: 387 694 9
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 2109 1685 629 4 11 812836 384977 1254 0 644
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 2120817 25358 1229 2185641

The content of the host /proc/net/snmp:

Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 1 64 387348 0 16 0 0 0 379356 186067 0 0 1 10093 5002 1 5002 0 10092
Icmp: InMsgs InErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps
Icmp: 31 0 31 0 0 0 0 0 0 0 0 0 0 31 0 31 0 0 0 0 0 0 0 0 0 0
IcmpMsg: InType3 OutType3
IcmpMsg: 31 31
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 13 111 3 0 9 374440 182233 3314 0 3
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 457 21 0 495

And the output of 'tc -s qdisc' on the host:

[root@tmobilehv ~]# tc -s qdisc
qdisc pfifo_fast 0: dev eth0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1015135694 bytes 167399 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc pfifo_fast 0: dev eth1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 555090013 bytes 2330497 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc pfifo_fast 0: dev vnet0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1483696670 bytes 2900326 pkt (dropped 354126, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0

vnet0 is the tap interface assigned to the VM. vnet0 and eth1 are connected to the same bridge.

--- Additional comment from email@example.com on 2009-10-30 09:35:57 EDT ---

Created an attachment (id=366797) libvirt log

--- Additional comment from firstname.lastname@example.org on 2009-10-30 09:54:10 EDT ---

Created an attachment (id=366799) Output of strace on qemu

Here is the command I ran on the host:

strace -e read,write,select -p 10978 -o virtio_hangs

--- Additional comment from email@example.com on 2009-10-30 10:43:15 EDT ---

OK, the strace shows that there was no attempt to write to tuntap at all, so we can rule out the tun driver.
--- Additional comment from firstname.lastname@example.org on 2009-12-01 02:57:43 EDT ---

Please rebuild the virtio_net module with DEBUG defined. That way we may get some clue as to what state the guest is in when this happens. Also, if you can arrange remote access for me, it would really help in resolving this. Thanks!

--- Additional comment from email@example.com on 2009-12-01 04:48:07 EDT ---

Hi Herbert, I have requested access (via email). You should get a reply with your access info and how to connect. Could you tell me how to rebuild virtio_net with DEBUG?

Regards,
Olivier

--- Additional comment from firstname.lastname@example.org on 2009-12-01 07:44:54 EDT ---

Thanks Olivier! To build virtio_net with DEBUG, you need to get the srpm of the same version that's being used on the machine, apply the following patch, and then build the kernel. If you've already built the kernel, then "make SUBDIRS=drivers/net" would be sufficient, since we only need the virtio_net module.

--- Additional comment from email@example.com on 2009-12-01 07:45:49 EDT ---

Created an attachment (id=375048) Enable debugging in virtio_net

--- Additional comment from firstname.lastname@example.org on 2010-01-07 07:42:46 EDT ---

Created an attachment (id=382219) virtio_net: Fix tx wakeup race condition

virtio_net: Fix tx wakeup race condition

We free completed TX requests in xmit_tasklet but do not wake the queue. This creates a race condition whereupon the queue may be emptied by xmit_tasklet and yet remain in the stopped state. This patch fixes this by waking the queue after freeing packets in xmit_tasklet.

Signed-off-by: Herbert Xu <email@example.com>

--- Additional comment from firstname.lastname@example.org on 2010-01-07 09:04:55 EDT ---

Changing to the kernel component.
Comment 1 Herbert Xu 2010-01-10 11:34:51 UTC
This bug will be used to deal with the RX component of the problem while the original will be for TX only.
Comment 2 RHEL Program Management 2010-01-10 12:11:32 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Comment 3 Herbert Xu 2010-01-24 00:55:07 UTC
Created attachment 386393 [details] virtio: net refill on out-of-memory

This is a back-port of:

virtio: net refill on out-of-memory

If we run out of memory, use keventd to fill the buffer. There's a report of this happening: "Page allocation failures in guest", Message-ID: <email@example.com>

Signed-off-by: Rusty Russell <firstname.lastname@example.org>
Signed-off-by: David S. Miller <email@example.com>
Comment 4 Keqin Hong 2010-02-03 10:39:17 UTC
Created attachment 388497 [details] socket test programs (srv.c clt.c)

srv.c -- server end
clt.c -- client end
README -- read me file
srv, clt -- binary executables on x86_64
Comment 5 Keqin Hong 2010-02-03 11:08:28 UTC
Summary: Guest still lost the network with virtio net, while it worked fine with e1000.

Steps:
1. Boot a guest with the CLI listed below.
2. Check the guest network using ifconfig and make sure it works well (here we mark the guest IP as $guest_ip).
3. Run ./srv on the guest (comment 4 attachment).
4. Run multiple "./stress.sh $guest_ip" from elsewhere, i.e. on other hosts, until no more connections can be established. Note: ./stress.sh calls the clt program, trying to establish 500 connections to srv.
5. ping $guest_ip to see the network status.

Additionally:
6. Run ./clear_clt.sh on the client ends to kill all "./clt" processes.
7. ping $guest_ip again to see the result. (If possible, you may kill all ./srv processes inside the guest and ping again.)

Expected results:
After step 5 and step 7, the guest network stays alive.

CLI:
/usr/libexec/qemu-kvm -m 768M -smp 2 -drive file=RHEL5.4-64-4k.qcow2,if=virtio,cache=off,boot=on -net nic,model=virtio,vlan=1,macaddr=76:00:40:3F:20:10 -net tap,vlan=1,script=/etc/qemu-ifup -boot c -uuid 17644ecc-d3a1-4d3c-a386-12daf50015f1 -usbdevice tablet -no-hpet -rtc-td-hack -no-kvm-pit-reinjection -monitor stdio -notify all -cpu qemu64,+sse2 -balloon none -startdate now -vnc :1 -name 176-guest1

Actual results:
host: 2.6.18-164.10.1, kvm-83-105.el5_4.19
--------------------------------------------------------------------
guest                  | net model | connections | network status
2.6.18-164.11.1.el5PAE | virtio    | 2187        | lost
2.6.18-185.el5 x86_64  | virtio    | 1220        | lost
--------------------------------------------------------------------
Note: the lost network could be brought up again by ifdown, ifup.

host: 2.6.18-186 x86_64, kvm-83-155.el5
--------------------------------------------------------------------
guest                  | net model | connections | network status
2.6.18-185.el5 x86_64  | virtio    | 1178        | lost
2.6.18-185.el5 x86_64  | e1000     | 3574        | ok
--------------------------------------------------------------------

Note that with e1000, even when we could not make more connections to the "srv" program running inside the guest, we could still ping the guest.

[root@dhcp-91-175 ~]# ping 10.66.91.51
PING 10.66.91.51 (10.66.91.51) 56(84) bytes of data.
64 bytes from 10.66.91.51: icmp_seq=1 ttl=64 time=58.1 ms
64 bytes from 10.66.91.51: icmp_seq=2 ttl=64 time=33.4 ms
64 bytes from 10.66.91.51: icmp_seq=3 ttl=64 time=21.0 ms
64 bytes from 10.66.91.51: icmp_seq=4 ttl=64 time=1.40 ms
64 bytes from 10.66.91.51: icmp_seq=5 ttl=64 time=91.5 ms
64 bytes from 10.66.91.51: icmp_seq=6 ttl=64 time=0.341 ms
64 bytes from 10.66.91.51: icmp_seq=7 ttl=64 time=0.807 ms
...
Comment 6 Herbert Xu 2010-02-03 11:18:58 UTC
Thanks for testing. Please let me know whether this problem still exists after applying the patch in this bugzilla entry plus the patch in the bug from which this is cloned.
Comment 7 Dor Laor 2010-02-16 15:26:24 UTC
(In reply to comment #5)
> Summary:
> Guest still lost network with virtio net, while worked fine with e1000 net.

Keqin, were your tests with Herbert's fix? Do you need a new rpm for the guest kernel?
Comment 9 Lawrence Lim 2010-02-17 03:59:08 UTC
Adjusting the Needinfo flag, which had been set to the wrong person.

llim->Herbert: could you please provide us with a scratch build of the patch attached in Bugzilla?

llim->sly: the bug will be updated before Mon, 22 Feb, after the holiday in China, once the scratch build from Herbert is available.
Comment 10 Herbert Xu 2010-02-17 04:59:02 UTC
Sorry, but I have no time to produce a scratch build. Someone else will need to take care of this. Thanks!
Comment 11 Naphtali Sprei 2010-02-18 09:26:08 UTC
Here's a link to the brew build: https://brewweb.devel.redhat.com/taskinfo?taskID=2265105

Please let me know if there are any issues.
Comment 12 Keqin Hong 2010-02-23 05:43:56 UTC
Tested with guest kernel-2.6.18-189.el5.x86_64, but the virtio-net network was still lost. The test methods and results were similar to comment #5.

(In reply to comment #7)
> Keqin, were your tests with Herbert's fix? Do you need a new rpm for the guest kernel?

"Patch24891: linux-2.6-net-virtio_net-fix-tx-wakeup-race-condition.patch" is included as of kernel 2.6.18-184.el5, but I couldn't see the patch "virtio: net refill on out-of-memory" (see comment #3).

Keqin->Naphtali: has the patch "virtio: net refill on out-of-memory" (see comment #3) been applied?
Comment 17 Herbert Xu 2010-03-11 10:49:46 UTC
Created attachment 399313 [details] virtio: net refill on out-of-memory As fixing cancel_rearming_delayed_work in RHEL5 is non-trivial, and in order to maintain the ability to unload the virtio_net module, I'm switching the refill work to a timer.
Comment 18 Herbert Xu 2010-03-11 11:03:32 UTC
Created attachment 399317 [details] virtio: net refill on out-of-memory The last version was bogus as we can't sleep in timers. This one simply uses the normal poll path to do the refill.
Comment 26 Keqin Hong 2010-03-16 07:20:46 UTC
Tested on guest kernel 2.6.18-193.el5: a temporary OOM condition only brought the virtio network down briefly, and it was restored later. (Steps are similar to comment 5.)
Comment 27 Jarod Wilson 2010-03-17 15:53:09 UTC
in kernel-2.6.18-194.el5

You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field (cf_verified) to indicate this fix has been successfully verified. Include a comment with verification details.
Comment 29 David Jericho 2010-03-19 01:20:13 UTC
I've been running 2.6.18-194.el5 on x86_64 for over 24 hours now with no repeat of the problems mentioned in this bug. Previously they'd appear within 10 minutes of the host starting service.

I'm not sure if it's related, as I can't see any obvious changes in the patch attached, but I'll report it anyway. Under 2.6.18-194.el5 on the guest, ethernet frames larger than 4096 bytes won't make it to the guest when using the e1000 interface type. Rebooting with the 2.6.18-164.el5 kernel, jumbo frames work correctly with the e1000. Jumbo frames under 2.6.18-194.el5 using the virtio interface for the guest work as expected. Watching traffic on the host bridge, the incoming packets are appearing, but the guest never sees the packet.

The 4096-byte frame limit was verified using ping: 4096 bytes - 20 for the IP header - 14 for the ethernet frame header - 8 for ICMP control gives a 4054-byte maximum payload. ping -M do -s 4055 <jumbo set router interface> fails with the e1000 interface type.

We were using e1000 and virtio interface types for dual-interfaced guests, as it seemed to help delay the onset of this bug.
Comment 31 errata-xmlrpc 2010-03-30 07:15:57 UTC
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
Comment 32 todayyang 2012-12-11 13:57:05 UTC
Hi guys,

1. I cannot get into bug 528898, so I am just updating here.
2. Maybe this issue was solved by http://lists.gnu.org/archive/html/qemu-devel/2012-04/msg03587.html, but I am not sure.