Bug 524651 - Lost the network in a KVM VM on top of 5.4 [NEEDINFO]
Summary: Lost the network in a KVM VM on top of 5.4
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: rc
: ---
Assignee: Herbert Xu
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 528898 554078 589766 589897
Reported: 2009-09-21 15:47 UTC by Olivier Renault
Modified: 2013-01-09 21:56 UTC
CC List: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 554078
Environment:
Last Closed: 2010-03-30 07:29:15 UTC
Target Upstream Version:
cward: needinfo? (orenault)


Attachments
SOSreport Hyperviseur (1.45 MB, application/x-bzip2)
2009-09-21 15:47 UTC, Olivier Renault
SOSreport VM (663.30 KB, application/x-bzip2)
2009-09-21 15:49 UTC, Olivier Renault
strace ping command (10.46 KB, text/plain)
2009-09-29 17:22 UTC, Jean-Marc ANDRE
libvirt log (2.53 KB, application/octet-stream)
2009-10-30 13:35 UTC, Jean-Marc ANDRE
Output of strace on qemu (407.81 KB, application/x-bzip2)
2009-10-30 13:54 UTC, Jean-Marc ANDRE
Enable debugging in virtio_net (444 bytes, patch)
2009-12-01 12:45 UTC, Herbert Xu
virtio_net: Fix tx wakeup race condition (1.05 KB, patch)
2010-01-07 12:42 UTC, Herbert Xu


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2010:0178 | normal | SHIPPED_LIVE | Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update | 2010-03-29 12:18:21 UTC

Description Olivier Renault 2009-09-21 15:47:48 UTC
Created attachment 361965 [details]
SOSreport Hyperviseur

Description of problem:
HP is testing their application (OCMP) in a 5.4 VM using KVM. The application uses UDP as its network protocol, and the VM uses virtio as its network driver. Unfortunately, after a while (e.g. 10 minutes) of heavy network stress, the network drops. No logs are available in either the VM or the hypervisor. Restarting the network fails; the only way to get the network back is to remove the virtio_net module and reload it.

Version-Release number of selected component (if applicable):
Hypervisor is RHEL5.4 fully updated
VM is RHEL 5.3 (we have tried both the 5.4 and the 5.3 kernels and got the same behaviour)

How reproducible:
Always

Steps to Reproduce:
1. Start the VM
2. Stress test the VM and wait
  
Actual results:
The network stops.

Expected results:
The network should be able to carry on

Additional info:
I have collected an sosreport from both the hypervisor and the VM. The hypervisor is tmobilehv1 and the VM is tmobileocmp1.

Comment 1 Olivier Renault 2009-09-21 15:49:17 UTC
Created attachment 361966 [details]
SOSreport VM

Comment 2 Mark McLoughlin 2009-09-21 16:19:21 UTC
Here's the similar report from upstream:

  http://www.mail-archive.com/kvm@vger.kernel.org/msg06774.html

The closest I got to figuring it out was here:

  http://www.mail-archive.com/kvm@vger.kernel.org/msg07006.html

Does doing this in the guest fix the issue too?

  $> ip link set eth0 down
  $> ip link set eth0 up

Comment 3 Bruno Cornec 2009-09-22 14:08:40 UTC
We tried /etc/init.d/network restart without effect.
However rmmod virtio_net and then restarting the network works.

tcpdump in the guest shows nothing.
We also have overruns on the interface.

We will try your command to see, when the driver is hung, what happens.

It really seems the virtio_net driver is completely out of order at that moment.
We are trying to reproduce the issue in a less application-dependent context.

If you want us to try a debug kernel or newer one, let us know.

Comment 4 Jean-Marc ANDRE 2009-09-22 17:18:03 UTC
We moved the virtual machines to another server and installed a 4-port Intel Gigabit network card (e1000e module).

With the virtio network driver still in use, overall network performance seems much better and network bandwidth more stable.
That was not the case with the previous setup using the bnx2x module (a burst of packets one second and almost nothing the next).

But we still lose network connectivity with the VM, even though the load was far greater and the test lasted almost 1 hour this time.

On the failed VM, the behavior is also different:
It is not possible to ping the VM but multicast and broadcast packets are received.

I also tried to ping an external server. I received "Destination Host unreachable" for some time and then a very strange message:

"ping: sendmsg: No buffer space available"

The suggested commands did not restore network connectivity:
  $> ip link set eth0 down
  $> ip link set eth0 up

Comment 5 Jean-Marc ANDRE 2009-09-29 17:22:08 UTC
Created attachment 363042 [details]
strace ping command

The result of a strace command when a VM's network hangs.

If I increase the values in /proc/sys/net/core/wmem_*, the 'No buffer space available' message disappears for some time (the time the buffer fills up again I guess) and then comes back.
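The ping error above is the standard text for errno ENOBUFS, returned when the socket layer cannot queue more outgoing data. Since the application here sends UDP, a minimal sketch of how a sender might tell a briefly full buffer apart from a persistent stall: retry only on ENOBUFS, with exponential backoff, and give up after a few attempts. The helper name `send_with_backoff` and its parameters are hypothetical, and `send` is injected (for example a bound `socket.send`) so the logic can be exercised without a real interface.

```python
import errno
import time


def send_with_backoff(send, payload, retries=5, delay=0.01):
    """Call send(payload), retrying only on ENOBUFS with exponential backoff.

    Hypothetical helper: `send` is any callable that sends bytes and may
    raise OSError. Any other error, or ENOBUFS on the last attempt, is
    re-raised to the caller.
    """
    for attempt in range(retries + 1):
        try:
            return send(payload)
        except OSError as exc:
            if exc.errno != errno.ENOBUFS or attempt == retries:
                raise
            # Wait delay, 2*delay, 4*delay, ... before the next attempt.
            time.sleep(delay * (2 ** attempt))
```

Retrying only smooths over a transiently full buffer. If the TX path is stuck, as in this report, every retry fails and the error surfaces anyway, which fits comment 6's reading of the message as a symptom rather than the cause.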

Comment 6 Mark McLoughlin 2009-10-05 13:26:24 UTC
Some questions:

  - How is networking configured in the host? e.g. in the sosreport, I don't
    see anything bridging the guest to the 10.3.248.0/21 network

  - How is the guest launched? e.g. 'virsh dumpxml $guest' and the contents
    of /var/log/libvirt/qemu/$guest.log

  - Have you confirmed this only happens with virtio, e.g. have you tried
    model=e1000 ?

  - The "No buffer space" message is from running ping in the guest? I *think*
    that can be ignored as merely a symptom of the virtio interface not sending
    any packets

  - /proc/net/snmp in the host and guest might be interesting. As might 
    'tc -s qdisc' in the host

Comment 7 Herbert Xu 2009-10-05 13:45:02 UTC
It would be useful to strace qemu to see if we're hitting the limit on the tun socket.  Thanks!

Comment 8 Jean-Marc ANDRE 2009-10-05 15:42:11 UTC
(In reply to comment #6)
> Some questions:
> 
>   - How is networking configured in the host? e.g. in the sosreport, I don't
>     see anything bridging the guest to the 10.3.248.0/21 network

ocmp1 is a bridge. eth7 is connected to that bridge and to 10.3.248.0/21. The guest is also connected to that bridge

> 
>   - How is the guest launched? e.g. 'virsh dumpxml $guest' and the contents
>     of /var/log/libvirt/qemu/$guest.log

[root@tmobilehv ~]# virsh dumpxml OCMP1
<domain type='kvm'>
  <name>OCMP1</name>
  <uuid>85fda6b8-3f79-f403-471c-8c3c860da2ba</uuid>
  <memory>8388608</memory>
  <currentMemory>8388608</currentMemory>
  <vcpu>8</vcpu>
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/OCMP1.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <mac address='54:52:00:22:60:30'/>
      <source bridge='ocmp1'/>
      <model type='virtio'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
  </devices>
</domain>

I cannot provide /var/log/libvirt/qemu/$guest.log right now. The guest has been restarted since then. I'll provide it later when it hangs again.

> 
>   - Have you confirmed this only happens with virtio, e.g. have you tried
>     model=e1000 ?

It does not happen with the e1000 driver. We ran exactly the same test and the VM was still up after 2 days.
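Because the e1000 model survived the same load, switching the guest's NIC model is an obvious interim workaround. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes you start from the output of `virsh dumpxml OCMP1` and load the result back with `virsh define`. It rewrites every `<interface>`'s `<model type=...>` (the element shown in the dump above) to `e1000`. The function name `set_nic_model` is hypothetical.

```python
import xml.etree.ElementTree as ET


def set_nic_model(domain_xml, model="e1000"):
    """Return libvirt domain XML with every interface's <model type=...> set to `model`.

    Hypothetical helper: adds a <model> element if an interface lacks one and
    leaves all other elements (mac, source bridge, ...) untouched.
    """
    root = ET.fromstring(domain_xml)
    for iface in root.iter("interface"):
        model_el = iface.find("model")
        if model_el is None:
            model_el = ET.SubElement(iface, "model")
        model_el.set("type", model)
    return ET.tostring(root, encoding="unicode")
```

ElementTree writes attributes with double quotes, which libvirt accepts. The redefined configuration is likely to take effect only after a full shutdown and start of the guest.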

> 
>   - The "No buffer space" message is from running ping in the guest? I *think*
>     that can be ignored as merely a symptom of the virtio interface not sending
>     any packets
> 
>   - /proc/net/snmp in the host and guest might be interesting. As might 
>     'tc -s qdisc' in the host  

Same as for /var/log/libvirt/qemu/$guest.log. I'll provide them when it hangs again.

Comment 9 Jean-Marc ANDRE 2009-10-05 15:52:49 UTC
(In reply to comment #7)
> It would be useful to strace qemu to see if we're hitting the limit on the tun
> socket.  Thanks! 

Do you think we can limit strace to *network* system calls?
The average network traffic is around 30MB/s. The capture file will be huge.

Comment 10 Herbert Xu 2009-10-06 00:02:41 UTC
Well you can limit it to read/write/select.  But since strace only shows the first few bytes of each call, 30MB/s shouldn't be an issue.

Comment 11 Jean-Marc ANDRE 2009-10-30 13:34:33 UTC
It hung again.
Here is the content of the guest /proc/net/snmp:

Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 2 64 2972397 0 1 0 0 0 2964429 2578209 1645 0 0 8912 4456 0 100 34 288
Icmp: InMsgs InErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps
Icmp: 387 140 387 0 0 0 0 0 0 0 0 0 0 703 0 694 0 0 0 0 9 0 0 0 0 0
IcmpMsg: InType3 OutType3 OutType8
IcmpMsg: 387 694 9
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 2109 1685 629 4 11 812836 384977 1254 0 644
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 2120817 25358 1229 2185641

The content of the host /proc/net/snmp:

Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 1 64 387348 0 16 0 0 0 379356 186067 0 0 1 10093 5002 1 5002 0 10092
Icmp: InMsgs InErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps
Icmp: 31 0 31 0 0 0 0 0 0 0 0 0 0 31 0 31 0 0 0 0 0 0 0 0 0 0
IcmpMsg: InType3 OutType3
IcmpMsg: 31 31
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 13 111 3 0 9 374440 182233 3314 0 3
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 457 21 0 495

And the output of 'tc -s qdisc' on the host:

[root@tmobilehv ~]# tc -s qdisc
qdisc pfifo_fast 0: dev eth0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1015135694 bytes 167399 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
qdisc pfifo_fast 0: dev eth1 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 555090013 bytes 2330497 pkt (dropped 0, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 
qdisc pfifo_fast 0: dev vnet0 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1483696670 bytes 2900326 pkt (dropped 354126, overlimits 0 requeues 0) 
 rate 0bit 0pps backlog 0b 0p requeues 0 

vnet0 is the tap interface assigned to the VM. vnet0 and eth1 are connected to the same bridge.
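In this output only vnet0 shows drops: 354126 dropped against 2900326 packets sent. On the host, a device's qdisc handles what the host transmits on that device, which for the tap device vnet0 is traffic headed toward the guest. A minimal sketch, with hypothetical helper names, that extracts these counters so snapshots can be compared over time. It assumes one qdisc per device, as in the output above, and approximates "packets offered" as sent plus dropped.

```python
import re

# "qdisc <kind> <handle> dev <device> ..." starts each qdisc block.
QDISC_RE = re.compile(r"^qdisc\s+\S+\s+\S+\s+dev\s+(\S+)")
# The statistics line that follows it.
SENT_RE = re.compile(
    r"Sent (\d+) bytes (\d+) pkt \(dropped (\d+), overlimits (\d+) requeues (\d+)\)"
)


def parse_tc_stats(text):
    """Map device name -> counters parsed from `tc -s qdisc` output."""
    stats = {}
    dev = None
    for line in text.splitlines():
        m = QDISC_RE.match(line.strip())
        if m:
            dev = m.group(1)
            continue
        m = SENT_RE.search(line)
        if m and dev is not None:
            sent_bytes, pkts, dropped, overlimits, requeues = map(int, m.groups())
            stats[dev] = {"bytes": sent_bytes, "pkts": pkts, "dropped": dropped,
                          "overlimits": overlimits, "requeues": requeues}
            dev = None
    return stats


def drop_fraction(counters):
    """Dropped packets as a fraction of (sent + dropped), an approximation."""
    offered = counters["pkts"] + counters["dropped"]
    return counters["dropped"] / offered if offered else 0.0
```

Applied to the output above, vnet0 comes out at roughly 11% of offered packets dropped, while eth0 and eth1 show none.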

Comment 12 Jean-Marc ANDRE 2009-10-30 13:35:57 UTC
Created attachment 366797 [details]
libvirt log

Comment 13 Jean-Marc ANDRE 2009-10-30 13:54:10 UTC
Created attachment 366799 [details]
Output of strace on qemu

Here is the command I ran on the host: strace -e read,write,select -p 10978 -o virtio_hangs
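Comment 14's conclusion rests on checking whether qemu ever wrote to the tap file descriptor. A minimal sketch, with a hypothetical helper name, that tallies `read()`/`write()` calls per descriptor in an strace log such as `virtio_hangs`. Which descriptor is the tap device has to be looked up separately, for example with `ls -l /proc/10978/fd`, where the tap shows as `/dev/net/tun`. The descriptor numbers in any example are placeholders. `select()` is skipped because its first argument is a descriptor count, not a descriptor.

```python
import re
from collections import Counter

# Matches e.g. write(12, "..."..., 1514) = 1514, optionally preceded by a
# PID column (as strace prints when following several threads).
CALL_RE = re.compile(r"^(?:\d+\s+)?(read|write)\((\d+),")


def tally_fd_calls(lines):
    """Count read()/write() calls per (syscall, fd) in strace output lines."""
    counts = Counter()
    for line in lines:
        m = CALL_RE.match(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts
```

Usage would be along the lines of `with open("virtio_hangs") as f: counts = tally_fd_calls(f)`, followed by checking `counts[("write", tap_fd)]`. A zero there matches the observation that qemu never attempted to write to tuntap.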

Comment 14 Herbert Xu 2009-10-30 14:43:15 UTC
OK, the strace shows that there was no attempt to write to tuntap at all so we can rule out the tun driver.

Comment 15 Herbert Xu 2009-12-01 07:57:43 UTC
Please rebuild the virtio_net module with DEBUG defined.  That way we may get some clue as to what state the guest is in when this happens.

Also if you can arrange remote access for me it would really help in resolving this.

Thanks!

Comment 16 Olivier Renault 2009-12-01 09:48:07 UTC
Hi Herbert,

I have requested access (via email). You should get a reply with your access details and information on how to connect.

Could you provide instructions on how to rebuild virtio_net with DEBUG?

Regards
Olivier

Comment 17 Herbert Xu 2009-12-01 12:44:54 UTC
Thanks Olivier!

To build virtio_net with DEBUG, you need to get the srpm of the same version that's being used on the machine, apply the following patch, and then build the kernel.

If you've already built the kernel, running make SUBDIRS=drivers/net should be sufficient, since we only need the virtio_net module.

Comment 18 Herbert Xu 2009-12-01 12:45:49 UTC
Created attachment 375048 [details]
Enable debugging in virtio_net

Comment 19 Herbert Xu 2010-01-07 12:42:46 UTC
Created attachment 382219 [details]
virtio_net: Fix tx wakeup race condition

virtio_net: Fix tx wakeup race condition

We free completed TX requests in xmit_tasklet but do not wake
the queue.  This creates a race condition whereupon the queue
may be emptied by xmit_tasklet and yet it remains in the stopped
state.

This patch fixes this by waking the queue after freeing packets
in xmit_tasklet.
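The race can be pictured with a toy model. It has a TX ring of fixed size, a "queue stopped" flag that the transmit path sets when the ring fills, and a tasklet that frees completed buffers. The sketch below is hypothetical throughout (names, ring size, and the single-threaded ordering) and is not virtio_net code. The scenario deliberately runs no other wake-up path, modelling the case where xmit_tasklet reaps the completions. Without a wake in the tasklet, the ring ends up empty while the queue stays stopped, so the stack never hands the driver another packet. That would be consistent with the interface staying dead until the module is reloaded.

```python
from collections import deque


class ToyTxQueue:
    """Toy model of a TX ring plus the stack's 'queue stopped' flag (hypothetical)."""

    def __init__(self, capacity, wake_in_tasklet):
        self.capacity = capacity
        self.wake_in_tasklet = wake_in_tasklet
        self.in_flight = deque()  # buffers handed to the host
        self.completed = 0        # buffers the host is done with, not yet freed
        self.stopped = False

    def start_xmit(self, pkt):
        """Stack -> driver; refuses packets while the queue is stopped."""
        if self.stopped:
            return False
        self.in_flight.append(pkt)
        if len(self.in_flight) == self.capacity:
            self.stopped = True   # ring full: stop until space is freed
        return True

    def host_completes(self, n):
        """The host finishes n more buffers; the guest still has to free them."""
        self.completed = min(self.completed + n, len(self.in_flight))

    def xmit_tasklet(self):
        """Free completed buffers; the fixed variant also wakes the queue."""
        while self.completed:
            self.in_flight.popleft()
            self.completed -= 1
        if self.wake_in_tasklet and self.stopped and len(self.in_flight) < self.capacity:
            self.stopped = False


def scenario(wake_in_tasklet):
    """Fill the ring, let the tasklet reap everything, then try to send again."""
    q = ToyTxQueue(capacity=4, wake_in_tasklet=wake_in_tasklet)
    for i in range(4):
        q.start_xmit(i)           # the 4th packet fills the ring and stops the queue
    q.host_completes(4)
    q.xmit_tasklet()              # reaps all 4; no other wake-up path runs here
    empty_but_stopped = not q.in_flight and q.stopped
    accepted = sum(q.start_xmit(i) for i in range(4, 8))
    return empty_but_stopped, accepted
```

`scenario(False)` returns `(True, 0)`: an empty but permanently stopped queue. `scenario(True)`, mirroring the patch's "waking the queue after freeing packets in xmit_tasklet", returns `(False, 4)`.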

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Comment 20 Dor Laor 2010-01-07 14:04:55 UTC
Changing into the kernel component

Comment 24 Chris Ward 2010-02-11 10:05:05 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.

Comment 28 Herbert Xu 2010-03-10 15:10:46 UTC
This patch fixes the TX direction only. For problems in the RX direction you need another patch, in the bugzilla entry that was cloned off this one.

Did you get a TX lock-up or an RX lock-up?

Comment 29 David Jericho 2010-03-11 01:32:51 UTC
I've come across this issue too, on a guest configuration very similar to the one listed in comment 8, while using both the e1000 and virtio interface types. Running /sbin/ifdown and then /sbin/ifup restores network service. Packets are making it out of the virtual machine and onto the physical network, but the replies never make it back to the guest, although they are seen on the host bridge.

Michael Kearey suggested trying the 5.5 Beta RPMS which I installed on both the host and the guest, but this did not fix my problems.

Comment 30 Michael Kearey 2010-03-11 02:10:54 UTC
(In reply to comment #29)
> I've come across this issue too on a very similar guest configuration as listed
> comment 8 while using both the e1000 and virtio interface types. An
> /sbin/ifdown,/sbin/ifup restores network service. Packets are making it out of
> the virtual machine and onto the physical network, but the replies are never
> making it back to the guest. They are seen on the host bridge though.
> 
> Michael Kearey suggested trying the 5.5 Beta RPMS which I installed on both the
> host and the guest, but this did not fix my problems.    

G'day David, my assumption is that since outgoing packets are succeeding but replies are not, it must be the RX side that is breaking for you. That makes it the RX lock-up, tracked in BZ 554078.

Cheers

Comment 41 errata-xmlrpc 2010-03-30 07:29:15 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

