Created attachment 361965 [details] SOSreport Hypervisor

Description of problem:
HP is trying their application (OCMP) within a 5.4 VM using KVM. The application uses UDP as its network protocol, and the guest uses the virtio network driver. Unfortunately, after a while (e.g. 10 min) of heavy network stress, the network drops. There are no logs available in the VM or on the hypervisor. Trying to restart the network fails. The only way to get the network back is to remove the virtio_net module and reload it.

Version-Release number of selected component (if applicable):
Hypervisor is RHEL 5.4, fully updated. The VM is RHEL 5.3 (we have tried both the 5.4 and 5.3 kernels and got the same behaviour).

How reproducible:
Always

Steps to Reproduce:
1. Start the VM
2. Stress test the VM and wait

Actual results:
The network stops.

Expected results:
The network should be able to carry on.

Additional info:
I have grabbed an sosreport for the hypervisor and the VM. The hypervisor is tmobilehv1, the VM is tmobileocmp1.
Created attachment 361966 [details] SOSreport VM
Here's a similar report from upstream: http://www.mail-archive.com/kvm@vger.kernel.org/msg06774.html

The closest I got to figuring it out was here: http://www.mail-archive.com/kvm@vger.kernel.org/msg07006.html

Does doing this in the guest fix the issue too?

$> ip link set eth0 down
$> ip link set eth0 up
We tried /etc/init.d/network restart without effect. However, rmmod virtio_net followed by a network restart works. tcpdump in the guest shows nothing. We also see overruns on the interface. We will try your commands when the driver is hung to see what happens. It really seems the virtio_net driver is completely out of order at that moment. We are trying to reproduce the issue in a less application-dependent context. If you want us to try a debug kernel or a newer one, let us know.
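For reference, the reload workaround boils down to the following commands, run inside the guest once the interface has hung (service script path as on a standard RHEL 5 install):

# unload the wedged driver and reload it; eth0 reappears
rmmod virtio_net
modprobe virtio_net
# bring the interface back up with its normal configuration
/etc/init.d/network restart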
We moved the virtual machines to another server and installed a 4-port Intel Gigabit network card (e1000e module). Still using the virtio network driver, overall network performance is much better and the bandwidth is more stable. That was not the case with the previous setup using the bnx2x module (a burst of packets one second and almost nothing the next). But we still lose network connectivity to the VM, even though the load was far greater and the test lasted almost 1 hour this time.

On the failed VM, the behaviour is also different: it is not possible to ping the VM, but multicast and broadcast packets are still received. I also tried to ping an external server. I received "Destination Host unreachable" for some time and then a very strange message: "ping: sendmsg: No buffer space available".

The suggested commands did not restore network connectivity:

$> ip link set eth0 down
$> ip link set eth0 up
Created attachment 363042 [details] strace ping command

The result of an strace of the ping command when the VM's network hangs. If I increase the values in /proc/sys/net/core/wmem_*, the 'No buffer space available' message disappears for some time (the time it takes the buffers to fill up again, I guess) and then comes back.
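For reference, the buffer limits were raised roughly like this (the sizes shown are illustrative, not the exact values we used):

# enlarge the global socket send-buffer limits, in bytes
sysctl -w net.core.wmem_max=1048576
sysctl -w net.core.wmem_default=1048576
# equivalent to writing directly to /proc/sys/net/core/wmem_max and wmem_default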
Some questions:

- How is networking configured in the host? e.g. in the sosreport, I don't see anything bridging the guest to the 10.3.248.0/21 network

- How is the guest launched? e.g. 'virsh dumpxml $guest' and the contents of /var/log/libvirt/qemu/$guest.log

- Have you confirmed this only happens with virtio, e.g. have you tried model=e1000?

- The "No buffer space" message is from running ping in the guest? I *think* that can be ignored as merely a symptom of the virtio interface not sending any packets

- /proc/net/snmp in the host and guest might be interesting. As might 'tc -s qdisc' in the host
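One way to collect most of the requested information on the host in one pass (the guest name OCMP1 is taken from the dumpxml shown later in this report):

# host-side diagnostics
brctl show                           # bridge/interface topology
virsh dumpxml OCMP1                  # guest definition as seen by libvirt
cat /var/log/libvirt/qemu/OCMP1.log  # qemu command line and runtime messages
tc -s qdisc                          # per-interface qdisc statistics (drops, backlog)
cat /proc/net/snmp                   # IP/ICMP/TCP/UDP counters (also collect in the guest)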
It would be useful to strace qemu to see if we're hitting the limit on the tun socket. Thanks!
(In reply to comment #6)
> - How is networking configured in the host? e.g. in the sosreport, I don't
>   see anything bridging the guest to the 10.3.248.0/21 network

ocmp1 is a bridge. eth7 is connected to that bridge and to 10.3.248.0/21. The guest is also connected to that bridge.

> - How is the guest launched? e.g. 'virsh dumpxml $guest' and the contents
>   of /var/log/libvirt/qemu/$guest.log

[root@tmobilehv ~]# virsh dumpxml OCMP1
<domain type='kvm'>
  <name>OCMP1</name>
  <uuid>85fda6b8-3f79-f403-471c-8c3c860da2ba</uuid>
  <memory>8388608</memory>
  <currentMemory>8388608</currentMemory>
  <vcpu>8</vcpu>
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <source file='/var/lib/libvirt/images/OCMP1.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <interface type='bridge'>
      <mac address='54:52:00:22:60:30'/>
      <source bridge='ocmp1'/>
      <model type='virtio'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes' keymap='en-us'/>
  </devices>
</domain>

I cannot provide /var/log/libvirt/qemu/$guest.log right now. The guest has been restarted since then. I'll provide it later when it hangs again.

> - Have you confirmed this only happens with virtio, e.g. have you tried
>   model=e1000?

It does not happen with the e1000 driver. We ran the exact same test and the VM was still up after 2 days.

> - The "No buffer space" message is from running ping in the guest? I *think*
>   that can be ignored as merely a symptom of the virtio interface not sending
>   any packets
>
> - /proc/net/snmp in the host and guest might be interesting. As might
>   'tc -s qdisc' in the host

Same as for /var/log/libvirt/qemu/$guest.log. I'll provide them when it hangs again.
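For anyone reproducing the e1000 comparison: the only change needed to the domain definition above is the interface model (edit with 'virsh edit OCMP1' and restart the guest). A minimal sketch of the modified stanza:

<interface type='bridge'>
  <mac address='54:52:00:22:60:30'/>
  <source bridge='ocmp1'/>
  <model type='e1000'/>   <!-- was 'virtio'; the e1000 model did not exhibit the hang -->
</interface>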
(In reply to comment #7) > It would be useful to strace qemu to see if we're hitting the limit on the tun > socket. Thanks! Do you think we can limit strace to *network* system calls? The average network traffic is around 30MB/s. The capture file will be huge.
Well you can limit it to read/write/select. But since strace only shows the first few bytes of each call, 30MB/s shouldn't be an issue.
It hung again. Here is the content of the guest /proc/net/snmp:

Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 2 64 2972397 0 1 0 0 0 2964429 2578209 1645 0 0 8912 4456 0 100 34 288
Icmp: InMsgs InErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps
Icmp: 387 140 387 0 0 0 0 0 0 0 0 0 0 703 0 694 0 0 0 0 9 0 0 0 0 0
IcmpMsg: InType3 OutType3 OutType8
IcmpMsg: 387 694 9
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 2109 1685 629 4 11 812836 384977 1254 0 644
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 2120817 25358 1229 2185641

The content of the host /proc/net/snmp:

Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 1 64 387348 0 16 0 0 0 379356 186067 0 0 1 10093 5002 1 5002 0 10092
Icmp: InMsgs InErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps
Icmp: 31 0 31 0 0 0 0 0 0 0 0 0 0 31 0 31 0 0 0 0 0 0 0 0 0 0
IcmpMsg: InType3 OutType3
IcmpMsg: 31 31
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 13 111 3 0 9 374440 182233 3314 0 3
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 457 21 0 495

And the output of 'tc -s qdisc' on the host:

[root@tmobilehv ~]# tc -s qdisc
qdisc pfifo_fast 0: dev eth0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1015135694 bytes 167399 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc pfifo_fast 0: dev eth1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 555090013 bytes 2330497 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc pfifo_fast 0: dev vnet0 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 1483696670 bytes 2900326 pkt (dropped 354126, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0

vnet0 is the tap interface assigned to the VM. vnet0 and eth1 are connected to the same bridge.
Created attachment 366797 [details] libvirt log
Created attachment 366799 [details] Output of strace on qemu Here is the command I ran on the host: strace -e read,write,select -p 10978 -o virtio_hangs
OK, the strace shows that there was no attempt to write to tuntap at all so we can rule out the tun driver.
Please rebuild the virtio_net module with DEBUG defined. That way we may get some clue as to what state the guest is in when this happens. Also if you can arrange remote access for me it would really help in resolving this. Thanks!
Hi Herbert,

I have requested access (via email). You should get a reply with your access and info on how to connect.

Could you give me a how-to for rebuilding virtio_net with DEBUG?

Regards
Olivier
Thanks Olivier! To build virtio_net with DEBUG, you need to get the srpm of the same version that's being used on the machine, apply the following patch, and then build the kernel. If you've already built the kernel, then make SUBDIRS=drivers/net would be sufficient, since we only need the virtio_net module.
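A rough outline of the rebuild on a RHEL 5 box. The exact package version, spec file name, and build paths below are assumptions; adapt them to the kernel actually installed, and use the debug patch attached to this bug:

# install the matching kernel source rpm and prepare the source tree (version is illustrative)
rpm -ivh kernel-2.6.18-164.el5.src.rpm
rpmbuild -bp --target=$(uname -m) /usr/src/redhat/SPECS/kernel-2.6.spec

# enter the prepared tree (check BUILD/ for the real directory name)
cd /usr/src/redhat/BUILD/kernel-2.6.18/linux-*

# apply the attached debug patch (filename assumed) and set up the shipped config
patch -p1 < /tmp/virtio_net-debug.patch
cp configs/kernel-2.6.18-$(uname -m).config .config
make oldconfig && make modules_prepare

# rebuild only the network drivers, as suggested above, then copy
# drivers/net/virtio_net.ko into the guest and reload it
make SUBDIRS=drivers/net modules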
Created attachment 375048 [details] Enable debugging in virtio_net
Created attachment 382219 [details] virtio_net: Fix tx wakeup race condition

virtio_net: Fix tx wakeup race condition

We free completed TX requests in xmit_tasklet but do not wake the queue. This creates a race condition whereby the queue may be emptied by xmit_tasklet and yet remain in the stopped state. This patch fixes this by waking the queue after freeing packets in xmit_tasklet.

Signed-off-by: Herbert Xu <herbert.org.au>
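To illustrate the race being closed, here is a minimal sketch of the TX-reclaim tasklet after the fix. This is not the literal patch from the attachment; the function and helper names (xmit_tasklet, free_old_xmit_skbs, struct virtnet_info) are assumed from the RHEL 5 virtio_net backport, and locking follows the driver's existing convention.

/* Sketch only: shows the idea of the fix, not the exact patch. */
static void xmit_tasklet(unsigned long data)
{
	struct virtnet_info *vi = (void *)data;

	netif_tx_lock_bh(vi->dev);

	/* Reclaim skbs for TX buffers the host has already consumed. */
	free_old_xmit_skbs(vi);

	/*
	 * The fix: if start_xmit stopped the queue because the ring was
	 * full, nothing would restart it once the tasklet emptied the
	 * ring, so the TX path stayed wedged. Waking the queue here
	 * closes that window.
	 */
	netif_wake_queue(vi->dev);

	netif_tx_unlock_bh(vi->dev);
}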
Changing the component to kernel.
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this release that addresses your request. Please test and report back results here by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set this bug into NEED_INFO. If you encounter new defects or have additional patch(es) to request for inclusion, please clone this bug per each request and escalate through your support representative.
This patch fixes the TX direction only. For problems in the RX direction you need another patch, in the bugzilla entry that was cloned off this one. Did you get a TX lock-up or an RX lock-up?
I've come across this issue too, on a guest configuration very similar to the one listed in comment 8, while using both the e1000 and virtio interface types. An /sbin/ifdown,/sbin/ifup restores network service. Packets are making it out of the virtual machine and onto the physical network, but the replies are never making it back to the guest. They are seen on the host bridge, though.

Michael Kearey suggested trying the 5.5 Beta RPMS, which I installed on both the host and the guest, but this did not fix my problems.
(In reply to comment #29)
> I've come across this issue too on a very similar guest configuration as listed
> comment 8 while using both the e1000 and virtio interface types. An
> /sbin/ifdown,/sbin/ifup restores network service. Packets are making it out of
> the virtual machine and onto the physical network, but the replies are never
> making it back to the guest. They are seen on the host bridge though.
>
> Michael Kearey suggested trying the 5.5 Beta RPMS which I installed on both the
> host and the guest, but this did not fix my problems.

G'day David, my assumption is that since outgoing packets are succeeding but replies are not, it has to be the RX side that is breaking for you. Thus it is the RX lockup, tracked in BZ 554078.

Cheers
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days