Bug 426853
Summary: Missing multicast messages with realtime beta kernel
Product: Red Hat Enterprise MRG
Component: realtime-kernel
Version: beta
Reporter: Graham Biswell <gbiswell>
Assignee: Arnaldo Carvalho de Melo <acme>
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
CC: bhu, gcooper, lgoncalv
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2008-05-06 11:52:12 UTC
Description
Graham Biswell
2007-12-27 13:14:20 UTC
Created attachment 290452 [details]
tarball containing source for 29West test tools msend.c & mdump.c
The problem happens with 2.6.18-53.1.4.el5 as well; checking now with kernel-rt-vanilla-2.6.21-57.el. The problem is also present on:

[root@mica barcap]# uname -a
Linux mica.ghostprotocols.net 2.6.21-57.el5rtvanilla #1 SMP PREEMPT Fri Nov 30 10:53:20 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

Tried now with:

[root@mica barcap]# uname -r
2.6.18-53.el5

And the problem happens even with just one msend instance and two mdump instances.

Please see the attached patch I'm using: each time the packet sequence number differs from the expected one it reports how many packets were lost, then resyncs, so the output can be correlated with the SNMP UDP MIB variables. With it, mdump's output becomes:

Expected seq 4a6b8a, got 4a6b90
6 packets lost, resync
Expected seq 4a6bad, got 4a6cd3
294 packets lost, resync
Expected seq 4a6cd4, got 4a6cd5
1 packets lost, resync

Output is produced only when packet loss is detected, and it correlates with the UDP "packet receive errors" counter in the netstat -s output. Now looking at the places where this MIB variable is bumped in the udp_rcv routine in the kernel sources.

Created attachment 290672 [details]
show how many packets were lost and resync the sequential to avoid useless continuous printfs
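The resync logic described above can be sketched in C roughly as follows; the function and variable names here are hypothetical illustrations, not taken from the actual attached patch:

```c
/* Sketch of the "report loss and resync" logic added to mdump:
 * instead of printing on every out-of-sequence packet, report how
 * many packets were lost and resynchronize the expected counter. */
#include <stdio.h>

static unsigned long expected_seq;

/* Returns the number of packets lost before this one (0 if in order). */
unsigned long check_seq(unsigned long got)
{
	unsigned long lost = 0;

	if (got != expected_seq) {
		lost = got - expected_seq;
		printf("Expected seq %lx, got %lx\n%lu packets lost, resync\n",
		       expected_seq, got, lost);
	}
	expected_seq = got + 1;	/* resync: next packet should follow this one */
	return lost;
}
```

This keeps the receiver quiet while packets arrive in order and prints one line per gap, which is what makes the output easy to correlate with the SNMP counters.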
mdump is running out of receive buffer space; if one bumps rmem_max as described in the 29west docs, the problem goes away: http://www.29west.com/docs/THPM/thpm.html#SETTING-KERNEL-UDP-BUFFER-LIMITS

To reproduce this it is not even necessary to run multiple instances of msend and mdump: a single client + server pair plus some other heavy system activity gives the same result, i.e. packet loss because mdump is not being scheduled fast enough to consume what msend is producing.

So please bump rmem_max, make mdump (and other multicast receivers) use setsockopt(SO_RCVBUF), and also consider running the receiver at a higher priority so that it can process incoming packets faster, reducing the chance that it runs out of receive buffer space. In fact mdump.c already has a -r parameter that lets the user set SO_RCVBUF, so it is just a matter of using '-r $((8192 * 1024))' together with 'sysctl -w net.core.rmem_max=$((8192 * 1024))' to make it extremely unlikely that packets will be lost with the tests mentioned in this bug ticket. The buffer limit tests were all performed on non-rt kernels from RHEL5 and RHEL5.1, as this problem is not RT specific.

rmem_max was already set to 20971520, and we were not seeing any packet errors reported by "netstat -us". We finally tracked the out-of-sequence messages seen when running mdump/msend against the loopback interface down to its MTU being set to 7700; returning it to the default 16436 resolved that issue.

Running the same tests against one of the system's external interfaces, we also saw similar out-of-sequence messages when mdump and msend were given an rtprio greater than that of the softirq thread for that network interface. With a lower rtprio there were no problems. Running mdump and msend under the normal timesharing (TS) scheduling class, we saw no out-of-sequence packets during a brief test (though the customer did see them when running the test over a much longer period).
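The suggested setsockopt(SO_RCVBUF) call can be sketched as below; the helper name is hypothetical (mdump's actual -r handling may differ), and the value actually granted is capped by net.core.rmem_max, which is why the sysctl bump is needed first:

```c
/* Sketch: request a larger UDP receive buffer for a multicast receiver.
 * The kernel clamps the request at net.core.rmem_max, so raise that
 * first, e.g.: sysctl -w net.core.rmem_max=$((8192 * 1024)) */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Returns the buffer size the kernel actually granted, or -1 on error. */
int set_udp_rcvbuf(int sock, int bytes)
{
	int actual = 0;
	socklen_t len = sizeof(actual);

	if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
		return -1;
	/* Read back what was granted: the request may have been clamped
	 * to rmem_max, and Linux doubles it for bookkeeping overhead. */
	if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
		return -1;
	return actual;
}
```

A receiver would call this right after socket() and before joining the multicast group, e.g. set_udp_rcvbuf(sock, 8192 * 1024), and should check the returned value against what it asked for to detect clamping.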
UDP is unreliable, and the loss observed here was due to the way the system was set up (MTU, buffer limits, etc.), so I'm closing this ticket after talking with Graham.