Description of problem:
Customer's multicast message stream test harness (using Tibco Smart Sockets) reports missing messages with the realtime beta kernel. There are no problems running the same test on the stock RHEL 5.1 kernel.

Version-Release number of selected component (if applicable):
2.6.21.57-el5rt

How reproducible:
The problem happens consistently, almost immediately after the customer's test starts. Working with their 29West engineer, the customer can reproduce the problem using 29West's multicast tools msend and mdump. A tarball containing these is attached; they are also available, including source, from http://69.55.236.25/docs/TestNet/testnet.html#INITIAL-FOUR. The 29West engineer modified mdump to display missing message sequence numbers; it is this version that is attached.

Steps to Reproduce:
1. Build msend & mdump; "make msend" & "make mdump" is sufficient.
2. The test requires 2 separate multicast streams for the problem to manifest itself (the customer's application test suite uses 6 streams). These can all be run on a single box provided they use different multicast addresses. The tests need to be bound to one of the system's addresses. We found that running against loopback reproduced the issue more quickly than against an external interface. Initially start 2 "mdump" listener processes in separate windows:
   ./mdump -v -q 224.23.55.11 4400 127.0.0.1
   ./mdump -v -q 224.23.55.12 4400 127.0.0.1
   There should be no output from either.
3. In another window run "msend" against one of the multicast addresses above:
   ./msend -b400 -m8192 -n500000 -p5 -q -s2000 -S8388608 224.23.55.11 4400 2 127.0.0.1
   There should be no output from this, nor from either of the mdump processes.
4. Now start a second "msend" against the other multicast address:
   ./msend -b400 -m8192 -n500000 -p5 -q -s2000 -S8388608 224.23.55.12 4400 2 127.0.0.1

Actual results:
After a very short time we see something like the following error messages in both mdump windows:
Expected seq 1137f, got 11df0
Expected seq 11380, got 11df1
Expected seq 11381, got 11df2
Expected seq 11382, got 11df3

Expected results:
No output from either mdump process.

Additional info:
Hardware is a 4-way dual-core HP DL585 with 8GB of memory.
Created attachment 290452 [details] tarball containing source for 29West test tools msend.c & mdump.c
The problem happens with 2.6.18-53.1.4.el5 as well, checking now with kernel-rt-vanilla-2.6.21-57.el.
Problem also present on:

[root@mica barcap]# uname -a
Linux mica.ghostprotocols.net 2.6.21-57.el5rtvanilla #1 SMP PREEMPT Fri Nov 30 10:53:20 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
[root@mica barcap]#
Tried now with:

[root@mica barcap]# uname -r
2.6.18-53.el5

And the problem happens even with just one msend instance and two mdump instances.
Please see the attached patch I'm using to show how many packets are lost each time the received sequence number differs from the expected one; we then resync and try to correlate the loss with the SNMP UDP MIB variables. With it, the mdump output becomes:

Expected seq 4a6b8a, got 4a6b90 6 packets lost, resync
Expected seq 4a6bad, got 4a6cd3 294 packets lost, resync
Expected seq 4a6cd4, got 4a6cd5 1 packets lost, resync

We get output only when we detect packet loss, and it correlates with what is reported in the UDP "packet receive error" line in the netstat -s output. Now looking at the situations where this MIB variable is bumped in the udp_rcv routine in the kernel sources.
Created attachment 290672 [details] patch to show how many packets were lost and resync the expected sequence number, to avoid useless continuous printfs
mdump is running out of receive buffer space; if one bumps rmem_max as described in the 29West docs, the problem goes away: http://www.29west.com/docs/THPM/thpm.html#SETTING-KERNEL-UDP-BUFFER-LIMITS

To reproduce this it's not even necessary to run multiple instances of msend and mdump; just a client + server pair plus some other heavy system activity will produce the same results, i.e. packet loss because mdump is not being scheduled fast enough to consume what msend is producing.

So please bump rmem_max, make mdump (and other multicast receivers) use setsockopt(SO_RCVBUF), and also consider making the receiver run at a higher priority so that it can process the incoming packets faster and thus reduce the possibility that it runs out of receive buffer space.
In fact mdump.c already has a -r parameter through which the user can set SO_RCVBUF, so it's just a matter of using '-r $((8192 * 1024))' + 'sysctl -w net.core.rmem_max=$((8192 * 1024))' to make it extremely unlikely that packets will be lost with the tests mentioned in this bug ticket. The buffer limit tests were all performed on non-RT kernels from RHEL 5 and RHEL 5.1, as this problem is not RT-specific.
rmem_max was already set to 20971520, and we were not seeing any packet errors reported by "netstat -us".

We finally tracked the out-of-sequence messages seen when running mdump/msend against the loopback interface down to the MTU being set to 7700. Returning this to the default of 16436 resolved that issue.

Running the same tests against one of the system's external interfaces, when mdump & msend were given an rtprio greater than that of the softirq thread for that network interface, we also saw similar out-of-sequence messages. With a lower rtprio, no problems. Running mdump & msend in the normal TS scheduling class, we saw no out-of-sequence packets during a brief test (though the customer did see these running the test over a much longer period).
UDP is unreliable, and the loss observed here was due to the way the system was set up (MTU, buffer sizes, scheduling priorities), so I'm closing this ticket after talking with Graham.