Bug 426853

Summary: Missing multicast messages with realtime beta kernel
Product: Red Hat Enterprise MRG
Reporter: Graham Biswell <gbiswell>
Component: realtime-kernel
Assignee: Arnaldo Carvalho de Melo <acme>
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Version: beta
CC: bhu, gcooper, lgoncalv
Hardware: x86_64   
OS: Linux   
Last Closed: 2008-05-06 11:52:12 UTC
Attachments:
  tarball containing source for 29West test tools msend.c & mdump.c
  show how many packets were lost and resync the sequential to avoid useless continuous printfs

Description Graham Biswell 2007-12-27 13:14:20 UTC
Description of problem:
The customer's multicast message stream test harness (using Tibco Smart Sockets)
reports missing messages on the realtime beta kernel. The same test runs without
problems on the stock RHEL 5.1 kernel.


Version-Release number of selected component (if applicable):
2.6.21.57-el5rt


How reproducible:
The problem happens consistently, almost immediately after the customer's test starts.

Working with their 29West engineer, the customer can reproduce the problem using
29West's multicast tools msend and mdump. A tarball containing these is attached,
though they are also available, including source, from
http://69.55.236.25/docs/TestNet/testnet.html#INITIAL-FOUR. The 29West engineer
modified mdump to display missing message sequence numbers; it is this version
that is attached.

Steps to Reproduce:
1. Build msend & mdump; "make msend" & "make mdump" is sufficient.

2. The test requires 2 separate multicast streams for the problem to manifest
itself (the customer's application test suite uses 6 streams). These can all be
run on a single box provided they use different multicast addresses. The tests
need to be bound to one of the system's addresses. We found that running
against loopback reproduced the issue more quickly than against an external
interface. Initially start 2 "mdump" listener processes in separate windows:
./mdump -v -q 224.23.55.11 4400 127.0.0.1
./mdump -v -q 224.23.55.12 4400 127.0.0.1

There should be no output from either.

3. In another window run "msend" against one of the multicast addresses above.
./msend -b400 -m8192 -n500000 -p5 -q -s2000 -S8388608 224.23.55.11 4400 2 127.0.0.1

There should be no output from this, nor either of the mdump processes.

4. Now start a second "msend" against the other multicast address.
./msend -b400 -m8192 -n500000 -p5 -q -s2000 -S8388608 224.23.55.12 4400 2 127.0.0.1


Actual results:
What we see after a very short time is something like the following
error message in both mdump windows ...

Expected seq 1137f, got 11df0
Expected seq 11380, got 11df1
Expected seq 11381, got 11df2
Expected seq 11382, got 11df3


Expected results:
No output from either mdump process.


Additional info:
Hardware is a 4-way dual-core HP DL585 with 8GB memory.

Comment 1 Graham Biswell 2007-12-27 13:14:20 UTC
Created attachment 290452 [details]
tarball containing source for 29West test tools msend.c & mdump.c

Comment 2 Arnaldo Carvalho de Melo 2008-01-02 16:59:17 UTC
The problem happens with 2.6.18-53.1.4.el5 as well, checking now with
kernel-rt-vanilla-2.6.21-57.el.

Comment 3 Arnaldo Carvalho de Melo 2008-01-02 17:12:46 UTC
Problem also present on:

[root@mica barcap]# uname -a
Linux mica.ghostprotocols.net 2.6.21-57.el5rtvanilla #1 SMP PREEMPT Fri Nov 30
10:53:20 EST 2007 x86_64 x86_64 x86_64 GNU/Linux
[root@mica barcap]#

Comment 4 Arnaldo Carvalho de Melo 2008-01-02 18:08:53 UTC
Tried now with:

[root@mica barcap]# uname -r
2.6.18-53.el5

And the problem happens even with just one msend instance and two mdump instances.

Comment 5 Arnaldo Carvalho de Melo 2008-01-02 18:23:40 UTC
Please see the attached patch I'm using to show how many packets are lost each
time the packet sequence number differs from the expected one; it resyncs after
reporting, so the loss can then be correlated with the SNMP UDP MIB variables.
With the patch applied, mdump's output becomes:

Expected seq 4a6b8a, got 4a6b90
  6 packets lost, resync
Expected seq 4a6bad, got 4a6cd3
  294 packets lost, resync
Expected seq 4a6cd4, got 4a6cd5
  1 packets lost, resync

Output now appears only when packet loss is detected, and it correlates with what
is reported in the UDP "packet receive errors" line of the netstat -s output. I'm
now looking at the situations where this MIB variable is bumped in the udp_rcv
routine in the kernel sources.
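
The core of the change is small; as a rough sketch (the names below are
illustrative, not the actual mdump.c identifiers), the receive loop does
something like:

    #include <stdio.h>

    /* report a gap in the multicast sequence numbers and resync, so that a
     * single loss event produces one message instead of a printf for every
     * subsequent packet; returns the next expected sequence number */
    static unsigned int check_seq(unsigned int seq, unsigned int expected)
    {
            if (seq != expected) {
                    printf("Expected seq %x, got %x\n", expected, seq);
                    printf("  %u packets lost, resync\n", seq - expected);
            }
            return seq + 1;
    }

    /* in the receive loop: expected = check_seq(seq, expected); */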


Comment 6 Arnaldo Carvalho de Melo 2008-01-02 18:24:47 UTC
Created attachment 290672 [details]
show how many packets were lost and resync the sequential to avoid useless continuous printfs

Comment 7 Arnaldo Carvalho de Melo 2008-01-02 18:45:41 UTC
mdump is running out of receive buffer space; if one bumps rmem_max as described
in the 29west docs, the problem goes away.

http://www.29west.com/docs/THPM/thpm.html#SETTING-KERNEL-UDP-BUFFER-LIMITS

To reproduce this it's not even necessary to run multiple instances of msend and
mdump; a single client + server pair plus some other heavy system activity gives
the same result, i.e. packet loss, because mdump is not being scheduled often
enough to consume what msend is producing.

So please bump rmem_max, make mdump (and other multicast receivers) use
setsockopt(SO_RCVBUF), and also consider making the receiver run at a higher
priority so that it can process the incoming packets faster, reducing the
chance that it runs out of receive buffer space.
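
A minimal sketch of the setsockopt() call, assuming a UDP socket already created
by the receiver (the 8 MB value matches the size used in the tests below; the
kernel caps the effective size at net.core.rmem_max):

    #include <stdio.h>
    #include <sys/socket.h>

    /* request a larger socket receive buffer; must be done before the
     * receive loop starts, and net.core.rmem_max must be raised to match */
    static int set_rcvbuf(int sock, int bytes)
    {
            if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0) {
                    perror("setsockopt(SO_RCVBUF)");
                    return -1;
            }
            return 0;
    }

    /* e.g. set_rcvbuf(sock, 8 * 1024 * 1024); */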

Comment 8 Arnaldo Carvalho de Melo 2008-01-02 19:08:38 UTC
In fact mdump.c already has a -r parameter through which the user can set
SO_RCVBUF, so it's just a matter of using '-r $((8192 * 1024))' plus 'sysctl -w
net.core.rmem_max=$((8192 * 1024))' to make it extremely unlikely that packets
will be lost with the tests mentioned in this bug ticket.

The buffer limit tests were all performed on non-rt kernels from RHEL5 and
RHEL5.1, as this problem is not RT specific.

Comment 9 Graham Biswell 2008-01-08 12:49:55 UTC
rmem_max was already set to 20971520.
We were not seeing any packet errors reported by "netstat -us".

We finally tracked the source of the out-of-sequence messages seen when running
mdump/msend against the loopback interface down to the MTU being set to 7700.
Returning it to the default of 16436 resolved that issue.

Running the same tests against one of the system's external interfaces, when
mdump & msend were given an rtprio greater than that of the softirq for that
network interface we also saw similar out-of-sequence messages. With a lower
rtprio, there were no problems.

Running mdump & msend with the normal TS scheduling class, we saw no
out-of-sequence packets during a brief test (though the customer did see them
when running the test over a much longer period).
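
For reference, an illustrative sketch of how a receiver could pin itself to a
fixed real-time priority (the function and the value 30 are assumptions for
illustration only; on the -rt kernel the chosen priority should stay below that
of the NIC's softirq/IRQ threads, otherwise packet delivery itself is starved,
which matches the behaviour described above):

    #include <sched.h>
    #include <stdio.h>

    /* put the calling process on SCHED_FIFO at the given priority; keep the
     * value below the NIC softirq/IRQ threads on the -rt kernel, or the
     * receiver can block packet delivery entirely */
    static int set_rt_priority(int prio)
    {
            struct sched_param sp = { .sched_priority = prio };

            if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0) {
                    perror("sched_setscheduler(SCHED_FIFO)");
                    return -1;
            }
            return 0;
    }

    /* e.g. set_rt_priority(30); the same effect can be had externally with
     * "chrt -f 30 ./mdump ..." */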



Comment 10 Arnaldo Carvalho de Melo 2008-05-06 11:52:12 UTC
UDP is unreliable and the loss perceived was due to the way the system was set
up (MTU, etc.), so I'm closing this ticket after talking with Graham.