Bug 187889 - Upgrade to 2.6.15 causes eth0 to drop all packets under load
Status: CLOSED CANTFIX
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386 Linux
Priority: medium Severity: high
Assigned To: Thomas Graf
Brian Brock
Reported: 2006-04-04 06:45 EDT by Phil Wilson
Modified: 2014-06-18 04:29 EDT
Doc Type: Bug Fix
Last Closed: 2012-05-10 08:51:09 EDT
Attachments
Updated igmp.c (58.63 KB, text/plain)
2006-04-12 11:13 EDT, Phil Wilson
A simple program to test the problem\fix (1.59 KB, text/plain)
2006-04-12 11:13 EDT, Phil Wilson

Description Phil Wilson 2006-04-04 06:45:29 EDT
Description of problem:
Whilst developing a multicast routing application, we found a bug in the
ip_mc_msfilter code of the 2.6.9 kernel. This bug prevented setipv4sourcefilter
from switching from source-specific to any-source mode.
To get around this problem, which was blocking our code development, we
upgraded to the 2.6.15 kernel, which has a fix for it.

Since the upgrade we have been experiencing network problems on this server.
When there is a high network load, eth0 starts to drop all packets. It does
this for a period of time and then recovers. Occasionally, but not always,
restarting the network service or removing a VLAN causes the packets to
flow again. A reboot always solves the problem.

The problem does not occur when running the 2.6.9 kernel again.

Version-Release number of selected component (if applicable):
Dell Poweredge 1850 server
Kernel 2.6.15


How reproducible:
Very


Steps to Reproduce:
1. Start receiving a network video stream with VLC
2. Cause a CPU load (make the VLC window larger)
  
Actual results:
eth0 drops all packets

Expected results:
Normal operation

Additional info:
Please let me know what debug information you need, and how to capture it.
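
One way to put numbers on "eth0 drops all packets" is to sample the receive counters in /proc/net/dev before and during the load. The following is only a minimal sketch: parse_rx_counters and read_rx_counters are hypothetical helper names, and the sketch assumes the usual /proc/net/dev layout in which the receive fields after the interface name are bytes, packets, errs, drop.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of /proc/net/dev. Returns 0 and fills the counters if the
 * line belongs to ifname, -1 otherwise (header lines, other interfaces). */
static int parse_rx_counters(const char *line, const char *ifname,
                             unsigned long long *rx_packets,
                             unsigned long long *rx_dropped)
{
    assert(line && ifname && rx_packets && rx_dropped);

    const char *colon = strchr(line, ':');
    if (!colon)
        return -1;                      /* header lines contain no colon */

    const char *name = line;
    while (*name == ' ')
        name++;                         /* interface names are right-aligned */

    size_t len = (size_t)(colon - name);
    if (len != strlen(ifname) || strncmp(name, ifname, len) != 0)
        return -1;                      /* a different interface */

    unsigned long long rx_bytes, rx_errs;
    if (sscanf(colon + 1, "%llu %llu %llu %llu",
               &rx_bytes, rx_packets, &rx_errs, rx_dropped) != 4)
        return -1;
    return 0;
}

/* Read the current counters for ifname; returns 0 on success, -1 otherwise. */
static int read_rx_counters(const char *ifname,
                            unsigned long long *rx_packets,
                            unsigned long long *rx_dropped)
{
    char line[512];
    int rc = -1;
    FILE *f = fopen("/proc/net/dev", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (parse_rx_counters(line, ifname, rx_packets, rx_dropped) == 0) {
            rc = 0;
            break;
        }
    }
    fclose(f);
    return rc;
}
```

Calling read_rx_counters("eth0", ...) twice, a few seconds apart, and comparing the two results would show whether the packet count stalls while the drop count climbs during the stream.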
Comment 1 Phil Wilson 2006-04-04 09:17:15 EDT
OS = Red Hat Enterprise Linux 4
uname -a = Linux BM3 2.6.15.ELsmp #1 SMP Wed Jan 5 19:30:39 EST 2005 i686 i686 i386 GNU/Linux
Comment 2 Jason Baron 2006-04-04 11:28:57 EDT
In the RHEL4 context, we would like to fix the routing bug that you encountered,
so any more details about it would be helpful. As for the 2.6.15 bug, that
should be filed either against Fedora or against the upstream kernel.
Comment 3 Phil Wilson 2006-04-05 03:45:10 EDT
The error is in the Linux/net/ipv4/igmp.c file, in the function ip_mc_msfilter.

When this function is passed an any-source filter (exclude mode with no
sources), it returns an EADDRNOTAVAIL error. The function is used to set the
filter for source addresses. To allow any source, we pass an empty filter set,
which equates to excluding no addresses (and therefore allowing any). The
function should not return the above error when an empty source address list
is passed.

The 2.6.15 kernel works correctly thanks to changes to the above file.

I could not isolate the exact changes made to the multicast implementation, so
I cannot provide just those changes to you.
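
For illustration, the request described above can be sketched from user space as follows. This is a minimal sketch, not the attached test program: build_any_source_filter and allow_any_source are hypothetical helper names, and it assumes the socket has already joined the group. It uses the IP_MSFILTER socket option directly, which is the path that reaches ip_mc_msfilter in the kernel.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Fill msf with an any-source filter: exclude mode with an empty source list.
 * Returns 0 on success, -1 if either string is not a valid IPv4 address. */
static int build_any_source_filter(struct ip_msfilter *msf,
                                   const char *group, const char *ifaddr)
{
    assert(msf && group && ifaddr);

    memset(msf, 0, sizeof(*msf));
    if (inet_pton(AF_INET, group, &msf->imsf_multiaddr) != 1 ||
        inet_pton(AF_INET, ifaddr, &msf->imsf_interface) != 1)
        return -1;

    msf->imsf_fmode = MCAST_EXCLUDE;   /* exclude ...                          */
    msf->imsf_numsrc = 0;              /* ... nothing, so any source is allowed */
    return 0;
}

/* Apply the filter to a socket that has already joined the group.
 * Returns the setsockopt() result: 0 on success, -1 with errno set. */
static int allow_any_source(int sock, const char *group, const char *ifaddr)
{
    struct ip_msfilter msf;

    if (build_any_source_filter(&msf, group, ifaddr) != 0)
        return -1;
    return setsockopt(sock, IPPROTO_IP, IP_MSFILTER,
                      &msf, IP_MSFILTER_SIZE(0));
}
```

On the affected 2.6.9 kernel, a call such as allow_any_source(sock, "239.1.1.1", "192.0.2.1") (both addresses are placeholders) would be expected to fail with errno set to EADDRNOTAVAIL even after a successful join, per the behaviour described in this comment.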
Comment 4 Thomas Graf 2006-04-05 03:53:15 EDT
It's most probably this changeset:

diff-tree 8713dbf05754aa777f31bf491cb60a111f7ad828 (from ec1890c5df451799dec969a
Author: Yan Zheng <yanzheng@21cn.com>
Date:   Fri Oct 28 08:02:08 2005 +0800

    [MCAST]: ip[6]_mc_add_src should be called when number of sources is zero
    
    And filter mode is exclude.
    
    Further explanation by David Stevens:
    
    Multicast source filters aren't widely used yet, and that's really the only
    feature that's affected if an application actually exercises this bug, as far
    as I can tell. An ordinary filter-less multicast join should still work, and
    only forwarded multicast traffic making use of filters and doing empty-source
    filters with the MSFILTER ioctl would be at risk of not getting multicast
    traffic forwarded to them because the reports generated would not be based on
    the correct counts.
    
    Signed-off-by: Yan Zheng <yanzheng@21cn.com>
    Acked-by: David L Stevens <dlstevens@us.ibm.com>
    Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>

diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c
index 8b6d393..c6247fc 100644
--- a/net/ipv4/igmp.c
+++ b/net/ipv4/igmp.c
@@ -1908,8 +1908,11 @@ int ip_mc_msfilter(struct sock *sk, stru
                        sock_kfree_s(sk, newpsl, IP_SFLSIZE(newpsl->sl_max));
                        goto done;
                }
-       } else
+       } else {
                newpsl = NULL;
+               (void) ip_mc_add_src(in_dev, &msf->imsf_multiaddr,
+                                    msf->imsf_fmode, 0, NULL, 0);
+       }
        psl = pmc->sflist;
        if (psl) {
                (void) ip_mc_del_src(in_dev, &msf->imsf_multiaddr, pmc->sfmode,

Comment 5 Phil Wilson 2006-04-05 04:00:20 EDT
The above all looks correct.
Is there a chance we could receive an interim patch to fix this? This is blocking
our customer acceptance testing, which is due to complete next week.

Thanks in advance.
Comment 6 Phil Wilson 2006-04-10 05:04:46 EDT
Please can someone advise how I can now progress this issue? My company are now
waiting on this issue to be resolved before we can gain customer acceptance.
Comment 7 Phil Wilson 2006-04-12 11:07:30 EDT
The above patch did not work completely. The error flag needs to be reset too:

                        sock_kfree_s(sk, newpsl, IP_SFLSIZE(newpsl->sl_max));
                        goto done;
                }
        } else {
                newpsl = NULL;
                (void) ip_mc_add_src(in_dev, &msf->imsf_multiaddr,
                                        msf->imsf_fmode, 0, NULL, 0);
                err=0;
        }
        psl = pmc->sflist;
        if (psl) {
                (void) ip_mc_del_src(in_dev, &msf->imsf_multiaddr, pmc->sfmode,
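
Why the extra err=0 matters can be shown with a small user-space model of the control flow. This is only a sketch, not kernel code: it assumes, as inferred from the comments in this report, that the function sets err to -EADDRNOTAVAIL before looking up the prior join and overwrites it only on the path that adds sources. model_msfilter and its parameters are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/* Simplified model of the error handling around the patched branch.
 *   joined    - non-zero if the socket has a prior join for the group
 *   numsrc    - number of sources in the requested filter
 *   reset_err - non-zero to model the extra "err=0" line from this comment
 * Returns 0 on success or a negative errno value, as the kernel does. */
static int model_msfilter(int joined, unsigned int numsrc, int reset_err)
{
    int err = -EADDRNOTAVAIL;   /* pessimistic default before the join lookup */

    if (!joined)
        return err;             /* no prior join: the error is legitimate */

    if (numsrc > 0) {
        err = 0;                /* stands in for a successful ip_mc_add_src() */
    } else {
        /* Empty source list: nothing is allocated, so nothing on this
         * path overwrites the default unless it is reset explicitly. */
        if (reset_err)
            err = 0;
    }
    return err;
}
```

In this model, model_msfilter(1, 0, 0) returns -EADDRNOTAVAIL although the join exists, which matches the failure reported for an empty filter, while model_msfilter(1, 0, 1) returns 0.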
Comment 8 Phil Wilson 2006-04-12 11:09:31 EDT
The above changes to igmp.c have fixed the problem.

Please can someone issue me with a compiled kernel with this patch in place, so
that I can upgrade the servers with an official release.

The servers need to be running an official Red Hat released kernel for our
customer acceptance testing.

Thank you.
Comment 9 Phil Wilson 2006-04-12 11:13:30 EDT
Created attachment 127661 [details]
Updated igmp.c
Comment 10 Phil Wilson 2006-04-12 11:13:59 EDT
Created attachment 127662 [details]
A simple program to test the problem\fix
Comment 11 Thomas Graf 2012-05-10 08:51:09 EDT
RHEL4 has entered the Extended Life Phase. There will be no more minor releases.

I'm closing this bug due to inactivity.

Please reopen and provide an explanation if you need this issue to be addressed in RHEL4. Please note that only security and critical bugfixes are considered at this point.
