From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; rv:1.7.3) Gecko/20040914 Firefox/0.10.1

Description of problem:
I've discovered a bug in the kernel 2.4.21-9.EL that is distributed with RedHat Enterprise Linux ES 3.1. It appears to be exactly the bug:

[NETFILTER]: Fix checksum bug for multicast/broadcast packets on postrouting hook.

However, the bug in question and the fix for it are applied to kernel 2.6.7:

http://linux.bkbits.net:8080/linux-2.6/cset@40c002854YGOfqN8yOMFH8gC2xarLw

Do you happen to know if there is a similar fix for this bug in the RedHat kernel? I took the patch and manually applied it to the RedHat source; I got it to compile and boot and it does fix the problem, but of course I'm not sure what other problems I've introduced in the RedHat kernel by doing this.

Also, I looked at the source for the latest RedHat kernel, 2.4.21-20.EL, and it does not appear that the fix is in that version of the kernel.

Version-Release number of selected component (if applicable):
kernel-2.4.21-9.EL

How reproducible:
Always

Steps to Reproduce:
One way to reproduce this is to use Samba's findsmb:
1. modprobe ip_vs
2. findsmb
3. (no results returned)

Another way to verify that the checksum is wrong:
1. Machine A runs kernel-2.4.21-9.EL
2. Machine A: modprobe ip_vs
3. Machine B: Set up ethereal on this machine in the same broadcast domain as Machine A.
4. Machine B: Tell ethereal to catch packets from Machine A.
5. Machine A: findsmb
6. Machine B: Examine the UDP broadcast packets sent because of findsmb; notice that Ethereal reports that the checksum is wrong.

Additional info:
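For anyone who wants to check a captured packet by hand rather than rely on Ethereal's verdict, here is a minimal userspace sketch of the standard UDP-over-IPv4 checksum (one's-complement sum over the pseudo-header, UDP header and payload). It is not kernel code and not part of any patch; the function names `udp4_checksum` and `udp4_checksum_ok` are made up for this illustration, and the caller is assumed to pass the addresses and the UDP segment as raw bytes in network order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a byte range to a running one's-complement sum. Bytes are taken as
 * big-endian 16-bit words; an odd trailing byte is padded with a zero. */
static uint32_t sum16(const uint8_t *p, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Fold the carries back into 16 bits and return the complement. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical helper: compute the UDP checksum for an IPv4 packet.
 * src/dst are the 4-byte addresses, udp points at the UDP header followed
 * by the payload, udp_len is their combined length. The checksum field
 * itself (bytes 6-7) is skipped, i.e. treated as zero. */
uint16_t udp4_checksum(const uint8_t src[4], const uint8_t dst[4],
                       const uint8_t *udp, size_t udp_len)
{
    assert(udp_len >= 8 && udp_len <= 0xFFFF);

    /* Pseudo-header: source, destination, zero, protocol 17, UDP length. */
    const uint8_t pseudo[12] = {
        src[0], src[1], src[2], src[3],
        dst[0], dst[1], dst[2], dst[3],
        0, 17, (uint8_t)(udp_len >> 8), (uint8_t)(udp_len & 0xFF)
    };

    uint32_t sum = sum16(pseudo, sizeof pseudo, 0);
    sum = sum16(udp, 6, sum);               /* ports and length */
    sum = sum16(udp + 8, udp_len - 8, sum); /* payload */

    uint16_t csum = fold(sum);
    return csum == 0 ? 0xFFFF : csum;       /* 0 on the wire means "none" */
}

/* Hypothetical helper: returns 1 if the checksum stored in the packet
 * matches, or if the sender did not compute one (field is zero). */
int udp4_checksum_ok(const uint8_t src[4], const uint8_t dst[4],
                     const uint8_t *udp, size_t udp_len)
{
    uint16_t stored = (uint16_t)((udp[6] << 8) | udp[7]);
    if (stored == 0)
        return 1;
    return stored == udp4_checksum(src, dst, udp, udp_len);
}
```

Feeding the bytes of one of the broadcast packets from step 6 into `udp4_checksum_ok` should agree with what Ethereal reports, assuming the capture contains the complete UDP segment.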
I validated that this also happens in RHEL kernel 2.4.21-20.EL.
This seems to be identical to 116110: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=116110
This bug still exists in 2.4.21-20.0.1.EL.
This bug still exists in 2.4.21-27.0.1.EL. Is there any plan to address this? Or will this be ignored until RHEL4? This is a very serious issue for us, and anyone else running Samba or lots of other things for that matter...
Have you called/used support for this? That's probably the best way to get anything like this fixed.
There is no way we could ever include this patch, as it is a major kABI breaker. I also can't figure out a non-kABI-breaking version of this fix: we really need to be able to mangle the caller's skb pointer, which means changing the pointer type of these core interfaces and their call sites. I don't think we can really fix this one, unfortunately.
Is this going to change to CANTFIX? or is it still being considered? We still have end-users out there encountering this bug.
I'm moving it to CANTFIX. It is even more undesirable to break kABI for this product now than it was when I made my previous analysis in the comment from 2006-01-30.
So we had another customer who wanted this fixed, and I managed to come up with a non-kABI breaker that seems to resolve their problem. I'm not sure exactly how ideal it is, so I'd like to know what others think. (Of course there is still the great debate about whether or not an update to RHEL3 will exist and whether or not this will make the cut, but I wanted to post the patch anyway.) I basically took a look at what the upstream code does and how RHEL3 behaved, and realized that the frames being transmitted are always cloned skbs. Knowing that cloned skbs shouldn't be overwritten, it seems logical to include the check from the upstream code that copies the data to a new buffer if the skb is cloned and then proceeds to recalculate the correct checksum. Thoughts?
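As a rough userspace illustration of that idea (copy the data to a private buffer if it is shared, then write the checksum), here is a small sketch. It is not the patch and it uses no kernel APIs; `struct shared_data`, `struct pkt`, `pkt_unshare` and `pkt_store_checksum` are hypothetical stand-ins invented for this example, with a plain `users` counter modelling the "cloned" state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical data area that several packet handles may share. */
struct shared_data {
    int users;        /* number of handles pointing at this data */
    size_t len;       /* number of valid bytes */
    uint8_t bytes[];  /* packet contents */
};

/* Hypothetical packet handle; two handles may reference the same data. */
struct pkt {
    struct shared_data *data;
};

/* Give p a private copy of its data if another handle shares it.
 * The handle itself stays where it is; only its data pointer changes.
 * Returns 0 on success, -1 if the copy could not be allocated. */
int pkt_unshare(struct pkt *p)
{
    assert(p && p->data);
    if (p->data->users <= 1)
        return 0;                 /* already private, nothing to do */

    struct shared_data *old = p->data;
    struct shared_data *copy = malloc(sizeof *copy + old->len);
    if (!copy)
        return -1;

    copy->users = 1;
    copy->len = old->len;
    memcpy(copy->bytes, old->bytes, old->len);

    old->users--;                 /* drop our reference to the shared data */
    p->data = copy;
    return 0;
}

/* Store a 16-bit checksum (big-endian) at byte offset off, making the
 * data private first so other handles never see the modification. */
int pkt_store_checksum(struct pkt *p, size_t off, uint16_t csum)
{
    assert(p && p->data);
    if (off + 2 > p->data->len)
        return -1;
    if (pkt_unshare(p) != 0)
        return -1;

    p->data->bytes[off] = (uint8_t)(csum >> 8);
    p->data->bytes[off + 1] = (uint8_t)(csum & 0xFF);
    return 0;
}
```

The point of the model is that the write goes to a private copy whenever the data is shared, so the other holder's bytes are left untouched, while the caller keeps using the same handle.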
Patch is available here: http://people.redhat.com/agospoda/other/gss-test/gsstest-skb_checksum_help.patch
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
QE ack for 3.9. Fix has been confirmed in internal testing, client testing and customer testing.
A fix for this problem has just been committed to the RHEL3 U9 patch pool this evening (in kernel version 2.4.21-47.2.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2007-0436.html
Internal Status set to 'Resolved'
Status set to: Closed by Client

This event sent from IssueTracker by solgato
issue 97826