Description of problem:

We originally detected this when a customer of ours had a broken firewall between two servers communicating with each other over SNMP. When the SNMP datagrams were large enough to be fragmented, the firewall code corrupted the UDP checksum. The asynchronous code receiving the SNMP datagrams then blocked forever, which is not supposed to happen at all.

Looking at the original problem with tcpdump, we saw that both datagrams arrive, but the application code never returns from recvmsg() or a similar system call.

Some googling turned up a bug that was fixed in 2.6.10, after RHEL4 branched. I've attached the patches here for completeness; they are as close to the upstream patches as possible.

Patch #1 fixes the handling of UDP checksum validation.
Patch #2 fixes a problem with SOCK_RAW sockets using the code in Patch #1, which breaks someone's pptp server among other things, according to the URLs below. We don't use this code, but it is included here for completeness.

For more information on Patch #1, see http://permalink.gmane.org/gmane.linux.kernel.commits.head/44781
For more information on Patch #2, see http://www.gatago.com/linux/kernel/15477600.html

Version-Release number of selected component (if applicable):
2.6.9-42.0.3

Additional info:
It would be nice if you could integrate these into the next RHEL4 kernel release. Thanks.
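For anyone hitting the same hang before a fixed kernel is available, the application side can be guarded roughly as follows. This is only a minimal sketch in Python, not the code we actually run: the helper name recv_datagram and its parameters are made up for illustration, and it assumes a Unix-like system where socket.MSG_DONTWAIT is available. The idea is to never trust select() alone: the read that follows is made non-blocking, so a datagram that gets discarded after select() reported the socket readable results in "nothing received" instead of a blocked call.

```python
import select
import socket


def recv_datagram(sock, timeout, bufsize=65535):
    """Hypothetical helper: wait up to `timeout` seconds for one UDP datagram.

    Returns (data, address), or None if nothing usable arrived.
    """
    # Wait until the kernel reports the socket as readable, or time out.
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    try:
        # MSG_DONTWAIT makes this single call non-blocking, even when the
        # socket itself is in blocking mode.
        return sock.recvfrom(bufsize, socket.MSG_DONTWAIT)
    except BlockingIOError:
        # select() said "readable", but no datagram could be delivered,
        # for example because it was dropped for a bad UDP checksum.
        return None
```

The caller treats None as "try again later" and goes back to its event loop. Setting the socket to non-blocking mode with setblocking(False) would have the same effect for every read on that socket.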
Created attachment 139663 [details] [UDP]: Select handling of bad checksums
Created attachment 139664 [details] For SOCK_RAW sockets; should be the same as inet_dgram_ops but without udp_poll
Oh well, looks like someone else already hit the same nail on the head. Our problem is the same as bz #212321 and bz #212325.
*** This bug has been marked as a duplicate of 212321 ***