172777 – UDP packets with bad checksum not dropped

Summary: UDP packets with bad checksum not dropped

Keywords:
Status:	CLOSED WONTFIX
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	3.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Thomas Graf
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-11-09 16:24 UTC by Justin McNutt
Modified:	2014-06-18 08:28 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-10-19 18:51:28 UTC
Target Upstream Version:
Embargoed:

Description Justin McNutt 2005-11-09 16:24:32 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.8) Gecko/20050511

Description of problem:
When using 'snmpwalk' (from the net-snmp-utils package), we sometimes get bad packets in the responses from one of our network devices (which is a separate problem).  When the packets come back corrupted, snmpwalk hangs.  This also happens when using the getnext() Perl function in SNMP.pm (also part of the Net-SNMP suite).

In both cases, running strace on the program that hangs shows that the recvfrom() function is what locks up.  A normal kill or Ctrl-C will successfully terminate the hung program (kill -9 not required).  No other detrimental effect to the system has been observed.

A packet capture in tandem with the queries shows that in every case, an SNMP response with a bad UDP checksum is the last packet to arrive when the program hangs.

A bug report was submitted to the Net-SNMP maintainers via SourceForge, who claim that the problem is that the Linux kernel should be dropping UDP packets with bad checksums before the Net-SNMP library sees the packet in the first place.

Version-Release number of selected component (if applicable):
seen in multiple versions (since at least RHEL3U3)

How reproducible:
Always

Steps to Reproduce:
1.  Create a scenario where packets will be damaged in transit (perhaps crafted responses?).
2.  Start 'tcpdump -s 0 -n -w bad.packet.pcap "udp port 161"'.
3.  Use 'strace snmpwalk -c yourcommunity -v 2c host mib' to send a query.
4.  Ensure that a response with a bad UDP checksum is sent to the querier.  Note the bad packet in the tcpdump capture file; the strace output shows that snmpwalk hung in the recvfrom() function.

Actual Results:  Application hangs on recvfrom function every time.  Ethereal or tcpdump captures show that this occurs if and only if the packet has a bad UDP checksum (due to alterations elsewhere in the packet).

Expected Results:  Kernel should drop packets with bad checksums, causing query to time out (forcing retransmission of the original query).
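
To double-check that a captured response really carries a bad UDP checksum, the checksum can be recomputed by hand from the capture. The following is only a minimal sketch, assuming IPv4 and assuming the source address, destination address, and raw UDP segment (header plus payload) have already been extracted from the pcap file. The function names are hypothetical and are not part of Net-SNMP or tcpdump.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Return the 16-bit one's-complement sum of data (zero-padded to even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Compute the UDP checksum of an IPv4 datagram.

    udp_segment is the UDP header plus payload. Its checksum field
    (bytes 6-7) is treated as zero while computing.
    """
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_UDP, len(udp_segment))
    )
    zeroed = udp_segment[:6] + b"\x00\x00" + udp_segment[8:]
    checksum = ~ones_complement_sum(pseudo_header + zeroed) & 0xFFFF
    return checksum or 0xFFFF  # a computed 0 is transmitted as 0xFFFF


def udp_checksum_ok(src_ip: str, dst_ip: str, udp_segment: bytes) -> bool:
    """Return True if the checksum stored in the segment matches the data."""
    stored = struct.unpack("!H", udp_segment[6:8])[0]
    if stored == 0:  # sender did not compute a checksum (allowed over IPv4)
        return True
    return stored == udp_checksum(src_ip, dst_ip, udp_segment)
```

A segment for which udp_checksum_ok() returns False is one the receiving host would be expected to discard.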

Additional info:

If there is a way to PROVE that this is, in fact, a bug in Net-SNMP, I would be more than happy to add this information to my open bug report on SourceForge.  However, I will need instructions on how to verify this so I can convince them that it's their problem and not Red Hat's.

If it IS a problem that must be solved by Red Hat, I believe it should be fixed, but we could wait until Update 7, if necessary.  We have a workaround in place on our network (ugly, but functional).

Comment 1 Justin McNutt 2005-11-09 16:39:19 UTC

For reference, here is the bug report I submitted to the Net-SNMP maintainers:

http://sourceforge.net/tracker/index.php?func=detail&aid=1345296&group_id=12694&atid=112694

Comment 2 Thomas Graf 2005-11-16 13:15:10 UTC

That is right: the kernel is supposed to drop UDP fragments with invalid
checksums. It would be helpful to have the following additional information to
resolve the issue:
 - What kind of network device is used on the receiving side?
 - Checksumming settings of the device (run ethtool)

Comment 3 Justin McNutt 2005-12-12 17:48:46 UTC

The network device is as follows (per the kernel at boot time):

Tigon3 [partno(BCM95703A30) rev 1002 PHY(5703)] (PCIX:133MHz:64-bit)
10/100/1000BaseT Ethernet

It's the Broadcom 10/100/1000 Ethernet over UTP NIC that came standard with the
Dell PowerEdge 2650.

As for the checksumming settings, I'm not as familiar with ethtool as perhaps I
should be, but is this correct?  See below:

[root@mybox]# ethtool -a eth0
Pause parameters for eth0:
Autonegotiate:  on
RX:             off
TX:             off

I assume this means checksumming is disabled for both inbound and outbound
packets.  I could try enabling this, of course, but wouldn't this be at layer 2?
 The layer 2 checksums are not failing, so the error wouldn't be caught by the
NIC.  The errors we've been seeing are at layer 4 and higher (specifically bad
UDP checksums).  If there is a setting for that, shouldn't it be hidden
somewhere in /proc?

Comment 4 Justin McNutt 2005-12-12 17:50:56 UTC

Additional information:

This same bug manifests itself on my laptop as well (Dell Latitude D600) which
has both TX and RX checksumming enabled (according to 'ethtool -a eth0'), which
would again suggest that the problem is not at layer 2 but higher up in the stack.

Comment 5 Justin McNutt 2006-01-05 18:51:38 UTC

Additional information from the Net-SNMP maintainers:

''My suspicion is that the recv call is possibly blocking until it receives a
(valid) packet, having been led to believe by 'select' that there was one
waiting.  If the network driver discards the mangled packet after having
signalled it using select, but before passing it back via recv, then this might
indeed have the effect of locking up within the recvfrom call.''

What do you think?
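
The scenario in the quote matches a caveat documented in the Linux select(2) man page: select() may report a socket as readable, and a following read may still block if the data is then discarded, for example because of a wrong checksum. The man page suggests using O_NONBLOCK on sockets that should not block. Below is a minimal sketch of that pattern, assuming a plain UDP socket. The helper name recv_with_timeout is hypothetical and this is not how Net-SNMP itself is written.

```python
import select
import socket


def recv_with_timeout(sock: socket.socket, timeout: float, bufsize: int = 65535):
    """Wait up to `timeout` seconds for a datagram.

    Returns (data, address), or None if nothing usable arrived.
    The socket is switched to non-blocking mode so that recvfrom()
    cannot hang even if select() reported it readable and the
    datagram was discarded afterwards.
    """
    sock.setblocking(False)
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None  # timed out, caller may retransmit the query
    try:
        return sock.recvfrom(bufsize)
    except BlockingIOError:
        # select() said "readable" but no datagram was delivered,
        # e.g. it was dropped for a bad checksum in between.
        return None
```

With this pattern a discarded response looks like a timeout to the caller, which can then retransmit the query instead of hanging in recvfrom().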

Comment 6 Justin McNutt 2006-02-27 02:33:05 UTC

Have heard no updates on this issue.  Any ideas or word on if/when this might be
fixed?  Have you been able to reproduce the problem?  Is it something that has
been fixed in RHEL 4 (or 5)?

Comment 7 Justin McNutt 2006-03-08 13:47:35 UTC

This problem continues to crop up from time to time, which means that processes
will hang out on my management station until someone comes along to run strace
on them to confirm they are locked up and kill them.  Some investigation into
this issue would be appreciated.

Comment 8 Kaj J. Niemi 2007-02-22 23:34:18 UTC

Yours is RHEL3 but in RHEL4 there's bug #212321 (and all its reported duplicates). I'm not sure how 2.4
behaves with regard to UDP or if there is a similar small patch available upstream. You might want to
look into that. :) HTH

Comment 9 RHEL Program Management 2007-10-19 18:51:28 UTC

This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.

Comment 10 Justin McNutt 2007-10-22 19:51:43 UTC

So even though I reported this bug TWO YEARS AGO before RHEL 3 was in
maintenance mode, you're still not going to fix it.  That's just great.  Thanks
for nothing.
