Bug 618325

Summary: tcpdump & wireshark are very slow to start due to lengthy setsockopt PACKET_RX_RING calls
Product: [Fedora] Fedora
Reporter: Phil Mayers <p.mayers>
Component: libpcap
Assignee: Miroslav Lichvar <mlichvar>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: low
Version: 12
CC: anton, dougsland, gansalmon, gharris, itamar, jonathan, kernel-maint, madhu.chinakonda, mlichvar, nhorman
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2010-07-28 11:05:22 UTC
Attachments:
Description                                                         Flags
strace of tcpdump; note the slow setsockopt calls on lines 76-81    none
vmstat whilst tcpdump is starting                                   none

Description Phil Mayers 2010-07-26 16:51:08 UTC
Created attachment 434461
strace of tcpdump; note the slow setsockopt calls on lines 76-81

Description of problem:

I'm a network engineer and use tcpdump and wireshark a lot.

I've recently noticed that, on my Fedora 12 desktop machine, tcpdump and wireshark seem to take a good 10-15 seconds to start up on average; this is not the startup delay I'd expect on a machine this quick (or slow!)

strace seems to indicate several calls to:

setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=131072, block_nr=31, frame_size=65600, frame_nr=31}, 16) = -1 ENOMEM (Cannot allocate memory)

...which take between 0.5 and 5 seconds to return before finally succeeding. Whilst this is happening, the system is very unresponsive - vmstat seems to think the time is spent in "swap out" and iowait.

This is new behaviour; tcpdump didn't use to do this.

Version-Release number of selected component (if applicable):

libpcap-1.0.0-4.20090922gite154e2.fc12
tcpdump-4.0.0-3.20090921gitdf3cb4.fc12

How reproducible:

Every time

Steps to Reproduce:
1. Start tcpdump
2. Observe high CPU / disk activity and slow wait
3. ?
  
Actual results:

tcpdump seems to block allocating memory

Expected results:

well... it should start a bit quicker; ideally at the same sort of snappy speed it used to

Additional info:

I will attach an strace and vmstat; at the time in question the machine had:

$ free
             total       used       free     shared    buffers     cached
Mem:       3992340    3419804     572536          0      33848     503500
-/+ buffers/cache:    2882456    1109884
Swap:      2031608    1876168     155440

...which doesn't seem unreasonable. How much ram can libpcap need ;o)

Comment 1 Phil Mayers 2010-07-26 16:53:35 UTC
Created attachment 434465
vmstat whilst tcpdump is starting

Just a note - the delay might not seem "that bad" in this case, but I'd run tcpdump several times in quick succession. I suspect various stuff had been pushed out to swap. This is as *fast* as it ever gets - when the machine has been used for "general stuff" for a few hours, it can take very noticeable time to start tcpdump.

Comment 2 Miroslav Lichvar 2010-07-27 07:48:02 UTC
I'm not sure if there is anything we can do in libpcap to fix this, besides disabling the mmapped capture.

Perhaps kernel developers will have a suggestion?

Comment 3 Chuck Ebbert 2010-07-27 14:56:38 UTC
(In reply to comment #2)
> I'm not sure if there is anything we can do in libpcap to fix this, besides
> disabling the mmapped capture.
>
> Perhaps kernel developers will have a suggestion?

You're trying to allocate 128k of contiguous memory and that can always cause this problem. Should probably close as NOTABUG?
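To put a number on the 128k figure, here is a rough sketch. It assumes 4 KiB pages (typical on x86, but not stated anywhere in this bug) and that each ring block is requested as one physically contiguous, power-of-two run of pages. The function name is purely illustrative.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical for x86; not stated in this bug


def alloc_order(nbytes, page_size=PAGE_SIZE):
    """Return the smallest order such that (page_size << order) >= nbytes."""
    order = 0
    while (page_size << order) < nbytes:
        order += 1
    return order


# The two block sizes that show up in the strace output in this report.
for block_size in (65536, 131072):
    order = alloc_order(block_size)
    print(f"block_size={block_size}: order {order}, {1 << order} contiguous pages")
```

Under those assumptions a 128 KiB block needs 32 contiguous free pages, twice the run needed for a 64 KiB block. On a fragmented, heavily swapped machine the larger run is harder to find, which would be consistent with the swap-out activity the reporter saw, though that link is an inference rather than something measured here.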

Comment 4 Phil Mayers 2010-07-27 15:03:49 UTC
"you" in this case is libpcap (the component I opened the bug against ;o) so if it's a problem to do that, it should perhaps be fixed in libpcap?

Just out of curiosity: why does it take 3 seconds and *then* fail?

Is there a way to tell libpcap to *not* use MMAPed capture? Environment variable?

Comment 5 Phil Mayers 2010-07-27 15:20:42 UTC
Ah, this is interesting. If I run:

tcpdump -s 65484

...I get a single, fast setsockopt() call:

0.000059 setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=65536, block_nr=32, frame_size=65536, frame_nr=32}, 16) = 0

If I run:

tcpdump -s 65485

...I get a series of setsockopt() calls, which are slow and fail until the size is ramped down:

0.000101 setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=131072, block_nr=31, frame_size=65552, frame_nr=31}, 16) = -1 ENOMEM
1.089152 setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=131072, block_nr=30, frame_size=65552, frame_nr=30}, 16) = -1 ENOMEM
2.848500 setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=131072, block_nr=29, frame_size=65552, frame_nr=29}, 16) = -1 ENOMEM
1.025462 setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=131072, block_nr=28, frame_size=65552, frame_nr=28}, 16) = -1 ENOMEM
0.000827 setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=131072, block_nr=27, frame_size=65552, frame_nr=27}, 16) = -1 ENOMEM
0.697725 setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=131072, block_nr=26, frame_size=65552, frame_nr=26}, 16) = -1 ENOMEM
0.000778 setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=131072, block_nr=25, frame_size=65552, frame_nr=25}, 16) = -1 ENOMEM
1.381146 setsockopt(3, SOL_PACKET, PACKET_RX_RING, {block_size=131072, block_nr=24, frame_size=65552, frame_nr=24}, 16) = 0

...so the issue seems to be that asking for a snapshot length above 65484 pushes the frame size past 64 KiB and the block size up to 128 KiB, which triggers a memory allocation that the kernel is unwilling to perform (and it takes a long-ish time to decide that)
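The numbers in those traces can be reproduced with a small model. This is only a sketch fitted to the strace output above, not libpcap's actual code: the 52-byte per-frame overhead, the 2 MiB ring target, and the 4 KiB page size are all inferred from the observed values, and the retry step is simply a rule that matches the observed 31, 30, ..., 24 sequence.

```python
TPACKET_ALIGNMENT = 16         # frame sizes in the traces are multiples of 16
FRAME_OVERHEAD = 52            # assumption: inferred from 65484 -> 65536, 65485 -> 65552
RING_BYTES = 2 * 1024 * 1024   # assumption: inferred from frame_nr=32 at frame_size=65536
PAGE_SIZE = 4096               # assumption: 4 KiB pages


def align(n, alignment=TPACKET_ALIGNMENT):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def ring_request(snaplen):
    """Model the PACKET_RX_RING parameters requested for a given snapshot length."""
    frame_size = align(snaplen + FRAME_OVERHEAD)
    block_size = PAGE_SIZE
    while block_size < frame_size:   # smallest power-of-two block holding one frame
        block_size *= 2
    frames_per_block = block_size // frame_size
    block_nr = (RING_BYTES // frame_size) // frames_per_block
    return {
        "block_size": block_size,
        "block_nr": block_nr,
        "frame_size": frame_size,
        "frame_nr": block_nr * frames_per_block,
    }


def retry_sequence(frame_nr, first_accepted):
    """Frame counts tried if every request above first_accepted fails with ENOMEM.

    first_accepted is a hypothetical stand-in for whatever the kernel can satisfy.
    """
    tried = [frame_nr]
    while frame_nr > first_accepted:
        frame_nr -= max(1, frame_nr // 20)   # shrink by about 5%, at least 1
        tried.append(frame_nr)
    return tried


for snaplen in (65484, 65485, 65535):
    print(snaplen, ring_request(snaplen))
print(retry_sequence(31, 24))
```

In this model, one byte of extra snapshot length pushes the frame past 64 KiB, so each block doubles to 128 KiB while still holding only one frame, leaving nearly half of every block unused.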

I've habitually used "-s 0" to ensure I don't miss any data regardless of the underlying link MTU. It would be nice if this could still be relied on and be fast, but I think we're well into a libpcap/tcpdump bug here.

Comment 6 Miroslav Lichvar 2010-07-28 11:05:22 UTC
I'm still not sure how we can fix this in libpcap.

If you have suggestions, please write to the tcpdump-workers list and I will cherry pick the commit.

Comment 7 Guy Harris 2011-12-09 08:54:19 UTC
The current top-of-the-1.2-branch and trunk versions of libpcap might do a better job of this, at least on ARPHRD_ETHER interfaces (e.g., actual Ethernet interfaces), as, for those interfaces, it tries to allocate a ring buffer based on the minimum of what it thinks is the maximum packet size and the snapshot length, rather than just on the snapshot length.

It requests the MTU in an attempt to properly handle jumbo frames, and it also checks for several forms of offloading and punts if it appears that TCP segmentation/reassembly offloading may be done (as you can then get packets larger than the interface MTU+link-layer header size).
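As a rough illustration of the sizing rule described in this comment (not libpcap's actual code), the idea can be sketched as below. The function name, its parameters, and the 14-byte Ethernet header default are assumptions made for the example.

```python
ETHER_HDR_LEN = 14  # assumption: plain Ethernet header, no VLAN tag


def effective_snaplen(snaplen, mtu, link_hdr_len=ETHER_HDR_LEN, offload_possible=False):
    """Packet size to base the ring buffer frame size on.

    If segmentation/reassembly offload may be active, packets can exceed
    MTU + link-layer header, so fall back to the snapshot length alone.
    """
    if offload_possible:
        return snaplen
    return min(snaplen, mtu + link_hdr_len)


print(effective_snaplen(65535, 1500))                         # standard Ethernet MTU
print(effective_snaplen(65535, 9000))                         # jumbo frames
print(effective_snaplen(65535, 1500, offload_possible=True))  # punt: keep full snaplen
```

With a bound like this, a capture started with a very large snapshot length on an ordinary Ethernet interface would size its frames from roughly 1.5 KB instead of 64 KiB, staying well below the threshold observed in comment 5.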

It does *not* attempt it on other network types - in particular, it can't do it on 802.11 interfaces when you're in monitor mode and getting radiotap etc. radio metadata headers, as there's no way to ask for the maximum size of those headers.

It looks as if the new TPACKET_V3 memory-mapped interface in newer kernels might not be using fixed-length slots per packet and might not require that a maximum packet size be specified when the ring buffer is created.  If so, having libpcap use that if available should, I think, work even better.

Comment 8 Neil Horman 2011-12-09 11:50:27 UTC
FWIW, this commit:
http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commit;h=0e3125c755445664f00ad036e4fc2cd32fd52877

changed the AF_PACKET allocation strategy in the kernel so as to make the PACKET_RX_RING calls much quicker.  If you update to a later Fedora release, it should be improved.

Comment 9 Guy Harris 2011-12-09 19:12:55 UTC
The TPACKET_V3 changes are in

http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commit;h=0d4691ce112be025019999df5f2a5e00c03f03c2

http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commit;h=f6fb8f100b807378fda19e83e5ac6828b638603a

http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commit;h=bc59ba399113fcbcac56ba22edde4b816199d48c

and probably subsequent changes such as

http://git.kernel.org/?p=linux/kernel/git/davem/net-next.git;a=commit;h=eea49cc9009767dfbafd673ee577854454b52e0d

Those, however, won't help without libpcap changes to use TPACKET_V3 - and without running a kernel with TPACKET_V3 support.  It would be Very Nice if somebody with more Copious Free Time(R) than me were to add TPACKET_V3 support to libpcap; I can't guarantee when I'd be able to work on it.