Bug 439923
Summary: | Avoid multi-page allocations in IP fragmentation | |
---|---|---|---
Product: | Red Hat Enterprise Linux 4 | Reporter: | Greg Marsden <greg.marsden>
Component: | kernel | Assignee: | Neil Horman <nhorman>
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Martin Jenner <mjenner>
Severity: | low | Priority: | low
Version: | 4.6 | CC: | davem, okir, tgraf, vgoyal
Target Milestone: | rc | Keywords: | Reopened
Hardware: | All | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2009-03-23 14:18:20 UTC
Description
Greg Marsden
2008-03-31 23:47:08 UTC
Created attachment 299793
Avoid multi-page allocations in IP fragmentation
What's the status of this patch? I've read the thread you reference above, and it ends after Herbert suggests some changes that are never answered. Looking in the upstream tree, I see that this patch hasn't been accepted. Given that the thread above is from 2006, I'm hesitant to accept it. Can you provide me with some sort of upstream status here?

Upstream progress for this patch is stalled, but the latest version of the patch (attached here) addresses Herbert's concerns about trailer_len. The problem itself is fairly simple to reproduce on a system that supports large frames.

If I am reading this patch and remembering the case correctly, it only lets the first SKB be up to PAGE_SIZE when fragmenting. If PAGE_SIZE is significantly less than the MTU, this is going to kill performance, because fragments will use only a fraction of the MTU.

This really needs to be reinvigorated upstream if it's still an issue. I suspect that the thing to do is to make protocols like UDP use the sk->sk_sndmsg_page scheme like TCP, so that outgoing frames are paged and thus not susceptible to this problem.

Actually... this is exactly what the code does. ip_append_data() uses sk_sndmsg_page when the device supports scatter-gather. So it appears that the only case that can result in multi-page allocations is when the device does not support scatter-gather. Nearly all devices support that on transmit, so this issue can only affect extremely old and primitive network devices that also support large MTUs, a group exponentially approaching zero.

Based on davem's comments, this is NOTABUG.

Dave is right, this is not an issue in mainline, because the mainline code does page-at-a-time allocations if the interface supports NETIF_F_SG. However, that's not the behavior for EL4 or EL5, which is at issue here. So there's no upstream for this, as it's already fixed via the scatter-gather solution, but we still have the problem with current releases.
Not really sure what you're talking about here. A cursory look at ip_output.c shows identical behavior for NETIF_F_SG between RHEL5 and upstream, so I'm not sure where you're seeing multiple order-0 allocations vs. higher-order allocations. Looking upstream, both alloc_new_skb implementations call through sock_alloc_send_skb, which in the end just calls alloc_skb, which happily makes higher-order allocations in both RHEL5 and upstream.

Correct, RHEL5 has the SG code just like upstream. There is no difference.

Feel free to call me dense this morning, but to me it looks like in RHEL5, for the first fragment, we still call sock_alloc_send_skb() to allocate an skb of essentially the MTU minus overhead. This is the code from 2.6.18-53.*:

    datalen = length + fraggap;
    if (datalen > mtu - fragheaderlen)
        datalen = maxfraglen - fragheaderlen;
    [...]
    if ((flags & MSG_MORE) &&
        !(rt->u.dst.dev->features&NETIF_F_SG))
        alloclen = mtu;
    else
        alloclen = datalen + fragheaderlen;
    [...]
    if (transhdrlen) {
        skb = sock_alloc_send_skb(sk, alloclen ...

If the MTU is a multiple of 8, then maxfraglen == mtu; and when we start with length >= mtu we end up with alloclen = mtu. This means alloc_skb will try to do a kmalloc(16k) on loopback, and that's what's causing trouble for Oracle. What am I missing?

That may be the case, and it seems to fit with the original comments (but not comment #6). If that is the case, however, it seems that Dave's arguments in comment #4 are applicable. We might improve memory availability if we do multiple allocations, but the result is that each frame we send takes up only a fraction of the interface MTU, which destroys performance. It seems the easier thing to do in this case, if Oracle is having allocation failures under extreme load, is to reduce the MTU of the loopback interface so that it fits in an order-zero allocation. This solves the memory allocation problem, and performance should be identical to what the proposed patch provides.
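For reference, the arithmetic in the 2.6.18-53 excerpt quoted above can be checked in user space. The following is a minimal sketch, not kernel code: `order_for`, `first_alloclen`, `show_first_alloc`, and `SKETCH_PAGE_SIZE` are hypothetical names, the 4 KiB page size is an assumption, `maxfraglen` is passed in because the excerpt does not show how it is derived, and the model ignores any per-skb overhead the real allocator adds.

```c
#include <stdio.h>

/* Assumption: 4 KiB pages, as on x86. */
#define SKETCH_PAGE_SIZE 4096UL

/* Smallest page order whose span (PAGE_SIZE << order) covers `size` bytes. */
unsigned int order_for(unsigned long size)
{
    unsigned int order = 0;

    while ((SKETCH_PAGE_SIZE << order) < size)
        order++;
    return order;
}

/*
 * User-space model of the quoted excerpt, for the first fragment only
 * (fraggap is 0 because nothing is carried over from a previous skb).
 * maxfraglen is an input because the excerpt does not show its derivation.
 */
unsigned long first_alloclen(unsigned long length, unsigned long mtu,
                             unsigned long fragheaderlen,
                             unsigned long maxfraglen,
                             int msg_more, int has_sg)
{
    unsigned long fraggap = 0;
    unsigned long datalen = length + fraggap;

    if (datalen > mtu - fragheaderlen)
        datalen = maxfraglen - fragheaderlen;

    /* Without scatter-gather and with MSG_MORE, a full MTU is reserved. */
    if (msg_more && !has_sg)
        return mtu;
    return datalen + fragheaderlen;
}

/* Prints the modeled allocation size and page order for one MTU. */
void show_first_alloc(unsigned long mtu)
{
    /* Hypothetical inputs: 20000-byte send, 20-byte IP header,
     * maxfraglen == mtu (the premise stated in the comment above). */
    unsigned long len = first_alloclen(20000, mtu, 20, mtu, 0, 1);

    printf("mtu=%lu alloclen=%lu order=%u\n", mtu, len, order_for(len));
}
```

Calling `show_first_alloc(16436)` (assuming 16436 as the loopback MTU) models a request that no longer fits in four 4 KiB pages, so it lands in a higher-order allocation. Calling it with a hypothetical reduced MTU such as 3072 gives a request that fits in a single page, which is the effect the loopback MTU workaround is after.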
Closing due to lack of response from reporter.