Bug 429417 - [RHEL5 U2] Kernel SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
Summary: [RHEL5 U2] Kernel SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All
OS: Linux
high
high
Target Milestone: rc
: ---
Assignee: Takahiro Yasui
QA Contact: Martin Jenner
URL: http://rhts.lab.boston.redhat.com/cgi...
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2008-01-19 16:25 UTC by Jeff Burke
Modified: 2014-07-25 05:07 UTC (History)
13 users (show)

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:07:03 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
Fix truesize setting in ip_append_data (561 bytes, patch)
2008-01-23 02:55 UTC, Herbert Xu
no flags Details | Diff
Fix truesize accounting in ip_append_data (1.10 KB, patch)
2008-01-23 03:01 UTC, Herbert Xu
no flags Details | Diff
Updated upstream version of patch. (1.98 KB, patch)
2008-01-23 06:40 UTC, David Miller
no flags Details | Diff
Fix ip_fragment slow path truesize bug (1.97 KB, patch)
2008-01-24 21:59 UTC, Herbert Xu
no flags Details | Diff
Fix up truesize in audit_expand (448 bytes, patch)
2008-01-27 12:49 UTC, Herbert Xu
no flags Details | Diff
Fix audit_expand to increase truesize (720 bytes, patch)
2008-01-27 21:17 UTC, Herbert Xu
no flags Details | Diff


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0314 0 normal SHIPPED_LIVE Updated kernel packages for Red Hat Enterprise Linux 5.2 2008-05-20 18:43:34 UTC

Description Jeff Burke 2008-01-19 16:25:58 UTC
Description of problem:
With Kernel 2.6.18-70.el5 running the RHTS connectathon NFS UDP test suite the
system reports SKB BUG: messages in the log.

Version-Release number of selected component (if applicable):
2.6.18-70.el5

How reproducible:
Always

Steps to Reproduce:
1. Install a system with RHEL5.U1 or RHEL5.2-Server-20080117.1
2. Install 2.6.18-70.el5 
3. Run the RHTS connectathon test suite.
  
Actual results:
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172

Expected results:
System should not generate BUG messages during normal operations

Additional info:
- [net] udp: update infiniband driver (Hideo AOKI ) [223593]
- [net] udp: add memory accounting (Hideo AOKI ) [223593]
- [net] udp: new accounting interface (Hideo AOKI ) [223593]

Comment 1 Herbert Xu 2008-01-22 12:09:12 UTC
We need to determine whether this symptom is caused by a latent bug exposed by
this patch-set.  To do this, could you please redo the test after adding a
dump_stack() call to skb_truesize_check() when it prints the warning? That'll
tell us whether it's a TX skb or an RX skb, and if it's the RX skb, please tell
me the NIC in use.

Thanks!

Comment 2 Neil Horman 2008-01-22 12:50:20 UTC
Jeff, can we imply that while -70 fails, -69 running the same test suite works
properly?  I'd just like to be sure, since -68 has some potential candidates for
causing this problem as well.  Also, do you know if this failure is system
specific?  If so, what kind of nic is on the system you're testing on?  Thanks!

Comment 4 Hideo AOKI 2008-01-22 14:35:34 UTC
As far as I know, this bug is reproduced when we use nfs mount and tcpdump.
I'll try to re-test using a kernel with dump_stack() call.

Comment 5 David Miller 2008-01-22 14:38:59 UTC
The problem is that, as noted sock_rfree() now does truesize validation
so it is a latent bug being exposed by the UDP memory accounting
patches.

I suspect some part of the NFS I/O path, or IP write path itself, is
adding pages to an SKB frag list and not bumping skb->truesize properly.
That is what needs to be fixed.

I suspect it's ip_append_page() that's causing the problems.
Unlike ip_append_data() it doesn't bump the skb->truesize properly.

sendpage isn't used on UDP by many things, the xprtsock and svcsock
paths of SUNRPC being one of them.

The fix is simple, add a skb->truesize += len; right after the
skb->data_len += len; line.

Fortunately (or unfortunately, depending upon your perspective :-)
there isn't an IPV6 side code path and therefore no need to look
there for another instance of the same bug.


Comment 6 Hideo AOKI 2008-01-22 14:56:30 UTC
Hello David,

Thank you so much for your comments.
I'll test the fix.

Comment 7 Hideo AOKI 2008-01-22 15:58:20 UTC
(In reply to comment #6) 

I tested the fix. However, I still see the SKB BUG message.
I think that David's fix is correct but the fix will not solve this bug.

Comment 8 Hideo AOKI 2008-01-22 16:10:08 UTC
(In reply to comment #7)
> (In reply to comment #6) 
> 
> I tested the fix. However, I still see the SKB BUG message.
> I think that David's fix is correct but the fix will not solve this bug.

The bug is found during releasing receive buffer. On the other hand,
ip_append_pages() is used send buffer. Then, I thought that fix of 
ip_append_pages() wasn't related to root cause of this bug.
 

Comment 9 Neil Horman 2008-01-22 16:21:42 UTC

perhaps it would be worthwhile to do a test build and pepper the receive
pathwith truesize checks? start  at netif_rx, and check at the top of every
routine through to udp_rcv and tcp_rcv?


Comment 10 Hideo AOKI 2008-01-22 16:30:32 UTC
I probably found proper fix of this bug.

Here is a trace using systemtap.

 0xc05aa194 : skb_truesize_bug+0x1/0x1c []
 0xc05a7ffc : sock_rfree+0x1e/0x43 []
 0xc05aa7c4 : __kfree_skb+0x97/0xe3 []
 0xc05ac689 : skb_free_datagram+0xa/0x29 []
 0xc0603e98 : packet_recvmsg+0x1ac/0x1b8 []
 0xc05a5203 : sock_recvmsg+0xe5/0x100 []
 0xc04362e5 : autoremove_wake_function+0x0/0x2d []
 0xc052f934 : n_tty_receive_buf+0xc39/0xcab []
 0xc0456217 : __generic_file_aio_read+0x166/0x198 []
 0xc0454e1f : file_read_actor+0x0/0xd1 []
 0xc05a6930 : sys_recvfrom+0xcf/0x137 []
 0xc045913a : __pagevec_free+0x14/0x1a []
 0xc045b13c : release_pages+0x11f/0x127 []
 0xc0436489 : remove_wait_queue+0x16/0x25 []
 0xc041f87e : __wake_up+0x2a/0x3d []
 0xc05a6dfa : sys_socketcall+0x135/0x19e []
 0xc0404eff : syscall_call+0x7/0xb []

It seems that the bug message appears when law socket calls sock_rfree(). 
Memory accounting is a feature for some stream protocols and UDP. Law
socket isn't supported currently. So, skb_truesize_check() should be 
called only if memory accounting is supported. We tested that this 
fix passed the test. I'll confirm brew build.
 

Comment 11 Hideo AOKI 2008-01-22 16:35:28 UTC
(In reply to comment #10)

> It seems that the bug message appears when law socket calls sock_rfree(). 
> Memory accounting is a feature for some stream protocols and UDP. Law
> socket isn't supported currently. So, skb_truesize_check() should be 
> called only if memory accounting is supported.

This wasn't Law but RAW.


Comment 12 Herbert Xu 2008-01-22 23:44:32 UTC
You said raw sockets, but your backtrace indicates an AF_PACKET socket, so which
is it? Also, is this packet coming in from the network or is it locally
generated? If the former please tell me what NIC it is as it's probably buggy.

Thanks!

Comment 13 Hideo AOKI 2008-01-23 00:29:17 UTC
(In reply to comment #5)

> I suspect some part of the NFS I/O path, or IP write path itself, is
> adding pages to an SKB frag list and not bumping skb->truesize properly.
> That is what needs to be fixed.
> 
> I suspect it's ip_append_page() that's causing the problems.
> Unlike ip_append_data() it doesn't bump the skb->truesize properly.
> 
> sendpage isn't used on UDP by many things, the xprtsock and svcsock
> paths of SUNRPC being one of them.
> 
> The fix is simple, add a skb->truesize += len; right after the
> skb->data_len += len; line.

To solve the problem, we found that ip_append_data() needed this fix.


Comment 14 Takahiro Yasui 2008-01-23 00:42:47 UTC
ip_append_data() did not set truesize correctly, then, this problem happend.
Packets are sent by packet_rcv() function and call trace when packet_rcv() is
called is as follows.

 0xc060480d : packet_rcv+0x1/0x2f1 []
 0xc05aead9 : dev_hard_start_xmit+0xd6/0x1ee []
 0xc05bc886 : __qdisc_run+0xe4/0x19a []
 0xc05b0395 : dev_queue_xmit+0x144/0x24b []
 0xc05b497e : neigh_resolve_output+0x1ba/0x1e4 []
 0xc05cc8b7 : ip_finish_output2+0x164/0x19a []
 0xc05cc1cd : ip_fragment+0x30f/0x6ba []
 0xc05cc753 : ip_finish_output2+0x0/0x19a []
 0xc05cb972 : ip_append_data+0x5d0/0x945 []
 0xc05cd869 : ip_output+0xce/0x265 []
 0xc05cb2ec : ip_push_pending_frames+0x2e9/0x39f []
 0xc05e3ee9 : udp_push_pending_frames+0x1ed/0x20e []
 0xc05e47ef : udp_sendmsg+0x40c/0x4d7 []
 0xc0608e1f : _spin_lock_bh+0x8/0x18 []
 0xc0608e1f : _spin_lock_bh+0x8/0x18 []
 0xc05a7943 : release_sock+0xc/0x91 []
 0xc05e47f8 : udp_sendmsg+0x415/0x4d7 []
 0xc05e9da5 : inet_sendmsg+0x35/0x3f []
 0xc05a52ec : sock_sendmsg+0xce/0xe8 []
 0xc04362e5 : autoremove_wake_function+0x0/0x2d []
 0xc04362e5 : autoremove_wake_function+0x0/0x2d []
 0xc045e6d9 : set_page_address+0xa8/0x121 []
 0xc05a6e8a : kernel_sendmsg+0x27/0x35 []
 0xc05a7b1c : sock_no_sendpage+0x4d/0x60 []
 0xf8d2b115 : xs_udp_send_request+0x1ae/0x2a4 [sunrpc]
 0xc05a7acf : sock_no_sendpage+0x0/0x60 []
 0xf8d29059 : xprt_transmit+0xbf/0x19e [sunrpc]
 0xf8e2cee5 : nfs3_xdr_writeargs+0x0/0x7a [nfs]
 0xf8d2737f : call_transmit+0x19b/0x1c8 [sunrpc]
 0xf8d2bf17 : __rpc_execute+0x7a/0x1f8 [sunrpc]
 0xf8e287a2 : nfs_execute_write+0x35/0x49 [nfs]
 0xf8e2999c : nfs_flush_one+0xb7/0xfd [nfs]
 0xf8e28536 : nfs_flush_list+0x87/0xcf [nfs]
 0xf8e298e5 : nfs_flush_one+0x0/0xfd [nfs]
 0xf8e29292 : nfs_sync_inode_wait+0xc5/0x1ab [nfs]
 0xf8e20a93 : nfs_file_flush+0x48/0x7e [nfs]
 0xc046ef44 : filp_close+0x2f/0x54 []
 0xc0404eff : syscall_call+0x7/0xb []

ip_fragment() function sets data for each packets, but truesize of packets
are showed as follows, and the truesize in the third packet is already wrong
when it is sent.

skb=0xf77f9180 skb->truesize=4524 skb->len=1500 skb->data_len=1308
skb=0xf77f9280 skb->truesize=1708 skb->len=1500 skb->data_len=0
skb=0xf77f9380 skb->truesize=1580 skb->len=1500 skb->data_len=172 *****
skb=0xf77f9080 skb->truesize=1708 skb->len=1500 skb->data_len=0
skb=0xf77f9e80 skb->truesize=1708 skb->len=1500 skb->data_len=0
skb=0xf77f9d80 skb->truesize=1196 skb->len=984 skb->data_len=0

This happens because ip_append_data() function does not add truesize
correctly. David-san posted the same type of patch as to ip_append_page()
function.

--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1172,6 +1172,7 @@ ssize_t	ip_append_page(struct sock *sk, struct page *page,
 
 		skb->len += len;
 		skb->data_len += len;
+		skb->truesize += len;
 		offset += len;
 		size -= len;
 	}

But modification is necessary for ip_append_data() function as follows.

--- ip_output.c.org     2008-01-22 18:19:12.000000000 -0500
+++ ip_output.c 2008-01-22 18:19:41.000000000 -0500
@@ -1037,6 +1037,7 @@ alloc_new_skb:
                        frag->size += copy;
                        skb->len += copy;
                        skb->data_len += copy;
+                       skb->truesize += copy;
                }
                offset += copy;
                length -= copy;

These both patches fix this problem.

Thanks,


Comment 15 Takahiro Yasui 2008-01-23 01:01:05 UTC
truesize_check() was originally called only from TCP because it has an
accounting feature and now UDP accounting patch introduced truesize_check()
for all protocols, including TCP. However, other protocols does not handle
truesize and no check like truesize_check() was done, therfore, I think that
this type of bug did not found till now.

By the way, there are two way to fix this problem.

First, only protocols with accounting feature do truesize_check() as TCP did.
It is safe because no precedure change was done for other protocols without
accounting feature.

Second, fix bugs in ip_appned_data() and ip_append_page() and make truesize
calculation correct. This is a positive solution because it is good to support
accounting feature for other protocols in the future.

Thanks,


Comment 16 Takahiro Yasui 2008-01-23 02:00:08 UTC
ip_append_data() patch on comment #14 is not good.
I will try to check again.


Comment 17 Herbert Xu 2008-01-23 02:08:27 UTC
Can you tell me in what way it's bad? Perhaps I can help since I'm in your
timezone :)

Does it fail with the same back trace still or is it different? Thanks!

Comment 18 David Miller 2008-01-23 02:11:45 UTC
The ip_append_data() patch, as noted, is not good.

A few lines previous, if ip_append_data() attached pages, it
adds PAGE_SIZE to skb->truesize so it should be OK.

Comment 19 Takahiro Yasui 2008-01-23 02:35:27 UTC
#17 I appreciate your comments. I found that that patch was logically bad
as David-san mensioned.

I am now investigating why the following data was produced.
I appreciate any hints.

skb=0xf77f9180 skb->truesize=4524 skb->len=1500 skb->data_len=1308
skb=0xf77f9280 skb->truesize=1708 skb->len=1500 skb->data_len=0
skb=0xf77f9380 skb->truesize=1580 skb->len=1500 skb->data_len=172 *****
skb=0xf77f9080 skb->truesize=1708 skb->len=1500 skb->data_len=0
skb=0xf77f9e80 skb->truesize=1708 skb->len=1500 skb->data_len=0
skb=0xf77f9d80 skb->truesize=1196 skb->len=984 skb->data_len=0

Thanks,


Comment 20 Herbert Xu 2008-01-23 02:41:45 UTC
Thanks Dave :) Yeah I'm looking at ip_fragment too.  I'll let you guys know if I
find anything.

Comment 21 Herbert Xu 2008-01-23 02:42:47 UTC
I think this line looks fishy:

        skb->truesize -= frag->truesize;

Should probably use frag->len or something like that.


Comment 22 Herbert Xu 2008-01-23 02:44:36 UTC
It looks like a false alarm since ip_push_pending_frames sets the overall
truesize to be the sum of all fragment truesizes.

Comment 23 Herbert Xu 2008-01-23 02:52:53 UTC
I get it.  The bug is in ip_append_data.  That page is shared for packets on the
same socket.  So only the first packet gets the PAGE_SIZE truesize increase, but
not the subsequent ones.

Comment 24 Herbert Xu 2008-01-23 02:55:21 UTC
Created attachment 292578 [details]
Fix truesize setting in ip_append_data

Here's a completely untested patch.

Comment 25 Herbert Xu 2008-01-23 03:01:04 UTC
Created attachment 292579 [details]
Fix truesize accounting in ip_append_data

Well at least it compiled :) Here's a patch which fixes the IPv6 path too.

Comment 26 David Miller 2008-01-23 03:08:57 UTC
That patch looks about right.

It's also a good thing to make the skb->truesize modifications
coincide directly with skb->data_len changes.  It makes auditing
this stuff that much easier.


Comment 27 Takahiro Yasui 2008-01-23 03:25:00 UTC
Thanks Herbert-san,

I tried your patch and the error message has gone.

ip_fragment: skb=0xf7952380 skb->truesize=1736 skb->len=1500 skb->data_len=1308
ip_fragment: skb=0xf7952080 skb->truesize=1708 skb->len=1500 skb->data_len=0
ip_fragment: skb=0xf712ea80 skb->truesize=1752 skb->len=1500 skb->data_len=172
ip_fragment: skb=0xf7952280 skb->truesize=1708 skb->len=1500 skb->data_len=0
ip_fragment: skb=0xf7952480 skb->truesize=1708 skb->len=1500 skb->data_len=0
ip_fragment: skb=0xf7952580 skb->truesize=1196 skb->len=984 skb->data_len=0

Also, the truesize of the first packets is 1736 and it looks good,
because it was 4572 without your patch as follows.

> skb=0xf77f9180 skb->truesize=4524 skb->len=1500 skb->data_len=1308

Thanks,


Comment 28 Takahiro Yasui 2008-01-23 03:50:51 UTC
(In reply to comment #23)
> I get it.  The bug is in ip_append_data.  That page is shared for packets on the
> same socket.  So only the first packet gets the PAGE_SIZE truesize increase, but
> not the subsequent ones.

Could you explain how your patch works? I suppose that the first skb allocs one
page and increase truesize by PAGE_SIZE and the second fragment does not need to
be increased if left space is enough. However, your patch increase in both case.
It would be appreciate your comments to undertand your patch.

Thanks,


Comment 29 Herbert Xu 2008-01-23 03:58:03 UTC
Yes my patch increases in both cases.  However, the first packet now gets
charged by how much of that page it has actually used instead of PAGE_SIZE. 
This makes sense because it's the very same page that then gets used by the
second packet, etc.

Comment 30 Takahiro Yasui 2008-01-23 04:23:52 UTC
(In reply to comment #29)
> Yes my patch increases in both cases.  However, the first packet now gets
> charged by how much of that page it has actually used instead of PAGE_SIZE. 
> This makes sense because it's the very same page that then gets used by the
> second packet, etc.

Thank you for the explanation on your patch. I found that your patch was solution
on this problem.


Comment 31 Takahiro Yasui 2008-01-23 06:26:17 UTC
We found that probem on this patch!

When a large data was copyed through NFS, NFS was hunguped. I suspect that
sk_wmem_alloc is not recovered because truesize was changed.

We think that we also have to move atomic_add(PAGE_SIZE, &sk->sk_wmem_alloc) at
the same position of truesize update like:

                        skb->data_len += copy;
                        skb->truesize += copy;
                        atomic_add(copy, &sk->sk_wmem_alloc);

Thanks,


Comment 32 David Miller 2008-01-23 06:40:16 UTC
Created attachment 292610 [details]
Updated upstream version of patch.

Good catch, I have updated Herbert's upstream submission of this
fix like so.

Comment 33 Takahiro Yasui 2008-01-23 06:42:42 UTC
I also suspect that the same thing happnes for ip_append_page patch.
If truesize is increased, sk_sk_wmem_alloc should be also increased,
so that sk_wmem_alloc would be reduced correctly by subtracting truesize
when socket buffer is released like:

size_t	ip_append_page(struct sock *sk, struct page *page,

 		skb->len += len;
 		skb->data_len += len;
+		skb->truesize += len;
+               atomic_add(len, &sk->sk_wmem_alloc);

 		offset += len;
 		size -= len;
 	}

Thanks,


Comment 34 Herbert Xu 2008-01-23 06:45:21 UTC
Heh, first Dave beat me to the sk_wmem_alloc fix, and then Yasui-san beat me to
the bit in ip_append_page :)

Comment 35 David Miller 2008-01-23 07:55:47 UTC
Thank you Takahiro-san, I will make this correction in the upstream
version of the bug fix and credit you.


Comment 36 Linda Wang 2008-01-23 19:09:02 UTC
bug fix patch posted 3 a.m 1/23/08 morning.. 

Thank you so much, Hideo, for staying up all night working on this!


Comment 37 RHEL Program Management 2008-01-23 19:17:00 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 39 Jeff Burke 2008-01-24 15:18:17 UTC
Running last night's kernel. We still saw intermittent failures. It looks like
it still failed while running in a DomU

Here are the failure message from the DomU.
SKB BUG: Invalid truesize (1264) len=7564, sizeof(sk_buff)=240
SKB BUG: Invalid truesize (623) len=7551, sizeof(sk_buff)=240

Also I saw the following with the kernel-PAE variant.
SKB BUG: Invalid truesize (568) len=7564, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (555) len=7551, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (568) len=7564, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (555) len=7551, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (568) len=7564, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (555) len=7551, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (568) len=7564, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (555) len=7551, sizeof(sk_buff)=172

Also one additional note. The test that this was seen with is different. It is
now seen with the audit-test-2088. Connectathon did not show the issue.

Comment 40 Hideo AOKI 2008-01-24 16:05:51 UTC
(In reply to comment #39)
> Running last night's kernel. We still saw intermittent failures. It looks like
> it still failed while running in a DomU
> 
> Also I saw the following with the kernel-PAE variant.
> 
> Also one additional note. The test that this was seen with is different. It is
> now seen with the audit-test-2088. Connectathon did not show the issue.

Thanks for the info. I'll investigate it.


Comment 41 Don Zickus 2008-01-24 16:09:08 UTC
in 2.6.18-74.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 42 Don Zickus 2008-01-24 16:11:57 UTC
setting back to ASSIGNED as the issue is still being investigated.  But the
original patch will still be included.

Comment 43 Herbert Xu 2008-01-24 21:39:40 UTC
Jeff, any chance you could retest this after adding a dump_stack() in the
truesize check function? Thanks!

Comment 44 Herbert Xu 2008-01-24 21:59:09 UTC
Created attachment 292869 [details]
Fix ip_fragment slow path truesize bug

I've found a bug that looks like it might be the one that caused this.	So
please try this patch and let me know if it goes away.	Thanks!

Comment 45 Hideo AOKI 2008-01-24 22:04:10 UTC
(In reply to comment #43)
> Jeff, any chance you could retest this after adding a dump_stack() in the
> truesize check function? Thanks!

I made the kernel. I'll retest this after I get a test server.

(In reply to comment #44)
> Created an attachment (id=292869) [edit]
> Fix ip_fragment slow path truesize bug
> 
> I've found a bug that looks like it might be the one that caused this.	So
> please try this patch and let me know if it goes away.	Thanks!

Great! I'll test this patch too.


Comment 46 Hideo AOKI 2008-01-25 05:22:36 UTC
(In reply to comment #45)
> (In reply to comment #43)
> > Jeff, any chance you could retest this after adding a dump_stack() in the
> > truesize check function? Thanks!
> 
> I made the kernel. I'll retest this after I get a test server.

I couldn't get stack trace, while I ran audit-test-2088 more than
3 and half hours.

I saw the bug message two times. However, at that time, the kernel was
normal 74.el5, unfortunately.

I guess that the bug doesn't depend on PAE and Xen DomU. Moreover, it
also might not depend on autid test. In my big guess, the bug could
happen when test machine receives specific packet like multicast DNS.

I'll do more test on 1/25 EST morning.

Thanks,


Comment 47 Hideo AOKI 2008-01-25 15:56:32 UTC
I got stack trace. Please note that this kernel doesn't include
UDP memory accounting patch set. I just added skb_truesize_check()
to sock_rfree() and dump_stack() to skb_truesize_check().  

Jan 25 08:47:31 host kernel: EXT3-fs warning (device sda2): ext3_block_to_path: blo
ck < 0
Jan 25 08:48:02 host kernel: SKB BUG: Invalid truesize (571) len=7567, sizeof(sk_bu
ff)=172
Jan 25 08:48:02 host kernel:  [<c05a7837>] sock_rfree+0x1e/0x2b
Jan 25 08:48:02 host kernel:  [<c05a9f94>] __kfree_skb+0x97/0xe3
Jan 25 08:48:02 host kernel:  [<c05c1c74>] netlink_recvmsg+0x132/0x281
Jan 25 08:48:02 host kernel:  [<c05a4bcf>] sock_recvmsg+0xe5/0x100
Jan 25 08:48:02 host kernel:  [<c041febf>] enqueue_task+0x29/0x39
Jan 25 08:48:02 host kernel:  [<c0436731>] autoremove_wake_function+0x0/0x2d
Jan 25 08:48:02 host kernel:  [<c0449931>] audit_sockaddr+0x35/0x78
Jan 25 08:48:02 host kernel:  [<c04e5871>] copy_to_user+0x31/0x48
Jan 25 08:48:02 host kernel:  [<c05a6050>] move_addr_to_user+0x50/0x68
Jan 25 08:48:02 host kernel:  [<c05a62fc>] sys_recvfrom+0xcf/0x137
Jan 25 08:48:02 host kernel:  [<c0404a2f>] do_notify_resume+0x53b/0x67d
Jan 25 08:48:02 host kernel:  [<c05a67c6>] sys_socketcall+0x135/0x19e
Jan 25 08:48:02 host kernel:  [<c0404eff>] syscall_call+0x7/0xb
Jan 25 08:48:02 host kernel:  =======================
Jan 25 08:48:02 host kernel: SKB BUG: Invalid truesize (558) len=7554, sizeof(sk_bu
ff)=172
Jan 25 08:48:02 host kernel:  [<c05a7837>] sock_rfree+0x1e/0x2b
Jan 25 08:48:02 host kernel:  [<c05a9f94>] __kfree_skb+0x97/0xe3
Jan 25 08:48:02 host kernel:  [<c05c1c74>] netlink_recvmsg+0x132/0x281
Jan 25 08:48:02 host kernel:  [<c05a4bcf>] sock_recvmsg+0xe5/0x100
Jan 25 08:48:02 host kernel:  [<c041febf>] enqueue_task+0x29/0x39
Jan 25 08:48:02 host kernel:  [<c0436731>] autoremove_wake_function+0x0/0x2d
Jan 25 08:48:02 host kernel:  [<c041fcf2>] __wake_up+0x2a/0x3d
Jan 25 08:48:02 host kernel:  [<c05a62fc>] sys_recvfrom+0xcf/0x137
Jan 25 08:48:02 host kernel:  [<c0404a2f>] do_notify_resume+0x53b/0x67d
Jan 25 08:48:03 host kernel:  [<c05a67c6>] sys_socketcall+0x135/0x19e
Jan 25 08:48:03 host kernel:  [<c0404eff>] syscall_call+0x7/0xb
Jan 25 08:48:03 host kernel:  =======================


Comment 48 Herbert Xu 2008-01-25 23:12:59 UTC
Hmm, I didn't know we had anything generating non-linear netlink packets.  Could
you please add a check to netlink_recvmsg to print out the offending netlink
family and other relevant details? Thanks!

Comment 49 Hideo AOKI 2008-01-26 01:04:21 UTC
(In reply to comment #48)
> Hmm, I didn't know we had anything generating non-linear netlink packets.  Could
> you please add a check to netlink_recvmsg to print out the offending netlink
> family and other relevant details? Thanks!

Thank you for the comment. Yes. We are doing similar investigation right now.
We modified skb_truesize_bug() to print detailed information of the socket.
Additionally,  we added skb_truesize_bug() to other functions in net/netlink.   

I'll report the result when I get the information. 

Comment 50 Takahiro Yasui 2008-01-27 02:28:06 UTC
Hi, got new information on this problem. Wrong packet was sent by kauditd_thread???

SKB BUG: Invalid truesize (1196) len=1782, sizeof(sk_buff)=172
skb=0xf786c980 data_len=0 data=0xe9d8e000 head=0xe9d8e000 tail=0xe9d8e6f6
end=0xe9d8e800
sk_type=3 sk_protocol=9
[<c05c0a71>] netlink_sendskb+0x1f/0x4b
[<c044564e>] kauditd_thread+0x44/0x11f
[<c0420753>] default_wake_function+0x0/0xc
[<c044560a>] kauditd_thread+0x0/0x11f
[<c043666d>] kthread+0xc0/0xeb
[<c04365ad>] kthread+0x0/0xeb
[<c0405c3b>] kernel_thread_helper+0x7/0x10
=======================
SKB BUG: Invalid truesize (1196) len=1782, sizeof(sk_buff)=172
skb=0xf786c980 data_len=0 data=0xe9d8e000 head=0xe9d8e000 tail=0xe9d8e6f6
end=0xe9d8e800
sk_type=3 sk_protocol=9
[<c05a78e0>] sock_rfree+0x1e/0x43
[<c05aa0f8>] __kfree_skb+0x97/0xe3
[<c05abfcd>] skb_free_datagram+0xa/0x29
[<c05c1da0>] netlink_recvmsg+0x132/0x281
[<c05a4ae7>] sock_recvmsg+0xe5/0x100
[<c041fed3>] enqueue_task+0x29/0x39
[<c0436739>] autoremove_wake_function+0x0/0x2d
[<c041fd06>] __wake_up+0x2a/0x3d
[<c05a6214>] sys_recvfrom+0xcf/0x137
[<c04179c6>] smp_send_reschedule+0x51/0x53
[<c05a66de>] sys_socketcall+0x135/0x19e
[<c0404eff>] syscall_call+0x7/0xb
=======================

sk_type=3 (SOCK_RAW), sk_protocol=9 (???)

Thanks,


Comment 51 Takahiro Yasui 2008-01-27 03:04:33 UTC
The first call trace and skb information was showed by truesize_check() function,
which we added in netlink_sendskb(). skb has already wrong value when it was
sent.

int netlink_sendskb(struct sock *sk, struct sk_buff *skb, int protocol)
{
        int len = skb->len;

        skb_truesize_check(skb);
        skb_queue_tail(&sk->sk_receive_queue, skb);

Also, we can guess that len is good but truesize is not from from head and tail
value. (tail - head = 0x6f6 (1782))

Thanks,

Comment 52 Takahiro Yasui 2008-01-27 03:26:29 UTC
truesize_check() was added also in netlink_trim() function, but no error
message was displayed there. Therefore, skb was changed in
netlink_unicast() function.

However, I could find codes which change skb, then, skb->truesize might be
changed during sleeping in netlink_attachskb() function.

int netlink_unicast(struct sock *ssk, struct sk_buff *skb, u32 pid, int nonblock)
{
        ...
        skb = netlink_trim(skb, gfp_any());     *** added truesize_check()

        timeo = sock_sndtimeo(ssk, nonblock);
retry:
        sk = netlink_getsockbypid(ssk, pid);
        if (IS_ERR(sk)) {
                kfree_skb(skb);
                return PTR_ERR(sk);
        }
        err = netlink_attachskb(sk, skb, nonblock, timeo, ssk);
        if (err == 1)
                goto retry;
        if (err)
                return err;

        return netlink_sendskb(sk, skb, ssk->sk_protocol);
}


Comment 53 Takahiro Yasui 2008-01-27 03:31:06 UTC
Sorry, typo.
However, I could not find codes which change skb, then, skb->truesize might
                 ~~~


Comment 54 Takahiro Yasui 2008-01-27 04:52:31 UTC
(In reply to comment #50)
> sk_type=3 (SOCK_RAW), sk_protocol=9 (???)

sk_protocol=9 (NETLINK_AUDIT)

This is a socket which was just created by audit.

audit_init()
 audit_sock = netlink_kernel_create(NETLINK_AUDIT, 0, audit_receive, THIS_MODULE);


Comment 55 Takahiro Yasui 2008-01-27 06:29:01 UTC
Sorry, I found conditions that skb_truesize_check() in netlink_trim() function
was not executed. So it is possible that skb has bad data when netlink_unicast()
is called. I will try another test to get more detailed information about skb
data in netlink_unicast() function.


Comment 56 Herbert Xu 2008-01-27 12:49:39 UTC
Created attachment 293075 [details]
Fix up truesize in audit_expand

I think I've found the audit bug.  It calls pskb_expand_head which does not
adjust truesize.  When pskb_expand_head is called before truesize is counted to
a socket, the caller needs to adjust truesize manually.

Please let me know if this patch fixes it or not.

Comment 57 Takahiro Yasui 2008-01-27 16:36:37 UTC
Thanks, Herbert-san, but I think you uploaded another patch instead of
fix audit_expand() function. Is it for crypt? Could you please upload
your right patch again?

I also suspect audit_log_start() function. audit allocates skb memory
by a unit of AUDIT_BUFSIZE=1024 and originally, skb is allocated here.

  audit_log_start()
    audit_buffer_alloc()
    alloc_skb(AUDIT_BUFSIZE=1024, ...)
  audit_log_end()

At this time, truesize is a sum of data size and sizeof(struct skb_buff).
and it explains truesize of these messages.

  SKB BUG: Invalid truesize (1264) len=7564, sizeof(sk_buff)=240 # 1264=1024+240
  SKB BUG: Invalid truesize (1196) len=1782, sizeof(sk_buff)=172 # 1196=1024+172

> SKB BUG: Invalid truesize (1196) len=1782, sizeof(sk_buff)=172
> skb=0xf786c980 data_len=0 data=0xe9d8e000 head=0xe9d8e000 tail=0xe9d8e6f6
> end=0xe9d8e800

This shows that skb size (0x800 (=2048) might be expanded by 1024, but
truesize is not increased.

So, I think that truesize need to be increased after pskb_expand_head()
function.

Thanks,


Comment 58 Herbert Xu 2008-01-27 21:17:55 UTC
Created attachment 293086 [details]
Fix audit_expand to increase truesize

Sorry, I'm on the road so I got confused about which machine I was working on
:)

Comment 59 Takahiro Yasui 2008-01-28 00:41:14 UTC
Thanks, Herbert-san. I believe that your patch would solve this problem.

This problem rarely happens. We are going to get a little more information
on this problem, so that we can easily reproduce this problem to confirm
your patch works well.

Thanks a lot,


Comment 60 Takahiro Yasui 2008-01-28 04:58:23 UTC
We got detailed information on how skb is extended by adding skb_truesize_check()
at the both begining/end of audit_expand() function. This result shows that
audit_log_n_string() function increases a skb size by 128 byte and
audit_log_vformat() function increases 1024 byte., also len is increased again
and again, but truesize is not increased.

We can guess that the first audit_expand() result marked by ***** is actually
the second call, because the initial size of skb is 1024 byte and it is already
increased by 128 byte by audit_expand() function. After the first audit_expand()
function call, truesize had wrong value.

--------------------------------------------------------------------------------
audit_log_n_string() -> audit_expand(ab, 0x80) *****
SKB BUG: Invalid truesize (1196) len=1146, sizeof(sk_buff)=172
IN:  data=0xecc25000 head=0xecc25000 tail=0xecc2547a end=0xecc25480 (end-head=0x480)
OUt: data=0xea6ba000 head=0xea6ba000 tail=0xea6ba47a end=0xea6ba500 (end-head=0x500)

audit_log_n_string() -> audit_expand(ab, 0x80)
SKB BUG: Invalid truesize (1196) len=1273, sizeof(sk_buff)=172
IN:  data=0xea6ba000 head=0xea6ba000 tail=0xea6ba4f9 end=0xea6ba500 (end-head=0x500)
OUT: data=0xecc25000 head=0xecc25000 tail=0xecc254f9 end=0xecc25580 (end-head=0x580)

audit_log_vformat() -> audit_expand(ab, AUDIT_BUFSIZ=0x400)
SKB BUG: Invalid truesize (1196) len=1407, sizeof(sk_buff)=172
IN:  data=0xecc25000 head=0xecc25000 tail=0xecc2557f end=0xecc25580 (end-head=0x580)
OUT: data=0xea6b1000 head=0xea6b1000 tail=0xea6b157f end=0xea6b1980 (end-head=0x980)

audit_log_vformat() -> audit_expand(ab, AUDIT_BUFSIZ=0x400)
SKB BUG: Invalid truesize (1196) len=2429, sizeof(sk_buff)=172
IN:  data=0xea6b1000 head=0xea6b1000 tail=0xea6b197d end=0xea6b1980 (end-head=0x980)
OUT: data=0xea750000 head=0xea750000 tail=0xea75097d end=0xea750d80 (end-head=0xd80)

...

audit_log_n_string() -> audit_expand(ab, 0x80)
SKB BUG: Invalid truesize (1196) len=6902, sizeof(sk_buff)=172
IN:  data=0xea8b4000 head=0xea8b4000 tail=0xea8b5af6 end=0xea8b5b00
(end-head=0x1b00)
OUT: data=0xea63a000 head=0xea63a000 tail=0xea63baf6 end=0xea63bb80
(end-head=0x1b80)

audit_log_n_string() -> audit_expand(ab, 0x80)
SKB BUG: Invalid truesize (1196) len=7028, sizeof(sk_buff)=172
IN:  data=0xea63a000 head=0xea63a000 tail=0xea63bb74 end=0xea63bb80
(end-head=0x1b80)
OUT: data=0xea8b4000 head=0xea8b4000 tail=0xea8b5b74 end=0xea8b5c00
(end-head=0x1c00)

audit_log_vformat() -> audit_expand(ab, AUDIT_BUFSIZ=1024)
SKB BUG: Invalid truesize (1196) len=7166, sizeof(sk_buff)=172
IN:  data=0xea8b4000 head=0xea8b4000 tail=0xea8b5bfe end=0xea8b5c00
(end-head=0x1c00)
OUT: data=0xea6f8000 head=0xea6f8000 tail=0xea6f9bfe end=0xea6fa000
(end-head=0x2000)
--------------------------------------------------------------------------------
* IN:  shows skb data passed to audit_expand() function
* OUT: shows skb data modified by audit_expand() function

If we apply the patch made by Herbert-san, truesize would be increased by
extended size of skb and the skb would be changed as follows. As a result,
this problem will gone.

--------------------------------------------------------------------------------
audit_log_n_string() -> audit_expand(ab, 0x80)
IN:  data=0xXXXXXXXX head=0xXXXXXXXX tail=0xXXXXXXXX end=0xXXXXXXXX (end-head=0x400)
     truesize (1196) len=XXXX, sizeof(sk_buff)=172
OUT: data=0xea6ba000 head=0xea6ba000 tail=0xea6ba47a end=0xea6ba500 (end-head=0x480)
     truesize (1324: 1196+128)

audit_log_n_string() -> audit_expand(ab, 0x80)
IN:  data=0xecc25000 head=0xecc25000 tail=0xecc2547a end=0xecc25480 (end-head=0x480)
     truesize (1324) len=1146, sizeof(sk_buff)=172
OUT: data=0xea6ba000 head=0xea6ba000 tail=0xea6ba47a end=0xea6ba500 (end-head=0x500)
     truesize (1452: 1324+128)
...
--------------------------------------------------------------------------------

We will continue tests and check if other possibility exists.

Thanks,


Comment 61 Hideo AOKI 2008-01-30 00:01:52 UTC
(In reply to comment #58)
> Created an attachment (id=293086) [edit]
> Fix audit_expand to increase truesize

I test your patch almost 21 hours using audit-test-2088. And I don't see any
SKB BUG message, while the message usually appears within 15 or 16 hours.
So, we can say that your patch fixes the SKB BUG in audit code.

Comment 62 Don Zickus 2008-02-01 21:11:07 UTC
in 2.6.18-77.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 65 errata-xmlrpc 2008-05-21 15:07:03 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


Comment 66 Ahmed Kamal 2008-10-13 08:53:45 UTC
Well, I am still facing this bug on this kernel version on centos
2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:33:52 EDT 2008 i686 i686 i386 GNU/Linux

Getting zillions of 

Oct 13 10:52:00 routerng kernel: SKB BUG: Invalid truesize (300) len=136, sizeof(sk_buff)=172
Oct 13 10:52:01 routerng kernel: SKB BUG: Invalid truesize (1452) len=1286, sizeof(sk_buff)=172
Oct 13 10:52:01 routerng kernel: SKB BUG: Invalid truesize (300) len=136, sizeof(sk_buff)=172

Comment 67 Hideo AOKI 2008-10-17 17:10:17 UTC
(In reply to comment #66)
> Well, I am still facing this bug on this kernel version on centos
> 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:33:52 EDT 2008 i686 i686 i386 GNU/Linux
> 
> Getting zillions of 
> 
> Oct 13 10:52:00 routerng kernel: SKB BUG: Invalid truesize (300) len=136,
> sizeof(sk_buff)=172
> Oct 13 10:52:01 routerng kernel: SKB BUG: Invalid truesize (1452) len=1286,
> sizeof(sk_buff)=172
> Oct 13 10:52:01 routerng kernel: SKB BUG: Invalid truesize (300) len=136,
> sizeof(sk_buff)=172

What kind of network device and protocol do you use on the machine?

Comment 68 Doncho Gunchev 2008-10-24 17:58:31 UTC
I see (In reply to comment #67)
> (In reply to comment #66)
> > Well, I am still facing this bug on this kernel version on centos
> > 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:33:52 EDT 2008 i686 i686 i386 GNU/Linux
> > 
> > Getting zillions of 
> > 
> > Oct 13 10:52:00 routerng kernel: SKB BUG: Invalid truesize (300) len=136,
> > sizeof(sk_buff)=172
> ...
> 
> What kind of network device and protocol do you use on the machine?

On a CentOS machine running 2.6.18-92.1.13.el5 (x86_64) 

Oct 24 20:43:21 centos kernel: SKB BUG: Invalid truesize (1648) len=1420, sizeof(sk_buff)=240
Oct 24 20:43:21 centos kernel: SKB BUG: Invalid truesize (1648) len=1420, sizeof(sk_buff)=240
Oct 24 20:43:21 centos kernel: SKB BUG: Invalid truesize (1392) len=1162, sizeof(sk_buff)=240
Oct 24 20:43:21 centos kernel: SKB BUG: Invalid truesize (368) len=131, sizeof(sk_buff)=240

Network hardware (lspci):
00:19.0 Ethernet controller: Intel Corporation 82566DM Gigabit Network Connection (rev 02)
06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)
06:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10)

This stops as soon as I stop 'ntop'. tcpdump on any cards produces the same message (given traffic). The machine is not locking up or something.

Comment 69 Igor Neves 2009-03-03 09:48:18 UTC
Hi,

In think this problem it's not solved.

I'm seeing this error with kernel "2.6.18-131.el5xen" from dzickus that i think will be in redhat 5.3.

SKB BUG: Invalid truesize (1580) len=1420, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1420, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (1580) len=1420, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (556) len=387, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (556) len=387, sizeof(sk_buff)=172
SKB BUG: Invalid truesize (556) len=387, sizeof(sk_buff)=172

Can i do something about it?

Comment 70 Takahiro Yasui 2009-03-03 16:09:19 UTC
The truesize variable in SKB is changed in many functions, and it needs to
be handled correctly in all functions. I fixed the issue which happens under
the condition reported in this bugzilla, but there still seems to be a place
where the truesize variable is handled wrongly.

I ask you to open a new bugzilla as a separate issue because I guess that
the root cause is different from the one reported in this bugzilla. And,
in the new bugzilla, please report the detailed information about how to
reproduce this issue.

"SKB BUG" message is not helpful to investigate where the bug is, because
it is displayed at the point where SKB is closed, but is not displayed where
the truesize variable was handled wrongly.

Thanks,

Comment 71 Igor Neves 2009-03-03 16:35:42 UTC
Ok, I can do that, i understand your point of view, and it makes perfect sense.

I just have one problem, I don't know how to reproduce it and I don't know how to debug it. I have the messages in dmesg but the machine it's still working, no panic, nothing.

I guess it happen when I get a lot of load in the machine, but I can't exacly reproduce it. 

What should I say when opening the new bug report that will help solving this, another then profile my hardware and show the message that I get in dmesg?

Thanks,

Comment 72 Takahiro Yasui 2009-03-03 17:08:13 UTC
Yes, could you please report information about your system environment?
I would like to know in which kernel, dom0 or domU produced this messages.
sosreport will be useful to tell it.

Thanks,

Comment 73 Igor Neves 2009-03-05 11:49:18 UTC
My problem it's being followed in here:

https://bugzilla.redhat.com/show_bug.cgi?id=488710

Thanks,


Note You need to log in before you can comment on or make changes to this bug.