Description of problem: With Kernel 2.6.18-70.el5 running the RHTS connectathon NFS UDP test suite the system reports SKB BUG: messages in the log. Version-Release number of selected component (if applicable): 2.6.18-70.el5 How reproducible: Always Steps to Reproduce: 1. Install a system with RHEL5.U1 or RHEL5.2-Server-20080117.1 2. Install 2.6.18-70.el5 3. Run the RHTS connectathon test suite. Actual results: SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1514, sizeof(sk_buff)=172 Expected results: System should not generate BUG messages during normal operations Additional info: - [net] udp: update infiniband driver (Hideo AOKI ) [223593] - [net] udp: add memory accounting (Hideo AOKI ) [223593] - [net] udp: new accounting interface (Hideo AOKI ) [223593]
We need to determine whether this symptom is caused by a latent bug exposed by this patch-set. To do this, could you please redo the test after adding a dump_stack() call to skb_truesize_check() when it prints the warning? That'll tell us whether it's a TX skb or an RX skb, and if it's the RX skb, please tell me the NIC in use. Thanks!
Jeff, can we imply that while -70 fails, -69 running the same test suite works properly? I'd just like to be sure, since -68 has some potential candidates for causing this problem as well. Also, do you know if this failure is system specific? If so, what kind of nic is on the system you're testing on? Thanks!
As far as I know, this bug is reproduced when we use nfs mount and tcpdump. I'll try to re-test using a kernel with dump_stack() call.
The problem is that, as noted sock_rfree() now does truesize validation so it is a latent bug being exposed by the UDP memory accounting patches. I suspect some part of the NFS I/O path, or IP write path itself, is adding pages to an SKB frag list and not bumping skb->truesize properly. That is what needs to be fixed. I suspect it's ip_append_page() that's causing the problems. Unlike ip_append_data() it doesn't bump the skb->truesize properly. sendpage isn't used on UDP by many things, the xprtsock and svcsock paths of SUNRPC being one of them. The fix is simple, add a skb->truesize += len; right after the skb->data_len += len; line. Fortunately (or unfortunately, depending upon your perspective :-) there isn't an IPV6 side code path and therefore no need to look there for another instance of the same bug.
Hello David, Thank you so much for your comments. I'll test the fix.
(In reply to comment #6) I tested the fix. However, I still see the SKB BUG message. I think that David's fix is correct but the fix will not solve this bug.
(In reply to comment #7) > (In reply to comment #6) > > I tested the fix. However, I still see the SKB BUG message. > I think that David's fix is correct but the fix will not solve this bug. The bug is found during releasing receive buffer. On the other hand, ip_append_pages() is used send buffer. Then, I thought that fix of ip_append_pages() wasn't related to root cause of this bug.
perhaps it would be worthwhile to do a test build and pepper the receive pathwith truesize checks? start at netif_rx, and check at the top of every routine through to udp_rcv and tcp_rcv?
I probably found proper fix of this bug. Here is a trace using systemtap. 0xc05aa194 : skb_truesize_bug+0x1/0x1c [] 0xc05a7ffc : sock_rfree+0x1e/0x43 [] 0xc05aa7c4 : __kfree_skb+0x97/0xe3 [] 0xc05ac689 : skb_free_datagram+0xa/0x29 [] 0xc0603e98 : packet_recvmsg+0x1ac/0x1b8 [] 0xc05a5203 : sock_recvmsg+0xe5/0x100 [] 0xc04362e5 : autoremove_wake_function+0x0/0x2d [] 0xc052f934 : n_tty_receive_buf+0xc39/0xcab [] 0xc0456217 : __generic_file_aio_read+0x166/0x198 [] 0xc0454e1f : file_read_actor+0x0/0xd1 [] 0xc05a6930 : sys_recvfrom+0xcf/0x137 [] 0xc045913a : __pagevec_free+0x14/0x1a [] 0xc045b13c : release_pages+0x11f/0x127 [] 0xc0436489 : remove_wait_queue+0x16/0x25 [] 0xc041f87e : __wake_up+0x2a/0x3d [] 0xc05a6dfa : sys_socketcall+0x135/0x19e [] 0xc0404eff : syscall_call+0x7/0xb [] It seems that the bug message appears when law socket calls sock_rfree(). Memory accounting is a feature for some stream protocols and UDP. Law socket isn't supported currently. So, skb_truesize_check() should be called only if memory accounting is supported. We tested that this fix passed the test. I'll confirm brew build.
(In reply to comment #10) > It seems that the bug message appears when law socket calls sock_rfree(). > Memory accounting is a feature for some stream protocols and UDP. Law > socket isn't supported currently. So, skb_truesize_check() should be > called only if memory accounting is supported. This wasn't Law but RAW.
You said raw sockets, but your backtrace indicates an AF_PACKET socket, so which is it? Also, is this packet coming in from the network or is it locally generated? If the former please tell me what NIC it is as it's probably buggy. Thanks!
(In reply to comment #5) > I suspect some part of the NFS I/O path, or IP write path itself, is > adding pages to an SKB frag list and not bumping skb->truesize properly. > That is what needs to be fixed. > > I suspect it's ip_append_page() that's causing the problems. > Unlike ip_append_data() it doesn't bump the skb->truesize properly. > > sendpage isn't used on UDP by many things, the xprtsock and svcsock > paths of SUNRPC being one of them. > > The fix is simple, add a skb->truesize += len; right after the > skb->data_len += len; line. To solve the problem, we found that ip_append_data() needed this fix.
ip_append_data() did not set truesize correctly, then, this problem happend. Packets are sent by packet_rcv() function and call trace when packet_rcv() is called is as follows. 0xc060480d : packet_rcv+0x1/0x2f1 [] 0xc05aead9 : dev_hard_start_xmit+0xd6/0x1ee [] 0xc05bc886 : __qdisc_run+0xe4/0x19a [] 0xc05b0395 : dev_queue_xmit+0x144/0x24b [] 0xc05b497e : neigh_resolve_output+0x1ba/0x1e4 [] 0xc05cc8b7 : ip_finish_output2+0x164/0x19a [] 0xc05cc1cd : ip_fragment+0x30f/0x6ba [] 0xc05cc753 : ip_finish_output2+0x0/0x19a [] 0xc05cb972 : ip_append_data+0x5d0/0x945 [] 0xc05cd869 : ip_output+0xce/0x265 [] 0xc05cb2ec : ip_push_pending_frames+0x2e9/0x39f [] 0xc05e3ee9 : udp_push_pending_frames+0x1ed/0x20e [] 0xc05e47ef : udp_sendmsg+0x40c/0x4d7 [] 0xc0608e1f : _spin_lock_bh+0x8/0x18 [] 0xc0608e1f : _spin_lock_bh+0x8/0x18 [] 0xc05a7943 : release_sock+0xc/0x91 [] 0xc05e47f8 : udp_sendmsg+0x415/0x4d7 [] 0xc05e9da5 : inet_sendmsg+0x35/0x3f [] 0xc05a52ec : sock_sendmsg+0xce/0xe8 [] 0xc04362e5 : autoremove_wake_function+0x0/0x2d [] 0xc04362e5 : autoremove_wake_function+0x0/0x2d [] 0xc045e6d9 : set_page_address+0xa8/0x121 [] 0xc05a6e8a : kernel_sendmsg+0x27/0x35 [] 0xc05a7b1c : sock_no_sendpage+0x4d/0x60 [] 0xf8d2b115 : xs_udp_send_request+0x1ae/0x2a4 [sunrpc] 0xc05a7acf : sock_no_sendpage+0x0/0x60 [] 0xf8d29059 : xprt_transmit+0xbf/0x19e [sunrpc] 0xf8e2cee5 : nfs3_xdr_writeargs+0x0/0x7a [nfs] 0xf8d2737f : call_transmit+0x19b/0x1c8 [sunrpc] 0xf8d2bf17 : __rpc_execute+0x7a/0x1f8 [sunrpc] 0xf8e287a2 : nfs_execute_write+0x35/0x49 [nfs] 0xf8e2999c : nfs_flush_one+0xb7/0xfd [nfs] 0xf8e28536 : nfs_flush_list+0x87/0xcf [nfs] 0xf8e298e5 : nfs_flush_one+0x0/0xfd [nfs] 0xf8e29292 : nfs_sync_inode_wait+0xc5/0x1ab [nfs] 0xf8e20a93 : nfs_file_flush+0x48/0x7e [nfs] 0xc046ef44 : filp_close+0x2f/0x54 [] 0xc0404eff : syscall_call+0x7/0xb [] ip_fragment() function sets data for each packets, but truesize of packets are showed as follows, and the truesize in the third packet is already wrong when it is sent. skb=0xf77f9180 skb->truesize=4524 skb->len=1500 skb->data_len=1308 skb=0xf77f9280 skb->truesize=1708 skb->len=1500 skb->data_len=0 skb=0xf77f9380 skb->truesize=1580 skb->len=1500 skb->data_len=172 ***** skb=0xf77f9080 skb->truesize=1708 skb->len=1500 skb->data_len=0 skb=0xf77f9e80 skb->truesize=1708 skb->len=1500 skb->data_len=0 skb=0xf77f9d80 skb->truesize=1196 skb->len=984 skb->data_len=0 This happens because ip_append_data() function does not add truesize correctly. David-san posted the same type of patch as to ip_append_page() function. --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1172,6 +1172,7 @@ ssize_t ip_append_page(struct sock *sk, struct page *page, skb->len += len; skb->data_len += len; + skb->truesize += len; offset += len; size -= len; } But modification is necessary for ip_append_data() function as follows. --- ip_output.c.org 2008-01-22 18:19:12.000000000 -0500 +++ ip_output.c 2008-01-22 18:19:41.000000000 -0500 @@ -1037,6 +1037,7 @@ alloc_new_skb: frag->size += copy; skb->len += copy; skb->data_len += copy; + skb->truesize += copy; } offset += copy; length -= copy; These both patches fix this problem. Thanks,
truesize_check() was originally called only from TCP because it has an accounting feature and now UDP accounting patch introduced truesize_check() for all protocols, including TCP. However, other protocols does not handle truesize and no check like truesize_check() was done, therfore, I think that this type of bug did not found till now. By the way, there are two way to fix this problem. First, only protocols with accounting feature do truesize_check() as TCP did. It is safe because no precedure change was done for other protocols without accounting feature. Second, fix bugs in ip_appned_data() and ip_append_page() and make truesize calculation correct. This is a positive solution because it is good to support accounting feature for other protocols in the future. Thanks,
ip_append_data() patch on comment #14 is not good. I will try to check again.
Can you tell me in what way it's bad? Perhaps I can help since I'm in your timezone :) Does it fail with the same back trace still or is it different? Thanks!
The ip_append_data() patch, as noted, is not good. A few lines previous, if ip_append_data() attached pages, it adds PAGE_SIZE to skb->truesize so it should be OK.
#17 I appreciate your comments. I found that that patch was logically bad as David-san mensioned. I am now investigating why the following data was produced. I appreciate any hints. skb=0xf77f9180 skb->truesize=4524 skb->len=1500 skb->data_len=1308 skb=0xf77f9280 skb->truesize=1708 skb->len=1500 skb->data_len=0 skb=0xf77f9380 skb->truesize=1580 skb->len=1500 skb->data_len=172 ***** skb=0xf77f9080 skb->truesize=1708 skb->len=1500 skb->data_len=0 skb=0xf77f9e80 skb->truesize=1708 skb->len=1500 skb->data_len=0 skb=0xf77f9d80 skb->truesize=1196 skb->len=984 skb->data_len=0 Thanks,
Thanks Dave :) Yeah I'm looking at ip_fragment too. I'll let you guys know if I find anything.
I think this line looks fishy: skb->truesize -= frag->truesize; Should probably use frag->len or something like that.
It looks like a false alarm since ip_push_pending_frames sets the overall truesize to be the sum of all fragment truesizes.
I get it. The bug is in ip_append_data. That page is shared for packets on the same socket. So only the first packet gets the PAGE_SIZE truesize increase, but not the subsequent ones.
Created attachment 292578 [details] Fix truesize setting in ip_append_data Here's a completely untested patch.
Created attachment 292579 [details] Fix truesize accounting in ip_append_data Well at least it compiled :) Here's a patch which fixes the IPv6 path too.
That patch looks about right. It's also a good thing to make the skb->truesize modifications coincide directly with skb->data_len changes. It makes auditing this stuff that much easier.
Thanks Herbert-san, I tried your patch and the error message has gone. ip_fragment: skb=0xf7952380 skb->truesize=1736 skb->len=1500 skb->data_len=1308 ip_fragment: skb=0xf7952080 skb->truesize=1708 skb->len=1500 skb->data_len=0 ip_fragment: skb=0xf712ea80 skb->truesize=1752 skb->len=1500 skb->data_len=172 ip_fragment: skb=0xf7952280 skb->truesize=1708 skb->len=1500 skb->data_len=0 ip_fragment: skb=0xf7952480 skb->truesize=1708 skb->len=1500 skb->data_len=0 ip_fragment: skb=0xf7952580 skb->truesize=1196 skb->len=984 skb->data_len=0 Also, the truesize of the first packets is 1736 and it looks good, because it was 4572 without your patch as follows. > skb=0xf77f9180 skb->truesize=4524 skb->len=1500 skb->data_len=1308 Thanks,
(In reply to comment #23) > I get it. The bug is in ip_append_data. That page is shared for packets on the > same socket. So only the first packet gets the PAGE_SIZE truesize increase, but > not the subsequent ones. Could you explain how your patch works? I suppose that the first skb allocs one page and increase truesize by PAGE_SIZE and the second fragment does not need to be increased if left space is enough. However, your patch increase in both case. It would be appreciate your comments to undertand your patch. Thanks,
Yes my patch increases in both cases. However, the first packet now gets charged by how much of that page it has actually used instead of PAGE_SIZE. This makes sense because it's the very same page that then gets used by the second packet, etc.
(In reply to comment #29) > Yes my patch increases in both cases. However, the first packet now gets > charged by how much of that page it has actually used instead of PAGE_SIZE. > This makes sense because it's the very same page that then gets used by the > second packet, etc. Thank you for the explanation on your patch. I found that your patch was solution on this problem.
We found that probem on this patch! When a large data was copyed through NFS, NFS was hunguped. I suspect that sk_wmem_alloc is not recovered because truesize was changed. We think that we also have to move atomic_add(PAGE_SIZE, &sk->sk_wmem_alloc) at the same position of truesize update like: skb->data_len += copy; skb->truesize += copy; atomic_add(copy, &sk->sk_wmem_alloc); Thanks,
Created attachment 292610 [details] Updated upstream version of patch. Good catch, I have updated Herbert's upstream submission of this fix like so.
I also suspect that the same thing happnes for ip_append_page patch. If truesize is increased, sk_sk_wmem_alloc should be also increased, so that sk_wmem_alloc would be reduced correctly by subtracting truesize when socket buffer is released like: size_t ip_append_page(struct sock *sk, struct page *page, skb->len += len; skb->data_len += len; + skb->truesize += len; + atomic_add(len, &sk->sk_wmem_alloc); offset += len; size -= len; } Thanks,
Heh, first Dave beat me to the sk_wmem_alloc fix, and then Yasui-san beat me to the bit in ip_append_page :)
Thank you Takahiro-san, I will make this correction in the upstream version of the bug fix and credit you.
bug fix patch posted 3 a.m 1/23/08 morning.. Thank you so much, Hideo, for staying up all night working on this!
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Running last night's kernel. We still saw intermittent failures. It looks like it still failed while running in a DomU Here are the failure message from the DomU. SKB BUG: Invalid truesize (1264) len=7564, sizeof(sk_buff)=240 SKB BUG: Invalid truesize (623) len=7551, sizeof(sk_buff)=240 Also I saw the following with the kernel-PAE variant. SKB BUG: Invalid truesize (568) len=7564, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (555) len=7551, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (568) len=7564, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (555) len=7551, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (568) len=7564, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (555) len=7551, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (568) len=7564, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (555) len=7551, sizeof(sk_buff)=172 Also one additional note. The test that this was seen with is different. It is now seen with the audit-test-2088. Connectathon did not show the issue.
(In reply to comment #39) > Running last night's kernel. We still saw intermittent failures. It looks like > it still failed while running in a DomU > > Also I saw the following with the kernel-PAE variant. > > Also one additional note. The test that this was seen with is different. It is > now seen with the audit-test-2088. Connectathon did not show the issue. Thanks for the info. I'll investigate it.
in 2.6.18-74.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
setting back to ASSIGNED as the issue is still being investigated. But the original patch will still be included.
Jeff, any chance you could retest this after adding a dump_stack() in the truesize check function? Thanks!
Created attachment 292869 [details] Fix ip_fragment slow path truesize bug I've found a bug that looks like it might be the one that caused this. So please try this patch and let me know if it goes away. Thanks!
(In reply to comment #43) > Jeff, any chance you could retest this after adding a dump_stack() in the > truesize check function? Thanks! I made the kernel. I'll retest this after I get a test server. (In reply to comment #44) > Created an attachment (id=292869) [edit] > Fix ip_fragment slow path truesize bug > > I've found a bug that looks like it might be the one that caused this. So > please try this patch and let me know if it goes away. Thanks! Great! I'll test this patch too.
(In reply to comment #45) > (In reply to comment #43) > > Jeff, any chance you could retest this after adding a dump_stack() in the > > truesize check function? Thanks! > > I made the kernel. I'll retest this after I get a test server. I couldn't get stack trace, while I ran audit-test-2088 more than 3 and half hours. I saw the bug message two times. However, at that time, the kernel was normal 74.el5, unfortunately. I guess that the bug doesn't depend on PAE and Xen DomU. Moreover, it also might not depend on autid test. In my big guess, the bug could happen when test machine receives specific packet like multicast DNS. I'll do more test on 1/25 EST morning. Thanks,
I got stack trace. Please note that this kernel doesn't include UDP memory accounting patch set. I just added skb_truesize_check() to sock_rfree() and dump_stack() to skb_truesize_check(). Jan 25 08:47:31 host kernel: EXT3-fs warning (device sda2): ext3_block_to_path: blo ck < 0 Jan 25 08:48:02 host kernel: SKB BUG: Invalid truesize (571) len=7567, sizeof(sk_bu ff)=172 Jan 25 08:48:02 host kernel: [<c05a7837>] sock_rfree+0x1e/0x2b Jan 25 08:48:02 host kernel: [<c05a9f94>] __kfree_skb+0x97/0xe3 Jan 25 08:48:02 host kernel: [<c05c1c74>] netlink_recvmsg+0x132/0x281 Jan 25 08:48:02 host kernel: [<c05a4bcf>] sock_recvmsg+0xe5/0x100 Jan 25 08:48:02 host kernel: [<c041febf>] enqueue_task+0x29/0x39 Jan 25 08:48:02 host kernel: [<c0436731>] autoremove_wake_function+0x0/0x2d Jan 25 08:48:02 host kernel: [<c0449931>] audit_sockaddr+0x35/0x78 Jan 25 08:48:02 host kernel: [<c04e5871>] copy_to_user+0x31/0x48 Jan 25 08:48:02 host kernel: [<c05a6050>] move_addr_to_user+0x50/0x68 Jan 25 08:48:02 host kernel: [<c05a62fc>] sys_recvfrom+0xcf/0x137 Jan 25 08:48:02 host kernel: [<c0404a2f>] do_notify_resume+0x53b/0x67d Jan 25 08:48:02 host kernel: [<c05a67c6>] sys_socketcall+0x135/0x19e Jan 25 08:48:02 host kernel: [<c0404eff>] syscall_call+0x7/0xb Jan 25 08:48:02 host kernel: ======================= Jan 25 08:48:02 host kernel: SKB BUG: Invalid truesize (558) len=7554, sizeof(sk_bu ff)=172 Jan 25 08:48:02 host kernel: [<c05a7837>] sock_rfree+0x1e/0x2b Jan 25 08:48:02 host kernel: [<c05a9f94>] __kfree_skb+0x97/0xe3 Jan 25 08:48:02 host kernel: [<c05c1c74>] netlink_recvmsg+0x132/0x281 Jan 25 08:48:02 host kernel: [<c05a4bcf>] sock_recvmsg+0xe5/0x100 Jan 25 08:48:02 host kernel: [<c041febf>] enqueue_task+0x29/0x39 Jan 25 08:48:02 host kernel: [<c0436731>] autoremove_wake_function+0x0/0x2d Jan 25 08:48:02 host kernel: [<c041fcf2>] __wake_up+0x2a/0x3d Jan 25 08:48:02 host kernel: [<c05a62fc>] sys_recvfrom+0xcf/0x137 Jan 25 08:48:02 host kernel: [<c0404a2f>] do_notify_resume+0x53b/0x67d Jan 25 08:48:03 host kernel: [<c05a67c6>] sys_socketcall+0x135/0x19e Jan 25 08:48:03 host kernel: [<c0404eff>] syscall_call+0x7/0xb Jan 25 08:48:03 host kernel: =======================
Hmm, I didn't know we had anything generating non-linear netlink packets. Could you please add a check to netlink_recvmsg to print out the offending netlink family and other relevant details? Thanks!
(In reply to comment #48) > Hmm, I didn't know we had anything generating non-linear netlink packets. Could > you please add a check to netlink_recvmsg to print out the offending netlink > family and other relevant details? Thanks! Thank you for the comment. Yes. We are doing similar investigation right now. We modified skb_truesize_bug() to print detailed information of the socket. Additionally, we added skb_truesize_bug() to other functions in net/netlink. I'll report the result when I get the information.
Hi, got new information on this problem. Wrong packet was sent by kauditd_thread??? SKB BUG: Invalid truesize (1196) len=1782, sizeof(sk_buff)=172 skb=0xf786c980 data_len=0 data=0xe9d8e000 head=0xe9d8e000 tail=0xe9d8e6f6 end=0xe9d8e800 sk_type=3 sk_protocol=9 [<c05c0a71>] netlink_sendskb+0x1f/0x4b [<c044564e>] kauditd_thread+0x44/0x11f [<c0420753>] default_wake_function+0x0/0xc [<c044560a>] kauditd_thread+0x0/0x11f [<c043666d>] kthread+0xc0/0xeb [<c04365ad>] kthread+0x0/0xeb [<c0405c3b>] kernel_thread_helper+0x7/0x10 ======================= SKB BUG: Invalid truesize (1196) len=1782, sizeof(sk_buff)=172 skb=0xf786c980 data_len=0 data=0xe9d8e000 head=0xe9d8e000 tail=0xe9d8e6f6 end=0xe9d8e800 sk_type=3 sk_protocol=9 [<c05a78e0>] sock_rfree+0x1e/0x43 [<c05aa0f8>] __kfree_skb+0x97/0xe3 [<c05abfcd>] skb_free_datagram+0xa/0x29 [<c05c1da0>] netlink_recvmsg+0x132/0x281 [<c05a4ae7>] sock_recvmsg+0xe5/0x100 [<c041fed3>] enqueue_task+0x29/0x39 [<c0436739>] autoremove_wake_function+0x0/0x2d [<c041fd06>] __wake_up+0x2a/0x3d [<c05a6214>] sys_recvfrom+0xcf/0x137 [<c04179c6>] smp_send_reschedule+0x51/0x53 [<c05a66de>] sys_socketcall+0x135/0x19e [<c0404eff>] syscall_call+0x7/0xb ======================= sk_type=3 (SOCK_RAW), sk_protocol=9 (???) Thanks,
The first call trace and skb information was showed by truesize_check() function, which we added in netlink_sendskb(). skb has already wrong value when it was sent. int netlink_sendskb(struct sock *sk, struct sk_buff *skb, int protocol) { int len = skb->len; skb_truesize_check(skb); skb_queue_tail(&sk->sk_receive_queue, skb); Also, we can guess that len is good but truesize is not from from head and tail value. (tail - head = 0x6f6 (1782)) Thanks,
truesize_check() was added also in netlink_trim() function, but no error message was displayed there. Therefore, skb was changed in netlink_unicast() function. However, I could find codes which change skb, then, skb->truesize might be changed during sleeping in netlink_attachskb() function. int netlink_unicast(struct sock *ssk, struct sk_buff *skb, u32 pid, int nonblock) { ... skb = netlink_trim(skb, gfp_any()); *** added truesize_check() timeo = sock_sndtimeo(ssk, nonblock); retry: sk = netlink_getsockbypid(ssk, pid); if (IS_ERR(sk)) { kfree_skb(skb); return PTR_ERR(sk); } err = netlink_attachskb(sk, skb, nonblock, timeo, ssk); if (err == 1) goto retry; if (err) return err; return netlink_sendskb(sk, skb, ssk->sk_protocol); }
Sorry, typo. However, I could not find codes which change skb, then, skb->truesize might ~~~
(In reply to comment #50) > sk_type=3 (SOCK_RAW), sk_protocol=9 (???) sk_protocol=9 (NETLINK_AUDIT) This is a socket which was just created by audit. audit_init() audit_sock = netlink_kernel_create(NETLINK_AUDIT, 0, audit_receive, THIS_MODULE);
Sorry, I found conditions that skb_truesize_check() in netlink_trim() function was not executed. So it is possible that skb has bad data when netlink_unicast() is called. I will try another test to get more detailed information about skb data in netlink_unicast() function.
Created attachment 293075 [details] Fix up truesize in audit_expand I think I've found the audit bug. It calls pskb_expand_head which does not adjust truesize. When pskb_expand_head is called before truesize is counted to a socket, the caller needs to adjust truesize manually. Please let me know if this patch fixes it or not.
Thanks, Herbert-san, but I think you uploaded another patch instead of fix audit_expand() function. Is it for crypt? Could you please upload your right patch again? I also suspect audit_log_start() function. audit allocates skb memory by a unit of AUDIT_BUFSIZE=1024 and originally, skb is allocated here. audit_log_start() audit_buffer_alloc() alloc_skb(AUDIT_BUFSIZE=1024, ...) audit_log_end() At this time, truesize is a sum of data size and sizeof(struct skb_buff). and it explains truesize of these messages. SKB BUG: Invalid truesize (1264) len=7564, sizeof(sk_buff)=240 # 1264=1024+240 SKB BUG: Invalid truesize (1196) len=1782, sizeof(sk_buff)=172 # 1196=1024+172 > SKB BUG: Invalid truesize (1196) len=1782, sizeof(sk_buff)=172 > skb=0xf786c980 data_len=0 data=0xe9d8e000 head=0xe9d8e000 tail=0xe9d8e6f6 > end=0xe9d8e800 This shows that skb size (0x800 (=2048) might be expanded by 1024, but truesize is not increased. So, I think that truesize need to be increased after pskb_expand_head() function. Thanks,
Created attachment 293086 [details] Fix audit_expand to increase truesize Sorry, I'm on the road so I got confused about which machine I was working on :)
Thanks, Herbert-san. I believe that your patch would solve this problem. This problem rarely happens. We are going to get a little more information on this problem, so that we can easily reproduce this problem to confirm your patch works well. Thanks a lot,
We got detailed information on how skb is extended by adding skb_truesize_check() at the both begining/end of audit_expand() function. This result shows that audit_log_n_string() function increases a skb size by 128 byte and audit_log_vformat() function increases 1024 byte., also len is increased again and again, but truesize is not increased. We can guess that the first audit_expand() result marked by ***** is actually the second call, because the initial size of skb is 1024 byte and it is already increased by 128 byte by audit_expand() function. After the first audit_expand() function call, truesize had wrong value. -------------------------------------------------------------------------------- audit_log_n_string() -> audit_expand(ab, 0x80) ***** SKB BUG: Invalid truesize (1196) len=1146, sizeof(sk_buff)=172 IN: data=0xecc25000 head=0xecc25000 tail=0xecc2547a end=0xecc25480 (end-head=0x480) OUt: data=0xea6ba000 head=0xea6ba000 tail=0xea6ba47a end=0xea6ba500 (end-head=0x500) audit_log_n_string() -> audit_expand(ab, 0x80) SKB BUG: Invalid truesize (1196) len=1273, sizeof(sk_buff)=172 IN: data=0xea6ba000 head=0xea6ba000 tail=0xea6ba4f9 end=0xea6ba500 (end-head=0x500) OUT: data=0xecc25000 head=0xecc25000 tail=0xecc254f9 end=0xecc25580 (end-head=0x580) audit_log_vformat() -> audit_expand(ab, AUDIT_BUFSIZ=0x400) SKB BUG: Invalid truesize (1196) len=1407, sizeof(sk_buff)=172 IN: data=0xecc25000 head=0xecc25000 tail=0xecc2557f end=0xecc25580 (end-head=0x580) OUT: data=0xea6b1000 head=0xea6b1000 tail=0xea6b157f end=0xea6b1980 (end-head=0x980) audit_log_vformat() -> audit_expand(ab, AUDIT_BUFSIZ=0x400) SKB BUG: Invalid truesize (1196) len=2429, sizeof(sk_buff)=172 IN: data=0xea6b1000 head=0xea6b1000 tail=0xea6b197d end=0xea6b1980 (end-head=0x980) OUT: data=0xea750000 head=0xea750000 tail=0xea75097d end=0xea750d80 (end-head=0xd80) ... audit_log_n_string() -> audit_expand(ab, 0x80) SKB BUG: Invalid truesize (1196) len=6902, sizeof(sk_buff)=172 IN: data=0xea8b4000 head=0xea8b4000 tail=0xea8b5af6 end=0xea8b5b00 (end-head=0x1b00) OUT: data=0xea63a000 head=0xea63a000 tail=0xea63baf6 end=0xea63bb80 (end-head=0x1b80) audit_log_n_string() -> audit_expand(ab, 0x80) SKB BUG: Invalid truesize (1196) len=7028, sizeof(sk_buff)=172 IN: data=0xea63a000 head=0xea63a000 tail=0xea63bb74 end=0xea63bb80 (end-head=0x1b80) OUT: data=0xea8b4000 head=0xea8b4000 tail=0xea8b5b74 end=0xea8b5c00 (end-head=0x1c00) audit_log_vformat() -> audit_expand(ab, AUDIT_BUFSIZ=1024) SKB BUG: Invalid truesize (1196) len=7166, sizeof(sk_buff)=172 IN: data=0xea8b4000 head=0xea8b4000 tail=0xea8b5bfe end=0xea8b5c00 (end-head=0x1c00) OUT: data=0xea6f8000 head=0xea6f8000 tail=0xea6f9bfe end=0xea6fa000 (end-head=0x2000) -------------------------------------------------------------------------------- * IN: shows skb data passed to audit_expand() function * OUT: shows skb data modified by audit_expand() function If we apply the patch made by Herbert-san, truesize would be increased by extended size of skb and the skb would be changed as follows. As a result, this problem will gone. -------------------------------------------------------------------------------- audit_log_n_string() -> audit_expand(ab, 0x80) IN: data=0xXXXXXXXX head=0xXXXXXXXX tail=0xXXXXXXXX end=0xXXXXXXXX (end-head=0x400) truesize (1196) len=XXXX, sizeof(sk_buff)=172 OUT: data=0xea6ba000 head=0xea6ba000 tail=0xea6ba47a end=0xea6ba500 (end-head=0x480) truesize (1324: 1196+128) audit_log_n_string() -> audit_expand(ab, 0x80) IN: data=0xecc25000 head=0xecc25000 tail=0xecc2547a end=0xecc25480 (end-head=0x480) truesize (1324) len=1146, sizeof(sk_buff)=172 OUT: data=0xea6ba000 head=0xea6ba000 tail=0xea6ba47a end=0xea6ba500 (end-head=0x500) truesize (1452: 1324+128) ... -------------------------------------------------------------------------------- We will continue tests and check if other possibility exists. Thanks,
(In reply to comment #58) > Created an attachment (id=293086) [edit] > Fix audit_expand to increase truesize I test your patch almost 21 hours using audit-test-2088. And I don't see any SKB BUG message, while the message usually appears within 15 or 16 hours. So, we can say that your patch fixes the SKB BUG in audit code.
in 2.6.18-77.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html
Well, I am still facing this bug on this kernel version on centos 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:33:52 EDT 2008 i686 i686 i386 GNU/Linux Getting zillions of Oct 13 10:52:00 routerng kernel: SKB BUG: Invalid truesize (300) len=136, sizeof(sk_buff)=172 Oct 13 10:52:01 routerng kernel: SKB BUG: Invalid truesize (1452) len=1286, sizeof(sk_buff)=172 Oct 13 10:52:01 routerng kernel: SKB BUG: Invalid truesize (300) len=136, sizeof(sk_buff)=172
(In reply to comment #66) > Well, I am still facing this bug on this kernel version on centos > 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:33:52 EDT 2008 i686 i686 i386 GNU/Linux > > Getting zillions of > > Oct 13 10:52:00 routerng kernel: SKB BUG: Invalid truesize (300) len=136, > sizeof(sk_buff)=172 > Oct 13 10:52:01 routerng kernel: SKB BUG: Invalid truesize (1452) len=1286, > sizeof(sk_buff)=172 > Oct 13 10:52:01 routerng kernel: SKB BUG: Invalid truesize (300) len=136, > sizeof(sk_buff)=172 What kind of network device and protocol do you use on the machine?
I see (In reply to comment #67) > (In reply to comment #66) > > Well, I am still facing this bug on this kernel version on centos > > 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:33:52 EDT 2008 i686 i686 i386 GNU/Linux > > > > Getting zillions of > > > > Oct 13 10:52:00 routerng kernel: SKB BUG: Invalid truesize (300) len=136, > > sizeof(sk_buff)=172 > ... > > What kind of network device and protocol do you use on the machine? On a CentOS machine running 2.6.18-92.1.13.el5 (x86_64) Oct 24 20:43:21 centos kernel: SKB BUG: Invalid truesize (1648) len=1420, sizeof(sk_buff)=240 Oct 24 20:43:21 centos kernel: SKB BUG: Invalid truesize (1648) len=1420, sizeof(sk_buff)=240 Oct 24 20:43:21 centos kernel: SKB BUG: Invalid truesize (1392) len=1162, sizeof(sk_buff)=240 Oct 24 20:43:21 centos kernel: SKB BUG: Invalid truesize (368) len=131, sizeof(sk_buff)=240 Network hardware (lspci): 00:19.0 Ethernet controller: Intel Corporation 82566DM Gigabit Network Connection (rev 02) 06:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10) 06:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139/8139C/8139C+ (rev 10) This stops as soon as I stop 'ntop'. tcpdump on any cards produces the same message (given traffic). The machine is not locking up or something.
Hi, In think this problem it's not solved. I'm seeing this error with kernel "2.6.18-131.el5xen" from dzickus that i think will be in redhat 5.3. SKB BUG: Invalid truesize (1580) len=1420, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1420, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (1580) len=1420, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (300) len=135, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (556) len=387, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (556) len=387, sizeof(sk_buff)=172 SKB BUG: Invalid truesize (556) len=387, sizeof(sk_buff)=172 Can i do something about it?
The truesize variable in SKB is changed in many functions, and it needs to be handled correctly in all functions. I fixed the issue which happens under the condition reported in this bugzilla, but there still seems to be a place where the truesize variable is handled wrongly. I ask you to open a new bugzilla as a separate issue because I guess that the root cause is different from the one reported in this bugzilla. And, in the new bugzilla, please report the detailed information about how to reproduce this issue. "SKB BUG" message is not helpful to investigate where the bug is, because it is displayed at the point where SKB is closed, but is not displayed where the truesize variable was handled wrongly. Thanks,
Ok, I can do that, i understand your point of view, and it makes perfect sense. I just have one problem, I don't know how to reproduce it and I don't know how to debug it. I have the messages in dmesg but the machine it's still working, no panic, nothing. I guess it happen when I get a lot of load in the machine, but I can't exacly reproduce it. What should I say when opening the new bug report that will help solving this, another then profile my hardware and show the message that I get in dmesg? Thanks,
Yes, could you please report information about your system environment? I would like to know in which kernel, dom0 or domU produced this messages. sosreport will be useful to tell it. Thanks,
My problem it's being followed in here: https://bugzilla.redhat.com/show_bug.cgi?id=488710 Thanks,