Description of problem:
This is a recurrence of a problem I reported under fc4 and fc5 (but which was apparently fixed in fc6): network file transfers/copies hang without warning.

Version-Release number of selected component (if applicable):
kernel: 2.6.22.9-91.fc7 (but has occurred under all .fc7 kernels so far)
libnetfilter_conntrack-0.0.81-1.fc7
libnfnetlink-0.0.30-1.fc7

This is an AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ system with 2GB mem, 8GB swap, and 1450 GB disk space.

How reproducible:
Always on large (>150Mb) file transfers, occasionally on smaller ones if a large number of them are done together (such as "mget *" in an ftp session). Also occurs with rcp and rsync.

Steps to Reproduce:
1. Start an ftp or an rcp of a very large file to a remote host (I am on 3mb/s dsl if it makes a difference).
2. Wait a few minutes. The file transfer will hang (usually at between 25% and 75% complete) and will never continue (not for up to 12 hours, anyway).

Actual results:
Part way through, the file transfer hangs. Checking the system console shows these messages:

Removing netfilter NETLINK layer.
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
Removing netfilter NETLINK layer.
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
Removing netfilter NETLINK layer.
KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
Removing netfilter NETLINK layer.
KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
Removing netfilter NETLINK layer.
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
Removing netfilter NETLINK layer.
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)

Expected results:
File transfer should complete without interruption.

Additional info:
If the "dead" ftp (rcp) process is killed, then it is often possible to do another ftp (rcp) and retrieve the "offending" file (the one being copied when the hang occurred) successfully.
(In reply to comment #0)
> Part way through the file transfer hangs and checking the system console the
> messages occur:

Any additional messages in the system logs?
Actually the same messages are printed into /var/log/messages. The only difference is that they have a timestamp (and a system name, of course) inserted ahead of them. Here is a short sample:

Nov 4 22:03:02 dsl027-161-055 kernel: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
Nov 4 22:03:02 dsl027-161-055 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov 4 22:03:02 dsl027-161-055 kernel: Netfilter messages via NETLINK v0.30.
Nov 4 22:23:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov 4 22:23:01 dsl027-161-055 kernel: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
Nov 4 22:23:01 dsl027-161-055 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov 4 22:23:01 dsl027-161-055 kernel: Netfilter messages via NETLINK v0.30.
Nov 4 22:43:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov 4 22:43:01 dsl027-161-055 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov 4 22:43:01 dsl027-161-055 kernel: Netfilter messages via NETLINK v0.30.
Nov 4 23:03:02 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov 4 23:03:02 dsl027-161-055 kernel: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
Nov 4 23:03:02 dsl027-161-055 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov 4 23:03:02 dsl027-161-055 kernel: Netfilter messages via NETLINK v0.30.
Nov 4 23:23:02 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov 4 23:23:02 dsl027-161-055 kernel: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
Nov 4 23:23:02 dsl027-161-055 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov 4 23:23:02 dsl027-161-055 kernel: Netfilter messages via NETLINK v0.30.
Nov 4 23:43:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.

Hmmm. This is odd. I never noticed before, but they are happening every 20 minutes.
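The 20-minute spacing can be checked against a longer stretch of the log rather than by eye. What follows is only a minimal Python sketch: the function names are made up for illustration, the sample is the "Removing netfilter NETLINK layer." lines from the excerpt above, and because syslog lines carry no year, 2007 is assumed purely so the timestamps parse.

```python
from datetime import datetime

# "Removing netfilter NETLINK layer." lines copied from the sample above.
SAMPLE = """\
Nov 4 22:23:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov 4 22:43:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov 4 23:03:02 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov 4 23:23:02 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov 4 23:43:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
"""


def removal_times(log_text, marker="Removing netfilter NETLINK layer.", year=2007):
    """Return the timestamps of every log line containing `marker`.

    The year is an assumption: syslog timestamps do not include one.
    """
    times = []
    for line in log_text.splitlines():
        if marker not in line:
            continue
        month, day, clock = line.split()[:3]
        stamp = f"{year} {month} {day} {clock}"
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_minutes(times):
    """Return the gap between consecutive timestamps, rounded to whole minutes."""
    return [round((b - a).total_seconds() / 60) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    print(gaps_in_minutes(removal_times(SAMPLE)))
```

To run it against the real log, the text of /var/log/messages can be passed to `removal_times` in place of `SAMPLE`. A gap that differs from 20 would suggest the trigger is not purely periodic.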
(Apologies for my own stupidity in missing the obvious here ...) However, I can find no entries in any of the /var/spool/cron/* crontabs or in the /etc/cron* and /etc/cron*/* crontabs that correspond to these times. Are there any other likely places to check for cron-like stuff? (I don't think there are, but it doesn't hurt to ask.)
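One way to make the crontab search mechanical is to expand each entry's minute field and test it against the minutes seen in the log (:03, :23, :43). The Python below is a rough sketch only. The helper names are hypothetical, the default paths are guesses based on the locations mentioned above, and it handles just the plain list, range and step syntax of the minute field. It also assumes the job starts in the same minute the messages appear, which need not be true for a slow job.

```python
import glob


def minutes_matched(field):
    """Expand a crontab minute field such as "*/20" or "3-59/20" into a set."""
    matched = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            lo, hi = 0, 59
        elif "-" in rng:
            lo, hi = (int(x) for x in rng.split("-"))
        else:
            lo = hi = int(rng)
        matched.update(range(lo, hi + 1, step))
    return matched


def fires_at(line, observed=frozenset({3, 23, 43})):
    """True if the line's minute field covers every observed minute."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return False
    try:
        return observed <= minutes_matched(stripped.split()[0])
    except ValueError:
        # Not a plain minute field (e.g. "MAILTO=root" or "@hourly").
        return False


def scan(patterns=("/etc/crontab", "/etc/cron.d/*", "/var/spool/cron/*")):
    """Return (path, line) pairs for crontab lines that could fire at those minutes."""
    hits = []
    for pattern in patterns:
        for path in glob.glob(pattern):
            try:
                with open(path, errors="replace") as fh:
                    hits.extend((path, l.strip()) for l in fh if fires_at(l))
            except OSError:
                continue  # unreadable file or a directory
    return hits


if __name__ == "__main__":
    for path, line in scan():
        print(f"{path}: {line}")
```

Note that an every-minute entry (`* * * * *`) is also reported, since it covers those minutes too. Reading /var/spool/cron/* normally needs root.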
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try to assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if you are still having problems with the latest kernel?

If the problem no longer exists, please close this bug; otherwise I'll do so in a few days if no additional information is lodged.
Actually I'm reasonably sure that it is some really obscure bug which I have introduced into a configuration file, but I have given up on finding it. I'm running the kernel-2.6.23.8-34.fc7.x86_64 kernel on this machine and it still happens.

I can get around it if, every 2 hours or so (I have the cron job run every 2 hours), I stop and restart the network and reload the routing table. I accidentally turned that off for a few hours (by stopping cron and forgetting to restart it while trying a new replacement program), and the network went to sleep again.

You can probably go ahead and close this one. I now suspect pilot error (me, that is), not the software. If I had the opportunity, I'd do a clean re-install and NOT add the special routing required to do something I have to do. But I just don't have that luxury.

Thanks for the help.
Okay William, thanks for the update and for taking the time to file the report originally. Please don't hesitate to re-open if you change your mind.

Cheers
Chris