Bug 366921 - network goes to sleep and stops file transfers on large or multiple files
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: kernel
Version: 7
Hardware: x86_64 Linux
Priority: low
Severity: medium
Assigned To: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
Depends On:
Blocks:
Reported: 2007-11-05 10:48 EST by William W. Austin
Modified: 2008-01-16 08:14 EST
CC List: 1 user

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-01-16 08:14:13 EST


Attachments: None
Description William W. Austin 2007-11-05 10:48:51 EST
Description of problem:

This is a recurrence of a problem I reported under fc4 and fc5 (and which was
apparently fixed in fc6): network file transfers/copies hang without warning.


Version-Release number of selected component (if applicable):
kernel: 2.6.22.9-91.fc7 (but has occurred under all .fc7 kernels so far)
libnetfilter_conntrack-0.0.81-1.fc7
libnfnetlink-0.0.30-1.fc7
This is an AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ system with 2 GB RAM,
8 GB swap, and 1450 GB of disk space.


How reproducible:
Always on large (>150 MB) file transfers; occasionally on smaller ones if a large
number of them are done together (such as "mget *" in an ftp session). Also
occurs with rcp and rsync.


Steps to Reproduce:
1. Start an ftp or rcp transfer of a very large file to a remote host (I am on
3 Mb/s DSL, if it makes a difference).
2. Wait a few minutes. The file transfer hangs (usually between 25% and 75%
complete) and never resumes (at least not within 12 hours).

  
Actual results:

Partway through, the file transfer hangs, and the following messages appear on
the system console:
Removing netfilter NETLINK layer.
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
Removing netfilter NETLINK layer.
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
Removing netfilter NETLINK layer.
KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
Removing netfilter NETLINK layer.
KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
Removing netfilter NETLINK layer.
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)
Removing netfilter NETLINK layer.
ip_tables: (C) 2000-2006 Netfilter Core Team
Netfilter messages via NETLINK v0.30.
nf_conntrack version 0.5.0 (8192 buckets, 65536 max)


Expected results:
File transfer should complete without interruption

Additional info:

If the "dead" ftp (rcp) process is killed, it is often possible to start
another ftp (rcp) and successfully retrieve the "offending" file (the one being
copied when the hang occurred).
Comment 1 Chuck Ebbert 2007-11-05 15:03:21 EST
(In reply to comment #0)
> Part way through the file transfer hangs and checking the system console the
> messages occur:

Any additional messages in the system logs?
Comment 2 William W. Austin 2007-11-06 00:41:14 EST
The same messages are written to /var/log/messages; the only difference is that
each one has a timestamp (and the system name, of course) prepended. Here is a
short sample:

Nov  4 22:03:02 dsl027-161-055 kernel: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
Nov  4 22:03:02 dsl027-161-055 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov  4 22:03:02 dsl027-161-055 kernel: Netfilter messages via NETLINK v0.30.
Nov  4 22:23:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov  4 22:23:01 dsl027-161-055 kernel: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
Nov  4 22:23:01 dsl027-161-055 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov  4 22:23:01 dsl027-161-055 kernel: Netfilter messages via NETLINK v0.30.
Nov  4 22:43:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov  4 22:43:01 dsl027-161-055 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov  4 22:43:01 dsl027-161-055 kernel: Netfilter messages via NETLINK v0.30.
Nov  4 23:03:02 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov  4 23:03:02 dsl027-161-055 kernel: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
Nov  4 23:03:02 dsl027-161-055 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov  4 23:03:02 dsl027-161-055 kernel: Netfilter messages via NETLINK v0.30.
Nov  4 23:23:02 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov  4 23:23:02 dsl027-161-055 kernel: KERNEL: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (156)
Nov  4 23:23:02 dsl027-161-055 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov  4 23:23:02 dsl027-161-055 kernel: Netfilter messages via NETLINK v0.30.
Nov  4 23:43:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.

Hmmm. This is odd: I never noticed it before, but they are happening every 20
minutes. (Apologies for my own stupidity in missing the obvious here...)
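The 20-minute period noted above can also be checked mechanically rather than by eye. The following is a minimal Python sketch (not something used in this report) that extracts the timestamps of the "Removing netfilter NETLINK layer." lines from syslog-style text and prints the gaps between them. It assumes the classic 15-character syslog prefix (e.g. "Nov  4 22:03:02"), which carries no year, so the year 2007 is supplied as an assumption; the function names are hypothetical.

```python
from datetime import datetime

MARKER = "Removing netfilter NETLINK layer."


def marker_times(lines, year=2007):
    """Return a datetime for every syslog line that contains MARKER.

    Assumes the classic syslog prefix 'Mon DD HH:MM:SS' in the first
    15 characters. That prefix has no year, so one is supplied (assumed).
    """
    times = []
    for line in lines:
        if MARKER not in line:
            continue  # skip ip_tables / nf_conntrack / other lines
        stamp = " ".join(line[:15].split())  # "Nov  4 22:23:01" -> "Nov 4 22:23:01"
        times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_in_minutes(times):
    """Minutes between consecutive events, rounded to the nearest minute."""
    return [round((b - a).total_seconds() / 60) for a, b in zip(times, times[1:])]


# Lines copied from the /var/log/messages sample in this comment.
log = """\
Nov  4 22:23:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov  4 22:43:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov  4 22:43:01 dsl027-161-055 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Nov  4 23:03:02 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov  4 23:23:02 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
Nov  4 23:43:01 dsl027-161-055 kernel: Removing netfilter NETLINK layer.
""".splitlines()

print(gaps_in_minutes(marker_times(log)))  # [20, 20, 20, 20]
```

Rounding to whole minutes absorbs the one-second jitter visible in the log (22:43:01 to 23:03:02, 23:23:02 to 23:43:01).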

However, I can find no entries corresponding to these times in any of the
/var/spool/cron/* crontabs or in the /etc/cron* and /etc/cron*/* crontabs.
Are there any other likely places to check for cron-like stuff? (I don't think
there are, but it doesn't hurt to ask.)
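When searching crontabs for the job behind the :03/:23/:43 pattern, a hypothetical helper like the one below can scan the text of any crontab for entries whose minute field would fire at a given minute. It is a sketch under stated assumptions: it understands the common minute syntax ('*', 'N', 'N-M', comma lists, and '/step'), checks only the minute field (the hour field is ignored, since the sample spans consecutive hours), and skips variable assignments and '@' shortcuts. The function names and sample entries are invented for illustration.

```python
def minute_field_matches(field, minute):
    """Return True if a cron minute field such as '*', '3', '3,23,43',
    '*/20' or '3-59/20' selects the given minute (0-59).

    Sketch only: a bare 'N/step' (no range) is treated as just N.
    """
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            lo, hi = 0, 59
        elif "-" in base:
            lo_text, hi_text = base.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
        else:
            lo = hi = int(base)
        if lo <= minute <= hi and (minute - lo) % step == 0:
            return True
    return False


def entries_firing_at(crontab_text, minute):
    """List crontab lines whose minute field matches `minute`.

    The minute is the first field in both user crontabs and system
    crontabs (which add a user column later in the line).
    """
    hits = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        first = line.split()[0]
        if "=" in first or first.startswith("@"):
            continue  # variable assignment or @reboot/@hourly-style shortcut: not handled
        try:
            if minute_field_matches(first, minute):
                hits.append(line)
        except ValueError:
            continue  # first field is not a numeric minute spec (e.g. "MAILTO = root")
    return hits


# Invented sample crontab (hypothetical commands, not from this report).
sample = """\
SHELL=/bin/bash
# m h dom mon dow [user] command
3-59/20 * * * * root /usr/local/bin/hypothetical-a
*/20 * * * * /usr/local/bin/hypothetical-b
0 * * * * /usr/local/bin/hypothetical-c
"""

for m in (3, 23, 43):
    print(m, entries_firing_at(sample, m))
```

Only the first sample entry fires at :03, :23 and :43; "*/20" fires at :00, :20 and :40 instead, which is why a 20-minute job can be easy to miss when reading crontabs by eye.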
Comment 3 Christopher Brown 2008-01-15 22:32:33 EST
Hello,

I'm reviewing this bug as part of the kernel bug triage project, an attempt to
isolate current bugs in the Fedora kernel.

http://fedoraproject.org/wiki/KernelBugTriage

I am CC'ing myself to this bug and will try and assist you in resolving it if I can.

There hasn't been much activity on this bug for a while. Could you tell me if
you are still having problems with the latest kernel?

If the problem no longer exists then please close this bug or I'll do so in a
few days if there is no additional information lodged.
Comment 4 William W. Austin 2008-01-15 23:56:48 EST
Actually, I'm reasonably sure that it is some really obscure bug which I have
introduced into a configuration file, but I have given up on finding it. I'm
running the kernel-2.6.23.8-34.fc7.x86_64 kernel on this machine and it still
happens.

I can work around it by stopping and restarting the network and reloading the
routing table every 2 hours or so (I have a cron job do this every 2 hours). I
accidentally disabled that for a few hours (by stopping cron and forgetting to
restart it, as part of trying out a replacement program), and the network went
to sleep again.

You can probably go ahead and close this one; I have come to suspect pilot
error (me, that is), not the software. If I had the opportunity, I'd do a clean
reinstall and NOT add the special routing required for something I have to do.
But I just don't have that luxury.

Thanks for the help.
Comment 5 Christopher Brown 2008-01-16 08:14:13 EST
Okay William, thanks for the update and for taking the time to file the report
originally. Please don't hesitate to re-open if you change your mind.

Cheers
Chris
