Bug 241545 - smbfs mountpoint hangs after network transfer error on massive reads
Summary: smbfs mountpoint hangs after network transfer error on massive reads
Keywords:
Status: CLOSED DUPLICATE of bug 355141
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.5
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jeff Layton
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 373171
Depends On:
Blocks: 422551 430698
 
Reported: 2007-05-27 21:51 UTC by Arenas Belon, Carlo Marcelo
Modified: 2008-01-29 15:34 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-28 15:45:33 UTC
Target Upstream Version:
Embargoed:


Attachments
set of upstream patches that may or may not help this (5.00 KB, patch)
2007-10-24 14:34 UTC, Jeff Layton
patch -- calculate rq_rlen locally in smb_receive (2.03 KB, patch)
2007-10-26 20:41 UTC, Jeff Layton
patch -- calculate rq_rlen locally in smb_receive (1.92 KB, patch)
2007-11-06 12:17 UTC, Jeff Layton

Description Arenas Belon, Carlo Marcelo 2007-05-27 21:51:33 UTC
Description of problem:
While reading several GB of data through an smbfs mountpoint (several concurrent
tar -cvf runs will do it), a message like the following is printed to the kernel
ring buffer on the client:

May 27 04:51:28 sbr-qse1-3 kernel: smb_get_length: Invalid NBT packet, code=6c

After this, smbiod CPU usage spikes and the mount point becomes inaccessible, as
shown below:

$ ls -al /net/idx
ls: /net/idx: Input/output error

Version-Release number of selected component (if applicable):
kernel-2.6.9-55.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. export authenticated share using samba (writeable=yes, public=no)
2. mount share in client using smbfs
3. run several processes that just read the data from that mount point until
deadlock is triggered.
  
Actual results:
$ ls -al /net/idx
ls: /net/idx: Input/output error

Expected results:
No hangs that require manual intervention to recover.

Additional info:
Similar to the issue shown in bug 175430, except that, at least on i386, the
problem is recoverable by stopping all processes using that mount point and
remounting it.

Bug 177523 is probably also similar, as it is triggered by read operations as
well, but again, no panic is observed here.

I suspect there may also be a problem with the mount.smbfs process, as the
server shows the connection as closed while the client logs the following
kernel messages:

May 27 05:09:27 sbr-qse1-3 kernel: smb_add_request: request [ed5bb380, mid=56991] timed out!
May 27 05:09:27 sbr-qse1-3 kernel: smb_add_request: request [ed5bba80, mid=56992] timed out!
May 27 05:10:57 sbr-qse1-3 kernel: smb_lookup: find Engines.200705250237/3 failed, error=-5

There are also errors from mount.smbfs indicating problems with its locking
database, as shown by:

May 27 05:06:05 sbr-qse1-3 mount.smbfs[14537]:   tdb(/var/cache/samba/gencache.tdb): tdb_lock failed on list 21 ltype=1 (Bad file descriptor)
May 27 05:06:05 sbr-qse1-3 mount.smbfs[14537]: [2007/05/27 05:06:05, 0] tdb/tdbutil.c:tdb_log(725)
May 27 05:06:05 sbr-qse1-3 mount.smbfs[14537]:   tdb(/var/cache/samba/gencache.tdb): tdb_lock failed on list 21 ltype=1 (Bad file descriptor)

Comment 1 Arenas Belon, Carlo Marcelo 2007-05-29 17:55:36 UTC
The mount.smbfs errors are being tracked in bug 241575.

Comment 2 Andrew Bartlett 2007-06-04 04:12:16 UTC
If the kernel is showing errors, such that a remount is required, then you have
bigger problems than buggy smbmount tools.

My point is that this:
May 27 04:51:28 sbr-qse1-3 kernel: smb_get_length: Invalid NBT packet, code=6c

should never happen.  If it does, then the integrity of the data on the
connection is in question, and the kernel-level issue needs to be addressed
regardless.

is switching to cifsfs an option? 

Comment 3 Arenas Belon, Carlo Marcelo 2007-06-04 15:56:22 UTC
switching to cifs is currently the implemented workaround, but is not a solution
as it runs fully in kernel space and would require a system reboot if hung.

I agree that this is a bigger problem than buggy smbmount tools, which is why I
split this issue into two bugs to be addressed independently.

Comment 4 Jeff Layton 2007-06-08 11:08:18 UTC
I tried to set up a reproducer for this. It basically runs an infinite loop that
cpio's a kernel source directory that's sitting on a samba share to /dev/null.
The dir is ~2G and the memory in this box is 512M, so pagecache should be
getting completely flushed out each run and smbiod should be nice and busy. It's
been running >12 hours and is still going.

I'm using the kernels on my people page:

http://people.redhat.com/jlayton

...and am using the following samba-client pkg:

samba-client-3.0.10-1.4E.11

I did see a few messages in the ring buffer:

smbiod_handle_request: smbiod got a request ... and we don't implement oplocks!
smb_get_length: Invalid NBT packet, code=80
smb_add_request: request [ffffff801b7a8500, mid=17180] timed out!
smb_proc_readdir_long: error=-5, breaking
smb_get_length: Invalid NBT packet, code=7c
smb_add_request: request [ffffff8008a29e00, mid=54912] timed out!
smb_proc_readdir_long: error=-5, breaking

..but smbiod seems to still be moving along.

Could you test the kernels on my people page and see if this problem is already
resolved for you in them? If it's not then could you provide some details of how
you're reproducing this?


Comment 5 Arenas Belon, Carlo Marcelo 2007-06-08 18:27:59 UTC
Sorry about that. I couldn't reproduce it either when using /dev/null as the
sink; it reproduces when writing to a real file on the local disk. For some
reason, with /dev/null everything seems to run from cache (even though it
shouldn't fit, since we have 6GB of memory there and the data set is bigger
than that). I probably should have mentioned that.

So my setup for reproducing this has, on the server (sbr-qse1-1), a share that
is authenticated against a local account:

# useradd foo
# passwd foo
New UNIX password: bar
BAD PASSWORD: it is WAY too short
Retype new UNIX password: bar
passwd: all authentication tokens updated successfully.

# smbpasswd -a foo
New SMB password: bar
Retype new SMB password: bar
Added user foo.

and a share in samba (/etc/samba/smb.conf) that exports it for that user as:

[foo]
    path = /home/foo
    valid users = foo
    public = no
    writeable = yes

On the client side I am mounting that share by doing something like:

# mount -t smbfs //sbr-qse1-1.shopping.com/foo -o username=foo,password=bar /mnt

and 3 terminals, each just running:

# while true; do tar -cvf /var/tmp/mnt1.tar /mnt; done
# while true; do tar -cvf /var/tmp/mnt2.tar /mnt; done
# while true; do tar -cvf /var/tmp/mnt3.tar /mnt; done

Comment 6 Jeff Layton 2007-06-08 19:54:30 UTC
Ok, I'll look at this when I get an opportunity. I'm not sure about this
statement, however:

> switching to cifs is currently the implemented workaround, but is not a
> solution as it runs fully in kernel space and would require a system reboot
> if hung.

That seems like faulty reasoning to me. Given that smbfs has a significant
kernel component, you may *still* be rebooting if there are problems. And I believe
that problems are much more likely with smbfs. The design of smbfs is just not
as robust as cifs.

The CIFS code is in much better shape than the smbfs code both upstream and in
RHEL. So much so that smbfs is disabled in RHEL5.


Comment 7 Arenas Belon, Carlo Marcelo 2007-06-14 07:13:35 UTC
tested with 2.6.9-55.4.EL.jtltest.9smp and was still able to reproduce the problem.

[root@sbr-qse1-3 log]# ls -al /mnt
ls: /mnt: Input/output error
[root@sbr-qse1-3 log]# ps faux | head -1 && ps faux | grep smb | grep -v grep
USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root      3296  0.0  0.0  8760 1708 ?        Ss   Jun12   0:00 /sbin/mount.smbfs //sbr-qse1-1.shopping.com/foo /mnt/ -o rw username foo password XXXXX
root      3298 96.5  0.0     0    0 ?        R    Jun12 2700:24 [smbiod]

which leaves an smbiod kthread in a busy loop until the mountpoint is remounted:

[root@sbr-qse1-3 log]# umount /mnt
[root@sbr-qse1-3 log]# lsmod | grep smbfs
smbfs                  63417  0 
[root@sbr-qse1-3 log]# mount -t smbfs -o username=foo,password=bar //sbr-qse1-1.shopping.com/foo /mnt
[root@sbr-qse1-3 log]# ls -al /mnt/ | wc -l
21

Comment 8 Arenas Belon, Carlo Marcelo 2007-06-14 08:52:21 UTC
(In reply to comment #6)
> > switching to cifs is currently the implemented workaround, but is not a
> > solution as it runs fully in kernel space and would require a system reboot
> > if hung.
> 
> That seems like faulty reasoning to me. Given that smbfs has significant kernel
> component, you may *still* be rebooting if there are problems.

When I mentioned that cifs runs fully in kernel space, I didn't mean to imply
that smbfs doesn't have a kernel component as well, but that even though it has
one, it still relies on a user-level helper for managing the mount point and
the connection to the server.

So in case of a lockup the share can be unmounted, the userspace helper can be
killed if hung, and the kernel hooks removed, which allows the kernel part to
be restarted as well by removing the module and loading it back, all without a
reboot.

Of course, if the failure is in smbfs itself, the kernel will crash and require
a reboot, but at least with the version we currently use in production (from
2.4, without smbiod) we have so far been lucky enough not to hit a bug like
that.

> And I believe
> that problems are much more likely with smbfs. The design of smbfs is just not
> as robust as cifs.

Agreed, but since we limit our use of smbfs to very basic operations, which are
hopefully mature, the design deficiencies of smbfs are not that big a concern,
and the limited functionality also means reduced code complexity and size
(about 1/4 of cifs, looking at the 2.6.20 sources).

In our tests with smbfs (from 2.6, with smbiod) we had only one occurrence of
an smbfs-related crash, and it happened while doing writes, in smbiod, due to a
kernel spinlock bug. With cifs we had more serious crashes and data corruption
concerns.

Comment 9 Jeff Layton 2007-06-19 17:03:40 UTC
I'm trying again to reproduce this using a method closer to what you're using,
but still no luck so far.

In the meantime, I'm looking over this code. smb_get_length also does some
sanity checking of the packets by checking the session packet type. From RFC 1002:

   Session Packet Types (in hexadecimal):

           00 -  SESSION MESSAGE
           81 -  SESSION REQUEST
           82 -  POSITIVE SESSION RESPONSE
           83 -  NEGATIVE SESSION RESPONSE
           84 -  RETARGET SESSION RESPONSE
           85 -  SESSION KEEP ALIVE

...the fact that we're getting occasional packets that don't look like this
leads me to think that this has some sort of alignment issue.
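
As an aside, here is a minimal userspace sketch of the kind of check being
described. This is an illustration only, not the smbfs smb_get_length() code;
the helper names are invented for this example, and the packet-type values and
17-bit length encoding come from RFC 1002:

#include <stdint.h>
#include <stdio.h>

/*
 * Illustration only -- not the smbfs implementation.  An RFC 1002 session
 * packet starts with a 4-byte header: TYPE, FLAGS (whose low bit is a
 * length-extension bit), and a 16-bit LENGTH.  A type byte outside the
 * values listed above means the byte stream is misaligned or corrupt,
 * which is what the "Invalid NBT packet, code=XX" messages report.
 */
static int nbt_type_is_valid(const uint8_t *hdr)
{
        switch (hdr[0]) {
        case 0x00:      /* session message */
        case 0x81:      /* session request */
        case 0x82:      /* positive session response */
        case 0x83:      /* negative session response */
        case 0x84:      /* retarget session response */
        case 0x85:      /* session keep alive */
                return 1;
        default:
                return 0;
        }
}

/* 17-bit payload length: the extension bit from FLAGS plus the LENGTH field. */
static uint32_t nbt_length(const uint8_t *hdr)
{
        return ((uint32_t)(hdr[1] & 0x01) << 16) | (hdr[2] << 8) | hdr[3];
}

int main(void)
{
        const uint8_t good[4] = { 0x00, 0x00, 0x0d, 0x03 }; /* 3331-byte session message */
        const uint8_t bad[4]  = { 0x6c, 0x00, 0x00, 0x00 }; /* "code=6c" from the report */

        printf("good: valid=%d length=%u\n", nbt_type_is_valid(good), nbt_length(good));
        printf("bad:  valid=%d\n", nbt_type_is_valid(bad));
        return 0;
}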


Comment 10 Jeff Layton 2007-06-19 19:07:18 UTC
Went through git and pulled out pretty much any smbfs-related patch that had
been committed since RHEL4's release that looked like it might be related.
Upstream development on smbfs has been pretty much dead for a long while now,
so there weren't many:

old-2.6-bkcvs:
b189c38d82770a5b281233babbf318caac09ad74

linux-2.6:
caf736085f2f0d22a992a855d9caae14973f7ea4
ac34dd052400b1e63aa8e711a13c0670943296fd
54b21a7992a31d30c9a91f7e0a00ffdb4bd0caee
53b27584db0446e356c86b3aa43f5107b0170704
ccb6e363a68144cdfdaa6d964d63d620c8ac9a9b

...I applied these and still got occasional errors like:

smb_get_length: Invalid NBT packet, code=fb

...though I don't get the situation where smbiod starts eating CPU time. I take
this to mean that either this is still an upstream bug, or it is due to a
non-smbfs-specific problem (that may or may not have been fixed upstream).

When I get time, I'll take a crack at building a smbfs kernel module for f7 and
seeing if I can reproduce it there.


Comment 11 Jeff Layton 2007-06-19 19:34:54 UTC
Added a dump_stack() to smb_add_request just after the "timed out!" message
(still working with RHEL4 here):

smb_get_length: Invalid NBT packet, code=5a
smb_add_request: request [ffffff8007d7fc80, mid=11925] timed out!

Call Trace:
 <ffffffffa01b3060>{:smbfs:smb_add_request+1117}
 <ffffffff8012daec>{autoremove_wake_function+0}
 <ffffffff8012daec>{autoremove_wake_function+0}
 <ffffffffa01abae1>{:smbfs:smb_request_ok+44}
 <ffffffffa01ac886>{:smbfs:smb_proc_readX+220}
 <ffffffffa01b1a0b>{:smbfs:smb_readpage+208}
 <ffffffff8015a204>{read_pages+177}
 <ffffffff8015749c>{__alloc_pages+200}
 <ffffffff8015a5a2>{do_page_cache_readahead+345}
 <ffffffff8015a7ad>{page_cache_readahead+478}
 <ffffffff80153ab6>{do_generic_mapping_read+295}
 <ffffffff80153e9b>{file_read_actor+0}
 <ffffffff80155d3b>{__generic_file_aio_read+382}
 <ffffffff80155ee7>{generic_file_read+187}
 <ffffffff80174eab>{do_sync_write+178}
 <ffffffff8012daec>{autoremove_wake_function+0}
 <ffffffffa01b0e06>{:smbfs:smb_revalidate_inode+172}
 <ffffffffa01b1cf7>{:smbfs:smb_file_read+99}
 <ffffffff80174dc4>{vfs_read+207}
 <ffffffff80175020>{sys_read+69}
 <ffffffff8010d66e>{system_call+134}
 <ffffffff8010d5e8>{system_call+0}


...in almost all cases, it seems to happen on a read call, and the stack traces
are generally very similar. So it seems like we see this Invalid NBT packet
message, and then queueing the job to smbiod times out. It also seems to
generally happen on read calls (though maybe that's an artifact of the fact that
most of what I'm doing here are read calls).





Comment 12 Jeff Layton 2007-06-19 20:08:44 UTC
Tested this on a f7 xen kernel (2.6.20-2925.9.fc7xen):

smb_get_length: Invalid NBT packet, code=9
smb_proc_readX_data: offset is larger than SMB_READX_MAX_PAD or negative!
smb_proc_readX_data: -35 > 64 || -35 < 0
smb_add_request: request [ffff88001f5a6800, mid=60595] timed out!
smb_add_request: request [ffff880017c3cc80, mid=60596] timed out!
smb_add_request: request [ffff88001f5a6980, mid=60597] timed out!
smb_get_length: Invalid NBT packet, code=69
smb_add_request: request [ffff88001f5a6680, mid=35822] timed out!
smb_add_request: request [ffff88001f5a6b00, mid=39974] timed out!
smb_get_length: Invalid NBT packet, code=74

...so this appears to be a long standing upstream bug. It did throw some other
messages that I haven't seen before (the SMB_READX_MAX_PAD stuff). So maybe
that's related...


Comment 13 Jeff Layton 2007-06-19 20:11:18 UTC
...again though, I've yet to reproduce this situation where smbiod starts eating
CPU time. I'm presuming that the "Invalid NBT packet" messages are related to
that problem and am going to see if I can track down the cause of that and the
smb_add_request timeouts.


Comment 26 Jeff Layton 2007-10-24 14:34:26 UTC
Created attachment 236241 [details]
set of upstream patches that may or may not help this

I went through the upstream git trees and pulled out some smbfs patches that
looked like they might affect this. Most of them are probably patches we want
anyway, so I may try to queue some of these up for 4.7.

I'm testing a kernel with these patches now to see if I can get it to fail...

Comment 28 Jeff Layton 2007-10-26 15:12:27 UTC
Since we don't have a way to reproduce these hangs, I'll focus for now on these
messages:

smb_get_length: Invalid NBT packet, code=3d

...these packets look like garbage, which suggests to me that in certain cases
the prior request reads either too much or too little data off the socket
before the next request starts. The next request then tries to read the length,
but the read is not aligned with the head of the response and the data looks
like garbage.

I added a bit of debug code to smb_request_recv, looking for cases where we're
starting a new response (SMB_RECV_START) but smb_len (which should represent the
total packet length) is not the same as smb_read (which should be how much data
was read from the socket). This showed up in the ring buffer:

mismatch -- smb_read=4159 smb_len=3341
smb_get_length: Invalid NBT packet, code=3d

There's a difference of 818 bytes here.


Comment 29 Jeff Layton 2007-10-26 17:54:53 UTC
Again, focusing on the bad packet alignment for now. The problem seems to be
that smb_receive occasionally calls kernel_recvmsg with a larger size parameter
than smb_len-smb_read.

I added this just before the "out:" label in smb_receive:

        if (server->smb_read > server->smb_len)
                PARANOIA("smb_read=%d smb_len(req->rq_header)=%d smb_len=%d "
                         "req->rq_rlen=%d req->rq_bytes_recvd=%d rlen=%d "
                         "result=%d state=%d\n",
                         server->smb_read, smb_len(req->rq_header),
                         server->smb_len, req->rq_rlen, req->rq_bytes_recvd,
                         rlen, result, server->rstate);


One such printk looked like this:

smb_receive: smb_read=1006 smb_len(req->rq_header)=963 smb_len=967 req->rq_rlen=904 req->rq_bytes_recvd=943 rlen=904 result=58 state=4

This is after smb_read and rq_bytes_recvd were adjusted, so on entry to the
function:

server->smb_read = 948
req->rq_bytes_recvd = 885

On entry into the function, there should have been only 19 bytes (967-948) left
in this response, but it calls kernel_recvmsg with a size of 904 (rlen, which is
set from rq_rlen).

rq_rlen seems to almost always be set by this formula:

   req->rq_rlen = smb_len(req->rq_header) + 4 - req->rq_bytes_recvd;

...so in order to get a rq_rlen of 904 from a request that has a smb_len of 963,
req->rq_bytes_recvd had to have been 63 at the time it was set
(963 + 4 - 63 = 904). So apparently we're occasionally adjusting rq_bytes_recvd
and then attempting to use rq_rlen without recalculating it.


Comment 30 Jeff Layton 2007-10-26 18:15:34 UTC
rq_rlen is only recalculated in the SMB_RECV_PARAM phase. So I think I can see a
possible scenario for these errors:

smbiod calls smb_recv_param() for the req and then calls smb_recv_data(). We
don't receive the entire response (req->rq_bytes_recvd < req->rq_rlen), but we
get some of it. smb_recv_data exits early, and we go back into smb_request_recv
with server->rstate still set to SMB_RECV_DATA. rq_rlen is still set to the
original value, however, so when we call smb_receive again, the size passed to
kernel_recvmsg is larger than it should be.

I'm going to test a patch that adjusts rq_rlen along with rq_bytes_recvd in
smb_receive.
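
To make that scenario concrete, here is a small userspace walk-through of the
numbers captured in comment #29. It only illustrates the arithmetic and is not
smbfs code; the variable names simply mirror the fields discussed above:

#include <stdio.h>

int main(void)
{
        int smb_len_hdr    = 963;              /* smb_len(req->rq_header)                    */
        int total          = smb_len_hdr + 4;  /* full response including the 4-byte NBT hdr */
        int rq_rlen        = total - 63;       /* computed while rq_bytes_recvd was still 63 */
        int smb_read       = 948;              /* already read for this response on entry    */
        int result         = 58;               /* what kernel_recvmsg() returned that pass   */

        int stale_ask  = rq_rlen;              /* size handed to kernel_recvmsg()      */
        int remaining  = total - smb_read;     /* what is really left in this response */

        printf("asked of the socket (stale rq_rlen):   %d\n", stale_ask);           /* 904  */
        printf("actually remaining in this response:   %d\n", remaining);           /* 19   */
        printf("bytes consumed from the next response: %d\n", result - remaining);  /* 39   */
        printf("smb_read after the call:               %d\n", smb_read + result);   /* 1006 */

        /*
         * Those extra 39 bytes belong to the next response on the socket, so the
         * next smb_get_length() starts mid-packet and reports "Invalid NBT packet".
         */
        return 0;
}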


Comment 31 Jeff Layton 2007-10-26 20:41:43 UTC
Created attachment 239561 [details]
patch -- calculate rq_rlen locally in smb_receive

I think this patch does the right thing. Changing req->rq_rlen didn't work.
There are other routines that expect that it won't change for the life of the
request. Since rq_bytes_recvd is set to 0 at the end of smb_receive_param,
this should be safe. If the request takes more than one pass through
smb_recv_data, then the amount of data already read is now taken into account.

I've not been able to reproduce any "Invalid NBT Packet" messages with this
patch in place.
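
The actual change is the attachment above. As a rough sketch of the idea only
(assumed helper name, not the real patch), computing the length for each pass
locally amounts to something like:

/*
 * Sketch of the approach, not the attached patch.  Because rq_bytes_recvd
 * is reset to 0 at the end of the param phase, during the data phase it
 * counts only data bytes already read, so subtracting it from rq_rlen gives
 * what is still outstanding for this request.
 */
static int smb_recv_len_this_pass(int rq_rlen, int rq_bytes_recvd)
{
        int rlen = rq_rlen - rq_bytes_recvd;

        return rlen > 0 ? rlen : 0;
}

With the numbers from comment #29, that gives 904 - 885 = 19, matching the 19
bytes that were actually left in the response.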

Comment 34 Jeff Layton 2007-10-27 13:21:52 UTC
I've built some kernels that contain the patches in comment #26 and comment #31
and put them on my people page:

http://people.redhat.com/jlayton/

Please test them and let me know whether this problem is still reproducible.


Comment 35 Jeff Layton 2007-10-31 13:43:29 UTC
Posted a couple of smbfs patches upstream today: one is a variant of the patch
in comment #31, and the other fixes up smbfs debug builds. Awaiting review
there at this point...



Comment 36 Jeff Layton 2007-11-06 12:17:25 UTC
Created attachment 249001 [details]
patch -- calculate rq_rlen locally in smb_receive

This patch is a bit cleaner, but is functionally the same.

Guenter Kukkukk reviewed and acked this patch. I've gone ahead and pushed it
upstream to Andrew. If Andrew takes it in -mm, I'll plan to propose this for
4.7.

Comment 37 Jeff Layton 2007-11-09 13:19:20 UTC
This patch has been accepted into -mm, I'll probably plan to roll up a set of
RHEL4 smbfs patches and post them internally for review.


Comment 38 Jeff Layton 2007-11-13 11:20:02 UTC
*** Bug 373171 has been marked as a duplicate of this bug. ***

Comment 39 Jeff Layton 2007-11-28 15:45:33 UTC
I've folded this patch into a larger patchset that I've posted internally for
review. Closing this as a duplicate of the bug tracking the larger set.

Please reopen this bug if the customer reports that this patch doesn't fix their
issue.


*** This bug has been marked as a duplicate of 355141 ***

