Description of problem:
While reading several GB of data through an smbfs mount point (several tar -cvf runs will do it), a message like the following is printed to the client's kernel ring buffer:

May 27 04:51:28 sbr-qse1-3 kernel: smb_get_length: Invalid NBT packet, code=6c

after which smbiod's CPU usage spikes and the mount point becomes inaccessible, as shown below:

$ ls -al /net/idx
ls: /net/idx: Input/output error

Version-Release number of selected component (if applicable):
kernel-2.6.9-55.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Export an authenticated share using samba (writeable = yes, public = no).
2. Mount the share on the client using smbfs.
3. Run several processes that just read data from that mount point until the deadlock is triggered.

Actual results:
$ ls -al /net/idx
ls: /net/idx: Input/output error

Expected results:
No hangs that require manual intervention to recover.

Additional info:
Similar to the issue shown in BUG175430, except that (at least on i386) the problem is recoverable by stopping all processes using the mount point and remounting it. BUG177523 is probably also related, as it is triggered by read operations as well, but again no panic is observed here.

I suspect there may also be a problem with the mount.smbfs process, as the server shows the connection as closed while the client logs the following kernel messages:

May 27 05:09:27 sbr-qse1-3 kernel: smb_add_request: request [ed5bb380, mid=56991] timed out!
May 27 05:09:27 sbr-qse1-3 kernel: smb_add_request: request [ed5bba80, mid=56992] timed out!
May 27 05:10:57 sbr-qse1-3 kernel: smb_lookup: find Engines.200705250237/3 failed, error=-5

There are also errors from mount.smbfs having problems with its locking DB, as shown by:

May 27 05:06:05 sbr-qse1-3 mount.smbfs[14537]: tdb(/var/cache/samba/gencache.tdb): tdb_lock failed on list 21 ltype=1 (Bad file descriptor)
May 27 05:06:05 sbr-qse1-3 mount.smbfs[14537]: [2007/05/27 05:06:05, 0] tdb/tdbutil.c:tdb_log(725)
May 27 05:06:05 sbr-qse1-3 mount.smbfs[14537]: tdb(/var/cache/samba/gencache.tdb): tdb_lock failed on list 21 ltype=1 (Bad file descriptor)
The mount.smbfs errors are being tracked in BUG241575.
If the kernel is showing errors such that a remount is required, then you have bigger problems than buggy smbmount tools. My point is that this:

May 27 04:51:28 sbr-qse1-3 kernel: smb_get_length: Invalid NBT packet, code=6c

should never happen. If it does, then the integrity of the data on the connection is in question, and the kernel-level issue needs to be addressed regardless.

Is switching to cifsfs an option?
Switching to cifs is currently the implemented workaround, but it is not a solution, as it runs fully in kernel space and would require a system reboot if hung. I agree about this being a bigger problem than buggy smbmount tools, which is why I split this issue into two bugs to be addressed independently.
I tried to set up a reproducer for this. It basically runs an infinite loop that cpio's a kernel source directory sitting on a samba share to /dev/null. The directory is ~2G and the memory in this box is 512M, so the pagecache should be getting completely flushed out on each run and smbiod should be nice and busy. It's been running for more than 12 hours and is still going.

I'm using the kernels on my people page:

http://people.redhat.com/jlayton

...and am using the following samba-client pkg:

samba-client-3.0.10-1.4E.11

I did see a few messages in the ring buffer:

smbiod_handle_request: smbiod got a request ... and we don't implement oplocks!
smb_get_length: Invalid NBT packet, code=80
smb_add_request: request [ffffff801b7a8500, mid=17180] timed out!
smb_proc_readdir_long: error=-5, breaking
smb_get_length: Invalid NBT packet, code=7c
smb_add_request: request [ffffff8008a29e00, mid=54912] timed out!
smb_proc_readdir_long: error=-5, breaking

...but smbiod seems to still be moving along.

Could you test the kernels on my people page and see if this problem is already resolved for you in them? If it's not, could you provide some details of how you're reproducing this?
Sorry about that. I couldn't reproduce it either when using /dev/null as the sink, only when writing to a real file on the local disk. For some reason, with /dev/null everything seems to be served from cache (even though it shouldn't fit, since the box has 6GB of memory and the data set is bigger than that); I should have mentioned that.

So my setup for reproducing this is: on the server (sbr-qse1-1), a share authenticated against a local account:

# useradd foo
# passwd foo
New UNIX password: bar
BAD PASSWORD: it is WAY too short
Retype new UNIX password: bar
passwd: all authentication tokens updated successfully.
# smbpasswd -a foo
New SMB password: bar
Retype new SMB password: bar
Added user foo.

and a samba share (/etc/samba/smb.conf) that exports it for that user:

[foo]
    path = /home/foo
    valid users = foo
    public = no
    writeable = yes

On the client side I mount that share with something like:

# mount -t smbfs //sbr-qse1-1.shopping.com/foo -o username=foo,password=bar /mnt

and then have 3 terminals just running:

# while true; do tar -cvf /var/tmp/mnt1.tar /mnt; done
# while true; do tar -cvf /var/tmp/mnt2.tar /mnt; done
# while true; do tar -cvf /var/tmp/mnt3.tar /mnt; done
Ok, I'll look at this when I get an opportunity. I'm not sure about this statement, however:

> Switching to cifs is currently the implemented workaround, but it is not a
> solution, as it runs fully in kernel space and would require a system reboot
> if hung.

That seems like faulty reasoning to me. Given that smbfs has a significant kernel component, you may *still* be rebooting if there are problems. And I believe that problems are much more likely with smbfs. The design of smbfs is just not as robust as cifs. The CIFS code is in much better shape than the smbfs code, both upstream and in RHEL; so much so that smbfs is disabled in RHEL5.
Tested with 2.6.9-55.4.EL.jtltest.9smp and was still able to reproduce the problem:

[root@sbr-qse1-3 log]# ls -al /mnt
ls: /mnt: Input/output error
[root@sbr-qse1-3 log]# ps faux | head -1 && ps faux | grep smb | grep -v grep
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 3296 0.0 0.0 8760 1708 ? Ss Jun12 0:00 /sbin/mount.smbfs //sbr-qse1-1.shopping.com/foo /mnt/ -o rw username foo password XXXXX
root 3298 96.5 0.0 0 0 ? R Jun12 2700:24 [smbiod]

This leaves the smbiod kthread in a busy loop until the mount point is remounted:

[root@sbr-qse1-3 log]# umount /mnt
[root@sbr-qse1-3 log]# lsmod | grep smbfs
smbfs 63417 0
[root@sbr-qse1-3 log]# mount -t smbfs -o username=foo,password=bar //sbr-qse1-1.shopping.com/foo /mnt
[root@sbr-qse1-3 log]# ls -al /mnt/ | wc -l
21
(In reply to comment #6)
> > Switching to cifs is currently the implemented workaround, but it is not a
> > solution, as it runs fully in kernel space and would require a system reboot
> > if hung.
>
> That seems like faulty reasoning to me. Given that smbfs has a significant
> kernel component, you may *still* be rebooting if there are problems.

When I mentioned that cifs runs fully in kernel space, I didn't mean to imply that smbfs doesn't have a kernel component as well, but rather that even though it has one, it still relies on a user-level helper to manage the mount point and the connection to the server. So in case of a lockup the share can be unmounted, the userspace helper can be killed if it is hung, and the kernel hooks removed, which then allows the kernel part to be restarted as well by removing the module and loading it back. All without a reboot.

Of course, if the failure is in smbfs itself, the kernel will crash and require a reboot, but at least on the version we currently use in production (from 2.4, without smbiod) we have so far been lucky enough not to hit a bug like that.

> And I believe that problems are much more likely with smbfs. The design of
> smbfs is just not as robust as cifs.

Agreed, but since we limit our use of smbfs to very basic operations, which are hopefully mature, the design deficiencies of smbfs are not that big a concern, and the limited functionality also means a reduction in code complexity and size (about 1/4 of cifs, looking at the 2.6.20 sources). In our tests with smbfs (from 2.6, with smbiod) we had only one occasion of an smbfs-related crash, and that was while doing writes, in smbiod, due to a kernel spinlock bug. With cifs we had more serious crashes and data corruption concerns.
I'm trying again to reproduce this using a method closer to what you're using, but still no luck so far. In the meantime, I'm looking over this code.

smb_get_length also does some sanity checking of the packets by checking the session packet type. From RFC 1002, the session packet types (in hexadecimal) are:

00 - SESSION MESSAGE
81 - SESSION REQUEST
82 - POSITIVE SESSION RESPONSE
83 - NEGATIVE SESSION RESPONSE
84 - RETARGET SESSION RESPONSE
85 - SESSION KEEP ALIVE

The fact that we're getting occasional packets that don't look like this leads me to think that we have some sort of alignment issue.
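For illustration, the check boils down to validating the first byte of the 4-byte NBT session header against that table. Here's a minimal standalone model of it (this is not the smbfs code; nbt_type_valid() is just a made-up name for the sake of the example):

#include <stdio.h>

/* Illustration only: valid NBT session packet types per RFC 1002.
 * Not the smbfs code; nbt_type_valid() is a made-up helper. */
static int nbt_type_valid(unsigned char type)
{
        switch (type) {
        case 0x00:      /* SESSION MESSAGE */
        case 0x81:      /* SESSION REQUEST */
        case 0x82:      /* POSITIVE SESSION RESPONSE */
        case 0x83:      /* NEGATIVE SESSION RESPONSE */
        case 0x84:      /* RETARGET SESSION RESPONSE */
        case 0x85:      /* SESSION KEEP ALIVE */
                return 1;
        default:
                return 0;
        }
}

int main(void)
{
        /* the codes from the log messages above are nowhere near the table */
        printf("0x6c valid? %d\n", nbt_type_valid(0x6c));
        printf("0x80 valid? %d\n", nbt_type_valid(0x80));
        return 0;
}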
Went through git and pulled out pretty much every smbfs-related patch committed since RHEL4's release that looked like it might be related. Upstream development on smbfs has been pretty much dead for a long while now, so there weren't many:

old-2.6-bkcvs:
b189c38d82770a5b281233babbf318caac09ad74

linux-2.6:
caf736085f2f0d22a992a855d9caae14973f7ea4
ac34dd052400b1e63aa8e711a13c0670943296fd
54b21a7992a31d30c9a91f7e0a00ffdb4bd0caee
53b27584db0446e356c86b3aa43f5107b0170704
ccb6e363a68144cdfdaa6d964d63d620c8ac9a9b

I applied these and still got occasional errors like:

smb_get_length: Invalid NBT packet, code=fb

...though I don't get the situation where smbiod starts eating CPU time. I take this to mean either that this is still an upstream bug, or that it's due to a non-smbfs-specific problem (which may or may not have been fixed upstream). When I get time, I'll take a crack at building an smbfs kernel module for f7 and seeing if I can reproduce it there.
Added a dump_stack() to smb_add_request just after the "timed out!" message (still working with RHEL4 here):

smb_get_length: Invalid NBT packet, code=5a
smb_add_request: request [ffffff8007d7fc80, mid=11925] timed out!

Call Trace:<ffffffffa01b3060>{:smbfs:smb_add_request+1117}
       <ffffffff8012daec>{autoremove_wake_function+0}
       <ffffffff8012daec>{autoremove_wake_function+0}
       <ffffffffa01abae1>{:smbfs:smb_request_ok+44}
       <ffffffffa01ac886>{:smbfs:smb_proc_readX+220}
       <ffffffffa01b1a0b>{:smbfs:smb_readpage+208}
       <ffffffff8015a204>{read_pages+177}
       <ffffffff8015749c>{__alloc_pages+200}
       <ffffffff8015a5a2>{do_page_cache_readahead+345}
       <ffffffff8015a7ad>{page_cache_readahead+478}
       <ffffffff80153ab6>{do_generic_mapping_read+295}
       <ffffffff80153e9b>{file_read_actor+0}
       <ffffffff80155d3b>{__generic_file_aio_read+382}
       <ffffffff80155ee7>{generic_file_read+187}
       <ffffffff80174eab>{do_sync_write+178}
       <ffffffff8012daec>{autoremove_wake_function+0}
       <ffffffffa01b0e06>{:smbfs:smb_revalidate_inode+172}
       <ffffffffa01b1cf7>{:smbfs:smb_file_read+99}
       <ffffffff80174dc4>{vfs_read+207}
       <ffffffff80175020>{sys_read+69}
       <ffffffff8010d66e>{system_call+134}
       <ffffffff8010d5e8>{system_call+0}

...in almost all cases it happens on a read call, and the stack traces are generally very similar. So it seems like we see this "Invalid NBT packet" message, and then queueing the job to smbiod times out. The fact that it generally happens on reads may just be an artifact of most of what I'm doing here being read calls.
Tested this on an f7 xen kernel (2.6.20-2925.9.fc7xen):

smb_get_length: Invalid NBT packet, code=9
smb_proc_readX_data: offset is larger than SMB_READX_MAX_PAD or negative!
smb_proc_readX_data: -35 > 64 || -35 < 0
smb_add_request: request [ffff88001f5a6800, mid=60595] timed out!
smb_add_request: request [ffff880017c3cc80, mid=60596] timed out!
smb_add_request: request [ffff88001f5a6980, mid=60597] timed out!
smb_get_length: Invalid NBT packet, code=69
smb_add_request: request [ffff88001f5a6680, mid=35822] timed out!
smb_add_request: request [ffff88001f5a6b00, mid=39974] timed out!
smb_get_length: Invalid NBT packet, code=74

...so this appears to be a long-standing upstream bug. It did throw some other messages that I haven't seen before (the SMB_READX_MAX_PAD stuff). So maybe that's related...
...again though, I've yet to reproduce this situation where smbiod starts eating CPU time. I'm presuming that the "Invalid NBT packet" messages are related to that problem and am going to see if I can track down the cause of that and the smb_add_request timeouts.
Created attachment 236241 [details]
set of upstream patches that may or may not help this

I went through the upstream git trees and pulled out some smbfs patches that looked like they might affect this. Most of them are probably patches we want anyway, so I may try to queue some of these up for 4.7. I'm testing a kernel with these patches now to see if I can get it to fail...
Since we don't have a way to reproduce the hangs themselves, I'll focus for now on these messages:

smb_get_length: Invalid NBT packet, code=3d

Since these packets look like garbage, that suggests to me that in certain cases the prior request is reading either too much or too little data off the socket before the next request starts. The next request comes in and tries to read the length, but the read is not aligned with the head of the response, so the data looks like garbage.

I added a bit of debug code to smb_request_recv, looking for cases where we're starting a new response (SMB_RECV_START) but smb_len (which should represent the total packet length) is not the same as smb_read (which should be how much data was read from the socket). This showed up in the ring buffer:

mismatch -- smb_read=4159 smb_len=3341
smb_get_length: Invalid NBT packet, code=3d

There's a difference of 818 bytes here.
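For the record, the debug check amounted to roughly the following (a sketch of the idea, not the exact hunk I tested; it assumes the server->rstate/smb_read/smb_len fields already discussed above):

/* Sketch only, not the exact debug hunk: at the point where
 * smb_request_recv() is about to start a new response (SMB_RECV_START),
 * everything from the previous one should already have been consumed,
 * i.e. smb_read should equal smb_len. */
if (server->rstate == SMB_RECV_START &&
    server->smb_read != server->smb_len)
        printk(KERN_DEBUG "mismatch -- smb_read=%u smb_len=%u\n",
               (unsigned int)server->smb_read,
               (unsigned int)server->smb_len);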
Again, focusing on the bad packet alignment for now. The problem seems to be that smb_receive occasionally calls kernel_recvmsg with a larger size parameter than smb_len - smb_read. I added this just before the "out:" label in smb_receive:

if (server->smb_read > server->smb_len)
        PARANOIA("smb_read=%d smb_len(req->rq_header)=%d smb_len=%d req->rq_rlen=%d req->rq_bytes_recvd=%d rlen=%d result=%d state=%d\n",
                 server->smb_read, smb_len(req->rq_header), server->smb_len,
                 req->rq_rlen, req->rq_bytes_recvd, rlen, result, server->rstate);

One such printk looked like this:

smb_receive: smb_read=1006 smb_len(req->rq_header)=963 smb_len=967 req->rq_rlen=904 req->rq_bytes_recvd=943 rlen=904 result=58 state=4

This is after smb_read and rq_bytes_recvd were adjusted, so on entry to the function:

server->smb_read = 948
req->rq_bytes_recvd = 885

On entry to the function there should have been only 19 bytes (967 - 948) left in this response, but smb_receive called kernel_recvmsg with a size of 904 (rlen, which is set from rq_rlen). rq_rlen seems to almost always be set by this formula:

req->rq_rlen = smb_len(req->rq_header) + 4 - req->rq_bytes_recvd;

...so in order to get an rq_rlen of 904 from a request that has an smb_len of 963, req->rq_bytes_recvd must have been 63 at the time it was set. So apparently we're occasionally adjusting rq_bytes_recvd and then attempting to use rq_rlen without recalculating it.
rq_rlen is only recalculated in the SMB_RECV_PARAM phase, so I think I can see a possible scenario for these errors:

smbiod calls smb_recv_param() for the request and then calls smb_recv_data(). We don't receive the entire response (req->rq_bytes_recvd < req->rq_rlen), but we get some of it. smb_recv_data() exits early, and we go back into smb_request_recv() with server->rstate still set to SMB_RECV_DATA. rq_rlen is still set to the original value, however, so when we call smb_receive() again, the size passed to kernel_recvmsg() is larger than it should be.

I'm going to test a patch that adjusts rq_rlen along with rq_bytes_recvd in smb_receive.
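To sanity-check the arithmetic, here's a standalone toy model of that sequence using the numbers from the PARANOIA output in the previous comment (illustration only, not kernel code):

#include <stdio.h>

/* Toy model of the stale-length accounting, using the numbers from the
 * PARANOIA output above. Not the kernel code; just the arithmetic. */
int main(void)
{
        int smb_len_hdr = 963;             /* smb_len(req->rq_header)          */
        int wire_total  = smb_len_hdr + 4; /* SMB payload + 4-byte NBT header  */

        /* rq_rlen was computed in the SMB_RECV_PARAM phase, when only 63
         * bytes of the response had been received... */
        int rq_bytes_recvd_then = 63;
        int rq_rlen = smb_len_hdr + 4 - rq_bytes_recvd_then;    /* = 904 */

        /* ...but by the next pass through smb_receive, most of the response
         * had already arrived: */
        int smb_read  = 948;               /* consumed from the socket so far  */
        int remaining = wire_total - smb_read;                   /* = 19  */

        printf("stale rq_rlen asks kernel_recvmsg for %d bytes\n", rq_rlen);
        printf("but only %d bytes of this response remain on the wire\n",
               remaining);
        printf("so up to %d bytes of the *next* response get swallowed,\n"
               "and its NBT header is then read from the middle of garbage\n",
               rq_rlen - remaining);
        return 0;
}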
Created attachment 239561 [details]
patch -- calculate rq_rlen locally in smb_receive

I think this patch does the right thing. Changing req->rq_rlen itself didn't work; there are other routines that expect it not to change for the life of the request. Since rq_bytes_recvd is reset to 0 at the end of smb_recv_param, this should be safe. If the request takes more than one pass through smb_recv_data, the amount of data already read is now taken into account.

I've not been able to reproduce any "Invalid NBT Packet" messages with this patch in place.
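Using the same numbers as the toy model in the previous comment, the effect of accounting for the data already received looks like this (again just an illustration; the real change lives in smb_receive):

#include <stdio.h>

/* Same numbers as the toy model above, with the fix applied: subtract
 * what this request has already received before asking the socket for
 * more. Illustration only, not the actual patch hunk. */
int main(void)
{
        int rq_rlen = 904;              /* computed in the SMB_RECV_PARAM phase */
        int rq_bytes_recvd = 885;       /* received in earlier data passes      */

        int rlen = rq_rlen - rq_bytes_recvd;    /* = 19, matching the 19 bytes
                                                   actually left on the wire   */
        printf("ask kernel_recvmsg for %d bytes instead of %d\n",
               rlen, rq_rlen);
        return 0;
}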
I've built some kernels that contain the patches in comment #26 and comment #31 and put them on my people page:

http://people.redhat.com/jlayton/

Please test them and let me know whether this problem is still reproducible.
Posted a couple of smbfs patches upstream today: one is a variant of the patch in comment #31, and the other fixes up smbfs debug builds. Awaiting review there at this point...
Created attachment 249001 [details]
patch -- calculate rq_rlen locally in smb_receive

This patch is a bit cleaner, but functionally the same. Guenter Kukkukk reviewed and acked it, so I've gone ahead and pushed it upstream to Andrew. If Andrew takes it in -mm, I'll plan to propose it for 4.7.
This patch has been accepted into -mm. I'll probably roll up a set of RHEL4 smbfs patches and post them internally for review.
*** Bug 373171 has been marked as a duplicate of this bug. ***
I've folded this patch into a larger patchset that I've posted internally for review. Closing this as a duplicate of the bug tracking the larger set. Please reopen this bug if the customer reports that this patch doesn't fix their issue.

*** This bug has been marked as a duplicate of 355141 ***