Bug 168280

Summary: 30 second pause on samba mounts
Product: Red Hat Enterprise Linux 4 Reporter: David Bestor <redbugme3210>
Component: samba Assignee: Jeff Layton <jlayton>
Status: CLOSED ERRATA QA Contact: David Lawrence <dkl>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.0 CC: anton.rops, carenas, cnd, ddomingo, kzak, petero, samba-bugs-list, ssorce, staubach, steved
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
URL: http://www.ianag.com/linux/t-466198.html
Whiteboard:
Fixed In Version: RHBA-2007-0791 Doc Type: Bug Fix
Last Closed: 2007-11-15 16:13:09 UTC Type: ---
Bug Depends On:    
Bug Blocks: 248673    
Attachments:
Description Flags
patch -- proposed by Vasily Averin in bug 234300 none

Description David Bestor 2005-09-14 13:48:17 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050910 CentOS/1.0.6-1.4.2.1ment Firefox/1.0.6

Description of problem:
I know this is probably the wrong category.

But the samba in the 4.2 beta (samba-3.0.10-1.4E.2) seems to have issues.
I included a link to a report from someone else who sees the same problem.

There is about a 30 second pause when doing samba mounts on the
clients (either via autofs or from the command line).

The messages in dmesg are:
smb_retry: no connection process
smb_add_request: request [cddafee0, mid=0] timed out!
smb_delete_inode: could not close inode 2

After about 30 seconds, the mounts are clean and everything is fine.

Went back to samba-3.0.10-1.4E on both server and clients. No issues..


Version-Release number of selected component (if applicable):
samba-3.0.10-1.4E.2

How reproducible:
Always

Steps to Reproduce:
1. create a share on server
2. mount that share on client
3. wait for 30 seconds
  

Actual Results:  30 second pause

Expected Results:  No 30 second wait

Additional info:

I am using CentOS 4.x with a rebuilt 4.2 beta samba-3.0.10-1.4E.2 rpm

I included a link from someone else seeing something similar :
http://www.ianag.com/linux/t-466198.html

Mine is a Linux client to a Linux server, with the same errors.

Comment 1 David Bestor 2005-09-14 14:54:06 UTC
New info...
server: samba-3.0.10-1.4E
client: samba-3.0.10-1.4E
  No Long pause on client mount

server: samba-3.0.10-1.4E.2
client: samba-3.0.10-1.4E.2
  Long pause on client mount

server: samba-3.0.10-1.4E
client: samba-3.0.10-1.4E.2
  Long pause on client mount

After disabling Patch20 (samba-3.0.10-bug157208.patch) by commenting out its
%patch line in the spec file:
#%patch20 -p1 -b .bug157208

server: samba-3.0.10-1.4E
client: samba-3.0.10-1.4E.2
  No Long pause on client mount

server: samba-3.0.10-1.4E.2
client: samba-3.0.10-1.4E.2
  No Long pause on client mount

So is samba-3.0.10-bug157208.patch dependent on something else?

I don't have access to bug 157208, but
if there is a dependency, it's missing from the spec file.


Comment 2 Sam Sharpe 2005-10-10 12:43:00 UTC
I am getting the same behaviour in RHEL4 U2, but against a fully patched MS
Windows Server 2003. I have pam_mount running on login to mount users'
filespace, which worked perfectly under U1, but now gives the same error as the
report above.




Comment 3 Sam Sharpe 2005-10-10 15:54:01 UTC
Weirdly, I was still experiencing the same behaviour when I removed Patch20, but
when I removed Patch17:

#Patch17: samba-3.0.12rc1-gcc4.patch

I no longer get the hangs...



Comment 4 Chris 2005-10-17 23:13:25 UTC
If you hit Ctrl-Z to suspend the mount command, then immediately type "bg" to 
background the task, the mount immediately completes successfully.

Looks like the bug is related to some kind of waiting around for a signal that 
never arrives, and the Ctrl-Z+"bg" "hack" delivers whatever it's waiting for...

# rpm -q --whatprovides mount
util-linux-2.12a-16.EL4.12


# mount -w -t smbfs //192.168.0.11/xxx /x -o username=xxx 
Password: 

[1]+  Stopped                 mount -w -t smbfs //192.168.0.11/xxx /x -o username=xxx
# bg
[1]+ mount -w -t smbfs //192.168.0.11/xxx /x -o username=xxx &
#
[1]+  Done                    mount -w -t smbfs //192.168.0.11/xxx /x -o username=xxx
# 


Comment 5 Chris 2005-10-20 13:11:47 UTC
The problem is identical using /sbin/mount.smbfs instead of the "mount"
command.

Below, I issued the command and waited 30 seconds (it still did not complete), so
I hit Ctrl-Z, then bg, and it completed immediately:

[root@ca2 xxx]# date;/sbin/mount.smbfs //192.168.0.11/xxx /x -o username=xxx;date
Thu Oct 20 13:10:58 UTC 2005
Password: 

[1]+  Stopped                 /sbin/mount.smbfs //192.168.0.11/xxx /x -o username=xxx
Thu Oct 20 13:11:27 UTC 2005
[root@ca2 xxx]# bg
[1]+ /sbin/mount.smbfs //192.168.0.11/xxx /x -o username=xxx &
[root@ca2 xxx]#
[1]+  Done                    /sbin/mount.smbfs //192.168.0.11/xxx /x -o username=xxx
[root@ca2 xxx]#


Comment 6 Karel Zak 2005-10-20 13:33:55 UTC
*** Bug 171082 has been marked as a duplicate of this bug. ***

Comment 7 Taichi Yanagiya 2005-12-19 08:44:28 UTC
Workaround: set both fmask and dmask.

(smbmount -o fmask=777,dmask=777)


Comment 8 Jeff Layton 2007-05-09 14:04:53 UTC
*** Bug 202934 has been marked as a duplicate of this bug. ***

Comment 9 Jeff Layton 2007-05-10 19:42:22 UTC
Created attachment 154493 [details]
patch -- proposed by Vasily Averin in bug 234300

This patch was proposed by Vasily Averin in bug 234300. Still looking over it
to make sure that it's correct and reasonable, but it does seem to alleviate
this problem.

Comment 10 Jeff Layton 2007-05-11 14:22:46 UTC
This patch seems to be a missing delta from the patch that went in for bug
#157402 (i.e. upstream patch has this delta, but the RHEL4 patch doesn't).


Comment 11 Need Real Name 2007-05-11 16:20:01 UTC
That "patch" looks like it's a quick-and-dirty hack to force the workaround.

It's clearly got nothing to do with anything that might be causing a 30-second 
timeout.

If you can't trace the code to find out where it's waiting for those 30 
seconds, and thus locate the actual bug, perhaps looking at a diff between 
known working and broken versions should lure it out?

Comment 13 Jeff Layton 2007-05-11 17:47:57 UTC
The pauses may be due to a different reason entirely, but the uninitialized
file_mode and dir_mode are definitely an issue in and of themselves. Here's what
we see in userspace:

[pid  2454] mount("//server/testuser", ".", "smbfs",
MS_MGC_VAL|MS_NOSUID|MS_NODEV, "version=7,") = -1 ENOTDIR (Not a directory)
[pid  2454] mount("//server/testuser", ".", "smbfs",
MS_MGC_VAL|MS_NOSUID|MS_NODEV, "\6") = 0

...so the first mount eventually fails (after a long pause) and then falls back
to doing the mount with "old style" options (using a struct instead of a
string). During the mount, we call smb_fill_super, which parses the mount options
and populates the smb_mount_data_kernel struct for the mount. The file_mode and
dir_mode end up being filled out with 0's.

This means that when we call smb_iget, we end up initing a special inode, but
the mode is still bogus. We eventually return back up to the VFS, but the sb
struct has a root inode that's not a directory (mode is 0), so it starts tearing
down the superblock and preparing to return an error.

During the teardown, one of the things it does is delete this inode, and I think
this is where the delay comes in. I believe it calls into smbiod, which for some
reason (which is the part I'm not sure of) discards this request.

Stack trace from during the delay:

smbmnt        S dead4ead00000001     0  2180   2179                     (NOTLB)
ffffff801b22b9e8 0000000000000282 ffffff8000bf0570 0000007400bf0530 
       ffffff801a6b27f0 00000000002ddeaa 0002c5ffe0b4938b ffffff801db82030 
       ffffff801a6b2a88 ffffffff80129988 
Call Trace:<ffffffff80129988>{try_to_wake_up+731}
<ffffffff80138954>{__mod_timer+293} 
       <ffffffff80295566>{schedule_timeout+362}
<ffffffff80139565>{process_timeout+0} 
       <ffffffffa01a7f68>{:smbfs:smb_add_request+925}
<ffffffff8012dbf0>{autoremove_wake_function+0} 
       <ffffffff8015a6a0>{dbg_redzone1+30}
<ffffffff8012dbf0>{autoremove_wake_function+0} 
       <ffffffffa01a0ae1>{:smbfs:smb_request_ok+44}
<ffffffffa01a12a3>{:smbfs:smb_proc_close+98} 
       <ffffffffa01a1440>{:smbfs:smb_proc_close_inode+172} 
       <ffffffff801ce1b5>{superblock_doinit+1547}
<ffffffff801cd50e>{avc_has_perm+70} 
       <ffffffffa01a1492>{:smbfs:smb_close+37}
<ffffffffa01a5e2a>{:smbfs:smb_delete_inode+54} 
       <ffffffffa01a5df4>{:smbfs:smb_delete_inode+0}
<ffffffff8018ee56>{generic_delete_inode+190} 
       <ffffffff8018ed98>{generic_delete_inode+0} <ffffffff8018bb68>{dput+463} 
       <ffffffff8017b05a>{generic_shutdown_super+63}
<ffffffff8017bd40>{kill_anon_super+9} 
       <ffffffff8017b002>{deactivate_super+95} <ffffffff80191352>{do_add_mount+332} 
       <ffffffff80191e12>{do_mount+1721} <ffffffff801ce2d3>{inode_has_perm+89} 
       <ffffffff8018b9d1>{dput+56} <ffffffff801533b0>{find_get_page+57} 
       <ffffffff801e6b7d>{__up_read+16} <ffffffff80191703>{copy_mount_options+157} 
       <ffffffff801429ad>{search_exception_tables+29}
<ffffffff80119891>{do_page_fault+870} 
       <ffffffff80165ed6>{handle_mm_fault+556} <ffffffff8018b9d1>{dput+56} 
       <ffffffff80157054>{buffered_rmqueue+384}
<ffffffff8015acc8>{check_poison_obj+48} 
       <ffffffff8015ab4e>{poison_obj+54} <ffffffff80157244>{__alloc_pages+200} 
       <ffffffff80192196>{sys_mount+186} <ffffffff8010d66e>{system_call+134} 
       <ffffffff8010d5e8>{system_call+0} 
smbiod        S 0000000000000001     0  2181      1                1934 (L-TLB)
ffffff801b1f7eb8 0000000000000246 ffffff801b1f7e08 0000000000000000 
       ffffff801a6b2030 000000000000719c 0002c5ffe0b72b1f ffffff8000a2a030 
       ffffff801a6b22c8 0000000000000000 
Call Trace:<ffffffffa01a77a2>{:smbfs:smbiod+172}
<ffffffff8012dbf0>{autoremove_wake_function+0} 
       <ffffffff8012dbf0>{autoremove_wake_function+0}
<ffffffff80129df8>{schedule_tail+55} 
       <ffffffff8010e092>{child_rip+8} <ffffffffa01a76f6>{:smbfs:smbiod+0} 
       <ffffffff8010e08a>{child_rip+0} 


Clearly smbiod isn't doing much of anything, and smbmnt is waiting on it to come back.

So, arguably, this is a different bug -- we should probably return an error here
instead of timing out, but the root cause of this particular issue is that the
file and dir modes are not properly set. I don't think this patch is papering
over the issue at all.
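
For readers following along, here is a minimal userspace sketch of the mode
problem described in this comment. It is not the smbfs code or the attached
patch: struct mount_modes, apply_default_modes, root_mode_is_directory and the
rwxr-xr-x default permissions are all hypothetical names and values chosen for
illustration. It only shows why a mode of 0 fails a directory check and how
defaulting the modes avoids that.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical stand-in for the two fields discussed above. This is not the
 * kernel's struct smb_mount_data_kernel. */
struct mount_modes {
    mode_t file_mode;
    mode_t dir_mode;
};

/* Assumed default permission bits (rwxr-xr-x), for illustration only. */
#define SKETCH_PERMS (S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH)

/* Fill in any mode that option parsing left at zero. */
void apply_default_modes(struct mount_modes *m)
{
    assert(m != NULL);
    if (m->file_mode == 0)
        m->file_mode = SKETCH_PERMS | S_IFREG;
    if (m->dir_mode == 0)
        m->dir_mode = SKETCH_PERMS | S_IFDIR;
}

/* The check that matters for a root inode: do the type bits say "directory"? */
bool root_mode_is_directory(const struct mount_modes *m)
{
    assert(m != NULL);
    return S_ISDIR(m->dir_mode);
}
```

With both modes left at zero, root_mode_is_directory() returns false, which is
consistent with the ENOTDIR result in the strace output above. After
apply_default_modes() it returns true.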


Comment 14 Jeff Layton 2007-05-11 17:55:07 UTC
As a side note, while I'm not averse to patching smbfs where we have clear bugs
and patches, we're strongly suggesting that people move to using CIFS. The
upstream CIFS code is in much better shape than the smbfs code, and we track
CIFS upstream pretty closely in RHEL.


Comment 15 Jeff Layton 2007-05-11 18:00:07 UTC
FWIW, the 30 second pause comes from smb_add_request:

        timeleft = wait_event_interruptible_timeout(req->rq_wait,
                                    req->rq_flags & SMB_REQ_RECEIVED, 30*HZ);

(note the 30*HZ here). It's sitting on the smbiod wait queue, and just never
waking up.
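
As a rough userspace analogue of that timed wait (a sketch under assumptions,
not kernel code), the following uses a POSIX condition variable in place of the
kernel wait queue. struct fake_request and the fake_request_* functions are
hypothetical names; the "received" flag plays the role of SMB_REQ_RECEIVED.
Unlike the kernel call, this sketch does not model interruption by signals.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical userspace stand-in for an smbfs request. */
struct fake_request {
    pthread_mutex_t lock;
    pthread_cond_t  wait;      /* plays the role of req->rq_wait */
    bool            received;  /* plays the role of SMB_REQ_RECEIVED */
};

void fake_request_init(struct fake_request *req)
{
    assert(req != NULL);
    pthread_mutex_init(&req->lock, NULL);
    pthread_cond_init(&req->wait, NULL);
    req->received = false;
}

/* Called by the I/O side when a reply arrives; wakes any waiter. */
void fake_request_complete(struct fake_request *req)
{
    pthread_mutex_lock(&req->lock);
    req->received = true;
    pthread_cond_broadcast(&req->wait);
    pthread_mutex_unlock(&req->lock);
}

/* Wait up to timeout_ms for the reply.
 * Returns true if the reply was received, false if the wait timed out. */
bool fake_request_wait(struct fake_request *req, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&req->lock);
    int rc = 0;
    /* Re-check the flag after every wakeup; stop once the deadline passes. */
    while (!req->received && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&req->wait, &req->lock, &deadline);
    bool ok = req->received;
    pthread_mutex_unlock(&req->lock);
    return ok;
}
```

If nothing ever calls fake_request_complete(), the waiter simply sits until the
deadline, which mirrors the behaviour described above: the request is never
marked received, so the caller only returns once the full timeout has elapsed.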



Comment 16 RHEL Program Management 2007-06-21 20:14:35 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 18 Jason Baron 2007-06-22 14:29:22 UTC
committed in stream U6 build 55.11. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 20 Don Domingo 2007-08-23 02:28:03 UTC
added to RHEL4.6 release notes under "Kernel-Related Updates":

<quote>
dir_mode and file_mode now have default values
</quote>

please advise if any revisions are necessary. thanks!



Comment 21 Simo Sorce 2007-09-12 14:05:19 UTC
*** Bug 241442 has been marked as a duplicate of this bug. ***

Comment 22 Issue Tracker 2007-10-24 12:08:03 UTC
Hi,

This customer has not provided any feedback on this, but I have one more
customer with the same issue, and for him things work fine with the test
kernel.

Hope this helps.

Kaustubh.

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by jnansi 
 issue 124805

Comment 24 errata-xmlrpc 2007-11-15 16:13:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html