168280 – 30 second pause on samba mounts

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	samba
Sub Component:
Version:	4.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Jeff Layton
QA Contact:	David Lawrence
Docs Contact:
URL:	http://www.ianag.com/linux/t-466198.html
Whiteboard:
Duplicates (3):	171082 202934 241442
Depends On:
Blocks:	248673

Reported:	2005-09-14 13:48 UTC by David Bestor
Modified:	2018-10-19 19:53 UTC
CC List:	10 users
Fixed In Version:	RHBA-2007-0791
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2007-11-15 16:13:09 UTC
Target Upstream Version:
Embargoed:

Attachments
patch -- proposed by Vasily Averin in bug 234300 (731 bytes, patch) 2007-05-10 19:42 UTC, Jeff Layton

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2007:0791	0	normal	SHIPPED_LIVE	Updated kernel packages available for Red Hat Enterprise Linux 4 Update 6	2007-11-14 18:25:55 UTC

Description David Bestor 2005-09-14 13:48:17 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.10) Gecko/20050910 CentOS/1.0.6-1.4.2.1ment Firefox/1.0.6

Description of problem:
I know this is probably the wrong category.

But the samba in the 4.2 beta (samba-3.0.10-1.4E.2) seems to have issues.
I included a link to a report from someone else who sees the same problem.

There is about a 30 second pause when doing samba mounts on the
clients (either autofs'ed or command line)

Messages in dmesg are:
smb_retry: no connection process
smb_add_request: request [cddafee0, mid=0] timed out!
smb_delete_inode: could not close inode 2

After about 30 seconds, the mounts are clean and everything is fine.

Went back to samba-3.0.10-1.4E on both server and clients. No issues..


Version-Release number of selected component (if applicable):
samba-3.0.10-1.4E.2

How reproducible:
Always

Steps to Reproduce:
1. create a share on server
2. mount that share on client
3. wait for 30 seconds
  

Actual Results:  30 second pause

Expected Results:  No 30 second wait

Additional info:

I am using CentOS 4.x with a rebuilt 4.2 beta samba-3.0.10-1.4E.2 rpm

I included a link from someone else seeing something similar :
http://www.ianag.com/linux/t-466198.html

Mine is a Linux client connecting to a Linux server. Same errors, though.

Comment 1 David Bestor 2005-09-14 14:54:06 UTC

New info...
server: samba-3.0.10-1.4E
client: samba-3.0.10-1.4E
  No Long pause on client mount

server: samba-3.0.10-1.4E.2
client: samba-3.0.10-1.4E.2
  Long pause on client mount

server: samba-3.0.10-1.4E
client: samba-3.0.10-1.4E.2
  Long pause on client mount

With Patch20 (samba-3.0.10-bug157208.patch) removed by commenting it out in the spec file:
#%patch20 -p1 -b .bug157208

server: samba-3.0.10-1.4E
client: samba-3.0.10-1.4E.2
  No Long pause on client mount

server: samba-3.0.10-1.4E.2
client: samba-3.0.10-1.4E.2
  No Long pause on client mount

So is samba-3.0.10-bug157208.patch dependent on something else?

I don't have access to bug 157208, but
if there is a dependency, it's missing from the spec file.

Comment 2 Sam Sharpe 2005-10-10 12:43:00 UTC

I am getting the same behaviour in RHEL4 U2, but against fully patched MS
Windows Server 2003. I have pam_mount running on login to mount user's
filespace, which worked perfectly under U1, but now gives the same error as the
report above.

Comment 3 Sam Sharpe 2005-10-10 15:54:01 UTC

Weirdly, I was still experiencing the same behaviour after I removed Patch20, but
when I removed Patch17:

#Patch17: samba-3.0.12rc1-gcc4.patch

I no longer get the hangs...

Comment 4 Chris 2005-10-17 23:13:25 UTC

If you hit Ctrl-Z to suspend the mount command, then immediately type "bg" to 
background the task, the mount immediately completes successfully.

Looks like the bug is related to some kind of waiting around for a signal that 
never arrives, and the Ctrl-Z+"bg" "hack" delivers whatever it's waiting for...

# rpm -q --whatprovides mount
util-linux-2.12a-16.EL4.12


# mount -w -t smbfs //192.168.0.11/xxx /x -o username=xxx 
Password: 

[1]+  Stopped                 mount -w -t smbfs //192.168.0.11/xxx /x -o username=xxx
# bg
[1]+ mount -w -t smbfs //192.168.0.11/xxx /x -o username=xxx &
# 
[1]+  Done                    mount -w -t smbfs //192.168.0.11/xxx /x -o username=xxx
#

Comment 5 Chris 2005-10-20 13:11:47 UTC

The problem is identical using /sbin/mount.smbfs instead of the "mount"
command.

Below - I issued the command, waited 30 seconds (it still did not complete) so 
I hit Ctrl-Z, bg, and it completed immediately OK:-

[root@ca2 xxx]# date;/sbin/mount.smbfs //192.168.0.11/xxx /x -o username=xxx;date
Thu Oct 20 13:10:58 UTC 2005
Password: 

[1]+  Stopped                 /sbin/mount.smbfs //192.168.0.11/xxx /x -o username=xxx
Thu Oct 20 13:11:27 UTC 2005
[root@ca2 xxx]# bg
[1]+ /sbin/mount.smbfs //192.168.0.11/xxx /x -o username=xxx &
[root@ca2 xxx]# 
[1]+  Done                    /sbin/mount.smbfs //192.168.0.11/xxx /x -o username=xxx
[root@ca2 xxx]#

Comment 6 Karel Zak 2005-10-20 13:33:55 UTC

*** Bug 171082 has been marked as a duplicate of this bug. ***

Comment 7 Taichi Yanagiya 2005-12-19 08:44:28 UTC

Workaround: set up both of fmask and dmask.

(smbmount -o fmask=777,dmask=777)

Comment 8 Jeff Layton 2007-05-09 14:04:53 UTC

*** Bug 202934 has been marked as a duplicate of this bug. ***

Comment 9 Jeff Layton 2007-05-10 19:42:22 UTC

Created attachment 154493
patch -- proposed by Vasily Averin in bug 234300

This patch was proposed by Vasily Averin in bug 234300. Still looking over it
to make sure that it's correct and reasonable, but it does seem to alleviate
this problem.

Comment 10 Jeff Layton 2007-05-11 14:22:46 UTC

This patch seems to be a missing delta from the patch that went in for bug
#157402 (i.e. upstream patch has this delta, but the RHEL4 patch doesn't).

Comment 11 Need Real Name 2007-05-11 16:20:01 UTC

That "patch" looks like it's a quick-and-dirty hack to force the workaround.

It's clearly got nothing to do with anything that might be causing a 30-second 
timeout.

If you can't trace the code to find out where it's waiting for those 30 
seconds, and thus locate the actual bug, perhaps looking at a diff between 
known working and broken versions should lure it out?

Comment 13 Jeff Layton 2007-05-11 17:47:57 UTC

The pauses may be due to a different reason entirely, but the uninitialized
file_mode and dir_mode are definitely an issue in and of themselves. Here's what
we see in userspace:

[pid  2454] mount("//server/testuser", ".", "smbfs", MS_MGC_VAL|MS_NOSUID|MS_NODEV, "version=7,") = -1 ENOTDIR (Not a directory)
[pid  2454] mount("//server/testuser", ".", "smbfs", MS_MGC_VAL|MS_NOSUID|MS_NODEV, "\6") = 0

...so the first mount eventually fails (after a long pause) and then falls back
to doing the mount with "old style" options (using a struct instead of a
string). During the mount, we call smb_fill_super that parses the mount options,
and populates the smb_mount_data_kernel struct for the mount. The file_mode and
dir_mode end up being filled out with 0's.

This means that when we call smb_iget, we end up initing a special inode, but
the mode is still bogus. We eventually return back up to the VFS, but the sb
struct has a root inode that's not a directory (mode is 0), so it starts tearing
down the superblock and preparing to return an error.

During the teardown, one of the things it does is delete this inode, and I think
this is where the delay comes in. I believe it calls into smbiod, which for some
reason (which is the part I'm not sure of) discards this request.

Stack trace from during the delay:

smbmnt        S dead4ead00000001     0  2180   2179                     (NOTLB)
ffffff801b22b9e8 0000000000000282 ffffff8000bf0570 0000007400bf0530 
       ffffff801a6b27f0 00000000002ddeaa 0002c5ffe0b4938b ffffff801db82030 
       ffffff801a6b2a88 ffffffff80129988 
Call Trace:<ffffffff80129988>{try_to_wake_up+731}
<ffffffff80138954>{__mod_timer+293} 
       <ffffffff80295566>{schedule_timeout+362}
<ffffffff80139565>{process_timeout+0} 
       <ffffffffa01a7f68>{:smbfs:smb_add_request+925}
<ffffffff8012dbf0>{autoremove_wake_function+0} 
       <ffffffff8015a6a0>{dbg_redzone1+30}
<ffffffff8012dbf0>{autoremove_wake_function+0} 
       <ffffffffa01a0ae1>{:smbfs:smb_request_ok+44}
<ffffffffa01a12a3>{:smbfs:smb_proc_close+98} 
       <ffffffffa01a1440>{:smbfs:smb_proc_close_inode+172} 
       <ffffffff801ce1b5>{superblock_doinit+1547}
<ffffffff801cd50e>{avc_has_perm+70} 
       <ffffffffa01a1492>{:smbfs:smb_close+37}
<ffffffffa01a5e2a>{:smbfs:smb_delete_inode+54} 
       <ffffffffa01a5df4>{:smbfs:smb_delete_inode+0}
<ffffffff8018ee56>{generic_delete_inode+190} 
       <ffffffff8018ed98>{generic_delete_inode+0} <ffffffff8018bb68>{dput+463} 
       <ffffffff8017b05a>{generic_shutdown_super+63}
<ffffffff8017bd40>{kill_anon_super+9} 
       <ffffffff8017b002>{deactivate_super+95} <ffffffff80191352>{do_add_mount+332} 
       <ffffffff80191e12>{do_mount+1721} <ffffffff801ce2d3>{inode_has_perm+89} 
       <ffffffff8018b9d1>{dput+56} <ffffffff801533b0>{find_get_page+57} 
       <ffffffff801e6b7d>{__up_read+16} <ffffffff80191703>{copy_mount_options+157} 
       <ffffffff801429ad>{search_exception_tables+29}
<ffffffff80119891>{do_page_fault+870} 
       <ffffffff80165ed6>{handle_mm_fault+556} <ffffffff8018b9d1>{dput+56} 
       <ffffffff80157054>{buffered_rmqueue+384}
<ffffffff8015acc8>{check_poison_obj+48} 
       <ffffffff8015ab4e>{poison_obj+54} <ffffffff80157244>{__alloc_pages+200} 
       <ffffffff80192196>{sys_mount+186} <ffffffff8010d66e>{system_call+134} 
       <ffffffff8010d5e8>{system_call+0} 
smbiod        S 0000000000000001     0  2181      1                1934 (L-TLB)
ffffff801b1f7eb8 0000000000000246 ffffff801b1f7e08 0000000000000000 
       ffffff801a6b2030 000000000000719c 0002c5ffe0b72b1f ffffff8000a2a030 
       ffffff801a6b22c8 0000000000000000 
Call Trace:<ffffffffa01a77a2>{:smbfs:smbiod+172}
<ffffffff8012dbf0>{autoremove_wake_function+0} 
       <ffffffff8012dbf0>{autoremove_wake_function+0}
<ffffffff80129df8>{schedule_tail+55} 
       <ffffffff8010e092>{child_rip+8} <ffffffffa01a76f6>{:smbfs:smbiod+0} 
       <ffffffff8010e08a>{child_rip+0} 


Clearly smbiod isn't doing much of anything, and smbmnt is waiting on it to come back.

So, arguably, this is a different bug -- we should probably return an error here
instead of timing out, but the root cause of this particular issue is that the
file and dir modes are not properly set. I don't think this patch is papering
over the issue at all.

Comment 14 Jeff Layton 2007-05-11 17:55:07 UTC

As a side note, while I'm not averse to patching smbfs where we have clear bugs
and patches, we're strongly suggesting that people move to using CIFS. The
upstream CIFS code is in much better shape than the smbfs code, and we track
CIFS upstream pretty closely in RHEL.

Comment 15 Jeff Layton 2007-05-11 18:00:07 UTC

FWIW, the 30 second pause comes from smb_add_request:

        timeleft = wait_event_interruptible_timeout(req->rq_wait,
                                    req->rq_flags & SMB_REQ_RECEIVED, 30*HZ);

(note the 30*HZ here). It's sitting on the smbiod wait queue, and just never
waking up.
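
A userspace analogue of that wait, as a rough sketch: HZ is the number of timer ticks per second, so 30*HZ is always 30 seconds, and wait_event_interruptible_timeout returns 0 when the timeout expires, a positive value when the condition became true, and -ERESTARTSYS when a signal interrupts it. The pipe-based "reply channel" and the function names below are assumptions for illustration and are not part of smbfs.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait until a reply is readable on fd or timeout_ms elapses.
 * Returns 1 if a reply arrived, 0 on timeout, -1 on error
 * (including interruption by a signal, errno == EINTR). */
int wait_for_reply(int fd, int timeout_ms)
{
    /* A negative timeout would make poll() block forever. */
    assert(timeout_ms >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

/* Stand-in for the I/O thread posting a reply on the write end of a pipe. */
int post_reply(int fd)
{
    return write(fd, "r", 1) == 1 ? 0 : -1;
}
```

If nothing ever calls post_reply, wait_for_reply(fd, 30000) blocks for the full 30 seconds and then returns 0, which mirrors the pause seen during the mount. Like the kernel wait, poll() also returns early when a signal arrives, which may be related to the Ctrl-Z behaviour in Comment 4, although this report does not confirm that.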

Comment 16 RHEL Program Management 2007-06-21 20:14:35 UTC

This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 18 Jason Baron 2007-06-22 14:29:22 UTC

committed in stream U6 build 55.11. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/

Comment 20 Don Domingo 2007-08-23 02:28:03 UTC

added to RHEL4.6 release notes under "Kernel-Related Updates":

<quote>
dir_mode and file_mode now have default values
</quote>

please advise if any revisions are necessary. thanks!

Comment 21 Simo Sorce 2007-09-12 14:05:19 UTC

*** Bug 241442 has been marked as a duplicate of this bug. ***

Comment 22 Issue Tracker 2007-10-24 12:08:03 UTC

Hi,

This customer has not provided any feedback on this, but I have one more
customer with the same issue, and for him things work fine with the test
kernel.

Hope this helps.

Kaustubh.

Internal Status set to 'Waiting on SEG'

This event sent from IssueTracker by jnansi 
 issue 124805

Comment 24 errata-xmlrpc 2007-11-15 16:13:09 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html
