Bug 151085
Summary: | mount are not interruptible
---|---
Product: | Red Hat Enterprise Linux 4
Component: | kernel
Version: | 4.0
Hardware: | All
OS: | Linux
Status: | CLOSED ERRATA
Severity: | medium
Priority: | medium
Reporter: | Steve Dickson <steved>
Assignee: | Jeff Layton <jlayton>
QA Contact: | Brian Brock <bbrock>
CC: | riel, staubach, steved
Fixed In Version: | RHSA-2008-0665
Doc Type: | Bug Fix
Last Closed: | 2008-07-24 19:10:44 UTC
Bug Blocks: | 430698
Description
Steve Dickson
2005-03-14 19:14:31 UTC
Had to build a mount program that skips the clnt_ping, but I seem to have been able to reproduce this. Stack trace of the mount program while hung:

mount D 0000000000000000 0 1930 1830 (NOTLB)
ffffff801b02ba58 0000000000000282 ffffff801e479030 ffffffff801409f6
ffffff801f35d030 00000000000031be 00018d730e8af4d6 ffffffff80320d40
ffffff801f35d2c8 0000000000000000
Call Trace:
<ffffffff801409f6>{flush_cpu_workqueue+397}
<ffffffffa00a1fda>{:sunrpc:xprt_connect_status+0}
<ffffffff801387f8>{__mod_timer+293}
<ffffffffa00a4db6>{:sunrpc:__rpc_execute+462}
<ffffffff8012dab8>{autoremove_wake_function+0}
<ffffffffa00a4920>{:sunrpc:rpc_init_task+341}
<ffffffff8012dab8>{autoremove_wake_function+0}
<ffffffffa00a079c>{:sunrpc:rpc_call_sync+114}
<ffffffffa0172288>{:nfs:nfs4_proc_setclientid+388}
<ffffffffa017a481>{:nfs:__nfs4_init_client+24}
<ffffffff801d1540>{selinux_d_instantiate+0}
<ffffffffa017a4aa>{:nfs:nfs4_init_client+6}
<ffffffffa01614fd>{:nfs:nfs4_get_sb+1336}
<ffffffff8017af8e>{do_kern_mount+161}
<ffffffff80190d3b>{do_mount+1690}
<ffffffff8018134f>{do_lookup+44}
<ffffffff8018a911>{dput+56}
<ffffffff8018247c>{__link_path_walk+3562}
<ffffffff8018a911>{dput+56}
<ffffffff80182603>{link_path_walk+176}
<ffffffff80167f29>{do_munmap+811}
<ffffffff80156f92>{buffered_rmqueue+388}
<ffffffff801827f3>{path_lookup+452}
<ffffffff801e7a35>{atomic_dec_and_lock+37}
<ffffffff801571c7>{__alloc_pages+200}
<ffffffff801910de>{sys_mount+186}
<ffffffff8010d636>{system_call+134}
<ffffffff8010d5b0>{system_call+0}

...I'll test the patch in the BZ and see if it allows this to be interrupted.

Patch seems to work. I can interrupt the mount() syscall with it. I have to wonder whether it's worth including though. Most customers will be using the normal util-linux mount program. That program will do a clnt_ping which will hang (and maybe eventually time out?). There could be a race where clnt_ping succeeds just before we lose connectivity to the server, but that seems somewhat unlikely. My suggestion is WONTFIX here. Steve, do you have thoughts on this?

I am of the opinion that we should go with the patch since it's the expected behavior. Interrupting out of things is always a bit 'racy', but not being able to interrupt out of a mount, especially on a console, is highly undesirable as well...

I presume that the patch in question is that mentioned in Opened by section? If so, then I am curious why it would be a good thing for a mount to be soft. If I don't specify soft in the options, then I don't think that I would want the mount to be soft either. If I want something like that, then I would choose to use autofs.

Created attachment 159849 [details]
patch -- backport of patch in description
For discussion, here's the backported patch...
I don't think that it matters much whether we include this or not. In order to
hit this problem, you'd have to have the mount program's clnt_ping succeed and
then have the host go down just before the mount() syscall is done. It's
probably possible, but I think it would be hard to hit. Then again, I think amd
calls mount() directly, so maybe it's an issue for people who use it.
It's not the communication with the server; it's the communication with the local rpc.idmapd that is the problem. I'm thinking those upcalls to the local daemon should be interruptible when 'intr' is set... WRT soft mounts, they are an evil thing, I agree... but should they work as advertised? For better or worse?

Sorry, I should have been more clear. The patch appeared to me to make the mounting process soft, while the file system, after completing the mount process, would not be soft anymore. Or did I misinterpret the patch?

I've not been able to get a mount to hang due to rpc.idmapd being down, but if you attempt to do a krb5 mount with rpc.gssd down, then things do seem to block. Here's the stack:

mount D 0000000000000000 0 2043 1943 (NOTLB)
ffffff801b287a58 0000000000000282 ffffff801ca4a1c0 0000000000000000
ffffff801f23d7f0 000000000008cb32 0001cc72f7770a48 ffffffff80320d40
ffffff801f23da88 0000000000000000
Call Trace:
<ffffffffa00d4013>{:auth_rpcgss:gss_refresh+468}
<ffffffffa00a4db6>{:sunrpc:__rpc_execute+462}
<ffffffff8012dab8>{autoremove_wake_function+0}
<ffffffffa00a4920>{:sunrpc:rpc_init_task+341}
<ffffffff8012dab8>{autoremove_wake_function+0}
<ffffffffa00a079c>{:sunrpc:rpc_call_sync+114}
<ffffffffa0167305>{:nfs:nfs4_proc_setclientid+388}
<ffffffffa016f5b5>{:nfs:__nfs4_init_client+24}
<ffffffff801d1540>{selinux_d_instantiate+0}
<ffffffffa016f5de>{:nfs:nfs4_init_client+6}
<ffffffffa01564fd>{:nfs:nfs4_get_sb+1336}
<ffffffff8017af8e>{do_kern_mount+161}
<ffffffff80190d3b>{do_mount+1690}
<ffffffff80235d25>{sock_common_recvmsg+48}
<ffffffff802329e6>{sock_aio_read+297}
<ffffffff801e5829>{__up_read+16}
<ffffffff8019064b>{copy_mount_options+157}
<ffffffff80142871>{search_exception_tables+29}
<ffffffff8011989d>{do_page_fault+870}
<ffffffff801387f8>{__mod_timer+293}
<ffffffff80235a7b>{sk_reset_timer+15}
<ffffffff8026623b>{tcp_write_xmit+314}
<ffffffff8010dd8b>{error_exit+0}
<ffffffff80297b1d>{__lock_text_end+12014}
<ffffffff801910de>{sys_mount+186}
<ffffffff8010d636>{system_call+134}
<ffffffff8010d5b0>{system_call+0}

...the patch in comment #5 allows me to break out of it.

This doesn't make the entire mount soft/intr. Once the filesystem is mounted the options are respected. The question here is -- do we want the mount() syscall to block indefinitely? I suppose it seems like that should be interruptible. I don't guess there's much danger if it is interrupted since if the fs isn't mounted, we don't have any outstanding I/O to it anyway...

I would think that if intr was specified, then all situations possible should be interruptible. There will be some which cannot be made interruptible, but at least the usual places and the ones that can be recovered from should be made interruptible. If we can't do the proper recovery though, then interruptibility would be a bad thing.

Agreed, but the question here is "Should mount() be interruptible regardless of the mount options used?" That's what this patch does. I generally think that giving people what they ask for is the right thing to do, but in this case, maybe it's best to do it unconditionally. It doesn't seem like this change would put any data in jeopardy, only allow for users to interrupt a hung mount() call in more situations.

Right. I think that unless "intr" was specified or defaulted to, then a mount() should not be any more or less interruptible than any other system call made on an NFS mounted file system. As for giving people what they ask for, I think that it depends upon whether they are asking for some specific answer to a problem that they are having or for a problem that they are having to be resolved. In the first case, what they are asking for may or may not be the right thing to do and it is our job to figure out what they really need to do and help them to accomplish that in a manner which would benefit all customers and not just them. Customers are fabulous at asking for specific solutions to their problems, which turn out to be only useful for themselves.

Ok, I don't have strong feelings either way, and Steve's earlier comments only mentioned making the upcalls interruptible when "intr" was specified as well. I'll plan to respin the patch so that it's conditional upon the mount options.

Created attachment 159931 [details]
patch -- make mount interruptible/soft according to intr mount option
This patch should fix up NFSv4. From what I can tell NFS2/3 already does this.
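As a rough userspace analogy only (this is not the kernel patch, and none of these names come from it), the sketch below shows what "interruptible according to intr" means for a blocked wait. The function `wait_for_reply`, the `Interrupted` exception, and the use of SIGALRM as a stand-in for the user's interrupt are all assumptions made for illustration: with `intr` false the signal is held back for the whole wait, with `intr` true it cuts the wait short.

```python
import signal
import time


class Interrupted(Exception):
    """Raised by the signal handler to abandon a blocking wait."""


def _on_alarm(signum, frame):
    raise Interrupted()


def wait_for_reply(intr: bool, duration: float) -> str:
    """Block for `duration` seconds, standing in for a hung upcall or RPC.

    Hypothetical model: with intr=False the signal is blocked for the
    length of the wait, so the wait always runs to completion. With
    intr=True the signal is delivered and the wait is abandoned early.
    """
    old_mask = None
    if not intr:
        old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGALRM})
    try:
        time.sleep(duration)
        return "completed"
    except Interrupted:
        return "interrupted"
    finally:
        if old_mask is not None:
            # Consume the held-back signal so it does not fire after the wait.
            if signal.SIGALRM in signal.sigpending():
                signal.sigwait({signal.SIGALRM})
            signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def demo(intr: bool) -> str:
    """Start a wait and send it a signal shortly afterwards."""
    signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, 0.05)  # stands in for Ctrl-C
    try:
        return wait_for_reply(intr, 0.3)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
```

Calling `demo(True)` models a mount with intr set, and `demo(False)` one without. The sketch must run in the main thread of a Unix process, since Python only allows signal handlers to be installed there.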
With this though, I do seem to see a *different* possible issue:
The userspace tools default to intr for NFSv4. This is the case with RHEL4's
util-linux and looks to be the case on current upstream util-linux. So to get a
non-intr mount, you have to specify nointr.
I'll take a look at the v4 RFC and see if there's a reason for this, but it
sounds like this might be wrong.
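The default described above can be modelled with a small option parser. This is a minimal sketch, not the util-linux code: the function name `effective_intr` is hypothetical, and the rule that the last of `intr`/`nointr` in the option string wins is an assumption made for the example.

```python
def effective_intr(options: str, default_intr: bool = True) -> bool:
    """Return whether a mount would end up interruptible.

    `default_intr=True` models the NFSv4 userspace default reported in
    this bug. The option string is a comma-separated list as passed to
    `mount -o`; the last intr/nointr entry wins (assumed behaviour).
    """
    intr = default_intr
    for opt in options.split(","):
        opt = opt.strip()
        if opt == "intr":
            intr = True
        elif opt == "nointr":
            intr = False
    return intr
```

Under this model `effective_intr("rw,hard")` is true even though intr was never requested, and only an explicit `nointr` turns it off.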
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Committed in 68.17.EL. RPMS are available at http://people.redhat.com/vgoyal/rhel4/

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2008-0665.html