Bug 726452

Summary: open() calls for files mounted via kerberized nfsv4 for a user with expired ticket hangs
Product: Red Hat Enterprise Linux 6
Component: nfs-utils
Version: 6.1
Hardware: x86_64
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Target Milestone: rc
Target Release: ---
Reporter: prozaconstilts
Assignee: Steve Dickson <steved>
QA Contact: yanfu,wang <yanwang>
Doc Type: Bug Fix
Last Closed: 2011-09-27 18:31:16 UTC

Description prozaconstilts 2011-07-28 16:32:13 UTC
Description of problem:

Opens of files located under a kerberized NFSv4 mount hang when the user owning the file has an expired (but existing) kerberos credential cache.

Version-Release number of selected component (if applicable):

nfs-utils-1.2.3.7.el6.x86_64
krb5-libs-1.9-9.el6_1.1.x86_64

How reproducible:

Easily, by me and by others in my environment. I'm not sure what the underlying cause is, so it may be difficult to reproduce elsewhere.

Steps to Reproduce:

- Build a RHEL6 NFS client that mounts an NFS server via kerberized NFSv4.
- Request a ticket with a short lifetime and renewal time.
- NFS-mount a directory you have access to, and cat any file you own.
- Wait until your ticket expires.
- Try to cat a file you own, ssh into the server, or perform any other operation that will try to open a file you own.

Actual results:

hangs indefinitely


Expected results:

returns permission denied


Additional info:

The rpc.gssd downcall returns something different depending on whether the ccache is expired or nonexistent:


with an expired cache:
   write(12, "?\t\0\0\0\0\0\0\0\0\0\0\201\377\377\377", 16) = 16
   | 00000  3f 09 00 00 00 00 00 00  00 00 00 00 81 ff ff ff  ?....... ........ |

without a cache:
   write(12, "?\t\0\0\0\0\0\0\0\0\0\0\363\377\377\377", 16) = 16
   | 00000  3f 09 00 00 00 00 00 00  00 00 00 00 f3 ff ff ff  ?.......  ........ |

I'm not surprised it writes something different for an expired ccache vs. a nonexistent one, but I'm unable to determine what receives the result of the downcall, or why it decides to hang.
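
For anyone reading along, here is a rough sketch of how the two 16-byte writes above can be decoded. It assumes the buffer is four native 32-bit fields (uid, timeout, a sequence window that is zero on error, and a negative errno), which is my reading of the gssd error downcall and not something confirmed in this report. The function name is made up for illustration. Under that assumption the last field differs as -127 (EKEYEXPIRED on Linux) for the expired cache and -13 (EACCES) for the missing cache.

```python
import errno
import struct


def decode_error_downcall(buf: bytes) -> dict:
    """Decode a 16-byte downcall buffer as captured by strace.

    Assumed layout (little-endian, as on x86_64): uid, timeout,
    seq_win, then a signed errno value. This layout is an assumption.
    """
    uid, timeout, seq_win, err = struct.unpack("<IIIi", buf)
    return {
        "uid": uid,
        "timeout": timeout,
        "seq_win": seq_win,
        "errno": err,
        # errno.errorcode is platform dependent; fall back to "unknown".
        "name": errno.errorcode.get(-err, "unknown"),
    }


# Bytes copied from the two hexdumps in the description.
expired = bytes.fromhex("3f 09 00 00 00 00 00 00 00 00 00 00 81 ff ff ff")
missing = bytes.fromhex("3f 09 00 00 00 00 00 00 00 00 00 00 f3 ff ff ff")

if __name__ == "__main__":
    print(decode_error_downcall(expired))
    print(decode_error_downcall(missing))
```

The only field that changes between the two captures is the final one, which fits the idea that the daemon reports two different error codes for the two cases.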

My kerberos server is a 2008 R2 AD. My RHEL5 clients do not exhibit this bug against the same kerberos server and NFS server.

I can provide any conf files needed upon request.

Thanks!

Comment 2 prozaconstilts 2011-07-29 11:50:25 UTC
Actually, after conferring with my colleagues, I believe one of them may have done a better job identifying this problem. Here is a paste of his e-mail:

It looks like that guess may have been accurate. Here is the beginning
of the patchset designed specifically to make the kernel spin (with
exponential backoff) when access is requested after a TGT has expired.
The use case driving this was specifically long term jobs.
http://linux-nfs.org/pipermail/nfsv4/2010-January/012012.html

    When someone deploys kerberized NFS, they usually will quickly run
    across a major problem. As soon as their credentials expire, all
    RPCs start failing with -EACCES errors. This makes it really
    difficult to have any sort of long-running job since you have to
    proactively kinit before your TGT expires. If you miss doing so,
    then your job may start getting errors unexpectedly.

    This patchset represents a first pass at fixing this. The idea here
    is to distinguish between the situation where someone has an expired
    credential cache and someone that has no credential cache at all. In
    the latter case, we want to have the RPC return -EACCES (just like
    it does today), in the former case we want to return a different
    error that will make the NFS layer delay and retry the call instead
    of erroring out (-EKEYEXPIRED).

    This patchset is for the kernel patches. To make this work, gssd
    will also need to be fixed to send different errors in these
    situations. That patch will follow this set.


and here is the patch which actually causes the kernel to wait
http://linux-nfs.org/pipermail/nfsv4/2010-January/012014.html

    If a KRB5 TGT ticket expires, we don't want to return an error
    immediately. If someone has a long running job and just forgets to
    run "kinit" in time then this will make it fail.

    Instead, we want to treat this situation as we would NFS4ERR_DELAY
    and retry the upcall after delaying a bit with an exponential
    backoff.

    This patch just makes any place that would handle NFS4ERR_DELAY also
    handle -EKEYEXPIRED the same way. In the future, we may want to be
    more sophisticated however and handle hard vs. soft mounts
    differently, or specify some upper limit on how long we'll wait for
    a new TGT to be acquired.


There are some timeout checks in place in the RHEL6 kernel, but
they all seem to eventually loop into infinity, surprisingly even
if running 'soft'.

    nfs4_handle_exception:
            case -EKEYEXPIRED:
                    ret = nfs4_delay(server->client, &exception->timeout);
                    if (ret != 0)
                            break;

    nfs4_async_handle_error:
            case -EKEYEXPIRED:
                    rpc_delay(task, NFS4_POLL_RETRY_MAX);
                    task->tk_status = 0;
                    return -EAGAIN;
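
To make the effect of those two cases easier to see, here is a small user-space model of the retry loop as I understand it from the patch description. It is not kernel code: the function name, the `rpc` and `sleep` callables, and the delay bounds (100 and 15000 ticks) are all assumptions chosen for illustration. The point it shows is that an expired-key result is retried with a doubling, capped delay and no retry limit, while any other result goes straight back to the caller.

```python
# Linux errno values, hard-coded so the sketch runs on any platform.
EACCES = 13
EKEYEXPIRED = 127


def call_with_backoff(rpc, sleep, retry_min=100, retry_max=15000):
    """Retry rpc() for as long as it returns -EKEYEXPIRED.

    rpc:   hypothetical callable returning 0 or a negative errno.
    sleep: hypothetical callable that waits for the given delay.
    The delay bounds are illustrative, not taken from the kernel.
    """
    timeout = 0
    while True:  # no upper bound on attempts, hence the apparent hang
        status = rpc()
        if status != -EKEYEXPIRED:
            return status  # success, -EACCES, etc. reach the caller
        if timeout <= 0:
            timeout = retry_min
        if timeout > retry_max:
            timeout = retry_max
        sleep(timeout)
        timeout *= 2  # exponential backoff; capped on the next pass


if __name__ == "__main__":
    # Ticket stays expired for three attempts, then is renewed.
    results = iter([-EKEYEXPIRED, -EKEYEXPIRED, -EKEYEXPIRED, 0])
    delays = []
    status = call_with_backoff(lambda: next(results), delays.append)
    print(status, delays)
```

If the ticket is never renewed, `rpc` keeps returning -EKEYEXPIRED and the loop never exits, which matches the indefinite hang reported above.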

At the moment, this basically seems to boil down to:

In RHEL5, a long running process that continued operating after the TGT
expired would suddenly get 'access denied' when reading or writing
data that it may have been using moments before. Unless the application
was reasonably well written, that generally meant it just crashed.
Depending on the situation, that could easily have resulted in corrupted
state.

In RHEL6, that same process is basically blocked on I/O after the TGT
expires: it sits there and spins, waiting for the filesystem to be
available again. From a single dumb application's point of view, this is
a pretty good approach. It effectively gets stuck in I/O wait, and when
the TGT is finally renewed, it can continue processing as if nothing
happened.  Obviously this can cause some confusion, and is not ideal for
really smart applications which might be able to recover in some other
manner (use other space, continue processing and write later, etc.) or
single threaded interactive applications which will seemingly just
freeze.

Given the two paths, I certainly see the draw of the RHEL6 approach,
particularly for someone who was not thinking of NFS mounted home
directories. Obviously with home directories involved (especially those
backing a graphical session) it tends to lock things up, but probably
results in less actual corruption than RHEL5.  Additionally, given a renewed
TGT in RHEL6, all state would conceivably be recoverable if not sitting
for long periods. That is probably less true when an NFS based home
directory with a graphical session is in play.

A few solutions are coming to mind, some far better than others.
Unfortunately the best choices are going to involve a fair amount of
work.

Comment 3 Steve Dickson 2011-09-27 18:31:16 UTC
I'm thinking we are probably not going to go much further, in RHEL6,
than Jeff's upstream patches that deal with this problem. As stated,
RHEL 6 deals with expired credentials better, but not perfectly for all
applications. So barring some unexpected breakthrough upstream,
I am going to close this as NOTABUG.