Bug 1767664

Summary: Backport CIFS stale ESTALE handling and dentry revalidation patches
Product: Red Hat Enterprise Linux 8 Reporter: Dave Wysochanski <dwysocha>
Component: kernelAssignee: Dave Wysochanski <dwysocha>
kernel sub component: CIFS QA Contact: xiaoli feng <xifeng>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: cifs-maint, fsorenso, lsahlber, piastryyy, xzhou
Version: ---Keywords: Reproducer
Target Milestone: rc   
Target Release: 8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: kernel-4.18.0-150.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1771657 (view as bug list) Environment:
Last Closed: 2020-04-28 16:31:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1771657    

Description Dave Wysochanski 2019-10-31 23:35:18 UTC
Request to backport the following to RHEL kernels.

commit 30573a82fb179420b8aac30a3a3595aa96a93156
Author: Pavel Shilovsky <piastryyy>
Date:   Mon Sep 30 10:06:18 2019 -0700

    CIFS: Gracefully handle QueryInfo errors during open
    
    Currently if the client identifies problems when processing
    metadata returned in CREATE response, the open handle is being
    leaked. This causes multiple problems like a file missing a lease
    break by that client which causes high latencies to other clients
    accessing the file. Another side-effect of this is that the file
    can't be deleted.
    
    Fix this by closing the file after the client hits an error after
    the file was opened and the open descriptor wasn't returned to
    the user space. Also convert -ESTALE to -EOPENSTALE to allow
    the VFS to revalidate a dentry and retry the open.
    
    Cc: <stable.org>
    Signed-off-by: Pavel Shilovsky <pshilov>
    Signed-off-by: Steve French <stfrench>

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 3758237bf951..5ad15de2bb4f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -253,6 +253,12 @@ int cifs_posix_open(char *full_path, struct inode **pinode,
                rc = cifs_get_inode_info(&inode, full_path, buf, inode->i_sb,
                                         xid, fid);
 
+       if (rc) {
+               server->ops->close(xid, tcon, fid);
+               if (rc == -ESTALE)
+                       rc = -EOPENSTALE;
+       }
+
 out:
        kfree(buf);
        return rc;

Comment 2 Dave Wysochanski 2019-11-04 19:15:26 UTC
Trying to refine reproducer to see if we can find increasing slab usage or something else.  Looks like this reproduces ESTALE fairly easily.  server is running 4.18.0-80.el8

[1]   Running                 tcpdump -i ens3 -w /tmp/tcpdump.pcap host 192.168.122.44 &  (wd: /tmp)
[2]-  Running                 nohup stap -v -e 'probe module("cifs").function("cifs_get_inode_info").return { if ($return) { printf("%s %s returning %d\n", tz_ctime(gettimeofday_s()), ppfunc(), $return) } }' &
[3]+  Running                 while true; do
    python3 -c "with open('/mnt/example/bz1767664/1') as f: print(f)"; rm -i -f /mnt/example/bz1767664/1; touch /mnt/example/bz1767664/1; touch /mnt/example/bz1767664/2;
done &

Roughly every 30 seconds:
# touch: cannot touch '/mnt/example/bz1767664/1': No such file or directory
touch: setting times of '/mnt/example/bz1767664/2': No such file or directory
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/example/bz1767664/1'
touch: setting times of '/mnt/example/bz1767664/2': No such file or directory
Traceback (most recent call last):
  File "<string>", line 1, in <module>
OSError: [Errno 116] Stale file handle: '/mnt/example/bz1767664/1'


# tail nohup.out 
Mon Nov  4 13:37:08 2019 EST cifs_get_inode_info returning -2
Mon Nov  4 13:37:08 2019 EST cifs_get_inode_info returning -2
Mon Nov  4 13:37:08 2019 EST cifs_get_inode_info returning -2
Mon Nov  4 13:37:08 2019 EST cifs_get_inode_info returning -116
Mon Nov  4 13:37:08 2019 EST cifs_get_inode_info returning -116
Mon Nov  4 13:37:38 2019 EST cifs_get_inode_info returning -2
Mon Nov  4 13:37:38 2019 EST cifs_get_inode_info returning -2
Mon Nov  4 13:37:39 2019 EST cifs_get_inode_info returning -2
Mon Nov  4 13:37:39 2019 EST cifs_get_inode_info returning -116
Mon Nov  4 13:37:39 2019 EST cifs_get_inode_info returning -116

Comment 3 Dave Wysochanski 2019-11-04 19:54:14 UTC
This series had 2 other patches, so 3 patches total:
https://patchwork.kernel.org/patch/11166977/
https://patchwork.kernel.org/patch/11166979/
https://patchwork.kernel.org/patch/11166981/

30573a82fb17 CIFS: Gracefully handle QueryInfo errors during open
c82e5ac7fe35 CIFS: Force revalidate inode when dentry is stale
0b3d0ef9840f CIFS: Force reval dentry if LOOKUP_REVAL flag is set

Comment 5 Frank Sorenson 2019-11-12 17:55:12 UTC
This is also reproduced using 2 smb clients mounting the same share:

*** client 1 ***
# mount //vm3/user1 /mnt/vm3 -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds
# cd /mnt/vm3
# echo foo > foo ; echo bar > bar

# cat foo bar
foo
bar


*** client 2 ***
# mount //vm3/user1 /mnt/vm3 -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds
# cd /mnt/vm3

# cat foo bar
foo
bar


*** smb server ***
# smbstatus | grep -i locked -A1000
No locked files


*** client 2 ***
# while true ; do cat foo bar >/dev/null 2>/dev/null ; cat bar || break ; done
bar
bar
bar
...


*** client 1 ***
# mv -f foo bar


*** client 2 ***
...
bar
bar
bar
cat: bar: Stale file handle


*** smb server ***

# smbstatus | grep -i locked -A1000
Locked files:
Pid          Uid        DenyMode   Access      R/W        Oplock           SharePath   Name   Time
--------------------------------------------------------------------------------------------------
22254        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   bar   Tue Nov 12 11:20:12 2019
22254        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   bar   Tue Nov 12 11:20:12 2019
22254        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   bar   Tue Nov 12 11:20:12 2019
22254        501        DENY_NONE  0x120089    RDONLY     LEASE(RWH)       /home/user1   bar   Tue Nov 12 11:20:12 2019

Comment 7 Phillip Lougher 2019-11-15 17:09:54 UTC
Patch(es) available on kernel-4.18.0-150.el8

Comment 11 Bruno Meneguele 2019-12-10 04:41:12 UTC
*** Bug 1768695 has been marked as a duplicate of this bug. ***

Comment 13 errata-xmlrpc 2020-04-28 16:31:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1769