Bug 1767664
| Summary: | Backport CIFS stale ESTALE handling and dentry revalidation patches | |||
|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Dave Wysochanski <dwysocha> | |
| Component: | kernel | Assignee: | Dave Wysochanski <dwysocha> | |
| kernel sub component: | CIFS | QA Contact: | xiaoli feng <xifeng> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | high | |||
| Priority: | unspecified | CC: | cifs-maint, fsorenso, lsahlber, piastryyy, xzhou | |
| Version: | --- | Keywords: | Reproducer | |
| Target Milestone: | rc | Flags: | pm-rhel:
mirror+
|
|
| Target Release: | 8.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | kernel-4.18.0-150.el8 | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1771657 (view as bug list) | Environment: | ||
| Last Closed: | 2020-04-28 16:31:50 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1771657 | |||
Trying to refine reproducer to see if we can find increasing slab usage or something else. Looks like this reproduces ESTALE fairly easily. server is running 4.18.0-80.el8
[1] Running tcpdump -i ens3 -w /tmp/tcpdump.pcap host 192.168.122.44 & (wd: /tmp)
[2]- Running nohup stap -v -e 'probe module("cifs").function("cifs_get_inode_info").return { if ($return) { printf("%s %s returning %d\n", tz_ctime(gettimeofday_s()), ppfunc(), $return) } }' &
[3]+ Running while true; do
python3 -c "with open('/mnt/example/bz1767664/1') as f: print(f)"; rm -i -f /mnt/example/bz1767664/1; touch /mnt/example/bz1767664/1; touch /mnt/example/bz1767664/2;
done &
Roughly every 30 seconds:
# touch: cannot touch '/mnt/example/bz1767664/1': No such file or directory
touch: setting times of '/mnt/example/bz1767664/2': No such file or directory
Traceback (most recent call last):
File "<string>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/example/bz1767664/1'
touch: setting times of '/mnt/example/bz1767664/2': No such file or directory
Traceback (most recent call last):
File "<string>", line 1, in <module>
OSError: [Errno 116] Stale file handle: '/mnt/example/bz1767664/1'
# tail nohup.out
Mon Nov 4 13:37:08 2019 EST cifs_get_inode_info returning -2
Mon Nov 4 13:37:08 2019 EST cifs_get_inode_info returning -2
Mon Nov 4 13:37:08 2019 EST cifs_get_inode_info returning -2
Mon Nov 4 13:37:08 2019 EST cifs_get_inode_info returning -116
Mon Nov 4 13:37:08 2019 EST cifs_get_inode_info returning -116
Mon Nov 4 13:37:38 2019 EST cifs_get_inode_info returning -2
Mon Nov 4 13:37:38 2019 EST cifs_get_inode_info returning -2
Mon Nov 4 13:37:39 2019 EST cifs_get_inode_info returning -2
Mon Nov 4 13:37:39 2019 EST cifs_get_inode_info returning -116
Mon Nov 4 13:37:39 2019 EST cifs_get_inode_info returning -116
This series had 2 other patches, so 3 patches total: https://patchwork.kernel.org/patch/11166977/ https://patchwork.kernel.org/patch/11166979/ https://patchwork.kernel.org/patch/11166981/ 30573a82fb17 CIFS: Gracefully handle QueryInfo errors during open c82e5ac7fe35 CIFS: Force revalidate inode when dentry is stale 0b3d0ef9840f CIFS: Force reval dentry if LOOKUP_REVAL flag is set This is also reproduced using 2 smb clients mounting the same share: *** client 1 *** # mount //vm3/user1 /mnt/vm3 -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds # cd /mnt/vm3 # echo foo > foo ; echo bar > bar # cat foo bar foo bar *** client 2 *** # mount //vm3/user1 /mnt/vm3 -overs=3,sec=ntlmssp,credentials=/root/.user1_smb_creds # cd /mnt/vm3 # cat foo bar foo bar *** smb server *** # smbstatus | grep -i locked -A1000 No locked files *** client 2 *** # while true ; do cat foo bar >/dev/null 2>/dev/null ; cat bar || break ; done bar bar bar ... *** client 1 *** # mv -f foo bar *** client 2 *** ... bar bar bar cat: bar: Stale file handle *** smb server *** # smbstatus | grep -i locked -A1000 Locked files: Pid Uid DenyMode Access R/W Oplock SharePath Name Time -------------------------------------------------------------------------------------------------- 22254 501 DENY_NONE 0x120089 RDONLY LEASE(RWH) /home/user1 bar Tue Nov 12 11:20:12 2019 22254 501 DENY_NONE 0x120089 RDONLY LEASE(RWH) /home/user1 bar Tue Nov 12 11:20:12 2019 22254 501 DENY_NONE 0x120089 RDONLY LEASE(RWH) /home/user1 bar Tue Nov 12 11:20:12 2019 22254 501 DENY_NONE 0x120089 RDONLY LEASE(RWH) /home/user1 bar Tue Nov 12 11:20:12 2019 Patch(es) available on kernel-4.18.0-150.el8 *** Bug 1768695 has been marked as a duplicate of this bug. *** Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:1769 |
Request to backport the following to RHEL kernels. commit 30573a82fb179420b8aac30a3a3595aa96a93156 Author: Pavel Shilovsky <piastryyy> Date: Mon Sep 30 10:06:18 2019 -0700 CIFS: Gracefully handle QueryInfo errors during open Currently if the client identifies problems when processing metadata returned in CREATE response, the open handle is being leaked. This causes multiple problems like a file missing a lease break by that client which causes high latencies to other clients accessing the file. Another side-effect of this is that the file can't be deleted. Fix this by closing the file after the client hits an error after the file was opened and the open descriptor wasn't returned to the user space. Also convert -ESTALE to -EOPENSTALE to allow the VFS to revalidate a dentry and retry the open. Cc: <stable.org> Signed-off-by: Pavel Shilovsky <pshilov> Signed-off-by: Steve French <stfrench> diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 3758237bf951..5ad15de2bb4f 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -253,6 +253,12 @@ int cifs_posix_open(char *full_path, struct inode **pinode, rc = cifs_get_inode_info(&inode, full_path, buf, inode->i_sb, xid, fid); + if (rc) { + server->ops->close(xid, tcon, fid); + if (rc == -ESTALE) + rc = -EOPENSTALE; + } + out: kfree(buf); return rc;