Bug 1694201
Summary: cifs repeatedly tries to open a file using smb v1 on an smb2 mount after receiving STATUS_SHARING_VIOLATION

| Field | Value |
|---|---|
| Product | Red Hat Enterprise Linux 7 |
| Component | kernel |
| Kernel sub component | CIFS |
| Version | 7.6 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Keywords | Reopened, Reproducer |
| Target Milestone | rc |
| Hardware | Unspecified |
| OS | Unspecified |
| Type | Bug |
| Fixed In Version | kernel-3.10.0-1063.el7 |
| Last Closed | 2020-03-31 19:16:21 UTC |
| Reporter | Frank Sorenson <fsorenso> |
| Assignee | Ronnie Sahlberg <lsahlber> |
| QA Contact | Murphy Zhou <xzhou> |
| CC | dwysocha, jose.paul, jshivers, kdsouza, lsahlber, rbergant, swhiteho, xzhou |
The upstream client is also broken, but in a different way. Instead of an smb request, it appears to send some invalid bytes, followed by an smb request (which may or may not be the same as what RHEL 7 sends). Instead of closing the connection with FIN, the server sends RST. However, the upstream client still loops forever, with the 'mv' operation hanging (just with different invalid communication).

I can confirm that the upstream client does send the same 'NT Create AndX Request'; however, the NBSS header is repeated, making the payload invalid. Here is the TCP payload sent by the upstream client:

    0000  00 00 00 5d 00 00 00 5d ff 53 4d 42 a2 00 00 00   ...]...].SMB....
    0010  00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
    0020  5e 15 77 09 df 6a 21 03 18 ff 00 00 00 00 0a 00   ^.w..j!.........
    0030  00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00   ................
    0040  00 00 00 00 80 00 00 00 07 00 00 00 01 00 00 00   ................
    0050  40 00 00 00 02 00 00 00 03 0a 00 5c 74 65 73 74   @..........\test
    0060  66 69 6c 64 00                                    file.

The first 4 bytes decode as NBSS:

    NetBIOS Session Service
        Message Type: Session message (0x00)
        Flags: 0x00
        Length: 0x005d

However, note that these 4 bytes are then repeated:

    0000  00 00 00 5d 00 00 00 5d

Also, because of these 4 added bytes, the NBSS frame encapsulates 4 fewer bytes at the end, so the filename ends with 'testf', and there are 4 extra bytes of TCP payload at the end.

I modified the frame to remove the additional 4 bytes of header, and it decodes as valid SMB again:

    3231 97.377091804 192.168.122.150 → 192.168.122.71 SMB 167 NT Create AndX Request, Path: \testfile

Sending SMB over the SMB2 session is still invalid; however, this does explain the difference between RHEL 7 and upstream behavior. For some reason, in addition to switching from smb2 to smb, the upstream client adds a second header.
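To make the framing defect above concrete, here is a minimal C sketch that decodes a 4-byte NBSS header using the RFC 1002 session-packet layout (type byte, flags byte whose low bit extends the length, 16-bit length) and flags the doubled-header pattern, where the SMB1 signature 0xFF 'S' 'M' 'B' appears after a second copy of the header instead of directly after the first. The helper names (nbss_decode, nbss_header_doubled, nbss_trailing_bytes) are hypothetical; this only inspects bytes and is not how the kernel or Wireshark is implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decoded NetBIOS session service header (RFC 1002 layout). */
struct nbss_hdr {
    uint8_t  type;    /* 0x00 = session message */
    uint8_t  flags;   /* low bit is the length extension bit */
    uint32_t length;  /* payload bytes that follow the 4-byte header */
};

/* Decode the header at buf; returns 0 on success, -1 if buf is too short. */
int nbss_decode(const uint8_t *buf, size_t len, struct nbss_hdr *out)
{
    if (len < 4)
        return -1;
    out->type   = buf[0];
    out->flags  = buf[1];
    out->length = ((uint32_t)(buf[1] & 0x01) << 16) |
                  ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    return 0;
}

/* Return 1 if bytes 4..7 repeat the header and the SMB1 signature only
 * starts at offset 8, i.e. the doubled-header pattern from the capture. */
int nbss_header_doubled(const uint8_t *buf, size_t len)
{
    static const uint8_t smb1_magic[4] = { 0xFF, 'S', 'M', 'B' };

    if (len < 12)
        return 0;
    return memcmp(buf, buf + 4, 4) == 0 &&
           memcmp(buf + 8, smb1_magic, 4) == 0;
}

/* Bytes of TCP payload left over after the NBSS frame ends
 * (negative if the captured payload is shorter than the frame). */
long nbss_trailing_bytes(size_t captured_len, const struct nbss_hdr *h)
{
    return (long)captured_len - 4 - (long)h->length;
}
```

Fed the first 12 bytes of the upstream payload, this decodes a session message of length 0x5d (93), reports the header as doubled, and for the full 101-byte payload (0x60 + 5) it reports 4 trailing bytes beyond the NBSS frame, matching the analysis above.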
In cifs_do_rename(), we first try to call the smb-version-specific path-based rename function, then later try to open the file using the function CIFS_open():

    static int
    cifs_do_rename(const unsigned int xid, struct dentry *from_dentry,
                   const char *from_path, struct dentry *to_dentry,
                   const char *to_path)
    {
    ...
            /* try path-based rename first */
            rc = server->ops->rename(xid, tcon, from_path, to_path, cifs_sb);
    ...
            /* open the file to be renamed -- we need DELETE perms */
            oparms.desired_access = DELETE;
            oparms.create_options = CREATE_NOT_DIR;
            oparms.disposition = FILE_OPEN;
            oparms.path = from_path;
            oparms.fid = &fid;
            oparms.reconnect = false;
    ...
            rc = CIFS_open(xid, &oparms, &oplock, NULL);
            if (rc == 0) {
                    rc = CIFSSMBRenameOpenFile(xid, tcon, fid.netfid,
                                    (const char *) to_dentry->d_name.name,
                                    cifs_sb->local_nls, cifs_remap(cifs_sb));
                    CIFSSMBClose(xid, tcon, fid.netfid);

However, CIFS_open, CIFSSMBRenameOpenFile, and CIFSSMBClose are all smb1-specific.

In CIFS_open():

    rc = smb_init(SMB_COM_NT_CREATE_ANDX, 24, tcon, (void **)&req,

In CIFSSMBRenameOpenFile():

    rc = smb_init(SMB_COM_TRANSACTION2, 15, pTcon, (void **) &pSMB,

In CIFSSMBClose():

    rc = small_smb_init(SMB_COM_CLOSE, 3, tcon, (void **) &pSMB);

So if the path-based rename fails while using a higher smb version, it falls back to using smb v1 calls, which is clearly wrong.
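The difference between calling an smb1 helper directly and dispatching through a per-dialect operations table (as struct smb_version_operations does with its .open member) can be modelled in userspace. The following C sketch is a hypothetical, heavily simplified model: struct version_ops, enum wire_op, open_hardcoded() and open_dispatched() are illustrative names and do not match the kernel's types or signatures.

```c
/* Which on-the-wire request a call would produce (illustrative only). */
enum wire_op { WIRE_SMB1_NT_CREATE_ANDX, WIRE_SMB2_CREATE };

/* Hypothetical stand-in for a per-dialect operations table. */
struct version_ops {
    const char *version;                     /* e.g. "1.0", "2.0" */
    enum wire_op (*open)(const char *path);
};

static enum wire_op smb1_open(const char *path)
{
    (void)path;
    return WIRE_SMB1_NT_CREATE_ANDX;
}

static enum wire_op smb2_open(const char *path)
{
    (void)path;
    return WIRE_SMB2_CREATE;
}

const struct version_ops smb1_ops  = { "1.0", smb1_open };
const struct version_ops smb20_ops = { "2.0", smb2_open };

/* The table is selected once, when the dialect is negotiated. */
struct server {
    const struct version_ops *ops;
};

/* Models the buggy fallback: always issues the SMB1 request. */
enum wire_op open_hardcoded(const struct server *srv, const char *path)
{
    (void)srv;
    return smb1_open(path);
}

/* Models dispatch through the table: the request matches the dialect. */
enum wire_op open_dispatched(const struct server *srv, const char *path)
{
    return srv->ops->open(path);
}
```

On a server whose table is smb20_ops, open_hardcoded() still yields an NT Create AndX, which is exactly the frame the server rejects in the capture, while open_dispatched() yields an SMB2 CREATE.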
    smb1ops.c:struct smb_version_operations smb1_operations = {
    smb1ops.c:        .rename_pending_delete = cifs_rename_pending_delete,
    smb1ops.c:        .rename = CIFSSMBRename,
    smb1ops.c:        .open = cifs_open_file,
    smb1ops.c:        .close = cifs_close_file,
    smb1ops.c:        .close_dir = cifs_close_dir,
    smb2ops.c:struct smb_version_operations smb20_operations = {
    smb2ops.c:        .rename = smb2_rename_path,
    smb2ops.c:        .open = smb2_open_file,
    smb2ops.c:        .close = smb2_close_file,
    smb2ops.c:        .close_dir = smb2_close_dir,
    smb2ops.c:struct smb_version_operations smb21_operations = {
    smb2ops.c:        .rename = smb2_rename_path,
    smb2ops.c:        .open = smb2_open_file,
    smb2ops.c:        .close = smb2_close_file,
    smb2ops.c:        .close_dir = smb2_close_dir,
    smb2ops.c:struct smb_version_operations smb30_operations = {
    smb2ops.c:        .rename = smb2_rename_path,
    smb2ops.c:        .open = smb2_open_file,
    smb2ops.c:        .close = smb2_close_file,
    smb2ops.c:        .close_dir = smb2_close_dir,
    smb2ops.c:struct smb_version_operations smb311_operations = {
    smb2ops.c:        .rename = smb2_rename_path,
    smb2ops.c:        .open = smb2_open_file,
    smb2ops.c:        .close = smb2_close_file,
    smb2ops.c:        .close_dir = smb2_close_dir,

So it seems like the open in cifs_do_rename should really look like this:

    rc = server->ops->open(xid, &oparms, &oplock, NULL);

and the CIFSSMBClose replaced with:

    server->ops->close(xid, tcon, fid.netfid);

There is no replacement for CIFSSMBRenameOpenFile, though. Can this be done using smb2 semantics?

*****

And finally, when trying to rename the open file on an smb1 mount, the operation fails, which seems like it may be the correct response anyway:

    11358 07:06:54.545544 rename("/mnt/vm3_a/testfile", "/mnt/vm3_a/testfile.bak") = -1 EBUSY (Device or resource busy) <1.925719>

The 1-2 second delay is expected, given this in the Samba code for smb1 (I believe there may be 2 waits):

    source3/include/local.h:
    /* Number of microseconds to wait before a sharing violation. */
    #define SHARING_VIOLATION_USEC_WAIT 950000

If there are two such waits, they add up to about 1.9 seconds, which is consistent with the 1.93 seconds measured above.

During the recent large credits cleanup in the smb2 codebase, we fixed a handful of similar issues. However, those changes cannot easily be backported, as they build on and depend on many other unrelated changes. There is a workaround: customers who hit this can switch back to vers=1.0 until they get a chance to upgrade to RHEL 8.

Created attachment 1555290 [details]
patch to prevent fallback to smb
A path-based rename returning EBUSY will incorrectly try opening
the file with a cifs (NT Create AndX) operation on an smb2+ mount,
which causes the server to force a session close.
If the mount is smb2+, skip the fallback.
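The rule described in the patch text can be expressed as a small predicate: only try the open-then-rename-by-handle fallback when the path-based rename returned EBUSY and the mount negotiated smb1. This C sketch illustrates that rule under simplifying assumptions; it is not the committed patch, and enum dialect and should_try_handle_rename() are hypothetical names (the kernel tracks the dialect differently).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical dialect tag for the mount. */
enum dialect { DIALECT_SMB1, DIALECT_SMB2_OR_LATER };

/* rc is the negative errno returned by the path-based rename.
 * Returns true only when the smb1-only fallback is safe to attempt. */
bool should_try_handle_rename(int rc, enum dialect d)
{
    if (rc != -EBUSY)           /* success or an unrelated error: report it */
        return false;
    if (d != DIALECT_SMB1)      /* smb1 helpers must never run on smb2+ */
        return false;
    return true;
}
```

With this guard, an smb2+ mount hands -EBUSY straight back to the caller, which matches the expected 'mv' result in the reproducer, while smb1 mounts keep their existing fallback.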
The credits cleanup did not touch this, and does not fix it. The attached patch fixes this bug, and applies cleanly to both upstream and RHEL7.

How hard would it be to do an xfstest from the reproducer provided? I think the patch will go upstream just fine, but Steve French rightly suggested there should be something in xfstests. Upstream posting for the patch: https://www.spinics.net/lists/linux-cifs/msg16914.html

Steve French responded privately regarding the xfstests.

The fix is committed upstream and in the -stable branches:

    commit 652727bbe1b17993636346716ae5867627793647
    Author: Frank Sorenson <sorenson>
    Date:   2019-04-16 08:37:27 -0500

        cifs: do not attempt cifs operation on smb2+ rename error

        A path-based rename returning EBUSY will incorrectly try opening
        the file with a cifs (NT Create AndX) operation on an smb2+ mount,
        which causes the server to force a session close.

        If the mount is smb2+, skip the fallback.

        Signed-off-by: Frank Sorenson <sorenson>
        Signed-off-by: Steve French <stfrench>
        CC: Stable <stable.org>
        Reviewed-by: Ronnie Sahlberg <lsahlber>

The reproducer can be run on a single system acting as both smb server and cifs client. The specific share, user, credentials, etc. are unimportant.

setup:

    # mount -ocredentials=/root/.user1_smb_creds,vers=2.0 //localhost/user1 /mnt/tmp
    # touch /mnt/tmp/testfile

in terminal 1:

    # smbclient -A /root/.user1_smb_creds //localhost/user1
    Try "help" to get a list of possible commands.
    smb: \> open testfile
    open file \testfile: for read/write fnum 1

in terminal 2:

    # mv /mnt/tmp/testfile /mnt/tmp/testfile.bak

If the bug exists, the 'mv' command will hang, and a packet capture will show the server responding to an open/create call with STATUS_SHARING_VIOLATION. The client will then attempt to open the file using smb1 semantics, to which the server will close or reset the TCP connection. The client will delay several seconds, reconnect the smb2 session, and re-attempt the smb1 open/create. This then repeats.
If the bug is fixed, the client will simply return an error (EBUSY) to the 'mv' command.

(In reply to Frank Sorenson from comment #12)

Thanks Frank very much! I'll try to get it into xfstests.

(In reply to Frank Sorenson from comment #12)
> Reproducer can be done on a single system, acting as both smb server and
> cifs client.
>
> specific share, user, credentials, etc. are unimportant.

The smbclient open is needed, isn't it? I tried opening the file from a shell and failed to reproduce.

> If the bug exists, the 'mv' command will hang,

In my test, 'mv' hangs for a few minutes and eventually returns EAGAIN (smbv1 returns EBUSY):
    [root@ibm-x3850x5-03 ~]# uname -r
    3.10.0-1059.el7.x86_64
    [root@ibm-x3850x5-03 ~]# mount //localhost/test /cifsmnt -o vers=2.0,credentials=/root/smb1.creds
    [root@ibm-x3850x5-03 ~]# mount | grep cifs
    //localhost/test on /cifsmnt type cifs (rw,relatime,vers=2.0,cache=strict,username=root,domain=IBM-X3850X5-03,uid=0,noforceuid,gid=0,noforcegid,addr=0000:0000:0000:0000:0000:0000:0000:0001,file_mode=0755,dir_mode=0755,soft,nounix,serverino,mapposix,rsize=65536,wsize=65536,echo_interval=60,actimeo=1)
    [root@ibm-x3850x5-03 ~]# time mv /cifsmnt/test /cifsmnt/test1
    mv: cannot move ‘/cifsmnt/test’ to ‘/cifsmnt/test1’: Resource temporarily unavailable
    real    0m40.115s
    user    0m0.002s
    sys     0m0.013s
    [root@ibm-x3850x5-03 ~]# echo $?
    1
    [root@ibm-x3850x5-03 ~]# umount /cifsmnt
    [root@ibm-x3850x5-03 ~]#
    [root@ibm-x3850x5-03 ~]# mount //localhost/test /cifsmnt -o vers=1.0,credentials=/root/smb1.creds
    [root@ibm-x3850x5-03 ~]# time mv /cifsmnt/test /cifsmnt/test1
    mv: cannot move ‘/cifsmnt/test’ to ‘/cifsmnt/test1’: Device or resource busy
    real    0m1.912s
    user    0m0.000s
    sys     0m0.006s
    [root@ibm-x3850x5-03 ~]#

Patch(es) committed on kernel-3.10.0-1063.el7

Based on the number of linked cases, this seems to hit many different customers, so I think it would justify a z-stream fix.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:1016
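To record both the error and the duration of the rename without relying on mv and the shell's time builtin, a small C helper can call rename(2) and measure it with clock_gettime(CLOCK_MONOTONIC). This is a hypothetical test aid, not part of the reproducer above; timed_rename() is an illustrative name. Based on the results in this thread, a fixed kernel on an smb2 mount would be expected to return EBUSY, whereas the unfixed 3.10.0-1059.el7 run returned EAGAIN after about 40 seconds.

```c
#define _POSIX_C_SOURCE 199309L
#include <errno.h>
#include <stdio.h>
#include <time.h>

/* Rename from -> to, store the elapsed time in seconds in *elapsed,
 * and return 0 on success or the errno value on failure. */
int timed_rename(const char *from, const char *to, double *elapsed)
{
    struct timespec start, end;
    int err = 0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    if (rename(from, to) != 0)
        err = errno;            /* capture before any other libc call */
    clock_gettime(CLOCK_MONOTONIC, &end);

    *elapsed = (double)(end.tv_sec - start.tv_sec) +
               (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return err;
}
```

For example, call timed_rename("/mnt/tmp/testfile", "/mnt/tmp/testfile.bak", &t) while smbclient holds the file open, and print strerror() of the result together with t.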
Created attachment 1549585 [details]
packet capture

Description of problem:

On an smb2 mount, if an attempt to open a file fails with STATUS_SHARING_VIOLATION, the client will retry the open operation with an smb v1 'NT Create AndX'. This results in the server closing the TCP connection. The client then re-establishes the TCP connection, renegotiates smb2, sets up a new session, connects to the tree, and retries the smb v1 operation. The server again closes the connection, and the client loops continually. The application initiating the access is blocked.

Version-Release number of selected component (if applicable):

tested with several RHEL7 kernels, including:

    3.10.0-862.9.1.el7.x86_64
    3.10.0-1006.el7.x86_64 (nightly)

How reproducible:

easy, see below

Steps to Reproduce:

in terminal 1:

    # mount -ocredentials=/root/.user1_smb_creds,vers=2.0 //vm3/user1 /mnt/vm3
    # touch /mnt/vm3/testfile
    # smbclient -A /root/.user1_smb_creds //vm3/user1
    Try "help" to get a list of possible commands.
    smb: \> open testfile
    open file \testfile: for read/write fnum 1

in terminal 2:

    # mv /mnt/vm3/testfile /mnt/vm3/testfile.bak

Actual results:

the client repeatedly attempts an smb v1 operation over smb2
the 'mv' will never complete, either successfully or with an error

Expected results:

the client only uses valid operations (not smb calls over smb2)
the 'mv' returns an error (presumably EBUSY)

Additional info:

from the packet capture:

    vm3 - server
    vm7 - client

hosts:

    192.168.122.71 vm3
    192.168.122.60 vm7

    tshark -H hosts -2 -r trace.pcap.gz

smb2 open fails with STATUS_SHARING_VIOLATION

    120 52.420169670 vm7 → vm3 SMB2 214 Create Request File: testfile
    121 52.420736516 vm3 → vm7 SMB2 143 Create Response, Error: STATUS_SHARING_VIOLATION

client retries opening with smb 'NT Create AndX'

    122 52.421352572 vm7 → vm3 SMB 163 NT Create AndX Request, Path: \testfile

server disconnects the client

    123 52.435305763 vm3 → vm7 TCP 66 445 → 39224 [FIN, ACK] Seq=346440821 Ack=2786851610 Win=44032 Len=0 TSval=2420834511 TSecr=1547066566
    124 52.435789451 vm7 → vm3 TCP 66 39224 → 445 [FIN, ACK] Seq=2786851610 Ack=346440822 Win=48512 Len=0 TSval=1547066581 TSecr=2420834511
    125 52.435812384 vm3 → vm7 TCP 66 445 → 39224 [ACK] Seq=346440822 Ack=2786851611 Win=44032 Len=0 TSval=2420834512 TSecr=1547066581

client does tcp reconnect, negotiate smb2, session setup, tree connect

    126 52.436150145 vm7 → vm3 TCP 74 39228 → 445 [SYN] Seq=3370372604 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1547066581 TSecr=0 WS=128
    127 52.436193928 vm3 → vm7 TCP 74 445 → 39228 [SYN, ACK] Seq=71852149 Ack=3370372605 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=2420834512 TSecr=1547066581 WS=128
    128 52.436588837 vm7 → vm3 TCP 66 39228 → 445 [ACK] Seq=3370372605 Ack=71852150 Win=29312 Len=0 TSval=1547066581 TSecr=2420834512
    129 52.436792226 vm7 → vm3 SMB2 172 Negotiate Protocol Request
    130 52.436807047 vm3 → vm7 TCP 66 445 → 39228 [ACK] Seq=71852150 Ack=3370372711 Win=29056 Len=0 TSval=2420834513 TSecr=1547066582
    131 52.441860471 vm3 → vm7 SMB2 272 Negotiate Protocol Response
    132 52.442298599 vm7 → vm3 TCP 66 39228 → 445 [ACK] Seq=3370372711 Ack=71852356 Win=30336 Len=0 TSval=1547066587 TSecr=2420834518
    133 52.446443896 vm7 → vm3 SMB2 190 Session Setup Request, NTLMSSP_NEGOTIATE
    134 52.447394014 vm3 → vm7 SMB2 332 Session Setup Response, Error: STATUS_MORE_PROCESSING_REQUIRED, NTLMSSP_CHALLENGE
    135 52.448873796 vm7 → vm3 SMB2 422 Session Setup Request, NTLMSSP_AUTH, User: \user1
    136 52.454091885 vm3 → vm7 SMB2 142 Session Setup Response
    137 52.454961286 vm7 → vm3 SMB2 166 Tree Connect Request Tree: \\vm3\user1
    138 52.456747268 vm3 → vm7 SMB2 150 Tree Connect Response
    139 52.457521077 vm7 → vm3 SMB2 164 Tree Connect Request Tree: \\vm3\IPC$
    140 52.458883197 vm3 → vm7 SMB2 150 Tree Connect Response
    141 52.498909139 vm7 → vm3 TCP 66 39228 → 445 [ACK] Seq=3370373389 Ack=71852866 Win=31360 Len=0 TSval=1547066644 TSecr=2420834535
    142 54.913057435 vm7 → vm3 SMB2 138 KeepAlive Request
    143 54.913315594 vm3 → vm7 SMB2 138 KeepAlive Response
    144 54.913891376 vm7 → vm3 TCP 66 39226 → 445 [ACK] Seq=2374753979 Ack=651874105 Win=33536 Len=0 TSval=1547069059 TSecr=2420836989
    145 59.925659973 vm7 → vm3 SMB2 138 KeepAlive Request
    146 59.925992968 vm3 → vm7 SMB2 138 KeepAlive Response
    147 59.926747160 vm7 → vm3 TCP 66 39226 → 445 [ACK] Seq=2374754051 Ack=651874177 Win=33536 Len=0 TSval=1547074071 TSecr=2420842002

10 seconds after initiating the new tcp connection, the client again tries to open with smb 'NT Create AndX'

    148 62.436158004 vm7 → vm3 SMB 163 NT Create AndX Request, Path: \testfile

server again disconnects

    149 62.444862128 vm3 → vm7 TCP 66 445 → 39228 [FIN, ACK] Seq=71852866 Ack=3370373486 Win=30080 Len=0 TSval=2420844521 TSecr=1547076581
    150 62.445651520 vm7 → vm3 TCP 66 39228 → 445 [FIN, ACK] Seq=3370373486 Ack=71852867 Win=31360 Len=0 TSval=1547076590 TSecr=2420844521
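One way to spot the invalid traffic in a capture like this (frames 122 and 148) is to compare each message's protocol signature with the dialect negotiated on the connection: SMB1 messages begin with 0xFF 'S' 'M' 'B' and SMB2/3 messages with 0xFE 'S' 'M' 'B'. The C sketch below assumes the caller has already stripped the 4-byte NBSS header and knows the negotiated dialect; classify_smb() and violates_session() are hypothetical names, and this is a simplification of what a dissector such as Wireshark does.

```c
#include <stddef.h>
#include <stdint.h>

enum smb_proto { PROTO_UNKNOWN, PROTO_SMB1, PROTO_SMB2 };

/* Classify an SMB message (bytes after the NBSS header) by its signature.
 * Other signatures, such as the SMB3 encryption transform header
 * (0xFD 'S' 'M' 'B'), are reported as PROTO_UNKNOWN here. */
enum smb_proto classify_smb(const uint8_t *msg, size_t len)
{
    if (len < 4 || msg[1] != 'S' || msg[2] != 'M' || msg[3] != 'B')
        return PROTO_UNKNOWN;
    if (msg[0] == 0xFF)
        return PROTO_SMB1;
    if (msg[0] == 0xFE)
        return PROTO_SMB2;
    return PROTO_UNKNOWN;
}

/* Nonzero if a message on an established session does not match the
 * negotiated dialect, e.g. an SMB1 NT Create AndX on an SMB2 session. */
int violates_session(enum smb_proto negotiated, const uint8_t *msg, size_t len)
{
    return classify_smb(msg, len) != negotiated;
}
```

This check applies after negotiation; it deliberately ignores the multi-protocol negotiate request that some clients send as the very first message on a new connection.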