Bug 773705
Summary: | cifs: i/o error on copying file > 102336 bytes | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | illtud <illtud.daniel> |
Component: | kernel | Assignee: | Jeff Layton <jlayton> |
Status: | CLOSED ERRATA | QA Contact: | Jian Li <jiali> |
Severity: | high | Docs Contact: | |
Priority: | urgent | ||
Version: | 6.2 | CC: | dhoward, eguan, jiali, jwest, kzhang, nmurray, rwheeler, smfltc, sprabhu, steved |
Target Milestone: | rc | Keywords: | Reopened, ZStream |
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | kernel-2.6.32-230.el6 | Doc Type: | Bug Fix |
Doc Text: |
Windows clients never send write requests larger than 64 KB but the default size for write requests in Common Internet File System (CIFS) was set to a much larger value. Consequently, write requests larger than 64 KB caused various problems on certain third-party servers. This update lowers the default size for write requests to prevent this bug. The user can override this value to a larger one to get better performance.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2012-06-20 08:19:11 UTC | Type: | --- |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 789058 |
Description
illtud
2012-01-12 16:54:18 UTC
(cc'ing Sachin and Steve French, the upstream CIFS maintainer)

Yes. This is actually a bug in Solaris. See the thread here where I proposed a patch to address this that Steve has so far NAK'ed:

http://thread.gmane.org/gmane.linux.kernel.cifs/4909/focus=4981

I also traded emails with Gordon Ross (who works with Nexenta) on this matter and here's what he said:

------------------[snip]------------------
I just looked at this. The current code allocates a buffer large enough for whatever size the NetBIOS-style header length indicates. However, there's a bug in the write path (which I just stumbled upon this week due to activities unrelated to this discussion) so that the marshalling code clamps the size at 0x19000 (102400). So the good news is, we'll probably "just work" with the Linux client after this other bug gets fixed.
------------------[snip]------------------

Setting the wsize smaller, to maybe wsize=98304 or so, should work around the problem for now. If you can simultaneously report this bug to Oracle, then maybe they can fix it there too and you can get the benefit of the larger wsize.

For now, I'll set this to needinfo from you. If you can confirm whether mounting with a smaller wsize= value helps, then we can determine what to do from there.

Great, thanks, I'll try the small wsize. In the meantime, any explanation on why it works fine in RHEL5.x?

Sorry, I can see from the man.cifs page:

wsize=bytes
Maximum amount of data that the kernel will send in a write request in bytes. Prior to RHEL6.2 kernels, the default and maximum was 57344 (14 * 4096 pages). As of RHEL6.2, the default is 1M, and the maximum allowed is 16M.

Which explains my question.

I can confirm that wsize=98304 fixes the issue for me. Thanks for your help.

No problem. Since this is a server bug, I think we probably will need to go ahead and close this with a resolution of NOTABUG. Please reopen this bug if you want to discuss it further.
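The sizes quoted in this exchange fit together arithmetically. The sketch below is an illustration only, not kernel or Solaris code. It takes the 0x19000 clamp from Gordon Ross's message and the 102336-byte threshold from the bug summary, and assumes (an inference, not something stated in the report) that the 64-byte difference between them is per-request header overhead. The names `SOLARIS_CLAMP`, `SUMMARY_THRESHOLD`, `ASSUMED_OVERHEAD`, `frame_fits` and `largest_page_aligned_wsize` are hypothetical.

```python
PAGE_SIZE = 4096

# Clamp quoted in the thread: the Solaris marshalling code limits the size to 0x19000.
SOLARIS_CLAMP = 0x19000

# Threshold from the bug summary: copies of files larger than this fail.
SUMMARY_THRESHOLD = 102336

# ASSUMPTION: the gap between the two numbers is per-request header overhead.
ASSUMED_OVERHEAD = SOLARIS_CLAMP - SUMMARY_THRESHOLD


def frame_fits(write_bytes, overhead=ASSUMED_OVERHEAD, clamp=SOLARIS_CLAMP):
    """Return True if a write of write_bytes plus the assumed overhead stays within the clamp."""
    return write_bytes + overhead <= clamp


def largest_page_aligned_wsize(overhead=ASSUMED_OVERHEAD, clamp=SOLARIS_CLAMP,
                               page_size=PAGE_SIZE):
    """Largest multiple of page_size whose frame still fits under the clamp."""
    return ((clamp - overhead) // page_size) * page_size


if __name__ == "__main__":
    # 57344 is the pre-RHEL6.2 default (14 pages); 131008 is a wsize seen later in this bug.
    for size in (14 * PAGE_SIZE, 98304, 102336, 102337, 131008):
        print(size, "fits" if frame_fits(size) else "exceeds clamp")
    print("largest page-aligned wsize:", largest_page_aligned_wsize())
```

Under that assumption, the largest page-aligned write that still fits is 24 pages, or 98304 bytes, which matches the workaround value suggested above, and the old 57344-byte default sits well below the clamp.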
If you could also report this to Oracle and ask them to fix this bug in the server then that would be good too.

Reopening the bug...

Steve ended up taking my patch to lower the default wsize upstream. We also ran into a similar problem with a BlueArc server. I think keeping the wsize high is probably semi-dangerous.

I'll plan to use this bug to get that patch into RHEL6.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Patch(es) available on kernel-2.6.32-230.el6

(In reply to comment #8)
> Reopening the bug...
>
> Steve ended up taking my patch to lower the default wsize upstream. We also ran
> into a similar problem with a BlueArc server. I think keeping the wsize high is
> probably semi-dangerous.
>
> I'll plan to use this bug to get that patch into RHEL6.

Hi Jeff,

I tried to reproduce the bug, but failed.

# cat /proc/mounts
//10.66.13.199/cifs/ /mnt/test cifs rw,relatime,sec=ntlm,unc=\\10.66.13.199\cifs,username=test,uid=0,noforceuid,gid=0,noforcegid,addr=10.66.13.199,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=131008 0 0

# info captured by wireshark:
2.053739 10.16.42.37 -> 10.66.13.199 SMB Write AndX Request, FID: 0x0001, 102337 bytes at offset 0
...snip
2.345623 10.66.13.199 -> 10.16.42.37 SMB Write AndX Response, FID: 0x0001, 102337 bytes

Solaris server info:

root@solaris:/export/home/cifs# cat /etc/release
Oracle Solaris 11 11/11 X86
Copyright (c) 1983, 2011, Oracle and/or its affiliates. All rights reserved.
Assembled 18 October 2011
root@solaris:/export/home/cifs# ifconfig net0
net0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 3
inet 10.66.13.199 netmask fffffe00 broadcast 10.66.13.255
ether 44:37:e6:5e:91:83
root@solaris:/export/home/cifs# uname -a
SunOS solaris 5.11 11.0 i86pc i386 i86pc

If I test the latest kernel and find 'Write AndX Request' packets with a length of no more than 64K, would that verify the bug?

Well...that looks odd.

102337 isn't a multiple of PAGE_SIZE. What sort of writes are you doing from userspace with this test?

Also, I think the problem is that the RFC1002 size gets pushed over 0x19000 bytes. How big is the SMB frame? It's also possible that Oracle fixed this in more recent versions of Solaris.

Regardless, we still need to lower the default size even if your Solaris box works. What you really want to test for here is that the wsize now defaults to 65536 when unix extensions aren't enabled.

(In reply to comment #17)
> Well...that looks odd.
>
> 102337 isn't a multiple of PAGE_SIZE. What sort of writes are you doing from
> userspace with this test?

I just tested as in comment 0. The CIFS I/O is generated by 'cp'. The file tested is 102337 bytes in size.

(In reply to comment #18)

I tested 3 cases.

1. nounix is used, and no wsize is set
Mount info shows wsize = 65536, and "Write" packets captured by tshark have a size of 65536.

2. neither nounix nor wsize is used
When the Solaris SMB server is used, mount info shows wsize = 65536, and "Write" packets captured by tshark have a size of 65536.
When a samba/linux server is used (unix extensions = yes), mount info shows wsize = 65536.

3. wsize is used
wsize cannot exceed 131008 (CIFS_MAX_RFC1002_WSIZE), whatever value wsize is set to.

So, how to make wsize >= CIFS_DEFAULT_IOSIZE (1M)? Maybe CIFS_UNIX_LARGE_WRITE_CAP is not enabled?

(In reply to comment #20)
> So, how to make wsize >= CIFS_DEFAULT_IOSIZE (1M)? Maybe
> CIFS_UNIX_LARGE_WRITE_CAP is not enabled?

3. wsize is used
samba should be configured to support larger writes.
wsize can be set to a value in the range (?? ~ CIFS_MAX_WSIZE or ~16M). Check /proc/mounts and the packets captured by tshark.

Jeff, could this verify the bug?

(In reply to comment #20)
> (In reply to comment #18)
>
> 2. neither nounix nor wsize is used
> When the Solaris SMB server is used, mount info shows wsize = 65536, and "Write"
> packets captured by tshark have a size of 65536.
> When a samba/linux server is used (unix extensions = yes), mount info shows wsize
> = 65536.
>

When samba/linux supports larger writes, the default wsize = 1048576.

In order to get wsizes larger than 65536, you'll need to configure the server to enable unix extensions. For samba, that means turning this on:

unix extensions = yes

...and configure it to set CAP_LARGE_WRITE_X in the capabilities flags. To do this with samba, you need to set:

min receivefile size = 4096

...in truth, I believe you can set it to smaller or larger values, but below 4096 or so, there's probably no real reason to use splice().

This patch was tested with linux/samba. The bug can't be reproduced with Solaris 11, which may have fixed its own problem. In my case, I tested a range of wsize values, and general I/O was tested with different wsize values using the fsx program. The wsize values tested were: 131008, 65536, 60000, 131008, 18000, 16777151, 1048576.

Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
Windows clients never send write requests larger than 64 KB but the default size for write requests in Common Internet File System (CIFS) was set to a much larger value. Consequently, write requests larger than 64 KB caused various problems on certain third-party servers. This update lowers the default size for write requests to prevent this bug. The user can override this value to a larger one to get better performance.
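The wsize behaviour reported in the QA comments above can be summarised as a small model. This is a sketch of what was observed in this bug, not the kernel's actual negotiation code. The function `effective_wsize` and its parameters are hypothetical, the single `large_writes` flag stands in for the server advertising unix extensions together with CAP_LARGE_WRITE_X, and the 16M upper bound is taken loosely from the man page excerpt quoted earlier, since the exact value of CIFS_MAX_WSIZE is not given in this report.

```python
# Values quoted in the comments of this bug.
LOWERED_DEFAULT_WSIZE = 65536       # default after the fix, without large-write support
CIFS_DEFAULT_IOSIZE = 1024 * 1024   # 1M default when the server supports large writes
CIFS_MAX_RFC1002_WSIZE = 131008     # cap QA observed without large-write support

# ASSUMPTION: "16M" from the man page excerpt; the exact kernel constant is not in the report.
APPROX_MAX_WSIZE = 16 * 1024 * 1024


def effective_wsize(requested=None, large_writes=False, max_wsize=APPROX_MAX_WSIZE):
    """Model the wsize shown in /proc/mounts for a given mount option and server capability.

    requested    -- value of the wsize= mount option, or None if not given
    large_writes -- True if the server enables unix extensions and CAP_LARGE_WRITE_X
    """
    if requested is None:
        # No wsize= option: pick the default for the server's capabilities.
        wsize = CIFS_DEFAULT_IOSIZE if large_writes else LOWERED_DEFAULT_WSIZE
    else:
        wsize = requested

    if not large_writes:
        # Without large-write support the request is capped at the RFC1002 limit.
        wsize = min(wsize, CIFS_MAX_RFC1002_WSIZE)

    return min(wsize, max_wsize)


if __name__ == "__main__":
    print(effective_wsize())                            # Solaris or nounix, no wsize=
    print(effective_wsize(large_writes=True))           # samba with large writes enabled
    print(effective_wsize(requested=1048576))           # wsize=1M against a plain server
    print(effective_wsize(requested=16777151, large_writes=True))
```

The four calls correspond to QA's cases: the lowered default, the 1M default once samba is configured for large writes, the 131008 cap when it is not, and an explicit large wsize that is honoured when the server allows it.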
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html