773705 – cifs: i/o error on copying file > 102336 bytes


Summary: cifs: i/o error on copying file > 102336 bytes

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 6
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	6.2
Hardware:	x86_64
OS:	Unspecified
Priority:	urgent
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Jeff Layton
QA Contact:	Jian Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	789058

Reported:	2012-01-12 16:54 UTC by illtud
Modified:	2014-06-18 07:42 UTC
CC List:	10 users
Fixed In Version:	kernel-2.6.32-230.el6
Doc Type:	Bug Fix
Doc Text:	Windows clients never send write requests larger than 64 KB but the default size for write requests in Common Internet File System (CIFS) was set to a much larger value. Consequently, write requests larger than 64 KB caused various problems on certain third-party servers. This update lowers the default size for write requests to prevent this bug. The user can override this value to a larger one to get better performance.
Clone Of:
Environment:
Last Closed:	2012-06-20 08:19:11 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2012:0862	0	normal	SHIPPED_LIVE	Moderate: Red Hat Enterprise Linux 6 kernel security, bug fix and enhancement update	2012-06-20 12:55:00 UTC

Description illtud 2012-01-12 16:54:18 UTC

Description of problem:

Copying files to a CIFS mount fails (the server is a Solaris storage appliance). If the file is 102,336 bytes or smaller, it works fine. Any bigger and it creates a zero-byte file and gives me:

"cp: closing `/mnt/SCIF2/newspapers/ocr/apnav/mets/foo': Input/output error"

The share is mounted with cifs:
//myserver.example.com/scan_scan2/SCIF2 on /mnt/SCIF2 type cifs (rw)


Version-Release number of selected component (if applicable):

cifs-utils-4.8.1-5.el6.x86_64

How reproducible:

Always for me

Steps to Reproduce:
1. dd if=/dev/zero of=foo bs=1 count=102337
2. cp -f foo bar /mnt/SCIF2/newspapers/ocr/apnav/mets
  
Actual results:

"cp: closing `/mnt/SCIF2/newspapers/ocr/apnav/mets/foo': Input/output error"

Expected results:

copy the file

Additional info:

Works fine on RHEL 5.*, FC15, etc. I can't find any significance in the 102,337-byte threshold. Any ideas?
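
The reproduction steps above can be wrapped in a small script that probes both sides of the reported threshold. This is a minimal sketch, not part of the original report: the probe_ file names and the optional target-directory argument are hypothetical, and the target defaults to a scratch directory so the script also runs without a CIFS mount. It writes one block of bs=<size> instead of many 1-byte blocks only to make dd faster.

```shell
#!/bin/sh
# Target directory: pass the CIFS mount path as $1. Defaults to a scratch
# directory (assumption, for dry runs without a mount).
target=${1:-$(mktemp -d)}
work=$(mktemp -d)

for size in 102336 102337; do
    f="$work/probe_$size"
    # One block of exactly $size zero bytes.
    dd if=/dev/zero of="$f" bs="$size" count=1 2>/dev/null
    if cp -f "$f" "$target/"; then
        echo "$size bytes: ok"
    else
        echo "$size bytes: FAILED"
    fi
done

rm -rf "$work"
```

Pointed at the affected mount (for example /mnt/SCIF2/newspapers/ocr/apnav/mets), the 102337-byte copy would be expected to report FAILED while the 102336-byte copy succeeds.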

Comment 2 Jeff Layton 2012-01-12 18:35:13 UTC

(cc'ing Sachin and Steve French, the upstream CIFS maintainer)

Yes. This is actually a bug in Solaris. See the thread here where I proposed a patch to address this that Steve has so far NAK'ed:

    http://thread.gmane.org/gmane.linux.kernel.cifs/4909/focus=4981

I also traded emails with Gordon Ross (who works with Nexenta) on this matter and here's what he said:

------------------[snip]------------------
I just looked at this.  The current code allocates a buffer large enough
for whatever size the NetBIOS-style header length indicates.
However, there's a bug in the write path (which I just stumbled upon
this week due to activities unrelated to this discussion) so that the
marshalling code clamps the size at 0x19000 (102400).

So the good news is, we'll probably "just work" with the Linux client
after this other bug gets fixed.
------------------[snip]------------------

Setting the wsize smaller, to maybe wsize=98304 or so, should work around the problem for now. If you can also report this bug to Oracle, then maybe they can fix it there too and you can get the benefit of the larger wsize.
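
The numbers in this thread line up with simple arithmetic. The sketch below subtracts the largest file size that copied successfully (102336, from the description) from the 0x19000 clamp quoted above. Reading the 64-byte difference as per-request framing (SMB header plus Write AndX parameters) is an assumption inferred from those two numbers, not something stated in the thread.

```shell
#!/bin/sh
clamp=$((0x19000))                  # server-side clamp quoted above (102400)
largest_ok=102336                   # largest file size that copied successfully
overhead=$((clamp - largest_ok))    # assumed to be per-request framing bytes
echo "clamp=$clamp overhead=$overhead"

wsize=98304                         # suggested workaround value
request=$((wsize + overhead))       # approximate size of a full write request
echo "wsize=$wsize -> request of about $request bytes, $((clamp - request)) under the clamp"
```

Under that assumption, wsize=98304 (24 pages of 4096 bytes) leaves a few kilobytes of headroom below the clamp.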

Comment 3 Jeff Layton 2012-01-12 20:56:25 UTC

For now, I'll set this to needinfo from you. If you can confirm whether mounting with a smaller wsize= value helps, then we can determine what to do from there.

Comment 4 illtud 2012-01-13 09:19:29 UTC

Great, thanks, I'll try the smaller wsize. In the meantime, is there any explanation for why it works fine in RHEL5.x?

Comment 5 illtud 2012-01-13 12:03:30 UTC

Sorry, I can see from the mount.cifs man page:

wsize=bytes
           Maximum amount of data that the kernel will send in a write request
           in bytes. Prior to RHEL6.2 kernels, the default and maximum was
           57344 (14 * 4096 pages). As of RHEL6.2, the default is 1M, and the
           maximum allowed is 16M.

Which answers my question.
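
Combining the man page defaults with the 0x19000 clamp from comment 2 shows why older kernels were unaffected. This is a rough sketch: check_wsize is a hypothetical helper, and it compares only the payload size against the clamp, ignoring any per-request framing bytes.

```shell
#!/bin/sh
clamp=$((0x19000))              # 102400, server-side clamp from comment 2
old_default=$((14 * 4096))      # 57344, default and maximum before RHEL6.2
new_default=$((1024 * 1024))    # 1M, default as of RHEL6.2

check_wsize() {
    # $1 = wsize in bytes; compares the payload alone against the clamp
    if [ "$1" -lt "$clamp" ]; then
        echo "wsize=$1: below clamp"
    else
        echo "wsize=$1: exceeds clamp"
    fi
}

check_wsize "$old_default"
check_wsize "$new_default"
```

The old 57344-byte maximum could never reach the clamp, while the 1M default exceeds it on any sufficiently large write.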

Comment 6 illtud 2012-01-13 12:04:39 UTC

I can confirm that wsize=98304 fixes the issue for me. Thanks for your help.
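
For reference, the workaround amounts to passing wsize as a mount option. The sketch below only prints the command instead of running it, since mounting needs root and a reachable server. cifs_mount_cmd is a hypothetical helper, the share and mount point are the ones from the description, and credentials or other options a real mount would need are deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper: prints a mount command with an explicit wsize.
cifs_mount_cmd() {
    # $1 = share (UNC path), $2 = mount point, $3 = wsize in bytes
    printf 'mount -t cifs %s %s -o wsize=%s\n' "$1" "$2" "$3"
}

cifs_mount_cmd //myserver.example.com/scan_scan2/SCIF2 /mnt/SCIF2 98304
```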

Comment 7 Jeff Layton 2012-01-13 12:28:07 UTC

No problem. Since this is a server bug, I think we probably will need to go ahead and close this with a resolution of NOTABUG. Please reopen this bug if you want to discuss it further.

If you could also report this to Oracle and ask them to fix this bug in the server then that would be good too.

Comment 8 Jeff Layton 2012-01-18 15:49:11 UTC

Reopening the bug...

Steve ended up taking my patch to lower the default wsize upstream. We also ran into a similar problem with a BlueArc server. I think keeping the wsize high is probably semi-dangerous.

I'll plan to use this bug to get that patch into RHEL6.

Comment 12 RHEL Program Management 2012-01-18 18:09:25 UTC

This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 14 Aristeu Rozanski 2012-02-10 23:00:10 UTC

Patch(es) available on kernel-2.6.32-230.el6

Comment 16 Jian Li 2012-02-17 10:29:03 UTC

(In reply to comment #8)
> Reopening the bug...
> 
> Steve ended up taking my patch to lower the default wsize upstream. We also ran
> into a similar problem with a BlueArc server. I think keeping the wsize high is
> probably semi-dangerous.
> 
> I'll plan to use this bug to get that patch into RHEL6.

Hi Jeff,
I tried to reproduce the bug but could not.

#cat /proc/mounts
//10.66.13.199/cifs/ /mnt/test cifs rw,relatime,sec=ntlm,unc=\\10.66.13.199\cifs,username=test,uid=0,noforceuid,gid=0,noforcegid,addr=10.66.13.199,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=131008 0 0

# Info captured by wireshark:
  2.053739  10.16.42.37 -> 10.66.13.199 SMB Write AndX Request, FID: 0x0001, 102337 bytes at offset 0
...snip
  2.345623 10.66.13.199 -> 10.16.42.37  SMB Write AndX Response, FID: 0x0001, 102337 bytes

Solaris server info:
root@solaris:/export/home/cifs# cat /etc/release 
                           Oracle Solaris 11 11/11 X86
  Copyright (c) 1983, 2011, Oracle and/or its affiliates.  All rights reserved.
                            Assembled 18 October 2011
root@solaris:/export/home/cifs# ifconfig net0
net0: flags=1004843<UP,BROADCAST,RUNNING,MULTICAST,DHCP,IPv4> mtu 1500 index 3
        inet 10.66.13.199 netmask fffffe00 broadcast 10.66.13.255
        ether 44:37:e6:5e:91:83 
root@solaris:/export/home/cifs# uname -a
SunOS solaris 5.11 11.0 i86pc i386 i86pc


If I test the latest kernel and find that no 'Write AndX Request' is longer than 64K, would that verify the fix?
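
A check like the one proposed here could be scripted against the capture summary. The sketch below is an illustration only: max_write_size is a hypothetical helper that assumes summary lines shaped like the two quoted in this comment, with the byte count directly before the word "bytes".

```shell
#!/bin/sh
# Reads capture summary lines on stdin and prints the largest
# Write AndX Request size found (0 if there are none).
max_write_size() {
    awk '/Write AndX Request/ {
        for (i = 2; i <= NF; i++)
            if ($i == "bytes" && $(i - 1) + 0 > max)
                max = $(i - 1) + 0
    }
    END { print max + 0 }'
}

# Sample input: the two lines quoted in this comment.
largest=$(max_write_size <<'EOF'
  2.053739  10.16.42.37 -> 10.66.13.199 SMB Write AndX Request, FID: 0x0001, 102337 bytes at offset 0
  2.345623 10.66.13.199 -> 10.16.42.37  SMB Write AndX Response, FID: 0x0001, 102337 bytes
EOF
)

if [ "$largest" -le 65536 ]; then
    echo "largest write $largest: within 64K"
else
    echo "largest write $largest: exceeds 64K"
fi
```

On a kernel with the lowered default, the same check run over a fresh capture would be expected to stay within 64K unless wsize is overridden.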

Comment 17 Jeff Layton 2012-02-17 12:20:11 UTC

Well...that looks odd.

102337 isn't a multiple of PAGE_SIZE. What sort of writes are you doing from
userspace with this test?

Comment 18 Jeff Layton 2012-02-17 12:26:45 UTC

Also, I think the problem is that the RFC1002 size gets pushed over 0x19000 bytes.
How big is the SMB frame?

It's also possible that Oracle fixed this in more recent versions of Solaris.

Regardless, we still need to lower the default size even if your Solaris box
works. What you really want to test for here is that the wsize now defaults
to 65536 when unix extensions aren't enabled.
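
One way to check the negotiated value is to pull wsize out of the mount table. The sketch below is illustrative: wsize_of is a hypothetical helper that expects lines in /proc/mounts format on stdin, and the sample line is the one from comment 16 (taken before this change), used here only to exercise the parsing.

```shell
#!/bin/sh
# Prints the wsize option of the cifs mount at the given mount point.
wsize_of() {
    # $1 = mount point; mount-table lines are read from stdin
    awk -v mp="$1" '$2 == mp && $3 == "cifs" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^wsize=/) {
                sub(/^wsize=/, "", opts[i])
                print opts[i]
            }
    }'
}

# Sample input: the /proc/mounts line from comment 16.
found=$(wsize_of /mnt/test <<'EOF'
//10.66.13.199/cifs/ /mnt/test cifs rw,relatime,sec=ntlm,unc=\\10.66.13.199\cifs,username=test,uid=0,noforceuid,gid=0,noforcegid,addr=10.66.13.199,file_mode=0755,dir_mode=0755,serverino,rsize=16384,wsize=131008 0 0
EOF
)
echo "wsize=$found (new default without unix extensions should be 65536)"
```

On a live system the same helper can be fed directly, for example: wsize_of /mnt/test < /proc/mounts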

Comment 19 Jian Li 2012-02-20 02:09:17 UTC

(In reply to comment #17)
> Well...that looks odd.
> 
> 102337 isn't a multiple of PAGE_SIZE. What sort of writes are you doing from
> userspace with this test?

I just tested as in comment 0. The CIFS I/O is generated by 'cp'. The test file is 102337 bytes.

Comment 20 Jian Li 2012-02-20 03:24:44 UTC

(In reply to comment #18)

I tested 3 cases.
1. nounix is used, and no wsize is set
  mount info shows wsize = 65536, and "Write" packets captured by tshark have a size of 65536.

2. neither nounix nor wsize is used
  When the Solaris smb server is used, mount info shows wsize = 65536, and "Write" packets captured by tshark have a size of 65536.
  When a samba/linux server is used (unix extension = yes), mount info shows wsize = 65536.

3. wsize is used
  wsize cannot exceed 131008 (CIFS_MAX_RFC1002_WSIZE), whatever value is set.

  So, how can I make wsize >= CIFS_DEFAULT_IOSIZE (1M)? Maybe CIFS_UNIX_LARGE_WRITE_CAP is not enabled?

Comment 21 Jian Li 2012-02-20 05:04:14 UTC

(In reply to comment #20)


>   So, how to make wsize >= CIFS_DEFAULT_IOSIZE(1M)?? maybe
> CIFS_UNIX_LARGE_WRITE_CAP is not open?

3. wsize is used
  Samba must be configured to support larger writes. wsize can then be set to a value in the range (?? ~ CIFS_MAX_WSIZE, or ~16M). Check /proc/mounts and the packets captured by tshark.

Jeff, would this verify the fix?

Comment 22 Jian Li 2012-02-20 05:06:58 UTC

(In reply to comment #20)
> (In reply to comment #18)
> 

> 2. neither nounix nor wsize is used
>   When solaris smb server is used, mount info show wsize = 65536, and "Write"
> packets catched by tshark have size of 65536.
>   when samba/linux server is used(unix extension = yes), mount info show wsize
> = 65536. 
> 
When the samba/linux server supports larger writes, the default wsize = 1048576.

Comment 23 Jeff Layton 2012-02-20 11:20:37 UTC

In order to get wsizes larger than 65536, you'll need to configure the server
to enable unix extensions. For samba, that means turning this on:

     unix extensions = yes

...and configure it to set CAP_LARGE_WRITE_X in the capabilities flags. To do
this with samba, you need to set:

     min receivefile size = 4096

...in truth, I believe you can set it to smaller or larger values, but below
4096 or so, there's probably no real reason to use splice().

Comment 25 Jian Li 2012-02-28 01:47:16 UTC

This patch was tested with linux/samba. The bug can't be reproduced with Solaris 11, which may have fixed the problem on its own side. In my tests I tried various wsize values, and tested general I/O with different wsize values using the fsx program.

In my tests wsize took the following values:
131008, 65536, 60000, 131008, 18000, 16777151, 1048576.

Comment 26 Tomas Capek 2012-03-08 11:40:59 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Windows clients never send write requests larger than 64 KB but the default size for write requests in Common Internet File System (CIFS) was set to a much larger value. Consequently, write requests larger than 64 KB caused various problems on certain third-party servers. This update lowers the default size for write requests to prevent this bug. The user can override this value to a larger one to get better performance.

Comment 28 errata-xmlrpc 2012-06-20 08:19:11 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0862.html
