Bug 486092
Summary: | httpd Sendfile troubles reading from a CIFS share | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 5 | Reporter: | Paolo Penzo <paolo.penzo> |
Component: | kernel | Assignee: | Jeff Layton <jlayton> |
Status: | CLOSED ERRATA | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | medium | Priority: | low |
Version: | 5.2 | CC: | dzickus, emcnabb, jlayton, jorton, steved |
Target Milestone: | rc | Target Release: | --- |
Hardware: | All | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2010-03-30 07:43:07 UTC |
Description
Paolo Penzo
2009-02-18 11:14:56 UTC
Have you tested this with the RHEL 5.3 kernel? If sendfile() is failing it's almost certainly a kernel bug.

I'm pretty sure this is a known bug. I think it's also broken upstream but it would be nice to confirm whether it's a problem on recent kernels.

It's still there with kernel 2.6.18-128.1.6.el5

Created attachment 348923 [details]
attempted reproducer

How exactly are you testing this? I need some idea of how to reproduce it so I can track down the problem. This is a reproducer I wrote for the fedora bug report to try and reproduce this. It works fine on CIFS AFAICT, but it uses local unix sockets so maybe that's a factor. You may want to download and build it and see if it works on your setup. Basically run it with a file sitting on a CIFS share as an argument:

$ sendfiletest /file/on/cifs.txt

...it should output the size of the file as seen by the receiving end (for bonus points, make it do a checksum of the received file). If that also works for you, then we'll need to come up with a reproducer that demonstrates the problem. I suppose we can change the reproducer so that it does a TCP connection and use netcat to send it to a new file and compare contents.

If the reproducer also fails for you, then please test the kernels on my people.redhat.com page and see if they make any difference:

http://people.redhat.com/jlayton

...they have updated CIFS code that's slated for 5.4 (plus some other patches I'm looking at for 5.5). I suspect that they'll still have the problem too, but it would be good to confirm.

The reproducer works fine: the reported size equals the size on the CIFS file. The kernel version is still 2.6.18-128.1.6.el5.

By works fine, I assume you mean that the reported size is correct. If so, that means that the reproducer is just no good. I'll need to get some info on how you're reproducing this with apache. Can you provide details?

Yes, the reproducer reported the same size as the ls command.
Apache is serving part of a web site by reading files (autocad drawings) from a remote linux server (with security=ADS). The remote CIFS share is connected by autofs using these options:

-fstype=cifs,username=XXXXXXX,workgroup=YYYYYYYY,password=ZZZZZZZZ,uid=apache,gid=tomcat,file_mode=0664,dir_mode=0775,port=139

The mount options are nice, but I'm more concerned about how you tell that the problem has been reproduced. What goes wrong when you try to download these files?

When using sendfile, the size of the received file is smaller than that of the original file.

Wrote a new reproducer that connects to an IPv4 socket (which I connected to netcat) and then sends the file there. The received file seems to be the right size when I send a file on CIFS, but the sendfile call returns an error:

sendfile(3, 4, NULL, 15886) = -1 EOVERFLOW (Value too large for defined data type)

While I haven't looked at how apache uses sendfile, if it was calling it in a loop of chunks smaller than the file size then it may give up when it gets an error. Either way, sendfile shouldn't return that error, so something is wrong here.

Created attachment 354613 [details]
patch -- fix sb->s_maxbytes so that it casts properly to a signed value
I think this patch will fix the problem. It fixes the reproducer I have so that sendfile no longer returns an EOVERFLOW error.
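To illustrate why a limit that does not "cast properly to a signed value" can make every size check fail, here is a minimal Python sketch of the arithmetic. It assumes (the thread does not state this) that the pre-patch limit was 2^63, one more than the largest signed 64-bit value. `as_signed_64` and `exceeds_limit` are hypothetical helpers that only mimic the conversion; they are not the kernel code.

```python
def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed, like a C cast to a signed 64-bit type."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def exceeds_limit(pos: int, count: int, s_maxbytes: int) -> bool:
    """Hypothetical stand-in for a check that compares a signed offset
    against the filesystem limit after casting the limit to signed."""
    return pos + count > as_signed_64(s_maxbytes)


# Assumed values, for illustration only.
ASSUMED_OLD_LIMIT = 1 << 63        # wraps to a negative number when cast
FITS_IN_SIGNED = (1 << 63) - 1     # largest value that survives the cast

if __name__ == "__main__":
    # Same byte count as in the failing sendfile() call quoted earlier.
    print(exceeds_limit(0, 15886, ASSUMED_OLD_LIMIT))
    print(exceeds_limit(0, 15886, FITS_IN_SIGNED))
```

Under that assumption, the first check reports an overflow for any non-negative offset, which would match the EOVERFLOW seen for a 15886-byte file.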
I'll see about building some test kernels soon with this patch, but if you're able to test it in the meantime and report back it would be helpful. I've also gone ahead and pushed this upstream too.

> While I haven't looked at how apache uses sendfile, if it was calling it in a
> loop of chunks smaller than the file size then it may give up when it gets an
> error.
Yes, httpd uses a non-blocking socket and will end up popping in and out of sendfile() each time the TCP socket blocks.
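The pattern described here, re-entering sendfile() each time a non-blocking socket fills up, can be sketched roughly as follows. This is not httpd's code; `send_file_nonblocking` is a hypothetical helper written for illustration, and it simply stops on the first error other than "would block", which is one way a client could end up with a short file.

```python
import os
import select
import socket


def send_file_nonblocking(sock: socket.socket, in_fd: int, size: int) -> int:
    """Send `size` bytes from in_fd over a non-blocking socket.

    Returns the number of bytes actually sent, which is smaller than
    `size` if sendfile() fails part-way through.
    """
    sock.setblocking(False)
    offset = 0
    while offset < size:
        try:
            sent = os.sendfile(sock.fileno(), in_fd, offset, size - offset)
        except BlockingIOError:
            # Socket buffer is full: wait until it is writable, then
            # call sendfile() again from the current offset.
            select.select([], [sock], [])
            continue
        except OSError as exc:
            # Any other error (EOVERFLOW in this report) ends the transfer
            # early, leaving the receiver with a truncated file.
            print(f"sendfile failed after {offset} bytes: {exc}")
            break
        if sent == 0:
            # No more data could be read from the input file.
            break
        offset += sent
    return offset
```

A caller would compare the return value with the file size to detect a truncated transfer.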
Thanks a lot for tracking this down, Jeff. This has plagued people upstream for a long time, too.
Kernels with this patch (and another one for a similar fix for get_sb_pseudo) are on my people.redhat.com page:

http://people.redhat.com/jlayton/

Please test these and report back as to whether they fix the problem for you.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

Paolo,
Have you been able to try the test kernel in comment #14? Thanks!

With kernel 2.6.18-169.el5.jtltest.90 it works fine. I apologize for the delay of my answer.

Created attachment 365350 [details]
updated reproducer
I think this program should serve as a reproducer for this. Should return success if the sendfile returns success and vice versa.
I haven't tested this on unpatched CIFS yet, so if it doesn't seem to work let me know and I'll have a closer look.
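The attached reproducer itself is not shown in this report. As a rough idea of what such a test could look like, here is a Python sketch that connects to a TCP listener (for example netcat), issues one sendfile() call for the whole file, and returns 0 on success or 1 on error. The function name `sendfile_once` and its arguments are made up for this example.

```python
import os
import socket


def sendfile_once(path: str, host: str, port: int) -> int:
    """Issue a single sendfile() call for the whole file.

    Returns 0 if the call raises no error, 1 otherwise.
    """
    size = os.stat(path).st_size
    with open(path, "rb") as src, socket.create_connection((host, port)) as sock:
        try:
            # offset=None (Linux) reads from the current file position,
            # like passing a NULL offset pointer to sendfile(2).
            sent = os.sendfile(sock.fileno(), src.fileno(), None, size)
        except OSError as exc:
            print(f"sendfile({path}) failed: {exc}")
            return 1
    print(f"sendfile sent {sent} of {size} bytes")
    return 0
```

With a listener writing to a new file on the other end, the received copy can then be compared against the original on the CIFS share by size or checksum.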
Created attachment 366791 [details]
patch -- cifs/libfs: fix sb->s_maxbytes so that it casts properly to a signed value
This is the patch that was proposed. I added the patch to get_sb_pseudo as well.
in kernel-2.6.18-172.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html