Escalated to Bugzilla from IssueTracker
Description of problem:
If a file that exceeds the quota on the server is moved to a CIFS share, there is no error message, and after the mv only part of the file is on the share. The local file is deleted!

How reproducible: 100%

Steps to Reproduce:
dd if=/dev/urandom of=testfile.dat bs=1M count=500
logger "=== Start =========================="
df -P
ls -l testfile.dat
mv testfile.dat /net/m525017/J/temp/
ls -l testfile.dat
ls -l /net/m525017/J/temp/testfile.dat
df -P
logger "=== End =========================="

Actual results:
Only part of the moved file reaches the CIFS share. The rest is lost. No error is reported.

Expected results:
The command should fail with something like "No space left on device", and no data should be lost.

Additional info:
The file server is a standard MS-Windows Server 2003 SP1 with no modification of file services. Client mount options are:
acl,intr,nocase,nouser_xattr,rsize=8192,wsize=8192,soft,uid=$(id -u),gid=$(id -g),dirmode=0770,filemode=0660,credentials=${CREDENTIALS}

This event sent from IssueTracker by vincew [SEG - Storage] issue 134794
Dominik, first of all, thank you very much for reporting this. I am trying to locate the resources in-house to try reproducing the problem. In the meantime, do you think you could provide me a tcpdump trace of the network traffic while this problem is happening? It might be worth lowering the quota so that we don't end up collecting too much network traffic. I am interested in capturing all SMB/CIFS traffic with the following command:

tcpdump -i any -w /tmp/rhsupport-cifs.pcap -s 0 port 139 or port 445

I realize that CIFS is reporting write errors, which should be enough to make "mv" suspicious about the success of the operation, but I would like to see what messages are returned from the Windows server regarding the success of the operation. Thank you very much. -- Navid

Internal Status set to 'Waiting on Customer' Status set to: Waiting on Client This event sent from IssueTracker by vincew [SEG - Storage] issue 134794
Dominik, I quickly looked at the trace you sent, and it appears clear that the Windows file server is returning the NT status "STATUS_DISK_FULL" starting at frame 2736. The CIFS client continues to send write requests with an incrementing offset instead of retrying, which leads me to think that it is simply ignoring the error status previously returned. I will try to reproduce this in our labs and let you know. Thank you very much. -- Navid This event sent from IssueTracker by vincew [SEG - Storage] issue 134794
Dominik, the problem appears to be already fixed in FC6 as of kernel 2.6.22.7-57.fc6. I am currently looking at the changelogs and diffs to see if I can locate where the fix was introduced. Regards, -- Navid This event sent from IssueTracker by vincew [SEG - Storage] issue 134794
Navid, this is a severe bug. It should be fixed as soon as possible. If this doesn't make it into 5.1, we need an interim solution. -- Dominik This event sent from IssueTracker by vincew [SEG - Storage] issue 134794
I have done some further testing using the latest available kernel for RHEL5. Interestingly, while "mv" thinks the move was successful and removes the original file (causing data loss), "cp" behaves correctly in my environment. I have attached straces for both "mv" and "cp" for comparison.

"cp" does the following:

open("./navid.dat", O_WRONLY|O_CREAT|O_LARGEFILE, 0100644) = 4
read(3, "\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0"..., 16384) = 16384
write(4, "\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0"..., 16384) = 16384
[...]
read(3, "", 16384) = 0
close(4) = -1 ENOSPC (No space left on device)

As opposed to "mv", which doesn't get an error when closing the fd for the destination file:

open("./navid.dat", O_WRONLY|O_CREAT|O_LARGEFILE, 0100644) = 4
read(3, "\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0"..., 16384) = 16384
write(4, "\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0\\0"..., 16384) = 16384
[...]
utimes("/proc/self/fd/4", {1192187215, 0}) = 0
fchown32(4, 0, 0) = 0
getxattr("/tmp/navid.dat", "system.posix_acl_access", 0xbfe9aa20, 132) = -1 ENODATA (No data available)
stat64("/tmp/navid.dat", {st_mode=S_IFREG|0644, st_size=9437184, ...}) = 0
setxattr("./navid.dat", "system.posix_acl_access", "\\x02\\x00\\x00\\x00\\x01\\x00\\x06\\x00\\xff\\xff\\xff\\xff\\x04\\x00\\x04\\x00\\xff\\xff\\xff\\xff \\x00\\x04\\x00\\xff\\xff\\xff\\xff", 28, 0) = -1 EOPNOTSUPP (Operation not supported)
chmod("./navid.dat", 0100644) = 0
read(3, "", 16384) = 0
close(4) = 0

The fd is opened the same way in both cases; the only obvious difference is the ACL-related calls (getxattr and setxattr) between the last read() and the close() of the destination file (which returns 0). I will do further testing and let you know. Regards, -- Navid This event sent from IssueTracker by vincew [SEG - Storage] issue 134794
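As a side note, the reason "cp" can detect the problem at all is that it checks the return value of close(); on a buffered network filesystem that is often the first place a deferred writeback failure surfaces. A minimal sketch of that pattern (the function name copy_checked is made up for illustration; this is not the coreutils implementation):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Copy src to dst, propagating errors from write(), fsync() and --
   crucially for CIFS -- from close(), where deferred writeback
   failures such as ENOSPC are finally reported.
   Returns 0 on success, -1 with errno set on failure. */
int copy_checked(const char *src, const char *dst)
{
    char buf[16384];
    ssize_t n;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        if (write(out, buf, (size_t)n) != n)
            goto fail;
    }
    if (n < 0)
        goto fail;
    if (fsync(out) < 0)        /* force writeback so errors surface now */
        goto fail;
    close(in);
    if (close(out) < 0)        /* a failing close still means lost data */
        return -1;
    return 0;
fail:
    {
        int saved = errno;
        close(in);
        close(out);
        errno = saved;
        return -1;
    }
}
```

On a local filesystem this simply succeeds; on the CIFS mount in this report, the fsync() or close() would fail with errno set to ENOSPC, which is exactly the signal "mv" was missing.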
Yes, "cp" works here too. If the quota on the share is already reached (the disk is "full"), a "mv" gives an error message as well. It seems that mv ignores disk-full errors once it has started writing!? -- Dominik This event sent from IssueTracker by vincew [SEG - Storage] issue 134794
A short additional note: not only "mv" but also moving files with Nautilus results in data loss. -- Dominik This event sent from IssueTracker by vincew [SEG - Storage] issue 134794
Internal Status set to: 'Waiting on SEG'

Data corruption on RHEL5 using the latest kernel, the beta, and even jlayton's test kernels. Very easy to reproduce:

1. Windows box sharing a folder with quotas enforced
2. RHEL5 mounting the share using CIFS
3. dd if=/dev/zero of=/tmp/navid.dat bs=1M count=100
4. mv /tmp/navid.dat /mnt/windows/share

Windows returns STATUS_DISK_FULL and CIFS error messages are visible in the kernel logs, yet once the quota limit has been reached the command still exits successfully and removes the source file, even though the destination contents are actually truncated. This doesn't seem to affect "cp", which for some reason correctly receives an ENOSPC when doing a close() on the destination file. FC6 with 2.6.22.7-57.fc6 doesn't seem to be affected either. -- Navid

Issue escalated to Support Engineering Group by: navid. This event sent from IssueTracker by vincew [SEG - Storage] issue 134794
Ok, I set up a win2k3 export with quotas turned on. I successfully get ENOSPC errors back for both mv and cp. On a mv, the original file is not unlinked. I'm testing: 2.6.18-52.el5.jtltest.8xen ...which is pretty much the same CIFS that's going into 5.1 (a couple of extra patches, but nothing that should affect this).
Created attachment 225561 [details] 'script' output of test run
Created attachment 225571 [details] syslog from running tests
Created attachment 225581 [details] strace of cp operation
For reference, from 2.6.18-52.el5.jtltest.8xen:

write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
write(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 16384) = 16384
read(3, "", 16384) = 0
utimes("/proc/self/fd/4", {1192199452, 0}) = 0
fchown(4, 0, 0) = 0
getxattr("/tmp/bigfile.txt", "system.posix_acl_access", 0x7ffff3a72a20, 132) = -1 ENODATA (No data available)
stat("/tmp/bigfile.txt", {st_mode=S_IFREG|0644, st_size=125829120, ...}) = 0
setxattr("/mnt/cifs/bigfile.txt", "system.posix_acl_access", "\x02\x00\x00\x00\x01\x00\x06\x00\xff\xff\xff\xff\x04\x00\x04\x00\xff\xff\xff\xff \x00\x04\x00\xff\xff\xff\xff", 28, 0) = -1 EOPNOTSUPP (Operation not supported)
chmod("/mnt/cifs/bigfile.txt", 0100644) = 0
close(4) = -1 ENOSPC (No space left on device)
open("/usr/share/locale/locale.alias", O_RDONLY) = 4

...that's from trying to mv a 120M file to a volume with a 100M quota.
Created attachment 225591 [details] strace of mv operation
Created attachment 225601 [details] tcpdump of cifs session
Created attachment 225611 [details] sosreport
The return code from close() is actually governed by the return value of the flush operation. There's a difference between your box and mine in this situation:

Yours:
fs/cifs/file.c: CIFS VFS: in cifs_writepages as Xid: 18339 with uid: 0
fs/cifs/file.c: CIFS VFS: leaving cifs_writepages (xid = 18339) rc = 0
fs/cifs/file.c: Flush inode e24d71e0 file dfa07d40 rc 0

Mine:
fs/cifs/file.c: CIFS VFS: in cifs_writepage as Xid: 741191 with uid: 0
fs/cifs/file.c: write 4096 bytes to offset 21209088 of bigfile.txt
fs/cifs/file.c: CIFS VFS: in cifs_write as Xid: 741192 with uid: 0
fs/cifs/transport.c: For smb_command 47
fs/cifs/transport.c: Sending smb of length 4160
fs/cifs/connect.c: rfc1002 length 0x36)
fs/cifs/misc.c: Calculated size 51 vs length 54 mismatch for mid 16707
Status code returned 0xc000007f NT_STATUS_DISK_FULL
fs/cifs/netmisc.c: !!Mapping smb error code 112 to POSIX err -28 !!
fs/cifs/cifssmb.c: Send error in write = -28
fs/cifs/file.c: CIFS VFS: leaving cifs_write (xid = 741192) rc = -28
fs/cifs/file.c: CIFS VFS: leaving cifs_writepage (xid = 741191) rc = -28
fs/cifs/file.c: Flush inode ffff880015c13ac8 file ffff88001e6806c0 rc -28

...so on your machine, it looks like there aren't any pages actually queued up to be flushed by the time the flush call is made. Since nothing is queued up, the flush doesn't fail. There are some differences between my test rig and yours. Mine's x86_64, for instance...
I tested this using 2.6.18-20.el5 (plus 2 unrelated ipsec patches) on an x86 vmware rhel5 guest. I was denied once I exceeded the quota in the mv operation:

[root@rhel5-test1 ~]# dd if=/dev/urandom of=testfile.dat bs=1M count=200
200+0 records in
200+0 records out
209715200 bytes (210 MB) copied, 68.0668 seconds, 3.1 MB/s
[root@rhel5-test1 ~]# ls *dat
-rw-r--r-- 1 root root 209715200 Oct 12 12:30 testfile.dat
[root@rhel5-test1 ~]# mv testfile.dat /mnt/tmp2/
mv: closing `/mnt/tmp2/testfile.dat': No space left on device
[root@rhel5-test1 ~]# ls *dat
-rw-r--r-- 1 root root 209715200 Oct 12 12:30 testfile.dat
[root@rhel5-test1 ~]# ls /mnt/tmp2/*dat
-rw-r--r-- 1 root root 104857600 Oct 12 12:31 /mnt/tmp2/testfile.dat
[root@rhel5-test1 ~]#

This event sent from IssueTracker by vincew issue 134794
Note: /mnt/tmp2 is where the remote w2k3sp1 share was mounted. I used a 200MB dd file to speed up reproduction after a few 500MB tries in which the mv succeeded (because the option to deny storage to users with exceeded quota was not enabled in w2k3sp1). Jeff believes we may be seeing a race between the flushing of pages and the close(), which would explain why we don't always see this happening. --vince This event sent from IssueTracker by vincew issue 134794
Created attachment 225861 [details] reproducer program

I'm pretty sure that this has something to do with a writepages completing before the close is called, but I'm having a very hard time reproducing this. I wrote a small test program that writes out some data to a file, does an fsync, a variable-length sleep and then a close.

On a normal box:

fsync(3) = -1 ENOSPC (No space left on device)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaaaac000
write(1, "fsync returned -1 errno 28\n", 27fsync returned -1 errno 28 ) = 27
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0}) = 0
close(3) = -1 ENOSPC (No space left on device)
write(1, "close returned -1 errno 28\n", 27close returned -1 errno 28 ) = 27
exit_group(-1) = ?

...on navid's machine:

fsync(3) = -1 ENOSPC (No space left on device)
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f13000
write(1, "fsync returned -1 errno 28\n", 27fsync returned -1 errno 28 ) = 27
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({1, 0}, {1, 0}) = 0
close(3) = 0
write(1, "close returned 0 errno 28\n", 26close returned 0 errno 28 ) = 26
exit_group(0) = ?

...so on his machine the fsync returns -1 (with errno set to ENOSPC), but the close returns 0. On every other machine I've tested, both the fsync and the close return an error. I'm not sure what makes his machine "special" here.
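For reference, a reproducer along these lines might look like the following. This is a sketch reconstructed from the strace above, not the attached program; the function name, write size and sleep length are arbitrary choices:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Write `len` zero bytes to `path`, then report the results of the
   fsync() and close() separately. The bug shows up when fsync()
   returns ENOSPC but the subsequent close() still returns 0.
   Returns 0 when the two calls agree, 1 when they disagree,
   -1 on setup failure. */
int write_fsync_close(const char *path, size_t len, unsigned sleep_secs)
{
    char *buf = calloc(1, len);
    if (!buf)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    if (write(fd, buf, len) != (ssize_t)len)
        perror("write");
    int frc = fsync(fd);
    printf("fsync returned %d errno %d\n", frc, frc ? errno : 0);
    sleep(sleep_secs);            /* give writeback a chance to race */
    int crc = close(fd);
    printf("close returned %d errno %d\n", crc, crc ? errno : 0);
    free(buf);
    /* Consistent behavior: both succeed, or both fail. */
    return ((frc < 0) == (crc < 0)) ? 0 : 1;
}
```

On a healthy local filesystem both calls return 0 and the function returns 0; on the affected CIFS mount, fsync() reported ENOSPC while close() returned 0, i.e. this function would return 1.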
Ok, I think I've been able to reproduce it. I had to get the share very close to quota and change the program to write only a little bit more than the quota allows. So, for instance, I got the share to quota - 64k and then changed the program to write only 128k. With that I was able to see it. Still not certain why navid's box can reproduce it so easily, but I think there is some raciness involved and perhaps vmware makes it easier to reproduce.

Now, I think the problem is this (presuming I'm following the code correctly): when cifs_writepages is called to flush the pagecache for the file, it does this:

rc = CIFSSMBWrite2(xid, cifs_sb->tcon,
                   open_file->netfid,
                   bytes_to_write, offset,
                   &bytes_written, iov, n_iov, 1);
atomic_dec(&open_file->wrtPending);
if (rc || bytes_written < bytes_to_write) {
        cERROR(1, ("Write2 ret %d, written = %d",
                   rc, bytes_written));
        /* BB what if continued retry is requested via mount flags? */
        set_bit(AS_EIO, &mapping->flags);

...and then later it frees up the pages it attempted to flush. So if there is any error at all in the write, we toss out the page (yikes!).

When there are errors in the write, cifs_writepages (correctly) returns an error, but there is no guarantee that the writepages being done is for the close() call. If this writepages is being done for something else, and the entire pagevec for the file is flushed, then the next pass through cifs_writepages will end up with no outstanding pages to be written and will return 0.

It seems like we should keep track in the cifsFileInfo for the file that a previous cifs_writepages failed and have this function return an error until the file closes. Then again, I'm not certain what the correct behavior is for a close() that follows an unsuccessful fsync().
Created attachment 226081 [details] upstream patch -- only sync out the file on size changes in cifs_setattr This patch is against cifs-2.6 git tree. Sent to upstream maintainer for comment. I tested a variant of this on RHEL5 and it seems to fix the problem. This just prevents the cifs client from syncing out pages when setattr isn't going to change the file size. It also has cifs_setattr check the return code of filemap_write_and_wait and error out if it fails. Awaiting upstream comment at this point...
Created attachment 228691 [details] proposed upstream patch 2 A second proposed upstream patch. I'm not certain this is the right approach but it seems like it to me currently. The idea here is that we want to separate "unconscious" flushes from "conscious" ones. By unconscious, I mean when the kernel needs to free up memory, or when the flush occurs as a side effect of another operation -- like setattr. Conscious flushes would be ones that occur due to a fsync or close. The idea is that when we do any sort of filemap_fdatawait operation that we make sure we store the return code from it on an unconscious call. Then when we fsync or flush, we make sure that we return any stored errors that we have if the existing one doesn't return an error. I'm not sure I have all of the semantics right, so this is really an RFC at this point.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 237941 [details] proposed upstream patch 3 Another proposed patch -- this one removes the changes to the AIO operation (which I don't think is correct). Still awaiting comment at this point...
Created attachment 259751 [details] patch -- better handle writeback errors Proposed patch sent to upstream lists this morning: This fixes a bug in the earlier patches (write_behind_rc wasn't being properly initialized). I have some kernels on my people page with this patch: http://people.redhat.com/jlayton ...if the customer has a non-critical place to do so, it would be nice if they could test these and let me know if they fix the issue for them (or if they notice any problems with them).
Jeff,

The customer has tested your test kernels containing the patch; mv now works as expected. However, they have highlighted another issue, and I'm not sure whether it is related.

---
But downloading with Firefox does not work. While the download is in progress I do an ls on the file:

$ ls -latr Fedora-8-i386-DVD.iso
-rw-r--r-- 1 m525017 bin 180226800 20. Nov 11:11 Fedora-8-i386-DVD.iso

-> 180MB downloaded, which is exactly what Firefox's download dialog says. But on the Windows file server there are only 68MB in this file. And dmesg gives tons of:

CIFS VFS: Write2 ret -28, written = 0
---

Cheers, Paul on behalf of Navid

Internal Status set to 'Waiting on Engineering' This event sent from IssueTracker by pdwyer issue 134794
Possibly...

#define ENOSPC 28 /* No space left on device */

...are they attempting to exceed the quota (or the amount of disk space) when they copy this file to the share?
I'm not seeing this when I 'cp' or 'dd' files to a share. Once it hits the quota, the program eventually returns ENOSPC and the file shows the correct size on the cifs mount. Exactly which kernel did they test, and against what sort of server? I wonder if firefox is holding the file open for some reason? Could you have them do this test and then run:

# lsof /path/to/file/on/cifs/share

and see if firefox has it open?
Has anyone measured whether this patch affects performance much? It seems excessive to flush all pages, rather than simply the last page, when extending the file size.
No, I've not done any real performance testing with this at all. I figured we'd shoot for correctness first and then worry about performance afterward. That said, I don't think I've changed the semantics of what gets flushed when the file size changes, have I?
I misread that part of the patch - you are right, it looks like it should not affect performance. I like the idea of returning the writebehind rc earlier in some cases (than at the close), but this brings up some interesting questions, e.g. whether we should behave differently on "hard" vs. "soft" mounts when the original write error occurred, and what happens to those pages in the pagecache that we tried to write out and failed. But I am leaning toward merging this now and pushing for it to go into 2.6.24.
Steve, I've made comment #34 public here too, which you should probably be aware of. I've not been able to replicate those results, and the situation where they saw this isn't very clear. If there's time, we may want to get clarification of that before merging.

> I like the idea of returning the writebehind rc earlier in some
> cases (than the close)

I think fsync() dictates that we have to return any writeback errors at that time, so I think we want something like this patch.

> but there are some interesting questions that this brings up
> e.g. whether we should behave differently on "hard" vs. "soft"
> mounts

Should we retry this writeback on a hard mount? Good question. The mount.cifs manpage is a little vague on what hard mounts mean, but the nfs manpage says this:

hard   If an NFS file operation has a major timeout then report
       "server not responding" on the console and continue retrying
       indefinitely. This is the default.

This seems to imply that soft/hard really only comes into play when the server isn't responding. In the case of a quota overrun, the server is responding and returned an error, so I think failing the operation is appropriate.

> and what happens to those pages in the pagecache that we tried
> to write out and failed to?

Currently (both with and without this patch), they just get tossed out, which is why we want to make sure we return an error when that occurs.
It should not hurt the performance of setattr (it could actually help it), but it could hurt both flush and sync, which with your patch now do filemap_write_and_wait() instead of filemap_fdatawrite() and so could be slower. Are you sure that we have to do this for flush in order to fix the bug? That change seems drastic. I don't mind the part about reporting the writebehind rc earlier and saving the writebehind rc in more places, of course.
Yes, I think we do have to do that. If I understand correctly, filemap_fdatawrite does:

__filemap_fdatawrite(mapping, WB_SYNC_ALL);

...this basically means: "initiate writeback on all pages for the mapping, and if a page is busy, wait for it so we can initiate writeback". I don't think that waits for the writeback to actually complete, however. For that you're supposed to use filemap_fdatawait. For CIFS, though, I don't think it really matters, because we do this in cifs_writepages:

if (wbc->sync_mode != WB_SYNC_NONE)
        wait_on_page_writeback(page);

...so we're waiting on writeback to complete anyway.

We still need filemap_fdatawait, I think, though. When a page writeback fails, we set AS_EIO (or AS_ENOSPC with the patch) in the mapping flags and continue. If writeback of one page fails and the next one succeeds, we can end up returning 0 from cifs_writepages. Also, we'd never clear those bits out of the mapping flags once we did report the error. I wonder if we shouldn't get rid of write_behind_rc altogether and try to handle this with mapping->flags directly, but that seems like it would be a larger change...
So to sum up the earlier comment, I don't think filemap_fdatawait will make a big difference in performance since we wait on each page to be written out anyway. What might help performance though is to eventually change cifs_writepages to initiate writeback on all of the pages w/o waiting for them, and then go back and wait on any that aren't complete (I think that's what NFS does).
My understanding of the filemap_fdatawait and filemap_fdatawrite semantics could be wrong here though, so feel free to correct me if that's the case :-)
Why isn't the writebehind rc returned on sync?
Nevermind - the writebehind rc is returned on sync in your patch - I was looking at the wrong copy of the file for a second
Merged into mainline with the minor one-line change (from Jeff's patch attached here) discussed earlier, i.e. not doing a wait on the flush path (since that change's effect on performance is unknown and it does not appear necessary to fix this problem). Otherwise the same patch.
(In reply to comment #36)
> I'm not seeing this when I 'cp' or 'dd' files to a share. Once it hits the quota
> it eventually the program returns ENOSPC and the file shows the correct size on
> the cifs mount. Exactly which kernel did they test and against what sort of
> server? I wonder if firefox is holding the file open for some reason? Could you
> have them do this test and then run:
>
> # lsof /path/to/file/on/cifs/share
>
> and see if firefox has it open?

Customer provided the output of `lsof` just after the quota was exceeded:

# lsof /net/J/temp/Fedora-8-i386-DVD.iso
COMMAND     PID    USER  FD  TYPE DEVICE     SIZE  NODE NAME
firefox-b  6108 dominik 35w   REG   0,24 68896800 34767 /net/m525017/J/temp/Fedora-8-i386-DVD.iso

-- Navid
I'm not sure that tells me much. If they checked lsof just after the quota was exceeded, then I assume that was before the file had been written back. It's possible that there were outstanding pages still undergoing writeback when they checked this. I suspect what they're seeing is that the kernel hasn't attempted to write back the last page of this file at the time they check the file size. Since that page is still active in the mapping, the file's size reflects that page's position in the file and they see it showing up at 180M. The file is sparse, however, since some intervening pages weren't written out. If they kill off firefox after downloading this file, does the file size show up appropriately? If not, if they then do a "sync", does the file size start showing up correctly?
Ok, I think I see what may be happening... First, Fedora-8-i386-DVD.iso totals in at ~3G, not 180M. So I presume that they're looking at the file while it's still in the process of being downloaded by firefox (and hence the file is still open). Firefox just does a simple open() and writes to the file with standard synchronous write() calls. These calls generally return success, even when the partition is at quota. From the write(2) manpage:

A successful return from write() does not make any guarantee that data has been committed to disk. In fact, on some buggy implementations, it does not even guarantee that space has successfully been reserved for the data. The only way to be sure is to call fsync(2) after you are done writing all your data.

...on a networked filesystem in particular, it's tricky to manage write errors of this nature. We want good performance when writing, which means allowing the kernel to buffer things up without waiting for the server to respond. But if we allow the kernel to buffer the write and return success before actually sending the data off to the server, then we lose the ability to report that error to userspace for the write. So we can only report that error when the file is closed or an fsync() is done.

A contributing factor here is that firefox deletes the file after you cancel the download, so it doesn't bother to report that there were problems writing out the file. I'm going to argue that this behavior is expected (even if it isn't ideal).
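To make that concrete: the only way an application can reliably learn about deferred write errors on a filesystem like CIFS is to check fsync() and close() itself. A minimal sketch of that pattern (the function name finish_file is made up here for illustration):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Finish writing a file: flush buffered data out to the server and
   check both fsync() and close(). Either call can be the first place
   a deferred writeback error (e.g. ENOSPC over CIFS) is reported.
   Returns 0 on success, -1 with errno set otherwise. */
int finish_file(int fd)
{
    if (fsync(fd) < 0) {
        int saved = errno;
        close(fd);       /* still close the fd, but the data is suspect */
        errno = saved;
        return -1;
    }
    return close(fd);    /* close itself can also report ENOSPC */
}
```

A downloader that called something like finish_file() instead of a bare close() would have seen the ENOSPC from the quota overrun instead of silently leaving a truncated file on the share.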
From the original description: > Additional info: > The fileserver is a standard MS-Windows Server 2003 SP1 with no modification of > fileservices. They've changed the server configuration since reporting this, and the new problem they see sounds like a different issue. I suggest opening a new BZ for that and leaving this one to handle the issue that this patch is known to fix.
I've proposed an update of CIFS to a newer upstream rev for 5.2, so I'm going to fold this BZ into that one. *** This bug has been marked as a duplicate of 417961 ***
Unduping bug since I posted a patch with this BZ number.
in 2.6.18-62.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
(In reply to comment #52)
> From the original description:
>
> > Additional info:
> > The fileserver is a standard MS-Windows Server 2003 SP1 with no modification
> > of fileservices.
>
> They've changed the server configuration since reporting this, and the new
> problem they see sounds like a different issue. I suggest opening a new BZ for
> that and leaving this one to handle the issue that this patch is known to fix.

I had a word with the customer and there may have been a misunderstanding: they used MS-Windows Server 2003 R2 (also because R1 didn't have share-level quotas) during the whole time; they simply omitted that detail when they first reported the issue. So, from the customer's perspective the issue is only partly solved. Please let me know whether we should still open a separate BZ. -- Navid
> So, from customer's perspective the issue is only partly solved. Please let me > know if we shall still open a separate BZ. Please do, since this is a different problem than what was originally understood and fixed by the patch in this BZ.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html