From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050301 Firefox/1.0.1 Red Hat/1.0.1-1.4.3.centos4.1

Description of problem:
I wrote a small program that implements mkfile (from Solaris) using the mmap system call. Just like Solaris mkfile, my program can either create a sparse file or fill the file with zeros (allocating actual disk blocks). The latter case is done in a loop that calls mmap, followed by memset and munmap.

If the file is on a local file system, this works great. If it is on an NFS-mounted partition, data corruption occurs: modified file blocks are not sent back to the NFS server (a Solaris 9 host). They exist only in the local machine's cache.

The simplified code looks something like the listing below. Error checking and some other details have been removed. The mlen variable is initialized beforehand to a value that is a multiple of the memory page size as returned by getpagesize(). (I am still experimenting to find the optimal value for it. My first idea was to set it to 2^30, but something tells me this might cause a lot of swapping.)

    fd = open64(argv[i], O_RDWR | O_CREAT | O_TRUNC | O_LARGEFILE, 0666);
    pwrite64(fd, "\0", 1, size - 1);    /* extend the file to its full size */
    offset = 0;
    while (offset < size) {
        /* the last chunk may be shorter than mlen */
        len = (size_t) (size - offset >= mlen ? mlen : size - offset);
        buf = mmap(NULL, mlen, PROT_WRITE, MAP_SHARED, fd, offset);
        memset(buf, 0, len);
        munmap(buf, mlen);
        offset = offset + len;
    }
    close(fd);

If the target file is on an NFS-mounted partition, then after running the program, "du -sk" on the file reports that the file is almost empty (basically, a sparse file was created). To prove the point, I changed the memset line to read memset(buf, '.', len), and then created a 100kB file. If I do "less filename" on the NFS client, it shows a file full of dots. If I do the same on the NFS server, it shows a file full of nulls.

This doesn't seem to be a synchronization issue. No matter how long I wait, the file blocks are not committed to the NFS server. Doing "sync" on the client doesn't help either.

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.0.3.EL

How reproducible:
Always

Steps to Reproduce:
1. Create and mmap a sparse file on an NFS-mounted partition.
2. Change data in the mmapped region (that was sparse).
3. munmap and close the file.
4. Changes are not committed to the NFS server.

Additional info:
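The description requires mlen to be a multiple of the page size. The reason is that the loop advances offset by mlen, and mmap only accepts offsets that are page-aligned. A minimal sketch of how such a chunk size could be derived follows; chunk_for_mmap is a hypothetical helper name and is not taken from the attached program.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Round a requested chunk size up to a whole number of pages, so that
 * offsets advanced by this amount stay valid mmap offsets.
 * chunk_for_mmap is a hypothetical helper, not part of the attached mkfile.
 * Overflow for requests close to SIZE_MAX is not handled in this sketch. */
size_t chunk_for_mmap(size_t requested)
{
    long page = sysconf(_SC_PAGESIZE);   /* page size, as getpagesize() reports on Linux */
    assert(page > 0);

    size_t p = (size_t)page;
    if (requested == 0)
        return p;                        /* never hand mmap a zero length */
    return ((requested + p - 1) / p) * p;
}
```

With this helper, a call such as chunk_for_mmap(1 << 20) would give a page-aligned value of at least 1MB to use as mlen.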
Steve, this is probably a NOTABUG, but it would be nice if you could verify that ;)
One additional note. Depending on the size of the file I'm creating, some blocks do make it to the NFS server. For example, if I create a 100kB file, no blocks seem to be committed to the NFS server. If I create a 100MB file, some blocks do make it to actual disk storage ("du -sk filename" shows that the file uses 11MB of disk space, so I'm still missing about 89MB). This probably depends on the client machine's RAM size and how much of it is free for caching.
Thinking about it some more - the data should be written to the server some 30 seconds after the munmap. While the file is mmapped, POSIX does not require anything to be written, but after the munmap the pages are marked dirty and the dirty file data flushing code should kick in. Aleksandar, does the data get written out to the server after a few minutes, or does it not get written at all?
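For reference, the standard way for a program to request write-back of a shared mapping without waiting for background flushing is msync() with MS_SYNC before munmap(). The sketch below illustrates that call sequence under several assumptions: fill_and_sync is a hypothetical name, it uses a single mapping and ftruncate() for brevity rather than the chunked loop and pwrite64() from the report, and whether msync() would have avoided the problem on the affected kernel is not established anywhere in this report.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill the first `size` bytes of `path` with `byte` through a shared mapping,
 * requesting write-back with msync() before unmapping.
 * fill_and_sync is a hypothetical name; returns 0 on success, -1 on error. */
int fill_and_sync(const char *path, size_t size, int byte)
{
    assert(size > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    /* Extend the file first: touching mapped pages beyond EOF raises SIGBUS. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }

    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memset(buf, byte, size);

    /* MS_SYNC asks for the dirty pages to be written before msync returns. */
    int rc = msync(buf, size, MS_SYNC);

    if (munmap(buf, size) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Checking the return values of msync() and close() matters on NFS in particular, because write errors may only be reported at those points.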
Well, I waited for almost half an hour, and it was not written to the server. "du -sk filename" on both client and server returns the same numbers (as if the file is sparse). However, when reading the file on the client, I can see the data. I will reboot the client and see if that flushes the data. Will report back in a couple of minutes.
After the client was rebooted, all changes to the file were lost forever. BTW, off-topic, how come the bug report is not word-wrapped (like the comments)? It is almost impossible to read...
Created attachment 112056
the program that demonstrates problem

The program (still under development) where I first saw the problem. "mkfile -v 100k foo" should create the file and allocate disk blocks. "mkfile -nv 100k bar" should only create the file and not allocate any disk blocks.

To see the problem more clearly, modify the memset line to read:

    memset(buf, '.', len);

and run the program as "mkfile -v 100k foo" on the NFS client. If you are able to reproduce the problem, "less foo" on the NFS client (Linux in my environment) should show the file full of dots. "less foo" on the NFS server (a Solaris 9 box in my environment) should show the file full of nulls.

The NFS partition in my case was an automounted home directory:

    automount(pid2356) on /home type autofs (rw,fd=5,pgrp=2356,minproto=2,maxproto=4)
    nfsserver:/path_to/amilivojevic on /home/amilivojevic type nfs (rw,addr=1.2.3.4)
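The "less foo" check above can be automated by counting the bytes that differ from the expected fill value. This is a minimal sketch, and count_mismatches is a hypothetical helper that is not part of the attachment. As the report notes, the client serves the data from its own cache, so the check is only meaningful when run on the NFS server.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Count how many of the first `limit` bytes of `path` differ from `expected`.
 * Stops early at end of file. Returns the count, or -1 if the file cannot
 * be opened or read. count_mismatches is a hypothetical helper name. */
long count_mismatches(const char *path, unsigned char expected, size_t limit)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    unsigned char buf[4096];
    long bad = 0;
    size_t seen = 0;

    while (seen < limit) {
        size_t want = limit - seen < sizeof buf ? limit - seen : sizeof buf;
        ssize_t got = read(fd, buf, want);
        if (got < 0) {
            close(fd);
            return -1;
        }
        if (got == 0)
            break;                       /* end of file */
        for (ssize_t i = 0; i < got; i++)
            if (buf[i] != expected)
                bad++;
        seen += (size_t)got;
    }

    close(fd);
    return bad;
}
```

For the 100kB test file filled with dots, count_mismatches("foo", '.', 100 * 1024) would be expected to return 0 once the data has really reached the server, and a large count while the server still holds nulls.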
I've just tested this on a Red Hat 7.3 machine running kernel-2.4.20-24.7. On the 2.4.20 kernel, everything seems to work correctly. So the bug must have been introduced somewhere in 2.5 or 2.6. I hope this info will help track down where the bug is.
I applied the patch from http://marc.theaimsgroup.com/?l=bk-commits-head&m=111094632300379&w=2 to the kernel-2.6.9-5.0.3.EL SRPM, rebuilt, rebooted, and reran the mkfile program (the version that uses mmap calls). Everything seems to work correctly now. All changed pages were committed to the NFS server. I'll leave the patched kernel running on my desktop. If there are any problems, I'll let you know. Hopefully there will be an updated official kernel soon. IMO, this bug makes NFS dangerous to use in production (with the 2.6.9-5.0.3.EL kernel).
Just wondering if this fix will be included in the forthcoming RHEL 4.1?
yes, it will be in U1.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2005-420.html