151284 – mmap of file over NFS corrupts data

Summary: mmap of file over NFS corrupts data

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	kernel
Sub Component:
Version:	4.0
Hardware:	i386
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Steve Dickson
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	137160

Reported:	2005-03-16 17:53 UTC by Aleksandar Milivojevic
Modified:	2007-11-30 22:07 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-06-08 15:13:59 UTC
Target Upstream Version:
Embargoed:

Attachments
the program that demonstrates problem (2.67 KB, text/x-csrc) 2005-03-16 19:21 UTC, Aleksandar Milivojevic

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2005:420	0	normal	SHIPPED_LIVE	Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 1	2005-06-08 04:00:00 UTC

Description Aleksandar Milivojevic 2005-03-16 17:53:58 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050301 Firefox/1.0.1 Red Hat/1.0.1-1.4.3.centos4.1

Description of problem:
I wrote a small program that implements mkfile (from Solaris) using the mmap system call.  Like Solaris mkfile, my program can either create a sparse file or fill the file with zeros (allocating actual disk blocks).  The latter case is done in a loop that calls mmap, then memset, then munmap.  If the file is on a local file system, this works great.  If it is on an NFS-mounted partition, data corruption occurs.  In the NFS case, modified file blocks are not sent back to the NFS server (a Solaris 9 host).  They exist only in the local machine's cache.

The simplified code itself looks something like the listing below, with error checking and some other details removed.  The mlen variable is initialized beforehand to a value that is a multiple of the memory page size as returned by getpagesize() (I am still experimenting to find the optimal value; my first idea was 2^30, but something tells me this might cause a lot of swapping).

fd = open64(argv[i], O_RDWR | O_CREAT | O_TRUNC | O_LARGEFILE, 0666);
pwrite64(fd, "\0", 1, size - 1);  /* extend the file to its final size */
offset = 0;
while (offset < size) {
  /* last pass may cover less than mlen bytes */
  len = (size_t) (size - offset >= mlen ? mlen : size - offset);
  buf = mmap(NULL, mlen, PROT_WRITE, MAP_SHARED, fd, offset);
  memset(buf, 0, len);
  munmap(buf, mlen);
  offset = offset + len;
}
close(fd);
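
For reference, a self-contained version of the loop above with the error checking put back might look like the sketch below.  The function name fill_with_mmap and its parameters are made up for illustration, plain open/pwrite with _FILE_OFFSET_BITS=64 stand in for open64/pwrite64, and only len bytes are mapped on the final pass.  The msync(MS_SYNC) call before munmap is an addition that is not in the original program; nothing in this report establishes whether it avoids the problem on the affected kernel.

```c
#define _FILE_OFFSET_BITS 64  /* large-file support instead of open64/pwrite64 */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill `path` with `size` bytes of `fill`, mapping at
 * most `mlen` bytes at a time.  Returns 0 on success, -1 on any error. */
int fill_with_mmap(const char *path, off_t size, size_t mlen, int fill)
{
    assert(path != NULL && size > 0);
    /* mmap offsets must be page aligned, so mlen must be a page multiple. */
    assert(mlen > 0 && mlen % (size_t)getpagesize() == 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
        return -1;

    /* Extend the file to its final size by writing one NUL at the end. */
    if (pwrite(fd, "", 1, size - 1) != 1) {
        close(fd);
        return -1;
    }

    off_t offset = 0;
    while (offset < size) {
        size_t len = (size - offset >= (off_t)mlen) ? mlen
                                                    : (size_t)(size - offset);
        void *buf = mmap(NULL, len, PROT_WRITE, MAP_SHARED, fd, offset);
        if (buf == MAP_FAILED) {
            close(fd);
            return -1;
        }
        memset(buf, fill, len);
        /* Assumption (not in the original program): ask for synchronous
         * write-back of the dirty pages before dropping the mapping. */
        int rc = msync(buf, len, MS_SYNC);
        if (munmap(buf, len) != 0 || rc != 0) {
            close(fd);
            return -1;
        }
        offset += (off_t)len;
    }
    return close(fd);
}
```

A call such as fill_with_mmap("foo", 100 * 1024, 4 * getpagesize(), '.') would correspond to the 100kB test described in this report.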

If the target file is on an NFS-mounted partition, running the program and then doing "du -sk" on the file reports that the file is almost empty (basically, a sparse file was created).
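
The same check can be made from C by comparing a file's apparent size with the space allocated to it, which is roughly what "du -sk" reports.  This is only a sketch: the names file_usage and print_file_usage are hypothetical, and the conversion assumes that st_blocks counts 512-byte units, as is the convention on Linux.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper type: apparent size versus space actually allocated. */
struct file_usage {
    long long apparent_bytes;  /* st_size: the length the file claims to have */
    long long allocated_kb;    /* st_blocks converted to kilobytes */
};

/* Fill *out for `path`.  Returns 0 on success, -1 if stat() fails. */
int file_usage(const char *path, struct file_usage *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    out->apparent_bytes = (long long)st.st_size;
    /* Assumption: st_blocks is in 512-byte units (the Linux convention). */
    out->allocated_kb = (long long)st.st_blocks * 512 / 1024;
    return 0;
}

/* Print both figures; a large gap between them suggests a sparse file. */
int print_file_usage(const char *path)
{
    struct file_usage u;

    if (file_usage(path, &u) != 0) {
        perror(path);
        return -1;
    }
    printf("%s: %lld bytes apparent, %lld kB allocated\n",
           path, u.apparent_bytes, u.allocated_kb);
    return 0;
}
```

Running print_file_usage on the same file from both the client and the server would show whether the two sides agree on how many blocks were allocated.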

To prove the point, I changed the memset line to read memset(buf, '.', len), and then created a 100kB file.  If I do "less filename" on the NFS client, it shows a file full of dots.  If I do the same on the NFS server, it shows a file full of nulls.  This doesn't seem to be a synchronization issue.  No matter how long I wait, the file blocks are not committed to the NFS server.  Doing "sync" on the client doesn't help either.
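
Instead of inspecting the file by eye with less, the comparison could be scripted by counting how many bytes hold the fill character on each side.  The helper below is a minimal sketch; count_byte is a hypothetical name and is not part of the attached program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count the bytes in `path` that equal `value`.
 * Stores the total number of bytes read in *total.
 * Returns the number of matching bytes, or -1 on error. */
long long count_byte(const char *path, unsigned char value, long long *total)
{
    unsigned char buf[4096];
    long long matches = 0, seen = 0;
    size_t n;
    FILE *f;

    assert(path != NULL && total != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < n; i++)
            if (buf[i] == value)
                matches++;
        seen += (long long)n;
    }
    if (ferror(f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    *total = seen;
    return matches;
}
```

With the '.' variant of the program, count_byte(path, '.', &total) would be expected to equal total wherever the data really arrived, and to be 0 where only the sparse file exists.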

Version-Release number of selected component (if applicable):
kernel-2.6.9-5.0.3.EL

How reproducible:
Always

Steps to Reproduce:
1. create and mmap a sparse file on an NFS-mounted partition
2. change data in the mmapped region (that was sparse)
3. munmap and close the file
4. changes are not committed to the NFS server


Additional info:

Comment 1 Rik van Riel 2005-03-16 18:02:59 UTC

Steve,

this is probably a NOTABUG, but it would be nice if you could verify that ;)

Comment 2 Aleksandar Milivojevic 2005-03-16 18:04:58 UTC

One additional note.  Depending on the size of the file I'm creating, some
blocks do make it to the NFS server.  For example, if I create a 100kB file, no
blocks seem to be committed to the NFS server.  If I create a 100MB file, some
blocks do make it to actual disk storage ("du -sk filename" shows that the file
uses 11MB of disk space, so I'm still missing about 89MB).  This probably
depends on the client machine's RAM size and how much of it is free for caching.

Comment 3 Rik van Riel 2005-03-16 18:10:20 UTC

Thinking about it some more - the data should be written to the server some 30
seconds after the munmap.  While the file is mmaped nothing needs to be written
according to POSIX, but after the unmap the pages are marked dirty and the dirty
file data flushing code should kick in.

Aleksandar, does the data get written out to the server after a few minutes, or
does it not get written at all ?

Comment 4 Aleksandar Milivojevic 2005-03-16 18:14:06 UTC

Well, I waited for almost half an hour, and it was not written to the server. 
"du -sk filename" on both client and server returns same numbers (as if the file
is sparse).  However, when reading the file on the client, I can see the data.

I will reboot the client and see if that flushes the data.  Will report back
in a couple of minutes.

Comment 5 Aleksandar Milivojevic 2005-03-16 18:28:56 UTC

After the client was rebooted, all changes to the file were lost forever.

BTW, off-topic, how come that bug report is not word-wrapped (like the
comments)?  Kind of almost impossible to read...

Comment 6 Aleksandar Milivojevic 2005-03-16 19:21:47 UTC

Created attachment 112056
the program that demonstrates problem

The program (still under development) where I first saw the problem.  "mkfile
-v 100k foo" should create the file and allocate disk blocks.  "mkfile -nv 100k
bar" should only create the file and not allocate any disk blocks.

To see the problem more clearly, modify the memset line to read:

memset(buf, '.', len);

and run the program as "mkfile -v 100k foo" on the NFS client.  If you are able
to reproduce the problem, "less foo" on the NFS client (Linux in my environment)
should show the file full of dots.  "less foo" on the NFS server (a Solaris 9
box in my environment) should show the file full of NULs.  The NFS partition in
my case was an automounted home directory:

automount(pid2356) on /home type autofs
(rw,fd=5,pgrp=2356,minproto=2,maxproto=4)
nfsserver:/path_to/amilivojevic on /home/amilivojevic type nfs
(rw,addr=1.2.3.4)

Comment 7 Aleksandar Milivojevic 2005-03-21 14:53:00 UTC

I've just tested this on a Red Hat 7.3 machine running kernel-2.4.20-24.7.  On
the 2.4.20 kernel, everything seems to work correctly.  So the bug must have
been introduced somewhere in 2.5 or 2.6.  I hope this info will help track down
where the bug is.

Comment 9 Aleksandar Milivojevic 2005-03-22 19:27:49 UTC

I applied patch from

http://marc.theaimsgroup.com/?l=bk-commits-head&m=111094632300379&w=2

to the kernel-2.6.9-5.0.3.EL SRPM, then rebuilt, rebooted, and reran the mkfile
program (the version that uses mmap calls).  Everything seems to work correctly
now.  All changed pages were committed to the NFS server.  I'll leave the
patched kernel running on my desktop.  If there are any problems, I'll let you
know.

Hopefully there'll be an updated official kernel soon.  IMO, this bug makes NFS
dangerous to use in production (with the 2.6.9-5.0.3.EL kernel).

Comment 12 Aleksandar Milivojevic 2005-05-09 15:56:46 UTC

Just wondering if this fix will be included in the forthcoming RHEL 4.1?

Comment 13 Dave Jones 2005-05-10 21:56:00 UTC

yes, it will be in U1.

Comment 14 Tim Powers 2005-06-08 15:13:59 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-420.html
