Created attachment 856464
Test program to test and verify write-behind with open(), seek(), write() and mmap()

+++ This bug was initially created as a clone of Bug #1058405 +++

Description of problem:
A program that calls mmap() on a newly created sparse file may receive a SIGBUS signal. If SIGBUS is not handled, the program is terminated (the default action for SIGBUS, comparable to a segmentation fault).

Version-Release number of selected component (if applicable):
glusterfs-3.4.0.44rhs-1.el6rhs.x86_64

How reproducible:
Easily, if you try hard enough (read: test in a loop).

Steps to Reproduce:
1. Compile the attached program bug-1058405.c:
   gcc -o bug-1058405 bug-1058405.c
2. Mount a volume (a single brick is sufficient).
3. Run:
   ./bug-1058405 /path/to/mount/some-file

Actual results:
The bug-1058405 program returns 0 if the error did not occur, and 1 if it did.

Additional info:
The write-behind translator is used on the client side (glusterfs-fuse) to optimize writing to the bricks. Multiple small, subsequent (re)writes are combined into bigger writes, which are sent over the network more efficiently.

A bug in the write-behind translator affects sparse files created with open(), seek(), write(): the write is cached, and the last write() may not be sent to the server until write-behind deems this necessary.

SIGBUS is a signal that can occur with mmap() when the mmap'd area of a file is located after the end of the file.
For example, the following will trigger a SIGBUS:

Legend:
  [ = start of file
  _ = unallocated space
  # = allocated bytes in the file
  ] = end of file

  [################]________
   |              ||       |
   '- byte 0      ||       '- byte 39
                  |'- byte 32
                  '- byte 31

* open() the file, it is 32 bytes big (byte 0-31)
* mmap() the file, but use a size of 40 bytes (byte 0-39)
* read from the memory area returned by mmap()
* reading up to byte 31 is expected to work flawlessly
* reading after byte 31 should trigger a SIGBUS

(The byte counts are scaled down for illustration; in practice SIGBUS is raised for accesses to whole pages that lie beyond the end of the file.)

In the case of creating a file with open(), seek(), write(), the file looks like this:

  [_______________#]

Creating a sparse file this way is not uncommon. However, the write-behind translator can cache the last write. Normally all outstanding writes are flushed when a read is done on an area cached by the translator.

Unfortunately, the write-behind translator did not contain logic to track writes that extend a file after a seek() past the end-of-file. Normal writes that extend the file correctly mark the written range as outstanding, and reading causes the outstanding data to be flushed. In the case of open(), seek(), write(), the range that was skipped by the seek() was not marked as outstanding. Reading from this range does not trigger the outstanding writes to be flushed.

The brick that receives the read() (translated over the network from mmap()) does not know that the file has been extended, and returns -EINVAL. This error gets transported back from the brick to the glusterfs-fuse client, and is translated by the Linux kernel/VFS into a SIGBUS triggered by mmap().

Workaround:
The write-behind translator has special handling for the truncate() system call. Using open(), seek(), write() is an alternative to calling truncate(). truncate() is more elegant in any case and will not trigger a SIGBUS. It is recommended to create sparse files with truncate().
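The two ways of creating a sparse file described above, and the mmap() read that exposes the problem, can be sketched roughly as follows. This is not the attached bug-1058405.c; the function names (create_sparse_by_seek, create_sparse_by_truncate, probe_first_byte) are made up for illustration. The sketch assumes that "seek()" refers to lseek() and uses ftruncate() on the open descriptor for the workaround.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;

/* SIGBUS handler: jump back into probe_first_byte() instead of dying. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(sigbus_jmp, 1);
}

/* Problematic pattern: open(), lseek() past EOF, write() one byte.
 * Returns an open fd, or -1 on error. */
int create_sparse_by_seek(const char *path, off_t size)
{
    assert(size > 0);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    /* Skip bytes 0..size-2, then write a single NUL as the last byte. */
    if (lseek(fd, size - 1, SEEK_SET) == (off_t)-1 || write(fd, "", 1) != 1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Workaround pattern: let ftruncate() set the file size directly. */
int create_sparse_by_truncate(const char *path, off_t size)
{
    assert(size > 0);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, size) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the first `size` bytes of fd and read byte 0 through the mapping.
 * Returns 0 if the read worked, 1 if it raised SIGBUS, -1 on other errors. */
int probe_first_byte(int fd, size_t size)
{
    struct sigaction sa, old;
    volatile int result = -1;

    char *map = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) != 0) {
        munmap(map, size);
        return -1;
    }

    if (sigsetjmp(sigbus_jmp, 1) == 0) {
        volatile char c = map[0];   /* falls inside the skipped range */
        (void)c;
        result = 0;
    } else {
        result = 1;                 /* we arrived here via the handler */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(map, size);
    return result;
}
```

On a local filesystem both variants are expected to make probe_first_byte() return 0. On an affected glusterfs-fuse mount, the seek-based variant may return 1 now and then, so it would need to be run in a loop as noted under "How reproducible".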
REVIEW: http://review.gluster.org/6835 (write-behind: track filesize when doing extending writes) posted (#1) for review on master by Niels de Vos (ndevos)
REVIEW: http://review.gluster.org/6835 (write-behind: track filesize when doing extending writes) posted (#2) for review on master by Niels de Vos (ndevos)
REVIEW: http://review.gluster.org/6835 (write-behind: track filesize when doing extending writes) posted (#3) for review on master by Niels de Vos (ndevos)
REVIEW: http://review.gluster.org/6835 (write-behind: track filesize when doing extending writes) posted (#4) for review on master by Niels de Vos (ndevos)
COMMIT: http://review.gluster.org/6835 committed in master by Anand Avati (avati)
------
commit b0515e2a4a08b657ef7e9715fb8c6222c700e78c
Author: Niels de Vos <ndevos>
Date:   Tue Jan 28 10:06:13 2014 +0100

    write-behind: track filesize when doing extending writes

    A program that calls mmap() on a newly created sparse file, may receive
    a SIGBUS signal. If SIGBUS is not handled, a segmentation fault will
    occur and the program will exit.

    A bug in the write-behind translator can cause the creation of a sparse
    file created with open(), seek(), write() to be cached. The last write()
    may not be sent to the server, until write-behind deems this necessary.

    * open(.., O_TRUNC, ...)/creat() the file, it is 0 bytes big
    * seek() into the file, use offset 31
    * write() 1 byte to the file
    * the range from byte 0-30 are unwritten so called 'sparse'

    The following illustration tries to capture this:

    Legend:
      [ = start of file
      _ = unallocated/unwritten bytes
      # = allocated bytes in the file
      ] = end of file

      [_______________#]
       |              |
       '- byte 0      '- byte 31

    Without this change, reading from byte 0-30 will return an error, and
    reading the same area through an mmap()'d pointer will trigger a SIGBUS.
    Reading from this range did not trigger the outstanding write() to be
    flushed. The brick that receives the read() (translated over the network
    from mmap()) does not know that the file has been extended, and returns
    -EINVAL. This error gets transported back from the brick to the
    glusterfs-fuse client, and translated by the Linux kernel/VFS into
    SIGBUS triggered by mmap().

    In order to solve this, a new attribute to the wb_inode structure is
    introduced; the current size of the file. All FOPs that can modify the
    size, are expected to update wb_inode->size. This makes it possible for
    extending writes with an offset bigger than EOF to mark the unwritten
    area as modified/pending.

    Change-Id: If5ba6646732e6be26568541ea9b12852a5d0b988
    BUG: 1058663
    Signed-off-by: Niels de Vos <ndevos>
    Reviewed-on: http://review.gluster.org/6835
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Anand Avati <avati>
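
The idea behind the fix, tracking the file size so that an extending write past EOF also marks the skipped range as pending, can be illustrated with a toy model. This is not the glusterfs code: struct toy_wb_inode, toy_write() and toy_read_needs_flush() are hypothetical names, and the model keeps a single merged pending range instead of a list of cached requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, heavily simplified stand-in for per-inode write-behind state. */
struct toy_wb_inode {
    off_t size;           /* tracked current size of the file */
    bool  has_pending;    /* is any cached (unflushed) range present? */
    off_t pending_start;  /* cached range is [pending_start, pending_end) */
    off_t pending_end;
};

/* Record a cached write of `len` bytes at `offset`. */
void toy_write(struct toy_wb_inode *in, off_t offset, size_t len)
{
    assert(in != NULL);
    off_t start = offset;
    off_t end = offset + (off_t)len;

    /* Extending write beyond EOF: the hole between the old EOF and
     * `offset` only exists once this write lands, so treat it as pending. */
    if (offset > in->size)
        start = in->size;

    if (!in->has_pending) {
        in->has_pending = true;
        in->pending_start = start;
        in->pending_end = end;
    } else {
        /* Merge into one range; over-approximating is safe, it only
         * causes an extra flush. */
        if (start < in->pending_start)
            in->pending_start = start;
        if (end > in->pending_end)
            in->pending_end = end;
    }

    /* Any operation that can change the size updates the tracked size. */
    if (end > in->size)
        in->size = end;
}

/* Does a read of `len` bytes at `offset` overlap the pending range? */
bool toy_read_needs_flush(const struct toy_wb_inode *in, off_t offset, size_t len)
{
    assert(in != NULL);
    return in->has_pending &&
           offset < in->pending_end &&
           offset + (off_t)len > in->pending_start;
}
```

With an empty file, toy_write(&in, 31, 1) marks bytes 0-31 as pending, so a later read of bytes 0-30 reports that a flush is needed. Without the size check, only byte 31 would be pending and that read would go to the brick unflushed, which matches the failure described in the commit message.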