| Summary: | md5sum mismatch when files are transferred using vsftpd | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Vikas Gorur <vikas> |
| Component: | write-behind | Assignee: | Raghavendra G <raghavendra> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | low | ||
| Version: | 3.0.4 | CC: | gluster-bugs, pavan, rabhat, vijay |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
Description
Vikas Gorur
2010-05-19 20:59:38 UTC
Appears to be a race:

    [root@brick5 934]# cmp test.4M test.4M.mnt.run3
    test.4M test.4M.mnt.run3 differ: byte 1143921, line 4504
    [root@brick5 934]# cmp test.4M test.4M.mnt.run2
    [root@brick5 934]# cmp test.4M test.4M.mnt.run1
    test.4M test.4M.mnt.run1 differ: byte 2088017, line 8232

test.4M is the original source file, and run[123] are the files that were transferred during 3 successive FTP uploads.

For run3:

    [root@brick5 934]# cmp test.4M test.4M.mnt.run3
    test.4M test.4M.mnt.run3 differ: byte 1143921, line 4504

From the server trace:

    [2010-05-19 17:09:33] N [trace.c:1642:trace_writev] brick1: 776: (*fd=0x2aaaac000fe0, *vector=0x7fff1e47a770, count=1, offset=1143920)

For run5:

    [root@brick5 trendmicro]# cmp test.4M.run5 test.4M
    test.4M.run5 test.4M differ: byte 315665, line 1211

    [2010-05-19 17:28:56] N [trace.c:1642:trace_writev] brick1: 1037: (*fd=0x2aaaac002120, *vector=0x7fff903695b0, count=1, offset=315664)

Off-by-1?

When a file is transferred using ftp (ftp server is vsftpd version 2.0.5 on CentOS 5.2) with write-behind loaded, the md5sum on the mountpoint does not match the md5sum on the source. It appears that the mismatch only happens for files that are larger than 4MB (window-size in this test is 4MB).

    c48c9a0c7a162516df5d16c17bd73d78 test.3.99M
    c48c9a0c7a162516df5d16c17bd73d78 test.3.99M.mnt
    4b5232ac6e400872da52f4af71f4159c test.4M
    cb542550cfb2a72d8343e0f35caaf126 test.4M.mnt

Minimum configuration required to reproduce this bug is fuse->write-behind->replicate->client->server->locks->posix.

Following are the causes for the bug:

1. In afr, write is a transaction instead of a single operation. Hence if two writes are sent to afr one after another, there is a possibility of change of their order by the time they leave afr.
2. Maximum size of a write from write-behind is 128KB, hence for a window size > 128KB, there is a possibility of write-behind issuing more than one write (one after another).
3. For files opened with O_APPEND, a file with holes cannot be created, since writes always happen at the end of file.
4. vsftpd always opens files with O_APPEND.

Now, since writes can happen out of order, files with holes are created (by the time vsftpd finishes writing to file, these holes will be filled, since holes were created only because of out-of-order writes from afr). With posix opening files with O_APPEND (as passed by vsftpd), writes always happen at the current end of file, instead of happening at their correct offset, thereby causing corruption.

As a fix, we should remove O_APPEND from the flags passed to open/creat. On the other hand, O_APPEND is redundant, since the offsets are always sent by fuse and we do lseek before doing read/write.

After probing further into the bug, we found the problem to be in posix-locks. posix-locks uses address of frame->root (on server side) as 'lock-owner'. Once a lock is granted, the request-frame is unwound and freed. Hence there is a possibility of same address being reused for frame->root in new requests and thereby a new lock request with same lock-owner (frame->root) as that of one of currently held locks being granted (this is because posix-locks grants inode locks for requests having same lock-owner as that of one of currently granted locks). If there are other lock requests issued between the time a lock is granted and its frame->root value is reused, out of order writes can be issued from afr, since lock requests for writes at lesser offsets are still not granted, but the lock request with reused frame->root address is granted.

As a fix, posix-locks should be using some value which is guaranteed to be unique across lock-requests for lock-owner.

Correction: As a fix, posix-locks should be using some value which is guaranteed to be unique across lock-requests for lock-owner, unless the issuer of lock request really wants the same lock-owner value for different lock-requests.
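A minimal sketch of the O_APPEND failure mode described above, assuming a POSIX system where a write on an O_APPEND descriptor always lands at the current end of file regardless of a preceding lseek. The helper `write_chunks`, the chunk contents, and the offsets are hypothetical and chosen only for illustration; this is not GlusterFS code. It replays three writes with the last two reordered, once without and once with O_APPEND.

```python
import os
import tempfile


def write_chunks(path, chunks, extra_flags=0):
    """Apply (offset, data) writes in the given order using lseek + write,
    then return the resulting file contents."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags, 0o600)
    try:
        for offset, data in chunks:
            os.lseek(fd, offset, os.SEEK_SET)  # ignored by write() under O_APPEND
            os.write(fd, data)
    finally:
        os.close(fd)
    with open(path, "rb") as f:
        return f.read()


# Writes intended for offsets 0, 4 and 8; the last two arrive out of order.
reordered = [(0, b"AAAA"), (8, b"CCCC"), (4, b"BBBB")]

with tempfile.TemporaryDirectory() as tmp:
    # Without O_APPEND a temporary hole appears and is filled later.
    plain = write_chunks(os.path.join(tmp, "plain"), reordered)
    # With O_APPEND every write goes to the end of file, so the order sticks.
    appended = write_chunks(os.path.join(tmp, "append"), reordered, os.O_APPEND)

print(plain)
print(appended)
```

Without O_APPEND the reordered writes still produce the intended contents, because each write lands at its own offset. With O_APPEND the file holds the chunks in arrival order, which is the kind of corruption that cmp reports above.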
Another correction to the earlier statement "posix-locks grants inode locks for requests having same lock-owner as that of one of currently granted locks": posix-locks MAY/MAY NOT grant inode lock requests having same lock-owner as that of one of currently granted locks, but if a lock request and one of already granted locks have same lock-owner, they do not conflict with each other. In this particular case, since there can be only one lock on the file (since we are locking entire file - afr locks entire file for files opened with O_APPEND), and its lock-owner is same as that of new request, the new request is granted.

PATCH: http://patches.gluster.com/patch/3307 in master (features/locks: Use fuse supplied lock owner even for internal locks.)

PATCH: http://patches.gluster.com/patch/3306 in release-3.0 (features/locks: Use fuse supplied lock owner even for internal locks.)

PATCH: http://patches.gluster.com/patch/3318 in master (performance/write-behind: explicitly enforce ordering of overlapping writes.)

PATCH: http://patches.gluster.com/patch/3319 in release-3.0 (performance/write-behind: explicitly enforce ordering of overlapping writes.)

*** Bug 963 has been marked as a duplicate of this bug. ***

bug #762695 is surfaced because of patches to write-behind which were supposed to fix this bug. Hence marking #963 as duplicate and reopening this bug.

*** Bug 1060 has been marked as a duplicate of this bug. ***

kernel compile on latest git-pull of release-3.0 succeeds. I think we can close this bug.

PATCH: http://patches.gluster.com/patch/6000 in release-3.0 (performance/write-behind: backport write-behind from 3.1)
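The lock-owner reuse described in the analysis above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyInodeLocks`, `run`, and the addresses `0x1000`/`0x2000` are hypothetical, and the model keeps a single whole-file lock table in which requests with the same owner never conflict. It is not the posix-locks data structure. The per-request counter stands in for "some value which is guaranteed to be unique"; the patches listed above use the fuse supplied lock owner instead.

```python
import itertools


class ToyInodeLocks:
    """Toy whole-file lock table: locks with the same owner never conflict."""

    def __init__(self):
        self.granted = []  # owners of currently held locks
        self.blocked = []  # owners of requests still waiting

    def request(self, owner):
        # A request conflicts only with held locks that have a different owner.
        if any(held != owner for held in self.granted):
            self.blocked.append(owner)
            return False
        self.granted.append(owner)
        return True


def run(owners):
    """Issue one lock request per write, in increasing offset order.

    No lock is released in between, so the first lock is still held when
    the later requests arrive. Returns the grant decision for each write.
    """
    locks = ToyInodeLocks()
    return [locks.request(owner) for owner in owners]


# Owner derived from a recycled frame address: the third request happens to
# reuse the address that identified the first (already freed) frame.
grants_with_reuse = run([0x1000, 0x2000, 0x1000])

# Owner derived from a value that is unique per request.
counter = itertools.count(1)
grants_with_unique = run([next(counter) for _ in range(3)])

print(grants_with_reuse)
print(grants_with_unique)
```

In the reuse case the third write's lock is granted while the second write is still waiting, so the write at the higher offset can go out first. With unique owners both later requests wait behind the held lock and the order is preserved.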