Bug 1152903
Summary: | Rebalance on a dispersed volume produces multiple errors in logs | ||
---|---|---|---|
Product: | [Community] GlusterFS | Reporter: | Xavi Hernandez <jahernan> |
Component: | disperse | Assignee: | Xavi Hernandez <jahernan> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.6.0 | CC: | bugs, gluster-bugs, nbalacha |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | glusterfs-3.6.1 | Doc Type: | Bug Fix |
Doc Text: | Story Points: | --- | |
Clone Of: | 1152902 | Environment: | |
Last Closed: | 2014-11-10 15:14:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1152902 | ||
Bug Blocks: | 1117822 |
Description
Xavi Hernandez
2014-10-15 07:43:07 UTC
Hi Xavier,
Can you attach the gluster logs and volume details?
Thanks,
Nithya

REVIEW: http://review.gluster.org/8948 (ec: Fix rebalance issues) posted (#1) for review on release-3.6 by Xavier Hernandez (xhernandez)

REVIEW: http://review.gluster.org/8948 (ec: Fix rebalance issues) posted (#2) for review on release-3.6 by Xavier Hernandez (xhernandez)

REVIEW: http://review.gluster.org/8948 (ec: Fix rebalance issues) posted (#3) for review on release-3.6 by Xavier Hernandez (xhernandez)

COMMIT: http://review.gluster.org/8948 committed in release-3.6 by Vijay Bellur (vbellur)
------
commit 408d4454870e7374bc9c4060c6c23d224d5174a2
Author: Xavier Hernandez <xhernandez>
Date: Wed Oct 15 09:44:55 2014 +0200

ec: Fix rebalance issues

Some issues in the ec xlator prevented rebalance from completing successfully and generated warnings and errors in the log.

The most critical error was a race condition that caused false corruption detection when two specific operations were executed sequentially and shared the same lock. The problem unfolds as follows:

1. A setxattr is issued.
2. setxattr: ec locks the inode before updating the xattr.
3. setxattr: The xattr is updated.
4. setxattr: The upper xlator is notified that the operation completed.
5. setxattr: A background task is started to update the version of the file.
6. A stat is issued on the same file.
7. stat: Since the lock is already acquired, it is reused.
8. stat: A lookup is issued to determine the version and size information of the file.

At this point, operations 5 and 8 can interfere. The lookup can then see different information on each brick, conclude that some bricks are corrupted, incorrectly exclude them from the operation, and initiate a self-heal. In some cases this false detection, combined with self-heal, could lead to invalid updates of the trusted.ec.size xattr, leaving the file smaller than it should be.
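To make the interference between steps 5 and 8 concrete, here is a deterministic toy model in Python; it is not code from the ec xlator. It assumes a 6-brick volume in which each brick holds a plain integer version counter, models the background update as a generator that bumps one brick at a time, and gives the lookup a hypothetical "most common version wins" rule. The function names and that decision rule are illustrative assumptions.

```python
from collections import Counter

# Hypothetical toy model of the race, not ec xlator code.

def background_version_update(versions):
    """Model of step 5: bump each brick's version counter one brick at a time.

    It yields after every brick so the caller can interleave another request,
    as happens when a new fop reuses the still-held lock.
    """
    for i in range(len(versions)):
        versions[i] += 1
        yield

def lookup(versions):
    """Model of step 8: return the indices of bricks that look corrupted.

    Hypothetical decision rule: the most common version is taken as correct
    and every brick reporting something else is flagged.
    """
    good_version, _ = Counter(versions).most_common(1)[0]
    return [i for i, v in enumerate(versions) if v != good_version]

def simulate_race(updated_before_lookup, num_bricks=6):
    """Run the lookup after `updated_before_lookup` bricks have been updated."""
    versions = [0] * num_bricks
    update = background_version_update(versions)
    for _ in range(updated_before_lookup):
        next(update)
    flagged = lookup(list(versions))  # stat's lookup overlaps the update
    for _ in update:                  # let the background update finish
        pass
    return flagged, versions
```

With `simulate_race(2)`, the two bricks that already carry the new version are the ones flagged as corrupted, even though they hold the correct data; once the background update finishes, all bricks agree again. Running the lookup entirely before or after the update (`simulate_race(0)` or `simulate_race(6)`) flags nothing.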
This only happens if the first operation does not perform a lookup, because chained operations reuse the information returned by the previous one, which avoids this kind of problem.

To solve this, the background update is now executed atomically with the subsequent unlock. This avoids reusing the lock while the update is in progress. However, it reduces performance, because the window in which new requests can reuse the lock is now much smaller. This has been alleviated by using the same technique implemented in AFR (i.e. waiting some time before releasing the lock).

Some minor changes are also introduced in this patch:

* A bug in the management of 'trusted.glusterfs.pathinfo' that was writing beyond the allocated space.
* An uninitialized variable.
* trusted.ec.config was not created for regular files created with mknod.
* An invalid state was used in the access fop.

This is a backport of http://review.gluster.org/8947/

Change-Id: Idfaf69578ed04dbac97a62710326729715b9b395
BUG: 1152903
Signed-off-by: Xavier Hernandez <xhernandez>
Reviewed-on: http://review.gluster.org/8948
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Dan Lambright <dlambrig>

This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users
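The two ideas behind the fix can be sketched with a similar toy model: the pending version update and the unlock run as one indivisible step, and the unlock is delayed so that a closely following fop can still reuse the lock. Everything here (`InodeLock`, `acquire`, `release`, `tick`, and the explicit integer clock used instead of a real timer) is a hypothetical illustration under those assumptions, not the ec or AFR implementation.

```python
class InodeLock:
    """Hypothetical toy model of an inode lock shared by consecutive fops.

    Not ec or AFR code. Time is an explicit integer clock so the model is
    deterministic; a real translator would use some kind of timer instead.
    """

    def __init__(self, num_bricks, release_delay):
        self.versions = [0] * num_bricks  # per-brick version counters
        self.release_delay = release_delay
        self.held = False
        self.users = 0
        self.dirty = False                # a version update is still pending
        self.release_at = None            # when the delayed release will fire

    def acquire(self):
        """Return 'reused' if the lock was still held, 'acquired' otherwise."""
        if self.held:
            self.release_at = None        # cancel the pending delayed release
            self.users += 1
            return "reused"
        self.held = True
        self.users = 1
        return "acquired"

    def release(self, now, modified):
        """Drop one user; when the last one leaves, schedule the real unlock."""
        self.users -= 1
        self.dirty = self.dirty or modified
        if self.users == 0:
            self.release_at = now + self.release_delay

    def tick(self, now):
        """Fire the delayed release: update versions and unlock in one step."""
        if self.release_at is None or now < self.release_at:
            return
        if self.dirty:
            self.versions = [v + 1 for v in self.versions]
            self.dirty = False
        self.held = False
        self.release_at = None

    def consistent(self):
        """True when every brick reports the same version."""
        return len(set(self.versions)) == 1
```

For example, with `release_delay=5`, a setxattr that releases at time 1 would keep the lock held until time 6. A stat arriving at time 2 reuses it and sees the same version on every brick, because the version bump is only applied when the delayed release finally fires, together with the unlock. The trade-off mentioned above shows up as the delay: a longer delay allows more reuse, at the cost of holding the lock longer.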