Bug 877293
Summary: A single brick down of a dist-rep volume results in geo-rep session "faulty"

| Field | Value |
|---|---|
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | geo-replication |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Version | 2.0 |
| Hardware | x86_64 |
| OS | Linux |
| Reporter | Vijaykumar Koppad <vkoppad> |
| Assignee | Csaba Henk <csaba> |
| QA Contact | Vijaykumar Koppad <vkoppad> |
| CC | aavati, amarts, bbandari, csaba, pkarampu, rhs-bugs, shaines, surs, vbellur, vshankar |
| Keywords | Reopened |
| Fixed In Version | glusterfs-3.4.0.14rhs-1 |
| Doc Type | Bug Fix |
|  | 959069 (view as bug list) |
| Last Closed | 2013-09-23 22:29:52 UTC |
| Type | Bug |
| Bug Blocks | 850514, 959069 |
Description

Vijaykumar Koppad 2012-11-16 07:29:42 UTC

Comment 2

Should it not be a feature-bug? In the geo-rep context, how can we differentiate between the cases when data is lost and when only redundancy is lost due to the brick going down? If we can safely cut back on the overcaution, that's good, but if we can't, that does not seem to me to be a problem. Is there any spec or actual feature request that the current situation does not comply with?

Comment 3

(In reply to comment #2)
> Should it not be a feature-bug? In the geo-rep context, how can we
> differentiate between the cases when data is lost and when only redundancy
> is lost due to the brick going down?

Is the failure due to xtime aggregation?

> If we can safely cut back on the overcaution, that's good, but if we can't,
> that does not seem to me to be a problem. Is there any spec or actual
> feature request that the current situation does not comply with?

In general, this violates the high availability that gluster provides.

Comment 4

(In reply to comment #3)
> Is the failure due to xtime aggregation?

I think assert-no-child-down.

> In general, this violates the high availability that gluster provides.

OK. The reason I said this should be an enhancement bug is that I don't see an easy way to fix it, and the behavior is in accordance with what we aimed for in the current implementation. Do you have any idea how to attack this?

Comment 5

(In reply to comment #4)
> > Is the failure due to xtime aggregation?
>
> I think assert-no-child-down.

assert-no-child-down should take effect only after all children of the distribute node are down. We will need to investigate why assert-no-child-down kicked in when only one of the bricks of a volume with replica count 2 went down.

Comment 6

Sorry for spreading confusion. I pointed at assert-no-child-down because the errno is ENOTCONN, which usually means the gluster client terminated, and brick-down plus client termination had an assert-no-child-down smell. A superficial chain of thought... Indeed, looking into the gluster log (which I missed), it seems to be aggregation.
The aggregation logic should then be refined. Maybe it's easy? :)
Comment 7

(In reply to comment #6)
> Indeed, looking into the gluster log (which I missed), it seems to be
> aggregation. The aggregation logic should then be refined. Maybe it's easy?
> :)

Yeah, maybe an additional flag in local to determine the least number of children on which this operation needs to succeed?
Comment 8

(In reply to comment #7)
> Yeah, maybe an additional flag in local to determine the least number of
> children on which this operation needs to succeed?

How can you narrow it down to a numeric measure? It's the topology that matters, AFAIK...

How exactly does DHT manage assert-no-child-down, i.e. under what circumstances does it trigger the assertion? Maybe we could use the same logic for aggregation.

Comment 9

(In reply to comment #8)
> How can you narrow it down to a numeric measure? It's the topology that
> matters, AFAIK...

Since you intend to keep the aggregation logic generic, you can use a numeric measure to capture the topology: for afr, you need at least one reply to succeed; for dht and stripe, you need replies from all STACK_WINDs to succeed.

> How exactly does DHT manage assert-no-child-down, i.e. under what
> circumstances does it trigger the assertion? Maybe we could use the same
> logic for aggregation.

DHT manages assert-no-child-down by listening for the CHILD_DOWN notification. It looks like the client termination is not due to dht getting CHILD_DOWN when one brick is taken down.
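The per-translator thresholds described above (afr: at least one reply; dht/stripe: all replies) can be sketched as a single numeric parameter. This is a hypothetical simplification for illustration only, not GlusterFS code; the names `aggregate_xtime` and `min_success` are invented:

```python
# Hypothetical sketch of xtime aggregation with a per-translator
# "minimum successes" threshold -- not actual GlusterFS code.
ENOTCONN = "Transport endpoint is not connected"

def aggregate_xtime(replies, min_success):
    """Aggregate per-child xtime replies.

    replies: a list where each entry is either an int xtime or
    ENOTCONN for a child that is down.
    min_success: the least number of children that must reply for the
    aggregate to be valid (afr: 1; dht/stripe: all children).
    """
    ok = [r for r in replies if r != ENOTCONN]
    if len(ok) < min_success:
        return ENOTCONN   # too few replies: propagate the failure upward
    return max(ok)        # take the newest xtime among the replies

# Replicate pair with one brick down: a single reply suffices.
assert aggregate_xtime([ENOTCONN, 0x512C9208], min_success=1) == 0x512C9208

# Distribute over two subvolumes: all replies are required, so one
# missing subvolume fails the whole aggregate.
assert aggregate_xtime([0x512C9208, ENOTCONN], min_success=2) == ENOTCONN
```

The point of the sketch is that the topology question collapses into a number per translator: replicate tolerates missing children as long as one answers, while distribute and stripe spread data across children and therefore need every answer.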
Did a small experiment with Vijaykumar: on a 2x2 distributed-replicate volume, we killed a brick and ran getfattr for trusted.glusterfs.<volume-id>.xtime from a mount point (with client-pid = -1). The result (pasted from IRC):

```
[root@rhs01 client-1]# getfattr -e hex -n trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime f*
f0: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
f1: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
f2: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
f3: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
# file: f4
trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800004bf3
f5: trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime: Transport endpoint is not connected
# file: f6
trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c92080000543a
# file: f7
trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800005942
# file: f8
trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800005d74
# file: f9
trusted.glusterfs.40cc0727-84f7-4581-8177-e2be055495d7.xtime=0x512c920800006248
```

For files which hash to the subvolume where a brick was down, getfattr returns "Transport endpoint is not connected" (which can be seen in the client logs as per comment #1). There should at least be an xtime given back to the client. Further, comment #1 reports a termination of the client process, but that did not happen in our test.
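The mixed output above (some files fail, others return an xtime) is what DHT's placement predicts: each file name hashes to exactly one distribute subvolume, so only names landing on the affected subvolume return ENOTCONN. A toy model of that routing, using Python's `zlib.crc32` purely as a stand-in for DHT's real hash (the subvolume names and the placeholder xtime are invented for illustration):

```python
import zlib

# Toy model of DHT-style name hashing: each file lives on exactly one
# distribute subvolume. With one subvolume affected, only the files
# whose names hash to it fail; the rest still answer.
SUBVOLS = ["replica-0", "replica-1"]   # 2x2 dist-rep: two replica pairs
DOWN = {"replica-0"}                   # hypothetical affected pair

def getfattr_xtime(name):
    # crc32 is NOT the hash DHT uses; it just illustrates that routing
    # is a pure function of the file name.
    subvol = SUBVOLS[zlib.crc32(name.encode()) % len(SUBVOLS)]
    if subvol in DOWN:
        return "Transport endpoint is not connected"
    return "0x512c9208..."  # placeholder xtime value

results = {f"f{i}": getfattr_xtime(f"f{i}") for i in range(10)}
# Names routed to the affected pair return ENOTCONN; all others
# return an xtime, regardless of the brick that was killed.
```

The model also shows why the behavior is a bug rather than expected: the affected subvolume was a replica pair with one surviving brick, so the lookup should have been answerable from the healthy replica instead of failing the whole subvolume.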
*** This bug has been marked as a duplicate of bug 959069 ***

With the newer geo-replication implementation, this is taken care of, and this scenario is now obsolete. With the newer geo-rep in place, if a single brick goes down:

* If the setup is replicate, the other node takes over.
* If the setup is distribute, that particular gsync session goes faulty.

This is the expected behavior.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html