Bug 1462210
Summary: | [RFE][GSS] Geo-replication skips the deletion of files if a slave subvolume was down | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Riyas Abdulrasak <rnalakka>
Component: | geo-replication | Assignee: | Kotresh HR <khiremat>
Status: | CLOSED WONTFIX | QA Contact: | Rahul Hinduja <rhinduja>
Severity: | medium | Docs Contact: |
Priority: | unspecified | |
Version: | rhgs-3.3 | CC: | amukherj, atumball, bkunal, csaba, khiremat, rgowdapp, rhs-bugs, rnalakka, storage-qa-internal
Target Milestone: | --- | Keywords: | FutureFeature, ZStream
Target Release: | --- | Flags: | khiremat: needinfo-
Hardware: | All | |
OS: | All | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-12-14 04:30:50 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1408949, 1468976 | |
Description
Riyas Abdulrasak
2017-06-16 12:30:53 UTC
I think this is the expected behavior when all the bricks of a subvolume are down and the file is hashed to that subvolume. Simple example:

    gluster volume create gv1 node1:/bricks/b1 node2:/bricks/b2 force
    gluster volume start gv1
    mount -t glusterfs localhost:gv1 /mnt/gv1
    echo "Hello World" > /mnt/gv1/f1

Check the backend, kill the brick where the file is hashed, then try to access or delete the file. We always get "rm: cannot remove 'f1': No such file or directory". Geo-rep concludes the file is already deleted and proceeds without logging anything. From geo-replication there is no way to tell whether the file was already deleted or the subvolume is down. I think DHT can be enhanced to return a different error code when the subvolume is down. Adding Raghavendra to check whether DHT can differentiate these errors.

This falls under the category of issues where the quorum required for high availability is not available in the cluster. In that case, such behaviors are expected. The problem with marking the whole session as faulty is that it would stop other files from syncing too, which we believe is not the expected behavior. We recommend closing this bug as WONTFIX (or NOTABUG, as product failures are expected when quorum is not met).

To answer Aravinda's comment, DHT can never respond properly to the higher layer when the nodes are not reachable. One possibility is for it to return ENOTCONN (not connected) instead of ENOENT (not found), as that is the proper error at that time.

Let us know what everybody thinks.

(In reply to Amar Tumballi from comment #5)

I am in favor of closing this as WONTFIX, and perhaps documenting the known or expected behaviour somewhere. Geo-rep can't do anything when the only available subvolume is down.
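To make the proposed ENOTCONN-versus-ENOENT distinction concrete, here is a minimal Python sketch of how a sync worker could react differently to the two errors when replaying a delete on the slave. This is not the actual gsyncd code; the function name remove_on_slave and its return values are hypothetical, and it only illustrates the errno handling being discussed.

    import errno
    import os

    def remove_on_slave(path):
        # Hypothetical sketch, not the actual gsyncd implementation.
        try:
            os.unlink(path)
        except OSError as e:
            if e.errno == errno.ENOENT:
                # File genuinely gone: safe to treat the delete as already done.
                return "already-deleted"
            if e.errno == errno.ENOTCONN:
                # Subvolume unreachable (the proposed DHT behaviour):
                # keep the entry and retry later instead of skipping it.
                return "retry-later"
            raise
        return "deleted"

With the current behaviour, DHT returns ENOENT in both cases, so the first branch is always taken and the delete is silently skipped when the slave subvolume is down.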