Bug 1067342
Summary: | [SNAPSHOT]: Restore is successful even when peer glusterd/nodes are down. | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja>
Component: | snapshot | Assignee: | Avra Sengupta <asengupt>
Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | rhgs-3.0 | CC: | asengupt, josferna, nlevinki, rhs-bugs, rjoseph, sdharane, ssamanta, storage-qa-internal
Target Milestone: | --- | |
Target Release: | RHGS 3.0.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | SNAPSHOT | |
Fixed In Version: | glusterfs-3.6.0.19-1.el6rhs | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1089906 (view as bug list) | Environment: |
Last Closed: | 2014-09-22 19:34:05 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1100282, 1108018, 1108652 | |
Bug Blocks: | 1089906 | |
Description
Rahul Hinduja
2014-02-20 09:32:44 UTC
Marking snapshot BZs to RHS 3.0.

Fixed with http://review.gluster.org/7455

Setting flags required to add BZs to RHS 3.0 Errata.

Can't verify this bug until bz 1100282 is fixed. Please move this to ON_QA only when the dependent fix is moved to ON_QA.

During verification hit bz 1108652; marking it as a dependency for verification.

Cannot verify this bug until bz 1108018 is fixed, because after restore the versions mismatch and the handshake does not happen, so no missed entry is recorded once glusterd is back online. Moving the bug to MODIFIED state. Assign back to ON_QA once the dependent bug is fixed.

1100324 is an upstream bug, therefore removing it from the depends-on list. 1100282 is the corresponding downstream bug.

Verified with build: glusterfs-3.6.0.19-1.el6rhs.x86_64

With server-side quorum support, snapshot restore should fail only when glusterd quorum is not met; otherwise it should be successful. Hence trying the case with a 5-node cluster.

Case 1: When 3/5 machines were brought down, the restore fails as expected with the following error message:

```
[root@inception ~]# gluster snapshot restore snap1
snapshot restore: failed: glusterds are not in quorum
Snapshot command failed
[root@inception ~]#
```

Case 2: When 2/5 machines were down, the restore is successful as expected and the entries are registered into missed_snaps_list:

```
[root@inception ~]# cat /var/lib/glusterd/snaps/missed_snaps_list
d7f5e47b-70d8-457e-bce1-615d91c8591e:f17150a5-6099-4995-9e99-5a4fbebe9380=413a77c67519440a865e61ebc283f267:2:/var/run/gluster/snaps/413a77c67519440a865e61ebc283f267/brick2/b0:3:1
b77af951-841b-427e-a7ca-2e9677a896ca:f17150a5-6099-4995-9e99-5a4fbebe9380=413a77c67519440a865e61ebc283f267:4:/var/run/gluster/snaps/413a77c67519440a865e61ebc283f267/brick4/b0:3:1
[root@inception ~]#
```

After glusterd start on one machine:

```
[root@rhs-arch-srv2 ~]# cat /var/lib/glusterd/snaps/missed_snaps_list
d7f5e47b-70d8-457e-bce1-615d91c8591e:f17150a5-6099-4995-9e99-5a4fbebe9380=413a77c67519440a865e61ebc283f267:2:/var/run/gluster/snaps/413a77c67519440a865e61ebc283f267/brick2/b0:3:2
b77af951-841b-427e-a7ca-2e9677a896ca:f17150a5-6099-4995-9e99-5a4fbebe9380=413a77c67519440a865e61ebc283f267:4:/var/run/gluster/snaps/413a77c67519440a865e61ebc283f267/brick4/b0:3:1
[root@rhs-arch-srv2 ~]#
```

After glusterd start on another machine:

```
[root@rhs-arch-srv4 ~]# cat /var/lib/glusterd/snaps/missed_snaps_list
d7f5e47b-70d8-457e-bce1-615d91c8591e:f17150a5-6099-4995-9e99-5a4fbebe9380=413a77c67519440a865e61ebc283f267:2:/var/run/gluster/snaps/413a77c67519440a865e61ebc283f267/brick2/b0:3:2
b77af951-841b-427e-a7ca-2e9677a896ca:f17150a5-6099-4995-9e99-5a4fbebe9380=413a77c67519440a865e61ebc283f267:4:/var/run/gluster/snaps/413a77c67519440a865e61ebc283f267/brick4/b0:3:2
[root@rhs-arch-srv4 ~]#
```

When all the machines are up and running, the restore happens on the machines that were brought down earlier and all nodes in the cluster are in sync. Moving the bug to VERIFIED state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html
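The quorum behaviour exercised in Case 1 and Case 2 above boils down to a strict-majority count of peers whose glusterd is reachable: restore is allowed with 3 of 5 glusterds up and rejected with only 2 of 5. A minimal sketch of that arithmetic, assuming a simple majority rule; `glusterd_quorum_met` is a hypothetical helper for illustration, not glusterd's actual implementation:

```python
def glusterd_quorum_met(total_peers: int, peers_up: int) -> bool:
    """Strict-majority check mirroring the behaviour observed above.

    Hypothetical helper for illustration only -- not glusterd's code.
    """
    return peers_up > total_peers // 2


# 5-node cluster from the verification runs:
print(glusterd_quorum_met(5, 2))  # Case 1: 3/5 nodes down -> False, restore rejected
print(glusterd_quorum_met(5, 3))  # Case 2: 2/5 nodes down -> True, restore allowed
```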
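The missed_snaps_list captures above use a colon/equals-separated layout. The field names below are an interpretation read off that output, not an official format description, and `parse_missed_snap_entry` is a hypothetical helper; the sketch just shows how the trailing status field (1 before glusterd on a node comes back, 2 after it replays the missed restore) can be read out:

```python
def parse_missed_snap_entry(line: str) -> dict:
    """Split one missed_snaps_list entry into its fields.

    Assumed layout (based on the captures above, not an official spec):
        <node-uuid>:<snap-uuid>=<snap-vol-id>:<brick-num>:<brick-path>:<op>:<status>
    """
    left, right = line.strip().split("=", 1)
    node_uuid, snap_uuid = left.split(":", 1)
    snap_vol_id, brick_num, rest = right.split(":", 2)
    brick_path, op, status = rest.rsplit(":", 2)
    return {
        "node_uuid": node_uuid,
        "snap_uuid": snap_uuid,
        "snap_vol_id": snap_vol_id,
        "brick_num": int(brick_num),
        "brick_path": brick_path,
        "op": int(op),
        "status": int(status),  # observed to flip from 1 to 2 once glusterd returns
    }


entry = ("d7f5e47b-70d8-457e-bce1-615d91c8591e:"
         "f17150a5-6099-4995-9e99-5a4fbebe9380="
         "413a77c67519440a865e61ebc283f267:2:"
         "/var/run/gluster/snaps/413a77c67519440a865e61ebc283f267/brick2/b0:3:1")
print(parse_missed_snap_entry(entry)["status"])  # 1 before glusterd restarts on that node
```

Applied to the rhs-arch-srv2 and rhs-arch-srv4 captures, the status of each brick's entry changes from 1 to 2 as its node's glusterd rejoins, which matches the observation that all nodes end up in sync once every machine is back up.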