Bug 1085278
| Summary: | [SNAPSHOT]: After adding a new peer to the cluster, gluster snapshot create, delete, restore gives Post Validation error message | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | senaik |
| Component: | snapshot | Assignee: | Avra Sengupta <asengupt> |
| Status: | CLOSED ERRATA | QA Contact: | senaik |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | rhgs-3.0 | CC: | asengupt, josferna, rhinduja, rhs-bugs, rjoseph, ssamanta, storage-qa-internal, vagarwal, vmallika |
| Target Milestone: | --- | ||
| Target Release: | RHGS 3.0.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | SNAPSHOT | ||
| Fixed In Version: | glusterfs-3.6.0.12-1.el6rhs | Doc Type: | Bug Fix |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2014-09-22 19:35:20 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1112250 | ||
| Bug Blocks: | |||
Some more issues after adding a new peer to the cluster:
========================================================

1) Snap-Delete

gluster snapshot delete snap_vol2_35
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed

The snapshot is deleted, but a "Post Validation" error message is shown.

3) Snap-restore

gluster v stop vol2
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: vol2: success
[root@snapshot-02 ~]# gluster snapshot restore snappy
snapshot restore: failed: Commit failed on 10.70.42.227. Please check log file for details.
Snapshot command failed
[root@snapshot-02 ~]# gluster v start vol2
volume start: vol2: success

Mounted the restored volume and checked for files. The restore is successful but throws a Post Validation error.

gluster v status vol2
Status of volume: vol2
Gluster process                                                   Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.74:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2    49296   Y   13933
Brick 10.70.43.151:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2   49296   Y   21261
Brick 10.70.43.32:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2    49295   Y   22599
Brick 10.70.43.71:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2    49295   Y   24348
NFS Server on localhost                                           2049    Y       21273
Self-heal Daemon on localhost                                     N/A     Y       21280
NFS Server on 10.70.43.32                                         2049    Y       22611
Self-heal Daemon on 10.70.43.32                                   N/A     Y       22618
NFS Server on 10.70.42.227                                        2049    Y       20270
Self-heal Daemon on 10.70.42.227                                  N/A     Y       20277
NFS Server on 10.70.43.71                                         2049    Y       24360
Self-heal Daemon on 10.70.43.71                                   N/A     Y       24367
NFS Server on 10.70.43.74                                         2049    Y       13945
Self-heal Daemon on 10.70.43.74                                   N/A     Y       13952

Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks

Marking snapshot BZs to RHS 3.0.

Fixed with http://review.gluster.org/7525

Version :
========
glusterfs 3.6.0.20 built on Jun 19 2014

Marking this bug as dependent on bz 1104478, as we are getting the error message "glusterd quorum not met" when a new node is attached to the cluster.

snapshot create: success: Snap snap4 created successfully
snapshot create: failed: glusterds are not in quorum
Snapshot command failed
snapshot create: success: Snap snap6 created successfully

All glusterds were up and running on the nodes, but we still get the message that glusterd quorum is not met.
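When these Post Validation / quorum messages show up, the corresponding entries land in the glusterd log on the peer named in the error. A grep along the following lines pulls them out (a minimal sketch; the log path is assumed to be the default glusterd log location and may differ per installation):

# Run on the peer named in the error (here 10.70.42.227); the log path is the
# assumed default for glusterd and may differ on your setup.
grep -E 'Post Validation|quorum' /var/log/glusterfs/etc-glusterfs-glusterd.vol.log | tail -n 20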
----------------Part of log---------------------
name:snapshot15.lab.eng.blr.redhat.com
[2014-06-23 06:03:31.887252] I [glusterd-handler.c:2522:__glusterd_handle_friend_update] 0-: Received uuid: 7e97d0f0-8ae9-40eb-b822-952cc5a8dc46, host name:10.70.44.54
[2014-06-23 06:03:32.166226] W [glusterd-utils.c:12909:glusterd_snap_quorum_check_for_create] 0-management: glusterds are not in quorum
[2014-06-23 06:03:32.166352] W [glusterd-utils.c:13058:glusterd_snap_quorum_check] 0-management: Quorum check failed during snapshot create command
[2014-06-23 06:03:32.166374] W [glusterd-mgmt.c:1846:glusterd_mgmt_v3_initiate_snap_phases] 0-management: quorum check failed
[2014-06-23 06:03:32.166416] W [glusterd-snapshot.c:7012:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2014-06-23 06:03:32.166433] W [glusterd-mgmt.c:248:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2014-06-23 06:03:32.166451] E [glusterd-mgmt.c:1335:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2014-06-23 06:03:32.166467] E [glusterd-mgmt.c:1944:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
[2014-06-23 06:03:33.972792] I [glusterd-handshake.c:1014:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30000

Raised a new bz to track the issue mentioned in Comment 7. Marking this bug as dependent on bz 1112250.

Removing 1114403 from the dependency list, as it's a clone of 1112250.

I verified this bug by executing the steps mentioned in the description and didn't find any issues:
Created a 2 x 2 volume:
[root@snapshot-01 ~]# gluster pool list
UUID Hostname State
bd1f458d-09cf-481d-a0b8-dff4a8afb8d0 10.70.42.209 Disconnected
a90793ca-58a4-429e-b39b-5ad1b88dafa7 localhost Connected
[root@snapshot-01 ~]# gluster v i
Volume Name: vol1
Type: Distributed-Replicate
Volume ID: ad2a01be-c045-412e-9c84-0696492beb19
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: s1:/rhs/brick1/dir
Brick2: s3:/brick0/dir
Brick3: s1:/rhs/brick2/dir
Brick4: s3:/brick1/dir
Options Reconfigured:
features.barrier: disable
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256
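For reference, a volume with the layout above could be created along these lines (a minimal sketch using the brick paths listed under "Bricks:" with the s1/s3 hostnames as shown in this report; the reconfigured options above would be set separately):

# Hedged sketch: recreate a 2 x 2 distributed-replicate volume like vol1,
# pairing each s1 brick with an s3 brick as in the volume info output above.
gluster volume create vol1 replica 2 \
    s1:/rhs/brick1/dir s3:/brick0/dir \
    s1:/rhs/brick2/dir s3:/brick1/dir
gluster volume start vol1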
From one terminal, started snapshot creation in a loop:
[root@snapshot-03 ~]# for i in {21..30}; do gluster snap create snap$i vol1; done
snapshot create: success: Snap snap21 created successfully
snapshot create: success: Snap snap22 created successfully
snapshot create: success: Snap snap23 created successfully
snapshot create: success: Snap snap24 created successfully
snapshot create: success: Snap snap25 created successfully
snapshot create: success: Snap snap26 created successfully
snapshot create: success: Snap snap27 created successfully
snapshot create: success: Snap snap28 created successfully
snapshot create: success: Snap snap29 created successfully
snapshot create: success: Snap snap30 created successfully
From another terminal, attached a new peer:
[root@snapshot-03 ~]# gluster peer probe s4
peer probe: success.
[root@snapshot-03 ~]# gluster pool list
UUID Hostname State
a90793ca-58a4-429e-b39b-5ad1b88dafa7 10.70.42.16 Connected
f1c5bfa4-997a-4c7e-990e-a45e68bb3c11 s4 Connected
bd1f458d-09cf-481d-a0b8-dff4a8afb8d0 localhost Connected
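For completeness, a small check along these lines could be used to confirm that every pool member reports Connected before re-running the snapshot loop (a minimal sketch that assumes the `gluster pool list` output format shown above, with a header row and the state in the last column):

# Hedged sketch: count pool members whose glusterd reports Connected.
total=$(gluster pool list | awk 'NR > 1' | wc -l)
connected=$(gluster pool list | awk 'NR > 1 && $NF == "Connected"' | wc -l)
echo "glusterd reports Connected on $connected of $total pool members"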
Marking the bug as verified
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
http://rhn.redhat.com/errata/RHEA-2014-1278.html
Description of problem:
=======================
While snapshot creation is in progress, attach a new peer to the cluster. Snapshot creation gives the following message:

snapshot create: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed

Version-Release number of selected component (if applicable):
============================================================
glusterfs 3.4.1.7.snap.mar27.2014git

How reproducible:

Steps to Reproduce:
==================
1. Create a dist-repl volume and start it
2. Fuse and NFS mount the volume and create some files
3. Create snapshots on the volume. While snapshot creation is in progress, from another node attach a new peer to the cluster.

gluster peer probe 10.70.42.227
peer probe: success.

[root@snapshot-02 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.43.74
Uuid: 97c4e585-0915-4a97-b610-79b10d7978e4
State: Peer in Cluster (Connected)

Hostname: 10.70.43.32
Uuid: 6cee10cf-5745-43f9-8e2d-df494fee3544
State: Peer in Cluster (Connected)

Hostname: 10.70.43.71
Uuid: 6aa084d3-9c8e-496c-afae-15144327ff22
State: Peer in Cluster (Connected)

Hostname: 10.70.42.227
Uuid: f838e4ce-04ea-4cb5-858c-1c1a9d672649
State: Peer in Cluster (Connected)

for i in {1..100} ; do gluster snapshot create snap_vol2_$i vol2 ; done
snapshot create: snap_vol2_1: snap created successfully
snapshot create: snap_vol2_2: snap created successfully
snapshot create: snap_vol2_3: snap created successfully
snapshot create: snap_vol2_4: snap created successfully
snapshot create: snap_vol2_5: snap created successfully
snapshot create: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed
snapshot create: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed

Snapshots are created successfully, but we get "Post Validation failed" on the newly added peer.

gluster snapshot list vol2
snap_vol2_1
snap_vol2_2
snap_vol2_3
snap_vol2_4
snap_vol2_5
snap_vol2_6
snap_vol2_7
snap_vol2_8
snap_vol2_9
snap_vol2_10
snap_vol2_11
snap_vol2_12

Actual results:
==============
After the probe is successful, snapshot creation gives "Post Validation failed" on the newly added peer.

Expected results:
================
While snapshot creation is in progress and a new peer is attached to the cluster, snapshot creation should continue successfully with no error message shown.

Additional info:
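For reference, the reproduction boils down to the two commands below run from different terminals (a minimal sketch assuming a started dist-repl volume vol2 that is already FUSE/NFS mounted with some files on it, and an unprobed node 10.70.42.227 reachable from the cluster):

# Terminal 1: create snapshots of vol2 in a loop.
for i in {1..100}; do gluster snapshot create snap_vol2_$i vol2; done

# Terminal 2, while the loop is running: attach a new peer to the cluster.
gluster peer probe 10.70.42.227

# Afterwards, confirm the snapshots were created despite the error messages.
gluster snapshot list vol2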