Description of problem:
=======================
While snapshot creation is in progress, attach a new peer to the cluster. Snapshot creation then fails with:

snapshot create: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed

Version-Release number of selected component (if applicable):
============================================================
glusterfs 3.4.1.7.snap.mar27.2014git

How reproducible:

Steps to Reproduce:
===================
1. Create a dist-repl volume and start it.
2. FUSE- and NFS-mount the volume and create some files.
3. Create snapshots on the volume. While snapshot creation is in progress, attach a new peer to the cluster from another node.

# gluster peer probe 10.70.42.227
peer probe: success.

[root@snapshot-02 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.43.74
Uuid: 97c4e585-0915-4a97-b610-79b10d7978e4
State: Peer in Cluster (Connected)

Hostname: 10.70.43.32
Uuid: 6cee10cf-5745-43f9-8e2d-df494fee3544
State: Peer in Cluster (Connected)

Hostname: 10.70.43.71
Uuid: 6aa084d3-9c8e-496c-afae-15144327ff22
State: Peer in Cluster (Connected)

Hostname: 10.70.42.227
Uuid: f838e4ce-04ea-4cb5-858c-1c1a9d672649
State: Peer in Cluster (Connected)

# for i in {1..100} ; do gluster snapshot create snap_vol2_$i vol2 ; done
snapshot create: snap_vol2_1: snap created successfully
snapshot create: snap_vol2_2: snap created successfully
snapshot create: snap_vol2_3: snap created successfully
snapshot create: snap_vol2_4: snap created successfully
snapshot create: snap_vol2_5: snap created successfully
snapshot create: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed
snapshot create: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed

The snapshots themselves are created successfully, but post-validation fails on the newly added peer.
# gluster snapshot list vol2
snap_vol2_1
snap_vol2_2
snap_vol2_3
snap_vol2_4
snap_vol2_5
snap_vol2_6
snap_vol2_7
snap_vol2_8
snap_vol2_9
snap_vol2_10
snap_vol2_11
snap_vol2_12

Actual results:
===============
After the peer probe succeeds, snapshot creation reports "Post Validation failed" on the newly added peer.

Expected results:
=================
While snapshot creation is in progress and a new peer is attached to the cluster, snapshot creation should continue successfully with no error message shown.

Additional info:
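For regression runs, the snapshot loop above can be driven from a small wrapper that counts failed creations instead of eyeballing the output. This is a minimal sketch, not part of the original report; the `create_snaps_counting_failures` helper name is mine, and it only assumes the `gluster` CLI is on the PATH.

```shell
# Sketch: run N snapshot creations on a volume and count how many the CLI
# reports as failed, so a peer probe racing with the loop shows up as a
# nonzero failure count instead of an aborted run.
create_snaps_counting_failures() {
    vol=$1
    count=$2
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        if ! out=$(gluster snapshot create "snap_${vol}_${i}" "$vol" 2>&1); then
            failures=$((failures + 1))
            printf 'snapshot %d failed: %s\n' "$i" "$out" >&2
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

While the loop runs, a `gluster peer probe <new-node>` from another terminal reproduces the race; a nonzero count combined with all snapshots present in `gluster snapshot list` matches the behaviour described above.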
http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/snapshots/1085278/
Some more issues after adding a new peer to the cluster:
========================================================

1) Snap delete

# gluster snapshot delete snap_vol2_35
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed

The snapshot is deleted, but the "Post Validation failed" error message is shown.

2) Snap restore

# gluster v stop vol2
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: vol2: success
[root@snapshot-02 ~]# gluster snapshot restore snappy
snapshot restore: failed: Commit failed on 10.70.42.227. Please check log file for details.
Snapshot command failed
[root@snapshot-02 ~]# gluster v start vol2
volume start: vol2: success

Mounted the restored volume and checked the files. The restore is successful, but the command reports "Commit failed" on the newly added peer.

# gluster v status vol2
Status of volume: vol2
Gluster process                                                                                                                      Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.43.74:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2    49296   Y       13933
Brick 10.70.43.151:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2   49296   Y       21261
Brick 10.70.43.32:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2    49295   Y       22599
Brick 10.70.43.71:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2    49295   Y       24348
NFS Server on localhost                                                                                                              2049    Y       21273
Self-heal Daemon on localhost                                                                                                        N/A     Y       21280
NFS Server on 10.70.43.32                                                                                                            2049    Y       22611
Self-heal Daemon on 10.70.43.32                                                                                                      N/A     Y       22618
NFS Server on 10.70.42.227                                                                                                           2049    Y       20270
Self-heal Daemon on 10.70.42.227                                                                                                     N/A     Y       20277
NFS Server on 10.70.43.71                                                                                                            2049    Y       24360
Self-heal Daemon on 10.70.43.71                                                                                                      N/A     Y       24367
NFS Server on 10.70.43.74                                                                                                            2049    Y       13945
Self-heal Daemon on 10.70.43.74                                                                                                      N/A     Y       13952

Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks
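Several of the failures above are cosmetic: the operation itself succeeds and only the phase on the new peer fails. When scripting around this behaviour, it can help to check the actual cluster state instead of trusting the command's exit status. A minimal sketch, not from the original report (the `snap_exists` helper name is mine; it only assumes the `gluster` CLI):

```shell
# Sketch: after "snapshot delete" or "create" reports "Post Validation
# failed", confirm the snapshot's real state from "gluster snapshot list"
# rather than relying on the CLI exit code.
snap_exists() {
    vol=$1
    snap=$2
    gluster snapshot list "$vol" | grep -Fqx "$snap"
}
```

For example, after the failed-looking delete above, `snap_exists vol2 snap_vol2_35` returning false would confirm the delete actually went through.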
Marking snapshot BZs to RHS 3.0.
Fixed with http://review.gluster.org/7525
Version: glusterfs 3.6.0.20 built on Jun 19 2014
================================================

Marking this bug as dependent on bz 1104478, as we are getting the error message "glusterd quorum not met" when a new node is attached to the cluster.

snapshot create: success: Snap snap4 created successfully
snapshot create: failed: glusterds are not in quorum
Snapshot command failed
snapshot create: success: Snap snap6 created successfully

All glusterds were up and running on the nodes, but we still get the message that glusterd quorum is not met.

----------------Part of log---------------------
name:snapshot15.lab.eng.blr.redhat.com
[2014-06-23 06:03:31.887252] I [glusterd-handler.c:2522:__glusterd_handle_friend_update] 0-: Received uuid: 7e97d0f0-8ae9-40eb-b822-952cc5a8dc46, host name:10.70.44.54
[2014-06-23 06:03:32.166226] W [glusterd-utils.c:12909:glusterd_snap_quorum_check_for_create] 0-management: glusterds are not in quorum
[2014-06-23 06:03:32.166352] W [glusterd-utils.c:13058:glusterd_snap_quorum_check] 0-management: Quorum checkfailed during snapshot create command
[2014-06-23 06:03:32.166374] W [glusterd-mgmt.c:1846:glusterd_mgmt_v3_initiate_snap_phases] 0-management: quorum check failed
[2014-06-23 06:03:32.166416] W [glusterd-snapshot.c:7012:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2014-06-23 06:03:32.166433] W [glusterd-mgmt.c:248:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2014-06-23 06:03:32.166451] E [glusterd-mgmt.c:1335:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2014-06-23 06:03:32.166467] E [glusterd-mgmt.c:1944:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
[2014-06-23 06:03:33.972792] I [glusterd-handshake.c:1014:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30000
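The excerpt shows the chain clearly: the quorum check fails first, and the post-validation error is a downstream symptom. When triaging similar logs, grepping the glusterd log for the quorum and post-validation messages isolates the failure window. A hedged sketch, not from the original report (the `quorum_failures` function name is mine; since the glusterd log path varies by install, it takes the path as an argument):

```shell
# Sketch: pull the quorum-check and post-validation warnings out of a
# glusterd log, keeping timestamps so the failure window stays visible.
# Note the "check ?failed" alternation also matches the log's literal
# "Quorum checkfailed" (missing space) message.
quorum_failures() {
    logfile=$1
    grep -E 'not in quorum|[Qq]uorum check ?failed|Post Validation [Ff]ailed' "$logfile"
}
```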
Raised a new bz to track the issue mentioned in comment 7. Marking this bug as dependent on bz 1112250.
Removing 1114403 from the dependency list, as it's a clone of 1112250.
I verified this bug by executing the steps mentioned in the description and didn't find any issues.

Created a 2 x 2 volume:

[root@snapshot-01 ~]# gluster pool list
UUID					Hostname	State
bd1f458d-09cf-481d-a0b8-dff4a8afb8d0	10.70.42.209	Disconnected
a90793ca-58a4-429e-b39b-5ad1b88dafa7	localhost	Connected

[root@snapshot-01 ~]# gluster v i

Volume Name: vol1
Type: Distributed-Replicate
Volume ID: ad2a01be-c045-412e-9c84-0696492beb19
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: s1:/rhs/brick1/dir
Brick2: s3:/brick0/dir
Brick3: s1:/rhs/brick2/dir
Brick4: s3:/brick1/dir
Options Reconfigured:
features.barrier: disable
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

From one terminal, started snapshot creation in a loop:

[root@snapshot-03 ~]# for i in {21..30}; do gluster snap create snap$i vol1; done
snapshot create: success: Snap snap21 created successfully
snapshot create: success: Snap snap22 created successfully
snapshot create: success: Snap snap23 created successfully
snapshot create: success: Snap snap24 created successfully
snapshot create: success: Snap snap25 created successfully
snapshot create: success: Snap snap26 created successfully
snapshot create: success: Snap snap27 created successfully
snapshot create: success: Snap snap28 created successfully
snapshot create: success: Snap snap29 created successfully
snapshot create: success: Snap snap30 created successfully

From another terminal, attached a new peer:

[root@snapshot-03 ~]# gluster peer probe s4
peer probe: success.
[root@snapshot-03 ~]# gluster pool list
UUID					Hostname	State
a90793ca-58a4-429e-b39b-5ad1b88dafa7	10.70.42.16	Connected
f1c5bfa4-997a-4c7e-990e-a45e68bb3c11	s4	Connected
bd1f458d-09cf-481d-a0b8-dff4a8afb8d0	localhost	Connected

Marking the bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html