Bug 1085278 - [SNAPSHOT]: After adding a new peer to the cluster, gluster snapshot create, delete, restore give a Post Validation error message
Summary: [SNAPSHOT]: After adding a new peer to the cluster, gluster snapshot create, ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: snapshot
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.0.0
Assignee: Avra Sengupta
QA Contact: senaik
URL:
Whiteboard: SNAPSHOT
Depends On: 1112250
Blocks:
 
Reported: 2014-04-08 09:32 UTC by senaik
Modified: 2016-09-17 12:52 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.6.0.12-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-09-22 19:35:20 UTC




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2014:1278 0 normal SHIPPED_LIVE Red Hat Storage Server 3.0 bug fix and enhancement update 2014-09-22 23:26:55 UTC

Description senaik 2014-04-08 09:32:04 UTC
Description of problem:
=======================
While snapshot creation is in progress, attaching a new peer to the cluster causes snapshot creation to report the following error:

snapshot create: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed


Version-Release number of selected component (if applicable):
============================================================
glusterfs 3.4.1.7.snap.mar27.2014git


How reproducible:


Steps to Reproduce:
==================
1. Create a distributed-replicate volume and start it.

2. Fuse-mount and NFS-mount the volume and create some files.

3. Create snapshots on the volume. While snapshot creation is in progress, attach a new peer to the cluster from another node (a consolidated sketch of this flow follows the transcript below).

gluster peer probe 10.70.42.227
peer probe: success. 

[root@snapshot-02 ~]# gluster peer status
Number of Peers: 4

Hostname: 10.70.43.74
Uuid: 97c4e585-0915-4a97-b610-79b10d7978e4
State: Peer in Cluster (Connected)

Hostname: 10.70.43.32
Uuid: 6cee10cf-5745-43f9-8e2d-df494fee3544
State: Peer in Cluster (Connected)

Hostname: 10.70.43.71
Uuid: 6aa084d3-9c8e-496c-afae-15144327ff22
State: Peer in Cluster (Connected)

Hostname: 10.70.42.227
Uuid: f838e4ce-04ea-4cb5-858c-1c1a9d672649
State: Peer in Cluster (Connected)


for i in {1..100} ; do gluster snapshot create snap_vol2_$i vol2 ; done
snapshot create: snap_vol2_1: snap created successfully
snapshot create: snap_vol2_2: snap created successfully
snapshot create: snap_vol2_3: snap created successfully
snapshot create: snap_vol2_4: snap created successfully
snapshot create: snap_vol2_5: snap created successfully
snapshot create: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed
snapshot create: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed

Snapshots are created successfully, but we get a "Post Validation failed" error on the newly added peer.

gluster snapshot list vol2
snap_vol2_1
snap_vol2_2
snap_vol2_3
snap_vol2_4
snap_vol2_5
snap_vol2_6
snap_vol2_7
snap_vol2_8
snap_vol2_9
snap_vol2_10
snap_vol2_11
snap_vol2_12


Actual results:
==============
After the peer probe is successful, snapshot creation reports "Post Validation failed" on the newly added peer.


Expected results:
================
If a new peer is attached to the cluster while snapshot creation is in progress, snapshot creation should continue successfully with no error message shown.


Additional info:

Comment 3 senaik 2014-04-08 10:19:57 UTC
Some more issues after adding a new peer to the cluster:
========================================================
1) Snap-delete

 gluster snapshot delete snap_vol2_35
Deleting snap will erase all the information about the snap. Do you still want to continue? (y/n) y
snapshot delete: failed: Post Validation failed on 10.70.42.227. Please check log file for details.
Snapshot command failed

The snapshot is deleted, but a "Post Validation" error message is shown.

2) Snap-restore

 gluster v stop vol2
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: vol2: success

[root@snapshot-02 ~]# gluster snapshot restore snappy 
snapshot restore: failed: Commit failed on 10.70.42.227. Please check log file for details.
Snapshot command failed

[root@snapshot-02 ~]# gluster v start vol2
volume start: vol2: success

Mounted the restored volume and checked the files. Restore is successful, but the command reports a "Commit failed" error on the newly added peer (a consolidated sketch of the restore flow follows the status output below).

gluster v status vol2
Status of volume: vol2
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick 10.70.43.74:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2    49296    Y    13933
Brick 10.70.43.151:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2    49296    Y    21261
Brick 10.70.43.32:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2    49295    Y    22599
Brick 10.70.43.71:/var/run/gluster/snaps/3752f9b2d53d463f8f0ced3d35328f21/dev-VolGroup0-3752f9b2d53d463f8f0ced3d35328f21-brick/b2    49295    Y    24348
NFS Server on localhost					2049	Y	21273
Self-heal Daemon on localhost				N/A	Y	21280
NFS Server on 10.70.43.32				2049	Y	22611
Self-heal Daemon on 10.70.43.32				N/A	Y	22618
NFS Server on 10.70.42.227				2049	Y	20270
Self-heal Daemon on 10.70.42.227			N/A	Y	20277
NFS Server on 10.70.43.71				2049	Y	24360
Self-heal Daemon on 10.70.43.71				N/A	Y	24367
NFS Server on 10.70.43.74				2049	Y	13945
Self-heal Daemon on 10.70.43.74				N/A	Y	13952
 
Task Status of Volume vol2
------------------------------------------------------------------------------
There are no active volume tasks
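
For reference, the restore flow exercised above is, end to end, the following. A minimal sketch, assuming the snapshot being restored is named snappy and its parent volume is vol2 (both from the transcript); the mount host and mount point are illustrative:

gluster volume stop vol2                          # the volume must be stopped before restore
gluster snapshot restore snappy                   # restore vol2 to the snapshot's state
gluster volume start vol2
mount -t glusterfs 10.70.43.74:/vol2 /mnt/vol2    # fuse-mount and verify the files
gluster volume status vol2                        # all bricks should be online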

Comment 5 Nagaprasad Sathyanarayana 2014-04-21 06:18:13 UTC
Marking snapshot BZs to RHS 3.0.

Comment 6 Avra Sengupta 2014-05-26 07:27:39 UTC
Fixed with http://review.gluster.org/7525

Comment 7 senaik 2014-06-23 06:40:16 UTC
Version : glusterfs 3.6.0.20 built on Jun 19 2014
========

Marking this bug as dependent on bz 1104478, as we are getting the error message "glusterd quorum not met" when a new node is attached to the cluster.

snapshot create: success: Snap snap4 created successfully
snapshot create: failed: glusterds are not in quorum
Snapshot command failed
snapshot create: success: Snap snap6 created successfully

All glusterds were up and running on the nodes, but we still get the message that glusterd quorum is not met (a quick way to confirm this is sketched after the log excerpt below).

----------------Part of log---------------------

name:snapshot15.lab.eng.blr.redhat.com
[2014-06-23 06:03:31.887252] I [glusterd-handler.c:2522:__glusterd_handle_friend_update] 0-: Received uuid: 7e97d0f0-8ae9-40eb-b822-952cc5a8dc46, hostname:10.70.44.54
[2014-06-23 06:03:32.166226] W [glusterd-utils.c:12909:glusterd_snap_quorum_check_for_create] 0-management: glusterds are not in quorum
[2014-06-23 06:03:32.166352] W [glusterd-utils.c:13058:glusterd_snap_quorum_check] 0-management: Quorum checkfailed during snapshot create command
[2014-06-23 06:03:32.166374] W [glusterd-mgmt.c:1846:glusterd_mgmt_v3_initiate_snap_phases] 0-management: quorum check failed
[2014-06-23 06:03:32.166416] W [glusterd-snapshot.c:7012:glusterd_snapshot_postvalidate] 0-management: Snapshot create post-validation failed
[2014-06-23 06:03:32.166433] W [glusterd-mgmt.c:248:gd_mgmt_v3_post_validate_fn] 0-management: postvalidate operation failed
[2014-06-23 06:03:32.166451] E [glusterd-mgmt.c:1335:glusterd_mgmt_v3_post_validate] 0-management: Post Validation failed for operation Snapshot on local node
[2014-06-23 06:03:32.166467] E [glusterd-mgmt.c:1944:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Post Validation Failed
[2014-06-23 06:03:33.972792] I [glusterd-handshake.c:1014:__glusterd_mgmt_hndsk_versions_ack] 0-management: using the op-version 30000
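
As a sanity check before trusting the quorum message, the following sketch confirms that every glusterd in the pool really is up; it assumes passwordless ssh to the peers, illustrative hostnames, and RHEL 6 style init scripts:

gluster pool list                        # every peer should show State: Connected
for h in 10.70.44.54 10.70.42.227; do
    ssh $h 'service glusterd status'     # glusterd should be running on each node
done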

Comment 8 senaik 2014-06-24 06:05:14 UTC
Raised a new bz to track the issue mentioned in Comment 7.

Marking this bug as dependent on bz 1112250.

Comment 9 Avra Sengupta 2014-07-24 07:51:02 UTC
Removing 1114403 from the dependency list, as it's a clone of 1112250.

Comment 10 Vijaikumar Mallikarjuna 2014-08-08 07:20:11 UTC
I verified this bug by executing the steps mentioned in the description and didn't find any issues:

Created a 2 x 2 volume:

[root@snapshot-01 ~]# gluster pool list
UUID					Hostname	State
bd1f458d-09cf-481d-a0b8-dff4a8afb8d0	10.70.42.209	Disconnected 
a90793ca-58a4-429e-b39b-5ad1b88dafa7	localhost	Connected 

[root@snapshot-01 ~]# gluster v i
Volume Name: vol1
Type: Distributed-Replicate
Volume ID: ad2a01be-c045-412e-9c84-0696492beb19
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: s1:/rhs/brick1/dir
Brick2: s3:/brick0/dir
Brick3: s1:/rhs/brick2/dir
Brick4: s3:/brick1/dir
Options Reconfigured:
features.barrier: disable
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256


From one terminal, started snapshot creation in a loop:
[root@snapshot-03 ~]# for  i in {21..30}; do gluster snap create snap$i vol1; done
snapshot create: success: Snap snap21 created successfully
snapshot create: success: Snap snap22 created successfully
snapshot create: success: Snap snap23 created successfully
snapshot create: success: Snap snap24 created successfully
snapshot create: success: Snap snap25 created successfully
snapshot create: success: Snap snap26 created successfully
snapshot create: success: Snap snap27 created successfully
snapshot create: success: Snap snap28 created successfully
snapshot create: success: Snap snap29 created successfully
snapshot create: success: Snap snap30 created successfully

From another terminal, attached a new peer:
[root@snapshot-03 ~]# gluster peer probe s4
peer probe: success. 
[root@snapshot-03 ~]# gluster pool list
UUID					Hostname	State
a90793ca-58a4-429e-b39b-5ad1b88dafa7	10.70.42.16	Connected 
f1c5bfa4-997a-4c7e-990e-a45e68bb3c11	s4       	Connected 
bd1f458d-09cf-481d-a0b8-dff4a8afb8d0	localhost	Connected 



Marking the bug as verified.

Comment 12 errata-xmlrpc 2014-09-22 19:35:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html

