Description of problem:
Clone creation should not be successful when a node participating in the volume goes down.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-0.3

How reproducible:
Always

Steps to Reproduce:
1. Create a 4x3 dist-rep volume and start it.
2. Create a snapshot of the volume and activate it.
3. Bring down any one of the nodes in the pool (in my case: 10.70.35.140).
4. Check the volume info, status and snapshot info, status (outputs below, after the command sketch).
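The exact commands behind steps 1-3 are not recorded in this report; the following is a minimal sketch, assuming the brick paths shown in the volume info below and that every brick sits on a thin-provisioned LV (a prerequisite for gluster snapshots). How the node was brought down is likewise an assumption; powering it off or stopping its gluster services would both match the status output in step 4.

# gluster volume create testvolume replica 3 \
      10.70.35.228:/bricks/brick0/b0 10.70.35.141:/bricks/brick0/b0 10.70.35.142:/bricks/brick0/b0 \
      10.70.35.140:/bricks/brick0/b0 10.70.35.228:/bricks/brick1/b1 10.70.35.141:/bricks/brick1/b1 \
      10.70.35.142:/bricks/brick1/b1 10.70.35.140:/bricks/brick1/b1 10.70.35.228:/bricks/brick2/b2 \
      10.70.35.141:/bricks/brick2/b2 10.70.35.142:/bricks/brick2/b2 10.70.35.140:/bricks/brick2/b2
# gluster volume start testvolume
# gluster snapshot create testsnap testvolume
# gluster snapshot activate testsnap
# poweroff        (run on 10.70.35.140; "gluster peer status" on any other node
                   should then show that peer as Disconnected)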
[root@dhcp35-228 bin]# gluster volume info

Volume Name: testvolume
Type: Distributed-Replicate
Volume ID: bd85248f-2459-4ddb-b5a5-365863985f1a
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.228:/bricks/brick0/b0
Brick2: 10.70.35.141:/bricks/brick0/b0
Brick3: 10.70.35.142:/bricks/brick0/b0
Brick4: 10.70.35.140:/bricks/brick0/b0
Brick5: 10.70.35.228:/bricks/brick1/b1
Brick6: 10.70.35.141:/bricks/brick1/b1
Brick7: 10.70.35.142:/bricks/brick1/b1
Brick8: 10.70.35.140:/bricks/brick1/b1
Brick9: 10.70.35.228:/bricks/brick2/b2
Brick10: 10.70.35.141:/bricks/brick2/b2
Brick11: 10.70.35.142:/bricks/brick2/b2
Brick12: 10.70.35.140:/bricks/brick2/b2
Options Reconfigured:
features.barrier: disable
performance.readdir-ahead: on
snap-max-hard-limit: 200
snap-max-soft-limit: 20
cluster.enable-shared-storage: enable

[root@dhcp35-228 bin]# gluster volume status
Status of volume: testvolume
Gluster process                          TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.228:/bricks/brick0/b0     49152     0          Y       11994
Brick 10.70.35.141:/bricks/brick0/b0     49152     0          Y       11923
Brick 10.70.35.142:/bricks/brick0/b0     49152     0          Y       11916
Brick 10.70.35.228:/bricks/brick1/b1     49153     0          Y       12012
Brick 10.70.35.141:/bricks/brick1/b1     49153     0          Y       11941
Brick 10.70.35.142:/bricks/brick1/b1     49153     0          Y       11934
Brick 10.70.35.228:/bricks/brick2/b2     49154     0          Y       12030
Brick 10.70.35.141:/bricks/brick2/b2     49154     0          Y       11959
Brick 10.70.35.142:/bricks/brick2/b2     49154     0          Y       11953
NFS Server on localhost                  N/A       N/A        N       N/A
Self-heal Daemon on localhost            N/A       N/A        Y       12060
NFS Server on 10.70.35.141               N/A       N/A        N       N/A
Self-heal Daemon on 10.70.35.141         N/A       N/A        Y       11982
NFS Server on 10.70.35.142               N/A       N/A        N       N/A
Self-heal Daemon on 10.70.35.142         N/A       N/A        Y       11984

Task Status of Volume testvolume
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-228 bin]# gluster snapshot info
Snapshot                  : testsnap
Snap UUID                 : f2fb5482-22ca-40e8-813f-03b7735fb3a1
Created                   : 2015-10-27 10:52:05
Snap Volumes:

    Snap Volume Name               : 69a93590f0ce4df2875faa92f5f3ac2e
    Origin Volume name             : testvolume
    Snaps taken for testvolume     : 1
    Snaps available for testvolume : 199
    Status                         : Started

[root@dhcp35-228 bin]# gluster snapshot status

Snap Name : testsnap
Snap UUID : f2fb5482-22ca-40e8-813f-03b7735fb3a1

    Brick Path        : 10.70.35.228:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick1/b0
    Volume Group      : RHS_vg0
    Brick Running     : Yes
    Brick PID         : 12473
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.141:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick2/b0
    Volume Group      : RHS_vg0
    Brick Running     : Yes
    Brick PID         : 12279
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.142:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick3/b0
    Volume Group      : RHS_vg0
    Brick Running     : Yes
    Brick PID         : 12261
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.228:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick5/b1
    Volume Group      : RHS_vg1
    Brick Running     : Yes
    Brick PID         : 12491
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.141:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick6/b1
    Volume Group      : RHS_vg1
    Brick Running     : Yes
    Brick PID         : 12297
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.142:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick7/b1
    Volume Group      : RHS_vg1
    Brick Running     : Yes
    Brick PID         : 12279
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.228:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick9/b2
    Volume Group      : RHS_vg2
    Brick Running     : Yes
    Brick PID         : 12509
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.141:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick10/b2
    Volume Group      : RHS_vg2
    Brick Running     : Yes
    Brick PID         : 12315
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.142:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick11/b2
    Volume Group      : RHS_vg2
    Brick Running     : Yes
    Brick PID         : 12297
    Data Percentage   : 0.16
    LV Size           : 19.90g

5. Create a clone of the snapshot and observe that the clone creation is successful:

[root@dhcp35-228 bin]# gluster snapshot clone testclone testsnap
snapshot clone: success: Clone testclone created successfully
[root@dhcp35-228 bin]# gluster volume start testclone
volume start: testclone: success

6. Check the info and status of the cloned volume:

[root@dhcp35-228 bin]# gluster volume info testclone

Volume Name: testclone
Type: Distributed-Replicate
Volume ID: ded2b595-40f9-4fea-a958-23c3d5226d46
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.228:/run/gluster/snaps/testclone/brick1/b0
Brick2: 10.70.35.141:/run/gluster/snaps/testclone/brick2/b0
Brick3: 10.70.35.142:/run/gluster/snaps/testclone/brick3/b0
Brick4: 10.70.35.140:/run/gluster/snaps/testclone/brick4/b0
Brick5: 10.70.35.228:/run/gluster/snaps/testclone/brick5/b1
Brick6: 10.70.35.141:/run/gluster/snaps/testclone/brick6/b1
Brick7: 10.70.35.142:/run/gluster/snaps/testclone/brick7/b1
Brick8: 10.70.35.140:/run/gluster/snaps/testclone/brick8/b1
Brick9: 10.70.35.228:/run/gluster/snaps/testclone/brick9/b2
Brick10: 10.70.35.141:/run/gluster/snaps/testclone/brick10/b2
Brick11: 10.70.35.142:/run/gluster/snaps/testclone/brick11/b2
Brick12: 10.70.35.140:/run/gluster/snaps/testclone/brick12/b2
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 200
snap-max-soft-limit: 20
cluster.enable-shared-storage: enable

[root@dhcp35-228 bin]# gluster volume status testclone
Status of volume: testclone
Gluster process                                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick1/b0   49158     0          Y       12922
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick2/b0   49158     0          Y       12516
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick3/b0   49158     0          Y       12500
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick5/b1   49159     0          Y       12940
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick6/b1   49159     0          Y       12534
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick7/b1   49159     0          Y       12518
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick9/b2   49160     0          Y       12958
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick10/b2  49160     0          Y       12552
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick11/b2  49160     0          Y       12536
NFS Server on localhost                                     2049      0          Y       12977
Self-heal Daemon on localhost                               N/A       N/A        Y       13005
NFS Server on 10.70.35.141                                  2049      0          Y       12571
Self-heal Daemon on 10.70.35.141                            N/A       N/A        Y       12583
NFS Server on 10.70.35.142                                  2049      0          Y       12555
Self-heal Daemon on 10.70.35.142                            N/A       N/A        Y       12583

Task Status of Volume testclone
------------------------------------------------------------------------------
There are no active volume tasks

7. Bring the node back online and check the volume status of the cloned volume:

Status of volume: testclone
Gluster process                                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick1/b0   49158     0          Y       12922
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick2/b0   49158     0          Y       12516
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick3/b0   49158     0          Y       12500
Brick 10.70.35.140:/run/gluster/snaps/testclone/brick4/b0   N/A       N/A        N       N/A
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick5/b1   49159     0          Y       12940
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick6/b1   49159     0          Y       12534
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick7/b1   49159     0          Y       12518
Brick 10.70.35.140:/run/gluster/snaps/testclone/brick8/b1   N/A       N/A        N       N/A
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick9/b2   49160     0          Y       12958
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick10/b2  49160     0          Y       12552
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick11/b2  49160     0          Y       12536
Brick 10.70.35.140:/run/gluster/snaps/testclone/brick12/b2  N/A       N/A        N       N/A
NFS Server on localhost                                     2049      0          Y       12977
Self-heal Daemon on localhost                               N/A       N/A        Y       13005
NFS Server on 10.70.35.140                                  N/A       N/A        N       N/A
Self-heal Daemon on 10.70.35.140                            N/A       N/A        Y       1939
NFS Server on 10.70.35.141                                  2049      0          Y       12571
Self-heal Daemon on 10.70.35.141                            N/A       N/A        Y       12583
NFS Server on 10.70.35.142                                  2049      0          Y       12555
Self-heal Daemon on 10.70.35.142                            N/A       N/A        Y       12583

Actual results:
Clone creation succeeds while a node participating in the volume is down.

Expected results:
The clone should not be created when any node participating in the volume or snapshot is down.

Additional info:
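For comparison, with a quorum check in place the same command would be expected to fail. The transcript below is illustrative (the volume and snapshot names are reused from the steps above); the failure message itself is taken verbatim from the verification comment later in this bug:

[root@dhcp35-228 bin]# gluster snapshot clone testclone testsnap
snapshot clone: failed: quorum is not met
Snapshot command failed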
Patch sent for master (upstream). http://review.gluster.org/12490
Master URL      : http://review.gluster.org/#/c/12490/
Release 3.7 URL : http://review.gluster.org/#/c/12869/
RHGS 3.1.2 URL  : https://code.engineering.redhat.com/gerrit/63012
Verified this bug with the latest glusterfs-3.7.5-10 build, and it is working as expected. Steps followed:

1) Create a 4-node cluster, create a tiered volume using all the nodes, and start it (a sketch of the setup commands is given at the end of this comment).
2) Create a snapshot of this volume.
3) Create clones of this snapshot using the commands below:

[root@dhcp35-141 ~]# gluster snapshot clone clone1 snap1
snapshot clone: success: Clone clone1 created successfully
[root@dhcp35-141 ~]# gluster snapshot clone clone2 snap1
snapshot clone: success: Clone clone2 created successfully
[root@dhcp35-141 ~]# gluster snapshot clone clone3 snap1
snapshot clone: success: Clone clone3 created successfully

4) Shut down one of the nodes.
5) Check the volume status, which does not list the bricks of the node that is down:

Status of volume: tiervolume
Gluster process                          TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.35.142:/bricks/brick3/b3     49155     0          Y       15541
Brick 10.70.35.141:/bricks/brick3/b3     49155     0          Y       15564
Brick 10.70.35.228:/bricks/brick3/b3     49155     0          Y       15676
Cold Bricks:
Brick 10.70.35.228:/bricks/brick0/b0     49152     0          Y       15474
Brick 10.70.35.141:/bricks/brick0/b0     49152     0          Y       15400
Brick 10.70.35.142:/bricks/brick0/b0     49152     0          Y       15376
Brick 10.70.35.228:/bricks/brick1/b1     49153     0          Y       15493
Brick 10.70.35.141:/bricks/brick1/b1     49153     0          Y       15419
Brick 10.70.35.142:/bricks/brick1/b1     49153     0          Y       15395
Brick 10.70.35.228:/bricks/brick2/b2     49154     0          Y       15512
Brick 10.70.35.141:/bricks/brick2/b2     49154     0          Y       15438
Brick 10.70.35.142:/bricks/brick2/b2     49154     0          Y       15414
NFS Server on localhost                  2049      0          Y       15696
Self-heal Daemon on localhost            N/A       N/A        Y       15704
Quota Daemon on localhost                N/A       N/A        Y       15712
NFS Server on 10.70.35.142               2049      0          Y       15561
Self-heal Daemon on 10.70.35.142         N/A       N/A        Y       15569
Quota Daemon on 10.70.35.142             N/A       N/A        Y       15577
NFS Server on 10.70.35.141               2049      0          Y       15584
Self-heal Daemon on 10.70.35.141         N/A       N/A        Y       15592
Quota Daemon on 10.70.35.141             N/A       N/A        Y       15600

Task Status of Volume tiervolume

6) Try to create clones of the snapshot from different nodes and observe that the attempts fail with the message "quorum is not met":

[root@dhcp35-141 ~]# gluster snapshot clone clone4 snap1
snapshot clone: failed: quorum is not met
Snapshot command failed

[root@dhcp35-228 ~]# gluster snapshot clone clone5 snap1
snapshot clone: failed: quorum is not met
Snapshot command failed

Based on the above observations, marking this bug as Verified.
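For the record, a sketch of the setup in steps 1) and 2). The brick layout, replica counts, tier-attach invocation, and the address of the fourth node were not captured in this comment, so everything below is an assumption loosely modelled on the status output above; <node4> stands in for the fourth node's address:

# gluster volume create tiervolume replica 3 \
      10.70.35.228:/bricks/brick0/b0 10.70.35.141:/bricks/brick0/b0 10.70.35.142:/bricks/brick0/b0 \
      <node4>:/bricks/brick0/b0 10.70.35.228:/bricks/brick1/b1 10.70.35.141:/bricks/brick1/b1 \
      10.70.35.142:/bricks/brick1/b1 <node4>:/bricks/brick1/b1 10.70.35.228:/bricks/brick2/b2 \
      10.70.35.141:/bricks/brick2/b2 10.70.35.142:/bricks/brick2/b2 <node4>:/bricks/brick2/b2
# gluster volume start tiervolume
# gluster volume attach-tier tiervolume replica 2 \
      10.70.35.142:/bricks/brick3/b3 10.70.35.141:/bricks/brick3/b3 \
      10.70.35.228:/bricks/brick3/b3 <node4>:/bricks/brick3/b3
# gluster snapshot create snap1 tiervolume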
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html