Description of problem:
Clone creation should not be successful when a node participating in the volume goes down.

Version-Release number of selected component (if applicable):
glusterfs-3.7.5-0.3

How reproducible:
Always

Steps to Reproduce:
1. Create a 4x3 dist-rep volume and start it.
2. Create a snapshot of the volume and activate it.
3. Bring down any one of the nodes in the pool (in my case: 10.70.35.140).
4. Check the volume info, status and snapshot info, status (outputs below, after the command sketch).
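The exact commands behind steps 1-3 are not recorded in this report; the following is a minimal sketch, assuming the brick paths shown in the volume info below and that every brick sits on a thin-provisioned LV (a prerequisite for gluster snapshots). How the node was brought down is likewise an assumption; powering it off or stopping its gluster services would both match the status output in step 4.

# gluster volume create testvolume replica 3 \
      10.70.35.228:/bricks/brick0/b0 10.70.35.141:/bricks/brick0/b0 10.70.35.142:/bricks/brick0/b0 \
      10.70.35.140:/bricks/brick0/b0 10.70.35.228:/bricks/brick1/b1 10.70.35.141:/bricks/brick1/b1 \
      10.70.35.142:/bricks/brick1/b1 10.70.35.140:/bricks/brick1/b1 10.70.35.228:/bricks/brick2/b2 \
      10.70.35.141:/bricks/brick2/b2 10.70.35.142:/bricks/brick2/b2 10.70.35.140:/bricks/brick2/b2
# gluster volume start testvolume
# gluster snapshot create testsnap testvolume
# gluster snapshot activate testsnap
# poweroff        (run on 10.70.35.140; "gluster peer status" on any other node
                   should then show that peer as Disconnected)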
[root@dhcp35-228 bin]# gluster volume info

Volume Name: testvolume
Type: Distributed-Replicate
Volume ID: bd85248f-2459-4ddb-b5a5-365863985f1a
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.228:/bricks/brick0/b0
Brick2: 10.70.35.141:/bricks/brick0/b0
Brick3: 10.70.35.142:/bricks/brick0/b0
Brick4: 10.70.35.140:/bricks/brick0/b0
Brick5: 10.70.35.228:/bricks/brick1/b1
Brick6: 10.70.35.141:/bricks/brick1/b1
Brick7: 10.70.35.142:/bricks/brick1/b1
Brick8: 10.70.35.140:/bricks/brick1/b1
Brick9: 10.70.35.228:/bricks/brick2/b2
Brick10: 10.70.35.141:/bricks/brick2/b2
Brick11: 10.70.35.142:/bricks/brick2/b2
Brick12: 10.70.35.140:/bricks/brick2/b2
Options Reconfigured:
features.barrier: disable
performance.readdir-ahead: on
snap-max-hard-limit: 200
snap-max-soft-limit: 20
cluster.enable-shared-storage: enable

[root@dhcp35-228 bin]# gluster volume status
Status of volume: testvolume
Gluster process                          TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.228:/bricks/brick0/b0     49152     0          Y       11994
Brick 10.70.35.141:/bricks/brick0/b0     49152     0          Y       11923
Brick 10.70.35.142:/bricks/brick0/b0     49152     0          Y       11916
Brick 10.70.35.228:/bricks/brick1/b1     49153     0          Y       12012
Brick 10.70.35.141:/bricks/brick1/b1     49153     0          Y       11941
Brick 10.70.35.142:/bricks/brick1/b1     49153     0          Y       11934
Brick 10.70.35.228:/bricks/brick2/b2     49154     0          Y       12030
Brick 10.70.35.141:/bricks/brick2/b2     49154     0          Y       11959
Brick 10.70.35.142:/bricks/brick2/b2     49154     0          Y       11953
NFS Server on localhost                  N/A       N/A        N       N/A
Self-heal Daemon on localhost            N/A       N/A        Y       12060
NFS Server on 10.70.35.141               N/A       N/A        N       N/A
Self-heal Daemon on 10.70.35.141         N/A       N/A        Y       11982
NFS Server on 10.70.35.142               N/A       N/A        N       N/A
Self-heal Daemon on 10.70.35.142         N/A       N/A        Y       11984

Task Status of Volume testvolume
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-228 bin]# gluster snapshot info
Snapshot                  : testsnap
Snap UUID                 : f2fb5482-22ca-40e8-813f-03b7735fb3a1
Created                   : 2015-10-27 10:52:05
Snap Volumes:

    Snap Volume Name               : 69a93590f0ce4df2875faa92f5f3ac2e
    Origin Volume name             : testvolume
    Snaps taken for testvolume     : 1
    Snaps available for testvolume : 199
    Status                         : Started

[root@dhcp35-228 bin]# gluster snapshot status

Snap Name : testsnap
Snap UUID : f2fb5482-22ca-40e8-813f-03b7735fb3a1

    Brick Path        : 10.70.35.228:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick1/b0
    Volume Group      : RHS_vg0
    Brick Running     : Yes
    Brick PID         : 12473
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.141:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick2/b0
    Volume Group      : RHS_vg0
    Brick Running     : Yes
    Brick PID         : 12279
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.142:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick3/b0
    Volume Group      : RHS_vg0
    Brick Running     : Yes
    Brick PID         : 12261
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.228:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick5/b1
    Volume Group      : RHS_vg1
    Brick Running     : Yes
    Brick PID         : 12491
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.141:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick6/b1
    Volume Group      : RHS_vg1
    Brick Running     : Yes
    Brick PID         : 12297
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.142:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick7/b1
    Volume Group      : RHS_vg1
    Brick Running     : Yes
    Brick PID         : 12279
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.228:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick9/b2
    Volume Group      : RHS_vg2
    Brick Running     : Yes
    Brick PID         : 12509
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.141:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick10/b2
    Volume Group      : RHS_vg2
    Brick Running     : Yes
    Brick PID         : 12315
    Data Percentage   : 0.16
    LV Size           : 19.90g

    Brick Path        : 10.70.35.142:/run/gluster/snaps/69a93590f0ce4df2875faa92f5f3ac2e/brick11/b2
    Volume Group      : RHS_vg2
    Brick Running     : Yes
    Brick PID         : 12297
    Data Percentage   : 0.16
    LV Size           : 19.90g

5. Create a clone of the snapshot and observe that the clone creation is successful:

[root@dhcp35-228 bin]# gluster snapshot clone testclone testsnap
snapshot clone: success: Clone testclone created successfully
[root@dhcp35-228 bin]# gluster volume start testclone
volume start: testclone: success

6. Check the info and status of the cloned volume:

[root@dhcp35-228 bin]# gluster volume info testclone

Volume Name: testclone
Type: Distributed-Replicate
Volume ID: ded2b595-40f9-4fea-a958-23c3d5226d46
Status: Started
Number of Bricks: 4 x 3 = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.228:/run/gluster/snaps/testclone/brick1/b0
Brick2: 10.70.35.141:/run/gluster/snaps/testclone/brick2/b0
Brick3: 10.70.35.142:/run/gluster/snaps/testclone/brick3/b0
Brick4: 10.70.35.140:/run/gluster/snaps/testclone/brick4/b0
Brick5: 10.70.35.228:/run/gluster/snaps/testclone/brick5/b1
Brick6: 10.70.35.141:/run/gluster/snaps/testclone/brick6/b1
Brick7: 10.70.35.142:/run/gluster/snaps/testclone/brick7/b1
Brick8: 10.70.35.140:/run/gluster/snaps/testclone/brick8/b1
Brick9: 10.70.35.228:/run/gluster/snaps/testclone/brick9/b2
Brick10: 10.70.35.141:/run/gluster/snaps/testclone/brick10/b2
Brick11: 10.70.35.142:/run/gluster/snaps/testclone/brick11/b2
Brick12: 10.70.35.140:/run/gluster/snaps/testclone/brick12/b2
Options Reconfigured:
performance.readdir-ahead: on
snap-max-hard-limit: 200
snap-max-soft-limit: 20
cluster.enable-shared-storage: enable

[root@dhcp35-228 bin]# gluster volume status testclone
Status of volume: testclone
Gluster process                                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick1/b0   49158     0          Y       12922
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick2/b0   49158     0          Y       12516
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick3/b0   49158     0          Y       12500
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick5/b1   49159     0          Y       12940
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick6/b1   49159     0          Y       12534
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick7/b1   49159     0          Y       12518
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick9/b2   49160     0          Y       12958
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick10/b2  49160     0          Y       12552
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick11/b2  49160     0          Y       12536
NFS Server on localhost                                     2049      0          Y       12977
Self-heal Daemon on localhost                               N/A       N/A        Y       13005
NFS Server on 10.70.35.141                                  2049      0          Y       12571
Self-heal Daemon on 10.70.35.141                            N/A       N/A        Y       12583
NFS Server on 10.70.35.142                                  2049      0          Y       12555
Self-heal Daemon on 10.70.35.142                            N/A       N/A        Y       12583

Task Status of Volume testclone
------------------------------------------------------------------------------
There are no active volume tasks

7. Bring the node back online and check the volume status of the cloned volume:

Status of volume: testclone
Gluster process                                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick1/b0   49158     0          Y       12922
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick2/b0   49158     0          Y       12516
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick3/b0   49158     0          Y       12500
Brick 10.70.35.140:/run/gluster/snaps/testclone/brick4/b0   N/A       N/A        N       N/A
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick5/b1   49159     0          Y       12940
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick6/b1   49159     0          Y       12534
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick7/b1   49159     0          Y       12518
Brick 10.70.35.140:/run/gluster/snaps/testclone/brick8/b1   N/A       N/A        N       N/A
Brick 10.70.35.228:/run/gluster/snaps/testclone/brick9/b2   49160     0          Y       12958
Brick 10.70.35.141:/run/gluster/snaps/testclone/brick10/b2  49160     0          Y       12552
Brick 10.70.35.142:/run/gluster/snaps/testclone/brick11/b2  49160     0          Y       12536
Brick 10.70.35.140:/run/gluster/snaps/testclone/brick12/b2  N/A       N/A        N       N/A
NFS Server on localhost                                     2049      0          Y       12977
Self-heal Daemon on localhost                               N/A       N/A        Y       13005
NFS Server on 10.70.35.140                                  N/A       N/A        N       N/A
Self-heal Daemon on 10.70.35.140                            N/A       N/A        Y       1939
NFS Server on 10.70.35.141                                  2049      0          Y       12571
Self-heal Daemon on 10.70.35.141                            N/A       N/A        Y       12583
NFS Server on 10.70.35.142                                  2049      0          Y       12555
Self-heal Daemon on 10.70.35.142                            N/A       N/A        Y       12583

Actual results:
Clone creation succeeds while a node participating in the volume is down.

Expected results:
The clone should not be created when any node participating in the volume or snapshot is down.

Additional info:
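For comparison, with a quorum check in place the same command would be expected to fail. The transcript below is illustrative (the volume and snapshot names are reused from the steps above); the failure message itself is taken verbatim from the verification comment later in this bug:

[root@dhcp35-228 bin]# gluster snapshot clone testclone testsnap
snapshot clone: failed: quorum is not met
Snapshot command failed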
Patch sent for master (upstream). http://review.gluster.org/12490
Master URL      : http://review.gluster.org/#/c/12490/
Release 3.7 URL : http://review.gluster.org/#/c/12869/
RHGS 3.1.2 URL  : https://code.engineering.redhat.com/gerrit/63012
Verified this bug with the latest glusterfs-3.7.5-10 build, and it is working as expected. Steps followed:

1) Create a 4-node cluster, create a tiered volume using all the nodes, and start it (a sketch of the setup commands is given at the end of this comment).
2) Create a snapshot of this volume.
3) Create clones of this snapshot using the commands below:

[root@dhcp35-141 ~]# gluster snapshot clone clone1 snap1
snapshot clone: success: Clone clone1 created successfully
[root@dhcp35-141 ~]# gluster snapshot clone clone2 snap1
snapshot clone: success: Clone clone2 created successfully
[root@dhcp35-141 ~]# gluster snapshot clone clone3 snap1
snapshot clone: success: Clone clone3 created successfully

4) Shut down one of the nodes.
5) Check the volume status, which does not list the bricks of the node that is down:

Status of volume: tiervolume
Gluster process                          TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Hot Bricks:
Brick 10.70.35.142:/bricks/brick3/b3     49155     0          Y       15541
Brick 10.70.35.141:/bricks/brick3/b3     49155     0          Y       15564
Brick 10.70.35.228:/bricks/brick3/b3     49155     0          Y       15676
Cold Bricks:
Brick 10.70.35.228:/bricks/brick0/b0     49152     0          Y       15474
Brick 10.70.35.141:/bricks/brick0/b0     49152     0          Y       15400
Brick 10.70.35.142:/bricks/brick0/b0     49152     0          Y       15376
Brick 10.70.35.228:/bricks/brick1/b1     49153     0          Y       15493
Brick 10.70.35.141:/bricks/brick1/b1     49153     0          Y       15419
Brick 10.70.35.142:/bricks/brick1/b1     49153     0          Y       15395
Brick 10.70.35.228:/bricks/brick2/b2     49154     0          Y       15512
Brick 10.70.35.141:/bricks/brick2/b2     49154     0          Y       15438
Brick 10.70.35.142:/bricks/brick2/b2     49154     0          Y       15414
NFS Server on localhost                  2049      0          Y       15696
Self-heal Daemon on localhost            N/A       N/A        Y       15704
Quota Daemon on localhost                N/A       N/A        Y       15712
NFS Server on 10.70.35.142               2049      0          Y       15561
Self-heal Daemon on 10.70.35.142         N/A       N/A        Y       15569
Quota Daemon on 10.70.35.142             N/A       N/A        Y       15577
NFS Server on 10.70.35.141               2049      0          Y       15584
Self-heal Daemon on 10.70.35.141         N/A       N/A        Y       15592
Quota Daemon on 10.70.35.141             N/A       N/A        Y       15600

Task Status of Volume tiervolume

6) Try to create clones of the snapshot from different nodes and observe that the attempts fail with the message "quorum is not met":

[root@dhcp35-141 ~]# gluster snapshot clone clone4 snap1
snapshot clone: failed: quorum is not met
Snapshot command failed

[root@dhcp35-228 ~]# gluster snapshot clone clone5 snap1
snapshot clone: failed: quorum is not met
Snapshot command failed

Based on the above observations, marking this bug as Verified.
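For the record, a sketch of the setup in steps 1) and 2). The brick layout, replica counts, tier-attach invocation, and the address of the fourth node were not captured in this comment, so everything below is an assumption loosely modelled on the status output above; <node4> stands in for the fourth node's address:

# gluster volume create tiervolume replica 3 \
      10.70.35.228:/bricks/brick0/b0 10.70.35.141:/bricks/brick0/b0 10.70.35.142:/bricks/brick0/b0 \
      <node4>:/bricks/brick0/b0 10.70.35.228:/bricks/brick1/b1 10.70.35.141:/bricks/brick1/b1 \
      10.70.35.142:/bricks/brick1/b1 <node4>:/bricks/brick1/b1 10.70.35.228:/bricks/brick2/b2 \
      10.70.35.141:/bricks/brick2/b2 10.70.35.142:/bricks/brick2/b2 <node4>:/bricks/brick2/b2
# gluster volume start tiervolume
# gluster volume attach-tier tiervolume replica 2 \
      10.70.35.142:/bricks/brick3/b3 10.70.35.141:/bricks/brick3/b3 \
      10.70.35.228:/bricks/brick3/b3 <node4>:/bricks/brick3/b3
# gluster snapshot create snap1 tiervolume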
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0193.html