Bug 1238070 - snapd/quota/nfs runs on the RHGS node, even after that node was detached from trusted storage pool
Summary: snapd/quota/nfs runs on the RHGS node, even after that node was detached from trusted storage pool
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: snapshot
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: RHGS 3.1.1
Assignee: Avra Sengupta
QA Contact: SATHEESARAN
URL:
Whiteboard: SNAPSHOT
Depends On:
Blocks: 1238706 1251815 1255386
 
Reported: 2015-07-01 06:56 UTC by SATHEESARAN
Modified: 2016-09-17 12:54 UTC
9 users

Fixed In Version: glusterfs-3.7.1-13
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1238706 (view as bug list)
Environment:
Last Closed: 2015-10-05 07:16:18 UTC
Embargoed:


Attachments
sosreport from node1 - dhcp37-46 (11.75 MB, application/x-xz)
2015-07-01 07:05 UTC, SATHEESARAN
sosreport from node2 - dhcp37-117 (10.64 MB, application/x-xz)
2015-07-01 07:09 UTC, SATHEESARAN
sosreport from node3 - dhcp37-167 (7.71 MB, application/x-xz)
2015-07-01 07:10 UTC, SATHEESARAN


Links
Red Hat Product Errata RHSA-2015:1845 (normal, SHIPPED_LIVE): Moderate: Red Hat Gluster Storage 3.1 update, last updated 2015-10-05 11:06:22 UTC

Description SATHEESARAN 2015-07-01 06:56:12 UTC
Description of problem:
-----------------------
When an RHGS node is removed from the trusted storage pool, the snapshot daemon still runs on that node.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHGS 3.1 Nightly build

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Create a 'Trusted Storage Pool' (gluster cluster) of 3 RHGS nodes (NODE1, NODE2, NODE3)
2. Create a volume of any type using bricks from NODE1 and NODE2, and start the volume
3. Create a gluster volume snapshot of that volume
4. Activate the snapshot
5. Enable USS (user-serviceable snapshots) on the volume
6. Detach NODE3 from the 'Trusted Storage Pool' (a rough command sequence for these steps is sketched below)
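
The exact commands are not captured in the report; a rough CLI sequence for these steps would be as follows (the volume name 'testvol', snapshot name 'snap1', and brick paths are illustrative, not taken from this bug):

    # On NODE1: build the three-node trusted storage pool
    gluster peer probe NODE2
    gluster peer probe NODE3
    # Create and start a volume backed only by bricks on NODE1 and NODE2
    gluster volume create testvol NODE1:/rhs/brick1/b1 NODE2:/rhs/brick1/b1
    gluster volume start testvol
    # Snapshot the volume, activate the snapshot, and enable USS (this starts snapd)
    gluster snapshot create snap1 testvol
    gluster snapshot activate <snapshot name as shown by 'gluster snapshot list'>
    gluster volume set testvol features.uss enable
    # Detach NODE3 from the pool
    gluster peer detach NODE3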

Actual results:
---------------
The snapshot daemon was still running on NODE3.

Expected results:
-----------------
The snapshot daemon should no longer be running on NODE3, as that node no longer hosts any volumes or bricks.

Comment 2 SATHEESARAN 2015-07-01 07:05:49 UTC
Created attachment 1044922 [details]
sosreport from node1 - dhcp37-46

Comment 4 SATHEESARAN 2015-07-01 07:09:09 UTC
Created attachment 1044923 [details]
sosreport from node2 - dhcp37-117

Comment 5 SATHEESARAN 2015-07-01 07:10:38 UTC
Created attachment 1044924 [details]
sosreport from node3 - dhcp37-167

Comment 7 Gaurav Kumar Garg 2015-07-02 13:00:23 UTC
patch available in upstream: http://review.gluster.org/#/c/11509/

Comment 10 Atin Mukherjee 2015-08-08 13:30:09 UTC
(In reply to Gaurav Kumar Garg from comment #7)
> patch available in upstream: http://review.gluster.org/#/c/11509/

This patch is meant for some other problem. I don't think it will solve this problem.

Comment 12 Atin Mukherjee 2015-08-20 11:52:37 UTC
(In reply to Atin Mukherjee from comment #10)
> (In reply to Gaurav Kumar Garg from comment #7)
> > patch available in upstream: http://review.gluster.org/#/c/11509/
> 
> This patch is meant for some other problem. I don't think it will solve this
> problem.

Ignore this comment.

Comment 13 Avra Sengupta 2015-08-20 12:35:27 UTC
Patch upstream at http://review.gluster.org/#/c/11509/

Comment 14 Gaurav Kumar Garg 2015-08-25 05:49:58 UTC
downstream patch url: https://code.engineering.redhat.com/gerrit/56148

Comment 15 Shashank Raj 2015-08-27 11:12:39 UTC
Verified this bug with the glusterfs-3.7.1-13 build, and it is working as expected.

Steps followed are as below:

1) Added the nodes to the cluster:

[root@dhcp35-148 brick1]# gluster peer status
Number of Peers: 3

Hostname: 10.70.35.28
Uuid: a7b191cd-c7f0-4325-90bb-0aee49bff301
State: Peer in Cluster (Connected)

Hostname: 10.70.35.214
Uuid: d2fe96b1-b2d5-4ee6-a166-f677bb2bc3aa
State: Peer in Cluster (Connected)

Hostname: 10.70.35.211
Uuid: 735e1a96-8770-4af6-afac-55acb9789d1c
State: Peer in Cluster (Connected)

2) Created the volume, took a snapshot of it, and enabled USS, quota, and NFS for the volume.
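
The commands for this step are not shown in the comment; a plausible equivalent sequence, using the brick paths from the volume status output below (the volume type is not stated in the comment, and the snapshot name 'snap1' is illustrative), would be:

    gluster volume create testvolume 10.70.35.148:/rhs/brick1/b1 10.70.35.28:/rhs/brick1/b1 10.70.35.148:/rhs/brick2/b2 10.70.35.28:/rhs/brick2/b2
    gluster volume start testvolume
    gluster snapshot create snap1 testvolume
    gluster volume set testvolume features.uss enable   # starts the snapshot daemon (snapd)
    gluster volume quota testvolume enable              # starts the quota daemon (quotad)
    gluster volume set testvolume nfs.disable off       # gluster NFS server (enabled by default in glusterfs 3.7)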

3) Verified that these services are running on the node (Hostname: 10.70.35.211), which is to be detached.

[root@dhcp35-211 brick1]# ps aux|grep snapd
root      7189  0.0  0.2 530816 21164 ?        Ssl  16:09   0:00 /usr/sbin/glusterfsd -s localhost --volfile-id snapd/testvolume -p /var/lib/glusterd/vols/testvolume/run/testvolume-snapd.pid -l /var/log/glusterfs/snaps/testvolume/snapd.log --brick-name snapd-testvolume -S /var/run/gluster/90e5690e1ba0cf3c897a1744552e980d.socket --brick-port 49168 --xlator-option testvolume-server.listen-port=49168 --no-mem-accounting
root      7326  0.0  0.0 112644   964 pts/0    S+   16:12   0:00 grep --color=auto snapd

[root@dhcp35-211 brick1]# ps aux|grep quota
root      7213  0.0  0.4 459468 32300 ?        Ssl  16:09   0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/quotad -p /var/lib/glusterd/quotad/run/quotad.pid -l /var/log/glusterfs/quotad.log -S /var/run/gluster/887f0ad839cfcbb6e3655f06020a40bf.socket --xlator-option *replicate*.data-self-heal=off --xlator-option *replicate*.metadata-self-heal=off --xlator-option *replicate*.entry-self-heal=off
root      7328  0.0  0.0 112640   964 pts/0    S+   16:12   0:00 grep --color=auto quota

[root@dhcp35-211 brick1]# ps aux|grep nfs
root      7199  0.0  0.8 578896 65156 ?        Ssl  16:09   0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/gluster/865cb336d8dffca50721eacbd0d46180.socket
root      7338  0.0  0.0 112644   960 pts/0    S+   16:12   0:00 grep --color=auto nfs


[root@dhcp35-211 brick1]# gluster volume status
Status of volume: testvolume
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.148:/rhs/brick1/b1           49165     0          Y       9244 
Brick 10.70.35.28:/rhs/brick1/b1            49165     0          Y       6421 
Brick 10.70.35.148:/rhs/brick2/b2           49166     0          Y       9262 
Brick 10.70.35.28:/rhs/brick2/b2            49166     0          Y       6448 
Snapshot Daemon on localhost                49168     0          Y       7189 
NFS Server on localhost                     2049      0          Y       7199 
Self-heal Daemon on localhost               N/A       N/A        Y       7204 
Quota Daemon on localhost                   N/A       N/A        Y       7213 
Snapshot Daemon on dhcp35-148.lab.eng.blr.r
edhat.com                                   49169     0          Y       9650 
NFS Server on dhcp35-148.lab.eng.blr.redhat
.com                                        2049      0          Y       10152
Self-heal Daemon on dhcp35-148.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       10160
Quota Daemon on dhcp35-148.lab.eng.blr.redh
at.com                                      N/A       N/A        Y       10601
Snapshot Daemon on 10.70.35.214             49162     0          Y       6867 
NFS Server on 10.70.35.214                  2049      0          Y       7217 
Self-heal Daemon on 10.70.35.214            N/A       N/A        Y       7225 
Quota Daemon on 10.70.35.214                N/A       N/A        Y       7569 
Snapshot Daemon on 10.70.35.28              49169     0          Y       6683 
NFS Server on 10.70.35.28                   2049      0          Y       7063 
Self-heal Daemon on 10.70.35.28             N/A       N/A        Y       7071 
Quota Daemon on 10.70.35.28                 N/A       N/A        Y       7409 
 
Task Status of Volume testvolume
------------------------------------------------------------------------------
There are no active volume tasks
 
4) Detached the node (Hostname: 10.70.35.211) from the cluster:

[root@dhcp35-148 brick1]# gluster peer detach 10.70.35.211
peer detach: success

[root@dhcp35-148 brick1]# gluster peer status
Number of Peers: 2

Hostname: 10.70.35.28
Uuid: a7b191cd-c7f0-4325-90bb-0aee49bff301
State: Peer in Cluster (Connected)

Hostname: 10.70.35.214
Uuid: d2fe96b1-b2d5-4ee6-a166-f677bb2bc3aa
State: Peer in Cluster (Connected)

5) Verified that, after detaching the node (Hostname: 10.70.35.211) from the cluster, all of these services are stopped on that node and no volumes are present.

[root@dhcp35-211 brick1]# gluster volume status
No volumes present

[root@dhcp35-211 brick1]# ps aux|grep snapd
root      7467  0.0  0.0 112640   964 pts/0    S+   16:18   0:00 grep --color=auto snapd
[root@dhcp35-211 brick1]# ps aux|grep quota
root      7469  0.0  0.0 112640   964 pts/0    S+   16:18   0:00 grep --color=auto quota
[root@dhcp35-211 brick1]# ps aux|grep nfs
root      7471  0.0  0.0 112640   964 pts/0    S+   16:18   0:00 grep --color=auto nfs

Based on the above observation, marking this bug as verified.

Comment 17 errata-xmlrpc 2015-10-05 07:16:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1845.html

