Bug 1158883
| Summary: | [USS]: snapd process is not killed once the glusterd comes back | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | Rahul Hinduja <rhinduja> |
| Component: | snapshot | Assignee: | Sachin Pandit <spandit> |
| Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | unspecified | CC: | nsathyan, rhs-bugs, rjoseph, ssamanta, storage-qa-internal, surs, vagarwal |
| Target Milestone: | --- | Keywords: | ZStream |
| Target Release: | RHGS 3.0.3 | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | USS | | |
| Fixed In Version: | glusterfs-3.6.0.33-1 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 1161015 1240338 (view as bug list) | Environment: | |
| Last Closed: | 2015-01-15 13:41:29 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1161015, 1162694, 1175735, 1240338, 1240952, 1240955 | | |
Description (Rahul Hinduja, 2014-10-30 12:47:10 UTC)
Additional info:
================
Let's say you now enable USS on the same volume; the ports are then shown as N/A for all the servers that were brought back online:

```
[root@inception ~]# gluster v status vol3 | grep -i "snapshot daemon"
Snapshot Daemon on localhost                               49159   Y   2716
Snapshot Daemon on rhs-arch-srv4.lab.eng.blr.redhat.com    N/A     Y   3265
Snapshot Daemon on rhs-arch-srv2.lab.eng.blr.redhat.com    N/A     Y   3868
Snapshot Daemon on rhs-arch-srv3.lab.eng.blr.redhat.com    N/A     Y   3731
[root@inception ~]#
```
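For anyone reproducing this, below is a minimal shell sketch of the check implied by the output above. It is an illustration only, not part of the original report: the volume name is a placeholder, it must run as root on one of the peers, and it assumes the status-line layout shown here (the port is the third field from the end). Per the report, the N/A entries are expected on the peers whose glusterd was brought back online.

```bash
#!/bin/bash
# Reproduction sketch (assumptions: a started volume, root on a peer, and the
# status-line layout shown above, i.e. the port is the third field from the end).
VOL=${1:-testvol}   # placeholder volume name

gluster volume set "$VOL" features.uss on
sleep 5             # give glusterd a moment to spawn snapd on all peers

# Any "Snapshot Daemon" row whose port column reads N/A shows the symptom:
# snapd is online (Y) but its port is not registered with glusterd.
if gluster volume status "$VOL" | grep -i "snapshot daemon" \
     | awk '{print $(NF-2)}' | grep -qx "N/A"; then
    echo "BUG: at least one snapd advertises its port as N/A"
else
    echo "OK: every snapd reports a real port"
fi
```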
This issue is resolved; the patch that fixes it has been reviewed upstream. We are waiting for the regression run to pass so that it can be merged upstream, after which I'll send the corresponding patch downstream.

https://code.engineering.redhat.com/gerrit/#/c/36772/ fixes the issue.

Verified the bug with the following gluster version and did not find the issue. Marking the bug as VERIFIED.

```
[root@dhcp42-244 yum.repos.d]# rpm -qa | grep glusterfs
samba-glusterfs-3.6.509-169.1.el6rhs.x86_64
glusterfs-3.6.0.33-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.33-1.el6rhs.x86_64
glusterfs-cli-3.6.0.33-1.el6rhs.x86_64
glusterfs-libs-3.6.0.33-1.el6rhs.x86_64
glusterfs-api-3.6.0.33-1.el6rhs.x86_64
glusterfs-server-3.6.0.33-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.33-1.el6rhs.x86_64
glusterfs-fuse-3.6.0.33-1.el6rhs.x86_64

[root@dhcp42-244 yum.repos.d]# service glusterd start
Starting glusterd:                                         [ OK ]

[root@dhcp42-244 yum.repos.d]# gluster peer status
Number of Peers: 3

Hostname: 10.70.43.6
Uuid: 2c0d5fe8-a014-4978-ace7-c663e4cc8d91
State: Peer in Cluster (Connected)

Hostname: 10.70.42.204
Uuid: 2a2a1b36-37e3-4336-b82a-b09dcc2f745e
State: Peer in Cluster (Connected)

Hostname: 10.70.42.10
Uuid: 77c49bfc-6cb4-44f3-be12-41447a3a452e
State: Peer in Cluster (Connected)
[root@dhcp42-244 yum.repos.d]#

[root@dhcp42-244 yum.repos.d]# gluster volume info

Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 60c63773-39e8-4145-9985-5bcedf59cd1b
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.42.244:/rhs/brick1/testvol
Brick2: 10.70.43.6:/rhs/brick2/testvol
Brick3: 10.70.42.204:/rhs/brick3/testvol
Brick4: 10.70.42.10:/rhs/brick4/testvol
Options Reconfigured:
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

Volume Name: testvol1
Type: Distributed-Replicate
Volume ID: bcd90c32-e79d-4197-a5b2-b0ea1d52002d
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: 10.70.42.244:/rhs/brick2/testvol
Brick2: 10.70.43.6:/rhs/brick3/testvol
Brick3: 10.70.42.204:/rhs/brick4/testvol
Brick4: 10.70.42.10:/rhs/brick1/testvol
Options Reconfigured:
performance.readdir-ahead: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256

[root@dhcp42-244 yum.repos.d]# gluster volume status
Status of volume: testvol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.244:/rhs/brick1/testvol          49152   Y       28796
Brick 10.70.43.6:/rhs/brick2/testvol            49152   Y       28582
Brick 10.70.42.204:/rhs/brick3/testvol          49152   Y       28859
Brick 10.70.42.10:/rhs/brick4/testvol           49152   Y       25645
NFS Server on localhost                         2049    Y       28810
Self-heal Daemon on localhost                   N/A     Y       28815
NFS Server on 10.70.43.6                        2049    Y       28596
Self-heal Daemon on 10.70.43.6                  N/A     Y       28601
NFS Server on 10.70.42.10                       2049    Y       25660
Self-heal Daemon on 10.70.42.10                 N/A     Y       25665
NFS Server on 10.70.42.204                      2049    Y       28873
Self-heal Daemon on 10.70.42.204                N/A     Y       28878

Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: testvol1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.244:/rhs/brick2/testvol          49153   Y       28801
Brick 10.70.43.6:/rhs/brick3/testvol            49153   Y       28589
Brick 10.70.42.204:/rhs/brick4/testvol          49153   Y       28866
Brick 10.70.42.10:/rhs/brick1/testvol           49153   Y       25653
NFS Server on localhost                         2049    Y       28810
Self-heal Daemon on localhost                   N/A     Y       28815
NFS Server on 10.70.42.10                       2049    Y       25660
Self-heal Daemon on 10.70.42.10                 N/A     Y       25665
NFS Server on 10.70.43.6                        2049    Y       28596
Self-heal Daemon on 10.70.43.6                  N/A     Y       28601
NFS Server on 10.70.42.204                      2049    Y       28873
Self-heal Daemon on 10.70.42.204                N/A     Y       28878

Task Status of Volume testvol1
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-244 yum.repos.d]#

[root@dhcp42-244 yum.repos.d]# ps -aef | grep glusterfs*
root  28796  1  0 00:29 ?  00:00:00 /usr/sbin/glusterfsd -s 10.70.42.244 --volfile-id testvol.10.70.42.244.rhs-brick1-testvol -p /var/lib/glusterd/vols/testvol/run/10.70.42.244-rhs-brick1-testvol.pid -S /var/run/5d2ea4e94d53cee919733c03d99598b3.socket --brick-name /rhs/brick1/testvol -l /var/log/glusterfs/bricks/rhs-brick1-testvol.log --xlator-option *-posix.glusterd-uuid=1ed937c4-aaba-4c64-abd8-556f37a63030 --brick-port 49152 --xlator-option testvol-server.listen-port=49152
root  28801  1  0 00:29 ?  00:00:00 /usr/sbin/glusterfsd -s 10.70.42.244 --volfile-id testvol1.10.70.42.244.rhs-brick2-testvol -p /var/lib/glusterd/vols/testvol1/run/10.70.42.244-rhs-brick2-testvol.pid -S /var/run/65406ee4edd7eb0b46d39e0a7738cf24.socket --brick-name /rhs/brick2/testvol -l /var/log/glusterfs/bricks/rhs-brick2-testvol.log --xlator-option *-posix.glusterd-uuid=1ed937c4-aaba-4c64-abd8-556f37a63030 --brick-port 49153 --xlator-option testvol1-server.listen-port=49153
root  28810  1  0 00:29 ?  00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/c7d7a0963dade75bd42ba7eef07e657f.socket
root  28815  1  0 00:29 ?  00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/113cd33d135531e963db306d2e62da0f.socket --xlator-option *replicate*.node-uuid=1ed937c4-aaba-4c64-abd8-556f37a63030
root  28902  28035  0 00:32 pts/0  00:00:00 grep glusterfs*

[root@dhcp42-244 yum.repos.d]# gluster snapshot list
No snapshots present
[root@dhcp42-244 yum.repos.d]#

[root@dhcp42-244 ~]# gluster snapshot list
snap1
snap2
[root@dhcp42-244 ~]#

[root@dhcp42-244 ~]# ps -aef | grep snapd
root  29633  28035  0 00:40 pts/0  00:00:00 grep snapd
[root@dhcp42-244 ~]#

[root@dhcp42-244 ~]# ps -ef | grep snapd
root  29660  1  0 00:42 ?  00:00:00 /usr/sbin/glusterfsd -s localhost --volfile-id snapd/testvol -p /var/lib/glusterd/vols/testvol/run/testvol-snapd.pid -l /var/log/glusterfs/snaps/testvol/snapd.log --brick-name snapd-testvol -S /var/run/2b39eca5a85774c651a3ae045f4834e9.socket --brick-port 49154 --xlator-option testvol-server.listen-port=49154
root  29737  28035  0 00:43 pts/0  00:00:00 grep snapd
[root@dhcp42-244 ~]#
```
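The manual `ps | grep snapd` checks above can be wrapped in a small helper that also consults the snapd pidfile visible in the process listing (`/var/lib/glusterd/vols/<vol>/run/<vol>-snapd.pid`). This is only a sketch, under the assumption that the pidfile layout matches this build; the volume name is a placeholder.

```bash
#!/bin/bash
# Local snapd liveness check. Assumptions: the pidfile path matches the one
# visible in the process listing above, and the script runs as root.
VOL=${1:-testvol}   # placeholder volume name
PIDFILE="/var/lib/glusterd/vols/$VOL/run/$VOL-snapd.pid"

if [[ -r "$PIDFILE" ]] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    echo "snapd for $VOL is running (pid $(cat "$PIDFILE"))"
elif pgrep -f "volfile-id snapd/$VOL" >/dev/null; then
    # Same process-table check that is done manually above.
    echo "snapd for $VOL is running, but its pidfile is stale or missing"
else
    echo "snapd for $VOL is not running"
fi
```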
```
[root@dhcp42-244 ~]# gluster volume info testvol | grep uss
features.uss: on
[root@dhcp42-244 ~]#

[root@dhcp42-244 ~]# gluster v status testvol | grep -i "snapshot daemon"
Snapshot Daemon on localhost        49154   Y   29660
Snapshot Daemon on 10.70.42.204     49154   Y   29556
Snapshot Daemon on 10.70.42.10      49154   Y   26344
Snapshot Daemon on 10.70.43.6       49154   Y   29288
[root@dhcp42-244 ~]#

[root@dhcp42-244 ~]# gluster volume set testvol features.uss off
volume set: success
[root@dhcp42-244 ~]# gluster volume info testvol | grep uss
features.uss: off
[root@dhcp42-244 ~]# gluster v status testvol | grep -i "snapshot daemon"
[root@dhcp42-244 ~]#

[root@dhcp42-244 ~]# gluster v status testvol | grep -i "snapshot daemon"
Snapshot Daemon on localhost        49155   Y   29925
Snapshot Daemon on 10.70.43.6       49155   Y   29497
Snapshot Daemon on 10.70.42.10      49155   Y   26539
Snapshot Daemon on 10.70.42.204     49155   Y   29746

[root@dhcp43-6 yum.repos.d]# service glusterd stop
[root@dhcp43-6 yum.repos.d]#                               [ OK ]
```

Verify that snapd is not running on the node where glusterd is down:
=====================================================

```
[root@dhcp42-244 ~]# gluster v status testvol | grep -i "snapshot daemon"
Snapshot Daemon on localhost        49155   Y   29925
Snapshot Daemon on 10.70.42.204     49155   Y   29746
Snapshot Daemon on 10.70.42.10      49155   Y   26539
[root@dhcp42-244 ~]#

[root@dhcp42-244 ~]# gluster volume set testvol features.uss off
volume set: success
[root@dhcp42-244 ~]# gluster v status testvol | grep -i "snapshot daemon"
[root@dhcp42-244 ~]# gluster volume set testvol features.uss on
volume set: success
[root@dhcp42-244 ~]# gluster v status testvol | grep -i "snapshot daemon"
Snapshot Daemon on localhost        49156   Y   30116
Snapshot Daemon on 10.70.42.204     49156   Y   29886
Snapshot Daemon on 10.70.42.10      49156   Y   26679
[root@dhcp42-244 ~]#
```

Restart glusterd on the node where it was stopped and verify that snapd is running on that node:
=========================================================================

```
[root@dhcp43-6 yum.repos.d]# service glusterd start
Starting glusterd:                                         [ OK ]
[root@dhcp43-6 yum.repos.d]#

[root@dhcp42-244 ~]# gluster v status testvol | grep -i "snapshot daemon"
Snapshot Daemon on localhost        49156   Y   30116
Snapshot Daemon on 10.70.42.10      49156   Y   26679
Snapshot Daemon on 10.70.42.204     49156   Y   29886
Snapshot Daemon on 10.70.43.6       49156   Y   29798
[root@dhcp42-244 ~]#

[root@dhcp42-244 ~]# gluster volume set testvol features.uss off
volume set: success
[root@dhcp42-244 ~]# gluster v status testvol | grep -i "snapshot daemon"
[root@dhcp42-244 ~]#
```
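The stop glusterd / disable USS / start glusterd sequence above is the core regression check for this bug (a stale snapd must not survive once glusterd comes back). A sketch that automates it is shown below; it assumes passwordless root SSH to the peer, the `service glusterd` init scripts used in this environment, a volume with USS currently enabled, and a placeholder peer address.

```bash
#!/bin/bash
# Automated form of the manual check above. All names below are placeholders.
VOL=testvol          # placeholder volume name
PEER=10.70.43.6      # placeholder peer whose glusterd is restarted

ssh "$PEER" service glusterd stop           # take glusterd down on one peer
gluster volume set "$VOL" features.uss off  # snapd should now be stopped cluster-wide
ssh "$PEER" service glusterd start          # bring glusterd back on that peer
sleep 5                                     # let glusterd reconcile volume state

# With USS off, no snapd process should survive on the restarted peer.
if ssh "$PEER" ps -ef | grep -q "volfile-id snapd/$VOL"; then
    echo "BUG: stale snapd still running on $PEER after glusterd came back"
else
    echo "OK: no snapd left behind on $PEER"
fi
```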
```
[root@dhcp42-244 ~]# gluster snapshot status

Snap Name : snap1
Snap UUID : 78cc0645-31f7-4b9c-8d4b-c0565247f84e

    Brick Path        : 10.70.42.244:/var/run/gluster/snaps/623c4bb66e584122830e27bb9e512519/brick1/testvol
    Volume Group      : RHS_vg1
    Brick Running     : No
    Brick PID         : N/A
    Data Percentage   : 0.20
    LV Size           : 13.47g

    Brick Path        : 10.70.43.6:/var/run/gluster/snaps/623c4bb66e584122830e27bb9e512519/brick2/testvol
    Volume Group      : RHS_vg2
    Brick Running     : No
    Brick PID         : N/A
    Data Percentage   : 0.20
    LV Size           : 13.47g

    Brick Path        : 10.70.42.204:/var/run/gluster/snaps/623c4bb66e584122830e27bb9e512519/brick3/testvol
    Volume Group      : RHS_vg3
    Brick Running     : No
    Brick PID         : N/A
    Data Percentage   : 0.20
    LV Size           : 13.47g

    Brick Path        : 10.70.42.10:/var/run/gluster/snaps/623c4bb66e584122830e27bb9e512519/brick4/testvol
    Volume Group      : RHS_vg4
    Brick Running     : No
    Brick PID         : N/A
    Data Percentage   : 0.20
    LV Size           : 13.47g

Snap Name : snap2
Snap UUID : 3febe842-d07c-4b54-8e5c-d17c60c8e845

    Brick Path        : 10.70.42.244:/var/run/gluster/snaps/95b63de2c6af4c0d995b0012ffc5b60e/brick1/testvol
    Volume Group      : RHS_vg1
    Brick Running     : No
    Brick PID         : N/A
    Data Percentage   : 0.20
    LV Size           : 13.47g

    Brick Path        : 10.70.43.6:/var/run/gluster/snaps/95b63de2c6af4c0d995b0012ffc5b60e/brick2/testvol
    Volume Group      : RHS_vg2
    Brick Running     : No
    Brick PID         : N/A
    Data Percentage   : 0.20
    LV Size           : 13.47g

    Brick Path        : 10.70.42.204:/var/run/gluster/snaps/95b63de2c6af4c0d995b0012ffc5b60e/brick3/testvol
    Volume Group      : RHS_vg3
    Brick Running     : No
    Brick PID         : N/A
    Data Percentage   : 0.20
    LV Size           : 13.47g

    Brick Path        : 10.70.42.10:/var/run/gluster/snaps/95b63de2c6af4c0d995b0012ffc5b60e/brick4/testvol
    Volume Group      : RHS_vg4
    Brick Running     : No
    Brick PID         : N/A
    Data Percentage   : 0.20
    LV Size           : 13.47g

[root@dhcp42-244 ~]# gluster snapshot activate snap1
Snapshot activate: snap1: Snap activated successfully
[root@dhcp42-244 ~]# gluster snapshot activate snap2
Snapshot activate: snap2: Snap activated successfully
[root@dhcp42-244 ~]#

[root@dhcp43-190 .snaps]# ls -lrt
total 0
d---------. 0 root root 0 Dec 31  1969 snap2
d---------. 0 root root 0 Dec 31  1969 snap1
[root@dhcp43-190 .snaps]#

[root@dhcp42-244 ~]# gluster volume stop testvol
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: testvol: success
[root@dhcp42-244 ~]# gluster volume status
Volume testvol is not started

Status of volume: testvol1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.244:/rhs/brick2/testvol          49153   Y       28801
Brick 10.70.43.6:/rhs/brick3/testvol            49153   Y       28589
Brick 10.70.42.204:/rhs/brick4/testvol          49153   Y       28866
Brick 10.70.42.10:/rhs/brick1/testvol           49153   Y       25653
NFS Server on localhost                         2049    Y       31452
Self-heal Daemon on localhost                   N/A     Y       31459
NFS Server on 10.70.42.204                      2049    Y       31083
Self-heal Daemon on 10.70.42.204                N/A     Y       31090
NFS Server on 10.70.42.10                       2049    Y       27862
Self-heal Daemon on 10.70.42.10                 N/A     Y       27877
NFS Server on 10.70.43.6                        2049    Y       31003
Self-heal Daemon on 10.70.43.6                  N/A     Y       31010

Task Status of Volume testvol1
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-244 ~]# gluster volume start testvol
volume start: testvol: success
[root@dhcp42-244 ~]# gluster volume status
Status of volume: testvol
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.244:/rhs/brick1/testvol          49152   Y       31493
Brick 10.70.43.6:/rhs/brick2/testvol            49152   Y       31033
Brick 10.70.42.204:/rhs/brick3/testvol          49152   Y       31107
Brick 10.70.42.10:/rhs/brick4/testvol           49152   Y       27893
Snapshot Daemon on localhost                    49157   Y       31505
NFS Server on localhost                         2049    Y       31512
Self-heal Daemon on localhost                   N/A     Y       31523
Snapshot Daemon on 10.70.43.6                   49157   Y       31045
NFS Server on 10.70.43.6                        2049    Y       31052
Self-heal Daemon on 10.70.43.6                  N/A     N       N/A
Snapshot Daemon on 10.70.42.10                  49157   Y       27905
NFS Server on 10.70.42.10                       N/A     N       N/A
Self-heal Daemon on 10.70.42.10                 N/A     N       N/A
Snapshot Daemon on 10.70.42.204                 49157   Y       31119
NFS Server on 10.70.42.204                      N/A     N       N/A
Self-heal Daemon on 10.70.42.204                N/A     N       N/A

Task Status of Volume testvol
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: testvol1
Gluster process                                 Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.42.244:/rhs/brick2/testvol          49153   Y       28801
Brick 10.70.43.6:/rhs/brick3/testvol            49153   Y       28589
Brick 10.70.42.204:/rhs/brick4/testvol          49153   Y       28866
Brick 10.70.42.10:/rhs/brick1/testvol           49153   Y       25653
NFS Server on localhost                         2049    Y       31512
Self-heal Daemon on localhost                   N/A     Y       31523
NFS Server on 10.70.43.6                        2049    Y       31052
Self-heal Daemon on 10.70.43.6                  N/A     N       N/A
NFS Server on 10.70.42.10                       N/A     N       N/A
Self-heal Daemon on 10.70.42.10                 N/A     N       N/A
NFS Server on 10.70.42.204                      N/A     N       N/A
Self-heal Daemon on 10.70.42.204                N/A     N       N/A

Task Status of Volume testvol1
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp42-244 ~]#
```
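Since the original symptom was a snapd that is running but advertises its port as N/A, one last cross-check is to compare the port reported by `gluster volume status` for the local snapd with the `--brick-port` argument of the running snapd process (both are visible in the outputs above). A minimal sketch, assuming it runs on the node itself and the status-line format shown here; the volume name is a placeholder.

```bash
#!/bin/bash
# Cross-check the advertised snapd port against the port the snapd process was
# actually started with.
VOL=${1:-testvol}   # placeholder volume name

# Port column reported for the local snapd by glusterd.
status_port=$(gluster volume status "$VOL" \
              | grep -i "snapshot daemon on localhost" | awk '{print $(NF-2)}')

# Port baked into the running snapd command line (--brick-port ...).
proc_port=$(ps -eo args | grep "volfile-id snapd/$VOL" | grep -v grep \
            | sed -n 's/.*--brick-port \([0-9]*\).*/\1/p')

echo "status reports: ${status_port:-none}, process uses: ${proc_port:-none}"
if [[ -n "$status_port" && "$status_port" == "$proc_port" ]]; then
    echo "OK: ports match"
else
    echo "MISMATCH or snapd not running"
fi
```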
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0038.html