Description of problem:
============================
When a brick was offline, glusterd was restarted to bring the brick back online. The restart does respawn the brick process, but it also restarts the already-online "nfs" and "self-heal-daemon" processes.

Version-Release number of selected component (if applicable):
===========================================================
glusterfs 3.6.0.24 built on Jul 3 2014 11:03:38

How reproducible:
===================
Often

Steps to Reproduce:
==========================
1. Create a 2 x 2 distribute-replicate volume. Start the volume.
2. Create a fuse mount. Create files/dirs from the mount.
3. Bring down brick1. (kill -KILL <brick_pid>)
4. Simulate a disk replacement. (rm -rf <brick_path>/*)
5. Restart glusterd. (service glusterd restart)

(A consolidated, scriptable version of these steps is sketched at the end of this report.)

Actual results:
=====================

Output from NFS process restarts:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@rhs-client11 [Jul-23-2014- 9:20:20] >ps -ef | grep nfs
root      9423     1  0 09:04 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/7030a213bf7d85840ebe1479666c12b6.socket
root      9699  8394  0 09:22 pts/0    00:00:00 grep nfs
root@rhs-client11 [Jul-23-2014- 9:22:47] >gluster v status vol1
Status of volume: vol1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device0/b1                      49154   Y       9415
Brick rhs-client12:/rhs/device0/b2                      49152   Y       14924
Brick rhs-client13:/rhs/device0/b3                      49152   Y       22192
Brick rhs-client14:/rhs/device0/b4                      49152   Y       28035
NFS Server on localhost                                 2049    Y       9423
Self-heal Daemon on localhost                           N/A     Y       9430
NFS Server on 10.70.34.92                               2049    Y       20426
Self-heal Daemon on 10.70.34.92                         N/A     Y       20433
NFS Server on rhs-client13                              2049    Y       17965
Self-heal Daemon on rhs-client13                        N/A     Y       17972
NFS Server on rhs-client12                              2049    Y       10283
Self-heal Daemon on rhs-client12                        N/A     Y       10290
NFS Server on rhs-client14                              2049    Y       22055
Self-heal Daemon on rhs-client14                        N/A     Y       22062

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

root@rhs-client11 [Jul-23-2014- 9:22:52] >kill -KILL 9415
root@rhs-client11 [Jul-23-2014- 9:22:58] >rm -rf /rhs/device0/b1/*
root@rhs-client11 [Jul-23-2014- 9:23:02] >service glusterd restart
Starting glusterd:                                         [  OK  ]
root@rhs-client11 [Jul-23-2014- 9:23:20] >ps -ef | grep nfs
root      9423     1  0 09:04 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/7030a213bf7d85840ebe1479666c12b6.socket
root      9875  8394  0 09:23 pts/0    00:00:00 grep nfs
root@rhs-client11 [Jul-23-2014- 9:23:24] >ps -ef | grep nfs
root      9885  8394  0 09:23 pts/0    00:00:00 grep nfs
root@rhs-client11 [Jul-23-2014- 9:23:25] >ps -ef | grep nfs
root      9888     1  0 09:23 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/7030a213bf7d85840ebe1479666c12b6.socket
root      9889  9888  0 09:23 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/7030a213bf7d85840ebe1479666c12b6.socket
root      9907  8394  0 09:23 pts/0    00:00:00 grep nfs
root@rhs-client11 [Jul-23-2014- 9:23:28] >ps -ef | grep nfs
root      9889     1  2 09:23 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/7030a213bf7d85840ebe1479666c12b6.socket
root      9918  8394  0 09:23 pts/0    00:00:00 grep nfs
root@rhs-client11 [Jul-23-2014- 9:23:30] >
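For reference, the unwanted restart can be confirmed without eyeballing ps output: the NFS server records its PID in the pid file passed on its command line (/var/lib/glusterd/nfs/run/nfs.pid in the transcript above). A minimal sketch, assuming that path and an arbitrary settle time for glusterd to respawn daemons:

nfs_pid_before=$(cat /var/lib/glusterd/nfs/run/nfs.pid)   # PID of the online NFS server
service glusterd restart
sleep 10                                                  # give glusterd time to (re)spawn daemons
nfs_pid_after=$(cat /var/lib/glusterd/nfs/run/nfs.pid)
if [ "$nfs_pid_before" != "$nfs_pid_after" ]; then
    echo "BUG: online NFS server was restarted ($nfs_pid_before -> $nfs_pid_after)"
fi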
Restart of self-heal-daemon process:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@rhs-client11 [Jul-23-2014- 9:02:33] >gluster v status vol1
Status of volume: vol1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device0/b1                      49154   Y       8853
Brick rhs-client12:/rhs/device0/b2                      49152   Y       14924
Brick rhs-client13:/rhs/device0/b3                      49152   Y       22192
Brick rhs-client14:/rhs/device0/b4                      49152   Y       28035
NFS Server on localhost                                 2049    Y       9121
Self-heal Daemon on localhost                           N/A     Y       9128
NFS Server on 10.70.34.92                               2049    Y       20426
Self-heal Daemon on 10.70.34.92                         N/A     Y       20433
NFS Server on rhs-client13                              2049    Y       17965
Self-heal Daemon on rhs-client13                        N/A     Y       17972
NFS Server on rhs-client14                              2049    Y       22055
Self-heal Daemon on rhs-client14                        N/A     Y       22062
NFS Server on rhs-client12                              2049    Y       10283
Self-heal Daemon on rhs-client12                        N/A     Y       10290

Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks

root@rhs-client11 [Jul-23-2014- 9:03:41] >
root@rhs-client11 [Jul-23-2014- 9:03:42] >ps -ef | grep glustershd
root      9128     1  0 09:00 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9236  8394  0 09:03 pts/0    00:00:00 grep glustershd
root@rhs-client11 [Jul-23-2014- 9:03:48] >kill -KILL 8853
root@rhs-client11 [Jul-23-2014- 9:04:02] >rm -rf /rhs/device0/b1/*
root@rhs-client11 [Jul-23-2014- 9:04:13] >ps -ef | grep glustershd
root      9128     1  0 09:00 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9256  8394  0 09:04 pts/0    00:00:00 grep glustershd
root@rhs-client11 [Jul-23-2014- 9:04:17] >
root@rhs-client11 [Jul-23-2014- 9:04:19] >
root@rhs-client11 [Jul-23-2014- 9:04:20] >service glusterd restart
Starting glusterd:                                         [  OK  ]
root@rhs-client11 [Jul-23-2014- 9:04:30] >
root@rhs-client11 [Jul-23-2014- 9:04:31] >ps -ef | grep glustershd
root      9128     1  0 09:00 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9410  8394  0 09:04 pts/0    00:00:00 grep glustershd
root@rhs-client11 [Jul-23-2014- 9:04:33] >
root@rhs-client11 [Jul-23-2014- 9:04:35] >
root@rhs-client11 [Jul-23-2014- 9:04:36] >ps -ef | grep glustershd
root      9429     1  0 09:04 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9430  9429  1 09:04 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9441  8394  0 09:04 pts/0    00:00:00 grep glustershd
root@rhs-client11 [Jul-23-2014- 9:04:38] >ps -ef | grep glustershd
root      9430     1  1 09:04 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9453  8394  0 09:04 pts/0    00:00:00 grep glustershd
root@rhs-client11 [Jul-23-2014- 9:04:42] >
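The same check applies to the self-heal daemon. Because a PID could in principle be recycled between the two reads, comparing the process start time is slightly more robust than comparing the PID alone. A sketch, assuming the glustershd pid-file path from the transcript above:

shd_pidfile=/var/lib/glusterd/glustershd/run/glustershd.pid
start_before=$(ps -o lstart= -p "$(cat $shd_pidfile)")    # start time of the online daemon
service glusterd restart
sleep 10                                                  # arbitrary settle time
start_after=$(ps -o lstart= -p "$(cat $shd_pidfile)")
if [ "$start_before" != "$start_after" ]; then
    echo "BUG: online self-heal daemon was restarted"
fi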
Expected results:
=====================
Restarting glusterd should restart only the offline processes, not the online processes.

Additional info:
====================
root@rhs-client11 [Jul-23-2014- 9:33:26] >gluster v info

Volume Name: vol1
Type: Distributed-Replicate
Volume ID: 5cc2e193-af63-45b5-834f-9bd757cf4e84
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/device0/b1
Brick2: rhs-client12:/rhs/device0/b2
Brick3: rhs-client13:/rhs/device0/b3
Brick4: rhs-client14:/rhs/device0/b4
Options Reconfigured:
performance.readdir-ahead: on
performance.write-behind: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256
root@rhs-client11 [Jul-23-2014- 9:33:29] >
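As noted under Steps to Reproduce, here is a consolidated, scriptable version of the reproduction. This is a hypothetical sketch assuming the vol1 layout from 'gluster v info' above, execution on rhs-client11, and the pid-file paths visible in the transcripts; the awk expression relies on the 'gluster v status' column layout shown earlier.

#!/bin/sh
# Steps 3-5 of the reproduction, with daemon PIDs captured around the restart.
brick_pid=$(gluster v status vol1 | awk '$1 == "Brick" && $2 ~ /^rhs-client11:/ {print $NF}')
nfs_before=$(cat /var/lib/glusterd/nfs/run/nfs.pid)
shd_before=$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)

kill -KILL "$brick_pid"              # step 3: bring down brick1
rm -rf /rhs/device0/b1/*             # step 4: simulate disk replacement
service glusterd restart             # step 5: restart glusterd
sleep 10                             # arbitrary settle time

[ "$nfs_before" = "$(cat /var/lib/glusterd/nfs/run/nfs.pid)" ] \
    || echo "online NFS server was restarted"
[ "$shd_before" = "$(cat /var/lib/glusterd/glustershd/run/glustershd.pid)" ] \
    || echo "online self-heal daemon was restarted"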
SOS Reports: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1122371/
Please review and sign-off edited doc text.
(In reply to Shalaka from comment #5)
> Please review and sign-off edited doc text.

Looks good.
We will not fix this issue, as it has no functional impact beyond the unnecessary restart of the daemons.