Bug 1122371 - Restarting glusterd to bring an offline brick online also restarts the nfs and glustershd processes
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1087818 1122509
Reported: 2014-07-23 05:55 UTC by spandura
Modified: 2016-06-23 04:45 UTC
CC List: 8 users

Fixed In Version:
Doc Type: Known Issue
Doc Text:
The NFS server process and the gluster self-heal daemon process restart when the gluster daemon (glusterd) process is restarted.
Clone Of:
Clones: 1122509
Environment:
Last Closed: 2016-06-23 04:45:08 UTC



Description spandura 2014-07-23 05:55:39 UTC
Description of problem:
============================
When a brick is offline, restarting glusterd brings the brick back online by restarting the brick process. However, it also restarts the "nfs" and "self-heal-daemon" processes, even though they are already online.

Version-Release number of selected component (if applicable):
===========================================================
glusterfs 3.6.0.24 built on Jul  3 2014 11:03:38

How reproducible:
===================
Often

Steps to Reproduce:
==========================
1. Create a 2 x 2 distribute-replicate volume. Start the volume. 

2. Create a fuse mount. Create files/dirs from the mount. 

3. Bring down brick1 (kill -KILL <brick_pid>).

4. Simulate a disk replacement (rm -rf <brick_path>/*).

5. Restart glusterd (service glusterd restart).
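
To check for this behaviour without reading `ps` output by eye, the daemon PIDs can be compared before and after step 5. This is a minimal sketch, not part of GlusterFS: `daemon_pids` and `was_restarted` are hypothetical helper names, and the code assumes the standard `ps -ef` column layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD) seen in the transcripts below.

```python
from __future__ import annotations

import re


def daemon_pids(ps_output: str, volfile_id: str) -> set[int]:
    """Return the PIDs of glusterfs processes started with --volfile-id <volfile_id>.

    Assumes `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD.
    """
    pattern = re.compile(r"--volfile-id\s+" + re.escape(volfile_id) + r"(\s|$)")
    pids: set[int] = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 7)  # the 8th field keeps the full command line
        if len(fields) < 8:
            continue
        cmd = fields[7]
        # "grep nfs" / "grep glustershd" lines carry no --volfile-id, so they never match.
        if "glusterfs" in cmd and pattern.search(cmd):
            pids.add(int(fields[1]))
    return pids


def was_restarted(before: set[int], after: set[int]) -> bool:
    """True if a daemon that was running before is running again under new PIDs only."""
    return bool(before) and bool(after) and before.isdisjoint(after)
```

For the NFS transcript below, the `ps -ef` output taken before the restart gives `{9423}` for `"gluster/nfs"` and the output at 9:23:28 gives `{9889}`, so `was_restarted` returns `True`. Sample a few seconds after the restart: the transcript briefly shows a parent/child pair (9888/9889) before only 9889 remains.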

Actual results:
=====================
Restart of NFS server process:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@rhs-client11 [Jul-23-2014- 9:20:20] >ps -ef | grep nfs
root      9423     1  0 09:04 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/7030a213bf7d85840ebe1479666c12b6.socket
root      9699  8394  0 09:22 pts/0    00:00:00 grep nfs

root@rhs-client11 [Jul-23-2014- 9:22:47] >gluster v status vol1
Status of volume: vol1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device0/b1			49154	Y	9415
Brick rhs-client12:/rhs/device0/b2			49152	Y	14924
Brick rhs-client13:/rhs/device0/b3			49152	Y	22192
Brick rhs-client14:/rhs/device0/b4			49152	Y	28035
NFS Server on localhost					2049	Y	9423
Self-heal Daemon on localhost				N/A	Y	9430
NFS Server on 10.70.34.92				2049	Y	20426
Self-heal Daemon on 10.70.34.92				N/A	Y	20433
NFS Server on rhs-client13				2049	Y	17965
Self-heal Daemon on rhs-client13			N/A	Y	17972
NFS Server on rhs-client12				2049	Y	10283
Self-heal Daemon on rhs-client12			N/A	Y	10290
NFS Server on rhs-client14				2049	Y	22055
Self-heal Daemon on rhs-client14			N/A	Y	22062
 
Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks
 
root@rhs-client11 [Jul-23-2014- 9:22:52] >kill -KILL 9415

root@rhs-client11 [Jul-23-2014- 9:22:58] >rm -rf /rhs/device0/b1/*

root@rhs-client11 [Jul-23-2014- 9:23:02] >service glusterd restart
Starting glusterd:                                         [  OK  ]

root@rhs-client11 [Jul-23-2014- 9:23:20] >ps -ef | grep nfs
root      9423     1  0 09:04 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/7030a213bf7d85840ebe1479666c12b6.socket
root      9875  8394  0 09:23 pts/0    00:00:00 grep nfs

root@rhs-client11 [Jul-23-2014- 9:23:24] >ps -ef | grep nfs
root      9885  8394  0 09:23 pts/0    00:00:00 grep nfs

root@rhs-client11 [Jul-23-2014- 9:23:25] >ps -ef | grep nfs
root      9888     1  0 09:23 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/7030a213bf7d85840ebe1479666c12b6.socket
root      9889  9888  0 09:23 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/7030a213bf7d85840ebe1479666c12b6.socket
root      9907  8394  0 09:23 pts/0    00:00:00 grep nfs

root@rhs-client11 [Jul-23-2014- 9:23:28] >ps -ef | grep nfs
root      9889     1  2 09:23 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/7030a213bf7d85840ebe1479666c12b6.socket
root      9918  8394  0 09:23 pts/0    00:00:00 grep nfs

root@rhs-client11 [Jul-23-2014- 9:23:30] >

Restart of self-heal-daemon process:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
root@rhs-client11 [Jul-23-2014- 9:02:33] >gluster v status vol1
Status of volume: vol1
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/device0/b1			49154	Y	8853
Brick rhs-client12:/rhs/device0/b2			49152	Y	14924
Brick rhs-client13:/rhs/device0/b3			49152	Y	22192
Brick rhs-client14:/rhs/device0/b4			49152	Y	28035
NFS Server on localhost					2049	Y	9121
Self-heal Daemon on localhost				N/A	Y	9128
NFS Server on 10.70.34.92				2049	Y	20426
Self-heal Daemon on 10.70.34.92				N/A	Y	20433
NFS Server on rhs-client13				2049	Y	17965
Self-heal Daemon on rhs-client13			N/A	Y	17972
NFS Server on rhs-client14				2049	Y	22055
Self-heal Daemon on rhs-client14			N/A	Y	22062
NFS Server on rhs-client12				2049	Y	10283
Self-heal Daemon on rhs-client12			N/A	Y	10290
 
Task Status of Volume vol1
------------------------------------------------------------------------------
There are no active volume tasks
 

root@rhs-client11 [Jul-23-2014- 9:03:41] >
root@rhs-client11 [Jul-23-2014- 9:03:42] >ps -ef | grep glustershd
root      9128     1  0 09:00 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9236  8394  0 09:03 pts/0    00:00:00 grep glustershd

root@rhs-client11 [Jul-23-2014- 9:03:48] >kill -KILL 8853

root@rhs-client11 [Jul-23-2014- 9:04:02] >rm -rf /rhs/device0/b1/*

root@rhs-client11 [Jul-23-2014- 9:04:13] >ps -ef | grep glustershd
root      9128     1  0 09:00 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9256  8394  0 09:04 pts/0    00:00:00 grep glustershd

root@rhs-client11 [Jul-23-2014- 9:04:17] >
root@rhs-client11 [Jul-23-2014- 9:04:19] >
root@rhs-client11 [Jul-23-2014- 9:04:20] >service glusterd restart
Starting glusterd:                                         [  OK  ]
root@rhs-client11 [Jul-23-2014- 9:04:30] >

root@rhs-client11 [Jul-23-2014- 9:04:31] >ps -ef | grep glustershd
root      9128     1  0 09:00 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9410  8394  0 09:04 pts/0    00:00:00 grep glustershd

root@rhs-client11 [Jul-23-2014- 9:04:33] >
root@rhs-client11 [Jul-23-2014- 9:04:35] >
root@rhs-client11 [Jul-23-2014- 9:04:36] >ps -ef | grep glustershd
root      9429     1  0 09:04 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9430  9429  1 09:04 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9441  8394  0 09:04 pts/0    00:00:00 grep glustershd

root@rhs-client11 [Jul-23-2014- 9:04:38] >ps -ef | grep glustershd
root      9430     1  1 09:04 ?        00:00:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/7297d333b864178bfc8928f6a963a939.socket --xlator-option *replicate*.node-uuid=809e3daf-d43b-49e9-91e4-6eadff2875db
root      9453  8394  0 09:04 pts/0    00:00:00 grep glustershd
root@rhs-client11 [Jul-23-2014- 9:04:42] >

Expected results:
=====================
Restarting glusterd should restart only the processes that are offline, not those that are already online.
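
The expected result can also be checked against two `gluster volume status` snapshots: one taken after the brick is killed and one taken after glusterd has restarted. This is a minimal sketch assuming the four-column layout shown in the transcripts (process name, Port, Online, Pid), with a non-numeric Pid such as `N/A` treated as missing; `parse_volume_status` and `unexpected_restarts` are hypothetical helper names, not GlusterFS tools. Any name it returns is a process that was already online but came back with a new PID, which is what this bug reports for the NFS server and the self-heal daemon.

```python
from __future__ import annotations


def parse_volume_status(text: str) -> dict[str, tuple[bool, int | None]]:
    """Map each process row of `gluster volume status` to (online, pid).

    Assumes rows end with three columns: Port, Online (Y/N), Pid;
    a non-numeric Pid such as N/A is stored as None.
    """
    rows: dict[str, tuple[bool, int | None]] = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 3)  # the process name may contain spaces
        if len(parts) != 4 or parts[2] not in ("Y", "N"):
            continue  # skip header, separator and task lines
        name, _port, online, pid = parts
        rows[name] = (online == "Y", int(pid) if pid.isdigit() else None)
    return rows


def unexpected_restarts(before: dict[str, tuple[bool, int | None]],
                        after: dict[str, tuple[bool, int | None]]) -> list[str]:
    """Processes that were already online and came back online with a new PID."""
    changed = []
    for name, (was_online, old_pid) in before.items():
        if not was_online:
            continue  # offline processes are expected to be started
        new = after.get(name)
        if new is not None and new[0] and new[1] != old_pid:
            changed.append(name)
    return changed
```

With the behaviour expected above, only offline rows (such as the killed brick) would get a new PID, so `unexpected_restarts` would return an empty list.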

Additional info:
====================

root@rhs-client11 [Jul-23-2014- 9:33:26] >gluster v info
 
Volume Name: vol1
Type: Distributed-Replicate
Volume ID: 5cc2e193-af63-45b5-834f-9bd757cf4e84
Status: Started
Snap Volume: no
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/device0/b1
Brick2: rhs-client12:/rhs/device0/b2
Brick3: rhs-client13:/rhs/device0/b3
Brick4: rhs-client14:/rhs/device0/b4
Options Reconfigured:
performance.readdir-ahead: on
performance.write-behind: on
auto-delete: disable
snap-max-soft-limit: 90
snap-max-hard-limit: 256
root@rhs-client11 [Jul-23-2014- 9:33:29] >

Comment 5 Shalaka 2014-07-25 09:29:54 UTC
Please review and sign-off edited doc text.

Comment 6 SATHEESARAN 2014-07-25 10:03:30 UTC
(In reply to Shalaka from comment #5)
> Please review and sign-off edited doc text.

Looks good.

Comment 8 Atin Mukherjee 2016-06-23 04:45:08 UTC
We will not fix this issue, as it does not impact any functionality apart from restarting the daemons.

