Bug 1204044 - [geo-rep] stop-all-gluster-processes.sh fails to stop all gluster processes
Summary: [geo-rep] stop-all-gluster-processes.sh fails to stop all gluster processes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.1.0
Assignee: Kotresh HR
QA Contact: Rahul Hinduja
URL:
Whiteboard:
Duplicates: 1014850 (view as bug list)
Depends On: 1225331
Blocks: 1152992 1184846 1191838 1202842 1223636
 
Reported: 2015-03-20 09:06 UTC by Sweta Anandpara
Modified: 2019-03-22 07:41 UTC
CC: 13 users

Fixed In Version: glusterfs-3.7.1-1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1204641 (view as bug list)
Environment:
Last Closed: 2015-07-29 04:39:47 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2015:1495 0 normal SHIPPED_LIVE Important: Red Hat Gluster Storage 3.1 update 2015-07-29 08:26:26 UTC

Description Sweta Anandpara 2015-03-20 09:06:19 UTC
Description of problem:

'service glusterd stop' stops only the management daemon and not all the gluster processes - which is by design (as per bug 1152992 and the related doc bug 1184846). A separate script, 'stop-all-gluster-processes.sh', was created to stop all gluster processes. That script fails when more than one gsync process is running.


Version-Release number of selected component (if applicable):

glusterfs-3.6.0.52-1

How reproducible:  Always


Steps to Reproduce:

1. Have geo-rep session(s) created between master and slave (any configuration)
2. After successfully establishing and starting the session(s), run "ps aux | grep gluster | grep gsync | awk '{print $2}'" to make sure that more than one gsync process is running (or any related process, for that matter)
3. Execute the script "/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh" and that should go through without any errors.
4. Run the 'ps' command again (mentioned in step 2) to verify that the gluster processes have stopped. 


Actual results:

The script fails when more than one gsync process is running. It errors out with 'test: too many arguments'.


Expected results:

The script should work irrespective of the number of gluster processes active at the time.


Additional info:

Having the 'kill' command in a loop, rather than as a single stand-alone invocation, would help fix this (see the sketch after the next paragraph).

When the steps mentioned in the script are run manually (and individually) on each pid, the process (associated with that pid) does get killed.
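
A minimal sketch of that loop-based approach (illustrative only, not the actual script fix; the pid-gathering pipeline mirrors the one in step 2, with an added 'grep -v grep' to exclude the grep process itself):

    # Hypothetical sketch: kill each matching pid individually instead of
    # passing the whole space-separated list to a single test/kill invocation.
    for pid in $(ps aux | grep gluster | grep gsync | grep -v grep | awk '{print $2}'); do
        if [ -n "$pid" ]; then
            echo "sending SIGTERM to process $pid"
            kill -TERM "$pid"
        fi
    done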


[root@dhcp43-154 ~]# service glusterd status
glusterd (pid  28125) is running...
[root@dhcp43-154 ~]# service glusterd stop
[root@dhcp43-154 ~]# service glusterd status               [  OK  ]
glusterd is stopped
[root@dhcp43-154 ~]# 
[root@dhcp43-154 ~]# ps aux | grep gsync 
root     28764  0.0  0.0 103252   840 pts/0    S+   17:52   0:00 grep gsync
root     30471  0.0  0.1 801384 13376 ?        Ssl  Mar17   1:57 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/rhs/brick1/d1 --path=/rhs/brick2/d1  -c /var/lib/glusterd/geo-replication/master_dhcp42-130_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=508fa5df-0fb9-4bad-9878-7e05b25d4fd9 geoacc@dhcp42-130::slave -N -p  --slave-id e896eed0-bddd-448b-bd58-dc2b34962aea --local-path /rhs/brick2/d1 --agent --rpc-fd 7,11,10,9
root     30472  0.2  0.2 1497500 17716 ?       Sl   Mar17  10:18 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/rhs/brick1/d1 --path=/rhs/brick2/d1  -c /var/lib/glusterd/geo-replication/master_dhcp42-130_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=508fa5df-0fb9-4bad-9878-7e05b25d4fd9 geoacc@dhcp42-130::slave -N -p  --slave-id e896eed0-bddd-448b-bd58-dc2b34962aea --feedback-fd 13 --local-path /rhs/brick2/d1 --local-id .%2Frhs%2Fbrick2%2Fd1 --rpc-fd 10,9,7,11 --resource-remote ssh://geoacc@dhcp42-130:gluster://localhost:slave
root     30482  0.0  0.0  62508  4344 ?        S    Mar17   0:02 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-JdGxkw/f7cab8010385926c094e1701b0926f87.sock geoacc@dhcp42-130 /nonexistent/gsyncd --session-owner fcf732d1-81d6-42d1-8915-cc2107fd72f2 -N --listen --timeout 120 gluster://localhost:slave
root     30492  0.4  0.7 567480 57520 ?        Ssl  Mar17  17:34 /usr/sbin/glusterfs --aux-gfid-mount --log-file=/var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Fgeoacc%4010.70.42.130%3Agluster%3A%2F%2F127.0.0.1%3Aslave.%2Frhs%2Fbrick2%2Fd1.gluster.log --volfile-server=localhost --volfile-id=master --client-pid=-1 /tmp/gsyncd-aux-mount-bHyf9N
[root@dhcp43-154 ~]# ps aux | grep gsync | wc -l
5
[root@dhcp43-154 ~]# sh /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh 
sending SIGTERM to process 23366
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh: line 9: kill: (23366) - No such process
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh: line 16: test: too many arguments
sending SIGKILL to process 23366
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh: line 25: kill: (23366) - No such process
/usr/share/glusterfs/scripts/stop-all-gluster-processes.sh: line 30: test: too many arguments
[root@dhcp43-154 ~]# 
[root@dhcp43-154 ~]# ps aux | grep gsync | wc -l
5
[root@dhcp43-154 ~]#
[root@dhcp43-154 ~]# ps aux | grep gsync 
root     28796  0.0  0.0 103252   844 pts/0    S+   17:52   0:00 grep gsync
root     30471  0.0  0.1 801384 15400 ?        Ssl  Mar17   1:57 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/rhs/brick1/d1 --path=/rhs/brick2/d1  -c /var/lib/glusterd/geo-replication/master_dhcp42-130_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=508fa5df-0fb9-4bad-9878-7e05b25d4fd9 geoacc@dhcp42-130::slave -N -p  --slave-id e896eed0-bddd-448b-bd58-dc2b34962aea --local-path /rhs/brick2/d1 --agent --rpc-fd 7,11,10,9
root     30472  0.2  0.2 1497500 17716 ?       Sl   Mar17  10:18 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/rhs/brick1/d1 --path=/rhs/brick2/d1  -c /var/lib/glusterd/geo-replication/master_dhcp42-130_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=508fa5df-0fb9-4bad-9878-7e05b25d4fd9 geoacc@dhcp42-130::slave -N -p  --slave-id e896eed0-bddd-448b-bd58-dc2b34962aea --feedback-fd 13 --local-path /rhs/brick2/d1 --local-id .%2Frhs%2Fbrick2%2Fd1 --rpc-fd 10,9,7,11 --resource-remote ssh://geoacc@dhcp42-130:gluster://localhost:slave
root     30482  0.0  0.0  62508  4344 ?        S    Mar17   0:02 ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-JdGxkw/f7cab8010385926c094e1701b0926f87.sock geoacc@dhcp42-130 /nonexistent/gsyncd --session-owner fcf732d1-81d6-42d1-8915-cc2107fd72f2 -N --listen --timeout 120 gluster://localhost:slave
root     30492  0.4  0.7 567480 57520 ?        Ssl  Mar17  17:34 /usr/sbin/glusterfs --aux-gfid-mount --log-file=/var/log/glusterfs/geo-replication/master/ssh%3A%2F%2Fgeoacc%4010.70.42.130%3Agluster%3A%2F%2F127.0.0.1%3Aslave.%2Frhs%2Fbrick2%2Fd1.gluster.log --volfile-server=localhost --volfile-id=master --client-pid=-1 /tmp/gsyncd-aux-mount-bHyf9N
[root@dhcp43-154 ~]# vi /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh
[root@dhcp43-154 ~]# 


###########
Running the steps from the script, manually, on an individual pid
###########


[root@dhcp43-154 ~]# cat /var/lib/glusterd/geo-replication/master_dhcp42-130_slave/ssh%3A%2F%2Fgeoacc%4010.70.42.130%3Agluster%3A%2F%2F127.0.0.1%3Aslave.pid
23366
[root@dhcp43-154 ~]# 
[root@dhcp43-154 ~]# ps aux | grep gluster | grep gsync | awk '{print $2}'
30471
30472
30482
30492
[root@dhcp43-154 ~]# 
[root@dhcp43-154 ~]# test -n 30471
[root@dhcp43-154 ~]# kill -KILL 30471
[root@dhcp43-154 ~]# ps aux | grep gluster | grep gsync | awk {'print $2'}
30472
30482
30492
[root@dhcp43-154 ~]#

Comment 1 Sweta Anandpara 2015-03-20 09:21:18 UTC
Severity could be changed depending on the number of customers who are using this script.

Choosing to take or not take this bug in 3.0.4 will affect the outcome of the related doc bug 1184846.

Comment 4 Pavithra 2015-03-23 07:04:41 UTC
Hi Sweta,

I see this being added as a known issue for the 3.0.4 release. Could you please point me to someone who can provide the info, or fill out the doc text field yourself?

Comment 7 Sweta Anandpara 2015-03-23 08:24:23 UTC
Please get in touch with Aravinda or Kotresh for the updated content that has to go in the doc.

Comment 8 Kotresh HR 2015-03-23 09:47:35 UTC
Patch Sent Upstream:

http://review.gluster.org/#/c/9970/

Comment 9 Kotresh HR 2015-03-23 09:59:00 UTC
RCA: test -n takes a single string as its argument. In the script, if multiple
     PIDs are returned, it fails with "test: too many arguments".
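
A quick shell illustration of the failure and the looped alternative (PIDs taken from the output above, purely for illustration):

    pids="30471 30472"
    test -n $pids       # unquoted: expands to 'test -n 30471 30472' -> "test: too many arguments"
    test -n "$pids"     # quoted: a single argument, so the test succeeds
    for pid in $pids; do kill -TERM "$pid"; done    # a loop handles each pid separately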

Documentation changes required irrespective of whether the patch is taken in for 3.0.4 or not:
   
Present Documentation:
----------
     In section: 4.1. Starting and Stopping the glusterd service

       Run the following command to stop glusterd manually.

       # service glusterd stop

       Note
        Stopping the glusterd service does not stop all the gluster processes
        such as glusterfsd, gsyncd, etc. Execute the script stop-all-gluster-
        processes.sh available in the /usr/share/glusterfs/scripts directory to
        stop all the gluster processes. 
-------------------

Instead of directly killing geo-rep processes through the script, the documentation should say:
use the geo-rep stop command to stop geo-replication if a geo-rep session is established, which in turn kills all geo-rep processes and updates the session status properly.
Then use the "stop-all-gluster-processes.sh" script to kill all other gluster processes.
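
A hedged example of that sequence, reusing the master volume and slave names from the logs above (substitute the actual session details for a given deployment):

    # Stop the geo-rep session first, so the gsyncd processes exit and the
    # session status is updated properly
    gluster volume geo-replication master geoacc@dhcp42-130::slave stop

    # Then stop all remaining gluster processes
    sh /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh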

Comment 10 Pavithra 2015-03-23 10:11:13 UTC
Kotresh,

I see this bug added in the known issues tracker bug for the 3.0.4 release. Could you please change the doc type to known issue and fill out the doc text field?

Comment 11 Pavithra 2015-03-23 16:57:52 UTC
Hi Kotresh,

Please review the edited doc text for technical accuracy and sign off.

Comment 13 Kotresh HR 2015-04-07 04:10:46 UTC
The doc text looks fine.

Comment 14 Aravinda VK 2015-04-21 10:21:53 UTC
*** Bug 1014850 has been marked as a duplicate of this bug. ***

Comment 18 Rahul Hinduja 2015-05-27 06:37:23 UTC
stop-all-gluster-processes.sh is not packaged with the glusterfs-3.7.0-2 version. Moving this bug to assigned state.


[root@georep1 ~]# ls /usr/share/glusterfs/scripts/
generate-gfid-file.sh  get-gfid.sh  gsync-sync-gfid  gsync-upgrade.sh  post-upgrade-script-for-quota.sh  pre-upgrade-script-for-quota.sh  slave-upgrade.sh
[root@georep1 ~]# rpm -qa | grep gluster | xargs rpm -ql | grep stop-all-gluster-processes.sh*
[root@georep1 ~]#

Comment 20 Rahul Hinduja 2015-07-04 10:52:07 UTC
Verified with build: glusterfs-3.7.1-7.el6rhs.x86_64

[root@georep1 ~]#  ps aux | grep gsync  | awk  {'print $2'}
16277
16327
16328
16344
16363
16449
16450
16463
16469
18137
18138
18159
[root@georep1 ~]# bash /usr/share/glusterfs/scripts/stop-all-gluster-processes.sh 
sending SIGTERM to process 16277
sending SIGTERM to process 16563
sending SIGTERM to process 16618
sending SIGTERM to process 15420
sending SIGTERM to process 15402
sending SIGTERM to process 16610
sending SIGTERM to process 15656
sending SIGTERM to process 16092
sending SIGKILL to process 16277
[root@georep1 ~]# 
[root@georep1 ~]#  ps aux | grep gsync 
root     18207  0.0  0.0 103308   852 pts/0    S+   16:20   0:00 grep gsync
[root@georep1 ~]# 

Moving the bug to verified

Comment 22 errata-xmlrpc 2015-07-29 04:39:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html

