Description of problem:
////////////////////////////////////////////////////////////////////////////

- 'gluster volume set vol01 nfs.disable off' accidentally killed an unexpected process and forced a data brick offline.
- The command was executed when no nfs daemon spawned by glusterd was running.
- The killed pid, 22821, had been reused from the nfs daemon that was previously running.
- 4-node configuration.

- The command that triggered the problem was run on node kiotloglgst04:

[root@kiotloglgst04 hpework]# gluster volume set vol01 nfs.disable off
volume set: success

[ log: /cases/02502431/40-sosreport-kiotloglgst04-2019-10-28-iolxwep.tar.xz/sosreport-kiotloglgst04-2019-10-28-iolxwep/var/log/glusterfs/cmd_history.log-20191025 ]
[2019-10-24 05:31:04.248179] : volume set vol01 nfs.disable off : SUCCESS

- The command accidentally killed pid 22821, which was not an nfs process spawned by glusterd but another, unknown process.

[ log: /cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/var/log/glusterfs/glusterd.log-20191025 ]
[2019-10-24 05:31:02.211907] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping nfs daemon running in pid: 22821   <----- killed
[2019-10-24 05:31:03.212479] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped
..... snip .....
[2019-10-24 05:31:03.217608] I [MSGID: 106567] [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting nfs service
[2019-10-24 05:31:04.241212] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already stopped

- The data brick kiotloglgst03:/bricks/brick01/1 was forced offline as a result of the command.

[ log: /cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/var/log/glusterfs/bricks/bricks-brick01-1.log-20191025 ]
[2019-10-24 05:25:42.670252] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-vol01-server: Shutting down connection kiotloglgst03-14626-2019/07/05-02:27:23:117770-vol01-client-4-0-0
[2019-10-24 05:29:06.609832] I [addr.c:55:compare_addr_and_update] 0-/bricks/brick01/1: allowed = "*", received addr = "100.65.182.132"
[2019-10-24 05:29:06.609889] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 42126d88-8cf4-4c37-83a9-5d0bf62e1d2e
[2019-10-24 05:29:06.609926] I [MSGID: 115029] [server-handshake.c:564:server_setvolume] 0-vol01-server: accepted client from kiotloglgst04-8482-2019/10/24-05:29:06:571393-vol01-client-4-0-0 (version: 3.12.2) with subvol /bricks/brick01/1
[2019-10-24 05:29:06.675247] I [MSGID: 115036] [server.c:571:server_rpc_notify] 0-vol01-server: disconnecting connection from kiotloglgst04-8482-2019/10/24-05:29:06:571393-vol01-client-4-0-0
[2019-10-24 05:29:06.675559] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-vol01-server: Shutting down connection kiotloglgst04-8482-2019/10/24-05:29:06:571393-vol01-client-4-0-0
[2019-10-24 05:31:02.195748] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec] 0-mgmt: Volume file changed

..... Customer restarted brick .....
[2019-10-24 07:50:25.949355] I [MSGID: 100030] [glusterfsd.c:2504:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.12.2 (args: /usr/sbin/glusterfsd -s kiotloglgst03 --volfile-id vol01.kiotloglgst03.bricks-brick01-1 -p /var/run/gluster/vols/vol01/kiotloglgst03-bricks-brick01-1.pid -S /var/run/gluster/f0a387bbc58a292e.socket --brick-name /bricks/brick01/1 -l /var/log/glusterfs/bricks/bricks-brick01-1.log --xlator-option *-posix.glusterd-uuid=0621266b-41ea-401e-bfcb-839d2b16ca82 --brick-port 49153 --xlator-option vol01-server.listen-port=49153)
[2019-10-24 07:50:25.955180] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2019-10-24 07:50:25.958438] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1

- Where did pid 22821 come from?

[ log: /cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/var/log/glusterfs/glusterd.log-20190717 ]
..... snip .....
[2019-07-16 07:03:46.632909] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping nfs daemon running in pid: 22821   <----- pid
[2019-07-16 07:03:47.633263] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped

[ log: /cases/02502431/40-sosreport-kiotloglgst04-2019-10-28-iolxwep.tar.xz/sosreport-kiotloglgst04-2019-10-28-iolxwep/var/log/glusterfs/cmd_history.log-20190717.gz ]
.... snip .....
[2019-07-16 06:49:38.747879] : volume set vol01 nfs.disable off : SUCCESS
[2019-07-16 07:03:47.652222] : volume set vol01 nfs.disable on : SUCCESS

- Node kiotloglgst03 booted on Fri Jul 5, so pid 22821 was apparently reused by another process.

[ log: /cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/last ]
..... snip .....
reboot   system boot  3.10.0-957.12.2. Fri Jul  5 11:27 - 16:48 (111+05:21)

- The customer confirmed that no nfs daemon was running before the command was issued. The output of the commands they ran is shown below.
[ log: /cases/02502431/40-sosreport-kiotloglgst04-2019-10-28-iolxwep.tar.xz/sosreport-kiotloglgst04-2019-10-28-iolxwep/var/log/glusterfs/cmd_history.log-20191025 ]
[2019-10-24 05:28:55.097546] : volume status vol01 : SUCCESS
[2019-10-24 05:30:45.862941] : volume status vol01 : SUCCESS                <----- (*1)
[2019-10-24 05:31:04.248179] : volume set vol01 nfs.disable off : SUCCESS   <----- (*2)
[2019-10-24 05:31:08.796906] : volume status vol01 : SUCCESS                <----- (*3)

(*1)
[root@kiotloglgst04 hpework]# gluster volume status vol01
Status of volume: vol01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick kiotloglgst01:/bricks/brick01/1       49152     0          Y       30193
Brick kiotloglgst02:/bricks/brick01/1       49152     0          Y       31254
Brick kiotloglgst03:/bricks/brick03/1       49152     0          Y       14926
Brick kiotloglgst02:/bricks/brick02/1       49153     0          Y       31263
Brick kiotloglgst03:/bricks/brick01/1       49153     0          Y       14913   <----- Brick will be killed
Brick kiotloglgst04:/bricks/brick03/1       49152     0          Y       14941
Brick kiotloglgst03:/bricks/brick02/1       49154     0          Y       14912
Brick kiotloglgst04:/bricks/brick01/1       49153     0          Y       14935
Brick kiotloglgst01:/bricks/brick03/1       49153     0          Y       30213
Brick kiotloglgst04:/bricks/brick02/1       49154     0          Y       14942
Brick kiotloglgst01:/bricks/brick02/1       49154     0          Y       30233
Brick kiotloglgst02:/bricks/brick03/1       49154     0          Y       31269
Self-heal Daemon on localhost               N/A       N/A        Y       14586
Quota Daemon on localhost                   N/A       N/A        Y       14886
Self-heal Daemon on kiotloglgst02           N/A       N/A        Y       31234
Quota Daemon on kiotloglgst02               N/A       N/A        Y       31244
Self-heal Daemon on kiotloglgst01           N/A       N/A        Y       30255
Quota Daemon on kiotloglgst01               N/A       N/A        Y       30265
Self-heal Daemon on kiotloglgst03           N/A       N/A        Y       14611
Quota Daemon on kiotloglgst03               N/A       N/A        Y       14874

Task Status of Volume vol01
------------------------------------------------------------------------------
There are no active volume tasks

(*2)
[root@kiotloglgst04 hpework]# gluster volume set vol01 nfs.disable off
volume set: success

(*3)
[root@kiotloglgst04 hpework]# gluster volume status vol01
Status of volume: vol01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick kiotloglgst01:/bricks/brick01/1       49152     0          Y       30193
Brick kiotloglgst02:/bricks/brick01/1       49152     0          Y       31254
Brick kiotloglgst03:/bricks/brick03/1       49152     0          Y       14926
Brick kiotloglgst02:/bricks/brick02/1       49153     0          Y       31263
Brick kiotloglgst03:/bricks/brick01/1       N/A       N/A        N       N/A     <----- Brick killed
Brick kiotloglgst04:/bricks/brick03/1       49152     0          Y       14941
Brick kiotloglgst03:/bricks/brick02/1       49154     0          Y       14912
Brick kiotloglgst04:/bricks/brick01/1       49153     0          Y       14935
Brick kiotloglgst01:/bricks/brick03/1       49153     0          Y       30213
Brick kiotloglgst04:/bricks/brick02/1       49154     0          Y       14942
Brick kiotloglgst01:/bricks/brick02/1       49154     0          Y       30233
Brick kiotloglgst02:/bricks/brick03/1       49154     0          Y       31269
NFS Server on localhost                     2049      0          Y       8716
Self-heal Daemon on localhost               N/A       N/A        Y       14586
Quota Daemon on localhost                   N/A       N/A        Y       14886
NFS Server on kiotloglgst01                 2049      0          Y       348
Self-heal Daemon on kiotloglgst01           N/A       N/A        Y       30255
Quota Daemon on kiotloglgst01               N/A       N/A        Y       30265
NFS Server on kiotloglgst03                 2049      0          Y       17671
Self-heal Daemon on kiotloglgst03           N/A       N/A        Y       14611
Quota Daemon on kiotloglgst03               N/A       N/A        Y       14874
NFS Server on kiotloglgst02                 2049      0          Y       19331
Self-heal Daemon on kiotloglgst02           N/A       N/A        Y       31234
Quota Daemon on kiotloglgst02               N/A       N/A        Y       31244

Task Status of Volume vol01
------------------------------------------------------------------------------
There are no active volume tasks
==============================================================================
gluster volume info
==============================================================================
Volume Name: vol01
Type: Distributed-Replicate
Volume ID: be62535e-9fbd-4bd8-b1cd-90eca0b25ed0
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: kiotloglgst01:/bricks/brick01/1
Brick2: kiotloglgst02:/bricks/brick01/1
Brick3: kiotloglgst03:/bricks/brick03/1 (arbiter)
Brick4: kiotloglgst02:/bricks/brick02/1
Brick5: kiotloglgst03:/bricks/brick01/1
Brick6: kiotloglgst04:/bricks/brick03/1 (arbiter)
Brick7: kiotloglgst03:/bricks/brick02/1
Brick8: kiotloglgst04:/bricks/brick01/1
Brick9: kiotloglgst01:/bricks/brick03/1 (arbiter)
Brick10: kiotloglgst04:/bricks/brick02/1
Brick11: kiotloglgst01:/bricks/brick02/1
Brick12: kiotloglgst02:/bricks/brick03/1 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
features.shard: off
features.shard-block-size: 16MB
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
features.default-soft-limit: 95
network.ping-timeout: 10

Version-Release number of selected component (if applicable):
////////////////////////////////////////////////////////////////////////////
==============================================================================
grep "gluster" installed-rpms
==============================================================================
glusterfs-3.12.2-25.el7rhgs.x86_64                              Tue Jun 11 10:58:20 2019
glusterfs-api-3.12.2-25.el7rhgs.x86_64                          Tue Jun 11 10:58:21 2019
glusterfs-client-xlators-3.12.2-25.el7rhgs.x86_64               Tue Jun 11 10:57:35 2019
glusterfs-cli-3.12.2-25.el7rhgs.x86_64                          Tue Jun 11 10:57:35 2019
glusterfs-events-3.12.2-25.el7rhgs.x86_64                       Tue Jun 11 10:58:54 2019
glusterfs-fuse-3.12.2-25.el7rhgs.x86_64                         Tue Jun 11 10:58:23 2019
glusterfs-geo-replication-3.12.2-25.el7rhgs.x86_64              Tue Jun 11 10:58:54 2019
glusterfs-libs-3.12.2-25.el7rhgs.x86_64                         Tue Jun 11 10:57:31 2019
glusterfs-rdma-3.12.2-25.el7rhgs.x86_64                         Tue Jun 11 10:58:36 2019
glusterfs-server-3.12.2-25.el7rhgs.x86_64                       Tue Jun 11 10:58:53 2019
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64                   Tue Jun 11 10:58:43 2019
gluster-nagios-common-0.2.4-1.el7rhgs.noarch                    Tue Jun 11 10:57:46 2019
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.10.x86_64  Tue Jun 11 16:04:57 2019
python2-gluster-3.12.2-25.el7rhgs.x86_64                        Tue Jun 11 10:57:36 2019
tendrl-gluster-integration-1.6.3-13.el7rhgs.noarch              Tue Jun 11 16:04:59 2019
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch                         Tue Jun 11 10:59:13 2019
==============================================================================
cat /etc/redhat-storage-release
==============================================================================
Red Hat Gluster Storage Server 3.4

How reproducible:
- Always reproducible if the nfs daemon's pid has been reused by another process.
Problem easily reproduced:
////////////////////////////////////////////////////////////////////////////

- Create a 2+1 arbiter volume.
- Make sure no nfs daemon is running.

[root@rhel-a glusterfs]# gluster vol info avol
Volume Name: avol
Type: Replicate
Volume ID: 1e1e1870-b21d-4b0c-9cd0-b95d67725e1b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: rhel-a:/vol1/brick/b1
Brick2: rhel-b:/vol1/brick/b1
Brick3: rhel-c:/vol1/brick/b1 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on                      <----- No nfs
transport.address-family: inet

///// No nfs daemon is shown in the status below.
[root@rhel-a glusterfs]# gluster vol status avol
Status of volume: avol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhel-a:/vol1/brick/b1                 49152     0          Y       4386
Brick rhel-b:/vol1/brick/b1                 49152     0          Y       4109
Brick rhel-c:/vol1/brick/b1                 49152     0          Y       4140
Self-heal Daemon on localhost               N/A       N/A        Y       4285
Self-heal Daemon on rhel-c                  N/A       N/A        Y       4081
Self-heal Daemon on rhel-b                  N/A       N/A        Y       4058

Task Status of Volume avol
------------------------------------------------------------------------------
There are no active volume tasks

- Start a dummy process.

[root@rhel-a nfs]# strace sleep 100000; date

- Find the sleep pid and overwrite the /var/run/gluster/nfs/nfs.pid file with it.

[root@rhel-a glusterfs]# ps aux | grep sleep
root      8545  0.0  0.0 107952   360 ?        S    13:08   0:00 sleep 60
root      8586  0.0  0.0   7188   816 pts/1    S+   13:09   0:00 strace sleep 100000
root      8588  0.0  0.0 107952   584 pts/1    S+   13:09   0:00 sleep 100000          <------ Dummy pid 8588
root      8590  0.0  0.0 112680   696 pts/0    R+   13:09   0:00 grep --color=auto sleep
[root@rhel-a glusterfs]# ls -l /var/run/gluster/nfs/nfs.pid
-rw-r--r--. 1 root root 5 Dec 12 13:07 /var/run/gluster/nfs/nfs.pid
[root@rhel-a glusterfs]# echo 8588 > /var/run/gluster/nfs/nfs.pid

- Ensure no nfs daemon is running.

[root@rhel-a glusterfs]# ps aux | grep glusterfs
root      4285  0.0  0.3 677720  7296 ?        Ssl  10:55   0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/827bf70aee5e557b.socket --log-level=INFO --xlator-option *replicate*.node-uuid=080da7ab-1eec-4d0c-8551-b4e7226e9f55
root      4386  0.0  1.0 929228 19580 ?        Ssl  10:55   0:00 /usr/sbin/glusterfsd -s rhel-a --volfile-id avol.rhel-a.vol1-brick-b1 -p /var/run/gluster/vols/avol/rhel-a-vol1-brick-b1.pid -S /var/run/gluster/a12b523ebfcadaec.socket --brick-name /vol1/brick/b1 -l /var/log/glusterfs/bricks/vol1-brick-b1.log --xlator-option *-posix.glusterd-uuid=080da7ab-1eec-4d0c-8551-b4e7226e9f55 --brick-port 49152 --xlator-option avol-server.listen-port=49152
root      8619  0.0  0.0 112684   956 pts/0    S+   13:10   0:00 grep --color=auto glusterfs

- Set 'nfs.disable off'.

[root@rhel-a glusterfs]# gluster vol set avol nfs.disable off; date
volume set: success
Thu Dec 12 13:10:43 JST 2019
[root@rhel-a glusterfs]# gluster vol status avol
Status of volume: avol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhel-a:/vol1/brick/b1                 49152     0          Y       4386
Brick rhel-b:/vol1/brick/b1                 49152     0          Y       4109
Brick rhel-c:/vol1/brick/b1                 49152     0          Y       4140
NFS Server on localhost                     2049      0          Y       8644    <----- nfs daemon started
Self-heal Daemon on localhost               N/A       N/A        Y       4285
NFS Server on rhel-c                        2049      0          Y       8235
Self-heal Daemon on rhel-c                  N/A       N/A        Y       4081
NFS Server on rhel-b                        2049      0          Y       8181
Self-heal Daemon on rhel-b                  N/A       N/A        Y       4058

Task Status of Volume avol
------------------------------------------------------------------------------
There are no active volume tasks

- The dummy sleep process was killed.

.... snip .....
fstat(3, {st_mode=S_IFREG|0644, st_size=2502, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9e8162000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2502
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0x7ff9e8162000, 4096)            = 0
open("/usr/lib/locale/En/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
nanosleep({100000, 0},  <unfinished ...>
+++ killed by SIGKILL +++
強制終了                                <----- "Killed" (Japanese locale)
Thu Dec 12 13:10:39 JST 2019
[root@rhel-a nfs]#

- /var/log/glusterfs/glusterd.log shows that the sleep process was killed.

[2019-12-12 04:10:39.809093] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping nfs daemon running in pid: 8588   <----- Dummy sleep killed
[2019-12-12 04:10:40.810417] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped
[2019-12-12 04:10:40.811184] I [MSGID: 106540] [glusterd-utils.c:5115:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV3 successfully
[2019-12-12 04:10:40.811327] I [MSGID: 106540] [glusterd-utils.c:5124:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV1 successfully
[2019-12-12 04:10:40.811441] I [MSGID: 106540] [glusterd-utils.c:5133:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NFSV3 successfully
[2019-12-12 04:10:40.811588] I [MSGID: 106540] [glusterd-utils.c:5142:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v4 successfully
[2019-12-12 04:10:40.811701] I [MSGID: 106540] [glusterd-utils.c:5151:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v1 successfully
[2019-12-12 04:10:40.811810] I [MSGID: 106540] [glusterd-utils.c:5160:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered ACL v3 successfully
[2019-12-12 04:10:40.813519] I [MSGID: 106567] [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting nfs service

Code path for starting the nfs daemon:
////////////////////////////////////////////////////////////////////////////

glusterd_nfssvc_manager()
{
    ....
    ret = svc->stop (svc, SIGKILL);   <*** calls glusterd_nfssvc_stop()
    ....
    ret = svc->start (svc, flags);
    ....
}

Whenever the nfs daemon is started, glusterd first kills any existing nfs daemon by calling glusterd_nfssvc_stop(). The glusterd_nfssvc_stop() code path is:

glusterd_nfssvc_stop()
{
    glusterd_proc_is_running()
    {
        ....
        glusterd_svc_stop (svc, sig)
        {
            glusterd_proc_stop()
            {
                gf_is_service_running()
                {
                    fopen (pidfile, "r+");        <**1** Does "/var/run/gluster/nfs/nfs.pid" exist?
                    fscanf (file, "%d", pid);     <**2** Read the PID stored in it
                    gf_is_pid_running(*pid)
                    {
                        snprintf(fname, sizeof(fname), "/proc/%d/cmdline", pid);
                        if (sys_access (fname , R_OK) != 0) {   <**3** Is "/proc/PID/cmdline" accessible?
                            ....
                        }
                        ....
                    }
                    ....
                }
                /*** If the conditions marked **X** above are met, the pid below is killed. */
                ret = kill (pid, sig);
                ....
            }
        }
    }
}

///// Apparently we have a bug. Either the running-pid check should be made more accurate, or the "/var/run/gluster/nfs/nfs.pid" file should be removed when the nfs daemon is killed by "nfs.disable on" (see the illustrative sketch below).

--- Additional comment from Sanju on 2019-12-17 09:05:21 IST ---

Thanks for the detailed bug report. I will read the code and send out a fix if there is a bug.
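For illustration only, a minimal sketch of what a "more accurate" check could look like. This is not glusterd source; the helper name and the expected daemon name are hypothetical. The idea is to compare the contents of /proc/<pid>/cmdline against the daemon we expect before trusting (and killing) a pid read from a possibly stale pidfile:

#include <libgen.h>     /* basename() */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return 1 only if <pid> is alive AND its argv[0]
 * matches the expected daemon binary (e.g. "glusterfs"). A recycled pid
 * belonging to an unrelated process (the "sleep" in the reproducer) fails
 * the name check, so the caller can skip kill() and just clean up the
 * stale pidfile instead. */
static int
pid_is_expected_daemon (pid_t pid, const char *expected_name)
{
        char   path[64];
        char   cmdline[4096] = {0};
        FILE  *fp = NULL;
        size_t n  = 0;

        snprintf (path, sizeof (path), "/proc/%d/cmdline", (int)pid);
        fp = fopen (path, "r");
        if (!fp)
                return 0;               /* no such pid: nothing to kill */

        n = fread (cmdline, 1, sizeof (cmdline) - 1, fp);
        fclose (fp);
        if (n == 0)
                return 0;               /* empty cmdline (e.g. zombie) */

        /* /proc/<pid>/cmdline is NUL-separated; the first string is argv[0]. */
        return strcmp (basename (cmdline), expected_name) == 0;
}

With a check like this at step **3**, pid 22821 (or the dummy pid 8588 in the reproducer) would be rejected because its cmdline does not match the gluster nfs server, and the unrelated process would not be killed.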
REVIEW: https://review.gluster.org/23890 (glusterd: unlink the file after killing the process) posted (#1) for review on master by Sanju Rakonde
REVIEW: https://review.gluster.org/23890 (glusterd: unlink the file after killing the process) merged (#2) on master by Amar Tumballi
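The merged patch title above describes the second approach suggested in this report: unlink the pidfile after killing the process. The following is a rough, hypothetical sketch of that idea only (not the actual merged change; the function name is made up):

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch: once the service has been stopped, remove its
 * pidfile so that a later stop can never act on a pid that the kernel
 * has since recycled for an unrelated process. */
static int
stop_svc_and_unlink_pidfile (pid_t pid, const char *pidfile)
{
        int ret = 0;

        if (pid > 0)
                ret = kill (pid, SIGKILL);      /* stop the daemon */

        /* Whether or not the kill succeeded, drop the stale pidfile. */
        if (unlink (pidfile) != 0 && errno != ENOENT)
                fprintf (stderr, "failed to unlink %s\n", pidfile);

        return ret;
}

In the scenario reported here, /var/run/gluster/nfs/nfs.pid would have been removed when "nfs.disable on" stopped the daemon back in July, so the later "nfs.disable off" would have found no stale pid to kill.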