Bug 1784211 - [GSS] - 'gluster volume set <VOLUME> nfs.disable' accidentally killed an unexpected process and forced a data brick offline.
Summary: [GSS] - 'gluster volume set <VOLUME> nfs.disable' accidentally killed an unexpec...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterd
Version: rhgs-3.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.5.z Batch Update 3
Assignee: Srijan Sivakumar
QA Contact: milind
URL:
Whiteboard:
Depends On:
Blocks: 1784375 1849533
 
Reported: 2019-12-16 23:53 UTC by Kenichiro Kagoshima
Modified: 2023-03-24 16:28 UTC (History)
7 users

Fixed In Version: glusterfs-6.0-38
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1784375 1849533 (view as bug list)
Environment:
Last Closed: 2020-12-17 04:50:48 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:5603 0 None None None 2020-12-17 04:51:35 UTC

Description Kenichiro Kagoshima 2019-12-16 23:53:07 UTC
Description of problem:
////////////////////////////////////////////////////////////////////////////

- 'gluster volume set vol01 nfs.disable off' accidentally killed an unexpected process and forced a data brick offline.
- The command was executed when no nfs daemon spawned by glusterd was running.
- The killed pid 22821 had been reused; it originally belonged to an nfs daemon that was running earlier.

- 4-node configuration
- The command that triggered the problem was run on node kiotloglgst04

[root@kiotloglgst04 hpework]# gluster volume set vol01 nfs.disable off
volume set: success

[ log: /cases/02502431/40-sosreport-kiotloglgst04-2019-10-28-iolxwep.tar.xz/sosreport-kiotloglgst04-2019-10-28-iolxwep/var/log/glusterfs/cmd_history.log-20191025 ]
[2019-10-24 05:31:04.248179]  : volume set vol01 nfs.disable off : SUCCESS 

- The command accidentally killed the process with pid 22821, which was not an nfs process spawned by glusterd but another, unrelated process.

[ log: /cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/var/log/glusterfs/glusterd.log-20191025 ]
[2019-10-24 05:31:02.211907] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping nfs daemon running in pid: 22821    <----- killed
[2019-10-24 05:31:03.212479] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped
..... snip .....
[2019-10-24 05:31:03.217608] I [MSGID: 106567] [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting nfs service
[2019-10-24 05:31:04.241212] I [MSGID: 106132] [glusterd-proc-mgmt.c:84:glusterd_proc_stop] 0-management: bitd already stopped

- The data brick kiotloglgst03:/bricks/brick01/1 was forced offline as a result of the command.

[ log: /cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/var/log/glusterfs/bricks/bricks-brick01-1.log-20191025 ]
[2019-10-24 05:25:42.670252] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-vol01-server: Shutting down connection kiotloglgst03-14626-2019/07/05-02:27:23:117770-vol01-client-4-0-0
[2019-10-24 05:29:06.609832] I [addr.c:55:compare_addr_and_update] 0-/bricks/brick01/1: allowed = "*", received addr = "100.65.182.132"
[2019-10-24 05:29:06.609889] I [login.c:111:gf_auth] 0-auth/login: allowed user names: 42126d88-8cf4-4c37-83a9-5d0bf62e1d2e
[2019-10-24 05:29:06.609926] I [MSGID: 115029] [server-handshake.c:564:server_setvolume] 0-vol01-server: accepted client from kiotloglgst04-8482-2019/10/24-05:29:06:571393-vol01-client-4-0-0 (version: 3.12.2) with subvol /bricks/brick01/1
[2019-10-24 05:29:06.675247] I [MSGID: 115036] [server.c:571:server_rpc_notify] 0-vol01-server: disconnecting connection from kiotloglgst04-8482-2019/10/24-05:29:06:571393-vol01-client-4-0-0
[2019-10-24 05:29:06.675559] I [MSGID: 101055] [client_t.c:443:gf_client_unref] 0-vol01-server: Shutting down connection kiotloglgst04-8482-2019/10/24-05:29:06:571393-vol01-client-4-0-0
[2019-10-24 05:31:02.195748] I [glusterfsd-mgmt.c:52:mgmt_cbk_spec] 0-mgmt: Volume file changed
..... Customer restarted brick .....
[2019-10-24 07:50:25.949355] I [MSGID: 100030] [glusterfsd.c:2504:main] 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version 3.12.2 (args: /usr/sbin/glusterfsd -s kiotloglgst03 --volfile-id vol01.kiotloglgst03.bricks-brick01-1 -p /var/run/gluster/vols/vol01/kiotloglgst03-bricks-brick01-1.pid -S /var/run/gluster/f0a387bbc58a292e.socket --brick-name /bricks/brick01/1 -l /var/log/glusterfs/bricks/bricks-brick01-1.log --xlator-option *-posix.glusterd-uuid=0621266b-41ea-401e-bfcb-839d2b16ca82 --brick-port 49153 --xlator-option vol01-server.listen-port=49153)
[2019-10-24 07:50:25.955180] W [MSGID: 101002] [options.c:995:xl_opt_validate] 0-glusterfs: option 'address-family' is deprecated, preferred is 'transport.address-family', continuing with correction
[2019-10-24 07:50:25.958438] I [MSGID: 101190] [event-epoll.c:613:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1

- Where did pid 22821 come from?

[log: /cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/var/log/glusterfs/glusterd.log-20190717]
..... snip .....
[2019-07-16 07:03:46.632909] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping nfs daemon running in pid: 22821    <----- pid
[2019-07-16 07:03:47.633263] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped

[ log: /cases/02502431/40-sosreport-kiotloglgst04-2019-10-28-iolxwep.tar.xz/sosreport-kiotloglgst04-2019-10-28-iolxwep/var/log/glusterfs/cmd_history.log-20190717.gz]
.... snip .....
[2019-07-16 06:49:38.747879]  : volume set vol01 nfs.disable off : SUCCESS
[2019-07-16 07:03:47.652222]  : volume set vol01 nfs.disable on : SUCCESS

- Node kiotloglgst03 booted on Fri Jul  5; pid 22821 was apparently reused by another process after the original nfs daemon exited.
[log: /cases/02502431/30-sosreport-kiotloglgst03-2019-10-28-xswmtvf.tar.xz/sosreport-kiotloglgst03-2019-10-28-xswmtvf/last]
..... snip .....
reboot   system boot  3.10.0-957.12.2. Fri Jul  5 11:27 - 16:48 (111+05:21)

- The customer confirmed that no nfs daemon was running before the command. The output of those commands is shown below.

[ log: /cases/02502431/40-sosreport-kiotloglgst04-2019-10-28-iolxwep.tar.xz/sosreport-kiotloglgst04-2019-10-28-iolxwep/var/log/glusterfs/cmd_history.log-20191025 ]

[2019-10-24 05:28:55.097546]  : volume status vol01 : SUCCESS
[2019-10-24 05:30:45.862941]  : volume status vol01 : SUCCESS                           <----- (*1)
[2019-10-24 05:31:04.248179]  : volume set vol01 nfs.disable off : SUCCESS       <----- (*2)
[2019-10-24 05:31:08.796906]  : volume status vol01 : SUCCESS                           <----- (*3)

  (*1)
[root@kiotloglgst04 hpework]# gluster volume status vol01
Status of volume: vol01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick kiotloglgst01:/bricks/brick01/1       49152     0          Y       30193
Brick kiotloglgst02:/bricks/brick01/1       49152     0          Y       31254
Brick kiotloglgst03:/bricks/brick03/1       49152     0          Y       14926
Brick kiotloglgst02:/bricks/brick02/1       49153     0          Y       31263
Brick kiotloglgst03:/bricks/brick01/1       49153     0          Y       14913      <----- Brick will be killed 
Brick kiotloglgst04:/bricks/brick03/1       49152     0          Y       14941
Brick kiotloglgst03:/bricks/brick02/1       49154     0          Y       14912
Brick kiotloglgst04:/bricks/brick01/1       49153     0          Y       14935
Brick kiotloglgst01:/bricks/brick03/1       49153     0          Y       30213
Brick kiotloglgst04:/bricks/brick02/1       49154     0          Y       14942
Brick kiotloglgst01:/bricks/brick02/1       49154     0          Y       30233
Brick kiotloglgst02:/bricks/brick03/1       49154     0          Y       31269
Self-heal Daemon on localhost               N/A       N/A        Y       14586
Quota Daemon on localhost                   N/A       N/A        Y       14886
Self-heal Daemon on kiotloglgst02           N/A       N/A        Y       31234
Quota Daemon on kiotloglgst02               N/A       N/A        Y       31244
Self-heal Daemon on kiotloglgst01           N/A       N/A        Y       30255
Quota Daemon on kiotloglgst01               N/A       N/A        Y       30265
Self-heal Daemon on kiotloglgst03           N/A       N/A        Y       14611
Quota Daemon on kiotloglgst03               N/A       N/A        Y       14874
 
Task Status of Volume vol01
------------------------------------------------------------------------------
There are no active volume tasks

(*2)
[root@kiotloglgst04 hpework]# gluster volume set vol01 nfs.disable off
volume set: success

(*3)
[root@kiotloglgst04 hpework]# gluster volume status vol01
Status of volume: vol01
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick kiotloglgst01:/bricks/brick01/1       49152     0          Y       30193
Brick kiotloglgst02:/bricks/brick01/1       49152     0          Y       31254
Brick kiotloglgst03:/bricks/brick03/1       49152     0          Y       14926
Brick kiotloglgst02:/bricks/brick02/1       49153     0          Y       31263
Brick kiotloglgst03:/bricks/brick01/1       N/A       N/A        N       N/A        <----- Brick killed
Brick kiotloglgst04:/bricks/brick03/1       49152     0          Y       14941
Brick kiotloglgst03:/bricks/brick02/1       49154     0          Y       14912
Brick kiotloglgst04:/bricks/brick01/1       49153     0          Y       14935
Brick kiotloglgst01:/bricks/brick03/1       49153     0          Y       30213
Brick kiotloglgst04:/bricks/brick02/1       49154     0          Y       14942
Brick kiotloglgst01:/bricks/brick02/1       49154     0          Y       30233
Brick kiotloglgst02:/bricks/brick03/1       49154     0          Y       31269
NFS Server on localhost                     2049      0          Y       8716 
Self-heal Daemon on localhost               N/A       N/A        Y       14586
Quota Daemon on localhost                   N/A       N/A        Y       14886
NFS Server on kiotloglgst01                 2049      0          Y       348  
Self-heal Daemon on kiotloglgst01           N/A       N/A        Y       30255
Quota Daemon on kiotloglgst01               N/A       N/A        Y       30265
NFS Server on kiotloglgst03                 2049      0          Y       17671
Self-heal Daemon on kiotloglgst03           N/A       N/A        Y       14611
Quota Daemon on kiotloglgst03               N/A       N/A        Y       14874
NFS Server on kiotloglgst02                 2049      0          Y       19331
Self-heal Daemon on kiotloglgst02           N/A       N/A        Y       31234
Quota Daemon on kiotloglgst02               N/A       N/A        Y       31244
 
Task Status of Volume vol01
------------------------------------------------------------------------------
There are no active volume tasks

==============================================================================
gluster volume info
==============================================================================
Volume Name: vol01
Type: Distributed-Replicate
Volume ID: be62535e-9fbd-4bd8-b1cd-90eca0b25ed0
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x (2 + 1) = 12
Transport-type: tcp
Bricks:
Brick1: kiotloglgst01:/bricks/brick01/1
Brick2: kiotloglgst02:/bricks/brick01/1
Brick3: kiotloglgst03:/bricks/brick03/1 (arbiter)
Brick4: kiotloglgst02:/bricks/brick02/1
Brick5: kiotloglgst03:/bricks/brick01/1
Brick6: kiotloglgst04:/bricks/brick03/1 (arbiter)
Brick7: kiotloglgst03:/bricks/brick02/1
Brick8: kiotloglgst04:/bricks/brick01/1
Brick9: kiotloglgst01:/bricks/brick03/1 (arbiter)
Brick10: kiotloglgst04:/bricks/brick02/1
Brick11: kiotloglgst01:/bricks/brick02/1
Brick12: kiotloglgst02:/bricks/brick03/1 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
features.shard: off
features.shard-block-size: 16MB
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
features.default-soft-limit: 95
network.ping-timeout: 10

Version-Release number of selected component (if applicable):
////////////////////////////////////////////////////////////////////////////

==============================================================================
grep "gluster" installed-rpms
==============================================================================
glusterfs-3.12.2-25.el7rhgs.x86_64                          Tue Jun 11 10:58:20 2019
glusterfs-api-3.12.2-25.el7rhgs.x86_64                      Tue Jun 11 10:58:21 2019
glusterfs-client-xlators-3.12.2-25.el7rhgs.x86_64           Tue Jun 11 10:57:35 2019
glusterfs-cli-3.12.2-25.el7rhgs.x86_64                      Tue Jun 11 10:57:35 2019
glusterfs-events-3.12.2-25.el7rhgs.x86_64                   Tue Jun 11 10:58:54 2019
glusterfs-fuse-3.12.2-25.el7rhgs.x86_64                     Tue Jun 11 10:58:23 2019
glusterfs-geo-replication-3.12.2-25.el7rhgs.x86_64          Tue Jun 11 10:58:54 2019
glusterfs-libs-3.12.2-25.el7rhgs.x86_64                     Tue Jun 11 10:57:31 2019
glusterfs-rdma-3.12.2-25.el7rhgs.x86_64                     Tue Jun 11 10:58:36 2019
glusterfs-server-3.12.2-25.el7rhgs.x86_64                   Tue Jun 11 10:58:53 2019
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64               Tue Jun 11 10:58:43 2019
gluster-nagios-common-0.2.4-1.el7rhgs.noarch                Tue Jun 11 10:57:46 2019
libvirt-daemon-driver-storage-gluster-4.5.0-10.el7_6.10.x86_64 Tue Jun 11 16:04:57 2019
python2-gluster-3.12.2-25.el7rhgs.x86_64                    Tue Jun 11 10:57:36 2019
tendrl-gluster-integration-1.6.3-13.el7rhgs.noarch          Tue Jun 11 16:04:59 2019
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch                     Tue Jun 11 10:59:13 2019

==============================================================================
cat /etc/redhat-storage-release
==============================================================================
Red Hat Gluster Storage Server 3.4

How reproducible:

- Always reproducible if the nfs daemon's pid has been reused by another process.


Problem easily reproduced:
////////////////////////////////////////////////////////////////////////////

- Create a 2 + 1 arbiter volume
- Make sure no nfs daemon is running

[root@rhel-a glusterfs]# gluster vol info avol

Volume Name: avol
Type: Replicate
Volume ID: 1e1e1870-b21d-4b0c-9cd0-b95d67725e1b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: rhel-a:/vol1/brick/b1
Brick2: rhel-b:/vol1/brick/b1
Brick3: rhel-c:/vol1/brick/b1 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on                             <----- No nfs 
transport.address-family: inet

///// No nfs daemon shown in status below.
[root@rhel-a glusterfs]# gluster vol status avol
Status of volume: avol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhel-a:/vol1/brick/b1                 49152     0          Y       4386
Brick rhel-b:/vol1/brick/b1                 49152     0          Y       4109
Brick rhel-c:/vol1/brick/b1                 49152     0          Y       4140
Self-heal Daemon on localhost               N/A       N/A        Y       4285
Self-heal Daemon on rhel-c                  N/A       N/A        Y       4081
Self-heal Daemon on rhel-b                  N/A       N/A        Y       4058

Task Status of Volume avol
------------------------------------------------------------------------------
There are no active volume tasks

- Start dummy process

[root@rhel-a nfs]# strace sleep 100000; date

- Find sleep pid and overwrite /var/run/gluster/nfs/nfs.pid file with sleep's pid.

[root@rhel-a glusterfs]# ps aux | grep sleep
root      8545  0.0  0.0 107952   360 ?        S    13:08   0:00 sleep 60
root      8586  0.0  0.0   7188   816 pts/1    S+   13:09   0:00 strace sleep 100000
root      8588  0.0  0.0 107952   584 pts/1    S+   13:09   0:00 sleep 100000          <------ Dummy pid 8588
root      8590  0.0  0.0 112680   696 pts/0    R+   13:09   0:00 grep --color=auto sleep

[root@rhel-a glusterfs]# ls -l /var/run/gluster/nfs/nfs.pid
-rw-r--r--. 1 root root 5 Dec 12 13:07 /var/run/gluster/nfs/nfs.pid

[root@rhel-a glusterfs]# echo 8588 > /var/run/gluster/nfs/nfs.pid

- Ensure no nfs daemon running

[root@rhel-a glusterfs]# ps aux | grep glusterfs
root      4285  0.0  0.3 677720  7296 ?        Ssl  10:55   0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/827bf70aee5e557b.socket --log-level=INFO --xlator-option *replicate*.node-uuid=080da7ab-1eec-4d0c-8551-b4e7226e9f55
root      4386  0.0  1.0 929228 19580 ?        Ssl  10:55   0:00 /usr/sbin/glusterfsd -s rhel-a --volfile-id avol.rhel-a.vol1-brick-b1 -p /var/run/gluster/vols/avol/rhel-a-vol1-brick-b1.pid -S /var/run/gluster/a12b523ebfcadaec.socket --brick-name /vol1/brick/b1 -l /var/log/glusterfs/bricks/vol1-brick-b1.log --xlator-option *-posix.glusterd-uuid=080da7ab-1eec-4d0c-8551-b4e7226e9f55 --brick-port 49152 --xlator-option avol-server.listen-port=49152
root      8619  0.0  0.0 112684   956 pts/0    S+   13:10   0:00 grep --color=auto glusterfs

- Set 'nfs.disable off'

[root@rhel-a glusterfs]# gluster vol set avol nfs.disable off; date
volume set: success
Thu Dec 12 13:10:43 JST 2019

[root@rhel-a glusterfs]# gluster vol status avol
Status of volume: avol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick rhel-a:/vol1/brick/b1                 49152     0          Y       4386
Brick rhel-b:/vol1/brick/b1                 49152     0          Y       4109
Brick rhel-c:/vol1/brick/b1                 49152     0          Y       4140
NFS Server on localhost                     2049      0          Y       8644        <----- nfs daemon started
Self-heal Daemon on localhost               N/A       N/A        Y       4285
NFS Server on rhel-c                        2049      0          Y       8235
Self-heal Daemon on rhel-c                  N/A       N/A        Y       4081
NFS Server on rhel-b                        2049      0          Y       8181
Self-heal Daemon on rhel-b                  N/A       N/A        Y       4058

Task Status of Volume avol
------------------------------------------------------------------------------
There are no active volume tasks

- The dummy sleep process was killed:
.... snip .....
fstat(3, {st_mode=S_IFREG|0644, st_size=2502, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9e8162000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2502
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0x7ff9e8162000, 4096)            = 0
open("/usr/lib/locale/En/LC_IDENTIFICATION", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
nanosleep({100000, 0},  <unfinished ...>
+++ killed by SIGKILL +++
Killed    <----- Japanese-locale shell output "強制終了"
Thu Dec 12 13:10:39 JST 2019
[root@rhel-a nfs]#

- /var/log/glusterfs/glusterd.log shows that the sleep process was killed.

[2019-12-12 04:10:39.809093] I [MSGID: 106568] [glusterd-proc-mgmt.c:88:glusterd_proc_stop] 0-management: Stopping nfs daemon running in pid: 8588  <----- Dummy sleep killed
[2019-12-12 04:10:40.810417] I [MSGID: 106568] [glusterd-svc-mgmt.c:243:glusterd_svc_stop] 0-management: nfs service is stopped
[2019-12-12 04:10:40.811184] I [MSGID: 106540] [glusterd-utils.c:5115:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV3 successfully
[2019-12-12 04:10:40.811327] I [MSGID: 106540] [glusterd-utils.c:5124:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered MOUNTV1 successfully
[2019-12-12 04:10:40.811441] I [MSGID: 106540] [glusterd-utils.c:5133:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NFSV3 successfully
[2019-12-12 04:10:40.811588] I [MSGID: 106540] [glusterd-utils.c:5142:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v4 successfully
[2019-12-12 04:10:40.811701] I [MSGID: 106540] [glusterd-utils.c:5151:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered NLM v1 successfully
[2019-12-12 04:10:40.811810] I [MSGID: 106540] [glusterd-utils.c:5160:glusterd_nfs_pmap_deregister] 0-glusterd: De-registered ACL v3 successfully
[2019-12-12 04:10:40.813519] I [MSGID: 106567] [glusterd-svc-mgmt.c:211:glusterd_svc_start] 0-management: Starting nfs service


Code path for starting nfs daemon
////////////////////////////////////////////////////////////////////////////

glusterd_nfssvc_manager() {
    ....
   ret = svc->stop (svc, SIGKILL);     <*** calls glusterd_nfssvc_stop()
    ....
    ret = svc->start (svc, flags);
    ....
}

Whenever the nfs daemon is started, glusterd first kills any existing nfs daemon by calling glusterd_nfssvc_stop().
In the glusterd_nfssvc_stop() code path:

glusterd_nfssvc_stop() {
    glusterd_proc_is_running()
    ....
    glusterd_svc_stop (svc, sig) {
        glusterd_proc_stop() {
            gf_is_service_running() {
                fopen (pidfile, "r+");                              <**1** does "/var/run/gluster/nfs/nfs.pid" exist?
                fscanf (file, "%d", pid);                           <**2** read the pid from the file
                gf_is_pid_running (*pid) {
                    snprintf (fname, sizeof (fname), "/proc/%d/cmdline", pid);
                    if (sys_access (fname, R_OK) != 0) {            <**3** is "/proc/PID/cmdline" accessible?
                        ....
                    }
                    ....
                }
                ....
            }
            /* If the conditions marked **1**, **2** and **3** above are met,
             * the pid read from the pidfile is killed: */
            ret = kill (pid, sig);
            ....
        }
    ....
}
///// Apparently we have a bug. The running-pid check should be more accurate (e.g. verify that the pid actually belongs to the gluster nfs daemon), or the "/var/run/gluster/nfs/nfs.pid" file should be removed when the nfs daemon is killed by "nfs.disable on".
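
///// A stricter check could, for example, look at what /proc/<PID>/cmdline actually contains before treating the pid from the pidfile as the gluster nfs daemon. The sketch below only illustrates that idea and is not the actual glusterd code or the eventual fix; the helper name is_gluster_nfs_pid() and the substrings it matches ("glusterfs", "gluster/nfs") are assumptions for illustration.

/* Illustrative sketch only, not the upstream patch: accept the pid from
 * the pidfile only if its command line looks like the gluster nfs daemon.
 * The helper name and matched substrings are assumptions. */
#include <stdio.h>
#include <string.h>

static int
is_gluster_nfs_pid (int pid)
{
        char   fname[64];
        char   cmdline[4096] = {0,};
        FILE  *fp            = NULL;
        size_t n             = 0;
        size_t i             = 0;

        snprintf (fname, sizeof (fname), "/proc/%d/cmdline", pid);

        fp = fopen (fname, "r");
        if (!fp)
                return 0;               /* no such pid running at all */

        n = fread (cmdline, 1, sizeof (cmdline) - 1, fp);
        fclose (fp);

        /* /proc/<pid>/cmdline separates arguments with '\0'; turn the
         * separators into spaces so strstr() can scan the whole line. */
        for (i = 0; i < n; i++)
                if (cmdline[i] == '\0')
                        cmdline[i] = ' ';

        /* Only treat the pid as the nfs daemon if the command line looks
         * like one, instead of accepting any process that happens to be
         * alive under a reused pid. */
        return (strstr (cmdline, "glusterfs") != NULL &&
                strstr (cmdline, "gluster/nfs") != NULL);
}

int
main (void)
{
        /* Example: pid 1 (init/systemd) is running, so an existence-only
         * check would pass for it, but this check rejects it. */
        printf ("pid 1 is gluster nfs: %d\n", is_gluster_nfs_pid (1));
        return 0;
}

///// The alternative mentioned above, removing /var/run/gluster/nfs/nfs.pid when the nfs daemon is stopped, avoids the problem differently: no stale pid is left behind to be matched against a reused one.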

Comment 1 Sanju 2019-12-17 03:35:21 UTC
Thanks for the detailed bug report. I will read the code and send out a fix if there is a bug.

Comment 2 Sanju 2019-12-17 10:58:47 UTC
upstream patch: https://review.gluster.org/#/c/glusterfs/+/23890

Comment 3 Yaniv Kaul 2020-02-24 14:23:42 UTC
(In reply to Sanju from comment #2)
> upstream patch: https://review.gluster.org/#/c/glusterfs/+/23890

This is merged. What's the next step here?

Comment 9 milind 2020-10-07 08:17:50 UTC
[node.example.com]# gluster v info 
 
Volume Name: vol
Type: Replicate
Volume ID: 75fea424-ac46-42a9-ba76-acd844742b8b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.159.52:/bricks/brick0/vol
Brick2: node3:/bricks/brick0/vol
Brick3: node4:/bricks/brick0/vol
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.brick-multiplex: on
[node.example.com]# gluster v status 
Status of volume: vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.16.159.52:/bricks/brick0/vol       49152     0          Y       163653
Brick node3:/bricks/brick0/vol      49152     0          Y       61197
Brick node4:/bricks/brick0/vol       49152     0          Y       143348
Self-heal Daemon on localhost               N/A       N/A        Y       163670
Self-heal Daemon on node1           N/A       N/A        Y       44985
Self-heal Daemon on node3           N/A       N/A        Y       61214
Self-heal Daemon on node2            N/A       N/A        Y       88757
Self-heal Daemon on 10.16.159.124           N/A       N/A        Y       90124
Self-heal Daemon on node4            N/A       N/A        Y       143365
 
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
 
[node.example.com]#  gluster volume set vol nfs.disable off
volume set: success


[node.example.com]# gluster v info 
 
Volume Name: vol
Type: Replicate
Volume ID: 75fea424-ac46-42a9-ba76-acd844742b8b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 10.16.159.52:/bricks/brick0/vol
Brick2: node3:/bricks/brick0/vol
Brick3: node4:/bricks/brick0/vol
Options Reconfigured:
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: off
performance.client-io-threads: off
cluster.brick-multiplex: on
[node.example.com]# gluster v status 
Status of volume: vol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.16.159.52:/bricks/brick0/vol       49152     0          Y       163653
Brick node3:/bricks/brick0/vol      49152     0          Y       61197
Brick node4:/bricks/brick0/vol       49152     0          Y       143348
NFS Server on localhost                     2049      0          Y       163723
Self-heal Daemon on localhost               N/A       N/A        Y       163670
NFS Server on 10.16.159.124                 2049      0          Y       90152
Self-heal Daemon on 10.16.159.124           N/A       N/A        Y       90124
NFS Server on node1                 2049      0          Y       45012
Self-heal Daemon on node1           N/A       N/A        Y       44985
NFS Server on node2                  2049      0          Y       88784
Self-heal Daemon on node2            N/A       N/A        Y       88757
NFS Server on node3                 2049      0          Y       61245
Self-heal Daemon on node3           N/A       N/A        Y       61214
NFS Server on node4                  2049      0          Y       143396
Self-heal Daemon on node4            N/A       N/A        Y       143365
 
Task Status of Volume vol
------------------------------------------------------------------------------
There are no active volume tasks
---------------------------------------------
Additional info :

glusterfs-6.0-45.el8rhgs.x86_64
glusterfs-fuse-6.0-45.el8rhgs.x86_64
glusterfs-api-6.0-45.el8rhgs.x86_64
glusterfs-selinux-1.0-1.el8rhgs.noarch
glusterfs-client-xlators-6.0-45.el8rhgs.x86_64
glusterfs-server-6.0-45.el8rhgs.x86_64
glusterfs-cli-6.0-45.el8rhgs.x86_64
glusterfs-libs-6.0-45.el8rhgs.x86_64


After performing the steps above, no brick goes offline after setting nfs.disable to off; hence this bug is marked as verified.

Comment 11 errata-xmlrpc 2020-12-17 04:50:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (glusterfs bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5603

