Bug 1181779 - rpcbind prevents Gluster/NFS from registering itself after a restart/reboot
Summary: rpcbind prevents Gluster/NFS from registering itself after a restart/reboot
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: rpcbind
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Steve Dickson
QA Contact: Yongcheng Yang
URL:
Whiteboard:
Duplicates: 1184661
Depends On:
Blocks:
Reported: 2015-01-13 18:07 UTC by Marcelo Barbosa "firemanxbr"
Modified: 2018-09-03 11:35 UTC (History)

Fixed In Version: rpcbind-0.2.0-27.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-19 05:32:10 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1183118 | 0 | medium | CLOSED | Gluster/NFS does not exit cleanly on reboot, leaving rpcbind registrations behind | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHBA-2015:2205 | 0 | normal | SHIPPED_LIVE | rpcbind bug fix update | 2015-11-19 08:17:40 UTC

Internal Links: 1183118

Description Marcelo Barbosa "firemanxbr" 2015-01-13 18:07:06 UTC
I'm using RHEL 7.0 + GlusterFS with packages:

glusterfs-libs-3.6.1-1.el7.x86_64
glusterfs-fuse-3.6.1-1.el7.x86_64
vdsm-gluster-4.16.10-0.el7.noarch
glusterfs-cli-3.6.1-1.el7.x86_64
glusterfs-server-3.6.1-1.el7.x86_64
glusterfs-api-3.6.1-1.el7.x86_64
glusterfs-geo-replication-3.6.1-1.el7.x86_64
glusterfs-3.6.1-1.el7.x86_64
glusterfs-rdma-3.6.1-1.el7.x86_64
rpcbind-0.2.0-23.el7.x86_64

My error is:

# systemctl status glusterd.service
glusterd.service - GlusterFS, a clustered file-system server
   Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
   Active: active (running) since Tue 2015-01-13 14:34:29 BRST; 5min ago
  Process: 20445 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid (code=exited, status=0/SUCCESS)
 Main PID: 20446 (glusterd)
   CGroup: /system.slice/glusterd.service
           ├─ 3426 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-ctdb.ped-dc02.datacom.gluster-ctdb02 -p /var/lib/glusterd/vols/vol-ctdb/ru...
           ├─ 3432 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-data.ped-dc02.datacom.gluster-data02 -p /var/lib/glusterd/vols/vol-data/ru...
           ├─ 3440 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-export.ped-dc02.datacom.gluster-export02 -p /var/lib/glusterd/vols/vol-exp...
           ├─ 3445 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-iso.ped-dc02.datacom.gluster-iso02 -p /var/lib/glusterd/vols/vol-iso/run/p...
           ├─ 3450 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-unguarded.ped-dc02.datacom.gluster-unguarded02 -p /var/lib/glusterd/vols/v...
           ├─ 3457 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-vm.ped-dc02.datacom.gluster-vm02 -p /var/lib/glusterd/vols/vol-vm/run/ped-...
           ├─20446 /usr/sbin/glusterd -p /var/run/glusterd.pid
           ├─20689 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glus...
           └─20695 /sbin/rpc.statd
 
Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: backtrace 1
Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: dlfcn 1
Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: libpthread 1
Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: llistxattr 1
Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: setfsid 1
Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: spinlock 1
Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: epoll.h 1
Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: xattr.h 1
Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: st_atim.tv_nsec 1
Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: package-string: glusterfs 3.6.1

log:

# tail -f /var/log/glusterfs/nfs.log
[2015-01-13 16:11:40.961035] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (0), shutting down
[2015-01-13 16:27:31.683546] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.1 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/d90c692c9430f00aafeb7d6741c1a54b.socket)
[2015-01-13 16:27:32.730510] I [rpcsvc.c:2142:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 16
[2015-01-13 16:27:32.804293] E [rpcsvc.c:1303:rpcsvc_program_register_portmap] 0-rpc-service: Could not register with portmap 100021 4 38468
[2015-01-13 16:27:32.804314] E [nfs.c:331:nfs_init_versions] 0-nfs: Program  NLM4 registration failed
[2015-01-13 16:27:32.804321] E [nfs.c:1341:init] 0-nfs: Failed to initialize protocols
[2015-01-13 16:27:32.804328] E [xlator.c:425:xlator_init] 0-nfs-server: Initialization of volume 'nfs-server' failed, review your volfile again
[2015-01-13 16:27:32.804334] E [graph.c:322:glusterfs_graph_init] 0-nfs-server: initializing translator failed
[2015-01-13 16:27:32.804340] E [graph.c:525:glusterfs_graph_activate] 0-graph: init failed
[2015-01-13 16:27:32.804626] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (0), shutting down

Solution:
'sed "s/ -w//" /usr/lib/systemd/system/rpcbind.service > /etc/systemd/system/rpcbind.service ; systemctl daemon-reload ; systemctl restart rpcbind ; systemctl restart glusterd'
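To illustrate what the sed expression in this workaround does, here is a minimal sketch that applies it to a sample line instead of the real unit file. The ExecStart line below is an assumption about what the packaged rpcbind.service contains; check the actual file before relying on it.

```shell
# Hypothetical ExecStart line; the real rpcbind.service may differ.
sample='ExecStart=/sbin/rpcbind -w ${RPCBIND_ARGS}'

# Same substitution as in the workaround: drop the first " -w" on each line.
stripped=$(printf '%s\n' "$sample" | sed 's/ -w//')

printf '%s\n' "$stripped"
```

Because the edited copy is written to /etc/systemd/system, it takes precedence over the packaged unit in /usr/lib/systemd/system, and the packaged file itself is left untouched.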

Comment 1 Niels de Vos 2015-01-13 18:13:46 UTC
The problem was that rpcbind always starts with the -w option. This prevents the Gluster/NFS server from registering itself at rpcbind after a reboot.

Removing the -w option from the rpcbind.service does not do a warm-restart on boot, and all RPC-programs should be able to register themselves without problem.

This might be related to the fact that upon reboot the Gluster/NFS service does not (always) unregister itself from rpcbind.

I think the rpcbind.service should only add the -w option on reload, not on (re)start.
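One way to see whether a stale registration is in the way is to look for the program number from the nfs.log error (100021, NLM) in the portmapper listing. The following is only a sketch: has_registration is a hypothetical helper, and the sample listing is made up to resemble `rpcinfo -p` output rather than taken from the affected system.

```shell
# Hypothetical helper: succeeds if the given RPC program number appears in a
# portmapper listing read from stdin (rows of "program vers proto port service").
has_registration() {
    awk -v prog="$1" '$1 == prog { found = 1 } END { exit !found }'
}

# Made-up sample listing in the style of `rpcinfo -p`.
sample_listing='   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100021    4   tcp  38468  nlockmgr'

if printf '%s\n' "$sample_listing" | has_registration 100021; then
    echo "100021 is still registered"
fi

# On a live system the listing would come from rpcbind itself:
#   rpcinfo -p | has_registration 100021
```

If the program number is still listed right after a reboot and before Gluster/NFS has started, the entry was presumably restored by the warm start rather than registered by a running service.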

Comment 3 Steve Dickson 2015-01-15 14:02:09 UTC
(In reply to Niels de Vos from comment #1)
> The problem was that rpcbind always starts with the -w option. This prevents
> the Gluster/NFS server from registering itself at rpcbind after a reboot.
> 
> Removing the -w option from the rpcbind.service does not do a warm-restart
> on boot, and all RPC-programs should be able to register themselves without
> problem.
> 
> This might be related to the fact that upon reboot the Gluster/NFS service
> does not (always) unregister itself from rpcbind.
> 
> I think the rpcbind.service should only add the -w option on reload, not on
> (re)start.

How do you do this with systemd scripts???

Comment 4 Niels de Vos 2015-01-16 17:48:00 UTC
(In reply to Steve Dickson from comment #3)
> How do you do this with systemd scripts???

Uh, yeah, well, that does not seem as trivial as I thought it would be.

This simple configuration doesn't work, probably because the PID changes:

[Service]
Type=forking
EnvironmentFile=/etc/sysconfig/rpcbind
ExecStart=/sbin/rpcbind ${RPCBIND_ARGS}
ExecReload=-/bin/kill ${MAINPID} ; /sbin/rpcbind -w ${RPCBIND_ARGS}


So, trying to fake a rpcbind.pid in the hope it would do something more (line breaks added in this comment, commands should be on one line):

[Service]
Type=forking
PIDFile=/run/rpcbind.pid
EnvironmentFile=/etc/sysconfig/rpcbind
ExecStart=/bin/sh -c "/sbin/rpcbind ${RPCBIND_ARGS} ; sleep 1 ; \
                      /usr/sbin/pidof rpcbind > /run/rpcbind.pid"
ExecReload=-/bin/kill ${MAINPID} ; /sbin/rpcbind -w ${RPCBIND_ARGS} ; \
           -/usr/sbin/pidof rpcbind > /run/rpcbind.pid


But no luck :-/

The problem occurs because the Gluster/NFS server is not stopped on a shutdown/reboot. It therefore does not unregister at rpcbind. I guess a similar problem would happen when an RPC-service crashes.

Do you have a recommendation on how to handle this kind of issue? I need to look into decently stopping the Gluster/NFS service on systemd environments, but potential crashes/unclean de-registrations seem unhandled.

(Idea: maybe move the -w status file to /var/run which is cleared upon reboot?)

Comment 5 Steve Dickson 2015-02-26 18:24:26 UTC
(In reply to Niels de Vos from comment #4)
> (Idea: maybe move the -w status file to /var/run which is cleared upon
> reboot?)
Working with Anand at this year's Connectathon, I see what the problem is.
I think moving the warm-up file to /var/run is a good idea because 
rpcbind needs to remember servers over restarts, not reboots!

Comment 9 Steve Dickson 2015-05-04 12:38:40 UTC
(In reply to Steve Dickson from comment #5)
> (In reply to Niels de Vos from comment #4)
> > (Idea: maybe move the -w status file to /var/run which is cleared upon
> > reboot?)
> Working with Anand at this year's Connectathon, I see what the problem is.
> I think moving the warm-up file to /var/run is a good idea because 
> rpcbind needs to remember servers over restarts, not reboots!

I was playing around with this in Fedora, and it turns out that just moving
the rpcbind directory to /var/run didn't work because
/var/run/rpcbind was being created on reboot... But I just
stumbled over systemd-tmpfiles, which appears to create
directories during boot, which is exactly what is needed
(I think! ;-) )
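A rough sketch of what such a systemd-tmpfiles entry could look like follows. The path, mode, and rpc user/group are assumptions for illustration, not the contents of the shipped package, and the snippet is written to a scratch directory rather than to /etc/tmpfiles.d.

```shell
# Write a hypothetical tmpfiles.d snippet to a scratch directory (not to /etc).
conf_dir=$(mktemp -d)

cat > "$conf_dir/rpcbind.conf" <<'EOF'
# Type Path         Mode User Group Age
d      /run/rpcbind 0700 rpc  rpc   -
EOF

cat "$conf_dir/rpcbind.conf"
```

The "d" type asks systemd-tmpfiles to create the directory if it does not exist yet, so a directory under /run (which is emptied on reboot) would be recreated at boot before rpcbind needs it.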

Comment 10 Steve Dickson 2015-05-04 15:01:15 UTC
*** Bug 1184661 has been marked as a duplicate of this bug. ***

Comment 21 Steve Dickson 2015-09-24 15:42:40 UTC
Hello,

There were some changes made to the latest rpcbind package.
Would you mind retesting with 
   http://people.redhat.com/steved/.bz1240817/rpcbind-0.2.0-30.el7.x86_64.rpm

to ensure there are no regressions?

Comment 24 errata-xmlrpc 2015-11-19 05:32:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2205.html

