Bug 1181779
| Summary: | rpcbind prevents Gluster/NFS from registering itself after a restart/reboot | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Marcelo Barbosa "firemanxbr" <marcelo.barbosa> |
| Component: | rpcbind | Assignee: | Steve Dickson <steved> |
| Status: | CLOSED ERRATA | QA Contact: | Yongcheng Yang <yoyang> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.0 | CC: | asmarre, eguan, fs-qe, jiyin, joe, ndevos, smayhew, steved, yoyang |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | rpcbind-0.2.0-27.el7 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-11-19 05:32:10 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |

The problem was that rpcbind always starts with the -w option. This prevents the Gluster/NFS server from registering itself at rpcbind after a reboot.

Removing the -w option from the rpcbind.service does not do a warm-restart on boot, and all RPC-programs should be able to register themselves without problem.

This might be related to the fact that upon reboot the Gluster/NFS service does not (always) unregister itself from rpcbind.

I think the rpcbind.service should only add the -w option on reload, not on (re)start.

(In reply to Niels de Vos from comment #1)
> The problem was that rpcbind always starts with the -w option. This prevents
> the Gluster/NFS server from registering itself at rpcbind after a reboot.
>
> Removing the -w option from the rpcbind.service does not do a warm-restart
> on boot, and all RPC-programs should be able to register themselves without
> problem.
>
> This might be related to the fact that upon reboot the Gluster/NFS service
> does not (always) unregister itself from rpcbind.
>
> I think the rpcbind.service should only add the -w option on reload, not on
> (re)start.

How do you do this with systemd scripts???

(In reply to Steve Dickson from comment #3)
> How do you do this with systemd scripts???

Uh, yeah, well, that does not seem as trivial as I thought it would be.
This simple configuration doesn't work, probably because the PID changes:

    [Service]
    Type=forking
    EnvironmentFile=/etc/sysconfig/rpcbind
    ExecStart=/sbin/rpcbind ${RPCBIND_ARGS}
    ExecReload=-/bin/kill ${MAINPID} ; /sbin/rpcbind -w ${RPCBIND_ARGS}

So, trying to fake a rpcbind.pid in the hope it would do something more (line breaks added in this comment, commands should be on one line):

    [Service]
    Type=forking
    PIDFile=/run/rpcbind.pid
    EnvironmentFile=/etc/sysconfig/rpcbind
    ExecStart=/bin/sh -c "/sbin/rpcbind ${RPCBIND_ARGS} ; sleep 1 ; \
        /usr/sbin/pidof rpcbind > /run/rpcbind.pid"
    ExecReload=-/bin/kill ${MAINPID} ; /sbin/rpcbind -w ${RPCBIND_ARGS} ; \
        -/usr/sbin/pidof rpcbind > /run/rpcbind.pid

But no luck :-/

The problem is caused because the Gluster/NFS server is not stopped on a shutdown/reboot. It therefore does not unregister at rpcbind. I guess a similar problem would happen when an RPC-service crashes.

Do you have a recommendation on how to handle this kind of issue? I need to look into decently stopping the Gluster/NFS service on systemd environments, but potential crashes and unclean de-registrations seem unhandled.

(Idea: maybe move the -w status file to /var/run which is cleared upon reboot?)

(In reply to Niels de Vos from comment #4)
> (Idea: maybe move the -w status file to /var/run which is cleared upon
> reboot?)

Working with Anand at this year's Connectathon, I see what the problem is. I think moving the warm up file to /var/run is a good idea because rpcbind needs to remember server over restarts not reboots!

(In reply to Steve Dickson from comment #5)
> (In reply to Niels de Vos from comment #4)
> > (Idea: maybe move the -w status file to /var/run which is cleared upon
> > reboot?)
> Working with Anand at this year's Connectathon, I see what the problem is.
> I think moving the warm up file to /var/run is a good idea because
> rpcbind needs to remember server over restarts not reboots!

I was playing around with this in Fedora and it turns out that just moving the rpcbind directory to /var/run didn't work because /var/run/rpcbind was not being created on reboot... But I just stumbled over systemd-tmpfiles, which appears to create directories during boot, which is exactly what is needed (I think! ;-) )

*** Bug 1184661 has been marked as a duplicate of this bug. ***

Hello,

There were some changes made to the latest rpcbind package. Would you mind retesting with http://people.redhat.com/steved/.bz1240817/rpcbind-0.2.0-30.el7.x86_64.rpm to ensure there are no regressions?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-2205.html

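The systemd-tmpfiles idea from the comments above can be sketched roughly as follows. This is only an illustration, not the change that shipped in the erratum: the directory /run/rpcbind, the mode 0700 and the rpc user and group are assumptions, and `tmpfiles_dir_entry` is a hypothetical helper name. The helper only prints a `d` (create directory) entry in tmpfiles.d format, so it is safe to run anywhere.

```shell
#!/bin/sh
# Hypothetical helper: print one tmpfiles.d entry that asks systemd-tmpfiles
# to create a directory at boot.
# Arguments: directory, mode, user, group
tmpfiles_dir_entry() {
    # tmpfiles.d column order: Type Path Mode User Group Age
    # "d" means "create the directory if it is missing"; "-" means no age-based cleanup.
    printf 'd %s %s %s %s -\n' "$1" "$2" "$3" "$4"
}

# Assumed values, for illustration only.
tmpfiles_dir_entry /run/rpcbind 0700 rpc rpc
```

To try it, the output could be redirected to a file under /etc/tmpfiles.d/ (for example a file named rpcbind.conf, also an assumption) and applied without a reboot by passing that file to `systemd-tmpfiles --create`. rpcbind itself would still need to keep its warm-start files in that directory, which this sketch does not cover.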
I'm using RHEL 7.0 + GlusterFS with packages:

    glusterfs-libs-3.6.1-1.el7.x86_64
    glusterfs-fuse-3.6.1-1.el7.x86_64
    vdsm-gluster-4.16.10-0.el7.noarch
    glusterfs-cli-3.6.1-1.el7.x86_64
    glusterfs-server-3.6.1-1.el7.x86_64
    glusterfs-api-3.6.1-1.el7.x86_64
    glusterfs-geo-replication-3.6.1-1.el7.x86_64
    glusterfs-3.6.1-1.el7.x86_64
    glusterfs-rdma-3.6.1-1.el7.x86_64
    rpcbind-0.2.0-23.el7.x86_64

My error is:

    # systemctl status glusterd.service
    glusterd.service - GlusterFS, a clustered file-system server
       Loaded: loaded (/usr/lib/systemd/system/glusterd.service; enabled)
       Active: active (running) since Tue 2015-01-13 14:34:29 BRST; 5min ago
      Process: 20445 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid (code=exited, status=0/SUCCESS)
     Main PID: 20446 (glusterd)
       CGroup: /system.slice/glusterd.service
               ├─ 3426 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-ctdb.ped-dc02.datacom.gluster-ctdb02 -p /var/lib/glusterd/vols/vol-ctdb/ru...
               ├─ 3432 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-data.ped-dc02.datacom.gluster-data02 -p /var/lib/glusterd/vols/vol-data/ru...
               ├─ 3440 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-export.ped-dc02.datacom.gluster-export02 -p /var/lib/glusterd/vols/vol-exp...
               ├─ 3445 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-iso.ped-dc02.datacom.gluster-iso02 -p /var/lib/glusterd/vols/vol-iso/run/p...
               ├─ 3450 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-unguarded.ped-dc02.datacom.gluster-unguarded02 -p /var/lib/glusterd/vols/v...
               ├─ 3457 /usr/sbin/glusterfsd -s ped-dc02.datacom --volfile-id vol-vm.ped-dc02.datacom.gluster-vm02 -p /var/lib/glusterd/vols/vol-vm/run/ped-...
               ├─20446 /usr/sbin/glusterd -p /var/run/glusterd.pid
               ├─20689 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glus...
               └─20695 /sbin/rpc.statd

    Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: backtrace 1
    Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: dlfcn 1
    Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: libpthread 1
    Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: llistxattr 1
    Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: setfsid 1
    Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: spinlock 1
    Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: epoll.h 1
    Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: xattr.h 1
    Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: st_atim.tv_nsec 1
    Jan 13 14:34:38 ped-dc02.datacom nfs[20679]: package-string: glusterfs 3.6.1

log:

    # tail -f /var/log/glusterfs/nfs.log
    [2015-01-13 16:11:40.961035] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (0), shutting down
    [2015-01-13 16:27:31.683546] I [MSGID: 100030] [glusterfsd.c:2018:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.6.1 (args: /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/d90c692c9430f00aafeb7d6741c1a54b.socket)
    [2015-01-13 16:27:32.730510] I [rpcsvc.c:2142:rpcsvc_set_outstanding_rpc_limit] 0-rpc-service: Configured rpc.outstanding-rpc-limit with value 16
    [2015-01-13 16:27:32.804293] E [rpcsvc.c:1303:rpcsvc_program_register_portmap] 0-rpc-service: Could not register with portmap 100021 4 38468
    [2015-01-13 16:27:32.804314] E [nfs.c:331:nfs_init_versions] 0-nfs: Program NLM4 registration failed
    [2015-01-13 16:27:32.804321] E [nfs.c:1341:init] 0-nfs: Failed to initialize protocols
    [2015-01-13 16:27:32.804328] E [xlator.c:425:xlator_init] 0-nfs-server: Initialization of volume 'nfs-server' failed, review your volfile again
    [2015-01-13 16:27:32.804334] E [graph.c:322:glusterfs_graph_init] 0-nfs-server: initializing translator failed
    [2015-01-13 16:27:32.804340] E [graph.c:525:glusterfs_graph_activate] 0-graph: init failed
    [2015-01-13 16:27:32.804626] W [glusterfsd.c:1194:cleanup_and_exit] (--> 0-: received signum (0), shutting down

Solution:

    sed "s/ -w//" /usr/lib/systemd/system/rpcbind.service > /etc/systemd/system/rpcbind.service
    systemctl daemon-reload
    systemctl restart rpcbind
    systemctl restart glusterd
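
A slightly more defensive variant of the sed workaround above might look like the sketch below. It is an illustration only: `strip_warm_start` is a hypothetical helper name, and the sample unit content in the demo is an assumption about what the shipped rpcbind.service contains. Unlike the plain `s/ -w//`, it only touches `ExecStart=` lines and only removes `-w` when it appears as a standalone argument.

```shell
#!/bin/sh
# Hypothetical helper: print a unit file with the standalone "-w" argument
# removed from ExecStart= lines; all other lines are printed unchanged.
# Argument: path to the unit file
strip_warm_start() {
    # Match "-w" preceded by whitespace and followed by whitespace or end of line,
    # and keep whatever followed it (captured in \1).
    sed -E '/^ExecStart=/ s/[[:space:]]+-w([[:space:]]|$)/\1/' "$1"
}

# Demo on a temporary sample file (assumed content, not the real unit).
unit=$(mktemp)
printf '%s\n' '[Service]' 'Type=forking' \
    'ExecStart=/sbin/rpcbind -w ${RPCBIND_ARGS}' > "$unit"
strip_warm_start "$unit"
rm -f "$unit"
```

On an affected system the helper's output for /usr/lib/systemd/system/rpcbind.service could be redirected to /etc/systemd/system/rpcbind.service, followed by the same `systemctl daemon-reload` and service restarts as in the solution above.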