Bug 702707
Summary: | systemd doesn't start rpcidmapd reliably | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Michael Young <m.a.young> | ||||||
Component: | nfs-utils | Assignee: | Steve Dickson <steved> | ||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||
Severity: | unspecified | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 15 | CC: | amessina, bfields, gene-redhat, harald, jlayton, johannbg, lpoetter, metherid, mschmidt, notting, plautrba, steved, tomek | ||||||
Target Milestone: | --- | Keywords: | Patch | ||||||
Target Release: | --- | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | |||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2011-08-16 23:02:00 UTC | Type: | --- | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Attachments: |
|
Description
Michael Young
2011-05-06 16:32:27 UTC
What does "systemctl status rpcidmapd.service" report? Please boot with "log_buf_len=1M systemd.log_level=debug systemd.log_target=kmsg" and when the problem is reproduced, attach here the output of the "dmesg" command. Created attachment 498070 [details]
(extracts from) output from dmesg
The relevant bits of the dmesg output is attached. The output from systemctl status rpcidmapd.service is
rpcidmapd.service - LSB: Starts the NFSv4 id mapping daemon
Loaded: loaded (/etc/rc.d/init.d/rpcidmapd)
Active: active (running) since Tue, 10 May 2011 16:02:03 +0100; 1min 42s ago
Process: 1224 ExecStart=/etc/rc.d/init.d/rpcidmapd start (code=exited, status=6/NOTCONFIGURED)
CGroup: name=systemd:/system/rpcidmapd.service
There is no running process 1224
code=exited, status=6/NOTCONFIGURED <= that's your problem Looking at the script: start|condstart) # Check that networking is up. [ "${NETWORKING}" != "yes" ] && exit 6 ... # Make sure the rpc_pipefs filesystem is available [ "${RPCMTAB}" != "noload" ] && { RPCMTAB=`grep -v '^#' /proc/mounts | \ awk '{ if ($3 ~ /^rpc_pipefs$/ ) print $2}'` [ -z "${RPCMTAB}" ] && { [ -x /sbin/lsmod -a -x /sbin/modprobe ] && { if ! /sbin/lsmod | grep sunrpc > /dev/null ; then /sbin/modprobe sunrpc fi } RPCMTAB=`grep -v '^#' /proc/mounts | \ awk '{ if ($3 ~ /^rpc_pipefs$/ ) print $2}'` [ -z "${RPCMTAB}" ] && { \ echo "Error: RPC MTAB does not exist." exit 6 } } } ... Are you able to determine which of these it's failing on? (NETWORKING=yes should come from /etc/sysconfig/network - that's the easy one to test.) Right, the initscript is failing for one of the two reasons. Reassigning to nfs-utils. Though it is not clear yet how exactly it is failing for Michael, obviously the initscript has a lot to improve: - The pattern `grep ... |awk ...` is suboptimal, awk can match regexs by itself. - It assumes sunrpc is a loadable module, but a custom kernel may have it built-in. - It depends on /etc/modprobe.d/dist.conf doing the nasty hack of mounting the rpc_pipefs filesystem. It should be the other way around. The initscript should mount the filesystem. The kernel will then ask for the sunrpc module if needed. The mount hacks in dist.conf should all go away some day. (In reply to comment #4) > The initscript should mount the filesystem. Or even better: We can put a couple of var-lib-nfs-rpc_pipefs.{,auto}mount unit files in the systemd-units package and the rpcidmapd scripts can stop caring about the mount completely. Michael, are you using a custom-built kernel? I cannot tell from the trimmed dmesg. Could you boot with "systemd.default_standard_output=syslog systemd.default_standard_error=syslog" and then look into /var/log/messages for the output of the rpcidmapd initscript? No it was a standard kernel 2.6.38.5-24.fc15.x86_64 I think there is a race condition between the rpcidmapd and rpcgssd start up scripts (which would probably also include rpcsvcgssd if I had it enabled) all trying to mount sunrpc at the same time, and the second one exits modprobe when the first one has loaded the module but before it has finished the rpc_pipefs mount. Output when rpcidmapd was started May 17 14:22:21 itspc114 rpcidmapd[1229]: /etc/init.d/functions: line 58: /dev/stderr: No such device or address May 17 14:22:21 itspc114 rpcgssd[1318]: /etc/init.d/functions: line 58: /dev/stderr: No such device or address May 17 14:22:21 itspc114 rpcgssd[1318]: Starting RPC gssd: FATAL: Module sunrpc already in kernel. May 17 14:22:21 itspc114 rpcgssd[1318]: FATAL: Error running install command for sunrpc May 17 14:22:21 itspc114 rpcgssd[1318]: Error: RPC MTAB does not exist. May 17 14:22:22 itspc114 rpcidmapd[1229]: Starting RPC idmapd: #033[60G[#033[0;32m OK #033[0;39m] Output when rpcidmapd was not started May 17 14:25:52 itspc114 rpcidmapd[1255]: /etc/init.d/functions: line 58: /dev/stderr: No such device or address May 17 14:25:52 itspc114 rpcgssd[1346]: /etc/init.d/functions: line 58: /dev/stderr: No such device or address May 17 14:25:52 itspc114 rpcidmapd[1255]: Starting RPC idmapd: FATAL: Module sunrpc already in kernel. May 17 14:25:52 itspc114 rpcidmapd[1255]: FATAL: Error running install command for sunrpc May 17 14:25:52 itspc114 rpcidmapd[1255]: Error: RPC MTAB does not exist. May 17 14:25:53 itspc114 rpcgssd[1346]: Starting RPC gssd: #033[60G[#033[0;32m OK #033[0;39m] Created attachment 499392 [details]
initscripts: just try to mount rpc_pipefs always
How about this patch to the nfs-utils' initscripts?
Don't forget to also remove the "install" and "remove" hacks for sunrpc from /etc/modprobe.d/dist.conf
Question how come the /var/lib/nfs/rpc_pipefs file system is not mount when the modprobe sunrpc happens? Thats how it worked in previous releases... /var/lib/nfs/rpc_pipefs is mounted, but too late. The scripts worked previously because init scripts were run sequentially avoiding all possibility of a race condition. Systemd runs several init scripts as once, and thus you get two or more scripts, eg. rpcgssd and rpcidmapd running at the same time. Both run lsmod and see sunrpc isn't mounted. Both run modprobe. One loads the sunrpc module, the other exits when module is loaded but before the subsequent mount of rpc_pipefs has finished and thus fails because rpc_pipefs isn't there yet. The one which loaded sunrpc finishes by mounting rpc_pipefs and completes successfully. So only one of the scripts actually works. you might want to fix the LSB headers of those initscripts and add dependencies I looks like this will be integrated into systemd: http://cgit.freedesktop.org/systemd/commit/?id=af49ec2c7de8698da61114c9e1ddfc8a8b9802c8 Any thoughts on how long the update might take to come to Fedora 15? For Kerberized NFS networks, this is a critical problem, post update. (In reply to comment #12) > I looks like this will be integrated into systemd: > > http://cgit.freedesktop.org/systemd/commit/?id=af49ec2c7de8698da61114c9e1ddfc8a8b9802c8 > > Any thoughts on how long the update might take to come to Fedora 15? For > Kerberized NFS networks, this is a critical problem, post update. The commit is already included in systemd >= 25. Any comments about the patch https://bugzilla.redhat.com/attachment.cgi?id=499392 ? (In reply to comment #13) > Any comments about the patch > https://bugzilla.redhat.com/attachment.cgi?id=499392 ? I hand edited that patch in place and it seems to work correctly, though I have other issues with nfs related start up so they might be masking any problems. (In reply to comment #4) > Right, the initscript is failing for one of the two reasons. Reassigning to > nfs-utils. > > Though it is not clear yet how exactly it is failing for Michael, obviously the > initscript has a lot to improve: > - The pattern `grep ... |awk ...` is suboptimal, awk can match regexs > by itself. > - It assumes sunrpc is a loadable module, but a custom kernel may have it > built-in. Yes sunrpc module has been a loadable in Fedora forever so this is a valid assumption. Plus setting the RPCMTAB=="noload" is how people who build custom kernel stop the module loading. This was a bug fix and again its been around since FC2. > - It depends on /etc/modprobe.d/dist.conf doing the nasty hack of mounting > the rpc_pipefs filesystem. It should be the other way around. The initscript > should mount the filesystem. The kernel will then ask for the sunrpc module > if needed. The mount hacks in dist.conf should all go away some day. When did this become an 'hack'? It has worked very well for a long time now... I just don't understand this.... (In reply to comment #14) > (In reply to comment #13) > > > Any comments about the patch > > https://bugzilla.redhat.com/attachment.cgi?id=499392 ? > > I hand edited that patch in place and it seems to work correctly, though I have > other issues with nfs related start up so they might be masking any problems. Where the guarantee that sunrpc is already load as your new scripts assumes... Also what happens when the file system is mounted and then the module is loaded? Or even worse, a daemon tries to access something on that mounted file system and the sunrpc module is not loaded... How is that one line mount preventing this? > Yes sunrpc module has been a loadable in Fedora forever so > this is a valid assumption. About two months ago I had a discussion with Peter Zijlstra about kernel development in Fedora. What he and other kernel developers often do, is that they configure their kernels with 'make localyesconfig' to make all modules built-in. It makes the modify/build/reboot/test cycle faster. Peter complained about the fragility of the Fedora initscripts and he specifically mentioned NFS as being broken when running with a monolithic kernel. So although the assumption works for common users, it is not a good assumption to make. It is possible to make the script work regardless how the kernel is configured. > When did this become an 'hack'? It has worked very well for a long > time now... I just don't understand this.... One reason is that doing it this way forces you to make the assumption about the kernel config. In general, mounting filesystems from modprobe rules is frowned upon by the udev and module-init-tools developers. See for instance: https://bugzilla.redhat.com/show_bug.cgi?id=695375#c4 > Where the guarantee that sunrpc is already load as your new > scripts assumes... It does not matter whether the module is already loaded. If it is not, the kernel will request the module when it is asked to mount the filesystem and udev will modprobe it. > Or even worse, a daemon tries to access something on that mounted > file system and the sunrpc module is not loaded... This is not possible. The 'rpc_pipefs' filesystem is implemented in the sunrpc module. That the filesystem is mounted implies that the module is loaded in the kernel. (In reply to comment #17) > It does not matter whether the module is already loaded. If it is not, the > kernel will request the module when it is asked to mount the filesystem and > udev will modprobe it. A minor correction: udev is not involved. The kernel just runs '/sbin/modprobe -q -- rpc_pipefs' directly as a usermode helper. ... -> do_kern_mount() -> get_fs_type() -> __request_module() -> call_usermodehelper_fns() And the name 'rpc_pipefs' gets translated to the actual module name by /etc/modprobe.d/dist.conf: alias rpc_pipefs sunrpc (In reply to comment #18) > (In reply to comment #17) > > It does not matter whether the module is already loaded. If it is not, the > > kernel will request the module when it is asked to mount the filesystem and > > udev will modprobe it. > > A minor correction: udev is not involved. > The kernel just runs '/sbin/modprobe -q -- rpc_pipefs' directly as a usermode > helper. > > ... > -> do_kern_mount() > -> get_fs_type() > -> __request_module() > -> call_usermodehelper_fns() > > And the name 'rpc_pipefs' gets translated to the actual module name by > /etc/modprobe.d/dist.conf: > alias rpc_pipefs sunrpc I'm a bit confused now.. If the "alias rpc_pipefs sunrpc" causes the sunrpc module to get loaded when /var/lib/nfs/rpc_pipefs is mounted, why were the install/remove rules, in /etc/modprobe.d/dist.conf, that did the explicit mount ever needed? Has something changed? (In reply to comment #17) > > Yes sunrpc module has been a loadable in Fedora forever so > > this is a valid assumption. > > About two months ago I had a discussion with Peter Zijlstra about kernel > development in Fedora. What he and other kernel developers often do, is that > they configure their kernels with 'make localyesconfig' to make all modules > built-in. It makes the modify/build/reboot/test cycle faster. Peter complained > about the fragility of the Fedora initscripts and he specifically mentioned NFS > as being broken when running with a monolithic kernel. Yeah they had to tweak the init scripts to make it work in which I do agree was suboptimal... > > So although the assumption works for common users, it is not a good assumption > to make. It is possible to make the script work regardless how the kernel is > configured. > > > When did this become an 'hack'? It has worked very well for a long > > time now... I just don't understand this.... > > One reason is that doing it this way forces you to make the assumption about > the kernel config. > In general, mounting filesystems from modprobe rules is frowned upon by the > udev and module-init-tools developers. See for instance: > https://bugzilla.redhat.com/show_bug.cgi?id=695375#c4 Fine, if there is a better way... so be it... > > > Where the guarantee that sunrpc is already load as your new > > scripts assumes... > > It does not matter whether the module is already loaded. If it is not, the > kernel will request the module when it is asked to mount the filesystem and > udev will modprobe it. > > > Or even worse, a daemon tries to access something on that mounted > > file system and the sunrpc module is not loaded... > > This is not possible. The 'rpc_pipefs' filesystem is implemented in the sunrpc > module. That the filesystem is mounted implies that the module is loaded in the > kernel. Assume the above statements are accurate, then the change does make sense.. (In reply to comment #19) > (In reply to comment #18) > > And the name 'rpc_pipefs' gets translated to the actual module name by > > /etc/modprobe.d/dist.conf: > > alias rpc_pipefs sunrpc > > I'm a bit confused now.. If the "alias rpc_pipefs sunrpc" causes > the sunrpc module to get loaded when /var/lib/nfs/rpc_pipefs is > mounted, why were the install/remove rules, in /etc/modprobe.d/dist.conf, > that did the explicit mount ever needed? Has something changed? Assuming a kernel with a modular sunrpc.ko, two things need to happen on boot to make /var/lib/nfs/rpc_pipefs available: (A): The sunrpc module needs to be loaded into the kernel. (B): The rpc_pipefs filesystem needs to be mounted on /var/lib/nfs/rpc_pipefs. The old initscript called '/sbin/modprobe sunrpc' to do (A) and relied on the modprobe install rule in dist.conf to do (B). The proposed new way is to turn this around and make the initscript do the mounting (B) and depend on the kernel's module request to let (A) happen. (BTW, we could even get rid of the 'alias rpc_pipefs sunrpc' rule if sunrpc declared a proper MODULE_ALIAS. I'll propose a kernel patch to add it.) (In reply to comment #21) > (In reply to comment #19) > > (In reply to comment #18) > > > And the name 'rpc_pipefs' gets translated to the actual module name by > > > /etc/modprobe.d/dist.conf: > > > alias rpc_pipefs sunrpc > > > > I'm a bit confused now.. If the "alias rpc_pipefs sunrpc" causes > > the sunrpc module to get loaded when /var/lib/nfs/rpc_pipefs is > > mounted, why were the install/remove rules, in /etc/modprobe.d/dist.conf, > > that did the explicit mount ever needed? Has something changed? > > Assuming a kernel with a modular sunrpc.ko, two things need to happen on boot > to make /var/lib/nfs/rpc_pipefs available: > (A): The sunrpc module needs to be loaded into the kernel. > (B): The rpc_pipefs filesystem needs to be mounted on /var/lib/nfs/rpc_pipefs. > > The old initscript called '/sbin/modprobe sunrpc' to do (A) and relied on the > modprobe install rule in dist.conf to do (B). Right because without (A), (B) would always fail, at least way back when... > > The proposed new way is to turn this around and make the initscript do the > mounting (B) and depend on the kernel's module request to let (A) happen. Understood... As long as just mount works %100 of the time, I'm all for it... > > (BTW, we could even get rid of the 'alias rpc_pipefs sunrpc' rule if sunrpc > declared a proper MODULE_ALIAS. I'll propose a kernel patch to add it.) Feel free, the upstream mailing list is linux-nfs.org Michal Schmidt's patch has been applied to rawhide nfs-utils as commit d37883f4821bff7bdf7f1bd42d8a16eda3884a2f and to F15 as 3be559dd770aefc9a9ce1c89b48fc7e4517fc785. (I'm assuming the right thing to do now is close this bug; if not, feel free to correct me.) I see the F15 build in koji, tagged only as updates-candidate. Would you be able to submit the request for updates-testing so we can try this out and confirm the fix? Apologies, I'm new to this: all I knew to do was push to an f15 branch and run "fedpkg build". What more does it need? (I'm happy to just RTFM, if I know which M I need to R....) Use Bodhi to submit updates: https://admin.fedoraproject.org/updates/ See also: https://fedoraproject.org/wiki/Package_update_HOWTO#Submit_your_update_to_Bodhi Steve, any chance you'd be able to submit this to updates-testing? I'd like to test this out and report back. Should be done, thanks for the pointers, and apologies for the delay getting back to this.... |