Bug 570946 - NFSv4 root is not switched or mounted read only
Summary: NFSv4 root is not switched or mounted read only
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 12
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Harald Hoyer
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Duplicates: 564294
Depends On:
Blocks:
Reported: 2010-03-05 22:25 UTC by Robin Rainton
Modified: 2010-12-03 21:59 UTC (History)

Fixed In Version: dracut-005-2.fc12
Doc Type: Bug Fix
Doc Text:
Clone Of: 537969
Environment:
Last Closed: 2010-12-03 21:59:14 UTC
Type: ---
Embargoed:


Attachments
Patch so /etc/passwd in initramfs gets an entry for root. (541 bytes, patch)
2010-04-14 23:07 UTC, Ian Dall

Description Robin Rainton 2010-03-05 22:25:56 UTC
+++ This bug was initially created as a clone of Bug #537969 +++

When converting diskless machines (PXE boot, NFSv4 root) from FC11 to FC12, I saw many problems.

Firstly, boot arguments are discussed here:

http://fedoraproject.org/wiki/Dracut/Options#NFS

But arguments such as these do not work:

root=dhcp root-path=nfs4:server:/root/path

Nor this:

root=/dev/nfs nfsroot=server:/root/path

However, this almost works, but gives issues similar to those described in bug #537969:

root=nfs4:server:/root/path

With that argument, the following messages and errors are given:

---snip ---
dracut: Mounted root filesystem server:/root/path
dracut: Switching root
mount: only root can do that
---snip ---

So the root path is mounted, and the system proceeds to boot, but because the root filesystem is not read/write at this point, nothing works.

I don't get this... it would appear that mount thinks it's not being run as the root superuser which sounds very odd.
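
To make the one form that almost works concrete, here is a minimal Python sketch that splits a `root=nfs4:server:/root/path` argument into its parts. It is only an illustration of the syntax shown in this report, not dracut's own parser; the names `NfsRoot` and `parse_root_arg` are hypothetical, and any trailing mount options after the path are not handled.

```python
from typing import NamedTuple, Optional


class NfsRoot(NamedTuple):
    """Pieces of a root=<fstype>:<server>:<path> boot argument (hypothetical type)."""
    fstype: str  # "nfs" or "nfs4"
    server: str
    path: str


def parse_root_arg(cmdline: str) -> Optional[NfsRoot]:
    """Return the NFS root named by a root=<fstype>:<server>:<path> token, or None.

    Only the form reported above as "almost working" is recognised.
    """
    for token in cmdline.split():
        if not token.startswith("root="):
            continue
        value = token[len("root="):]
        # Split into at most three fields so the path keeps any later colons.
        parts = value.split(":", 2)
        if len(parts) == 3 and parts[0] in ("nfs", "nfs4"):
            return NfsRoot(*parts)
    return None
```

With this sketch, `parse_root_arg("root=nfs4:server:/root/path")` yields the three fields, while the `root=dhcp` and `root=/dev/nfs` forms return `None` because they carry the server and path in separate arguments.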

Comment 1 Andrew McNabb 2010-03-05 22:34:53 UTC
A while back I spent several hours trying to track down the cause of this error.  It turns out that it's because the mount binary has the setuid bit set.  Since idmapd isn't running, the mount binary is owned by nfsnobody, so mount is setuid nfsnobody.  When root runs mount, the effective uid switches to nfsnobody, which makes mount give up.
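
The setuid explanation above can be modelled in a few lines of Python. This is a simplified sketch of the mechanism as described in this comment, not the real logic of the kernel or of mount; `NFSNOBODY_UID` is an assumed placeholder value and both function names are hypothetical.

```python
import stat

# Assumed placeholder; the actual nfsnobody uid depends on the system.
NFSNOBODY_UID = 65534


def effective_uid(caller_uid: int, file_mode: int, file_owner_uid: int) -> int:
    """Model the effective uid of a process after it executes a binary.

    If the setuid bit is set, the effective uid becomes the file owner's uid;
    otherwise the caller's uid is kept.
    """
    if file_mode & stat.S_ISUID:
        return file_owner_uid
    return caller_uid


def mount_is_permitted(euid: int) -> bool:
    """Simplification: treat only effective uid 0 as allowed to mount."""
    return euid == 0
```

Under this model, root (uid 0) running a setuid binary that appears to be owned by nfsnobody ends up with nfsnobody's effective uid, so `mount_is_permitted` returns `False`, which matches the "only root can do that" symptom.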

Comment 2 Andrew McNabb 2010-03-05 22:38:58 UTC
Anyway, either the init scripts need to be drastically changed, or Dracut needs to leave idmapd running through the pivot.  I tried making idmapd start immediately after pivot, but I ran into lots of problems and eventually gave up.  I think this needs to be worked on by someone who knows Dracut better than I do.

Comment 3 Robin Rainton 2010-04-04 23:20:15 UTC
Is there anything I can do to help progress a fix here? Am willing to try various things in the init scripts, etc. if that helps.

This is becoming urgent because, until it is fixed, one cannot update the kernel on machines with an NFS root.

The NFS root machines I have are still on an FC11 kernel because initrd-created images work, but Dracut ones do not.

Comment 4 Robin Rainton 2010-04-14 12:11:40 UTC
More updates on this.

1. Andrew McNabb suggested this problem is due to 'mount' being SUID. I removed SUID from both of the files below in the initramfs image:

/sbin/mount.nfs
/bin/mount

This had no effect.

2. I then looked at the /sbin/nfsroot script and put '-x' on the first line to see what it was doing. It definitely executes rpc.idmapd prior to mounting root.

3. I also saw in the nfsroot script it looks for an optional 'nfsrw=rw' argument so put that in the kernel args to boot. I placed a 'mount' command right at the end of the /sbin/nfsroot script and confirmed that the root was indeed being mounted RW prior to switch_root. This didn't help.

4. What I find odd in all this is that the NFS server is not showing any mount requests in the logs. What is in there is:

rpc.idmapd[1732]: nss_getpwnam: name 'nobody' does not map into domain 'my.domain.name'

5. I then put '-x' on the init script itself. I can see the correct arguments are being passed to switch_root, and can see that the 'mount: only root can do that' message comes out after switch_root has been exec'd. Given that at this point /sysroot is mounted RW as required what's the big deal?

6. Noticing #5, I thought perhaps it's the 'mount' on the NFS root, instead of the one on the ramdisk, that's causing the issue, so I removed SUID on that. This has the effect of removing the 'mount: only root can do that' message, and init starts. There are a couple of errors in chgrp, but then the system hangs. The last message is 'Starting HAL daemon [OK]'

Is any of this helping?
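
Steps 1 and 6 both come down to checking the owner and SUID bit of a given copy of mount. A minimal Python sketch of that check follows; the helper name `describe_setuid` is hypothetical, and it would need to run somewhere with Python available and with the relevant tree (initramfs contents or NFS root export) reachable as a path.

```python
import os
import stat


def describe_setuid(path: str) -> dict:
    """Report the owner uid and setuid bit of one file, as seen by this client."""
    st = os.stat(path)
    return {
        "path": path,
        "owner_uid": st.st_uid,
        "setuid": bool(st.st_mode & stat.S_ISUID),
    }
```

Comparing the result for the initramfs copy of mount with the result for the copy under the NFS root shows whether the two differ in owner or SUID bit, which is the distinction step 6 turned on.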

Comment 5 Andrew McNabb 2010-04-14 15:33:33 UTC
Robin, thanks for adding your experiences about the problem.  As you noted in item 6, the problems are post-pivoting, so the mount-SUID problem is in the NFS root, not the initramfs.  I hadn't tried making mount on the NFS root non-SUID, but I'm not shocked that it still hangs, since rpc.idmapd probably still isn't starting early enough.  This really is a frustrating problem.

Comment 6 Ian Dall 2010-04-14 23:05:43 UTC
The problem is that /etc/passwd (in the initramfs) has no entry for root.

Comment 7 Ian Dall 2010-04-14 23:07:59 UTC
Created attachment 406658 [details]
Patch so /etc/passwd in initramfs gets an entry for root.
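
The attached patch itself is not reproduced in this report. As a rough illustration of the idea only, the following Python sketch adds a root entry to passwd-format text when one is missing. `ROOT_ENTRY` is an assumed conventional line, not taken from the patch, and the function names are hypothetical.

```python
# Assumed conventional entry; the line used by the actual patch is not shown here.
ROOT_ENTRY = "root:x:0:0:root:/root:/bin/sh"


def has_user(passwd_text: str, name: str) -> bool:
    """Return True if passwd-format text contains an entry for `name`."""
    for line in passwd_text.splitlines():
        if line and not line.startswith("#") and line.split(":", 1)[0] == name:
            return True
    return False


def ensure_root_entry(passwd_text: str) -> str:
    """Return passwd text that is guaranteed to contain a root entry."""
    if has_user(passwd_text, "root"):
        return passwd_text
    # Make sure the existing content ends with a newline before prepending.
    if passwd_text and not passwd_text.endswith("\n"):
        passwd_text += "\n"
    return ROOT_ENTRY + "\n" + passwd_text
```

The function is idempotent: applying it to text that already has a root entry returns the text unchanged.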

Comment 8 Andrew McNabb 2010-04-14 23:13:58 UTC
Ian, does that actually work?  It doesn't seem to me like it would solve the problem, but I haven't actually tried it.  Have you tested it?

Comment 9 Ian Dall 2010-04-15 01:17:43 UTC
Absolutely. You can boot with rdbreak on the command line and you get a shell prompt just before the switch root. Do an ls -l /sysroot and you see that for the mounted root file system, practically everything is owned by MAX_INT (nfsnobody). The groups are all OK because the full /etc/group gets copied across.

When you think about it, idmapd tries to map the user name "root" from across the wire to a uid locally, but it doesn't know what the uid is because there is no entry in /etc/passwd. The result is that the mount executable on the new root is SUID nfsnobody.
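
The mapping step described above can be sketched in Python as a lookup with a fallback. This is a simplified model of the behaviour described in this comment, not rpc.idmapd's implementation; `NOBODY_UID` is an assumed placeholder and the function names are hypothetical.

```python
from typing import Dict

# Assumed fallback uid used when a name cannot be mapped locally.
NOBODY_UID = 65534


def parse_passwd(text: str) -> Dict[str, int]:
    """Build a name -> uid table from passwd-format text."""
    table: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            table[fields[0]] = int(fields[2])
    return table


def map_owner(nfs_owner: str, passwd: Dict[str, int], nobody_uid: int = NOBODY_UID) -> int:
    """Map an NFSv4 owner string such as "root@my.domain.name" to a local uid.

    Names with no local passwd entry fall back to the nobody uid.
    """
    name = nfs_owner.split("@", 1)[0]
    return passwd.get(name, nobody_uid)
```

With a passwd table that lacks root, `map_owner("root@my.domain.name", ...)` returns the fallback uid, so files owned by root on the server appear locally as owned by nobody; once a root entry exists, the same call returns 0.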

Installations using LDAP or something similar might not need the patch (but it would do no harm). This would require that "root" and other "system" uids be in LDAP, which I am not sure is standard practice.

[I'm not sure what standard practice is with nfs4 and LDAP. I'm used to users' uids coming from NIS or YP or LDAP, but having "system" uids still locally defined in /etc/passwd, which causes problems with nfs4 when the system uids get out of sync. Whilst OT for this particular bug, it is an issue which stops nfs4 being a drop-in replacement for nfs3.]

Comment 10 Andrew McNabb 2010-04-15 03:37:41 UTC
The errors we've reported, such as "mount: only root can do that" occur post-pivot, so it's very surprising that the initramfs /etc/passwd (pre-pivot) would be related.  To make sure I'm understanding you correctly, do you mean that you added root to /etc/passwd in the initramfs, and this made the "mount: only root can do that" error go away and the system boot correctly?

Comment 11 Ian Dall 2010-04-15 04:17:31 UTC
You understand me correctly! Pre-pivot, the mount executable used is the one in the initramfs and it runs as root. Post-pivot, the mount which runs is on the root partition, which (without the patch) is SUID nfsnobody. Now maybe there is a different solution which involves using chroot to run rpc.idmapd, restarting it, or giving it a kick to read the new /etc/passwd early in rc.sysinit, but the patch as given works, at least enough to get the readonly-root rwtab and statetab magic done and get to a login prompt with at least most services running. The "mount: only root can do that" messages are definitely gone!

I'll check when I get a chance whether there is any legacy effect whereby rpc.idmapd only recognizes uids in the initramfs version of /etc/passwd.

Comment 12 Ian Dall 2010-04-15 04:39:16 UTC
Reading the previous comments eg #2, I should perhaps clarify. rpc.idmapd IS started early enough and it IS left running through the pivot (all with dracut-004-4). The sunrpc vfs is moved to the new root prior to the pivot.

Comment 14 Robin Rainton 2010-04-15 11:01:59 UTC
It's true... I can confirm that if one places a line for 'root' in the initramfs /etc/passwd, incredibly, the system boots correctly!

Does anyone know when this is likely to be rolled into an FC12 release?

Comment 15 Harald Hoyer 2010-04-15 11:06:27 UTC
yes, I should release an update for F12... maybe even tomorrow

Comment 16 Fedora Update System 2010-04-15 14:40:54 UTC
dracut-005-2.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/dracut-005-2.fc12

Comment 17 Fedora Update System 2010-04-15 14:48:55 UTC
dracut-005-2.fc12 has been submitted as an update for Fedora 12.
http://admin.fedoraproject.org/updates/dracut-005-2.fc12

Comment 18 Harald Hoyer 2010-04-15 14:50:13 UTC
*** Bug 564294 has been marked as a duplicate of this bug. ***

Comment 19 Ian Dall 2010-04-15 23:12:33 UTC
However, celebration should be muted!

I have discovered that bind mounts don't work, which breaks the rwtab and statetab stuff if you are using readonly-root. I think this is really a separate bug and probably a kernel one. I have reported it as 
https://bugzilla.kernel.org/show_bug.cgi?id=15789

Comment 20 Fedora Update System 2010-04-16 23:44:11 UTC
dracut-005-2.fc12 has been pushed to the Fedora 12 testing repository.  If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
 su -c 'yum --enablerepo=updates-testing update dracut'
You can provide feedback for this update here: http://admin.fedoraproject.org/updates/dracut-005-2.fc12

Comment 21 Ian Dall 2010-04-27 10:51:21 UTC
For nfs4 root, note the kernel bug
https://bugzilla.kernel.org/show_bug.cgi?id=15854

It seems that nfs4 root won't work ATM.

Comment 22 Fedora Update System 2010-05-11 19:46:01 UTC
dracut-005-2.fc12 has been pushed to the Fedora 12 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 23 Robin Rainton 2010-05-15 12:12:57 UTC
I just upgraded an NFS-root FC12 box to dracut-005-2.fc12.noarch to try this. The date on the binary is:

# ls -la `which dracut`
-rwxr-xr-x 1 root root 11073 2010-04-15 23:47 /sbin/dracut

I removed and re-installed an FC12 kernel on the diskless machine to regenerate the initrd image.

Using the NFS boot args (TFTPd config) "root=nfs4:server:/path/to/root" does not work. On boot one gets an error pertaining to rpc.statd not running when attempting to remount root read/write.

However, using args "root=nfs4:server:/path/to/root nfsrw=rw" seems to work, but...

While the machine appears OK, it probably isn't because the message log has many errors such as:

May 15 22:04:02 iiwi rpc.statd[1305]: creat(/var/lib/nfs/statd/sm/hsem.my.domain) failed: Permission denied
May 15 22:04:02 iiwi rpc.statd[1305]: STAT_FAIL to iiwi.my.domain for SM_MON of 10.1.0.6
May 15 22:04:02 iiwi kernel: lockd: cannot monitor hsem

In this example, 'hsem' is the serving host (10.1.0.6) and 'iiwi' is the diskless, remote booting NFS root host.

Comment 24 Robin Rainton 2010-06-14 21:37:41 UTC
Does anyone know if this bug persists in FC13?

Comment 25 Bug Zapper 2010-11-03 20:42:27 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 26 Bug Zapper 2010-12-03 21:59:14 UTC
Fedora 12 changed to end-of-life (EOL) status on 2010-12-02. Fedora 12 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

