Bug 922031 - NFSroot does not boot with nfsv4 & cannot progress past run level 1 using nfsv3
Summary: NFSroot does not boot with nfsv4 & cannot progress past run level 1 using nfsv3
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: dracut
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: dracut-maint
QA Contact: Fedora Extras Quality Assurance
Reported: 2013-03-15 12:40 UTC by Robin Rainton
Modified: 2018-09-01 23:40 UTC

Doc Type: Bug Fix
Last Closed: 2015-02-17 14:51:58 UTC
Type: Bug

Description Robin Rainton 2013-03-15 12:40:21 UTC
Description of problem:

Way back when, I built a PXE-booting FC14 system with NFS root. This no longer appears possible with FC18.

How reproducible: Consistent behaviour

Steps to Reproduce:
1. Create a new, blank install on HDD.
2. Copy HDD root file system to NFS export on server.
3. Mount NFS root (nfsv4) in a temporary location to check permissions, idmapd, etc. are all working as expected.
4. Copy kernel image and initrd (created with NFS modules installed) to relevant TFTP locations on server.
5. Modify fstab on NFS export to reflect NFS location.
6. Configure DHCPD, etc. to enable network boot.
7. Disconnect HDD on client and network boot.
  
Actual results:

Using nfsv4 (kernel arguments root=/dev/nfs nfsroot=server:/netboot/client) various mounts fail during boot process (sys-kernel-config.mount, dev-hugepages.mount, etc). System hangs.

Using nfsv3 (kernel arguments root=/dev/nfs nfsroot=server:/netboot/client,nfsvers=3) one is able to boot a system, but only to run level 1, which appears usable. Trying to progress to run level 3 is not possible and system hangs.

For the nfsv4 problem I have no idea why the mounts fail. Have tried the alternate kernel argument syntax "nfs4:<serverip/hostname>/<location>" without luck.

For nfsv3 the problem may be related to missing support for extended attributes? Running 'yum update' while in single user mode produces errors similar to those described in https://bugzilla.redhat.com/show_bug.cgi?id=699897

Expected results:

System should boot as usual.

Additional info:

NFS server is CentOS 6.3.

Comment 1 Bill Nottingham 2013-03-15 19:14:04 UTC
Moving to systemd, although also could be dracut, or the xattr issue as you mention. Does disabling SELinux change things?

Comment 2 Robin Rainton 2013-03-15 21:05:32 UTC
SELinux is disabled on both server and client.

Comment 3 Harald Hoyer 2013-03-18 10:21:44 UTC
IIRC, we don't support root on NFS officially.

IIRC, nfs4 heavily relies on idmapd.conf being set up properly and
"rd.nfs.domain=<NFSv4 domain name>" being set on the kernel command line.
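
As a rough sketch of checking that the two settings agree, assuming the intent is for idmapd.conf and rd.nfs.domain to name the same NFSv4 domain. The helper names below are hypothetical (they are not part of dracut or nfs-utils), and the file paths in the usage comment are the usual defaults, assumed here:

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of dracut or nfs-utils.

# Print the first uncommented "Domain = ..." value from an idmapd.conf-style file.
idmap_domain() {
    sed -n 's/^[[:space:]]*Domain[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | head -n 1
}

# Print the value of rd.nfs.domain= found in a kernel command line string.
cmdline_domain() {
    for arg in $1; do
        case "$arg" in
            rd.nfs.domain=*) printf '%s\n' "${arg#rd.nfs.domain=}" ;;
        esac
    done
}

# Succeed only when both values are non-empty and identical.
domains_match() {
    conf_domain=$(idmap_domain "$1")
    boot_domain=$(cmdline_domain "$2")
    [ -n "$conf_domain" ] && [ "$conf_domain" = "$boot_domain" ]
}

# Example (paths are assumptions):
#   domains_match /etc/idmapd.conf "$(cat /proc/cmdline)" && echo "domains agree"
```

This only compares strings; it does not confirm that rpc.idmapd is running or that the server maps IDs correctly.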

Comment 4 Robin Rainton 2013-03-18 11:16:43 UTC
Thank you for the suggestion with nfsv4. The "rd.nfs.domain=..." suggestion does allow the sys-kernel-config.mount, dev-hugepages.mount, etc to complete successfully.

However, the system still hangs moments later, just after presenting a message saying raw IDs are not supported by the NFS server and that ID mapper will be re-enabled.

I will try and read more about how to force raw ID mapping and report back.

Comment 5 Robin Rainton 2013-03-18 21:02:29 UTC
Seems that the only way round this is to make sure rpc.idmapd is started before trying to mount the root FS.

However, I don't seem to be able to get Dracut to create an image that will do that.

Perhaps this is the underlying problem?

Comment 6 Lennart Poettering 2013-05-06 16:52:03 UTC
Reassigning since this is now about dracut including the necessary NFS tools?

Comment 7 Jóhann B. Guðmundsson 2013-05-06 16:57:15 UTC
probably needs new nfs unit file as well...

Comment 8 Harald Hoyer 2013-05-07 08:52:37 UTC
(In reply to comment #0)
> Using nfsv4 (kernel arguments root=/dev/nfs nfsroot=server:/netboot/client)
> various mounts fail during boot process (sys-kernel-config.mount,
> dev-hugepages.mount, etc). System hangs.
> 
> Using nfsv3 (kernel arguments root=/dev/nfs
> nfsroot=server:/netboot/client,nfsvers=3) one is able to boot a system, but
> only to run level 1, which appears usable. Trying to progress to run level 3
> is not possible and system hangs.
> 


Please do not use root=/dev/nfs!

See dracut.cmdline(7):

NFSv4: root=server:/netboot/client rd.nfs.domain=<NFSv4 domain name>
NFSv3: root=server:/netboot/client,nfsvers=3

Comment 9 Harald Hoyer 2013-05-07 08:54:43 UTC
(In reply to comment #8)
> NFSv4: root=server:/netboot/client rd.nfs.domain=<NFSv4 domain name>
> NFSv3: root=server:/netboot/client,nfsvers=3

NFSv4: root=nfs4:server:/netboot/client rd.nfs.domain=<NFSv4 domain name>
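
To spot the discouraged root=/dev/nfs form in an existing boot configuration, a small check along these lines may help. It is a sketch only: classify_root is a hypothetical helper, not a dracut command, and it recognises only the forms mentioned in this bug:

```shell
#!/bin/sh
# Sketch only: classify_root is a hypothetical helper, not a dracut command.
# It reports which root= form a kernel command line string uses.
classify_root() {
    for arg in $1; do
        case "$arg" in
            root=/dev/nfs) echo "legacy"; return 0 ;;  # the form to avoid
            root=nfs4:*)   echo "nfs4";   return 0 ;;
            root=nfs:*)    echo "nfs";    return 0 ;;
            root=*)        echo "other";  return 0 ;;
        esac
    done
    echo "none"
}

# Example:
#   classify_root "$(cat /proc/cmdline)"
```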

Comment 10 Harald Hoyer 2013-05-29 12:26:03 UTC
any progress?

Comment 11 Robin Rainton 2013-07-12 12:31:06 UTC
Sorry for the delay. Sadly still not working but here's what happens:

This...

root=server:/netboot/client rd.nfs.domain=<NFSv4 domain name>

... doesn't work. The client says the root target is not recognised.

This...

root=nfs4:server:/netboot/client rd.nfs.domain=<NFSv4 domain name>

... doesn't cause the client to drop into an emergency shell. The system doesn't boot successfully either. At least it writes /var/log/boot.log on the NFS root. The client can be seen calling idmapd for users 'nobody' and 'utmp'. It's promising, but still not working.

The last line on the client console before the system hangs is about systemd-journald receiving SIGUSR1.

This...

root=nfs:server:/netboot/client rd.nfs.domain=<NFSv4 domain name>

... is pretty much the same, but hangs a little sooner. I don't see 'utmp' user being processed by the server idmapd.

Comment 12 Robin Rainton 2013-07-21 07:10:34 UTC
As a heads up... I tried to make NFS root work with FC19.

While the "root=nfs4:server:/netboot/client" option works, the root FS still doesn't appear to work properly.

I think this is something to do with ID mapper but am not sure.

Does anyone know how to debug NFS or ID mapper issues? On either the client or server? Ideally we need to see on the server when a client connects and when files are requested. I know this could create a tonne of logging, but at least one could then see what is and isn't working.

Comment 13 Fedora End Of Life 2013-12-21 12:10:26 UTC
This message is a reminder that Fedora 18 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 18. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '18'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 18's end of life.

Thank you for reporting this issue and we are sorry that we may not be 
able to fix it before Fedora 18 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version prior to Fedora 18's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 14 Mick 2013-12-28 06:02:21 UTC
I have been able to verify it works using FC20.

nfsv3

Create a new root file system:
 yum --releasever=20 groupinstall "Minimal Install" --installroot=/srv/nfsroot

Disable selinux inside it.

exports:
/srv/nfsroot 192.168.1.0/24(rw,no_root_squash,no_subtree_check,fsid=root)

Make new initramfs:
dracut -M --force -m "nfs base network kernel-modules bash ifcfg drm terminfo udev-rules systemd usrmount fs-lib shutdown" initramfs-nfs-only.img

Create a pxeboot/dhcp/tftp setup for it:
default vesamenu.c32
prompt 1
timeout 5

label linux
  menu label ^fedpxeyum nfsroot
  menu default
  kernel vmlinuz
  append initrd=initramfs.img ip=dhcp root=nfs:192.168.1.27:/srv/nfsroot

V4 works as well; however, don't ask me how to make it work if you hang multiple nodes off the same export, as I would like to. The root fs thing leaves me scratching my head.

Only changes listed.

exports:
/srv/nfsroot/fedpxeyum  192.168.1.0/24(rw,no_root_squash,no_subtree_check,fsid=root)

Create a pxeboot/dhcp/tftp setup for it:
default vesamenu.c32
prompt 1
timeout 5

label linux
  menu label ^fedpxeyum nfsroot
  menu default
  kernel vmlinuz
  append initrd=initramfs.img ip=dhcp root=nfs4:192.168.1.27:/ rd.nfs.domain=yourdomain.net

Comment 15 Mick 2013-12-28 07:32:25 UTC
Actually, v4 is not so hard.

Exports:
/srv/nfsroot        192.168.1.0/24(rw,no_root_squash,no_subtree_check,fsid=root)
/srv/nfsroot/fedpxe 192.168.1.0/24(rw,no_root_squash,no_subtree_check,nohide)
/srv/nfsroot/fedpxeyum  192.168.1.0/24(rw,no_root_squash,no_subtree_check,nohide)
.
.
.
etc

Then the append line becomes:
append initrd=initramfs.img ip=dhcp root=nfs4:192.168.1.27:/fedpxeyum rd.nfs.domain=yourdomain.net
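
For several nodes, the pattern above (one fsid=root parent export, one nohide export per node, and a root=nfs4: path relative to the parent) could be generated with something like the following. This is a sketch only: the function names are hypothetical, and the option strings are simply copied from the exports lines above:

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; options are copied from this comment.
EXPORT_OPTS="rw,no_root_squash,no_subtree_check"

# Print /etc/exports lines: the parent directory as the NFSv4 root (fsid=root),
# then one nohide entry per node directory beneath it.
exports_lines() {
    base=$1 clients=$2
    shift 2
    printf '%s %s(%s,fsid=root)\n' "$base" "$clients" "$EXPORT_OPTS"
    for node in "$@"; do
        printf '%s/%s %s(%s,nohide)\n' "$base" "$node" "$clients" "$EXPORT_OPTS"
    done
}

# Print the PXE append line for one node; the path is relative to the NFSv4 root.
append_line() {
    server=$1 node=$2 domain=$3
    printf 'append initrd=initramfs.img ip=dhcp root=nfs4:%s:/%s rd.nfs.domain=%s\n' \
        "$server" "$node" "$domain"
}

# Example:
#   exports_lines /srv/nfsroot 192.168.1.0/24 fedpxe fedpxeyum
#   append_line 192.168.1.27 fedpxeyum yourdomain.net
```

The output is meant to be reviewed before being copied into /etc/exports or the PXE menu; the functions do not modify either file.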

Comment 16 Fedora End Of Life 2015-01-09 17:46:24 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 17 Fedora End Of Life 2015-02-17 14:51:58 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 18 Roland Mainz 2015-02-19 22:27:56 UTC
Is this still an issue on Fedora 22?

Comment 19 Rick 2018-09-01 23:40:36 UTC
FWIW, in case anyone finds this via google, I was able to get NFS root to work in Fedora 28, but not in Fedora 25.  Super weird, no idea what was different in my experiments.

I used the default initrd and vmlinuz that shipped with the distribution, no need to use dracut to customize the former.

I specifically used NFSv3, not v4.  (4 flat out didn't work, and trying to make it work ended up requiring a reboot of my NFS server.)

I had to disable SELinux at the kernel command line (with it on, the boot wouldn't get past network configuration).

Per #14 above, I did have to create the base chroot via:

dnf groupinstall "minimal*" --releasever=28 --installroot=/srv/nfsroot/fedora28

I also needed to disable SELinux on the chroot serving host before I could chroot into it and set passwords for users successfully.

HTH.

