Bug 1422675 - nfs v4 with kerberos fails to mount
Summary: nfs v4 with kerberos fails to mount
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nfs-utils
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Steve Dickson
QA Contact: Yongcheng Yang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-02-15 20:20 UTC by chris vogan
Modified: 2023-08-14 12:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1192806
Environment:
Last Closed: 2020-01-07 18:49:44 UTC
Target Upstream Version:
Embargoed:



Description chris vogan 2017-02-15 20:20:19 UTC
+++ This bug was initially created as a clone of Bug #1192806 +++

Description of problem:
I have two systems running FC21, a new PC and an older notebook.
Both systems export NFS4 resources and can mount the exports
from the other system.

On the notebook, mounting the exported directories from the PC
fails:
Feb 15 13:56:01 nb rpc.gssd[948]: ERROR: unable to resolve 2001:db8:1::4 ...wn
Feb 15 13:56:01 nb rpc.gssd[948]: ERROR: failed to read service info
Feb 15 13:56:01 nb rpc.gssd[948]: ERROR: unable to resolve 192.168.1.4 to...wn
Feb 15 13:56:01 nb rpc.gssd[948]: ERROR: failed to read service info

If I restart rpc.gssd or nfs-secure on the notebook, the next mount command succeeds.

On the PC, mounting the exports from the notebook always works.

Version-Release number of selected component (if applicable):
nfs-utils-1.3.1-6.0.fc21.x86_64
kernel 3.18.5-201.fc21.x86_64

How reproducible:

I don't know how to reproduce this on other systems.

Steps to Reproduce:
1. Boot both systems
2. Issue "mount /mnt/pc" on the notebook
3.

Actual results:
mount prints: mount.nfs: access denied by server while mounting pc:/

Expected results:
no error

Additional info:

The mount line in /etc/fstab looks like:

pc:/ /mnt/pc nfs sec=krb5p,rw,noauto 0 0

/etc/exports looks like:

/srv/nfs4 gss/krb5p(rw,fsid=0,insecure,subtree_check,async)
/srv/nfs4/exports gss/krb5p(rw,nohide,insecure,no_subtree_check,async)

Kerberos 5 is installed on another (Debian-based) system, which
also acts as the DNS/DHCP/DHCPv6 server.
The /etc/krb5.keymap files on all systems are identical.

--- Additional comment from Steve Dickson on 2015-02-26 18:07:38 EST ---

(In reply to Jean-Jacques Sarton from comment #0)
> Description of problem:
> I have two systems running FC21, a new PC and an older notebook.
> Both systems export NFS4 resources and can mount the exports
> from the other system.
> 
> On the notebook, mounting the exported directories from the PC
> fails:
> Feb 15 13:56:01 nb rpc.gssd[948]: ERROR: unable to resolve 2001:db8:1::4
> ...wn
> Feb 15 13:56:01 nb rpc.gssd[948]: ERROR: failed to read service info
> Feb 15 13:56:01 nb rpc.gssd[948]: ERROR: unable to resolve 192.168.1.4
> to...wn
> Feb 15 13:56:01 nb rpc.gssd[948]: ERROR: failed to read service info
> 
This looks like a DNS issue... Why can't those addresses be resolved?

--- Additional comment from Jean-Jacques Sarton on 2015-02-27 01:50:05 EST ---

DNS works as expected. At mount time the names are resolved as required (call to host pc). If I add the PC's addresses to the /etc/hosts file, the problem remains.

If I look at the times when the daemons are started on both systems, there are no differences that could explain the problem.

The configurations of the PC and the notebook are practically identical.

--- Additional comment from Jean-Jacques Sarton on 2015-02-27 06:57:31 EST ---

If the /etc/fstab file contains the IP address of the server, the same problem occurs, so the problem can't be DNS.

--- Additional comment from Steve Dickson on 2015-03-22 11:14:16 EDT ---

(In reply to Jean-Jacques Sarton from comment #3)
> If the /etc/fstab file contains the IP address of the server, the same
> problem occurs, so the problem can't be DNS.

Here is the code that is failing:

    err = getnameinfo(sa, addrlen, hbuf, sizeof(hbuf), NULL, 0,
              NI_NAMEREQD);
    if (err) {
        printerr(0, "ERROR: unable to resolve %s to hostname: %s\n",
             addr, err == EAI_SYSTEM ? strerror(errno) :
                           gai_strerror(err));
        return NULL;
    }

What is the entire error message? I'm looking for the string that
either gai_strerror() or strerror() logged.

--- Additional comment from Jean-Jacques Sarton on 2015-04-01 12:49:53 EDT ---

Sorry for the delay, the system nb was defective, so I had to buy a new system and install the old software on the new one.

The messages are:

rpc.gssd[###]: ERROR: unable to resolve 2001:db8:1::4 to hostname: Name or service not known
rpc.gssd[###]: ERROR: failed to read service info

--- Additional comment from Steve Dickson on 2015-04-02 07:50:13 EDT ---

(In reply to Jean-Jacques Sarton from comment #5)
> Sorry for the delay, the system nb was defective, so I had to buy a new system
> and install the old software on the new one.
> 
> The message are:
> 
> rpc.gssd[###]: ERROR: unable to resolve 2001:db8:1::4 to hostname: Name or
> service not known
> rpc.gssd[###]: ERROR: failed to read service info

Does the IPv4 address for 2001:db8:1::4 work?

--- Additional comment from Jean-Jacques Sarton on 2015-04-02 11:17:03 EDT ---

For IPv4 it is exactly the same.

--- Additional comment from Jean-Jacques Sarton on 2015-04-02 12:13:44 EDT ---

This results in the same problem. For IPv4 there are no behaviour changes.

--- Additional comment from Fedora End Of Life on 2015-11-04 07:50:02 EST ---

This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

--- Additional comment from Fedora End Of Life on 2015-12-02 04:05:01 EST ---

Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

--- Additional comment from Robbert Eggermont on 2016-12-13 20:33:26 EST ---

FYI: I just ran into this bug on my freshly upgraded CentOS 7.3 machine.

The bug seems to be triggered when rpc.gssd is started before the network is fully configured.

I did an strace on the failing rpc.gssd process while trying to do a mount, and it first looks at /etc/hosts and then it tries to query a (non-existent) DNS server at 127.0.0.1(?).

Note that at this point the network, DNS and all other processes were working fine. But the rpc.gssd process seems to be stuck in some weird (glibc?) host lookup fallback scenario where it doesn't look at /etc/resolv.conf anymore.

After a restart of rpc.gssd everything works and I can see it looking at /etc/resolv.conf and querying the specified DNS servers.

As a workaround I've now delayed the start of rpc.gssd.service until after the network-online.target is reached.

Comment 1 chris vogan 2017-02-15 20:26:56 UTC
I have been hit by this issue on the more current RHEL 7.3.
After a system reboot, Kerberos mounts no longer work until I restart rpcgssd.
I am using nfs-utils-1.3.0-0.33.el7.x86_64
Kernel: 3.10.0-514.2.2.el7.x86_64

I was able to resolve the issue by having rpc-gssd.service wait for network-online.target.

Comment 2 Tomasz Kepczynski 2017-02-23 21:57:07 UTC
I have the same issue on some of my systems (why it happens on those while the others work is a mystery to me). The logged message is a bit different, however:

Feb 23 22:39:07 gklab-20-082 rpc.gssd[1537]: ERROR: unable to resolve 172.28.168.84 to hostname: Name or service not known
Feb 23 22:39:07 gklab-20-082 rpc.gssd[1537]: ERROR: failed to parse nfs/clnt0/info
Feb 23 22:39:07 gklab-20-082 rpc.gssd[1537]: ERROR: can't openat nfs/clnt0: No such file or directory
Feb 23 22:39:07 gklab-20-082 rpc.gssd[1537]: ERROR: unable to resolve 172.28.168.84 to hostname: Name or service not known
Feb 23 22:39:07 gklab-20-082 rpc.gssd[1537]: ERROR: failed to parse nfs/clnt1/info
Feb 23 22:39:07 gklab-20-082 rpc.gssd[1537]: ERROR: unable to resolve 172.28.168.84 to hostname: Name or service not known
Feb 23 22:39:07 gklab-20-082 rpc.gssd[1537]: ERROR: failed to parse nfs/clnt1/info

rpc.gssd restart helps here as well.

/etc/hosts only holds 127.0.0.1 and ::1 addresses. dig, host and nslookup correctly resolve the above mentioned address to the nfs server name and forward resolution points back to the same address.

The client uses DHCP for interface configuration, and I wouldn't be surprised if the problem were a race between the DHCP client assigning the address to the interface and updating the configuration (/etc/resolv.conf) on one side, and rpc.gssd on the other side starting so early that it doesn't get the DNS server addresses from a not-yet-existing (or old) /etc/resolv.conf.

Comment 3 Tomasz Kepczynski 2017-02-23 22:08:39 UTC
Ok, the easiest way to reproduce:

1. Edit /etc/resolv.conf so it is empty or move it somewhere else.
2. Restart rpc.gssd (I did systemctl restart nfs-secure).
3. Restore /etc/resolv.conf to working condition so the resolution works.
4. Attempt to mount nfs share with kerberos security.
5. The mount fails and the log file contains the above mentioned entries.

Apparently rpc.gssd only reads /etc/resolv.conf when it starts and caches this somewhere. I am not sure this is how it is supposed to work, and it appears to be error-prone.
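
If the stale-configuration theory above is right, one possible mitigation inside a long-running daemon is to re-read the resolver configuration and retry once when a reverse lookup fails. The sketch below is not taken from nfs-utils and is not a proposed patch: getnameinfo_reload is a hypothetical wrapper name. It assumes a glibc whose stub resolver does not notice changes to /etc/resolv.conf on its own (to my knowledge, automatic reloading arrived in glibc 2.26, which is newer than the glibc 2.17 that RHEL 7 is based on), and it relies on res_init() to force the reload.

```c
#include <netdb.h>
#include <resolv.h>
#include <sys/socket.h>

/* Hypothetical wrapper: reverse-resolve sa with NI_NAMEREQD. If the lookup
 * fails in a way that a stale resolver configuration could explain, call
 * res_init() to re-read /etc/resolv.conf and try exactly once more.
 * Returns 0 on success or a getnameinfo() EAI_* error code. */
int getnameinfo_reload(const struct sockaddr *sa, socklen_t salen,
                       char *host, socklen_t hostlen)
{
    int err = getnameinfo(sa, salen, host, hostlen, NULL, 0, NI_NAMEREQD);

    if (err == EAI_NONAME || err == EAI_AGAIN || err == EAI_FAIL) {
        /* Assumption: the failure may come from a configuration that was
         * loaded before the network (and resolv.conf) was ready. */
        if (res_init() != 0)
            return err;
        err = getnameinfo(sa, salen, host, hostlen, NULL, 0, NI_NAMEREQD);
    }
    return err;
}
```

The retry is deliberately limited to a single attempt so that a host without any reverse record (for example a SLAAC address) still fails quickly instead of looping. Errors unrelated to name service, such as an unsupported address family, are returned without touching the resolver state.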

Comment 4 Mark Crossland 2018-01-16 12:19:44 UTC
I can reproduce this from a fresh boot on RHEL 7.4 clients that use DHCP. I can confirm that restarting the rpc-gssd service then fixes the issue until the box is rebooted again.

I can also reproduce using the steps outlined by Tomasz Kepczynski.

Comment 5 Yongcheng Yang 2018-01-17 02:18:34 UTC
JFYI, we recently updated some systemd scripts of nfs-utils to resolve a similar issue:
https://bugzilla.redhat.com/show_bug.cgi?id=1409012#c2

My 2 cents is that maybe rpc-gssd.service needs that update too.

P.S. I just checked that the current upstream code is the same as RHEL 7.4. Possibly we need some experts to submit an upstream patch first.

Comment 6 Rinku 2018-01-26 05:15:48 UTC
Increasing the bug priority as it has been quite some time since the bug was opened and the customer needs a fix.

Comment 7 Steve Dickson 2018-06-21 13:46:34 UTC
(In reply to Rinku from comment #6)
> Increasing the bug priority as it has been quite some time since the bug was
> opened and the customer needs a fix.

I see this error when rpc.gssd is started without an /etc/resolv.conf:
ERROR: unable to resolve 172.31.1.60 to hostname: Name or service not known
ERROR: failed to parse nfs/clntb/info

but when I restore /etc/resolv.conf and do a krb5 mount... it works!

Comment 9 Brian J. Murrell 2018-10-05 13:59:55 UTC
I see this all the time for IPv6 SLAAC configured hosts.  These are typically not going to have hostnames resolved from their IP address.

What exactly is rpc.gssd trying to resolve IP addresses into names for?

What triggers this resolution to need to happen?

Comment 10 Tomasz Kepczynski 2018-10-05 16:47:58 UTC
To cut the story short - Kerberos HEAVILY depends on name resolution.

Comment 11 Brian J. Murrell 2018-10-05 17:01:22 UTC
So TL;DR: rpc.gssd and therefore NFS4, etc. are all going to be quite incompatible with SLAAC unless there is some mechanism in SLAAC to register reverse address records from SLAAC obtained IP addresses.

Comment 14 Mark Crossland 2020-01-08 06:16:05 UTC
Please could you expand on why this is "NOTABUG"? It fails to work from a fresh boot unless you do a manual restart of the rpc.gssd service, which means that mounting NFS shares using Kerberos also does not work without manual intervention. That means this functionality fails to work in a DHCP environment (ours connects to Active Directory) where /etc/resolv.conf is generated by NetworkManager.

Comment 15 Brian J. Murrell 2020-01-08 11:49:15 UTC
Agree with Mark Crossland.  My most recent question(s) were not even answered.

Comment 16 Dave Wysochanski 2020-01-08 12:30:43 UTC
(In reply to Mark Crossland from comment #14)
> Please could you expand on why this is "NOTABUG"? It fails to work from a
> fresh boot unless you do a manual restart of the rpc.gssd service, which
> means that mounting NFS shares using Kerberos also does not work without
> manual intervention. That means this functionality fails to work in a DHCP
> environment (ours connects to Active Directory) where /etc/resolv.conf is
> generated by NetworkManager.

Can you still reproduce this problem with:
- the latest RHEL7 release (7.7)
- a working DNS environment

This bug was reported 3 years ago, on a very early version of RHEL7, and was cloned from a Fedora bug. It had no activity for over a year. Normally critical bugs seen by many users have more activity than this.

Comment 17 Brian J. Murrell 2020-01-08 13:15:21 UTC
Can you define:

- a working DNS environment

Does that just mean there is a resolver available or does it mean that it has to have (valid and accurate) reverse mappings for any host that tries to use NFS?

The latter is likely not/never (but I never say never) going to exist with things like IPv6 SLAAC.

Comment 18 Waheed Barghouthi 2020-05-04 13:15:48 UTC
I think it would be great to get an answer to the questions below, asked by Brian J. Murrell.

What exactly is rpc.gssd trying to resolve IP addresses into names for?

What triggers this resolution to need to happen?

Comment 19 JianHong Yin 2023-08-14 12:11:30 UTC
(In reply to Waheed Barghouthi from comment #18)
> I think it would be great to get an answer to the questions below, asked by
> Brian J. Murrell.
> 
> What exactly is rpc.gssd trying to resolve IP addresses into names for?
> 
> What triggers this resolution to need to happen?

I'm also curious, can anyone kindly answer these questions :)

