Bug 1523866

Summary: autofs with NIS logs add_host_addrs: hostname lookup failed: Name or service not known/No address associated with hostname
Product: Fedora
Component: autofs
Version: 27
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: medium
Priority: unspecified
Keywords: Reopened
Reporter: RobbieTheK <rkudyba>
Assignee: Ian Kent <ikent>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: ikent
Fixed In Version: autofs-5.1.4-5.fc26 autofs-5.1.4-5.fc27
Last Closed: 2018-11-30 21:38:15 UTC
Type: Bug
Attachments: tcpdump cap file

Description RobbieTheK 2017-12-08 21:08:48 UTC
All of our Fedora 26 servers and workstations log the messages below several times a day. I tried enabling autofs debug logging, but it shows no warnings or errors. I linked an old bug that might be related.

automount[xxxx]: add_host_addrs: hostname lookup failed: Name or service not known
automount[xxxx]: add_host_addrs: hostname lookup failed: No address associated with hostname

In /etc/hosts we commented out the IPv6 loopback entry just in case it was causing this:
#::1    localhost.localdomain   localhost.localdomain   localhost6      localhost6.localdomain6 localhost   

/etc/hostname has the correct value with the FQDN. Could that be related?

/etc/nsswitch.conf has:
automount:  files nis

Anything else I can provide?
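
For reference, a quick way to check what the resolver returns for the host's own name (assuming the FQDN in /etc/hostname is the name in question; "getent ahosts" exercises getaddrinfo(3), the same call automount uses):

getent hosts $(cat /etc/hostname)
getent ahosts $(cat /etc/hostname)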

Comment 1 Ian Kent 2017-12-09 01:24:14 UTC
Can you provide the debug log? I might be able to get
a better picture of what is going on at the time from
it.

I take it you're saying this is new, in that it suddenly
started happening?

Comment 2 RobbieTheK 2017-12-09 01:41:57 UTC
(In reply to Ian Kent from comment #1)
> Can you provide the debug log? I might be able to get
> a better picture of what is going on at the time from
> it.

Have a look here: https://pastebin.com/vgP1ZNWJ

> I take it you're saying this is new, in that it suddenly
> started happening?

Oh no, I've been seeing this log for a couple of years. I could never reproduce it or figure out why, so I finally decided to open a bug report to see if someone knew about it.

Comment 3 Ian Kent 2017-12-09 01:50:34 UTC
(In reply to RobbieTheK from comment #2)
> (In reply to Ian Kent from comment #1)
> > Can you provide the debug log? I might be able to get
> > a better picture of what is going on at the time from
> > it.
> 
> Have a look here: https://pastebin.com/vgP1ZNWJ

That's not really useful.

What I was hoping for is a longer term log to try and get
a picture of what happens as time passes in case there is
something unusual.

> 
> > I take it you're saying this is new, in that it suddenly
> > started happening?
> 
> Oh no, I've been seeing this log for a couple of years. I could never
> reproduce it or figure out why, so I finally decided to open a bug report to
> see if someone knew about it.

I'll probably need to add some instrumentation around those
calls to check what's being passed to them to start with.

Ian

Comment 4 RobbieTheK 2017-12-09 01:56:44 UTC
> What I was hoping for is a longer term log to try and get
> a picture of what happens as time passes in case there is
> something unusual.

Should I just let the logs run with debug enabled for a couple of days?


> > > I take it you're saying this is new, in that it suddenly
> > > started happening?
> > 
> > Oh no, I've been seeing this log for a couple of years. I could never
> > reproduce it or figure out why, so I finally decided to open a bug report to
> > see if someone knew about it.
> 
> I'll probably need to add some instrumentation around those
> calls to check what's being passed to them to start with.

Should I wait for a debug RPM to install?

Comment 5 Ian Kent 2017-12-09 02:15:34 UTC
(In reply to RobbieTheK from comment #4)
> > What I was hoping for is a longer term log to try and get
> > a picture of what happens as time passes in case there is
> > something unusual.
> 
> Should I just let the logs run with debug enabled for a couple of days?

Yes please.

> 
> 
> > > > I take it you're saying this is new, in that it suddenly
> > > > started happening?
> > > 
> > > Oh no, I've been seeing this log for a couple of years. I could never
> > > reproduce it or figure out why, so I finally decided to open a bug report to
> > > see if someone knew about it.
> > 
> > I'll probably need to add some instrumentation around those
> > calls to check what's being passed to them to start with.
> 
> Should I wait for a debug RPM to install?

Yes, I'll put something together.

As you probably guessed, this isn't the first time this has
been seen.

Looking through my list of upstream patches (I usually make
changes to the upstream source and then apply them to the
downstream package) I see a patch from quite a while ago for
this problem.

It's way down near the bottom of my patch list because I
never got any feedback about whether it helped.

It is kind of a long shot because it is based on the
assumption there's a bug in getaddrinfo(3) which is
unlikely.

I'll add a couple more prints around the calls you're seeing
the problem with and build an rpm that includes it.

Ian

Comment 6 Ian Kent 2017-12-09 02:57:45 UTC
(In reply to Ian Kent from comment #5)
> > 
> > Should I wait for a debug RPM to install?
> 
> Yes, I'll put something together.

Could you give this package a try please:
https://koji.fedoraproject.org/koji/taskinfo?taskID=23603751

snip ...

> 
> I'll add a couple more prints around the calls you're seeing
> the problem with and build an rpm that includes it.

Assuming this isn't a bug in getaddrinfo(3) what I'm looking
for is if there are any extra characters in the name being
looked up (or garbage which might require more work to check).
So keep an eye on the log prints for the name lookup failure
sites.

The other possibility is that a lookup is coming in at
shutdown at just the wrong time and the name has already
been freed but that's much harder to check out. An overview
of what's going on from the debug log might help in checking
if this is the case too.

Anyway, this is (and has been) difficult to diagnose so let's
hope for the best ;)

Ian
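
For reference, a minimal sketch of the kind of instrumentation being described, assuming the failing call site hands a plain hostname string to getaddrinfo(3). This is illustrative only, not the actual add_host_addrs() code or the debug patch:

#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Log the exact bytes of the name before looking it up, so stray or
 * garbage characters would show up right next to the failure message. */
static void debug_lookup(const char *name)
{
        struct addrinfo hints, *ai = NULL;
        size_t len = strlen(name);
        int ret;

        fprintf(stderr, "lookup name \"%s\" (len %zu):", name, len);
        for (size_t i = 0; i < len; i++)
                fprintf(stderr, " %02x", (unsigned char) name[i]);
        fprintf(stderr, "\n");

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;

        ret = getaddrinfo(name, NULL, &hints, &ai);
        if (ret)
                fprintf(stderr, "hostname lookup failed: %s\n", gai_strerror(ret));
        else
                freeaddrinfo(ai);
}

int main(int argc, char **argv)
{
        if (argc > 1)
                debug_lookup(argv[1]);
        return 0;
}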

Comment 7 RobbieTheK 2017-12-09 03:06:20 UTC
(In reply to Ian Kent from comment #6)
> (In reply to Ian Kent from comment #5)
> > > 
> > > Should I wait for a debug RPM to install?
> > 
> > Yes, I'll put something together.
> 
> Could you give this package a try please:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=23603751
> 
> snip ...

Done and installed/running.

> > I'll add a couple more prints around the calls you're seeing
> > the problem with and build an rpm that includes it.
> 
> Assuming this isn't a bug in getaddrinfo(3) what I'm looking
> for is if there are any extra characters in the name being
> looked up (or garbage which might require more work to check).
> So keep an eye on the log prints for the name lookup failure
> sites.
> 
> The other possibility is that a lookup is coming in at
> shutdown at just the wrong time and the name has already
> been freed but that's much harder to check out. An overview
> of what's going on from the debug log might help in checking
> if this is the case too.
> 
> Anyway, this is (and has been) difficult to diagnose so let's
> hope for the best ;)

I noted the other BZ 1187165 that had a reference to getaddrinfo. Should any other services, like NIS/ypbind, be set to debug logging?

Comment 8 RobbieTheK 2017-12-09 03:22:07 UTC
I put the RPM on our NIS server and enabled debug. I mention this as the NIS master gets these errors too: ypserv[xxxxx]: refused connect from x.x.x.x.:33934 to procedure ypproc_match (ourdomain,shadow.byname;-1) which may or may not be related/red herring.

Comment 9 RobbieTheK 2017-12-09 03:29:54 UTC
Also possibly related or not is the Debian bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=710490

I emailed the submitter and asked him if he ever got a resolution and here was his reply:
"If my analysis is right the cause is autofs/automount or whatever doing host name lookups before the network is ready and caching the host name lookup failure.

The quick and dirty fix is to restart autofs when the network is really ready."

But he also used DHCP.
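
For what it's worth, the usual way to express that ordering with systemd is a drop-in that delays autofs until the network is actually online. This is only a hedged sketch (the drop-in path and the ypbind unit are assumptions for this NIS setup, and it only helps if a *-wait-online service is enabled for whatever manages the network here):

# /etc/systemd/system/autofs.service.d/wait-online.conf  (hypothetical drop-in)
[Unit]
Wants=network-online.target
After=network-online.target ypbind.service

# then reload systemd and restart autofs:
systemctl daemon-reload
systemctl restart autofs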

Comment 10 Ian Kent 2017-12-09 03:45:20 UTC
That certainly gives us some more things to check.

Problems relating to the network not being available at startup
have been around for a while too, in various forms, and have
proven difficult to deal with, that's for sure.

There are some changes in autofs 5.1.3 that attempt to
work around some cases of the "network not ready at startup"
type problem but there were a couple of annoying regressions
introduced too.

I'm still struggling to get autofs 5.1.4 out which should
fix the regressions and allow us to use the network not
ready changes.

I think I'm fairly close to getting 5.1.4 out.

In the meantime I'd really like to know if the getaddrinfo()
usage change makes a difference.

I'm pretty sure that, at the time, the glibc folks said there
was no bug like this in getaddrinfo(), but I wanted to check
anyway (and failed to get any feedback).

Ian

Comment 11 Ian Kent 2017-12-09 03:51:08 UTC
(In reply to Ian Kent from comment #10)
> 
> Problems relating to the network not being available at startup
> have been around for a while too, in various forms, and have
> proven difficult to deal with, that's for sure.
> 

The other thing is that, IIRC, in at least one other bug (and
this was quite a while ago) there was no evidence in debug logs
I looked at that this was related to autofs startup. The logged
error messages appeared intermittent and occurred well after
autofs had been started.....

Comment 12 RobbieTheK 2017-12-09 03:58:55 UTC
(In reply to Ian Kent from comment #11)
> (In reply to Ian Kent from comment #10)
> > 
> > Problems relating to the network not being available at startup
> > have been around for a while too, in various forms, and have
> > proven difficult to deal with, that's for sure.
> > 
> 
> The other thing is that, IIRC, in at least one other bug (and
> this was quite a while ago) there was no evidence in debug logs
> I looked at that this was related to autofs startup. The logged
> error messages appeared intermittent and occurred well after
> autofs had been started.....

This one? https://bugzilla.redhat.com/show_bug.cgi?id=682833 You replied in the thread.

Comment 13 RobbieTheK 2017-12-09 04:03:26 UTC
Does this "mount failed with error code 16, retrying with the -f option" message mean anything useful?

Dec  8 23:01:17 dsm automount[26568]: st_expire: state 1 path /u
Dec  8 23:01:17 dsm automount[26568]: expire_proc: exp_proc = 140023695636224 path /u
Dec  8 23:01:17 dsm automount[26568]: expire_proc_indirect: expire /u/erdos/kudyba/.gvfs
Dec  8 23:01:17 dsm automount[26568]: handle_packet: type = 4
Dec  8 23:01:17 dsm automount[26568]: handle_packet_expire_indirect: token 94742, name pascal
Dec  8 23:01:17 dsm automount[26568]: expiring path /u/pascal
Dec  8 23:01:17 dsm automount[26568]: umount_multi: path /u/pascal incl 1
Dec  8 23:01:17 dsm automount[26568]: umount_subtree_mounts: unmounting dir = /u/pascal
Dec  8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is busy
Dec  8 23:01:17 dsm automount[26568]: spawn_umount: mount failed with error code 16, retrying with the -f option
Dec  8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is busy
Dec  8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is busy
Dec  8 23:01:17 dsm automount[26568]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Dec  8 23:01:17 dsm automount[26568]: expired /u/pascal
Dec  8 23:01:17 dsm automount[26568]: dev_ioctl_send_ready: token = 94742
Dec  8 23:01:17 dsm automount[26568]: expire_proc_indirect: expire /u/pascal

Comment 14 Ian Kent 2017-12-09 04:07:34 UTC
(In reply to RobbieTheK from comment #8)
> I put the RPM on our NIS server and enabled debug. I mention this as the NIS
> master gets these errors too: ypserv[xxxxx]: refused connect from
> x.x.x.x.:33934 to procedure ypproc_match (ourdomain,shadow.byname;-1) which
> may or may not be related/red herring.

Yeah, but that looks like a name lookup failure, possibly due
to an access refusal within a securenet setup.

ypmatch is used in the lookup module to get the NIS map entry
based on the key name. If that fails then I don't think autofs
will have a map entry location so it should fail at that point.

The call site where these name lookups are failing is where the
map entry location is being broken into a list of entries that
can be tried as mounts. Without a map entry location (the NIS
entry value field) I don't think autofs will even get to the
name lookup calls we are seeing the problem with.

But if we do see invalid or garbage characters in the names
autofs is trying to lookup (and failing) then it will be worth
looking into this further.

Ian
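
As an aside, the map lookup described above can be reproduced from the command line with ypmatch; a hedged example, using a key and a map name that appear later in this report:

ypmatch <key> auto.home        # e.g. ypmatch erdos auto.home
ypmatch -k <key> auto.home     # -k also displays the key with the value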

Comment 15 Ian Kent 2017-12-09 04:27:12 UTC
(In reply to RobbieTheK from comment #12)
> (In reply to Ian Kent from comment #11)
> > (In reply to Ian Kent from comment #10)
> > > 
> > > Problems relating to the network not being available at startup
> > > have been around for a while too, in various forms, and have
> > > proven difficult to deal with, that's for sure.
> > > 
> > 
> > The other thing is that, IIRC, in at least one other bug (and
> > this was quite a while ago) there was no evidence in debug logs
> > I looked at that this was related to autofs startup. The logged
> > error messages appeared intermittent and occurred well after
> > autofs had been started.....
> 
> This one? https://bugzilla.redhat.com/show_bug.cgi?id=682833 You replied in
> the thread.

Ha, not that one, but that is interesting.

The suggestion is that autofs handles transient network failures
badly.

But, as seen in the bug, I was unable to find out why that occurred,
and if the host name information can't be retrieved the list
of potential mounts can't be constructed.

If this is in fact what's happening in your case, perhaps some retry
logic is needed to cover short network-not-available events ....
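
A hedged sketch of what such retry logic might look like around getaddrinfo(3): retry only when the resolver reports a temporary failure (EAI_AGAIN). Note that the errors logged in this report correspond to EAI_NONAME/EAI_NODATA, so this would only help if a transient outage actually surfaced as EAI_AGAIN; it is an illustration, not autofs code:

#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Retry a host lookup a few times if the failure is marked temporary. */
static int lookup_with_retry(const char *name, struct addrinfo **out)
{
        struct addrinfo hints;
        int ret, tries = 3;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;

        do {
                ret = getaddrinfo(name, NULL, &hints, out);
                if (ret != EAI_AGAIN)
                        break;          /* success or a permanent error */
                sleep(1);               /* short pause before retrying */
        } while (--tries);

        return ret;                     /* 0 on success, an EAI_* code otherwise */
}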

Comment 16 Ian Kent 2017-12-09 04:35:08 UTC
(In reply to RobbieTheK from comment #13)
> Does this "mount failed with error code 16, retrying with the -f option"
> message mean anything useful?

That's actually a umount not a mount.
The message text is wrong and needs to be fixed.
I'll do that for 5.1.4.

> 
> Dec  8 23:01:17 dsm automount[26568]: st_expire: state 1 path /u
> Dec  8 23:01:17 dsm automount[26568]: expire_proc: exp_proc =
> 140023695636224 path /u
> Dec  8 23:01:17 dsm automount[26568]: expire_proc_indirect: expire
> /u/erdos/kudyba/.gvfs
> Dec  8 23:01:17 dsm automount[26568]: handle_packet: type = 4
> Dec  8 23:01:17 dsm automount[26568]: handle_packet_expire_indirect: token
> 94742, name pascal
> Dec  8 23:01:17 dsm automount[26568]: expiring path /u/pascal
> Dec  8 23:01:17 dsm automount[26568]: umount_multi: path /u/pascal incl 1
> Dec  8 23:01:17 dsm automount[26568]: umount_subtree_mounts: unmounting dir
> = /u/pascal
> Dec  8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is
> busy
> Dec  8 23:01:17 dsm automount[26568]: spawn_umount: mount failed with error
> code 16, retrying with the -f option
> Dec  8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is
> busy
> Dec  8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is
> busy
> Dec  8 23:01:17 dsm automount[26568]: Unable to update the mtab file,
> /proc/mounts and /etc/mtab will differ
> Dec  8 23:01:17 dsm automount[26568]: expired /u/pascal
> Dec  8 23:01:17 dsm automount[26568]: dev_ioctl_send_ready: token = 94742
> Dec  8 23:01:17 dsm automount[26568]: expire_proc_indirect: expire /u/pascal

I don't see how an expire that failed because the mount to
be umounted became busy could lead to the name lookup problem.

If we do need to investigate this further then we would need a
core of automount to check the structure contents and so forth.
And enough information for me to set up a machine close enough
to the environment where the core was taken.

Comment 17 RobbieTheK 2017-12-10 03:00:58 UTC
Well, after 24 hours there are no errors besides the same "spawn_umount: mount failed with error code 16"; I'll keep it going.

Dec  9 21:49:02 dsm automount[14325]: handle_packet: type = 4
Dec  9 21:49:02 dsm automount[14325]: handle_packet_expire_indirect: token 95768, name sobolev
Dec  9 21:49:02 dsm automount[14325]: expiring path /u/sobolev
Dec  9 21:49:02 dsm automount[14325]: umount_multi: path /u/sobolev incl 1
Dec  9 21:49:02 dsm automount[14325]: umount_subtree_mounts: unmounting dir = /u/sobolev
Dec  9 21:49:02 dsm automount[14325]: rm_unwanted_fn: removing directory /u/sobolev
Dec  9 21:49:02 dsm automount[14325]: expired /u/sobolev
Dec  9 21:49:02 dsm automount[14325]: dev_ioctl_send_ready: token = 95768
Dec  9 21:49:02 dsm automount[14325]: handle_packet: type = 4
Dec  9 21:49:02 dsm automount[14325]: handle_packet_expire_indirect: token 95769, name erdos
Dec  9 21:49:02 dsm automount[14325]: expiring path /u/erdos
Dec  9 21:49:02 dsm automount[14325]: umount_multi: path /u/erdos incl 1
Dec  9 21:49:02 dsm automount[14325]: umount_subtree_mounts: unmounting dir = /u/erdos
Dec  9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec  9 21:49:02 dsm automount[14325]: spawn_umount: mount failed with error code 16, retrying with the -f option
Dec  9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec  9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec  9 21:49:02 dsm automount[14325]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Dec  9 21:49:02 dsm automount[14325]: expired /u/erdos
Dec  9 21:49:02 dsm automount[14325]: dev_ioctl_send_ready: token = 95769
Dec  9 21:49:02 dsm automount[14325]: expire_proc_indirect: expire /u/sobolev
Dec  9 21:49:02 dsm automount[14325]: expire_proc_indirect: expire /u/erdos
Dec  9 21:49:02 dsm automount[14325]: 2 remaining in /u
Dec  9 21:49:02 dsm automount[14325]: expire_cleanup: got thid 140137134475008 path /u stat 5
Dec  9 21:49:02 dsm automount[14325]: expire_cleanup: sigchld: exp 140137134475008 finished, switching from 2 to 1

Comment 18 RobbieTheK 2017-12-10 03:03:53 UTC
And you replied in this other thread for SSSD being the culprit: https://bugzilla.redhat.com/show_bug.cgi?id=1189767

Comment 19 Ian Kent 2017-12-10 09:28:17 UTC
(In reply to RobbieTheK from comment #17)
> Well, after 24 hours there are no errors besides the same "spawn_umount: mount
> failed with error code 16"; I'll keep it going.
> 
> Dec  9 21:49:02 dsm automount[14325]: handle_packet: type = 4
> Dec  9 21:49:02 dsm automount[14325]: handle_packet_expire_indirect: token
> 95768, name sobolev
> Dec  9 21:49:02 dsm automount[14325]: expiring path /u/sobolev
> Dec  9 21:49:02 dsm automount[14325]: umount_multi: path /u/sobolev incl 1
> Dec  9 21:49:02 dsm automount[14325]: umount_subtree_mounts: unmounting dir
> = /u/sobolev
> Dec  9 21:49:02 dsm automount[14325]: rm_unwanted_fn: removing directory
> /u/sobolev
> Dec  9 21:49:02 dsm automount[14325]: expired /u/sobolev
> Dec  9 21:49:02 dsm automount[14325]: dev_ioctl_send_ready: token = 95768
> Dec  9 21:49:02 dsm automount[14325]: handle_packet: type = 4
> Dec  9 21:49:02 dsm automount[14325]: handle_packet_expire_indirect: token
> 95769, name erdos
> Dec  9 21:49:02 dsm automount[14325]: expiring path /u/erdos
> Dec  9 21:49:02 dsm automount[14325]: umount_multi: path /u/erdos incl 1
> Dec  9 21:49:02 dsm automount[14325]: umount_subtree_mounts: unmounting dir
> = /u/erdos
> Dec  9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is
> busy
> Dec  9 21:49:02 dsm automount[14325]: spawn_umount: mount failed with error
> code 16, retrying with the -f option
> Dec  9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is
> busy
> Dec  9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is
> busy
> Dec  9 21:49:02 dsm automount[14325]: Unable to update the mtab file,
> /proc/mounts and /etc/mtab will differ
> Dec  9 21:49:02 dsm automount[14325]: expired /u/erdos
> Dec  9 21:49:02 dsm automount[14325]: dev_ioctl_send_ready: token = 95769
> Dec  9 21:49:02 dsm automount[14325]: expire_proc_indirect: expire /u/sobolev
> Dec  9 21:49:02 dsm automount[14325]: expire_proc_indirect: expire /u/erdos
> Dec  9 21:49:02 dsm automount[14325]: 2 remaining in /u
> Dec  9 21:49:02 dsm automount[14325]: expire_cleanup: got thid
> 140137134475008 path /u stat 5
> Dec  9 21:49:02 dsm automount[14325]: expire_cleanup: sigchld: exp
> 140137134475008 finished, switching from 2 to 1

This is almost certainly a completely different problem.

It might be related to mount propagation to private namespaces,
which you may not even be aware you are using. If you are aware
you're using private namespaces in some way, then for them to be
created properly when other mounts (including autofs mounts) are
present, whatever you are using must clean up unused mounts as
part of the namespace creation (or you need to do it yourself);
the kernel can't do that.

The autofs kernel module will still select dentries for expire
in these cases. So far I have been unable to come up with an
acceptable method to correctly check this type of dentry for
expiry.

But it is a false positive which shouldn't cause other problems.

What is the additional problem this causes? I don't see any
explanation of that here, and it's way off topic for this
bug too.

Ian
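
If that ever needs chasing, a couple of hedged checks for the "mount held in another namespace" situation (the path is taken from the log above; whether anything interesting shows up depends on how the namespaces were created):

findmnt -o TARGET,FSTYPE,PROPAGATION /u/erdos
grep -l ' /u/erdos ' /proc/*/mountinfo   # which processes' mount namespaces still hold the mount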

Comment 20 Ian Kent 2017-12-10 09:37:38 UTC
(In reply to RobbieTheK from comment #18)
> And you replied in this other thread for SSSD being the culprit:
> https://bugzilla.redhat.com/show_bug.cgi?id=1189767

Culprit of the DNS lookup failure?
It doesn't look like that's what I said and ...

That's a completely different bug, it's nothing to do with
the problem being discussed here.

Note that the sssd network not ready problem hasn't been
resolved in RHEL-7 and is very unlikely to find its way back
to earlier releases when (and if) it is.

Workarounds in autofs for this that went into RHEL-7.4 and
received further fixes in RHEL-7.5 are not likely to
find their way into earlier releases either because the
changes are too large to meet z-stream release inclusion
policy.

Comment 21 Ian Kent 2017-12-10 09:43:16 UTC
(In reply to Ian Kent from comment #20)
> (In reply to RobbieTheK from comment #18)
> > And you replied in this other thread for SSSD being the culprit:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1189767
> 
> Culprit of the DNS lookup failure?
> It doesn't look like that's what I said and ...
> 
> That's a completely different bug, it's nothing to do with
> the problem being discussed here.
> 
> Note that the sssd network not ready problem hasn't been
> resolved in RHEL-7 and is very unlikely to find its way back
> to earlier releases when (and if) it is.
> 
> Workarounds in autofs for this that went into RHEL-7.4 and
> received further fixes in RHEL-7.5 are not likely to
> find their way into earlier releases either because the
> changes are too large to meet z-stream release inclusion
> policy.

Oops, I digress, this is Fedora.

The sssd "Network not available at startup" fixes are in Fedora
26, I believe, so this shouldn't be a problem.

Whether name lookups still fail because sss is not yet started
is not an autofs problem.

Comment 22 RobbieTheK 2017-12-11 19:51:21 UTC
So, after 3 days, still no errors other than those reported previously:
Dec 11 09:25:52 dsm automount[14325]: expiring path /u/erdos
Dec 11 09:25:52 dsm automount[14325]: umount_multi: path /u/erdos incl 1
Dec 11 09:25:52 dsm automount[14325]: umount_subtree_mounts: unmounting dir = /u/erdos
Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec 11 09:25:52 dsm automount[14325]: spawn_umount: mount failed with error code 16, retrying with the -f option
Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec 11 09:25:52 dsm automount[14325]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Dec 11 09:25:52 dsm automount[14325]: expired /u/erdos
Dec 11 09:25:52 dsm automount[14325]: dev_ioctl_send_ready: token = 97331
Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/sobolev
Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/pascal
Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/erdos
Dec 11 09:25:53 dsm automount[14325]: 3 remaining in /u

Note that erdos is our NFS file server.

There are several instances of this error:
ypserv[xxxxx]: refused connect from erdos:54550 to procedure ypproc_match (divscimath,shadow.byname;-1)

Notice the server hostname is the same as the umount (erdos being our file server) above. Another red herring?

Comment 23 Ian Kent 2017-12-12 03:10:03 UTC
(In reply to RobbieTheK from comment #22)
> So, after 3 days, still no errors other than those reported previously:
> Dec 11 09:25:52 dsm automount[14325]: expiring path /u/erdos
> Dec 11 09:25:52 dsm automount[14325]: umount_multi: path /u/erdos incl 1
> Dec 11 09:25:52 dsm automount[14325]: umount_subtree_mounts: unmounting dir
> = /u/erdos
> Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is
> busy
> Dec 11 09:25:52 dsm automount[14325]: spawn_umount: mount failed with error
> code 16, retrying with the -f option
> Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is
> busy
> Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is
> busy
> Dec 11 09:25:52 dsm automount[14325]: Unable to update the mtab file,
> /proc/mounts and /etc/mtab will differ
> Dec 11 09:25:52 dsm automount[14325]: expired /u/erdos
> Dec 11 09:25:52 dsm automount[14325]: dev_ioctl_send_ready: token = 97331
> Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/sobolev
> Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/pascal
> Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/erdos
> Dec 11 09:25:53 dsm automount[14325]: 3 remaining in /u
> 
> Note that erdos is our NFS file server.
> 
> There are several instances of this error:
> ypserv[xxxxx]: refused connect from erdos:54550 to procedure ypproc_match
> (divscimath,shadow.byname;-1)

I think I've already said that I don't know what's causing these.
They look like NIS access denials and are unlikely to be related.

> 
> Notice the server hostname is the same as the umount (erdos being our file
> server) above. Another red herring?

All I can say about the above log messages is to offer explanations
of the causes that I'm aware of, which I've done.

Another point, though, is that a mount being seen as in use when a umount
is attempted is local to the host; it shouldn't relate to any network
connection problem or name lookup problem, transient or otherwise.

Comment 24 RobbieTheK 2017-12-12 15:50:09 UTC
Is it possible that the debug version you gave me has something that fixed the "add_host_addrs: hostname lookup failed: Name or service not known/No address associated with hostname" errors? I have not seen them on the server and workstation that I installed the RPM you shared with me, but the other servers continue to log them.

Not sure if it's worth mentioning, but NetworkManager is disabled and the network service is running; I don't think there's any correlation to the error.

As for that ypproc_match error, you gave me a clue, as did the comment at http://www.thegeekstuff.com/2011/11/linux-nis-server-and-client#comment-4486354. In /etc/ypserv.conf I changed:
IP-of-NIS-server : * : shadow.byname : port 
to:
*                        : *	   : shadow.byname    : none

And I believe those errors are gone.

Comment 25 Ian Kent 2017-12-12 17:16:26 UTC
(In reply to RobbieTheK from comment #24)
> Is it possible that the debug version you gave me has something that fixed
> the "add_host_addrs: hostname lookup failed: Name or service not known/No
> address associated with hostname" errors? I have not seen them on the server
> and workstation that I installed the RPM you shared with me, but the other
> servers continue to log them.

Well, yes, I thought that's what we were testing plus I added
some logging around the parameters to getaddrinfo(3) to see if
there was a problem with them.

See the description of the resurrected patch I added:
"autofs-5.1.3 - workaround getaddrinfo(3) ai_canonname bug
    
 There appears to be a bug in some versions of getaddrinfo() where the
 addrinfo structure gets garbage in ai_canonname when it is not requested
 by the lookup flags causing a crash.
    
 Try and work around this by always requesting ai_canonname for lookups
 and hope this doesn't affect the semantics of the autofs lookup usage."

I can't remember what led me to this conclusion when I originally
wrote this patch but, IIRC, I was told by the glibc folks this wasn't
the case but I wanted to check anyway.

We don't see a crash here but the assumption is that garbage in
ai_canonname may be causing an error return in this case.

I never got any test feedback for the change at that time.

Ian
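
For reference, a hedged illustration of the shape of that workaround (always request the canonical name so ai_canonname gets filled in rather than left untouched). This is not the actual autofs patch, just the idea:

#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Always set AI_CANONNAME so the library fills in ai_canonname instead
 * of (per the suspected bug) leaving garbage there. */
static int lookup_host(const char *name, struct addrinfo **out)
{
        struct addrinfo hints;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_flags = AI_CANONNAME;

        return getaddrinfo(name, NULL, &hints, out);
}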

Comment 26 RobbieTheK 2017-12-12 20:06:20 UTC
I think for our purposes your patch fixes the issue. Thanks so much for addressing this. I sometimes feel like these bug reports just go into the ether and auto-close once the Fedora release is EOL'ed (like many did this morning).

Is the next step for you to put this in the testing repo?

Comment 27 Ian Kent 2017-12-12 23:39:17 UTC
(In reply to RobbieTheK from comment #26)
> I think for our purposes your patch fixes the issue. Thanks so much for
> addressing this. I sometimes feel like these bug reports just go into the
> ether and auto-close once the Fedora release is EOL'ed (like many did this
> morning).

Yes, there's some of that going on with autofs too.

Often the bugs are fixed but not closed.
The 5.1.4 release should fix most known problems that have
been hanging around a bit too long .....

> 
> Is the next step for you to put this in the testing repo?

I'm hoping to release 5.1.4 around the end of the week so
I'd like to wait until then.

I didn't have this patch in the 5.1.4 patch queue when I
posted the patches to the autofs list for review, although I'm
thinking of including it now anyway.

In any case the patch will be added to the rpm.

Ian

Comment 28 Fedora Update System 2017-12-19 23:38:45 UTC
autofs-5.1.4-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-21cb07e1d3

Comment 29 Fedora Update System 2017-12-19 23:44:10 UTC
autofs-5.1.4-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-593cc5120d

Comment 30 Fedora Update System 2017-12-21 20:25:05 UTC
autofs-5.1.4-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-21cb07e1d3

Comment 31 Fedora Update System 2017-12-21 21:10:33 UTC
autofs-5.1.4-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-593cc5120d

Comment 32 Fedora Update System 2017-12-22 03:20:53 UTC
autofs-5.1.4-4.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-eda74692fc

Comment 33 Fedora Update System 2017-12-22 03:21:24 UTC
autofs-5.1.4-4.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-164b3ee23a

Comment 34 Fedora Update System 2017-12-22 12:43:12 UTC
autofs-5.1.4-4.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-eda74692fc

Comment 35 Fedora Update System 2017-12-22 13:16:02 UTC
autofs-5.1.4-4.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-164b3ee23a

Comment 36 RobbieTheK 2018-01-30 20:33:42 UTC
This still hasn't been pushed to the stable repos?

Comment 37 Fedora Update System 2018-02-01 07:17:28 UTC
autofs-5.1.4-5.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-eda74692fc

Comment 38 Fedora Update System 2018-02-01 07:41:59 UTC
autofs-5.1.4-5.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-164b3ee23a

Comment 39 Fedora Update System 2018-02-01 19:09:15 UTC
autofs-5.1.4-5.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-164b3ee23a

Comment 40 Fedora Update System 2018-02-01 19:30:09 UTC
autofs-5.1.4-5.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-eda74692fc

Comment 41 Fedora Update System 2018-02-09 11:27:26 UTC
autofs-5.1.4-5.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 42 Fedora Update System 2018-02-09 16:29:25 UTC
autofs-5.1.4-5.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 43 RobbieTheK 2018-02-21 15:48:10 UTC
Still seeing automount[31776]: add_host_addrs: hostname lookup failed: No address associated with hostname on all NIS clients (not the server).

Comment 44 Ian Kent 2018-02-22 00:51:33 UTC
(In reply to RobbieTheK from comment #43)
> Still seeing automount[31776]: add_host_addrs: hostname lookup failed: No
> address associated with hostname on all NIS clients (not the server).

That's a shame.

The change that was included in the package that you tested
is definitely included in the current Fedora autofs rpm so
I don't know what is causing this.

I'm wondering what we can do to collect useful information.

I'm not even sure that a core taken at the time this happens
will be helpful as I've looked at one before.

I wonder if a network packet trace would give us anything
useful ..... It would be difficult to sort through that but
it is worth a try. Let's try this first.

I also wonder if this is some sort of transient error where
a retry (or two) would fix it but that really shouldn't be
needed ....

Comment 45 RobbieTheK 2018-02-22 16:14:39 UTC
> I wonder if a network packet trace would give us anything
> useful ..... It would be difficult to sort through that but
> it is worth a try. Let's try this first.

So run a tcpdump? For how long? Any specific packets to capture with the -i flag? Or number of packets with -c?

Comment 46 Ian Kent 2018-02-22 23:43:52 UTC
(In reply to RobbieTheK from comment #45)
> > I wonder if a network packet trace would give us anything
> > useful ..... It would be difficult to sort through that but
> > it is worth a try. Let's try this first.
> 
> So run a tcpdump? For how long? Any specific packets to capture with the -i
> flag? Or number of packets with -c?

Yes, tcpdump would do.

Run it on a machine until you see one of those error messages
in the log.

Just grab everything; we can filter the packet capture when we read it.
It would be bad to miss something because of filtering.

Maybe something like:
tcpdump -i <net interface> -w <packet trace file>

Really, all I want to do is verify a DNS query is being
made, but I should be able to check the name being queried
is valid as well (in case it has become corrupt in some way).

TBH I'm pretty sure I have checked the validity of the name
long ago but I have to check again.

Ian
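
For reading the capture back afterwards, a couple of hedged examples (the file name is a placeholder):

tcpdump -nn -r packet-trace.pcap port 53     # DNS queries and answers
tcpdump -nn -r packet-trace.pcap port 111    # rpcbind/portmapper (NIS) traffic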

Comment 47 RobbieTheK 2018-02-24 05:56:32 UTC
Created attachment 1400139 [details]
tcpdump cap file

I can't believe it, I was able to catch it! Oddly it was just from an SSH login that uses key authentication. The tcpdump aligns with the second occurrence of the error at Feb 24 00:19:19.

Feb 24 00:16:37 workstation automount[7758]: add_host_addrs: hostname lookup failed: No address associated with hostname
Feb 24 00:16:39 workstation dbus-daemon[902]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.26454' (uid=0 pid=4803 comm="su - ")
Feb 24 00:16:39 workstation systemd[1]: Starting Fingerprint Authentication Daemon...
Feb 24 00:16:40 workstation dbus-daemon[902]: [system] Successfully activated service 'net.reactivated.Fprint'
Feb 24 00:16:40 workstation systemd[1]: Started Fingerprint Authentication Daemon.
Feb 24 00:16:44 workstation su[4803]: (to root) kudyba on pts/0
Feb 24 00:19:05 workstation kernel: device eno1 entered promiscuous mode
Feb 24 00:19:10 workstation sssd[kcm][2462]: Shutting down
Feb 24 00:19:12 workstation systemd[1]: Started SSSD Kerberos Cache Manager.
Feb 24 00:19:12 workstation sssd[kcm][6058]: Starting up
Feb 24 00:19:19 workstation automount[7758]: add_host_addrs: hostname lookup failed: No address associated with hostname

I do see these errors from tshark but don't want to send you down the wrong path/red herring:
Yellow Pages Service MATCH reply Error:YP_NOMAP
Yellow Pages Service MATCH reply Error:YP_NOKEY

Yellow Pages Service MATCH reply Error:YP_NOMAP
    [Program Version: 2]
    [V2 Procedure: MATCH (3)]
    Status: YP_NOMAP (-1)
    Value: <EMPTY>
        length: 0
        contents: <EMPTY>

Comment 48 Ian Kent 2018-03-01 23:36:41 UTC
(In reply to RobbieTheK from comment #47)
> Created attachment 1400139 [details]
> tcpdump cap file
> 
> I can't believe it, I was able to catch it! Oddly it was just from an SSH
> login that uses key authentication. The tcpdump aligns with the second
> occurrence of the error at Feb 24 00:19:19.
> 
> Feb 24 00:16:37 workstation automount[7758]: add_host_addrs: hostname lookup
> failed: No address associated with hostname
> Feb 24 00:16:39 workstation dbus-daemon[902]: [system] Activating via
> systemd: service name='net.reactivated.Fprint' unit='fprintd.service'
> requested by ':1.26454' (uid=0 pid=4803 comm="su - ")
> Feb 24 00:16:39 workstation systemd[1]: Starting Fingerprint Authentication
> Daemon...
> Feb 24 00:16:40 workstation dbus-daemon[902]: [system] Successfully
> activated service 'net.reactivated.Fprint'
> Feb 24 00:16:40 workstation systemd[1]: Started Fingerprint Authentication
> Daemon.
> Feb 24 00:16:44 workstation su[4803]: (to root) kudyba on pts/0
> Feb 24 00:19:05 workstation kernel: device eno1 entered promiscuous mode
> Feb 24 00:19:10 workstation sssd[kcm][2462]: Shutting down
> Feb 24 00:19:12 workstation systemd[1]: Started SSSD Kerberos Cache Manager.
> Feb 24 00:19:12 workstation sssd[kcm][6058]: Starting up
> Feb 24 00:19:19 workstation automount[7758]: add_host_addrs: hostname lookup
> failed: No address associated with hostname

I can't see anything in the packet trace that looks like
a problem and nothing name translation related.

> 
> I do see these errors from tshark but don't want to send you down the wrong
> path/red herring:
> Yellow Pages Service MATCH reply Error:YP_NOMAP
> Yellow Pages Service MATCH reply Error:YP_NOKEY
> 
> Yellow Pages Service MATCH reply Error:YP_NOMAP
>     [Program Version: 2]
>     [V2 Procedure: MATCH (3)]
>     Status: YP_NOMAP (-1)
>     Value: <EMPTY>
>         length: 0
>         contents: <EMPTY>

Yeah, AFAICS the YP_NOMAP errors are coming from shadow lookups.

It implies the shadow file isn't set up as one of the NIS maps
except that there are successful shadow map lookups in the trace.
I don't understand that but it shouldn't affect host name lookups.
Are you sure all your NIS replicas are receiving the shadow map?

The YP_NOKEY error reports look to have come from a lookup
of key .cargo in map auto.home which then fell back to a
wildcard map entry in auto.home followed by a failed host
name lookup for the same name. That's probably normal.

Not sure what to do with this now.

In the past I tried logging the host name that's being looked up
to check whether the name is corrupt or is validly not resolving.

I have to wonder why I didn't improve that message at the
time; it's not really good enough.

I'll do that and update the package (and put it in my upstream
patch queue for the next release) and see if we can get more
information about what's going on.

Ian
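
A couple of hedged checks for the shadow map question (the domain and server names are placeholders):

ypwhich -m | grep shadow                                        # which maps are served, and by which master
ypcat -d <nisdomain> -h <nis-slave> shadow.byname > /dev/null && echo ok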

Comment 49 Fedora Update System 2018-03-02 00:15:05 UTC
autofs-5.1.4-6.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-21690ee57d

Comment 50 Ian Kent 2018-03-02 00:19:49 UTC
Can you try:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1052431

and see what we get for the name being looked up for
these failed name resolutions.

Comment 51 RobbieTheK 2018-03-02 04:27:08 UTC
(In reply to Ian Kent from comment #50)
> Can you try:
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1052431
> 
> and see what we get for the name being looked up for
> these failed name resolutions.

Just making sure, I enabled debug logging in /etc/autofs.conf, you don't need another tcpdump, correct?

Comment 52 RobbieTheK 2018-03-02 16:06:20 UTC
> The YP_NOKEY error reports look to have come from a lookup
> of key .cargo in map auto.home which then fell back to a
> wildcard map entry in auto.home followed by a failed host
> name lookup for the same name. That's probably normal.

Well, here are the contents of /etc/auto.home:
# key           mount-options           location
#
#  Any special cases go here
#

# All others: /u/host is on host:/home/users
*               -rw,nolock              &:/home/users

Just making sure I have the correct RPM installed:
rpm -q autofs
autofs-5.1.4-6.fc27.x86_64

Here are logs around the time of the subsequent error:
Mar  2 10:08:27 workstation systemd[1]: Started SSSD Kerberos Cache Manager.
Mar  2 10:08:27 workstation sssd[kcm][29512]: Starting up
Mar  2 10:08:32 workstation automount[23876]: st_expire: state 1 path /cifs
Mar  2 10:08:32 workstation automount[23876]: expire_proc: exp_proc = 139742147716864 path /cifs
Mar  2 10:08:32 workstation automount[23876]: expire_cleanup: got thid 139742147716864 path /cifs stat 0
Mar  2 10:08:32 workstation automount[23876]: expire_cleanup: sigchld: exp 139742147716864 finished, switching from 2 to 1
Mar  2 10:08:32 workstation automount[23876]: st_ready: st_ready(): state = 2 path /cifs
Mar  2 10:08:38 workstation automount[23876]: handle_packet: type = 3
Mar  2 10:08:38 workstation automount[23876]: handle_packet_missing_indirect: token 29747, name mail, request pid 29514
Mar  2 10:08:38 workstation automount[23876]: attempting to mount entry /local/mail
Mar  2 10:08:38 workstation automount[23876]: lookup_mount: lookup(yp): looking up mail
Mar  2 10:08:38 workstation automount[23876]: lookup_mount: lookup(yp): mail -> -rw#011#011#011ourhostname:/var/spool/mail
Mar  2 10:08:38 workstation automount[23876]: parse_mount: parse(sun): expanded entry: -rw#011#011#011ourhostname:/var/spool/mail
Mar  2 10:08:38 workstation automount[23876]: parse_mount: parse(sun): gathered options: rw
Mar  2 10:08:38 workstation automount[23876]: parse_mount: parse(sun): dequote("ourhostname:/var/spool/mail") -> ourhostname:/var/spool/mail
Mar  2 10:08:38 workstation automount[23876]: parse_mount: parse(sun): core of entry: options=rw, loc=ourhostname:/var/spool/mail
Mar  2 10:08:38 workstation automount[23876]: sun_mount: parse(sun): mounting root /local, mountpoint mail, what ourhostname:/var/spool/mail, fstype nfs, options rw
Mar  2 10:08:38 workstation automount[23876]: mount_mount: mount(nfs): root=/local name=mail what=ourhostname:/var/spool/mail, fstype=nfs, options=rw
Mar  2 10:08:38 workstation automount[23876]: mount_mount: mount(nfs): nfs options="rw", nobind=0, nosymlink=0, ro=0
Mar  2 10:08:38 workstation automount[23876]: mount_mount: mount(nfs): calling mkdir_path /local/mail
Mar  2 10:08:38 workstation automount[23876]: mount_mount: mount(nfs): calling mount -t nfs -s -o rw ourhostname:/var/spool/mail /local/mail
Mar  2 10:08:39 workstation automount[23876]: mount_mount: mount(nfs): mounted ourhostname:/var/spool/mail on /local/mail
Mar  2 10:08:39 workstation automount[23876]: dev_ioctl_send_ready: token = 29747
Mar  2 10:08:39 workstation automount[23876]: mounted /local/mail
Mar  2 10:08:39 workstation automount[23876]: st_expire: state 1 path /drobo-rh
Mar  2 10:08:39 workstation automount[23876]: expire_proc: exp_proc = 139742147716864 path /drobo-rh
Mar  2 10:08:39 workstation automount[23876]: st_expire: state 1 path /drobo-lc
Mar  2 10:08:39 workstation automount[23876]: expire_proc: exp_proc = 139742163220224 path /drobo-lc
Mar  2 10:08:39 workstation automount[23876]: expire_cleanup: got thid 139742147716864 path /drobo-rh stat 0
Mar  2 10:08:39 workstation automount[23876]: expire_cleanup: sigchld: exp 139742147716864 finished, switching from 2 to 1
Mar  2 10:08:39 workstation automount[23876]: st_ready: st_ready(): state = 2 path /drobo-rh
Mar  2 10:08:39 workstation automount[23876]: expire_cleanup: got thid 139742163220224 path /drobo-lc stat 0
Mar  2 10:08:39 workstation automount[23876]: expire_cleanup: sigchld: exp 139742163220224 finished, switching from 2 to 1
Mar  2 10:08:39 workstation automount[23876]: st_ready: st_ready(): state = 2 path /drobo-lc
Mar  2 10:08:41 workstation automount[23876]: handle_packet: type = 3
Mar  2 10:08:41 workstation automount[23876]: handle_packet_missing_indirect: token 29748, name config, request pid 29514
Mar  2 10:08:41 workstation automount[23876]: attempting to mount entry /u/config
Mar  2 10:08:41 workstation automount[23876]: lookup_mount: lookup(yp): looking up config
Mar  2 10:08:41 workstation automount[23876]: lookup_mount: lookup(yp): config -> -rw,nolock#011#011&:/home/users
Mar  2 10:08:41 workstation automount[23876]: parse_mount: parse(sun): expanded entry: -rw,nolock#011#011config:/home/users
Mar  2 10:08:41 workstation automount[23876]: parse_mount: parse(sun): gathered options: rw,nolock
Mar  2 10:08:41 workstation automount[23876]: parse_mount: parse(sun): dequote("config:/home/users") -> config:/home/users
Mar  2 10:08:41 workstation automount[23876]: parse_mount: parse(sun): core of entry: options=rw,nolock, loc=config:/home/users
Mar  2 10:08:41 workstation automount[23876]: sun_mount: parse(sun): mounting root /u, mountpoint config, what config:/home/users, fstype nfs, options rw,nolock
Mar  2 10:08:41 workstation automount[23876]: mount_mount: mount(nfs): root=/u name=config what=config:/home/users, fstype=nfs, options=rw,nolock
Mar  2 10:08:41 workstation automount[23876]: mount_mount: mount(nfs): nfs options="rw,nolock", nobind=0, nosymlink=0, ro=0
Mar  2 10:08:41 workstation automount[23876]: add_host_addrs: hostname lookup for config failed: No address associated with hostname
Mar  2 10:08:41 workstation automount[23876]: mount(nfs): no hosts available
Mar  2 10:08:41 workstation automount[23876]: dev_ioctl_send_fail: token = 29748
Mar  2 10:08:41 workstation automount[23876]: failed to mount /u/config
Mar  2 10:08:42 workstation automount[23876]: handle_packet: type = 3
Mar  2 10:08:42 workstation automount[23876]: handle_packet_missing_indirect: token 29749, name ourhostname, request pid 29514
Mar  2 10:08:42 workstation automount[23876]: attempting to mount entry /local/ourhostname
Mar  2 10:08:42 workstation automount[23876]: lookup_mount: lookup(yp): looking up ourhostname
Mar  2 10:08:42 workstation automount[23876]: lookup_mount: lookup(yp): ourhostname -> &:/usr/local
Mar  2 10:08:42 workstation automount[23876]: parse_mount: parse(sun): expanded entry: ourhostname:/usr/local
Mar  2 10:08:42 workstation automount[23876]: parse_mount: parse(sun): gathered options: rw
Mar  2 10:08:42 workstation automount[23876]: parse_mount: parse(sun): dequote("ourhostname:/usr/local") -> ourhostname:/usr/local
Mar  2 10:08:42 workstation automount[23876]: parse_mount: parse(sun): core of entry: options=rw, loc=ourhostname:/usr/local
Mar  2 10:08:42 workstation automount[23876]: sun_mount: parse(sun): mounting root /local, mountpoint ourhostname, what ourhostname:/usr/local, fstype nfs, options rw
Mar  2 10:08:42 workstation automount[23876]: mount_mount: mount(nfs): root=/local name=ourhostname what=ourhostname:/usr/local, fstype=nfs, options=rw
Mar  2 10:08:42 workstation automount[23876]: mount_mount: mount(nfs): nfs options="rw", nobind=0, nosymlink=0, ro=0
Mar  2 10:08:42 workstation automount[23876]: mount_mount: mount(nfs): calling mkdir_path /local/ourhostname
Mar  2 10:08:42 workstation automount[23876]: mount_mount: mount(nfs): calling mount -t nfs -s -o rw ourhostname:/usr/local /local/ourhostname
Mar  2 10:08:42 workstation automount[23876]: mount_mount: mount(nfs): mounted ourhostname:/usr/local on /local/ourhostname
Mar  2 10:08:42 workstation automount[23876]: dev_ioctl_send_ready: token = 29749
Mar  2 10:08:42 workstation automount[23876]: mounted /local/ourhostname
Mar  2 10:08:42 workstation automount[23876]: st_readmap: state 1 path /local
Mar  2 10:08:42 workstation automount[23876]: re-reading map for /local
Mar  2 10:08:42 workstation automount[23876]: lookup_nss_read_map: reading map files auto.local
Mar  2 10:08:42 workstation automount[23876]: lookup_nss_read_map: reading map nis auto.local
Mar  2 10:08:42 workstation automount[23876]: do_init: parse(sun): init gathered global options: rw
Mar  2 10:08:42 workstation automount[23876]: lookup_read_map: map read not needed, so not done
Mar  2 10:08:42 workstation automount[23876]: lookup_prune_one_cache: prune check posponed, /local/mail mounted
Mar  2 10:08:42 workstation automount[23876]: st_ready: st_ready(): state = 4 path /local
Mar  2 10:08:43 workstation automount[23876]: handle_packet: type = 3
Mar  2 10:08:43 workstation automount[23876]: handle_packet_missing_indirect: token 29750, name config, request pid 29514
Mar  2 10:08:43 workstation automount[23876]: dev_ioctl_send_fail: token = 29750
Mar  2 10:08:47 workstation automount[23876]: st_expire: state 1 path /cifs
Mar  2 10:08:47 workstation automount[23876]: expire_proc: exp_proc = 139742147716864 path /cifs
Mar  2 10:08:47 workstation automount[23876]: expire_cleanup: got thid 139742147716864 path /cifs stat 0
Mar  2 10:08:47 workstation automount[23876]: expire_cleanup: sigchld: exp 139742147716864 finished, switching from 2 to 1
Mar  2 10:08:47 workstation automount[23876]: st_ready: st_ready(): state = 2 path /cifs
Mar  2 10:08:54 workstation automount[23876]: st_expire: state 1 path /drobo-rh
Mar  2 10:08:54 workstation automount[23876]: expire_proc: exp_proc = 139742147716864 path /drobo-rh
Mar  2 10:08:54 workstation automount[23876]: st_expire: state 1 path /drobo-lc
Mar  2 10:08:54 workstation automount[23876]: expire_proc: exp_proc = 139742163220224 path /drobo-lc
[...]
Mar  2 10:09:47 workstation automount[23876]: expire_cleanup: got thid 139742147716864 path /cifs stat 0
Mar  2 10:09:47 workstation automount[23876]: expire_cleanup: sigchld: exp 139742147716864 finished, switching from 2 to 1
Mar  2 10:09:47 workstation automount[23876]: st_ready: st_ready(): state = 2 path /cifs
Mar  2 10:09:50 workstation sssd[secrets][27107]: Shutting down
Mar  2 10:09:50 workstation automount[23876]: handle_packet: type = 3
Mar  2 10:09:50 workstation automount[23876]: handle_packet_missing_indirect: token 29751, name config, request pid 29514
Mar  2 10:09:50 workstation automount[23876]: attempting to mount entry /u/config
Mar  2 10:09:50 workstation automount[23876]: lookup_mount: lookup(yp): looking up config
Mar  2 10:09:50 workstation automount[23876]: lookup_mount: lookup(yp): config -> -rw,nolock#011#011&:/home/users
Mar  2 10:09:50 workstation automount[23876]: parse_mount: parse(sun): expanded entry: -rw,nolock#011#011config:/home/users
Mar  2 10:09:50 workstation automount[23876]: parse_mount: parse(sun): gathered options: rw,nolock
Mar  2 10:09:50 workstation automount[23876]: parse_mount: parse(sun): dequote("config:/home/users") -> config:/home/users
Mar  2 10:09:50 workstation automount[23876]: parse_mount: parse(sun): core of entry: options=rw,nolock, loc=config:/home/users
Mar  2 10:09:50 workstation automount[23876]: sun_mount: parse(sun): mounting root /u, mountpoint config, what config:/home/users, fstype nfs, options rw,nolock
Mar  2 10:09:50 workstation automount[23876]: mount_mount: mount(nfs): root=/u name=config what=config:/home/users, fstype=nfs, options=rw,nolock
Mar  2 10:09:50 workstation automount[23876]: mount_mount: mount(nfs): nfs options="rw,nolock", nobind=0, nosymlink=0, ro=0
Mar  2 10:09:50 workstation automount[23876]: add_host_addrs: hostname lookup for config failed: No address associated with hostname
Mar  2 10:09:50 workstation automount[23876]: mount(nfs): no hosts available
Mar  2 10:09:50 workstation automount[23876]: dev_ioctl_send_fail: token = 29751
Mar  2 10:09:50 workstation automount[23876]: failed to mount /u/config
Mar  2 10:09:50 workstation automount[23876]: st_readmap: state 1 path /u
Mar  2 10:09:50 workstation automount[23876]: re-reading map for /u
Mar  2 10:09:50 workstation automount[23876]: lookup_nss_read_map: reading map files auto.home
Mar  2 10:09:50 workstation automount[23876]: lookup_nss_read_map: reading map nis auto.home
Mar  2 10:09:50 workstation automount[23876]: do_init: parse(sun): init gathered global options: rw
Mar  2 10:09:50 workstation automount[23876]: lookup_read_map: map read not needed, so not done
Mar  2 10:09:50 workstation automount[23876]: lookup_prune_one_cache: prune check posponed, /u/workstation mounted
Mar  2 10:09:50 workstation automount[23876]: st_ready: st_ready(): state = 4 path /u
Mar  2 10:09:52 workstation systemd[1]: Started SSSD Secrets Service responder.
Mar  2 10:09:52 workstation sssd[secrets][29650]: Starting up

Comment 53 Fedora Update System 2018-03-02 17:34:10 UTC
autofs-5.1.4-6.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-21690ee57d

Comment 54 Ian Kent 2018-03-02 23:52:11 UTC
(In reply to RobbieTheK from comment #51)
> (In reply to Ian Kent from comment #50)
> > Can you try:
> > https://koji.fedoraproject.org/koji/buildinfo?buildID=1052431
> > 
> > and see what we get for the name being looked up for
> > these failed name resolutions.
> 
> Just making sure, I enabled debug logging in /etc/autofs.conf, you don't
> need another tcpdump, correct?

No, I don't think it will give us any more information.
There wasn't anything unusual in the previous one.

Comment 55 Ian Kent 2018-03-03 00:03:19 UTC
attempting to mount entry /u/config
lookup_mount: lookup(yp): looking up config
lookup_mount: lookup(yp): config -> -rw,nolock#011#011&:/home/users
parse_mount: parse(sun): expanded entry: -rw,nolock#011#011config:/home/users
parse_mount: parse(sun): gathered options: rw,nolock
parse_mount: parse(sun): dequote("config:/home/users") -> config:/home/users
parse_mount: parse(sun): core of entry: options=rw,nolock, loc=config:/home/users
sun_mount: parse(sun): mounting root /u, mountpoint config, what config:/home/users, fstype nfs, options rw,nolock
mount_mount: mount(nfs): root=/u name=config what=config:/home/users, fstype=nfs, options=rw,nolock
mount_mount: mount(nfs): nfs options="rw,nolock", nobind=0, nosymlink=0, ro=0
add_host_addrs: hostname lookup for config failed: No address associated with hostname
mount(nfs): no hosts available
dev_ioctl_send_fail: token = 29748
failed to mount /u/config

This looks like a perfectly valid lookup to me, and it would
succeed if "config" were a valid host name. Should it be?
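
If it's not obvious whether that name should resolve, checking it
directly on the workstation should tell us, eg. something like (the
name here is just the key from the log above):

  getent hosts config     # what the libc resolver returns for "config"
  ypmatch config hosts    # the NIS hosts map directly, if NIS serves hosts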

Comment 56 RobbieTheK 2018-03-03 02:36:42 UTC
(In reply to Ian Kent from comment #55)
> attempting to mount entry /u/config
> add_host_addrs: hostname lookup for config failed: No address associated
> with hostname
> mount(nfs): no hosts available
> dev_ioctl_send_fail: token = 29748
> failed to mount /u/config
> 
> This looks like a perfectly valid lookup to me and would
> succeed if "config" was a valid host name, should it be?

Hmm, what config file would this be coming from? It's not the name of any mount or share that I know of or expect.

Comment 57 Ian Kent 2018-03-04 01:19:37 UTC
(In reply to RobbieTheK from comment #56)
> (In reply to Ian Kent from comment #55)
> > attempting to mount entry /u/config
> > add_host_addrs: hostname lookup for config failed: No address associated
> > with hostname
> > mount(nfs): no hosts available
> > dev_ioctl_send_fail: token = 29748
> > failed to mount /u/config
> > 
> > This looks like a perfectly valid lookup to me and would
> > succeed if "config" was a valid host name, should it be?
> 
> Hmm what config file would this be coming from? This is not a name of a
> mount nor share that I know of nor expect?

That's the problem with using the wildcard map entry.

# All others: /u/host is on host:/home/users
*               -rw,nolock              &:/home/users

The kernel responds to path lookups, eg. /u/config.
It's duty bound to call back to the daemon when a process tries
to access a directory within the managed automount, /u in this
case.

The daemon then tries to match the requested key to a map entry,
and in this case the wildcard map entry matches any otherwise
unmatched key.

So it looks like this error is valid for this case.

If, by coincidence, the key did correspond to a valid host
name that didn't have a /home/users export it would fail at
that time instead.
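
Schematically, for the key "config" the wildcard behaves as if the
map contained (a sketch only, with "&" replaced by the key):

  config          -rw,nolock              config:/home/users

so automount tries an NFS mount from a host called "config" and the
hostname lookup fails because no such host exists.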

Ian

Comment 58 RobbieTheK 2018-03-04 03:27:53 UTC
So right after I log in using SSH key authentication this gets logged:
Mar  3 22:21:46 workstation automount[23876]: handle_packet: type = 3
Mar  3 22:21:46 workstation automount[23876]: handle_packet_missing_indirect: token 29841, name .cargo, request pid 28233
Mar  3 22:21:46 workstation automount[23876]: attempting to mount entry /u/.cargo
Mar  3 22:21:46 workstation automount[23876]: lookup_mount: lookup(yp): looking up .cargo
Mar  3 22:21:46 workstation automount[23876]: lookup_mount: lookup(yp): .cargo -> -rw,nolock#011#011&:/home/users
Mar  3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): expanded entry: -rw,nolock#011#011.cargo:/home/users
Mar  3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): gathered options: rw,nolock
Mar  3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): dequote(".cargo:/home/users") -> .cargo:/home/users
Mar  3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): core of entry: options=rw,nolock, loc=.cargo:/home/users
Mar  3 22:21:46 workstation automount[23876]: sun_mount: parse(sun): mounting root /u, mountpoint .cargo, what .cargo:/home/users, fstype nfs, options rw,nolock
Mar  3 22:21:46 workstation automount[23876]: mount_mount: mount(nfs): root=/u name=.cargo what=.cargo:/home/users, fstype=nfs, options=rw,nolock
Mar  3 22:21:46 workstation automount[23876]: mount_mount: mount(nfs): nfs options="rw,nolock", nobind=0, nosymlink=0, ro=0
Mar  3 22:21:46 workstation automount[23876]: add_host_addrs: hostname lookup for .cargo failed: No address associated with hostname
Mar  3 22:21:46 workstation automount[23876]: mount(nfs): no hosts available
Mar  3 22:21:46 workstation automount[23876]: dev_ioctl_send_fail: token = 29841
Mar  3 22:21:46 workstation automount[23876]: failed to mount /u/.cargo
Mar  3 22:21:48 workstation dbus-daemon[902]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.33064' (uid=0 pid=28272 comm="su - ")
Mar  3 22:21:48 workstation systemd[1]: Starting Fingerprint Authentication Daemon...
Mar  3 22:21:48 workstation dbus-daemon[902]: [system] Successfully activated service 'net.reactivated.Fprint'
Mar  3 22:21:48 workstation systemd[1]: Started Fingerprint Authentication Daemon.

We don't have a host named ".cargo", do we? Note the leading dot.

Comment 59 Ian Kent 2018-03-04 04:50:21 UTC
(In reply to RobbieTheK from comment #58)
> So right after I log in using SSH key authentication this gets logged:
> Mar  3 22:21:46 workstation automount[23876]: handle_packet: type = 3
> Mar  3 22:21:46 workstation automount[23876]:
> handle_packet_missing_indirect: token 29841, name .cargo, request pid 28233
> Mar  3 22:21:46 workstation automount[23876]: attempting to mount entry
> /u/.cargo
> Mar  3 22:21:46 workstation automount[23876]: lookup_mount: lookup(yp):
> looking up .cargo
> Mar  3 22:21:46 workstation automount[23876]: lookup_mount: lookup(yp):
> .cargo -> -rw,nolock#011#011&:/home/users
> Mar  3 22:21:46 workstation automount[23876]: parse_mount: parse(sun):
> expanded entry: -rw,nolock#011#011.cargo:/home/users
> Mar  3 22:21:46 workstation automount[23876]: parse_mount: parse(sun):
> gathered options: rw,nolock
> Mar  3 22:21:46 workstation automount[23876]: parse_mount: parse(sun):
> dequote(".cargo:/home/users") -> .cargo:/home/users
> Mar  3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): core
> of entry: options=rw,nolock, loc=.cargo:/home/users
> Mar  3 22:21:46 workstation automount[23876]: sun_mount: parse(sun):
> mounting root /u, mountpoint .cargo, what .cargo:/home/users, fstype nfs,
> options rw,nolock
> Mar  3 22:21:46 workstation automount[23876]: mount_mount: mount(nfs):
> root=/u name=.cargo what=.cargo:/home/users, fstype=nfs, options=rw,nolock
> Mar  3 22:21:46 workstation automount[23876]: mount_mount: mount(nfs): nfs
> options="rw,nolock", nobind=0, nosymlink=0, ro=0
> Mar  3 22:21:46 workstation automount[23876]: add_host_addrs: hostname
> lookup for .cargo failed: No address associated with hostname
> Mar  3 22:21:46 workstation automount[23876]: mount(nfs): no hosts available
> Mar  3 22:21:46 workstation automount[23876]: dev_ioctl_send_fail: token =
> 29841
> Mar  3 22:21:46 workstation automount[23876]: failed to mount /u/.cargo
> Mar  3 22:21:48 workstation dbus-daemon[902]: [system] Activating via
> systemd: service name='net.reactivated.Fprint' unit='fprintd.service'
> requested by ':1.33064' (uid=0 pid=28272 comm="su - ")
> Mar  3 22:21:48 workstation systemd[1]: Starting Fingerprint Authentication
> Daemon...
> Mar  3 22:21:48 workstation dbus-daemon[902]: [system] Successfully
> activated service 'net.reactivated.Fprint'
> Mar  3 22:21:48 workstation systemd[1]: Started Fingerprint Authentication
> Daemon.
> 
> We don't have a host named ".cargo", do we? Note the leading dot.

What's your question?
This is another example of what I described in comment #57.

Comment 60 RobbieTheK 2018-03-05 16:19:34 UTC
> What's your question?
> This is another example of what I described in comment #57.

Apologies. I'm a little confused by this statement: "The kernel responds to path lookups, eg. /u/config". Where is /u/config set in order for the kernel to respond? Same question for /u/.cargo.

For example running:
find /etc -type f -exec fgrep /u/config '{}' ';' -print

or 

find /etc -type f -exec fgrep /u/\.cargo '{}' ';' -print

no results.

FWIW, I have yet to see the original error: automount[xxxx]: add_host_addrs: hostname lookup failed: No address associated with hostname

I also added the new RPM to another server; so far it has logged only this error:
Mar  5 10:47:33 ourserver automount[31690]: lookup(file): failed to read included master map auto.master

I've also installed autofs-debuginfo-5.1.4-6.fc27.ppc64.rpm, but if there's no crash there's nothing I can run GDB on, correct?
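
(I'm assuming that, without a crash, the most the debuginfo would let
me do is attach to the running daemon, e.g. something like:

  gdb -p $(pidof automount)

but I haven't tried that.)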

Comment 61 Ian Kent 2018-03-05 23:18:01 UTC
(In reply to RobbieTheK from comment #60)
> > What's your question?
> > This is another example of what I described in comment #57.
> 
> Apologies. I'm a little confused by this statement: "The kernel responds to
> path lookups, aka. /u/config". Where is /u/config set in order for the
> kernel to respond? Same question for /u/.cargo.
> 
> For example running:
> find /etc -type f -exec fgrep /u/config '{}' ';' -print
> 
> or 
> 
> find /etc -type f -exec fgrep /u/\.cargo '{}' ';' -print
> 
> no results.

When an application attempts to access a path, eg. /u/config,
the kernel walks along the path one component at a time.

When it finds a component that is marked as a possible automount
it calls back to the daemon to attempt to perform a mount.

When the daemon receives the path component (eg config) it tries
to match it to a map entry key.

In the cases here the wildcard map entry:
*               -rw,nolock              &:/home/users
matches the component, so automount tries to mount it, eventually
resulting in the error you see.

The kernel can't know whether the path component is valid, so it's
obligated to call back to the daemon, and the wildcard map entry
match tells the daemon the key might be valid, so the daemon must
try to mount it.

This type of false positive matching has always been a problem
with using the wildcard map entry.

There's not really anything I can do about it.
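
Any access to a name under /u that isn't a real host will trigger it,
eg. (made up name):

  ls /u/no-such-host
  # -> the wildcard matches "no-such-host", automount tries to mount
  #    no-such-host:/home/users and logs the failed hostname lookup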

Ian

Comment 62 Fedora Update System 2018-03-11 22:18:44 UTC
autofs-5.1.4-6.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 63 Fedora Update System 2018-05-17 01:54:16 UTC
autofs-5.1.4-6.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-135356546d

Comment 64 Ben Cotton 2018-11-27 15:41:16 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30  Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora  'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 27 reached end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 65 Ben Cotton 2018-11-30 21:38:15 UTC
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.