Bug 1523866
Summary: autofs with NIS logs add_host_addrs: hostname lookup failed: Name or service not known/No address associated with hostname
Product: Fedora
Component: autofs
Version: 27
Hardware: x86_64
OS: Linux
Status: CLOSED EOL
Severity: medium
Priority: unspecified
Reporter: RobbieTheK <rkudyba>
Assignee: Ian Kent <ikent>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: ikent
Keywords: Reopened
Fixed In Version: autofs-5.1.4-5.fc26 autofs-5.1.4-5.fc27
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2018-11-30 21:38:15 UTC
Description
RobbieTheK
2017-12-08 21:08:48 UTC
Can you provide the debug log, I might be able to get a better picture of what is going on at the time from it. I take it you're saying this is new in that it suddenly started happening?

(In reply to Ian Kent from comment #1)
> Can you provide the debug log, I might be able to get
> a better picture of what is going on at the time from
> it.

Have a look here: https://pastebin.com/vgP1ZNWJ

> I take it you're saying this is new in that it suddenly
> started happening?

Oh no, I've been seeing this log for a couple of years. I could never reproduce it or figure out why, so I finally decided to open a bug report to see if someone knew about it.

(In reply to RobbieTheK from comment #2)
> (In reply to Ian Kent from comment #1)
> > Can you provide the debug log, I might be able to get
> > a better picture of what is going on at the time from
> > it.
>
> Have a look here: https://pastebin.com/vgP1ZNWJ

That's not really useful. What I was hoping for is a longer term log to try and get a picture of what happens as time passes in case there is something unusual.

> > I take it you're saying this is new in that it suddenly
> > started happening?
>
> Oh no, I've been seeing this log for a couple of years. I could never
> reproduce it or figure out why, so I finally decided to open a bug report to see
> if someone knew about it.

I'll probably need to add some instrumentation around those calls to check what's being passed to them to start with.

Ian

> What I was hoping for is a longer term log to try and get
> a picture of what happens as time passes in case there is
> something unusual.

Should I just let the logs run with debug enabled for a couple of days?

> > I take it you're saying this is new in that it suddenly
> > started happening?
>
> Oh no, I've been seeing this log for a couple of years. I could never
> reproduce it or figure out why, so I finally decided to open a bug report to see
> if someone knew about it.
> I'll probably need to add some instrumentation around those
> calls to check what's being passed to them to start with.

Should I wait for a debug RPM to install?

(In reply to RobbieTheK from comment #4)
> > What I was hoping for is a longer term log to try and get
> > a picture of what happens as time passes in case there is
> > something unusual.
>
> Should I just let the logs run with debug enabled for a couple of days?

Yes please.

> > > > I take it you're saying this is new in that it suddenly
> > > > started happening?
> > >
> > > Oh no, I've been seeing this log for a couple of years. I could never
> > > reproduce it or figure out why, so I finally decided to open a bug report to see
> > > if someone knew about it.
> >
> > I'll probably need to add some instrumentation around those
> > calls to check what's being passed to them to start with.
>
> Should I wait for a debug RPM to install?

Yes, I'll put something together.

As you probably guessed this isn't the first time this has been seen. Looking through my list of upstream patches (I usually make changes to the upstream source and then apply them to the downstream package) I see a patch from quite a while ago for this problem. It's way down near the bottom of my patch list because I never got any feedback about whether it helped.

It is kind of a long shot because it is based on the assumption there's a bug in getaddrinfo(3) which is unlikely. I'll add a couple more prints around the calls you're seeing the problem with and build an rpm that includes it.

Ian

(In reply to Ian Kent from comment #5)
> > Should I wait for a debug RPM to install?
>
> Yes, I'll put something together.

Could you give this package a try please:
https://koji.fedoraproject.org/koji/taskinfo?taskID=23603751

snip ...

> I'll add a couple more prints around the calls you're seeing
> the problem with and build an rpm that includes it.
Assuming this isn't a bug in getaddrinfo(3), what I'm looking for is if there are any extra characters in the name being looked up (or garbage, which might require more work to check). So keep an eye on the log prints for the name lookup failure sites.

The other possibility is that a lookup is coming in at shutdown at just the wrong time and the name has already been freed but that's much harder to check out. An overview of what's going on from the debug log might help in checking if this is the case too.

Anyway, this is (and has been) difficult to diagnose so let's hope for the best, ;)

Ian

(In reply to Ian Kent from comment #6)
> (In reply to Ian Kent from comment #5)
> > > Should I wait for a debug RPM to install?
> >
> > Yes, I'll put something together.
>
> Could you give this package a try please:
> https://koji.fedoraproject.org/koji/taskinfo?taskID=23603751
>
> snip ...

Done and installed/running.

> > I'll add a couple more prints around the calls you're seeing
> > the problem with and build an rpm that includes it.
>
> Assuming this isn't a bug in getaddrinfo(3), what I'm looking
> for is if there are any extra characters in the name being
> looked up (or garbage, which might require more work to check).
> So keep an eye on the log prints for the name lookup failure
> sites.
>
> The other possibility is that a lookup is coming in at
> shutdown at just the wrong time and the name has already
> been freed but that's much harder to check out. An overview
> of what's going on from the debug log might help in checking
> if this is the case too.
>
> Anyway, this is (and has been) difficult to diagnose so let's
> hope for the best, ;)

I noted the other BZ 1187165 that had a reference to getaddrinfo. Should any other services be set to debug logs like NIS/ypbind?

I put the RPM on our NIS server and enabled debug.
I mention this as the NIS master gets these errors too: ypserv[xxxxx]: refused connect from x.x.x.x.:33934 to procedure ypproc_match (ourdomain,shadow.byname;-1) which may or may not be related/red herring.

Also possibly related or not is the Debian bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=710490

I emailed the submitter and asked him if he ever got a resolution and here was his reply: "If my analysis is right the cause is autofs/automount or whatever doing host name lookups before the network is ready and caching the host name lookup failure. The quick and dirty fix is to restart autofs when the network is really ready." But he also used DHCP.

That certainly gives us some more things to check.

Problems relating to the network not being available at startup have been around for a while too, in various forms, and have proven difficult to deal with, that's for sure.

There are some changes in autofs 5.1.3 that attempt to work around some cases of the "network not ready at startup" type problem but there were a couple of annoying regressions introduced too. I'm still struggling to get autofs 5.1.4 out which should fix the regressions and allow us to use the network not ready changes. I think I'm fairly close to getting 5.1.4 out.

In the meantime I'd really like to know if the getaddrinfo() usage change makes a difference. I'm pretty sure that, at the time, the glibc folks said there was no bug like this in getaddrinfo() but I wanted to check anyway (but failed to get any feedback).

Ian

(In reply to Ian Kent from comment #10)
> Problems relating to the network not being available at startup
> have been around for a while too, in various forms, and have
> proven difficult to deal with, that's for sure.

The other thing is that, IIRC, in at least one other bug (and this was quite a while ago) there was no evidence in debug logs I looked at that this was related to autofs startup.
The logged error messages appeared intermittent and occurred well after autofs had been started.....

(In reply to Ian Kent from comment #11)
> (In reply to Ian Kent from comment #10)
> > Problems relating to the network not being available at startup
> > have been around for a while too, in various forms, and have
> > proven difficult to deal with, that's for sure.
>
> The other thing is that, IIRC, in at least one other bug (and
> this was quite a while ago) there was no evidence in debug logs
> I looked at that this was related to autofs startup. The logged
> error messages appeared intermittent and occurred well after
> autofs had been started.....

This one? https://bugzilla.redhat.com/show_bug.cgi?id=682833 You replied in the thread.

Does this "mount failed with error code 16, retrying with the -f option" mean anything useful?

Dec 8 23:01:17 dsm automount[26568]: st_expire: state 1 path /u
Dec 8 23:01:17 dsm automount[26568]: expire_proc: exp_proc = 140023695636224 path /u
Dec 8 23:01:17 dsm automount[26568]: expire_proc_indirect: expire /u/erdos/kudyba/.gvfs
Dec 8 23:01:17 dsm automount[26568]: handle_packet: type = 4
Dec 8 23:01:17 dsm automount[26568]: handle_packet_expire_indirect: token 94742, name pascal
Dec 8 23:01:17 dsm automount[26568]: expiring path /u/pascal
Dec 8 23:01:17 dsm automount[26568]: umount_multi: path /u/pascal incl 1
Dec 8 23:01:17 dsm automount[26568]: umount_subtree_mounts: unmounting dir = /u/pascal
Dec 8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is busy
Dec 8 23:01:17 dsm automount[26568]: spawn_umount: mount failed with error code 16, retrying with the -f option
Dec 8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is busy
Dec 8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is busy
Dec 8 23:01:17 dsm automount[26568]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Dec 8 23:01:17 dsm automount[26568]: expired /u/pascal
Dec 8 23:01:17 dsm automount[26568]: dev_ioctl_send_ready: token = 94742
Dec 8 23:01:17 dsm automount[26568]: expire_proc_indirect: expire /u/pascal

(In reply to RobbieTheK from comment #8)
> I put the RPM on our NIS server and enabled debug. I mention this as the NIS
> master gets these errors too: ypserv[xxxxx]: refused connect from
> x.x.x.x.:33934 to procedure ypproc_match (ourdomain,shadow.byname;-1) which
> may or may not be related/red herring.

Yeah, but that looks like a name lookup failure, possibly due to an access refusal within a securenet setup. ypmatch is used in the lookup module to get the NIS map entry based on the key name. If that fails then I don't think autofs will have a map entry location so it should fail at that point.

The call site where these name lookups are failing is where the map entry location is being broken into a list of entries that can be tried as mounts. Without a map entry location (the NIS entry value field) I don't think autofs will even get to the name lookup calls we are seeing the problem with.

But if we do see invalid or garbage characters in the names autofs is trying to look up (and failing) then it will be worth looking into this further.

Ian

(In reply to RobbieTheK from comment #12)
> (In reply to Ian Kent from comment #11)
> > The other thing is that, IIRC, in at least one other bug (and
> > this was quite a while ago) there was no evidence in debug logs
> > I looked at that this was related to autofs startup. The logged
> > error messages appeared intermittent and occurred well after
> > autofs had been started.....
>
> This one? https://bugzilla.redhat.com/show_bug.cgi?id=682833 You replied in
> the thread.

Ha, not that one, but that is interesting.
The suggestion is that autofs handles transient network failures badly. But, as seen in the bug, I was unable to find out why that occurred and if the host name information can't be retrieved the list of potential mounts can't be constructed.

If this is in fact what's happening in your case perhaps some retry logic is needed to cover short network not available events ....

(In reply to RobbieTheK from comment #13)
> Does this "mount failed with error code 16, retrying with the -f option" mean
> anything useful?

That's actually a umount not a mount. The message text is wrong and needs to be fixed. I'll do that for 5.1.4.

> Dec 8 23:01:17 dsm automount[26568]: st_expire: state 1 path /u
> Dec 8 23:01:17 dsm automount[26568]: expire_proc: exp_proc = 140023695636224 path /u
> Dec 8 23:01:17 dsm automount[26568]: expire_proc_indirect: expire /u/erdos/kudyba/.gvfs
> Dec 8 23:01:17 dsm automount[26568]: handle_packet: type = 4
> Dec 8 23:01:17 dsm automount[26568]: handle_packet_expire_indirect: token 94742, name pascal
> Dec 8 23:01:17 dsm automount[26568]: expiring path /u/pascal
> Dec 8 23:01:17 dsm automount[26568]: umount_multi: path /u/pascal incl 1
> Dec 8 23:01:17 dsm automount[26568]: umount_subtree_mounts: unmounting dir = /u/pascal
> Dec 8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is busy
> Dec 8 23:01:17 dsm automount[26568]: spawn_umount: mount failed with error code 16, retrying with the -f option
> Dec 8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is busy
> Dec 8 23:01:17 dsm automount[26568]: >> umount.nfs4: /u/pascal: device is busy
> Dec 8 23:01:17 dsm automount[26568]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
> Dec 8 23:01:17 dsm automount[26568]: expired /u/pascal
> Dec 8 23:01:17 dsm automount[26568]: dev_ioctl_send_ready: token = 94742
> Dec 8 23:01:17 dsm automount[26568]: expire_proc_indirect: expire /u/pascal

I don't see how an expire that failed because the mount to be umounted became busy could lead to the name lookup problem.

If we do need to investigate this further then we would need a core of automount to check the structure contents and so forth. And enough information for me to set up a machine close enough to the environment where the core was taken.

Well after 24 hours no error besides the same "spawn_umount: mount failed with error code 16", I'll keep it going.

Dec 9 21:49:02 dsm automount[14325]: handle_packet: type = 4
Dec 9 21:49:02 dsm automount[14325]: handle_packet_expire_indirect: token 95768, name sobolev
Dec 9 21:49:02 dsm automount[14325]: expiring path /u/sobolev
Dec 9 21:49:02 dsm automount[14325]: umount_multi: path /u/sobolev incl 1
Dec 9 21:49:02 dsm automount[14325]: umount_subtree_mounts: unmounting dir = /u/sobolev
Dec 9 21:49:02 dsm automount[14325]: rm_unwanted_fn: removing directory /u/sobolev
Dec 9 21:49:02 dsm automount[14325]: expired /u/sobolev
Dec 9 21:49:02 dsm automount[14325]: dev_ioctl_send_ready: token = 95768
Dec 9 21:49:02 dsm automount[14325]: handle_packet: type = 4
Dec 9 21:49:02 dsm automount[14325]: handle_packet_expire_indirect: token 95769, name erdos
Dec 9 21:49:02 dsm automount[14325]: expiring path /u/erdos
Dec 9 21:49:02 dsm automount[14325]: umount_multi: path /u/erdos incl 1
Dec 9 21:49:02 dsm automount[14325]: umount_subtree_mounts: unmounting dir = /u/erdos
Dec 9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec 9 21:49:02 dsm automount[14325]: spawn_umount: mount failed with error code 16, retrying with the -f option
Dec 9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec 9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec 9 21:49:02 dsm automount[14325]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Dec 9 21:49:02 dsm automount[14325]: expired /u/erdos
Dec 9 21:49:02 dsm automount[14325]: dev_ioctl_send_ready: token = 95769
Dec 9 21:49:02 dsm automount[14325]: expire_proc_indirect: expire /u/sobolev
Dec 9 21:49:02 dsm automount[14325]: expire_proc_indirect: expire /u/erdos
Dec 9 21:49:02 dsm automount[14325]: 2 remaining in /u
Dec 9 21:49:02 dsm automount[14325]: expire_cleanup: got thid 140137134475008 path /u stat 5
Dec 9 21:49:02 dsm automount[14325]: expire_cleanup: sigchld: exp 140137134475008 finished, switching from 2 to 1

And you replied in this other thread for SSSD being the culprit: https://bugzilla.redhat.com/show_bug.cgi?id=1189767

(In reply to RobbieTheK from comment #17)
> Well after 24 hours no error besides the same "spawn_umount: mount failed
> with error code 16", I'll keep it going.
>
> Dec 9 21:49:02 dsm automount[14325]: handle_packet: type = 4
> Dec 9 21:49:02 dsm automount[14325]: handle_packet_expire_indirect: token 95768, name sobolev
> Dec 9 21:49:02 dsm automount[14325]: expiring path /u/sobolev
> Dec 9 21:49:02 dsm automount[14325]: umount_multi: path /u/sobolev incl 1
> Dec 9 21:49:02 dsm automount[14325]: umount_subtree_mounts: unmounting dir = /u/sobolev
> Dec 9 21:49:02 dsm automount[14325]: rm_unwanted_fn: removing directory /u/sobolev
> Dec 9 21:49:02 dsm automount[14325]: expired /u/sobolev
> Dec 9 21:49:02 dsm automount[14325]: dev_ioctl_send_ready: token = 95768
> Dec 9 21:49:02 dsm automount[14325]: handle_packet: type = 4
> Dec 9 21:49:02 dsm automount[14325]: handle_packet_expire_indirect: token 95769, name erdos
> Dec 9 21:49:02 dsm automount[14325]: expiring path /u/erdos
> Dec 9 21:49:02 dsm automount[14325]: umount_multi: path /u/erdos incl 1
> Dec 9 21:49:02 dsm automount[14325]: umount_subtree_mounts: unmounting dir = /u/erdos
> Dec 9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
> Dec 9 21:49:02 dsm automount[14325]: spawn_umount: mount failed with error code 16, retrying with the -f option
> Dec 9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
> Dec 9 21:49:02 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
> Dec 9 21:49:02 dsm automount[14325]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
> Dec 9 21:49:02 dsm automount[14325]: expired /u/erdos
> Dec 9 21:49:02 dsm automount[14325]: dev_ioctl_send_ready: token = 95769
> Dec 9 21:49:02 dsm automount[14325]: expire_proc_indirect: expire /u/sobolev
> Dec 9 21:49:02 dsm automount[14325]: expire_proc_indirect: expire /u/erdos
> Dec 9 21:49:02 dsm automount[14325]: 2 remaining in /u
> Dec 9 21:49:02 dsm automount[14325]: expire_cleanup: got thid 140137134475008 path /u stat 5
> Dec 9 21:49:02 dsm automount[14325]: expire_cleanup: sigchld: exp 140137134475008 finished, switching from 2 to 1

This is almost certainly a completely different problem. It might be related to mount propagation to private namespaces which you may not even be aware you are using.

If you are aware you're using private namespaces in some way then for them to be created properly when other mounts (including autofs mounts) are present then what you are using must clean up unused mounts as part of the namespace creation (or you need to do it yourself), the kernel can't do that.

The autofs kernel module will still select dentries for expire in these cases. So far I have been unable to come up with an acceptable method to correctly check this type of dentry for expiry. But it is a false positive which shouldn't cause other problems.

What is the additional problem this causes? I don't see any explanation of what that is here and it's way off topic for the bug too.

Ian

(In reply to RobbieTheK from comment #18)
> And you replied in this other thread for SSSD being the culprit:
> https://bugzilla.redhat.com/show_bug.cgi?id=1189767

Culprit of DNS lookup failure? It doesn't look like that's what I said and ...

That's a completely different bug, it's nothing to do with the problem being discussed here.
Note that the sssd network not ready problem hasn't been resolved in RHEL-7 and is very unlikely to find its way back to earlier releases when (and if) it is.

Workarounds in autofs for this that went into RHEL-7.4 and received further fixes in RHEL-7.5 are not likely to find their way into earlier releases either because the changes are too large to meet z-stream release inclusion policy.

(In reply to Ian Kent from comment #20)
> (In reply to RobbieTheK from comment #18)
> > And you replied in this other thread for SSSD being the culprit:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1189767
>
> Culprit of DNS lookup failure?
> It doesn't look like that's what I said and ...
>
> That's a completely different bug, it's nothing to do with
> the problem being discussed here.
>
> Note that the sssd network not ready problem hasn't been
> resolved in RHEL-7 and is very unlikely to find its way back
> to earlier releases when (and if) it is.
>
> Workarounds in autofs for this that went into RHEL-7.4 and
> received further fixes in RHEL-7.5 are not likely to
> find their way into earlier releases either because the
> changes are too large to meet z-stream release inclusion
> policy.

Oops, I digress, this is Fedora.

The sssd "Network not available at startup" fixes are in Fedora 26 I believe so this shouldn't be a problem. Whether name lookups still fail because sss is not yet started is not an autofs problem.
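For readers hitting the "network not ready at startup" variant of this problem, the Debian submitter's workaround quoted earlier (restart autofs once the network is really ready) can be approximated on a systemd-based host by ordering autofs after network-online.target. This is a hedged sketch, not something shipped in the packages discussed here; the drop-in path follows standard systemd conventions, and it only helps if a wait-online service (e.g. NetworkManager-wait-online) is actually enabled:

```ini
# Hypothetical drop-in: /etc/systemd/system/autofs.service.d/wait-online.conf
# Delay autofs startup until the network is reported up, so the first
# map and host name lookups are not made against an unready resolver.
[Unit]
Wants=network-online.target
After=network-online.target
```

After creating the drop-in, `systemctl daemon-reload` picks it up. Note this does nothing for the intermittent failures seen here well after startup.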
So after 3 days now still no errors other than those reported previously:

Dec 11 09:25:52 dsm automount[14325]: expiring path /u/erdos
Dec 11 09:25:52 dsm automount[14325]: umount_multi: path /u/erdos incl 1
Dec 11 09:25:52 dsm automount[14325]: umount_subtree_mounts: unmounting dir = /u/erdos
Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec 11 09:25:52 dsm automount[14325]: spawn_umount: mount failed with error code 16, retrying with the -f option
Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
Dec 11 09:25:52 dsm automount[14325]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
Dec 11 09:25:52 dsm automount[14325]: expired /u/erdos
Dec 11 09:25:52 dsm automount[14325]: dev_ioctl_send_ready: token = 97331
Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/sobolev
Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/pascal
Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/erdos
Dec 11 09:25:53 dsm automount[14325]: 3 remaining in /u

Note that erdos is our NFS file server.

There are several instances of this error: ypserv[xxxxx]: refused connect from erdos:54550 to procedure ypproc_match (divscimath,shadow.byname;-1)

Notice the server hostname is the same as the umount (erdos being our file server) above. Another red herring?
(In reply to RobbieTheK from comment #22)
> So after 3 days now still no errors other than those reported previously:
> Dec 11 09:25:52 dsm automount[14325]: expiring path /u/erdos
> Dec 11 09:25:52 dsm automount[14325]: umount_multi: path /u/erdos incl 1
> Dec 11 09:25:52 dsm automount[14325]: umount_subtree_mounts: unmounting dir = /u/erdos
> Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
> Dec 11 09:25:52 dsm automount[14325]: spawn_umount: mount failed with error code 16, retrying with the -f option
> Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
> Dec 11 09:25:52 dsm automount[14325]: >> umount.nfs4: /u/erdos: device is busy
> Dec 11 09:25:52 dsm automount[14325]: Unable to update the mtab file, /proc/mounts and /etc/mtab will differ
> Dec 11 09:25:52 dsm automount[14325]: expired /u/erdos
> Dec 11 09:25:52 dsm automount[14325]: dev_ioctl_send_ready: token = 97331
> Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/sobolev
> Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/pascal
> Dec 11 09:25:53 dsm automount[14325]: expire_proc_indirect: expire /u/erdos
> Dec 11 09:25:53 dsm automount[14325]: 3 remaining in /u
>
> Note that erdos is our NFS file server.
>
> There are several instances of this error:
> ypserv[xxxxx]: refused connect from erdos:54550 to procedure ypproc_match
> (divscimath,shadow.byname;-1)

I think I've already said that I don't know what's causing these. They look like NIS access denials and are unlikely to be related.

> Notice the server hostname is the same as the umount (erdos being our file
> server) above. Another red herring?

All I can say about the above log messages is to offer explanations of causes that I'm aware of which I've done.
Another point though is that a mount being seen as in use when a umount is attempted is local to the host, it shouldn't relate to any network connection problem or name lookup problem, transient or otherwise.

Is it possible that the debug version you gave me has something that fixed the "add_host_addrs: hostname lookup failed: Name or service not known/No address associated with hostname" errors? I have not seen them on the server and workstation that I installed the RPM you shared with me, but the other servers continue to log them.

Not sure if it's worth mentioning but NetworkManager is disabled and network is running, but I don't think there's any correlation to the error.

As for that ypproc_match error, you gave me a clue as well as the comment at http://www.thegeekstuff.com/2011/11/linux-nis-server-and-client#comment-4486354. In /etc/ypserv.conf I changed:

IP-of-NIS-server : * : shadow.byname : port

to:

* : * : shadow.byname : none

And I believe those errors are gone.

(In reply to RobbieTheK from comment #24)
> Is it possible that the debug version you gave me has something that fixed
> the "add_host_addrs: hostname lookup failed: Name or service not known/No
> address associated with hostname" errors? I have not seen them on the server
> and workstation that I installed the RPM you shared with me, but the other
> servers continue to log them.

Well, yes, I thought that's what we were testing, plus I added some logging around the parameters to getaddrinfo(3) to see if there was a problem with them.

See the description of the resurrected patch I added:

"autofs-5.1.3 - workaround getaddrinfo(3) ai_canonname bug

There appears to be a bug in some versions of getaddrinfo() where the addrinfo structure gets garbage in ai_canonname when it is not requested by the lookup flags causing a crash.

Try and work around this by always requesting ai_canonname for lookups and hope this doesn't affect the semantics of the autofs lookup usage."
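For illustration, the workaround the patch describes (always requesting the canonical name) corresponds to setting AI_CANONNAME in the hints passed to getaddrinfo(3). The following is a minimal standalone sketch of that usage, not the actual autofs add_host_addrs() code; the function name and output are invented for the example:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Minimal sketch of the workaround described in the patch: always set
 * AI_CANONNAME so getaddrinfo() fills in ai_canonname instead of
 * leaving it untouched. This is NOT the autofs source itself. */
int lookup_host(const char *name)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_CANONNAME;      /* the workaround: always request it */

    int err = getaddrinfo(name, NULL, &hints, &res);
    if (err) {
        /* add_host_addrs logs gai_strerror() text much like this */
        fprintf(stderr, "hostname lookup failed: %s\n", gai_strerror(err));
        return -1;
    }
    printf("canonical name: %s\n",
           res->ai_canonname ? res->ai_canonname : "(none)");
    freeaddrinfo(res);
    return 0;
}
```

Without AI_CANONNAME, POSIX leaves ai_canonname unspecified, which is consistent with the patch's suspicion that some getaddrinfo() versions left garbage there.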
I can't remember what led me to this conclusion when I originally wrote this patch but, IIRC, I was told by glibc folks this wasn't the case but I wanted to check anyway. We don't see a crash here but the assumption is that garbage in ai_canonname may be causing an error return in this case. I never got any test feedback for the change at that time.

Ian

I think for our purposes your patch fixes the issue. Thanks so much for addressing this. I sometimes feel like these bug reports just go into the ether and auto-close once the Fedora release is EOL'ed (like many did this morning).

Is the next step for you to put this in the testing repo?

(In reply to RobbieTheK from comment #26)
> I think for our purposes your patch fixes the issue. Thanks so much for
> addressing this. I sometimes feel like these bug reports just go into the
> ether and auto-close once the Fedora release is EOL'ed (like many did this
> morning).

Yes, there's some of that going on with autofs too. Often the bugs are fixed but not closed. The 5.1.4 release should fix most known problems that have been hanging around a bit too long .....

> Is the next step for you to put this in the testing repo?

I'm hoping to release 5.1.4 around the end of the week so I'd like to wait until then. I didn't have this patch in the 5.1.4 patch queue when I posted them to the autofs list for review although I'm thinking of including it anyway now. In any case the patch will be added to the rpm.

Ian

autofs-5.1.4-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-21cb07e1d3

autofs-5.1.4-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-593cc5120d

autofs-5.1.4-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-21cb07e1d3 autofs-5.1.4-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-593cc5120d autofs-5.1.4-4.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-eda74692fc autofs-5.1.4-4.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-164b3ee23a autofs-5.1.4-4.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-eda74692fc autofs-5.1.4-4.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-164b3ee23a This still hasn't been pushed to the stable repos? autofs-5.1.4-5.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-eda74692fc autofs-5.1.4-5.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2017-164b3ee23a autofs-5.1.4-5.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. 
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-164b3ee23a

autofs-5.1.4-5.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-eda74692fc

autofs-5.1.4-5.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

autofs-5.1.4-5.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Still seeing automount[31776]: add_host_addrs: hostname lookup failed: No address associated with hostname on all NIS clients (not the server).

(In reply to RobbieTheK from comment #43)
> Still seeing automount[31776]: add_host_addrs: hostname lookup failed: No
> address associated with hostname on all NIS clients (not the server).

That's a shame. The change that was included in the package that you tested is definitely included in the current Fedora autofs rpm so I don't know what is causing this.

I'm wondering what we can do to collect useful information. I'm not even sure that a core taken at the time this happens will be helpful as I've looked at one before.

I wonder if a network packet trace would give us anything useful ..... It would be difficult to sort through that but is worth a try. Let's try this first.

I also wonder if this is some sort of transient error where a retry (or two) would fix it but that really shouldn't be needed ....
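If a transient resolver failure really is involved, retry logic of the kind mentioned above would most likely key off EAI_AGAIN, the getaddrinfo(3) result code for a temporary failure in name resolution. The following is a hypothetical sketch of such a wrapper, assumed behavior only and not proposed autofs code (the function name is invented):

```c
#include <netdb.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical retry wrapper, NOT autofs code: retry only on EAI_AGAIN,
 * the "transient failure" result, with a short pause between attempts.
 * Permanent errors (EAI_NONAME, EAI_NODATA, ...) are returned at once. */
int getaddrinfo_retry(const char *name, const struct addrinfo *hints,
                      struct addrinfo **res, int attempts)
{
    int err;

    do {
        err = getaddrinfo(name, NULL, hints, res);
        if (err != EAI_AGAIN)
            break;          /* success or a permanent error: stop */
        sleep(1);           /* transient failure: brief pause, retry */
    } while (--attempts > 0);

    return err;
}
```

As the comment in the thread notes, a retry should not really be needed; this only papers over a resolver that briefly returns a transient error.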
> I wonder if a network packet trace would give us anything
> useful ..... It would be difficult to sort through that but
> is worth a try. Lets try this first.
So run a tcpdump? For how long? Any specific packets to capture with the -i flag? Or number of packets with -c?
(In reply to RobbieTheK from comment #45)
> > I wonder if a network packet trace would give us anything
> > useful ..... It would be difficult to sort through that but
> > is worth a try. Lets try this first.
>
> So run a tcpdump? For how long? Any specific packets to capture with the -i
> flag? Or number of packets with -c?

Yes, tcpdump would do. Run it on a machine until you see one of those error messages in the log.

Just grab everything; we can filter the packet log when it's read. It would be bad to miss something because of filtering. Maybe something like:

tcpdump -i <net interface> -w <packet trace file>

Really, all I want to do is verify a DNS query is being made, but I should be able to check that the name being queried is valid as well (in case it has become corrupt in some way). TBH I'm pretty sure I checked the validity of the name long ago, but I'll have to check again.

Ian

Created attachment 1400139 [details]
tcpdump cap file
I can't believe it, I was able to catch it! Oddly, it was just from an SSH login that uses key authentication. The tcpdump aligns with the second occurrence of the error at Feb 24 00:19:19.
Feb 24 00:16:37 workstation automount[7758]: add_host_addrs: hostname lookup failed: No address associated with hostname
Feb 24 00:16:39 workstation dbus-daemon[902]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.26454' (uid=0 pid=4803 comm="su - ")
Feb 24 00:16:39 workstation systemd[1]: Starting Fingerprint Authentication Daemon...
Feb 24 00:16:40 workstation dbus-daemon[902]: [system] Successfully activated service 'net.reactivated.Fprint'
Feb 24 00:16:40 workstation systemd[1]: Started Fingerprint Authentication Daemon.
Feb 24 00:16:44 workstation su[4803]: (to root) kudyba on pts/0
Feb 24 00:19:05 workstation kernel: device eno1 entered promiscuous mode
Feb 24 00:19:10 workstation sssd[kcm][2462]: Shutting down
Feb 24 00:19:12 workstation systemd[1]: Started SSSD Kerberos Cache Manager.
Feb 24 00:19:12 workstation sssd[kcm][6058]: Starting up
Feb 24 00:19:19 workstation automount[7758]: add_host_addrs: hostname lookup failed: No address associated with hostname
I do see these errors from tshark, but I don't want to send you down the wrong path with a red herring:
Yellow Pages Service MATCH reply Error:YP_NOMAP
Yellow Pages Service MATCH reply Error:YP_NOKEY
Yellow Pages Service MATCH reply Error:YP_NOMAP
[Program Version: 2]
[V2 Procedure: MATCH (3)]
Status: YP_NOMAP (-1)
Value: <EMPTY>
length: 0
contents: <EMPTY>
(In reply to RobbieTheK from comment #47)
> Created attachment 1400139 [details]
> tcpdump cap file
>
> I can't believe it I was able to catch it! Oddly it was just from a SSH that
> uses key authentication. The tcpdump aligns with the second occurrence of
> the error at Feb 24 00:19:19.
>
> Feb 24 00:16:37 workstation automount[7758]: add_host_addrs: hostname lookup failed: No address associated with hostname
> Feb 24 00:16:39 workstation dbus-daemon[902]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.26454' (uid=0 pid=4803 comm="su - ")
> Feb 24 00:16:39 workstation systemd[1]: Starting Fingerprint Authentication Daemon...
> Feb 24 00:16:40 workstation dbus-daemon[902]: [system] Successfully activated service 'net.reactivated.Fprint'
> Feb 24 00:16:40 workstation systemd[1]: Started Fingerprint Authentication Daemon.
> Feb 24 00:16:44 workstation su[4803]: (to root) kudyba on pts/0
> Feb 24 00:19:05 workstation kernel: device eno1 entered promiscuous mode
> Feb 24 00:19:10 workstation sssd[kcm][2462]: Shutting down
> Feb 24 00:19:12 workstation systemd[1]: Started SSSD Kerberos Cache Manager.
> Feb 24 00:19:12 workstation sssd[kcm][6058]: Starting up
> Feb 24 00:19:19 workstation automount[7758]: add_host_addrs: hostname lookup failed: No address associated with hostname

I can't see anything in the packet trace that looks like a problem, and nothing name translation related.

> I do see these errors from tshark but don't want to send you down the wrong
> path/red herring:
> Yellow Pages Service MATCH reply Error:YP_NOMAP
> Yellow Pages Service MATCH reply Error:YP_NOKEY
>
> Yellow Pages Service MATCH reply Error:YP_NOMAP
> [Program Version: 2]
> [V2 Procedure: MATCH (3)]
> Status: YP_NOMAP (-1)
> Value: <EMPTY>
> length: 0
> contents: <EMPTY>

Yeah, AFAICS the YP_NOMAP errors are coming from shadow lookups.
It implies the shadow file isn't set up as one of the NIS maps, except that there are successful shadow map lookups in the trace. I don't understand that, but it shouldn't affect host name lookups. Are you sure all your NIS replicas are receiving the shadow map?

The YP_NOKEY error reports look to have come from a lookup of key .cargo in map auto.home, which then fell back to a wildcard map entry in auto.home, followed by a failed host name lookup for the same name. That's probably normal.

Not sure what to do with this now. I tried logging the host name being looked up in the past, to make sure the name isn't corrupt or is validly not resolving. I have to wonder why I didn't improve that message at the time; it's not really good enough. I'll do that, update the package (and put it in my upstream patch queue for the next release), and see if we can get more information about what's going on.

Ian

autofs-5.1.4-6.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-21690ee57d

Can you try:
https://koji.fedoraproject.org/koji/buildinfo?buildID=1052431

and see what we get for the name being looked up for these failed name resolutions.

(In reply to Ian Kent from comment #50)
> Can you try:
> https://koji.fedoraproject.org/koji/buildinfo?buildID=1052431
>
> and see what we get for the name being looked up for
> these failed name resolutions.

Just making sure, I enabled debug logging in /etc/autofs.conf, you don't need another tcpdump, correct?
> of key .cargo in map auto.home which then fell back to a
> wildcard map entry in auto.home followed by a failed host
> name lookup for the same name. That's probably normal.
Well, here are the contents of /etc/auto.home:
# key mount-options location
#
# Any special cases go here
#
# All others: /u/host is on host:/home/users
* -rw,nolock &:/home/users
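For reference, the `&` in the location field of that wildcard entry is replaced by whatever key matched, which is why any probed directory name under /u ends up being treated as a host name. A rough shell illustration of the substitution (not actual autofs code; the key value is just an example taken from later in this report):

```shell
# Illustrate autofs wildcard substitution: '&' in the map entry's location
# is replaced by the key that matched the '*' entry.
key=".cargo"                          # example key seen in the logs
entry="-rw,nolock &:/home/users"      # the wildcard entry's options/location
expanded="${entry//&/$key}"           # bash pattern substitution
echo "$expanded"
```

Any key that reaches the wildcard entry is substituted this way, so the resulting "host" is whatever directory name was probed.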
Just making sure I have the correct RPM installed:
rpm -q autofs
autofs-5.1.4-6.fc27.x86_64
Here are logs around the time of the subsequent error:
Mar 2 10:08:27 workstation systemd[1]: Started SSSD Kerberos Cache Manager.
Mar 2 10:08:27 workstation sssd[kcm][29512]: Starting up
Mar 2 10:08:32 workstation automount[23876]: st_expire: state 1 path /cifs
Mar 2 10:08:32 workstation automount[23876]: expire_proc: exp_proc = 139742147716864 path /cifs
Mar 2 10:08:32 workstation automount[23876]: expire_cleanup: got thid 139742147716864 path /cifs stat 0
Mar 2 10:08:32 workstation automount[23876]: expire_cleanup: sigchld: exp 139742147716864 finished, switching from 2 to 1
Mar 2 10:08:32 workstation automount[23876]: st_ready: st_ready(): state = 2 path /cifs
Mar 2 10:08:38 workstation automount[23876]: handle_packet: type = 3
Mar 2 10:08:38 workstation automount[23876]: handle_packet_missing_indirect: token 29747, name mail, request pid 29514
Mar 2 10:08:38 workstation automount[23876]: attempting to mount entry /local/mail
Mar 2 10:08:38 workstation automount[23876]: lookup_mount: lookup(yp): looking up mail
Mar 2 10:08:38 workstation automount[23876]: lookup_mount: lookup(yp): mail -> -rw#011#011#011ourhostname:/var/spool/mail
Mar 2 10:08:38 workstation automount[23876]: parse_mount: parse(sun): expanded entry: -rw#011#011#011ourhostname:/var/spool/mail
Mar 2 10:08:38 workstation automount[23876]: parse_mount: parse(sun): gathered options: rw
Mar 2 10:08:38 workstation automount[23876]: parse_mount: parse(sun): dequote("ourhostname:/var/spool/mail") -> ourhostname:/var/spool/mail
Mar 2 10:08:38 workstation automount[23876]: parse_mount: parse(sun): core of entry: options=rw, loc=ourhostname:/var/spool/mail
Mar 2 10:08:38 workstation automount[23876]: sun_mount: parse(sun): mounting root /local, mountpoint mail, what ourhostname:/var/spool/mail, fstype nfs, options rw
Mar 2 10:08:38 workstation automount[23876]: mount_mount: mount(nfs): root=/local name=mail what=ourhostname:/var/spool/mail, fstype=nfs, options=rw
Mar 2 10:08:38 workstation automount[23876]: mount_mount: mount(nfs): nfs options="rw", nobind=0, nosymlink=0, ro=0
Mar 2 10:08:38 workstation automount[23876]: mount_mount: mount(nfs): calling mkdir_path /local/mail
Mar 2 10:08:38 workstation automount[23876]: mount_mount: mount(nfs): calling mount -t nfs -s -o rw ourhostname:/var/spool/mail /local/mail
Mar 2 10:08:39 workstation automount[23876]: mount_mount: mount(nfs): mounted ourhostname:/var/spool/mail on /local/mail
Mar 2 10:08:39 workstation automount[23876]: dev_ioctl_send_ready: token = 29747
Mar 2 10:08:39 workstation automount[23876]: mounted /local/mail
Mar 2 10:08:39 workstation automount[23876]: st_expire: state 1 path /drobo-rh
Mar 2 10:08:39 workstation automount[23876]: expire_proc: exp_proc = 139742147716864 path /drobo-rh
Mar 2 10:08:39 workstation automount[23876]: st_expire: state 1 path /drobo-lc
Mar 2 10:08:39 workstation automount[23876]: expire_proc: exp_proc = 139742163220224 path /drobo-lc
Mar 2 10:08:39 workstation automount[23876]: expire_cleanup: got thid 139742147716864 path /drobo-rh stat 0
Mar 2 10:08:39 workstation automount[23876]: expire_cleanup: sigchld: exp 139742147716864 finished, switching from 2 to 1
Mar 2 10:08:39 workstation automount[23876]: st_ready: st_ready(): state = 2 path /drobo-rh
Mar 2 10:08:39 workstation automount[23876]: expire_cleanup: got thid 139742163220224 path /drobo-lc stat 0
Mar 2 10:08:39 workstation automount[23876]: expire_cleanup: sigchld: exp 139742163220224 finished, switching from 2 to 1
Mar 2 10:08:39 workstation automount[23876]: st_ready: st_ready(): state = 2 path /drobo-lc
Mar 2 10:08:41 workstation automount[23876]: handle_packet: type = 3
Mar 2 10:08:41 workstation automount[23876]: handle_packet_missing_indirect: token 29748, name config, request pid 29514
Mar 2 10:08:41 workstation automount[23876]: attempting to mount entry /u/config
Mar 2 10:08:41 workstation automount[23876]: lookup_mount: lookup(yp): looking up config
Mar 2 10:08:41 workstation automount[23876]: lookup_mount: lookup(yp): config -> -rw,nolock#011#011&:/home/users
Mar 2 10:08:41 workstation automount[23876]: parse_mount: parse(sun): expanded entry: -rw,nolock#011#011config:/home/users
Mar 2 10:08:41 workstation automount[23876]: parse_mount: parse(sun): gathered options: rw,nolock
Mar 2 10:08:41 workstation automount[23876]: parse_mount: parse(sun): dequote("config:/home/users") -> config:/home/users
Mar 2 10:08:41 workstation automount[23876]: parse_mount: parse(sun): core of entry: options=rw,nolock, loc=config:/home/users
Mar 2 10:08:41 workstation automount[23876]: sun_mount: parse(sun): mounting root /u, mountpoint config, what config:/home/users, fstype nfs, options rw,nolock
Mar 2 10:08:41 workstation automount[23876]: mount_mount: mount(nfs): root=/u name=config what=config:/home/users, fstype=nfs, options=rw,nolock
Mar 2 10:08:41 workstation automount[23876]: mount_mount: mount(nfs): nfs options="rw,nolock", nobind=0, nosymlink=0, ro=0
Mar 2 10:08:41 workstation automount[23876]: add_host_addrs: hostname lookup for config failed: No address associated with hostname
Mar 2 10:08:41 workstation automount[23876]: mount(nfs): no hosts available
Mar 2 10:08:41 workstation automount[23876]: dev_ioctl_send_fail: token = 29748
Mar 2 10:08:41 workstation automount[23876]: failed to mount /u/config
Mar 2 10:08:42 workstation automount[23876]: handle_packet: type = 3
Mar 2 10:08:42 workstation automount[23876]: handle_packet_missing_indirect: token 29749, name ourhostname, request pid 29514
Mar 2 10:08:42 workstation automount[23876]: attempting to mount entry /local/ourhostname
Mar 2 10:08:42 workstation automount[23876]: lookup_mount: lookup(yp): looking up ourhostname
Mar 2 10:08:42 workstation automount[23876]: lookup_mount: lookup(yp): ourhostname -> &:/usr/local
Mar 2 10:08:42 workstation automount[23876]: parse_mount: parse(sun): expanded entry: ourhostname:/usr/local
Mar 2 10:08:42 workstation automount[23876]: parse_mount: parse(sun): gathered options: rw
Mar 2 10:08:42 workstation automount[23876]: parse_mount: parse(sun): dequote("ourhostname:/usr/local") -> ourhostname:/usr/local
Mar 2 10:08:42 workstation automount[23876]: parse_mount: parse(sun): core of entry: options=rw, loc=ourhostname:/usr/local
Mar 2 10:08:42 workstation automount[23876]: sun_mount: parse(sun): mounting root /local, mountpoint ourhostname, what ourhostname:/usr/local, fstype nfs, options rw
Mar 2 10:08:42 workstation automount[23876]: mount_mount: mount(nfs): root=/local name=ourhostname what=ourhostname:/usr/local, fstype=nfs, options=rw
Mar 2 10:08:42 workstation automount[23876]: mount_mount: mount(nfs): nfs options="rw", nobind=0, nosymlink=0, ro=0
Mar 2 10:08:42 workstation automount[23876]: mount_mount: mount(nfs): calling mkdir_path /local/ourhostname
Mar 2 10:08:42 workstation automount[23876]: mount_mount: mount(nfs): calling mount -t nfs -s -o rw ourhostname:/usr/local /local/ourhostname
Mar 2 10:08:42 workstation automount[23876]: mount_mount: mount(nfs): mounted ourhostname:/usr/local on /local/ourhostname
Mar 2 10:08:42 workstation automount[23876]: dev_ioctl_send_ready: token = 29749
Mar 2 10:08:42 workstation automount[23876]: mounted /local/ourhostname
Mar 2 10:08:42 workstation automount[23876]: st_readmap: state 1 path /local
Mar 2 10:08:42 workstation automount[23876]: re-reading map for /local
Mar 2 10:08:42 workstation automount[23876]: lookup_nss_read_map: reading map files auto.local
Mar 2 10:08:42 workstation automount[23876]: lookup_nss_read_map: reading map nis auto.local
Mar 2 10:08:42 workstation automount[23876]: do_init: parse(sun): init gathered global options: rw
Mar 2 10:08:42 workstation automount[23876]: lookup_read_map: map read not needed, so not done
Mar 2 10:08:42 workstation automount[23876]: lookup_prune_one_cache: prune check posponed, /local/mail mounted
Mar 2 10:08:42 workstation automount[23876]: st_ready: st_ready(): state = 4 path /local
Mar 2 10:08:43 workstation automount[23876]: handle_packet: type = 3
Mar 2 10:08:43 workstation automount[23876]: handle_packet_missing_indirect: token 29750, name config, request pid 29514
Mar 2 10:08:43 workstation automount[23876]: dev_ioctl_send_fail: token = 29750
Mar 2 10:08:47 workstation automount[23876]: st_expire: state 1 path /cifs
Mar 2 10:08:47 workstation automount[23876]: expire_proc: exp_proc = 139742147716864 path /cifs
Mar 2 10:08:47 workstation automount[23876]: expire_cleanup: got thid 139742147716864 path /cifs stat 0
Mar 2 10:08:47 workstation automount[23876]: expire_cleanup: sigchld: exp 139742147716864 finished, switching from 2 to 1
Mar 2 10:08:47 workstation automount[23876]: st_ready: st_ready(): state = 2 path /cifs
Mar 2 10:08:54 workstation automount[23876]: st_expire: state 1 path /drobo-rh
Mar 2 10:08:54 workstation automount[23876]: expire_proc: exp_proc = 139742147716864 path /drobo-rh
Mar 2 10:08:54 workstation automount[23876]: st_expire: state 1 path /drobo-lc
Mar 2 10:08:54 workstation automount[23876]: expire_proc: exp_proc = 139742163220224 path /drobo-lc
[...]
Mar 2 10:09:47 workstation automount[23876]: expire_cleanup: got thid 139742147716864 path /cifs stat 0
Mar 2 10:09:47 workstation automount[23876]: expire_cleanup: sigchld: exp 139742147716864 finished, switching from 2 to 1
Mar 2 10:09:47 workstation automount[23876]: st_ready: st_ready(): state = 2 path /cifs
Mar 2 10:09:50 workstation sssd[secrets][27107]: Shutting down
Mar 2 10:09:50 workstation automount[23876]: handle_packet: type = 3
Mar 2 10:09:50 workstation automount[23876]: handle_packet_missing_indirect: token 29751, name config, request pid 29514
Mar 2 10:09:50 workstation automount[23876]: attempting to mount entry /u/config
Mar 2 10:09:50 workstation automount[23876]: lookup_mount: lookup(yp): looking up config
Mar 2 10:09:50 workstation automount[23876]: lookup_mount: lookup(yp): config -> -rw,nolock#011#011&:/home/users
Mar 2 10:09:50 workstation automount[23876]: parse_mount: parse(sun): expanded entry: -rw,nolock#011#011config:/home/users
Mar 2 10:09:50 workstation automount[23876]: parse_mount: parse(sun): gathered options: rw,nolock
Mar 2 10:09:50 workstation automount[23876]: parse_mount: parse(sun): dequote("config:/home/users") -> config:/home/users
Mar 2 10:09:50 workstation automount[23876]: parse_mount: parse(sun): core of entry: options=rw,nolock, loc=config:/home/users
Mar 2 10:09:50 workstation automount[23876]: sun_mount: parse(sun): mounting root /u, mountpoint config, what config:/home/users, fstype nfs, options rw,nolock
Mar 2 10:09:50 workstation automount[23876]: mount_mount: mount(nfs): root=/u name=config what=config:/home/users, fstype=nfs, options=rw,nolock
Mar 2 10:09:50 workstation automount[23876]: mount_mount: mount(nfs): nfs options="rw,nolock", nobind=0, nosymlink=0, ro=0
Mar 2 10:09:50 workstation automount[23876]: add_host_addrs: hostname lookup for config failed: No address associated with hostname
Mar 2 10:09:50 workstation automount[23876]: mount(nfs): no hosts available
Mar 2 10:09:50 workstation automount[23876]: dev_ioctl_send_fail: token = 29751
Mar 2 10:09:50 workstation automount[23876]: failed to mount /u/config
Mar 2 10:09:50 workstation automount[23876]: st_readmap: state 1 path /u
Mar 2 10:09:50 workstation automount[23876]: re-reading map for /u
Mar 2 10:09:50 workstation automount[23876]: lookup_nss_read_map: reading map files auto.home
Mar 2 10:09:50 workstation automount[23876]: lookup_nss_read_map: reading map nis auto.home
Mar 2 10:09:50 workstation automount[23876]: do_init: parse(sun): init gathered global options: rw
Mar 2 10:09:50 workstation automount[23876]: lookup_read_map: map read not needed, so not done
Mar 2 10:09:50 workstation automount[23876]: lookup_prune_one_cache: prune check posponed, /u/workstation mounted
Mar 2 10:09:50 workstation automount[23876]: st_ready: st_ready(): state = 4 path /u
Mar 2 10:09:52 workstation systemd[1]: Started SSSD Secrets Service responder.
Mar 2 10:09:52 workstation sssd[secrets][29650]: Starting up
autofs-5.1.4-6.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-21690ee57d

(In reply to RobbieTheK from comment #51)
> (In reply to Ian Kent from comment #50)
> > Can you try:
> > https://koji.fedoraproject.org/koji/buildinfo?buildID=1052431
> >
> > and see what we get for the name being looked up for
> > these failed name resolutions.
>
> Just making sure, I enabled debug logging in /etc/autofs.conf, you don't
> need another tcpdump, correct?

No, I don't think it will get any information. There wasn't anything unusual in the previous one.

attempting to mount entry /u/config
lookup_mount: lookup(yp): looking up config
lookup_mount: lookup(yp): config -> -rw,nolock#011#011&:/home/users
parse_mount: parse(sun): expanded entry: -rw,nolock#011#011config:/home/users
parse_mount: parse(sun): gathered options: rw,nolock
parse_mount: parse(sun): dequote("config:/home/users") -> config:/home/users
parse_mount: parse(sun): core of entry: options=rw,nolock, loc=config:/home/users
sun_mount: parse(sun): mounting root /u, mountpoint config, what config:/home/users, fstype nfs, options rw,nolock
mount_mount: mount(nfs): root=/u name=config what=config:/home/users, fstype=nfs, options=rw,nolock
mount_mount: mount(nfs): nfs options="rw,nolock", nobind=0, nosymlink=0, ro=0
add_host_addrs: hostname lookup for config failed: No address associated with hostname
mount(nfs): no hosts available
dev_ioctl_send_fail: token = 29748
failed to mount /u/config

This looks like a perfectly valid lookup to me and would succeed if "config" was a valid host name. Should it be?
(In reply to Ian Kent from comment #55)
> attempting to mount entry /u/config
> add_host_addrs: hostname lookup for config failed: No address associated
> with hostname
> mount(nfs): no hosts available
> dev_ioctl_send_fail: token = 29748
> failed to mount /u/config
>
> This looks like a perfectly valid lookup to me and would
> succeed if "config" was a valid host name, should it be?

Hmm, what config file would this be coming from? This is not the name of a mount or share that I know of or expect.

(In reply to RobbieTheK from comment #56)
> (In reply to Ian Kent from comment #55)
> > attempting to mount entry /u/config
> > add_host_addrs: hostname lookup for config failed: No address associated
> > with hostname
> > mount(nfs): no hosts available
> > dev_ioctl_send_fail: token = 29748
> > failed to mount /u/config
> >
> > This looks like a perfectly valid lookup to me and would
> > succeed if "config" was a valid host name, should it be?
>
> Hmm what config file would this be coming from? This is not a name of a
> mount nor share that I know of nor expect?

That's the problem with using the wildcard map entry.

# All others: /u/host is on host:/home/users
* -rw,nolock &:/home/users

The kernel responds to path lookups, e.g. /u/config. It's duty-bound to call back to the daemon when a process tries to access a directory within the managed automount, /u in this case. The daemon then tries to match the requested key to a map entry, but in this case the wildcard map entry matches "any" otherwise unmatched key. So it looks like this error is valid for this case.

If, by coincidence, the key did correspond to a valid host name that didn't have a /home/users export, it would fail at that time instead.
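To make that fallback concrete, here is a small sketch of the daemon-side matching being described (illustration only, not autofs's implementation; the entries are condensed from the two maps seen in the logs in this report, purely for the example):

```shell
# Sketch of key matching: an explicit map entry wins, anything else
# falls through to the '*' wildcard with '&' replaced by the key.
match_key() {
    case "$1" in
        mail) echo "-rw ourhostname:/var/spool/mail" ;;   # explicit entry
        *)    echo "-rw,nolock $1:/home/users" ;;         # wildcard fallback
    esac
}

match_key mail      # explicit match
match_key config    # wildcard: 'config' becomes the "host"
match_key .cargo    # wildcard: so does '.cargo'
```

The wildcard can't distinguish a real host name from a stray directory probe, which is exactly the false-positive matching described here.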
Ian

So right after I login using SSH key authentication this gets logged:

Mar 3 22:21:46 workstation automount[23876]: handle_packet: type = 3
Mar 3 22:21:46 workstation automount[23876]: handle_packet_missing_indirect: token 29841, name .cargo, request pid 28233
Mar 3 22:21:46 workstation automount[23876]: attempting to mount entry /u/.cargo
Mar 3 22:21:46 workstation automount[23876]: lookup_mount: lookup(yp): looking up .cargo
Mar 3 22:21:46 workstation automount[23876]: lookup_mount: lookup(yp): .cargo -> -rw,nolock#011#011&:/home/users
Mar 3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): expanded entry: -rw,nolock#011#011.cargo:/home/users
Mar 3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): gathered options: rw,nolock
Mar 3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): dequote(".cargo:/home/users") -> .cargo:/home/users
Mar 3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): core of entry: options=rw,nolock, loc=.cargo:/home/users
Mar 3 22:21:46 workstation automount[23876]: sun_mount: parse(sun): mounting root /u, mountpoint .cargo, what .cargo:/home/users, fstype nfs, options rw,nolock
Mar 3 22:21:46 workstation automount[23876]: mount_mount: mount(nfs): root=/u name=.cargo what=.cargo:/home/users, fstype=nfs, options=rw,nolock
Mar 3 22:21:46 workstation automount[23876]: mount_mount: mount(nfs): nfs options="rw,nolock", nobind=0, nosymlink=0, ro=0
Mar 3 22:21:46 workstation automount[23876]: add_host_addrs: hostname lookup for .cargo failed: No address associated with hostname
Mar 3 22:21:46 workstation automount[23876]: mount(nfs): no hosts available
Mar 3 22:21:46 workstation automount[23876]: dev_ioctl_send_fail: token = 29841
Mar 3 22:21:46 workstation automount[23876]: failed to mount /u/.cargo
Mar 3 22:21:48 workstation dbus-daemon[902]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.33064' (uid=0 pid=28272 comm="su - ")
Mar 3 22:21:48 workstation systemd[1]: Starting Fingerprint Authentication Daemon...
Mar 3 22:21:48 workstation dbus-daemon[902]: [system] Successfully activated service 'net.reactivated.Fprint'
Mar 3 22:21:48 workstation systemd[1]: Started Fingerprint Authentication Daemon.

We don't have a hostname .cargo? Notice the leading dot.

(In reply to RobbieTheK from comment #58)
> So right after I login using SSH key authentication this gets logged:
> Mar 3 22:21:46 workstation automount[23876]: handle_packet: type = 3
> Mar 3 22:21:46 workstation automount[23876]: handle_packet_missing_indirect: token 29841, name .cargo, request pid 28233
> Mar 3 22:21:46 workstation automount[23876]: attempting to mount entry /u/.cargo
> Mar 3 22:21:46 workstation automount[23876]: lookup_mount: lookup(yp): looking up .cargo
> Mar 3 22:21:46 workstation automount[23876]: lookup_mount: lookup(yp): .cargo -> -rw,nolock#011#011&:/home/users
> Mar 3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): expanded entry: -rw,nolock#011#011.cargo:/home/users
> Mar 3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): gathered options: rw,nolock
> Mar 3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): dequote(".cargo:/home/users") -> .cargo:/home/users
> Mar 3 22:21:46 workstation automount[23876]: parse_mount: parse(sun): core of entry: options=rw,nolock, loc=.cargo:/home/users
> Mar 3 22:21:46 workstation automount[23876]: sun_mount: parse(sun): mounting root /u, mountpoint .cargo, what .cargo:/home/users, fstype nfs, options rw,nolock
> Mar 3 22:21:46 workstation automount[23876]: mount_mount: mount(nfs): root=/u name=.cargo what=.cargo:/home/users, fstype=nfs, options=rw,nolock
> Mar 3 22:21:46 workstation automount[23876]: mount_mount: mount(nfs): nfs options="rw,nolock", nobind=0, nosymlink=0, ro=0
> Mar 3 22:21:46 workstation automount[23876]: add_host_addrs: hostname lookup for .cargo failed: No address associated with hostname
> Mar 3 22:21:46 workstation automount[23876]: mount(nfs): no hosts available
> Mar 3 22:21:46 workstation automount[23876]: dev_ioctl_send_fail: token = 29841
> Mar 3 22:21:46 workstation automount[23876]: failed to mount /u/.cargo
> Mar 3 22:21:48 workstation dbus-daemon[902]: [system] Activating via systemd: service name='net.reactivated.Fprint' unit='fprintd.service' requested by ':1.33064' (uid=0 pid=28272 comm="su - ")
> Mar 3 22:21:48 workstation systemd[1]: Starting Fingerprint Authentication Daemon...
> Mar 3 22:21:48 workstation dbus-daemon[902]: [system] Successfully activated service 'net.reactivated.Fprint'
> Mar 3 22:21:48 workstation systemd[1]: Started Fingerprint Authentication Daemon.
>
> We don't have a hostname .cargo? Notice the leading dot.

What's your question? This is another example of what I described in comment #57.
> This is another example of what I described in comment #57.
Apologies, I'm a little confused by this statement: "The kernel responds to path lookups, e.g. /u/config". Where is /u/config set in order for the kernel to respond? Same question for /u/.cargo.
For example running:
find /etc -type f -exec fgrep /u/config '{}' ';' -print
or
find /etc -type f -exec fgrep /u/\.cargo '{}' ';' -print
no results.
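One way to see what the resolver returns for these wildcard-matched keys, without waiting for automount to log a failure, is to query the NSS hosts database directly with getent (the names here are just the examples from this report; on a box where one of them genuinely resolved, getent would print its address instead):

```shell
# Query the NSS 'hosts' database the way a hostname lookup would,
# for keys that fell through to the wildcard map entry.
for name in config .cargo; do
    if ! getent hosts "$name" > /dev/null; then
        echo "no address for '$name'"
    fi
done
```

A nonzero getent exit status here corresponds to the "No address associated with hostname" failures in the automount log.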
FWIW, I have yet to see the original error: automount[xxxx]: add_host_addrs: hostname lookup failed: No address associated with hostname
I also added the new RPM to another server; so far, only this error:
Mar 5 10:47:33 ourserver automount[31690]: lookup(file): failed to read included master map auto.master
I've also enabled autofs-debuginfo-5.1.4-6.fc27.ppc64.rpm, but if there's no crash, there's nothing I can run GDB on, correct?
(In reply to RobbieTheK from comment #60)
> > What's your question?
> > This is another example of what I described in comment #57.
>
> Apologies. I'm a little confused by this statement: "The kernel responds to
> path lookups, aka. /u/config". Where is /u/config set in order for the
> kernel to respond? Same question for /u/.cargo.
>
> For example running:
> find /etc -type f -exec fgrep /u/config '{}' ';' -print
>
> or
>
> find /etc -type f -exec fgrep /u/\.cargo '{}' ';' -print
>
> no results.

When an application attempts to access a path, e.g. /u/config, the kernel walks along the path one component at a time. When it finds a component that is marked as a possible automount, it calls back to the daemon to attempt to perform a mount.

When the daemon receives the path component (e.g. config) it tries to match it to a map entry key. In the cases here the wildcard map entry:

* -rw,nolock &:/home/users

matches the component, so automount tries to mount it, eventually resulting in the error you see. The kernel can't know if the path component is valid or not, it's obligated to call back to the daemon, and the wildcard map entry matching tells the daemon it might be valid, so it must try to mount it.

This type of false positive matching has always been a problem with using the wildcard map entry. There's not really anything I can do about it.

Ian

autofs-5.1.4-6.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

autofs-5.1.4-6.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-135356546d

This message is a reminder that Fedora 27 is nearing its end of life. On 2018-Nov-30 Fedora will stop maintaining and issuing updates for Fedora 27. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '27'.
Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 27 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug.

Thank you for reporting this bug and we are sorry it could not be fixed.