Description of problem:
Fully patched server last night, after which complaints began to appear in messages about automount failing to look up hostnames. No changes were made to the automount configuration. The mounts work, but not always, and when they fail the message below appears in the logs. I suspect the GHOST patch but have no evidence for that.

Jan 29 11:54:02 SERVER automount[3533]: add_host_addrs:1037: hostname lookup failed: Unknown host

All automount configuration is in local files (i.e. no LDAP). NSCD is running as well. Mounts are NFSv4.

Version-Release number of selected component (if applicable):
autofs-5.0.1-0.rc2.184.el5
glibc-2.5-123.el5_11

How reproducible:
Patch server, reboot, view automount paths and /var/log/messages.

Steps to Reproduce:
1. yum -y update
2. init 6

Actual results:
Jan 29 11:54:02 SERVER automount[3533]: add_host_addrs:1037: hostname lookup failed: Unknown host

Expected results:
No complaints and the mount to be successful.

Additional info:
We patched 4 servers at the same time last night, and all 4 are exhibiting this behaviour. We have another 4 servers (identical automount setup) which are not patched and do not exhibit this behaviour.
(In reply to MarkS from comment #0)
> Description of problem:
> Fully patched server last night, after which complaints begun in messages
> about automount failing to lookup hostnames. No changes were made to the
> automount configuration. The mounts are working but not always and when the
> fail the message below appears in the logs. I suspect the GHOST patch but
> have no evidence for that.
>
> Jan 29 11:54:02 SERVER automount[3533]: add_host_addrs:1037: hostname lookup
> failed: Unknown host
>
> All automount configuration is local files (i.e. no LDAP). NSCD is running
> as well. Mounts are NFSv4.
>
> Version-Release number of selected component (if applicable):
> autofs-5.0.1-0.rc2.184.el5
> glibc-2.5-123.el5_11
>
> How reproducible:
> Patch server, reboot, view automount paths and /var/log/messages.

Patched from what revision of RHEL and autofs?
RHEL was 5.11 at the time. autofs did not get patched, so it is as above:
autofs-5.0.1-0.rc2.184.el5.x86_64

glibc was previously:
glibc-common-2.5-123.x86_64
glibc-2.5-123.x86_64
glibc-headers-2.5-123.x86_64
glibc-devel-2.5-123.x86_64
glibc-2.5-123.i686
glibc-devel-2.5-123.i386

The complete list of updates applied was:
Jan 28 17:37:19 Updated: nfs4-acl-tools-0.3.3-3.el5.x86_64
Jan 28 18:09:47 Updated: glibc-common-2.5-123.el5_11.1.x86_64
Jan 28 18:09:52 Updated: glibc-2.5-123.el5_11.1.x86_64
Jan 28 18:09:52 Updated: openssl-0.9.8e-32.el5_11.x86_64
Jan 28 18:09:52 Updated: nss_db-2.2-38.el5_11.x86_64
Jan 28 18:09:52 Updated: 1:cups-libs-1.3.7-32.el5_11.x86_64
Jan 28 18:09:53 Updated: nscd-2.5-123.el5_11.1.x86_64
Jan 28 18:09:54 Updated: subscription-manager-1.11.3-14.el5_11.x86_64
Jan 28 18:09:54 Updated: glibc-headers-2.5-123.el5_11.1.x86_64
Jan 28 18:09:55 Updated: glibc-devel-2.5-123.el5_11.1.x86_64
Jan 28 18:09:56 Updated: openssl-devel-0.9.8e-32.el5_11.x86_64
Jan 28 18:09:57 Updated: glibc-2.5-123.el5_11.1.i686
Jan 28 18:09:57 Updated: openssl-0.9.8e-32.el5_11.i686
Jan 28 18:09:57 Updated: 1:cups-libs-1.3.7-32.el5_11.i386
Jan 28 18:09:57 Updated: nss_db-2.2-38.el5_11.i386
Jan 28 18:09:58 Updated: openssl-devel-0.9.8e-32.el5_11.i386
Jan 28 18:09:58 Updated: glibc-devel-2.5-123.el5_11.1.i386

Which is why I am guessing it's the glibc patch that has caused this behaviour.
(In reply to MarkS from comment #2)
> RHEL was 5.11 at the time

snip ...

>
> Which is why I am guessing its the glibc patch that has caused this
> behaviour.

Yes, the glibc update does look like the likely cause. Its changelog has:

* Mon Jan 19 2015 Siddhesh Poyarekar <siddhesh> - 2.5-123.1
- Fix parsing of numeric hosts in gethostbyname_r (CVE-2015-0235, #1183532).

and the code that's now failing is a call to gethostbyname_r() in autofs.

We should probably pass this on to the glibc folks.
What error and h_errno does gethostbyname_r return? Also, what hostname is passed to the function? It would be really helpful if this is narrowed down to a simple reproducer that just uses gethostbyname_r.
(In reply to Siddhesh Poyarekar from comment #5)
> What error and h_errno does gethostbyname_r return? Also, what hostname is
> passed to the function? It would be really helpful if this is narrowed down
> to a simple reproducer that just uses gethostbyname_r.

The customer won't know how to get that information.

I could add a logging statement at the failure point and provide a package to the customer to get that information, if you really want it.

Would that be OK with you, MarkS?

The autofs code here is very old. I do remember something like this from a long time ago, but I can't find any info on it. I'll have another look but don't hold out much hope; it was just too long ago.

Ian
(In reply to Ian Kent from comment #6)
> I could add a logging statement at the failure point and provide
> a package to the customer to get that information if you really
> want it?

That, or instrument gethostbyname_r and log that information to a file. The other alternative is to give an idea of the setup so that I can reproduce the behaviour locally and find out myself what the errors are. There's not a lot of information to go on otherwise.
(In reply to Siddhesh Poyarekar from comment #7)
> (In reply to Ian Kent from comment #6)
> > I could add a logging statement at the failure point and provide
> > a package to the customer to get that information if you really
> > want it?
>
> That, or instrument gethostbyname_r and log that information to a file. The
> other alternative is to give an idea of the setup so that I can reproduce
> the behaviour locally and find out myself what the errors are. There's not
> a lot of information to go on otherwise.

I'm not sure that an autofs debug log will tell us which host name the lookup failed for, but we need to look at one to check.
I am wondering if this report may be in error. The servers are not in the active pool at the moment, and they are not exhibiting the behaviour no matter what I attempt with regard to automount paths. That suggests something else could be the source of my issue. These are all virtual systems, and we have been experiencing some performance-related issues with our virtual infrastructure.

I have scheduled the servers to go back into service next Wednesday (11th Feb). I think it is appropriate to hold off on any work, and we will see if it recurs. If it does, I will run automount verbose in the foreground to see if it provides any more information.
Thanks, I'll keep a needinfo on you, which you can clear once you have concluded your verification.
It seems that the initial problem was indeed external to the VMs. I have just put them back in service and they are performing as expected with no errors regarding autofs. I would be happy to have this closed as invalid.
Thanks for letting us know, ;)