1187165 – Autofs fails with add_host_addrs periodically

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 5
Classification:	Red Hat
Component:	glibc
Sub Component:
Version:	5.11
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	rc
Target Release:	---
Assignee:	Carlos O'Donell
QA Contact:	qe-baseos-tools-bugs
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-01-29 13:10 UTC by MarkS
Modified:	2016-11-24 12:27 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-02-11 10:53:41 UTC
Target Upstream Version:
Embargoed:
Dependent Products:

Description MarkS 2015-01-29 13:10:06 UTC

Description of problem:
Fully patched the server last night, after which complaints began appearing in messages about automount failing to look up hostnames. No changes were made to the automount configuration. The mounts work, but not always, and when they fail the message below appears in the logs. I suspect the GHOST patch but have no evidence for that.

Jan 29 11:54:02 SERVER automount[3533]: add_host_addrs:1037: hostname lookup failed: Unknown host

All automount configuration is local files (i.e. no LDAP). NSCD is running as well. Mounts are NFSv4.

Version-Release number of selected component (if applicable):
autofs-5.0.1-0.rc2.184.el5
glibc-2.5-123.el5_11

How reproducible:
Patch server, reboot, view automount paths and /var/log/messages.

Steps to Reproduce:
1. yum -y update
2. init 6

Actual results:
Jan 29 11:54:02 SERVER automount[3533]: add_host_addrs:1037: hostname lookup failed: Unknown host

Expected results:
No complaints and the mount to be successful.

Additional info:
We patched 4 servers at the same time last night, and all 4 are exhibiting this behaviour. We have another 4 servers (identical automount setup) which are not patched and do not exhibit this behaviour.

Comment 1 Ian Kent 2015-02-02 05:19:16 UTC

(In reply to MarkS from comment #0)
> Description of problem:
> Fully patched server last night, after which complaints begun in messages
> about automount failing to lookup hostnames. No changes were made to the
> automount configuration. The mounts are working but not always and when the
> fail the message below appears in the logs. I suspect the GHOST patch but
> have no evidence for that.
> 
> Jan 29 11:54:02 SERVER automount[3533]: add_host_addrs:1037: hostname lookup
> failed: Unknown host
> 
> All automount configuration is local files (i.e. no LDAP). NSCD is running
> as well. Mounts are NFSv4.
> 
> Version-Release number of selected component (if applicable):
> autofs-5.0.1-0.rc2.184.el5
> glibc-2.5-123.el5_11
> 
> How reproducible:
> Patch server, reboot, view automount paths and /var/log/messages.

Patched from what revision of RHEL and autofs?

Comment 2 MarkS 2015-02-02 09:31:36 UTC

RHEL was 5.11 at the time

autofs did not get patched, so it is as above:
 autofs-5.0.1-0.rc2.184.el5.x86_64

glibc was previously:
 glibc-common-2.5-123.x86_64
 glibc-2.5-123.x86_64
 glibc-headers-2.5-123.x86_64
 glibc-devel-2.5-123.x86_64
 glibc-2.5-123.i686
 glibc-devel-2.5-123.i386

The complete list of patches applied was:
Jan 28 17:37:19 Updated: nfs4-acl-tools-0.3.3-3.el5.x86_64
Jan 28 18:09:47 Updated: glibc-common-2.5-123.el5_11.1.x86_64
Jan 28 18:09:52 Updated: glibc-2.5-123.el5_11.1.x86_64
Jan 28 18:09:52 Updated: openssl-0.9.8e-32.el5_11.x86_64
Jan 28 18:09:52 Updated: nss_db-2.2-38.el5_11.x86_64
Jan 28 18:09:52 Updated: 1:cups-libs-1.3.7-32.el5_11.x86_64
Jan 28 18:09:53 Updated: nscd-2.5-123.el5_11.1.x86_64
Jan 28 18:09:54 Updated: subscription-manager-1.11.3-14.el5_11.x86_64
Jan 28 18:09:54 Updated: glibc-headers-2.5-123.el5_11.1.x86_64
Jan 28 18:09:55 Updated: glibc-devel-2.5-123.el5_11.1.x86_64
Jan 28 18:09:56 Updated: openssl-devel-0.9.8e-32.el5_11.x86_64
Jan 28 18:09:57 Updated: glibc-2.5-123.el5_11.1.i686
Jan 28 18:09:57 Updated: openssl-0.9.8e-32.el5_11.i686
Jan 28 18:09:57 Updated: 1:cups-libs-1.3.7-32.el5_11.i386
Jan 28 18:09:57 Updated: nss_db-2.2-38.el5_11.i386
Jan 28 18:09:58 Updated: openssl-devel-0.9.8e-32.el5_11.i386
Jan 28 18:09:58 Updated: glibc-devel-2.5-123.el5_11.1.i386

Which is why I am guessing it's the glibc patch that has caused this behaviour.

Comment 3 Ian Kent 2015-02-02 11:09:26 UTC

(In reply to MarkS from comment #2)
> RHEL was 5.11 at the time

snip ...

> 
> Which is why I am guessing its the glibc patch that has caused this
> behaviour.

Yeah, you'd think that this might indicate the problem is
with the glibc update:
* Mon Jan 19 2015 Siddhesh Poyarekar <siddhesh> - 2.5-123.1
- Fix parsing of numeric hosts in gethostbyname_r (CVE-2015-0235, #1183532).

and the code that's now failing is a call to gethostbyname_r() in
autofs.

We probably should pass this on to the glibc folks.

Comment 4 Ian Kent 2015-02-05 05:29:19 UTC

(In reply to MarkS from comment #2)
> RHEL was 5.11 at the time

snip ...

> 
> Which is why I am guessing its the glibc patch that has caused this
> behaviour.

Yeah, you'd think that this might indicate the problem is
with the glibc update:
* Mon Jan 19 2015 Siddhesh Poyarekar <siddhesh> - 2.5-123.1
- Fix parsing of numeric hosts in gethostbyname_r (CVE-2015-0235, #1183532).

and the code that's now failing is a call to gethostbyname_r() in
autofs.

We probably should pass this on to the glibc folks.

Comment 5 Siddhesh Poyarekar 2015-02-05 07:15:37 UTC

What error and h_errno does gethostbyname_r return?  Also, what hostname is passed to the function?  It would be really helpful if this is narrowed down to a simple reproducer that just uses gethostbyname_r.
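
A minimal sketch of such a reproducer follows. It is not taken from autofs; `report_lookup` is a hypothetical helper name, and the initial 1024-byte buffer is an arbitrary assumption. It calls gethostbyname_r() for one name and prints the return value and h_errno, which are the two values asked about above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not autofs code): resolve `name` with
 * gethostbyname_r() and print the return value and h_errno.
 * Returns 0 if a host entry was found, -1 otherwise. */
static int report_lookup(const char *name)
{
    struct hostent he, *result = NULL;
    size_t len = 1024;          /* arbitrary starting size */
    char *buf = malloc(len);
    int herr = 0, rc;

    if (buf == NULL)
        return -1;

    /* glibc returns ERANGE when the scratch buffer is too small,
     * so double the buffer and retry. */
    while ((rc = gethostbyname_r(name, &he, buf, len, &result, &herr)) == ERANGE) {
        char *tmp;

        len *= 2;
        tmp = realloc(buf, len);
        if (tmp == NULL) {
            free(buf);
            return -1;
        }
        buf = tmp;
    }

    if (rc != 0 || result == NULL) {
        /* This is the information requested: error code and h_errno. */
        printf("%s: rc=%d h_errno=%d (%s)\n", name, rc, herr, hstrerror(herr));
        free(buf);
        return -1;
    }

    printf("%s: ok, h_name=%s\n", name, result->h_name);
    free(buf);
    return 0;
}
```

Calling `report_lookup()` from a small `main` with each NFS server name from the automount maps, plus a numeric address such as 127.0.0.1 (the CVE-2015-0235 fix touched numeric host parsing), should show whether the failure is in the resolver or elsewhere.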

Comment 6 Ian Kent 2015-02-06 00:27:56 UTC

(In reply to Siddhesh Poyarekar from comment #5)
> What error and h_errno does gethostbyname_r return?  Also, what hostname is
> passed to the function?  It would be really helpful if this is narrowed down
> to a simple reproducer that just uses gethostbyname_r.

The customer won't know how to get that information.

I could add a logging statement at the failure point and provide
a package to the customer to get that information if you really
want it?

Would that be OK with you MarkS?

The autofs code here is very old and I do remember something like
this a long time ago but I can't find any info on it. I'll have
another look but don't hold much hope, it was just too long ago.

Ian

Comment 7 Siddhesh Poyarekar 2015-02-06 02:36:58 UTC

(In reply to Ian Kent from comment #6)
> I could add a logging statement at the failure point and provide
> a package to the customer to get that information if you really
> want it?

That, or instrument gethostbyname_r and log that information to a file.  The other alternative is to give an idea of the setup so that I can reproduce the behaviour locally and find out myself what the errors are.  There's not a lot of information to go on otherwise.

Comment 8 Ian Kent 2015-02-06 02:58:16 UTC

(In reply to Siddhesh Poyarekar from comment #7)
> (In reply to Ian Kent from comment #6)
> > I could add a logging statement at the failure point and provide
> > a package to the customer to get that information if you really
> > want it?
> 
> That, or instrument gethostbyname_r and log that information to a file.  The
> other alternative is to give an idea of the setup so that I can reproduce
> the behaviour locally and find out myself what the errors are.  There's not
> a lot of information to go on otherwise.

I'm not sure that an autofs debug log will tell us what the
host name is for the failure but we need to look at it to check.

Comment 9 MarkS 2015-02-06 09:32:26 UTC

I am wondering if this report may be in error.

We do not have the servers in the active pool at the moment, and they are not exhibiting the behaviour no matter what I attempt with regard to automount paths. That suggests something else could be the source of my issue. These are all virtual systems, and we have been experiencing some performance-related issues with our virtual infrastructure.

I have scheduled to put the servers back into service next Wednesday (11th Feb).

I think it is appropriate to hold off on any work and we will see if it recurs. If it does then I will run automount verbose in the foreground to see if it provides any more information.

Comment 10 Siddhesh Poyarekar 2015-02-06 09:43:14 UTC

Thanks, I'll keep a needinfo on you, which you can clear once you have concluded your verification.

Comment 11 MarkS 2015-02-11 09:03:58 UTC

It seems that the initial problem was indeed external to the VMs. I have just put them back in service and they are performing as expected with no errors regarding autofs.

I would be happy to have this closed as invalid.

Comment 12 Ian Kent 2015-02-11 10:53:41 UTC

Thanks for letting us know, ;)
