Bug 459851 - glibc-2.5-24 x86_64 libresolv fails on /etc/hosts with long lines
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: mysql
Version: 5.5
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Assignee: Tom Lane
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-08-23 00:00 UTC by Jon Jensen
Modified: 2013-07-03 03:19 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-03-06 03:53:30 UTC
Target Upstream Version:
Embargoed:


Attachments
strace with problematic /etc/hosts that causes failing name resolution, see around line 1109 (115.08 KB, text/plain)
2008-08-23 00:00 UTC, Jon Jensen
strace with ok /etc/hosts that shows name resolution succeeding (112.32 KB, text/plain)
2008-08-23 00:00 UTC, Jon Jensen
The /etc/hosts file that causes the failure (2.22 KB, text/plain)
2010-04-08 04:50 UTC, Jon Jensen
An /etc/hosts file that is similar but has no very long line and works (67.07 KB, text/plain)
2010-04-08 04:50 UTC, Jon Jensen
Perl program to demonstrate the resolve failure with DBD::mysql (157 bytes, text/plain)
2010-04-08 04:51 UTC, Jon Jensen
strace of failure (66.90 KB, text/plain)
2010-04-08 04:51 UTC, Jon Jensen
strace of successful run (67.07 KB, text/plain)
2010-04-08 04:52 UTC, Jon Jensen
strace of getent hosts run that succeeded (10.42 KB, text/plain)
2010-04-08 04:52 UTC, Jon Jensen

Description Jon Jensen 2008-08-23 00:00:07 UTC
Created attachment 314850 [details]
strace with problematic /etc/hosts that causes failing name resolution, see around line 1109

We have an /etc/hosts with total size 15409 bytes, 19 lines, longest line is 3738 bytes.

With that /etc/hosts file in place, the MySQL client libraries (used through Perl DBD::mysql) fail to resolve anything, including DNS names not in /etc/hosts. The process does not segfault; it just confusingly claims there is no such hostname.

Strace makes us suspect this is due to libresolv in glibc failing. When the lines in /etc/hosts are shortened, name resolution by the MySQL client works, even with a huge 1.9 MB /etc/hosts in place, as long as it has only short lines.
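
To check whether a given hosts file contains the kind of very long line described above, a small helper along these lines can report the line count and the longest line. This is only a minimal sketch in C; the function name longest_line is hypothetical and is not part of any tool mentioned in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical diagnostic helper (not from glibc or MySQL).
 * Reads fp to EOF and returns the length in bytes of the longest line,
 * not counting the newline. If lines is non-NULL, the number of lines
 * seen is stored there; a final line without a newline still counts. */
size_t longest_line(FILE *fp, size_t *lines)
{
    size_t longest = 0, current = 0, count = 0;
    int c;

    assert(fp != NULL);

    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            if (current > longest)
                longest = current;
            current = 0;
            count++;
        } else {
            current++;
        }
    }
    if (current > 0) {          /* unterminated last line */
        if (current > longest)
            longest = current;
        count++;
    }
    if (lines != NULL)
        *lines = count;
    return longest;
}
```

Calling longest_line() on a FILE opened with fopen("/etc/hosts", "r") gives the two numbers quoted in the description (line count and longest line length) for whatever file is installed on the machine.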

Version:

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 5.2 (Tikanga)
# rpm -q glibc
glibc-2.5-24

How reproducible: Every time.

A possibly related bug was filed by someone with Ubuntu here: https://bugs.launchpad.net/ubuntu/+source/glibc/+bug/130693

It appears that a bug around long lines in /etc/hosts was fixed back in RHEL 3:

https://bugzilla.redhat.com/show_bug.cgi?id=140378
http://sources.redhat.com/ml/libc-hacker/2004-11/msg00058.html

... but either the bug has crept back in from upstream glibc, or it's a different bug with similar effects.

Comment 1 Jon Jensen 2008-08-23 00:00:50 UTC
Created attachment 314851 [details]
strace with ok /etc/hosts that shows name resolution succeeding

Comment 2 Denys Vlasenko 2008-08-28 09:06:17 UTC
(In reply to comment #0)
> We have an /etc/hosts with total size 15409 bytes, 19 lines, longest line is
> 3738 bytes.

Can you attach your /etc/hosts? If you can't attach it as-is because you don't want people to see your hostnames, replace them with some semi-random names.

Comment 3 Jakub Jelinek 2008-09-16 14:43:08 UTC
I've tried to reproduce this, but haven't succeeded with an /etc/hosts of over 16KB whose longest line is about 4KB.  We really need your /etc/hosts, perhaps mangled in some way to hide the original host names or IPs, but with the same number of chars, different hostnames, etc.
Also, do you use nscd or not?  Can you reproduce it with a simple getent hosts XXX
or getent ahosts XXX?

Comment 4 Jon Jensen 2010-04-08 04:49:36 UTC
Ok, sorry I forgot about this for so long. I no longer had the original /etc/hosts and had to recreate the problem on a different server.

It still happens on RHEL 5.5 the same way.

# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 5.5 (Tikanga)
# rpm -q glibc
glibc-2.5-49
glibc-2.5-49

I wasn't using nscd before, or now.

"getent hosts corndog" does not cause the problem. (But it doesn't really seem to resolve the address per se, either; it just dumps the matching line from /etc/hosts.)

I will attach all the files I used. Please let me know if you have any trouble reproducing the problem.

Comment 5 Jon Jensen 2010-04-08 04:50:23 UTC
Created attachment 405174 [details]
The /etc/hosts file that causes the failure

Comment 6 Jon Jensen 2010-04-08 04:50:53 UTC
Created attachment 405175 [details]
An /etc/hosts file that is similar but has no very long line and works

Comment 7 Jon Jensen 2010-04-08 04:51:32 UTC
Created attachment 405176 [details]
Perl program to demonstrate the resolve failure with DBD::mysql

Comment 8 Jon Jensen 2010-04-08 04:51:50 UTC
Created attachment 405177 [details]
strace of failure

Comment 9 Jon Jensen 2010-04-08 04:52:16 UTC
Created attachment 405178 [details]
strace of successful run

Comment 10 Jon Jensen 2010-04-08 04:52:35 UTC
Created attachment 405179 [details]
strace of getent hosts run that succeeded

Comment 11 Jon Jensen 2010-04-08 04:54:28 UTC
Oh, and here's what the Perl test script runs look like.

A successful name resolution (but failed MySQL connection because the connection information is bogus):

$ ./resolve-test.pl
DBI connect('database=somedb;host=corndog','someuser',...) failed: Can't connect to MySQL server on 'corndog' (111) at ./resolve-test.pl line 7

A failed name resolution with the fat hosts file:

$ ./resolve-test.pl
DBI connect('database=somedb;host=corndog','someuser',...) failed: Unknown MySQL server host 'corndog' (-1) at ./resolve-test.pl line 7

Comment 12 Jeff Law 2012-01-31 03:25:43 UTC
Reassigning to mysql component.

As far as I can tell this is a problem with mysql-libs.

F14 exhibits this problem, and it persists regardless of what version of glibc or perl modules are installed.  However, if mysql-libs is updated to mysql-5.5.8-10 from F15, the test works.

A failing test will report something like this:


DBI connect('database=somedb;host=corndog','someuser',...) failed: Unknown MySQL server host 'corndog' (-1) at /tmp/test line 7

Note the "Unknown MySQL server ..."


A succeeding test will (after a long wait) report something like this:

DBI connect('database=somedb;host=corndog','someuser',...) failed: Can't connect to MySQL server on 'corndog' (110) at /tmp/test line 7

Note it was unable to connect.

P.S.  Make sure your nsswitch.conf only uses files for host lookups...

Comment 13 Jon Jensen 2012-01-31 03:45:45 UTC
Jeff, very interesting find. Your conclusion seems sound. I'm not using any system with this combination of MySQL + long /etc/hosts anymore, so I guess I'll just say happy day once the latest mysql-libs is everywhere so people won't run into the bug anymore. :) Thanks for the update.

Comment 14 Tom Lane 2012-01-31 04:41:48 UTC
Hm.  While I'm not looking at the mysql code at the moment, it wouldn't surprise me a bit if they had some hand-rolled code in there instead of using libresolv at all.  Jon, had you seen failures with the long /etc/hosts file and any component *other* than mysql?

> I'll just say happy day once the latest mysql-libs is everywhere

That's gonna be a long time as far as RHEL is concerned :-(

Comment 15 Tom Lane 2012-01-31 19:42:19 UTC
I poked around in the mysql 5.1.x sources and found that the "Unknown MySQL server" error is issued if gethostbyname_r() fails, entirely independently of what the actual errno is.  5.5.x has replaced that whole code sequence with a getaddrinfo call, which probably explains the difference in behavior.
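
For comparison, a getaddrinfo-based lookup might look roughly like the following. This is only a sketch of the general technique, not the actual mysql 5.5.x code; the function name resolve_ipv4_getaddrinfo and the restriction to IPv4 are assumptions made for illustration. The point of interest is that the caller supplies no scratch buffer at all: getaddrinfo() allocates the result list itself, so there is no caller-chosen size that could turn out to be too small.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* feature-test macro for glibc declarations */
#endif
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not MySQL code): resolve name to the first IPv4
 * address and write it as a dotted-quad string into out.
 * Returns 0 on success, -1 on failure. */
int resolve_ipv4_getaddrinfo(const char *name, char *out, size_t outlen)
{
    struct addrinfo hints, *res = NULL;
    const struct sockaddr_in *sin;
    const char *p;
    int rc;

    assert(name != NULL && out != NULL);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;        /* IPv4 only, to keep the sketch short */
    hints.ai_socktype = SOCK_STREAM;  /* one result per address, as for TCP */

    rc = getaddrinfo(name, NULL, &hints, &res);
    if (rc != 0) {
        /* getaddrinfo reports its own error codes, decoded by gai_strerror */
        fprintf(stderr, "lookup of %s failed: %s\n", name, gai_strerror(rc));
        return -1;
    }

    sin = (const struct sockaddr_in *)res->ai_addr;
    p = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen);
    freeaddrinfo(res);                /* result list is owned by the library */
    return p != NULL ? 0 : -1;
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes for out, for example resolve_ipv4_getaddrinfo("corndog", addr, sizeof addr).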

Eyeballing the gethostbyname_r() call, my attention is drawn to the buf/buflen arguments.  The gethostbyname_r man page says that it will return ERANGE if the buffer is "too small", which would fit the reported symptom, but nowhere is it suggested what "too small" might be.  mysql 5.1.x is using a fixed buffer size, which is either sizeof(struct hostent_data) or 2048 depending on a nest of #ifdef's that I don't feel like deciphering right now.  If gethostbyname_r() is expecting to fit a line of /etc/hosts into that buffer, then I think we have our explanation.  Anybody know that code offhand?
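
If the fixed buffer really is the cause (which is not confirmed at this point), the usual way to avoid it is to retry with a larger buffer whenever gethostbyname_r() returns ERANGE. The following is a minimal sketch of that pattern, not the mysql code; the helper name resolve_ipv4_growing, the 64-byte starting size, and the 1 MB cap are all arbitrary choices made for illustration. It uses the six-argument glibc form of gethostbyname_r().

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* expose gethostbyname_r() on glibc */
#endif
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Hypothetical helper (not MySQL code): resolve name to its first IPv4
 * address as a dotted-quad string in out, doubling the scratch buffer
 * each time gethostbyname_r() says it is too small.
 * Returns 0 on success, -1 on failure. */
int resolve_ipv4_growing(const char *name, char *out, size_t outlen)
{
    size_t buflen = 64;   /* deliberately small so the retry path gets used */
    char *buf, *bigger;
    struct hostent he, *result = NULL;
    int herr = 0, rc, ok = -1;

    assert(name != NULL && out != NULL);

    buf = malloc(buflen);
    if (buf == NULL)
        return -1;

    /* The glibc variant returns the error number directly (0 on success). */
    while ((rc = gethostbyname_r(name, &he, buf, buflen, &result, &herr)) == ERANGE) {
        if (buflen >= 1024 * 1024)    /* arbitrary cap: give up eventually */
            break;
        buflen *= 2;
        bigger = realloc(buf, buflen);
        if (bigger == NULL)
            break;
        buf = bigger;
    }

    /* result points into buf, so convert before freeing the buffer */
    if (rc == 0 && result != NULL && result->h_addrtype == AF_INET &&
        result->h_addr_list[0] != NULL &&
        inet_ntop(AF_INET, result->h_addr_list[0], out, (socklen_t)outlen) != NULL)
        ok = 0;

    free(buf);
    return ok;
}
```

With this pattern a lookup only fails for ERANGE once the cap is reached, and a caller that wants to distinguish "buffer too small" from "no such host" can inspect rc and herr separately instead of reporting every failure as an unknown host.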

Comment 16 Jeroen van Bemmel 2012-07-04 20:55:44 UTC
I recently found and fixed a bug in x86_64 glibc which sounds somewhat similar to what is being described here: http://sourceware.org/bugzilla/show_bug.cgi?id=14307

The root cause there was also an ERANGE error, returned because the initial temporary buffer tried was too small (512 bytes, of which 400 were used for some internal struct).

Did you verify whether this problem indeed only occurs on x86_64 and not on 32-bit x86? If so, the solution could well be to increase the fixed buffer size used by mysql.

Comment 17 Tom Lane 2013-03-06 03:53:30 UTC
Since RHEL5 is now in maintenance mode, this bug is not going to get fixed there. AFAICT newer versions of mysql don't have the issue.

