From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040803

Description of problem:
Somewhere between glibc-2.3.2-95.3 and glibc-2.3.2-95.20, the return value of gethostbyname_r for an unknown host changed. When the sample code is run with an unknown host on a machine with glibc-2.3.2-95.3, gethostbyname_r returns a non-zero number indicating an error (per the man page). When the same code is run with an unknown host on a machine with glibc-2.3.2-95.20, gethostbyname_r returns 0 (which, according to the man page, means success).

Version-Release number of selected component (if applicable):
glibc-2.3.2-95.20

How reproducible:
Always

Steps to Reproduce:
1. Compile the attached code: "gcc -g -Wall get_host.c -o get_host"
2. Type "./get_host thisisnotahost"
3. Try running the test with a "hosts:" entry in nsswitch.conf of
   a) files dns
   b) files
   c) dns

Actual Results:
On a machine with glibc-2.3.2-95.20 you get the following:
# ./get_host thisisnotahost
gethostbyname_r returned 0 but also a NULL he. my_errno = 2, str = 'Host name lookup failure'

Expected Results:
On a machine with glibc-2.3.2-95.3 you get the following:
# ./get_host thisisnotahost
gethostbyname error for host: thisisnotahost: Host name lookup failure

Additional info:
If I change the "hosts:" entry to "files dns nis" and re-run the command on a glibc-2.3.2-95.20 box, I get the expected output:
# ./get_host thisisnotahost
gethostbyname error for host: thisisnotahost: Host name lookup failure
Created attachment 106519: Code to reproduce problem
An unknown host name is no error. The return value must be zero. Success of the lookup must be tested by looking at the pointer used for the return value. If it's NULL, the name is unknown.
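A minimal sketch of the check described above, assuming glibc's six-argument gethostbyname_r. The helper name lookup_ipv4 and the 1024-byte starting buffer are hypothetical choices for illustration and are not taken from the attached reproducer. It treats both a nonzero return code and a NULL result pointer as failure, and it reads the answer only through the result pointer, never through the caller-supplied hostent storage.

```c
#define _GNU_SOURCE            /* gethostbyname_r and hstrerror are extensions */
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <arpa/inet.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve `name` to a dotted-quad IPv4 string in `out`.
 * Returns 0 on success, -1 if the host is unknown or the lookup failed. */
int lookup_ipv4(const char *name, char *out, size_t outlen)
{
    struct hostent he;              /* storage only; do not read it directly */
    struct hostent *result = NULL;  /* the only valid way to reach the answer */
    int my_errno = 0;               /* receives the h_errno-style code */
    size_t buflen = 1024;           /* arbitrary starting size (assumption) */
    char *buf;
    int rc;

    assert(name != NULL && out != NULL);

    buf = malloc(buflen);
    if (buf == NULL)
        return -1;

    /* ERANGE means the scratch buffer was too small: grow it and retry. */
    while ((rc = gethostbyname_r(name, &he, buf, buflen,
                                 &result, &my_errno)) == ERANGE) {
        char *bigger;

        buflen *= 2;
        bigger = realloc(buf, buflen);
        if (bigger == NULL) {
            free(buf);
            return -1;
        }
        buf = bigger;
    }

    /* A zero return code alone is not enough: an unknown host can produce
     * rc == 0 together with result == NULL. */
    if (rc != 0 || result == NULL) {
        fprintf(stderr, "lookup of %s failed: %s\n", name, hstrerror(my_errno));
        free(buf);
        return -1;
    }

    /* Convert the first address while buf, which backs result, is still alive. */
    if (result->h_addrtype != AF_INET || result->h_addr_list[0] == NULL ||
        inet_ntop(AF_INET, result->h_addr_list[0], out, outlen) == NULL) {
        free(buf);
        return -1;
    }

    free(buf);
    return 0;
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes, for example `char ip[INET_ADDRSTRLEN]; if (lookup_ipv4(argv[1], ip, sizeof ip) == 0) puts(ip);`. Checking both conditions keeps the code correct whether the library reports an unknown host through the return code or through a NULL result.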
Can we get the man page for gethostbyname updated then? Or should I submit a separate bug for that? Currently it says the following:

    Glibc2 also has reentrant versions gethostbyname_r() and gethostbyname2_r(). These return 0 on success and nonzero on error. The result of the call is now stored in the struct with address ret. After the call, *result will be NULL on error or point to the result on success. Auxiliary data is stored in the buffer buf of length buflen. (If the buffer is too small, these functions will return ERANGE.) No global variable h_errno is modified, but the address of a variable in which to store error numbers is passed in h_errnop.

That section implies that a NULL *result means an error, and that an error means a nonzero return code. If it said something like:

    After the call, *result will be NULL on error or point to the result on success (which can be NULL if the host is not found).

I think it would be less ambiguous.

I still feel that this is a bug (or at least that there is a bug here), for the following reasons:

1) After calling this with an unknown hostname, a correct error value is set in *h_errnop, which implies to me that an error happened. If an error happened, then the function should return a non-zero error code.
2) The man page lists HOST_NOT_FOUND = "The specified host is unknown" in the ERRORS section.
3) The behavior of this function changes depending on whether you use "files dns" or "files dns nis" in your nsswitch.conf file. It should present a consistent return code throughout.

On a side note, we noticed this at my company because we converted our gethostbyname calls to gethostbyname_r. In the process, we were only testing the return code for errors and were accidentally using the hostent structure in (struct hostent *ret) instead of correctly using the **result value. Before the glibc change, everything still worked correctly, since the function returned a nonzero code for an unknown host. After the glibc change, though, querying an unknown host returned a hostent structure associated with the last host in /etc/hosts (and since we were only checking the return code, we used that value). This was clearly not the behavior our customers expected.

Obviously that was a bug in our code, because we were using this version of gethostbyname_r incorrectly. I had already fixed it to work as I thought you intended it to work (which you corroborated in your comment above), so that's good. Given that the various Unix/Linux platforms have completely different implementations of this function, though, I'd want to make sure that the documentation is as clear as possible.

Thanks for your time and your quick response on this issue.
Reassigned to man-pages.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0408.html