While fixing httpd Bug 954007, I found that getaddrinfo returns EAI_SYSTEM while errno is set to 0. This looks suspicious to me, and I think it is a bug in getaddrinfo (or somewhere lower-level). For the configuration in which this happens, please see the description of Bug 954007. Note that I'm not able to reproduce it myself, but the reporter of Bug 954007 is.

In the APR code (APR is the library that httpd uses), you can see this particular getaddrinfo call at line 365:
http://svn.apache.org/viewvc/apr/apr/trunk/network_io/unix/sockaddr.c?revision=1083931&view=markup#l365

The errno returned at line 378 is 0 for the original reporter.
Possibly related: https://bugzilla.redhat.com/show_bug.cgi?id=958934
Could you try this with an isolated reproducer on the system where you're able to replicate the problem? By isolated I mean a program that does nothing other than call getaddrinfo and check the results (errno and the return value).
Zbigniew, can you please compile the attached source code using:

gcc addrinfo.c -o addrinfo

Then run it as "./addrinfo". Please try it on the machine where httpd crashed for you and paste the output here.
Created attachment 745981 [details] addrinfo.c
(Original installation with #954007)
error: 0 -11 0
error: 2 -2 0
error: 10 -11 0

(Second container with #958934)
error: 0 -2 0
error: 2 -2 0
error: 10 -2 0
Siddhesh, as you can see, it returns EAI_SYSTEM (-11) with errno 0 for Zbigniew:

fprintf(stderr, "error: %d %d %d\n", family, error, errno);

error: 0 -11 0
error: 2 -2 0
error: 10 -11 0

Zbigniew, please keep the installation with #954007. I think Siddhesh may have additional questions about the environment later.
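For reference, here is a minimal sketch of what a probe producing this kind of output might look like. It is not the attached addrinfo.c: the helper name probe_family, the socket type and the extra diagnostic line are assumptions made for illustration. Only the "error: family, error, errno" line format comes from the fprintf quoted above. On Linux the family values 0, 2 and 10 are AF_UNSPEC, AF_INET and AF_INET6, and in glibc -2 is EAI_NONAME while -11 is EAI_SYSTEM.

```c
#include <errno.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not taken from addrinfo.c): resolve `host` for one
 * address family and report the getaddrinfo() return value together with
 * the errno observed right after the call. */
static int probe_family(const char *host, int family, int flags)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int error, saved_errno;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = family;
    hints.ai_socktype = SOCK_STREAM;   /* assumption: any socket type would do */
    hints.ai_flags = flags;

    errno = 0;   /* start clean so a stale errno cannot leak into the report */
    error = getaddrinfo(host, NULL, &hints, &res);
    saved_errno = errno;

    if (error != 0) {
        /* Same three fields as in the output quoted above. */
        fprintf(stderr, "error: %d %d %d\n", family, error, saved_errno);
        if (error == EAI_SYSTEM && saved_errno == 0)
            fprintf(stderr, "  EAI_SYSTEM, but errno gives no cause\n");
    } else {
        freeaddrinfo(res);
    }
    return error;
}
```

Calling it from main() once each for AF_UNSPEC, AF_INET and AF_INET6, with the host name in question and flags set to 0, would print one line per failing family.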
Thanks, the problem seems to be the same as: http://sourceware.org/bugzilla/show_bug.cgi?id=15339 for which I already have a fix. Zbigniew, would you be able to install and test a scratch package?
This is the scratch build to test: http://koji.fedoraproject.org/koji/taskinfo?taskID=5371240
(In reply to comment #8)
> http://koji.fedoraproject.org/koji/taskinfo?taskID=5371240

I installed glibc-common-2.17-7.fc19.0.test.x86_64, glibc-2.17-7.fc19.0.test.x86_64, glibc-debuginfo-common-2.17-7.fc19.0.test.x86_64, glibc-debuginfo-2.17-7.fc19.0.test.x86_64, since I don't have the other packages in the build.

I don't see any change:
error: 0 -11 0
error: 2 -2 0
error: 10 -11 0

Also, AFAICT, the network in my container is working fine: I have an IP and a route and three reachable nameservers in /etc/resolv.conf, yum downloads packages...
OK, thanks for testing that. Can you run that program under strace and attach the results? Also, I'd like to know how you've set up nsswitch.conf and whether you have nscd running. If nscd is running, please keep it disabled whenever you're doing these tests.

Use this strace command:

strace -xvv -s 255 ./addrinfo

The strace output may contain confidential information about your network (DNS servers, network configuration, etc.), so I hope you can at least send it to me personally if you would rather not attach it to the bug report.
(In reply to comment #10)
> Also, I'd like to know how you've set nsswitch.conf and
> if you have nscd running.

I don't have nscd running.

OK, I think I found the culprit: nss-myhostname. I had an old version installed which was broken (missing linking symbol). If I remove myhostname from /etc/nsswitch.conf, I get the following results from addrinfo:
error: 0 -2 0
error: 2 -2 0
error: 10 -2 0

Sorry guys for that, it seems to be entirely my fault. I'll attach nsswitch.conf and the straces just in case.
Created attachment 747875 [details] strace with -11 return code
Created attachment 747876 [details] strace with -2 return code
Created attachment 747877 [details] nsswitch.conf
Thanks, could you elaborate on what exactly was wrong with nss-myhostname? I'd like to try and replicate it to make sure that it's not something that glibc ought to have handled. The strace does not show any errors in actually reading the plugin file.
> what exactly was wrong with nss-myhostname?

It was a systemd bug, fixed in http://cgit.freedesktop.org/systemd/systemd/commit/?id=1e335af70: the .so file needed a symbol (log_<something>) which couldn't be resolved, so the module could not be loaded. It should be trivial to recreate by adding a call to any function that is not defined in any of the libraries.

> make sure that it's not something that glibc ought to have handled

If anything should be changed, I think glibc is the only place. The module wasn't even loaded, so it has no say in this matter.
Even easier to reproduce: just move libnss_myhostname.so out of the way and put myhostname in nsswitch.conf. I'll take this because it is related to upstream 15339. As I had feared, the fix is not complete.
Please try this build once it is done. My local testing seems to indicate that this is fixed: http://koji.fedoraproject.org/koji/taskinfo?taskID=5384387
(In reply to comment #18)
> http://koji.fedoraproject.org/koji/taskinfo?taskID=5384387

error: 0 -2 0
error: 2 -2 0
error: 10 -2 0

> My local testing seems to indicate that this is fixed:

So it seems.
The patch is upstream and will make it into rawhide with the next rebase. Do you need an f19 backport?

commit 3d04f5db20c8f0d1ba3881b5f5373586a18cf188
Author: Siddhesh Poyarekar <siddhesh>
Date:   Tue May 21 21:54:41 2013 +0530

    Set EAI_SYSTEM only when h_errno is NETDB_INTERNAL

    Fixes BZ #15339.

    NSS_STATUS_UNAVAIL may mean that a necessary input resource is not
    available. This could occur in a number of cases including when the
    network is down, system runs out of file descriptors, etc. The
    correct differentiator in such a case is the h_errno, which gives the
    nature of failure. In case of failures other than a simple 'not
    found', we set h_errno as NETDB_INTERNAL and let errno be the
    identifier for the exact error.
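To illustrate the rule in the commit title, here is a rough sketch of choosing the getaddrinfo-style error from h_errno. It is not the glibc code. The function name eai_from_h_errno is hypothetical, and every mapping other than NETDB_INTERNAL to EAI_SYSTEM is my assumption based on the conventional meaning of the h_errno values.

```c
#include <netdb.h>

/* Illustrative only, not glibc's code: choose a getaddrinfo()-style error
 * from the h_errno left behind by a failed lookup. */
static int eai_from_h_errno(int herr)
{
    switch (herr) {
    case NETDB_INTERNAL:
        return EAI_SYSTEM;   /* the one case where errno carries the cause */
    case TRY_AGAIN:
        return EAI_AGAIN;    /* assumption: temporary failure */
    case NO_RECOVERY:
        return EAI_FAIL;     /* assumption: non-recoverable failure */
    default:
        return EAI_NONAME;   /* assumption: plain 'not found' */
    }
}
```

With a rule like this, a lookup that fails only because an NSS module is unavailable would come back as a 'not found' style error rather than as EAI_SYSTEM with an empty errno.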
I was hoping you would fix this in F19; otherwise I will have to backport the APR patch (it's upstream too). That is not a problem, but the real bug is in glibc, and there could be other projects using glibc the way httpd does.
glibc-2.17-11.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/glibc-2.17-11.fc19
Package glibc-2.17-11.fc19:
* should fix your issue,
* was pushed to the Fedora 19 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing glibc-2.17-11.fc19'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-11737/glibc-2.17-11.fc19
then log in and leave karma (feedback).
glibc-2.17-11.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.