Bug 1330973

Summary: getaddrinfo hangs for 25 seconds if ipv6 is disabled
Product: Red Hat Enterprise Linux 7
Reporter: Krzysztof Pawłowski <krzysztof.pawlowski>
Component: systemd
Assignee: systemd-maint
Status: CLOSED ERRATA
QA Contact: Robin Hack <rhack>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 7.3
CC: ashankar, bblaskov, dkochuka, dmoessne, fweimer, jsynacek, krzysztof.pawlowski, mnewsome, pfrankli, rhack, systemd-maint-list
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: systemd-219-22.el7
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-04 00:53:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Krzysztof Pawłowski 2016-04-27 11:37:32 UTC
Description of problem:

The code below takes 25 seconds to execute when IPv6 is disabled. The problem appeared after upgrading to the latest version of glibc. It occurs only when the query is for the machine's own hostname.

time python -c 'import socket; print socket.getaddrinfo("'`hostname -f`'", None, socket.AF_INET6)'
[(10, 1, 6, '', ('::1', 0, 0, 0)), (10, 2, 17, '', ('::1', 0, 0, 0)), (10, 3, 0, '', ('::1', 0, 0, 0))]

real    0m25.078s
user    0m0.029s
sys     0m0.007s

With a domain other than the hostname:

time python -c 'import socket; print socket.getaddrinfo("nask.pl", None, socket.AF_INET6)'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
socket.gaierror: [Errno -2] Name or service not known

real    0m0.069s
user    0m0.046s
sys     0m0.013s
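
The two timings above can also be collected from a single script. This is only a minimal sketch in Python 3 (the one-liners above use Python 2 syntax); the function name timed_lookup and its resolver parameter are made up for illustration and are not part of any library. Host names are taken from the command line, so `hostname -f` can be passed in the same way as in the one-liner.

```python
import socket
import sys
import time


def timed_lookup(host, family=socket.AF_INET6, resolver=socket.getaddrinfo):
    """Run one lookup and return (elapsed_seconds, result_or_None, error_or_None).

    `resolver` is a hypothetical hook so the timing logic can be exercised
    without touching the real resolver; by default it is socket.getaddrinfo.
    """
    start = time.monotonic()
    try:
        result, error = resolver(host, None, family), None
    except socket.gaierror as exc:
        # A failed lookup is reported, not raised, so it can still be timed.
        result, error = None, exc
    elapsed = time.monotonic() - start
    return elapsed, result, error


if __name__ == "__main__":
    # Example: python3 timed_lookup.py "$(hostname -f)" nask.pl
    for name in sys.argv[1:]:
        elapsed, result, error = timed_lookup(name)
        print("%s: %.3fs %s" % (name, elapsed, result if error is None else error))
```

On an affected host the first name would be expected to take about 25 seconds and the second to fail almost immediately, matching the `time` output above.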


Version-Release number of selected component (if applicable):
glibc-2.17-106.el7_2.4.x86_64

How reproducible:
Always

Steps to Reproduce:
1. sysctl net.ipv6.conf.all.disable_ipv6=1
2. sysctl net.ipv6.conf.default.disable_ipv6=1
3. remove ::1 line from /etc/hosts
4. python -c 'import socket; print socket.getaddrinfo("'`hostname -f`'", None,socket.AF_INET6)'
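
Before running step 4 it may help to confirm that steps 1 to 3 actually took effect. The sketch below (Python 3) assumes the sysctl values are readable under /proc/sys/net/ipv6/conf/ and that /etc/hosts uses the usual whitespace-separated layout; the helper names ipv6_disabled and hosts_has_ipv6_loopback are hypothetical.

```python
def ipv6_disabled(sysctl_value):
    """True if the text read from a disable_ipv6 sysctl file means "disabled"."""
    return sysctl_value.strip() == "1"


def hosts_has_ipv6_loopback(hosts_text):
    """True if any non-comment line of /etc/hosts-style text starts with ::1."""
    for line in hosts_text.splitlines():
        # Strip trailing comments, then split the line into address and names.
        fields = line.split("#", 1)[0].split()
        if fields and fields[0] == "::1":
            return True
    return False


if __name__ == "__main__":
    for scope in ("all", "default"):
        path = "/proc/sys/net/ipv6/conf/%s/disable_ipv6" % scope
        try:
            with open(path) as f:
                print(path, "disabled" if ipv6_disabled(f.read()) else "enabled")
        except OSError as exc:
            print(path, "unreadable:", exc)
    try:
        with open("/etc/hosts") as f:
            print("::1 line in /etc/hosts:", hosts_has_ipv6_loopback(f.read()))
    except OSError as exc:
        print("/etc/hosts unreadable:", exc)
```

For the reproducer to apply, both sysctl files should report "disabled" and the ::1 check should print False.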

Actual results:
Hang for 25 seconds

Expected results:
Return immediately

Additional info:
The problem can be worked around by adding the line below to /etc/hosts:
::1 <hostname>

Comment 1 Florian Weimer 2016-04-27 11:55:15 UTC
I can reproduce this with the instructions provided.  The hang is actually in nss_myhostname, which is provided by systemd:

#0  0x00007ffff6e1aca9 in ppoll () from /lib64/libc.so.6
#1  0x00007ffff7e5d1af in sd_rtnl_call.constprop.10 () from /lib64/libnss_myhostname.so.2
#2  0x00007ffff7e5ef72 in local_addresses.constprop.4 () from /lib64/libnss_myhostname.so.2
#3  0x00007ffff7e60717 in _nss_myhostname_gethostbyname3_r () from /lib64/libnss_myhostname.so.2
#4  0x00007ffff7e609e4 in _nss_myhostname_gethostbyname2_r () from /lib64/libnss_myhostname.so.2
#5  0x00007ffff6e0b612 in gaih_inet () from /lib64/libc.so.6
#6  0x00007ffff6e0e86d in getaddrinfo () from /lib64/libc.so.6

Krzysztof, you said that the problem appeared after a glibc upgrade. I tried downgrading to glibc-2.17-106.el7_2.1.x86_64, and the problem persists.

Did you pick up a systemd update at the same time as you upgraded glibc?  Thanks.

Comment 2 Krzysztof Pawłowski 2016-04-27 12:46:19 UTC
I only ran:
yum update glibc

Systemd version on the broken host:
systemd-libs-219-19.el7.x86_64
systemd-219-19.el7.x86_64
systemd-python-219-19.el7.x86_64
systemd-sysv-219-19.el7.x86_64

Systemd version on the good host:
systemd-sysv-208-20.el7_1.5.x86_64
systemd-208-20.el7_1.5.x86_64
systemd-python-208-20.el7_1.5.x86_64
systemd-libs-208-20.el7_1.5.x86_64

I upgraded only glibc on the good host:
glibc-2.17-106.el7_2.4.x86_64
glibc-common-2.17-106.el7_2.4.x86_64

And the problem didn't appear.

I then upgraded systemd on the good host to the same latest version as on the broken host, and the problem appeared.

I also checked another thing: I had upgraded the kernel package as well, and it has a systemd dependency.

So I ran another test: I upgraded only systemd, without glibc, and the problem appeared.

So You are right that the real source of problem is inside systemd.

Comment 3 Florian Weimer 2016-04-27 12:48:40 UTC
(In reply to Krzysztof Pawłowski from comment #2)
> So You are right that the real source of problem is inside systemd.

Thanks, reassigning to systemd.

Comment 4 Lukáš Nykrýn 2016-04-27 13:10:01 UTC
Just to be completely sure, can you try to remove nss_myhostname from /etc/nsswitch.conf?

Comment 6 Krzysztof Pawłowski 2016-04-28 10:16:57 UTC
I've changed the hosts line in /etc/nsswitch.conf:

From:
hosts:      files dns myhostname

To:
hosts:      files dns

And then the result is instant:

time python -c 'import socket; print socket.getaddrinfo("'`hostname -f`'", None, socket.AF_INET6)'
hostname: Name or service not known
Traceback (most recent call last):
  File "<string>", line 1, in <module>
socket.gaierror: [Errno -2] Name or service not known

real    0m0.071s
user    0m0.041s
sys     0m0.021s
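
To check on other hosts whether the myhostname module is in the lookup path, the hosts line can be parsed with a few lines of Python 3. This is a rough sketch that handles only the simple layout shown above; the function name hosts_services is made up for illustration, and bracketed action items are dropped only when written without spaces (for example [NOTFOUND=return]).

```python
def hosts_services(nsswitch_text):
    """Return the service names on the `hosts:` line of nsswitch.conf-style text."""
    for line in nsswitch_text.splitlines():
        # Ignore comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            tokens = line[len("hosts:"):].split()
            # Skip bracketed action items; only the no-space form is handled.
            return [tok for tok in tokens if not tok.startswith("[")]
    return []


if __name__ == "__main__":
    try:
        with open("/etc/nsswitch.conf") as f:
            services = hosts_services(f.read())
    except OSError as exc:
        print("/etc/nsswitch.conf unreadable:", exc)
    else:
        print("hosts services:", services)
        print("myhostname enabled:", "myhostname" in services)
```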

Comment 7 Krzysztof Pawłowski 2016-05-16 15:24:22 UTC
Any chance of a fix?

Comment 8 Jan Synacek 2016-05-17 13:39:12 UTC
I debugged this a bit and I wonder... Why does _nss_myhostname_gethostbyname3_r() get af=10 as its argument? The 10 means PF_INET6 (AF_INET6). Why is glibc passing that when it knows that IPv6 is disabled?

Comment 9 Florian Weimer 2016-05-18 07:32:38 UTC
(In reply to Jan Synacek from comment #8)
> I debugged this a bit and I wonder... Why does
> _nss_myhostname_gethostbyname3_r() get af=10 as its argument? The 10 means
> PF_INET6 (AF_INET6). Why is glibc passing that when it knows that IPv6 is
> disabled?

The reproducer in comment 6 explicitly asks for an IPv6 address, so glibc tries to obtain it.
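
For anyone decoding the af=10 value or the tuples printed by the reproducer, the numbers can be mapped back to their symbolic names. The following is a small Python 3 sketch; the describe helper and the three lookup tables are illustrative only and cover just the constants that appear in this report. The numeric values are platform-specific (10 for AF_INET6 is the Linux value), so the tables are built from the socket module's own constants.

```python
import socket

# Lookup tables built from the socket module, so no numeric value is hard-coded.
FAMILIES = {int(socket.AF_INET): "AF_INET", int(socket.AF_INET6): "AF_INET6"}
SOCKTYPES = {
    int(socket.SOCK_STREAM): "SOCK_STREAM",
    int(socket.SOCK_DGRAM): "SOCK_DGRAM",
    int(socket.SOCK_RAW): "SOCK_RAW",
}
PROTOCOLS = {socket.IPPROTO_TCP: "IPPROTO_TCP", socket.IPPROTO_UDP: "IPPROTO_UDP"}


def describe(entry):
    """Map the first three fields of a getaddrinfo() tuple to names.

    Values without an entry in the tables are returned as decimal strings.
    """
    family, socktype, proto = (int(value) for value in entry[:3])
    return (
        FAMILIES.get(family, str(family)),
        SOCKTYPES.get(socktype, str(socktype)),
        PROTOCOLS.get(proto, str(proto)),
    )


if __name__ == "__main__":
    # First tuple from the reproducer output, as printed on the Linux host.
    print(describe((10, 1, 6, '', ('::1', 0, 0, 0))))
```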

Comment 10 Jan Synacek 2016-05-18 07:43:41 UTC
Oh, my bad. I forgot to mention that, as a reproducer, I simply used 'getent hosts $(hostname -f)', which triggers the timeout as well.

Comment 11 Florian Weimer 2016-05-18 08:04:40 UTC
(In reply to Jan Synacek from comment #10)
> Oh, my bad. I forgot to mention that, as a reproducer, I simply used 'getent
> hosts $(hostname -f)', which triggers the timeout as well.

getent performs an AF_INET6 lookup followed by an AF_INET lookup, so that's not surprising at all.
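
The lookup order described here can be imitated in Python 3 to see which step stalls. This is only a rough illustration of the order stated above, not getent's actual code: it goes through socket.getaddrinfo rather than the C-level interface getent uses, and the name lookup_like_getent and its resolver parameter are hypothetical.

```python
import socket


def lookup_like_getent(host, resolver=socket.getaddrinfo):
    """Try an AF_INET6 lookup first, then AF_INET.

    Returns (family, result) for the first lookup that succeeds,
    or (None, []) if both fail.
    """
    for family in (socket.AF_INET6, socket.AF_INET):
        try:
            return family, resolver(host, None, family)
        except socket.gaierror:
            # This family produced no answer; fall through to the next one.
            continue
    return None, []
```

With IPv6 disabled, the first iteration is the one that would be expected to reach nss_myhostname with the IPv6 family and wait for the timeout.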

Comment 12 Jan Synacek 2016-05-20 10:44:38 UTC
https://github.com/lnykryn/systemd-rhel/pull/20

Comment 13 Branislav Blaškovič 2016-06-06 11:14:35 UTC
QA acking.

Comment 14 Lukáš Nykrýn 2016-06-09 08:14:01 UTC
pushed to staging ->
https://github.com/lnykryn/systemd-rhel/commit/6e5117b83af5998359916f276a9b32f755c0e6f4
-> post

Comment 18 errata-xmlrpc 2016-11-04 00:53:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2216.html