Bug 1405071
Summary: | getaddrinfo loses internal lock with deferred cancellation. | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Keyue Hu <rwindz0> | ||||
Component: | glibc | Assignee: | glibc team <glibc-bugzilla> | ||||
Status: | CLOSED WONTFIX | QA Contact: | qe-baseos-tools-bugs | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 7.0 | CC: | ashankar, codonell, fweimer, mnewsome, pfrankli | ||||
Target Milestone: | pre-dev-freeze | ||||||
Target Release: | --- | ||||||
Hardware: | x86_64 | ||||||
OS: | Linux | ||||||
Last Closed: | 2019-06-18 19:34:34 UTC | Type: | Bug | ||||
In the glibc source, sysdeps/unix/sysv/linux/check_pf.c between L322-L356, there are pthread cancellation points in __socket, __bind, or make_request. If pthread_cancel is acted on while the code is executing L322-L356, the check_pf lock is left locked. By the way, upstream glibc seems to have no such issue.

Correction: upstream might have the same issue.

There are no cancellation points in __socket or __bind. Cancellation points in those functions would violate the POSIX requirement that no additional cancellation points be present other than those listed here:

2.9.5 Thread Cancellation
http://pubs.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_09.html
"An implementation shall not introduce cancellation points into any other functions specified in this volume of IEEE Std 1003.1-2001."

However, make_request calls __sendto, __recvmsg, and __netlink_assert_response, all of which could be cancellable. Cancellation there would cause the lock to be lost and the subsequent __check_pf to hang. There is a _lot_ of code running in make_request, so the simplest solution is to push a cleanup handler that unlocks the lock.

I've filed an upstream bug for this.
https://sourceware.org/bugzilla/show_bug.cgi?id=20975

Thanks for the bug report.

Yeah, only __sendto and __recvmsg are cancellable. It is kind of you to file the upstream bug. Thanks!

Red Hat Enterprise Linux 7 is entering Maintenance Phase Support 1 this year, so this issue will not be considered for fixing in RHEL 7 and is being closed. If you still encounter this issue with Red Hat Enterprise Linux 8, please open a new issue with the details. Note that the upstream issue will remain open for upstream tracking.
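The cleanup-handler approach suggested above can be sketched in user-space C. This is only an illustration of the pattern, not the glibc patch: `struct demo`, `unlock_on_cancel`, `worker`, and `lock_state_after_cancel` are hypothetical names, a plain `pthread_mutex_t` stands in for the internal check_pf lock, and `pause()` stands in for the cancellable `__sendto`/`__recvmsg` calls in make_request.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Hypothetical stand-in for the lock held around make_request. */
struct demo {
    pthread_mutex_t lock;
    sem_t locked;       /* posted once the worker holds the lock */
    int use_cleanup;    /* 1: push a cleanup handler, 0: reproduce the leak */
};

/* Cleanup handler: runs if the thread is cancelled while the lock is held. */
static void unlock_on_cancel(void *arg)
{
    pthread_mutex_unlock(arg);
}

static void *worker(void *p)
{
    struct demo *d = p;

    pthread_mutex_lock(&d->lock);
    sem_post(&d->locked);
    if (d->use_cleanup) {
        pthread_cleanup_push(unlock_on_cancel, &d->lock);
        pause();                  /* cancellation point (assumed stand-in) */
        pthread_cleanup_pop(1);   /* non-cancelled path: run handler, unlock */
    } else {
        pause();                  /* cancelled here with the lock still held */
        pthread_mutex_unlock(&d->lock);   /* not reached on cancellation */
    }
    return NULL;
}

/* Cancels the worker while it holds the lock, then reports what
 * pthread_mutex_trylock sees: 0 if the lock was released, or an error
 * code (EBUSY on glibc) if the cancelled thread left it locked. */
int lock_state_after_cancel(int use_cleanup)
{
    struct demo d;
    pthread_t t;
    int rc;

    pthread_mutex_init(&d.lock, NULL);
    sem_init(&d.locked, 0, 0);
    d.use_cleanup = use_cleanup;

    rc = pthread_create(&t, NULL, worker, &d);
    assert(rc == 0);
    sem_wait(&d.locked);          /* make sure the worker owns the lock */
    pthread_cancel(t);
    pthread_join(t, NULL);

    rc = pthread_mutex_trylock(&d.lock);
    if (rc == 0)
        pthread_mutex_unlock(&d.lock);
    sem_destroy(&d.locked);
    return rc;
}
```

Calling `lock_state_after_cancel(1)` exercises the handler path, and `lock_state_after_cancel(0)` shows the leaked lock that makes a later caller block. Build with `-pthread`.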
Created attachment 1232178: backtrace of deadlock

Description of problem:
When pthread_cancel() is called on a thread that is executing getaddrinfo(), the libc lock in check_pf.c might be left locked. The next getaddrinfo() call then hangs forever.

Version-Release number of selected component (if applicable):
glibc-2.17-106.el7_2.8.x86_64

How reproducible:
Easy to reproduce.

Steps to Reproduce:
1. Start a thread that calls zookeeper_init on 127.0.0.1, which calls getaddrinfo.
2. Call pthread_cancel on this thread.
3. Repeat steps 1-2.

Actual results:
getaddrinfo hangs.

Expected results:
getaddrinfo never hangs.

Additional info:
[root@3b3cfab6b378 /]# uname -a
Linux 3b3cfab6b378 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@3b3cfab6b378 /]# rpm -q glibc
glibc-2.17-106.el7_2.8.x86_64
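The reproduction steps can be approximated without ZooKeeper by calling getaddrinfo directly. The following is a rough sketch, not the reporter's program: `resolve_loopback`, `resolver_thread`, and `run_cancel_rounds` are hypothetical names, and the delay before cancelling is an arbitrary knob. Whether a given round lands the cancellation inside the locked region depends on timing, so several rounds may be needed. On an affected glibc the expected symptom is that a later round never returns from pthread_join, because the new thread blocks on the leaked lock.

```c
#include <netdb.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolve 127.0.0.1 once; returns getaddrinfo's status (0 on success). */
int resolve_loopback(void)
{
    struct addrinfo *res = NULL;
    int rc = getaddrinfo("127.0.0.1", NULL, NULL, &res);

    if (rc == 0)
        freeaddrinfo(res);
    return rc;
}

/* Step 1: a thread that keeps calling getaddrinfo until it is cancelled. */
static void *resolver_thread(void *unused)
{
    (void)unused;
    for (;;) {
        resolve_loopback();
        pthread_testcancel();   /* guarantees the loop can be cancelled */
    }
    return NULL;
}

/* Steps 2-3: cancel the thread after delay_us microseconds and repeat.
 * Returns the number of rounds that completed. On an affected glibc a
 * round may never complete, which is the hang described in this report. */
int run_cancel_rounds(int rounds, unsigned delay_us)
{
    int done = 0;

    for (int i = 0; i < rounds; i++) {
        pthread_t t;

        if (pthread_create(&t, NULL, resolver_thread, NULL) != 0)
            break;
        usleep(delay_us);
        pthread_cancel(t);
        pthread_join(t, NULL);
        done++;
    }
    return done;
}
```

A call such as `run_cancel_rounds(1000, 100)` is one way to drive it. Both numbers are guesses rather than values taken from the report. Build with `-pthread`.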