Bug 1974550
| Summary: | glibc: Deadlock in atfork handler which calls dlclose | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | yili <33186108> |
| Component: | glibc | Assignee: | glibc team <glibc-bugzilla> |
| Status: | CLOSED WONTFIX | QA Contact: | qe-baseos-tools-bugs |
| Severity: | urgent | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 7.6 | CC: | ashankar, codonell, dj, fweimer, mnewsome, pfrankli, sipoyare |
| Target Milestone: | rc | Keywords: | Bugfix |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-05-13 17:39:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
See https://bugzilla.redhat.com/show_bug.cgi?id=1746928.

The problematic commit went into 2.28 and is therefore included in Red Hat Enterprise Linux 8.

The issue fixed in bug 1746928 is about calling fork from an asynchronous signal handler, and that does not appear to be the case here.

The thread in question, thread #26, is currently deregistering a fork handler as part of dlclose, and another thread might be calling fork (and running ForkedChild registered by /lib64/libsoftokn3.so).

The dlclose operation will block until the current user of the handler completes the fork call.

Is thread #26 inside lll_futex_wait?

Can you determine exactly where in the instruction stream thread #26 is waiting?

Please be aware that Red Hat Enterprise Linux 7 is currently in the Maintenance Support 2 Phase, and support is limited in scope to Urgent priority bug fixes. Any fixes in the area of locking for fork and dlclose would likely be considered unsuitable for the Maintenance Support 2 Phase.

Have you been able to reproduce this on Red Hat Enterprise Linux 8? In RHEL 8 the fork handler implementation was simplified significantly, and handler deletion is done with the lock held, which serializes the deletion so that it happens entirely before or entirely after the fork. This removes some of the complexity present in the RHEL 7 implementation.

(In reply to Carlos O'Donell from comment #4)
> Is thread #26 inside lll_futex_wait?
>
> Can you determine exactly in the instruction stream where thread #26 is
> waiting?

```
(gdb) thread 26
[Switching to thread 26 (Thread 0x7fc3bf5c8700 (LWP 20024))]
#0  __unregister_atfork (dso_handle=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/unregister-atfork.c:115
115         while ((val = deleted->handler->refcntr) != 0)
(gdb) list
110
111         /* Decrement the reference counter.  If it does not reach zero
112            wait for the last user.  */
113         atomic_decrement (&deleted->handler->refcntr);
114         unsigned int val;
115         while ((val = deleted->handler->refcntr) != 0)
116           lll_futex_wait (&deleted->handler->refcntr, val, LLL_PRIVATE);
117
118         deleted = deleted->next;
119       }
(gdb) p *val
Cannot access memory at address 0x1
(gdb) p val
$1 = 1
(gdb) p *deleted->handler
$2 = {next = 0x7fc3ce84de78 <fork_handler_pool+56>, prepare_handler = 0x0, parent_handler = 0x0,
  child_handler = 0x7fc3ac0238f0 <ForkedChild>, dso_handle = 0x7fc3ac254708, refcntr = 0, need_signal = 1}
(gdb)
```

So val is 1, but deleted->handler->refcntr is now 0.
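To make that observation concrete, here is a self-contained sketch of the wait loop from unregister-atfork.c quoted above, rewritten with C11 atomics and a raw futex(2) syscall in place of glibc's internal atomic_decrement/lll_futex_wait helpers (the rewrite is an illustration under those assumptions, not glibc's actual code). It shows one way the observed state can arise: once the waiter has entered FUTEX_WAIT with val = 1, the kernel no longer watches the value, so if the last user's wakeup is lost, the waiter sleeps forever even though refcntr has already reached 0.

```c
/* Sketch only: mimics the refcntr wait in unregister-atfork.c with
   C11 atomics and a raw futex(2) call; not glibc's implementation. */
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* One reference for the registration, one for an in-flight fork user. */
static atomic_uint refcntr = 2;

static long futex(atomic_uint *uaddr, int op, unsigned val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Plays __unregister_atfork: drop our reference, then wait for the
   last user (lines 113-116 in the listing above). */
static void *unregisterer(void *arg)
{
    atomic_fetch_sub(&refcntr, 1);
    unsigned val;
    while ((val = atomic_load(&refcntr)) != 0)
        futex(&refcntr, FUTEX_WAIT_PRIVATE, val);   /* lll_futex_wait */
    puts("handler deregistered");
    return NULL;
}

/* Plays the forking thread that holds the last reference.  If the
   FUTEX_WAKE below is skipped after the waiter has gone to sleep,
   the waiter blocks forever although refcntr is 0. */
static void *fork_user(void *arg)
{
    sleep(1);                        /* let the waiter block first */
    atomic_fetch_sub(&refcntr, 1);   /* refcntr reaches 0 */
    futex(&refcntr, FUTEX_WAKE_PRIVATE, 1);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, unregisterer, NULL);
    pthread_create(&b, NULL, fork_user, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

Run as-is, the program terminates; removing the FUTEX_WAKE line leaves the first thread blocked in the futex syscall with refcntr == 0, the same picture as thread #26 above.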
> Have you been able to reproduce this in Red Hat Enterprise Linux 8? In RHEL8
> the fork handler implementation was simplified significantly and the handler
> deletion is done with the lock held which serializes the deletion and so it
> happens before the fork or after the fork entirely. This removes some of the
> complexity present in the implementation for RHEL7.

I will check RHEL 8, but maybe we can dig further into this issue first. Here is the disassembly around the wait loop:

```
   0x00007fc3ce58fa4d <+285>:  lock decl 0x28(%rax)
   0x00007fc3ce58fa51 <+289>:  mov    (%r8),%rdi
   0x00007fc3ce58fa54 <+292>:  mov    0x28(%rdi),%edx
   0x00007fc3ce58fa57 <+295>:  test   %edx,%edx
   0x00007fc3ce58fa59 <+297>:  je     0x7fc3ce58fa76 <__unregister_atfork+326>
   0x00007fc3ce58fa5b <+299>:  nopl   0x0(%rax,%rax,1)
   0x00007fc3ce58fa60 <+304>:  xor    %r10d,%r10d
   0x00007fc3ce58fa63 <+307>:  add    $0x28,%rdi
   0x00007fc3ce58fa67 <+311>:  mov    %r9d,%eax
   0x00007fc3ce58fa6a <+314>:  syscall
=> 0x00007fc3ce58fa6c <+316>:  mov    (%r8),%rdi
   0x00007fc3ce58fa6f <+319>:  mov    0x28(%rdi),%edx
   0x00007fc3ce58fa72 <+322>:  test   %edx,%edx
   0x00007fc3ce58fa74 <+324>:  jne    0x7fc3ce58fa60 <__unregister_atfork+304>
   0x00007fc3ce58fa76 <+326>:  mov    0x8(%r8),%r8
   0x00007fc3ce58fa7a <+330>:  test   %r8,%r8
   0x00007fc3ce58fa7d <+333>:  jne    0x7fc3ce58fa40 <__unregister_atfork+272>
   0x00007fc3ce58fa7f <+335>:  leaveq
   0x00007fc3ce58fa80 <+336>:  retq
   0x00007fc3ce58fa81 <+337>:  mov    (%rdx),%rcx
   0x00007fc3ce58fa84 <+340>:  mov    %rdx,%rax
   0x00007fc3ce58fa87 <+343>:  cmpl   $0x0,%fs:0x18
   0x00007fc3ce58fa90 <+352>:  je     0x7fc3ce58fa93 <__unregister_atfork+355>
   0x00007fc3ce58fa92 <+354>:  lock cmpxchg %rcx,0x2c02fd(%rip)  # 0x7fc3ce84fd98 <__fork_handlers>
   0x00007fc3ce58fa9b <+363>:  cmp    %rax,%rdx
   0x00007fc3ce58fa9e <+366>:  je     0x7fc3ce58f9c0 <__unregister_atfork+144>
   0x00007fc3ce58faa4 <+372>:  mov    0x2c02ed(%rip),%rdx       # 0x7fc3ce84fd98 <__fork_handlers>
   0x00007fc3ce58faab <+379>:  jmpq   0x7fc3ce58f9ab <__unregister_atfork+123>
End of assembler dump.
```

```
(gdb) info reg
rax            0xfffffffffffffe00  -512
rbx            0x7fc3ac2545d0      140478383474128
rcx            0x7fc3ce58fa6c      140478957288044
rdx            0x1                 1
rsi            0x80                128
rdi            0x7fc3ce84ded0      140478960164560
rbp            0x7fc3bf5c6870      0x7fc3bf5c6870
rsp            0x7fc3bf5c6850      0x7fc3bf5c6850
r8             0x7fc3bf5c6850      140478705854544
r9             0xca                202
r10            0x0                 0
r11            0x212               530
r12            0x7fc370008640      140477374432832
r13            0x6d                109
r14            0x0                 0
r15            0x6d                109
rip            0x7fc3ce58fa6c      0x7fc3ce58fa6c <__unregister_atfork+316>
eflags         0x212               [ AF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0
```

According to this, thread #26 is inside lll_futex_wait: rip points just past the syscall instruction at __unregister_atfork+314, eax was loaded from r9 = 0xca (202, SYS_futex on x86_64), rsi = 0x80 is FUTEX_WAIT|FUTEX_PRIVATE_FLAG, and rdx = 1 is the value the loop read into val.

RHEL 7 is currently in the Maintenance Support 2 Phase; during this phase only Urgent priority bug fixes will be considered. This particular issue has always been present given the current implementation of fork and pthread_atfork handlers. We do not plan to fix this for RHEL 7, but we are actively evaluating a solution for RHEL 8 in bug 1888660.

I am marking this CLOSED/WONTFIX for RHEL 7. Please feel free to follow bug 1888660 for the RHEL 8 fix.
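For comparison, the serialization described above for RHEL 8 corresponds to a scheme along these lines (a schematic sketch with invented names, not glibc's actual code): both the fork path and deregistration take the same lock, so there is no per-handler reference count left for dlclose to wait on.

```c
/* Schematic of the lock-serialized approach described for RHEL 8
   (not glibc's actual code): fork and deregistration share one lock,
   so a handler is unlinked either entirely before or entirely after
   any concurrent fork walks the list. */
#include <pthread.h>

struct fork_handler {
    struct fork_handler *next;
    void (*child_handler)(void);
    void *dso_handle;
};

static pthread_mutex_t atfork_lock = PTHREAD_MUTEX_INITIALIZER;
static struct fork_handler *handlers;

void my_unregister_atfork(void *dso_handle)
{
    pthread_mutex_lock(&atfork_lock);
    for (struct fork_handler **p = &handlers; *p; )
        if ((*p)->dso_handle == dso_handle)
            *p = (*p)->next;          /* unlink; freeing elided */
        else
            p = &(*p)->next;
    pthread_mutex_unlock(&atfork_lock);
}

void run_child_handlers(void)         /* called on the fork path */
{
    pthread_mutex_lock(&atfork_lock);
    for (struct fork_handler *h = handlers; h; h = h->next)
        h->child_handler();
    pthread_mutex_unlock(&atfork_lock);
}
```

With this shape, dlclose either removes the handler before a concurrent fork walks the list or after the walk has finished; it never waits on a counter whose wakeup can be missed.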
Description of problem:
An atfork handler which calls dlclose will cause libvirtd to hang.

Version-Release number of selected component (if applicable):
glibc-2.17-317.el7.x86_64
libvirt 4.5

Steps to Reproduce:
I have not found a way to reproduce it; it just happens occasionally.

Additional info:

```
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007fc3ce85ae9b in _L_lock_883 () from /lib64/libpthread.so.0
#2  0x00007fc3ce85ad68 in __GI___pthread_mutex_lock (mutex=0x7fc3840017b0) at ../nptl/pthread_mutex_lock.c:78
#3  0x00007fc3d19d95a5 in virMutexLock (m=<optimized out>) at util/virthread.c:89
#4  0x00007fc3d19b190b in virObjectLock (anyobj=anyobj@entry=0x7fc3840017a0) at util/virobject.c:429
#5  0x00007fc3d1a61241 in virStoragePoolObjListSearchCb (payload=0x7fc3840017a0, name=<optimized out>, opaque=0x7fc38bffe9b0) at conf/virstorageobj.c:495
#6  0x00007fc3d1983855 in virHashSearch (ctable=0x7fc3a40e90a0, iter=iter@entry=0x7fc3d1a61230 <virStoragePoolObjListSearchCb>, data=data@entry=0x7fc38bffe9b0, name=name@entry=0x0) at util/virhash.c:696
#7  0x00007fc3d1a624dc in virStoragePoolObjListSearch (pools=0x7fc3a40e9ca0, searcher=searcher@entry=0x7fc3bb49ea20 <storageVolLookupByPathCallback>, opaque=opaque@entry=0x7fc38bffea10) at conf/virstorageobj.c:525
#8  0x00007fc3bb49e8b5 in storageVolLookupByPath (conn=0x7fc31c000ef0, path=0x7fc31c000fe0 "hygonstoragepool/1014ce0d-28d9-4312-9ad2-1dd288f1f12c_10") at storage/storage_driver.c:1676
#9  0x00007fc3d1b98dd7 in virStorageVolLookupByPath (conn=0x7fc31c000ef0, path=0x7fc31c000fe0 "hygonstoragepool/1014ce0d-28d9-4312-9ad2-1dd288f1f12c_10") at libvirt-storage.c:1414
#10 0x000056061edce21b in remoteDispatchStorageVolLookupByPath (server=0x56061f95c030, msg=0x56061f9ccf20, ret=0x7fc31c000980, args=0x7fc31c000940, rerr=0x7fc38bffec10, client=0x56061f9d6870) at remote/remote_daemon_dispatch_stubs.h:18277
#11 remoteDispatchStorageVolLookupByPathHelper (server=0x56061f95c030, client=0x56061f9d6870, msg=0x56061f9ccf20, rerr=0x7fc38bffec10, args=0x7fc31c000940, ret=0x7fc31c000980) at remote/remote_daemon_dispatch_stubs.h:18255
#12 0x00007fc3d1aa4315 in virNetServerProgramDispatchCall (msg=0x56061f9ccf20, client=0x56061f9d6870, server=0x56061f95c030, prog=0x56061f9c2dc0) at rpc/virnetserverprogram.c:437
#13 virNetServerProgramDispatch (prog=0x56061f9c2dc0, server=server@entry=0x56061f95c030, client=0x56061f9d6870, msg=0x56061f9ccf20) at rpc/virnetserverprogram.c:304
#14 0x00007fc3d1aaab4d in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x56061f95c030) at rpc/virnetserver.c:144
#15 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x56061f95c030) at rpc/virnetserver.c:165
#16 0x00007fc3d19da091 in virThreadPoolWorker (opaque=opaque@entry=0x56061f9d66f0) at util/virthreadpool.c:167
#17 0x00007fc3d19d9418 in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#18 0x00007fc3ce858ea5 in start_thread (arg=0x7fc38bfff700) at pthread_create.c:307
#19 0x00007fc3ce58196d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) frame 5
#5  0x00007fc3d1a61241 in virStoragePoolObjListSearchCb (payload=0x7fc3840017a0, name=<optimized out>, opaque=0x7fc38bffe9b0) at conf/virstorageobj.c:495
495         virObjectLock(obj);
(gdb) p *obj
$1 = {parent = {parent = {u = {dummy_align1 = 16290545681, dummy_align2 = 0x3cafe0011, s = {magic = 3405643793, refs = 3}},
      klass = 0x7fc3a40e9c40},
    lock = {lock = {__data = {__lock = 2, __count = 0, __owner = 20024, __nusers = 1, __kind = 512, __spins = 0, __elision = 0,
          __list = {__prev = 0x0, __next = 0x0}},
        __size = "\002\000\000\000\000\000\000\000\070N\000\000\001\000\000\000\000\002", '\000' <repeats 21 times>, __align = 2}}},
  configFile = 0x7fc384001290 "/etc/libvirt/storage/hygonstoragepool.xml",
  autostartLink = 0x7fc384004530 "/etc/libvirt/storage/autostart/hygonstoragepool.xml", active = true, starting = false,
  autostart = true, asyncjobs = 0, def = 0x7fc384001e50, newDef = 0x0, volumes = 0x7fc384004640}
```

The mutex's __owner is 20024, so the lock is held by LWP 20024; that is thread 26, which is itself stuck deregistering an atfork handler inside dlclose:

```
(gdb) thread 26
[Switching to thread 26 (Thread 0x7fc3bf5c8700 (LWP 20024))]
#0  __unregister_atfork (dso_handle=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/unregister-atfork.c:115
115         while ((val = deleted->handler->refcntr) != 0)
(gdb) bt
#0  __unregister_atfork (dso_handle=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/unregister-atfork.c:115
#1  0x00007fc3ac01d283 in __do_global_dtors_aux () from /lib64/libsoftokn3.so
#2  0x00007fc3bf5c6db0 in ?? ()
#3  0x00007fc3d22d3229 in _dl_close_worker (map=<optimized out>) at dl-close.c:266
Backtrace stopped: frame did not save the PC
(gdb) p *deleted->handler
$2 = {next = 0x7fc3ce84de78 <fork_handler_pool+56>, prepare_handler = 0x0, parent_handler = 0x0,
  child_handler = 0x7fc3ac0238f0 <ForkedChild>, dso_handle = 0x7fc3ac254708, refcntr = 0, need_signal = 1}
(gdb)
```
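No reproducer is attached to the bug, so the following is a hypothetical stress harness for the scenario the backtraces show: one thread forks in a loop while another repeatedly dlcloses a library whose atfork handler must be deregistered (the role libsoftokn3.so's ForkedChild plays above). The file names shim.c/libshim.so/stress.c and the callback child_cb are invented for illustration, and there is no guarantee this triggers the race on glibc-2.17-317.el7.

```c
/* shim.c -- gcc -fPIC -shared -o libshim.so shim.c -lpthread
   Registers an atfork child handler tied to this DSO; dlclose of the
   DSO then has to deregister it via __unregister_atfork. */
#include <pthread.h>

static void child_cb(void) { /* runs in forked children */ }

__attribute__((constructor))
static void init(void)
{
    pthread_atfork(NULL, NULL, child_cb);
}
```

```c
/* stress.c -- gcc -o stress stress.c -ldl -lpthread */
#include <dlfcn.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork continuously so the fork handlers are always in use. */
static void *forker(void *arg)
{
    for (;;) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);
        if (pid > 0)
            waitpid(pid, NULL, 0);
    }
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, forker, NULL);
    for (;;) {                          /* race dlclose against fork */
        void *h = dlopen("./libshim.so", RTLD_NOW);
        if (!h)
            abort();
        dlclose(h);   /* may block in __unregister_atfork, as thread
                         #26 does above */
    }
}
```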