Bug 1755400

Summary: deadlock in valgrind when terminating program
Product: [Fedora] Fedora
Reporter: Pavel Březina <pbrezina>
Component: valgrind
Assignee: Mark Wielaard <mjw>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 34
CC: atikhono, dodji, jakub, mjw, rjones
Hardware: Unspecified
OS: Unspecified
Fixed In Version: valgrind-3.18.1-1.fc34
Last Closed: 2021-10-28 19:31:20 UTC
Type: Bug

Description Pavel Březina 2019-09-25 12:24:58 UTC
Description of problem:
When a program running under valgrind receives SIGTERM, valgrind gets stuck instead of terminating.

Version-Release number of selected component (if applicable):
valgrind-3.15.0-9.fc30.x86_64

How reproducible:
Sometimes. It runs in a Vagrant box as part of the SSSD upstream CI. It does not happen every time; the frequency is quite low.

Steps to Reproduce:
I do not know.

Actual results:
CI tests get stuck waiting for valgrind to finish, which never happens.

Expected results:
Valgrind finishes and CI tests continue.

Additional info:
ps aux | grep valgrind
vagrant  21568  0.0  1.5 107872 61980 ?        S    08:33   0:02 valgrind --log-file=/tmp/sssd-intg.CZti5UDE/var/log/sssd/valgrind_ifp.log /tmp/sssd-intg.CZti5UDE/libexec/sssd/sssd_ifp --uid 0 --gid 0 --debug-to-files

cat /tmp/sssd-intg.CZti5UDE/var/log/sssd/valgrind_ifp.log
==21568== Process terminating with default action of signal 15 (SIGTERM)
==21568==    at 0x50C7D58: __unregister_atfork (in /usr/lib64/libc-2.29.so)
==21568==    by 0x5080CE8: __cxa_finalize (in /usr/lib64/libc-2.29.so)
==21568==    by 0x5BE5BE6: ??? (in /usr/lib64/ldb/modules/ldb/memberof.so)
==21568==    by 0x401026A: _dl_fini (in /usr/lib64/ld-2.29.so)
==21568==    by 0x508067F: __run_exit_handlers (in /usr/lib64/libc-2.29.so)
==21568==    by 0x50807BF: exit (in /usr/lib64/libc-2.29.so)
==21568==    by 0x48D6290: orderly_shutdown (server.c:249)
==21568==    by 0x4EADFB5: tevent_common_invoke_signal_handler (tevent_signal.c:370)
==21568==    by 0x4EAE142: tevent_common_check_signal (tevent_signal.c:468)
==21568==    by 0x4EB017D: epoll_event_loop_once (tevent_epoll.c:909)
==21568==    by 0x4EAE41A: std_event_loop_once (tevent_standard.c:110)
==21568==    by 0x4EA9537: _tevent_loop_once (tevent.c:772)

(gdb) bt
#0  vgModuleLocal_do_syscall_for_client_WRK () at m_syswrap/syscall-amd64-linux.S:173
#1  0x00000000580a8b70 in do_syscall_for_client (syscall_mask=0x1002cadca8, tst=0x10020084b0, syscallno=202) at m_syswrap/syswrap-main.c:1964
#2  vgPlain_client_syscall (tid=tid@entry=1, trc=trc@entry=73) at m_syswrap/syswrap-main.c:1964
#3  0x00000000580a4e6b in handle_syscall (tid=tid@entry=1, trc=73) at m_scheduler/scheduler.c:1209
#4  0x00000000580a66aa in vgPlain_scheduler (tid=tid@entry=1) at m_scheduler/scheduler.c:1531
#5  0x00000000580ba318 in final_tidyup (tid=tid@entry=1) at m_main.c:2440
#6  0x00000000580ba4bd in shutdown_actions_NORETURN (tid=1, tids_schedretcode=VgSrc_FatalSig) at m_main.c:2129
#7  0x00000000580f6178 in run_a_thread_NORETURN (tidW=1) at m_syswrap/syswrap-linux.c:203
#8  0x0000000000000000 in ?? ()
(gdb) l
168		   restarting it. */
169	2:	syscall
170	3:	/* In the range [3, 4), the syscall result is in %rax, 
171		   but hasn't been committed to RAX. */
172	
173		POP_di_si_dx_cx_8
174	
175		movq	%rax, OFFSET_amd64_RAX(%rsi)	/* save back to RAX */
176	
177	4:	/* Re-block signals.  If eip is in [4,5), then the syscall
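In frame #1 of the gdb backtrace, valgrind is executing a client syscall with `syscallno=202` for thread 1 while `final_tidyup` (frame #5) is still scheduling client code. Assuming an x86_64 Linux host (the package in this report is x86_64), syscall 202 is `futex`, so the client appears to be blocked waiting on a lock during shutdown rather than valgrind looping on its own. The minimal C sketch below checks that number against the system headers; the helper name `is_futex_syscall` is hypothetical and is not part of valgrind or glibc.

```c
#include <stdbool.h>
#include <sys/syscall.h> /* SYS_* syscall numbers for the build target */

/* Hypothetical helper: true if nr is the futex(2) syscall number on the
 * architecture this file is compiled for. */
static inline bool is_futex_syscall(long nr)
{
    return nr == SYS_futex;
}

#if defined(__x86_64__) && defined(__linux__)
/* On x86_64 Linux, futex is syscall 202, the number seen in frame #1.
 * The build fails here if the headers disagree. */
_Static_assert(SYS_futex == 202, "expected SYS_futex == 202 on x86_64 Linux");
#endif
```

On other architectures the static assertion is skipped, because syscall numbers are architecture-specific and futex uses a different number there.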

Comment 1 Ben Cotton 2019-10-31 18:48:59 UTC
This message is a reminder that Fedora 29 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 29 on 2019-11-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '29'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 29 reaches end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 2 Ben Cotton 2020-11-03 15:36:01 UTC
This message is a reminder that Fedora 31 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 31 on 2020-11-24.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '31'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 31 reaches end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 3 Pavel Březina 2020-11-04 10:36:34 UTC
This still happens sporadically.

Comment 4 Richard W.M. Jones 2020-12-08 18:46:56 UTC
I'm pretty sure I am seeing this, and also that I have a test case that
(for me) reproduces this 100% of the time.

You have to compile nbdkit (https://github.com/libguestfs/nbdkit) from
source, which is not too difficult, and then run this command in the nbdkit
directory:

$ NBDKIT_VALGRIND=1 ./nbdkit -U - -v -D data.AST=1 data '@4 "\x00"' allocator=malloc --run 'qemu-img convert $uri /tmp/out'

valgrind hangs on exit.

Since this didn't happen until I upgraded this box from F32, I suspect
this might actually be a kernel/glibc problem or a bad interaction with
valgrind.

valgrind-3.16.1-8.fc34.x86_64
kernel 5.8.15-301.fc33.x86_64
glibc-2.32.9000-18.fc34.x86_64

Comment 5 Ben Cotton 2021-02-09 15:12:46 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

Comment 6 Richard W.M. Jones 2021-08-11 10:10:47 UTC
(In reply to Richard W.M. Jones from comment #4)
> I'm pretty sure I am seeing this, and also that I have a test case that
> (for me) reproduces this 100% of the time.
> 
> You have to compile nbdkit (https://github.com/libguestfs/nbdkit) from
> source which is not too difficult, then run this command in the nbdkit
> directory:
> 
> $ NBDKIT_VALGRIND=1 ./nbdkit -U - -v -D data.AST=1 data '@4 "\x00"'
> allocator=malloc --run 'qemu-img convert $uri /tmp/out'
> 
> valgrind hangs on exit.
> 
> Since this didn't happen until I upgraded this box from F32, I suspect
> this might actually be a kernel/glibc problem or bad interaction with
> valgrind.
> 
> valgrind-3.16.1-8.fc34.x86_64
> kernel 5.8.15-301.fc33.x86_64
> glibc-2.32.9000-18.fc34.x86_64

I think this was actually caused by subtle memory corruption in
my tests, because of an incorrect kernel madvise() hint. In any
case, it no longer happens with the latest test and
valgrind-3.17.0-11.fc35.x86_64.

Comment 7 Fedora Update System 2021-10-20 14:06:01 UTC
FEDORA-2021-07e75edcab has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-07e75edcab

Comment 8 Fedora Update System 2021-10-20 20:03:49 UTC
FEDORA-2021-07e75edcab has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-07e75edcab`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-07e75edcab

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 9 Fedora Update System 2021-10-28 19:31:20 UTC
FEDORA-2021-07e75edcab has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make note of it in this bug report.