Bug 1564527

Summary: ntpd segfaults in sigcancel_handler / _dl_name_match_p
Product: [Fedora] Fedora
Component: glibc
Version: 27
Hardware: x86_64
OS: Linux
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Type: Bug
Reporter: Christian Heimes <cheimes>
Assignee: Carlos O'Donell <codonell>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: aoliva, arjun.is, cheimes, codonell, dj, fweimer, law, mfabian, pfrankli, rth, siddhesh
Last Closed: 2018-04-17 09:49:08 UTC

Description Christian Heimes 2018-04-06 14:09:39 UTC
Description of problem:
Over the last couple of weeks, FreeIPA's pull request CI has been failing intermittently because ntpd segfaults during "systemctl restart ntpd". It's unclear what exactly is causing the segfault. I have seen it many times. We don't collect statistics on the causes of failing tests, but I estimate that at least one test run per day is affected. The stack trace points to either ntpd or glibc; glibc seems more likely.

Version-Release number of selected component (if applicable):
ntp-4.2.8p10-3.fc27.x86_64
glibc-2.26-21.fc27.x86_64

How reproducible:
Irregularly. FreeIPA's CI sees ntpd-related test failures at least once a day.

Steps to Reproduce:
Run "systemctl restart ntpd.service" during the IPA test suite.

Actual results:
ntpd fails with a segfault

Expected results:
No segfault

Additional info:

Example log from https://fedorapeople.org/groups/freeipa/prci/jobs/7a91e66c-39a0-11e8-ad87-fa163edd113b/-TestServerReplicaCALessToCAFull--test_install_caless_server_replica/master.ipa.test/journal.gz

Apr 06 13:47:42 master.ipa.test systemd[1]: Starting Network Time Service...
Apr 06 13:47:43 master.ipa.test kernel: show_signal_msg: 12 callbacks suppressed
Apr 06 13:47:43 master.ipa.test kernel: ntpd[15749]: segfault at 7f64c53e0ff8 ip 00007f64c51cf0c1 sp 00007f64c53e1000 error 6 in ld-2.26.so[7f64c51bd000+27000]
Apr 06 13:47:43 master.ipa.test audit[15748]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:ntpd_t:s0 pid=15748 comm="ntpd" exe="/usr/sbin/ntpd" sig=11 res=1
Apr 06 13:47:43 master.ipa.test audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-15750-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 06 13:47:43 master.ipa.test systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Apr 06 13:47:43 master.ipa.test systemd[1]: Started Process Core Dump (PID 15750/UID 0).
Apr 06 13:47:43 master.ipa.test systemd[1]: ntpd.service: Control process exited, code=killed status=11
Apr 06 13:47:43 master.ipa.test audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ntpd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Apr 06 13:47:43 master.ipa.test systemd[1]: Failed to start Network Time Service.
Apr 06 13:47:43 master.ipa.test systemd[1]: ntpd.service: Unit entered failed state.
Apr 06 13:47:43 master.ipa.test systemd[1]: ntpd.service: Failed with result 'signal'.
Apr 06 13:47:43 master.ipa.test systemd-coredump[15751]: Process 15748 (ntpd) of user 0 dumped core.
                                                         
                                                         Stack trace of thread 15749:
                                                         #0  0x00007f64c51cf0c1 _dl_name_match_p (ld-linux-x86-64.so.2)
                                                         #1  0x00007f64c51c749c do_lookup_x (ld-linux-x86-64.so.2)
                                                         #2  0x00007f64c51c833f _dl_lookup_symbol_x (ld-linux-x86-64.so.2)
                                                         #3  0x00007f64c51cd5b3 _dl_fixup (ld-linux-x86-64.so.2)
                                                         #4  0x00007f64c51d548a _dl_runtime_resolve_xsavec (ld-linux-x86-64.so.2)
                                                         #5  0x00007f64c372c531 _Unwind_Find_FDE (libgcc_s.so.1)
                                                         #6  0x00007f64c3728a73 n/a (libgcc_s.so.1)
                                                         #7  0x00007f64c3729ce0 n/a (libgcc_s.so.1)
                                                         #8  0x00007f64c372a486 _Unwind_ForcedUnwind (libgcc_s.so.1)
                                                         #9  0x00007f64c3d251c0 __pthread_unwind (libpthread.so.0)
                                                         #10 0x00007f64c3d19ca2 sigcancel_handler (libpthread.so.0)
                                                         #11 0x00007f64c3d26a80 __restore_rt (libpthread.so.0)
                                                         #12 0x00007f64c3a0cad0 __nanosleep (libc.so.6)
                                                         #13 0x00007f64c3a0c9da sleep (libc.so.6)
                                                         #14 0x000055d1999c9df2 my_pthread_warmup_worker (ntpd)
                                                         #15 0x00007f64c3d1b61b start_thread (libpthread.so.0)
                                                         #16 0x00007f64c3a4891f __clone (libc.so.6)
                                                         
                                                         Stack trace of thread 15748:
                                                         #0  0x00007f64c3d1cb7d pthread_join (libpthread.so.0)
                                                         #1  0x000055d1999c9fd4 ntpdmain (ntpd)
                                                         #2  0x00007f64c395200a __libc_start_main (libc.so.6)
                                                         #3  0x000055d1999ba17a _start (ntpd)
Apr 06 13:47:43 master.ipa.test audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-15750-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
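
The kernel "segfault at" line in the log above can be decoded with a little arithmetic. The following is a minimal sketch and not part of the original report; the function name decode_segfault and the regular expression are illustrative assumptions that match the message format quoted here. It computes the offset of the faulting instruction inside the mapped object, the distance between the stack pointer and the faulting address, and the x86 page-fault error bits.

```python
import re

# Illustrative pattern for the kernel's "segfault at ..." message.
# Assumption: the format matches the line quoted in this report.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<obj>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Decode one kernel segfault log line into a small dictionary."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    addr, ip, sp, base = (int(m[k], 16) for k in ("addr", "ip", "sp", "base"))
    err = int(m["err"])
    return {
        "object": m["obj"],
        # Offset of the faulting instruction within the mapped object.
        "ip_offset": ip - base,
        # Positive value: the faulting address lies below the stack pointer.
        "sp_minus_addr": sp - addr,
        # x86 page-fault error code bits: 0 = page present, 1 = write, 2 = user mode.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }
```

For the line in this report, the sketch gives an instruction offset of 0x120c1 into ld-2.26.so and a user-mode write to a non-present page 8 bytes below the stack pointer. That pattern is consistent with the thread running out of stack, though the log alone does not establish the cause.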

Comment 1 Florian Weimer 2018-04-06 14:21:18 UTC
(In reply to Christian Heimes from comment #0)
> glibc-2.26-21.fc27.x86_64

Please upgrade to glibc-2.26-24.fc27.x86_64 or later and retest.  This is likely bug 1527887.

This issue will reproduce only on machines with AVX-512 support.
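
Whether a given CI machine has AVX-512 can be checked from /proc/cpuinfo. The snippet below is a minimal sketch under the assumption that the host is x86 Linux and reports the AVX-512 Foundation feature as the flag avx512f; the helper names are hypothetical and not taken from this bug or from FreeIPA.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # The flags line is identical for every core on typical systems,
            # so the first occurrence is enough for this sketch.
            return set(value.split())
    return set()


def has_avx512(cpuinfo_text):
    """True if the AVX-512 Foundation flag (avx512f) is listed."""
    return "avx512f" in cpu_flags(cpuinfo_text)


def host_has_avx512(path="/proc/cpuinfo"):
    """Check the running host; assumes a Linux /proc filesystem."""
    with open(path) as f:
        return has_avx512(f.read())
```

Calling host_has_avx512() on a test runner tells you whether that machine is a candidate for reproducing the crash.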

Comment 2 Christian Heimes 2018-04-06 14:46:02 UTC
Thanks for the hint, Florian! We are going to update our test images and see if that fixes the problem. I'll report back to you next week.

Comment 3 Christian Heimes 2018-04-09 09:44:43 UTC
I have added glibc >= 2.26-24 as a requirement for our test suite. Let's see if tests are more stable now.

Fixed upstream
master:
https://pagure.io/freeipa/c/888d9861f86f27531753cc53644b098dc395098d
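
The glibc >= 2.26-24 requirement can also be checked against an installed package string. This is a simplified sketch, not the code from the FreeIPA commit above: the function names are hypothetical, it only understands strings shaped like the ones quoted in this report, and it compares numeric components rather than implementing rpm's full version comparison.

```python
import re


def glibc_version_release(nvr):
    """Parse e.g. 'glibc-2.26-21.fc27.x86_64' into ((2, 26), 21).

    Simplified: numeric version and leading release number only.
    """
    m = re.fullmatch(r"glibc-(\d+(?:\.\d+)*)-(\d+)(?:\..*)?", nvr)
    if m is None:
        raise ValueError(f"unrecognized package string: {nvr!r}")
    version = tuple(int(part) for part in m.group(1).split("."))
    return version, int(m.group(2))


def meets_minimum(nvr, minimum=((2, 26), 24)):
    """True if the package is at least version 2.26, release 24 (the default)."""
    return glibc_version_release(nvr) >= minimum
```

With the package from the original description, meets_minimum("glibc-2.26-21.fc27.x86_64") is false, while the suggested glibc-2.26-24.fc27.x86_64 passes. The comparison is only meaningful within one Fedora release, since release numbers restart between versions.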

Comment 4 Christian Heimes 2018-04-17 09:49:08 UTC
I haven't seen a segfault in a while. It looks like the glibc update did the trick. Thanks Florian!

*** This bug has been marked as a duplicate of bug 1527887 ***