Description of problem:
Over the last couple of weeks, FreeIPA's pull request CI has been failing because "systemctl restart ntpd" segfaults. It's unclear what exactly is causing the segfault. I have seen it many times. We don't collect statistics on the causes of failing tests, but I estimate that at least one test run every day is affected. The stack trace points to either ntpd or glibc; glibc seems more likely.

Version-Release number of selected component (if applicable):
ntp-4.2.8p10-3.fc27.x86_64
glibc-2.26-21.fc27.x86_64

How reproducible:
Irregularly. FreeIPA's CI is seeing ntpd-related test failures at least once a day.

Steps to Reproduce:
systemctl restart ntpd.service during the IPA test suite

Actual results:
ntpd fails with a segfault

Expected results:
No segfault

Additional info:
Example log from https://fedorapeople.org/groups/freeipa/prci/jobs/7a91e66c-39a0-11e8-ad87-fa163edd113b/-TestServerReplicaCALessToCAFull--test_install_caless_server_replica/master.ipa.test/journal.gz

Apr 06 13:47:42 master.ipa.test systemd[1]: Starting Network Time Service...
Apr 06 13:47:43 master.ipa.test kernel: show_signal_msg: 12 callbacks suppressed
Apr 06 13:47:43 master.ipa.test kernel: ntpd[15749]: segfault at 7f64c53e0ff8 ip 00007f64c51cf0c1 sp 00007f64c53e1000 error 6 in ld-2.26.so[7f64c51bd000+27000]
Apr 06 13:47:43 master.ipa.test audit[15748]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=system_u:system_r:ntpd_t:s0 pid=15748 comm="ntpd" exe="/usr/sbin/ntpd" sig=11 res=1
Apr 06 13:47:43 master.ipa.test audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-15750-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 06 13:47:43 master.ipa.test systemd[1]: Created slice system-systemd\x2dcoredump.slice.
Apr 06 13:47:43 master.ipa.test systemd[1]: Started Process Core Dump (PID 15750/UID 0).
Apr 06 13:47:43 master.ipa.test systemd[1]: ntpd.service: Control process exited, code=killed status=11
Apr 06 13:47:43 master.ipa.test audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=ntpd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Apr 06 13:47:43 master.ipa.test systemd[1]: Failed to start Network Time Service.
Apr 06 13:47:43 master.ipa.test systemd[1]: ntpd.service: Unit entered failed state.
Apr 06 13:47:43 master.ipa.test systemd[1]: ntpd.service: Failed with result 'signal'.
Apr 06 13:47:43 master.ipa.test systemd-coredump[15751]: Process 15748 (ntpd) of user 0 dumped core.

    Stack trace of thread 15749:
    #0  0x00007f64c51cf0c1 _dl_name_match_p (ld-linux-x86-64.so.2)
    #1  0x00007f64c51c749c do_lookup_x (ld-linux-x86-64.so.2)
    #2  0x00007f64c51c833f _dl_lookup_symbol_x (ld-linux-x86-64.so.2)
    #3  0x00007f64c51cd5b3 _dl_fixup (ld-linux-x86-64.so.2)
    #4  0x00007f64c51d548a _dl_runtime_resolve_xsavec (ld-linux-x86-64.so.2)
    #5  0x00007f64c372c531 _Unwind_Find_FDE (libgcc_s.so.1)
    #6  0x00007f64c3728a73 n/a (libgcc_s.so.1)
    #7  0x00007f64c3729ce0 n/a (libgcc_s.so.1)
    #8  0x00007f64c372a486 _Unwind_ForcedUnwind (libgcc_s.so.1)
    #9  0x00007f64c3d251c0 __pthread_unwind (libpthread.so.0)
    #10 0x00007f64c3d19ca2 sigcancel_handler (libpthread.so.0)
    #11 0x00007f64c3d26a80 __restore_rt (libpthread.so.0)
    #12 0x00007f64c3a0cad0 __nanosleep (libc.so.6)
    #13 0x00007f64c3a0c9da sleep (libc.so.6)
    #14 0x000055d1999c9df2 my_pthread_warmup_worker (ntpd)
    #15 0x00007f64c3d1b61b start_thread (libpthread.so.0)
    #16 0x00007f64c3a4891f __clone (libc.so.6)

    Stack trace of thread 15748:
    #0  0x00007f64c3d1cb7d pthread_join (libpthread.so.0)
    #1  0x000055d1999c9fd4 ntpdmain (ntpd)
    #2  0x00007f64c395200a __libc_start_main (libc.so.6)
    #3  0x000055d1999ba17a _start (ntpd)

Apr 06 13:47:43 master.ipa.test audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-15750-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'

(In reply to Christian Heimes from comment #0)
> glibc-2.26-21.fc27.x86_64

Please upgrade to glibc-2.26-24.fc27.x86_64 or later and retest. This is likely bug 1527887. This issue will reproduce only on machines with AVX-512 support.
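
The two conditions named above (a glibc build older than 2.26-24 and a CPU with AVX-512) can be checked on a test machine with a small script. The following is only a rough sketch, not part of the FreeIPA test suite: the function names `has_avx512`, `parse_glibc_nvr`, and `likely_affected` are hypothetical, the AVX-512 check assumes the kernel exposes features as `avx512*` entries on the `flags` line of /proc/cpuinfo, and the version check assumes that only Fedora's glibc 2.26 builds with a release number below 24 matter, since this report names no other versions.

```python
import re


def has_avx512(cpuinfo_text):
    """Return True if any 'flags' line lists an avx512* CPU feature."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            if any(flag.startswith("avx512") for flag in flags.split()):
                return True
    return False


def parse_glibc_nvr(package):
    """Split e.g. 'glibc-2.26-21.fc27.x86_64' into ((2, 26), 21)."""
    match = re.match(r"glibc-(\d+(?:\.\d+)*)-(\d+)", package)
    if match is None:
        raise ValueError("not a glibc package name: %r" % package)
    version = tuple(int(part) for part in match.group(1).split("."))
    return version, int(match.group(2))


def likely_affected(package, cpuinfo_text):
    """Assumption: only glibc 2.26 builds with release < 24 are considered."""
    version, release = parse_glibc_nvr(package)
    return has_avx512(cpuinfo_text) and version == (2, 26) and release < 24
```

On a real host, `cpuinfo_text` would be the contents of /proc/cpuinfo and `package` the output of `rpm -q glibc`. A result of `False` says only that these two conditions are not both met, not that ntpd cannot crash for another reason.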
Thanks for the hint, Florian! We are going to update our test images and see if that fixes the problem. I'll report back to you next week.
I have added glibc >= 2.26-24 as a requirement for our test suite. Let's see if tests are more stable now.

Fixed upstream
master:
https://pagure.io/freeipa/c/888d9861f86f27531753cc53644b098dc395098d
I haven't seen a segfault in a while. It looks like the glibc update did the trick. Thanks Florian!

*** This bug has been marked as a duplicate of bug 1527887 ***