Bug 1546795 - systemd-journald coredump due to slow fsync
Summary: systemd-journald coredump due to slow fsync
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: 27
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 1267707 systemd-journald-sigabrt, systemd-sigabrt
Blocks:
Reported: 2018-02-19 15:54 UTC by CR
Modified: 2018-11-30 21:33 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-11-30 21:33:05 UTC
Type: Bug
Embargoed:



Description CR 2018-02-19 15:54:35 UTC
Description of problem:
After upgrading to F27 from F25, systemd-journald began crashing frequently. In every recent coredump, one of the threads has fsync() at the top of its stack. It looks like fsync() is being slow due to high system load (the box performs lots of file I/O), and the watchdog mentioned in earlier bug reports then kills journald.

#0  0x00007f611e93eddc in fsync () from /lib64/libpthread.so.0
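One way to check the "fsync() is slow under load" theory is to time fsync() directly on the affected filesystem while the box is busy. The sketch below is a hypothetical diagnostic helper (the name `time_fsync` and the 4 KiB payload are assumptions, not anything from systemd); it appends a small payload to a scratch file, fsyncs it, and returns how long the fsync call blocked.

```python
import os
import time


def time_fsync(path: str, payload: bytes = b"x" * 4096) -> float:
    """Append `payload` to `path`, fsync it, and return fsync's duration in seconds.

    Hypothetical diagnostic helper: point `path` at a scratch file on the same
    filesystem as /var/log/journal to approximate the latency journald sees.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, payload)
        start = time.monotonic()
        # Blocks until the kernel has flushed the file's data and metadata
        # to the storage device.
        os.fsync(fd)
        return time.monotonic() - start
    finally:
        os.close(fd)
```

Calling it in a loop (for example once per second during the nightly I/O peak) and recording the worst result would show whether individual fsync() calls come anywhere near the watchdog timeout.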


Version-Release number of selected component (if applicable):
systemd.x86_64                           234-9.fc27                      @updates

How reproducible:
Averaging slightly more than 1 crash per day since the upgrade to F27.

Steps to Reproduce:
Wait anywhere from a few hours to a few days. The crash times below show no obvious pattern, but the crashes probably occur under high I/O load:
Feb 06 20:50:10 steel.internal systemd-coredump[18355]: Stack trace of thread 12253:
Feb 09 18:33:22 steel.internal systemd-coredump[14808]: Stack trace of thread 10770:
Feb 11 17:37:59 steel.internal systemd-coredump[22739]: Stack trace of thread 19130:
Feb 13 16:20:12 steel.internal systemd-coredump[1281]: Stack trace of thread 610:
Feb 13 22:03:32 steel.internal systemd-coredump[2918]: Stack trace of thread 1282:
Feb 13 23:08:09 steel.internal systemd-coredump[3092]: Stack trace of thread 2921:
Feb 14 04:03:54 steel.internal systemd-coredump[5361]: Stack trace of thread 5211:
Feb 14 05:03:07 steel.internal systemd-coredump[5618]: Stack trace of thread 5535:
Feb 14 08:15:23 steel.internal systemd-coredump[6124]: Stack trace of thread 5872:
Feb 14 10:09:25 steel.internal systemd-coredump[6409]: Stack trace of thread 6173:
Feb 14 16:40:10 steel.internal systemd-coredump[7928]: Stack trace of thread 6497:
Feb 17 02:16:43 steel.internal systemd-coredump[1740]: Stack trace of thread 1114:
Feb 18 23:51:49 steel.internal systemd-coredump[8103]: Stack trace of thread 1741:
Feb 19 06:50:04 steel.internal systemd-coredump[9029]: Stack trace of thread 8204:
Feb 19 07:50:25 steel.internal systemd-coredump[9174]: Stack trace of thread 9053:

Comment 1 CR 2018-03-09 19:59:48 UTC
These periodic SIGABRT coredumps of systemd-journald may be causing journal logs to be lost, though I'm not completely certain of that. Logs are frequently written out of order, which makes it very difficult to understand exactly what happened around the time of each SIGABRT.

I recently discovered this is not the first time this bug has been reported. In Bug 1267707 it was "fixed" (so to speak) by raising the watchdog timeout from 1 minute to 3 minutes. Of course, 3 minutes is not sufficient to make the bug go away for real either.
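The argument about the timeout can be made concrete with a simplified model; this is not systemd's implementation. With a systemd service watchdog (`WatchdogSec=`), the service must send a keep-alive (`WATCHDOG=1` via `sd_notify`) within every interval, otherwise systemd kills it with the watchdog signal, SIGABRT by default. Assuming, as this report suggests, that a slow fsync() delays journald's keep-alive, raising the timeout only helps for stalls shorter than the new limit. The helper name `watchdog_fires` is hypothetical.

```python
from typing import Sequence


def watchdog_fires(ping_times: Sequence[float], timeout: float) -> bool:
    """Return True if any gap between consecutive keep-alive pings exceeds `timeout`.

    Simplified watchdog model: `ping_times` are the sorted moments, in seconds,
    at which the service reported it was alive.
    """
    return any(later - earlier > timeout
               for earlier, later in zip(ping_times, ping_times[1:]))
```

For a hypothetical 200-second fsync stall between pings at 30 s and 230 s, `watchdog_fires([0, 30, 230], 60)` and `watchdog_fires([0, 30, 230], 180)` are both True; only a timeout above 200 s avoids the kill, so a longer timeout just moves the threshold rather than removing the stall.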

Comment 2 CR 2018-05-11 15:20:23 UTC
Still broken in the latest update. The coredumps always show a thread waiting in fsync, along the lines of:

Thread 2 (Thread 0x7f30d7e55700 (LWP 11697)):
#0  in fsync () from /lib64/libpthread.so.0
#1  in journal_file_set_offline_internal () from /usr/lib/systemd/libsystemd-shared-234.so
#2  in journal_file_set_offline_thread () from /usr/lib/systemd/libsystemd-shared-234.so
#3  in start_thread () from /lib64/libpthread.so.0
#4  in clone () from /lib64/libc.so.6

Comment 3 Ben Cotton 2018-11-27 15:45:32 UTC
This message is a reminder that Fedora 27 is nearing its end of life.
On 2018-Nov-30 Fedora will stop maintaining and issuing updates for
Fedora 27. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '27'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 27 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 4 Ben Cotton 2018-11-30 21:33:05 UTC
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

