Bug 1607901 - Kernel 4.18 causing systemd to coredump on boot on i686 [NEEDINFO]
Summary: Kernel 4.18 causing systemd to coredump on boot on i686
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 28
Hardware: i686
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: x86Tracker
Reported: 2018-07-24 14:13 UTC by Jeff Backus
Modified: 2019-05-28 22:44 UTC
CC List: 20 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-05-28 22:44:58 UTC
Type: Bug
labbott: needinfo? (jeff.backus)


Attachments
Portion of systemd log showing coredump (10.88 KB, text/plain)
2018-07-24 14:13 UTC, Jeff Backus
no flags
Patch to fix udev segfault issue. (537 bytes, patch)
2018-08-09 21:33 UTC, Jeff Backus
no flags

Description Jeff Backus 2018-07-24 14:13:12 UTC
Created attachment 1470305
Portion of systemd log showing coredump

Description of problem:
System fails to boot with kernel 4.18 on i686.

Version-Release number of selected component (if applicable):
Rawhide and F28, using any of the kernel release candidates after rc0.git2.1. On F28, systemd is the version available from the official repos.

How reproducible:
With a small sample size, I was able to reproduce it 66% of the time. I was also able to reproduce it in a VM.

Steps to Reproduce:
1. Install one of the kernel 4.18 release candidates after rc0.git2.1 on a 32-bit Fedora 28 install
2. Reboot and select 4.18 kernel

Actual results:
Booting halts, dropping to the dracut prompt. The log indicates a coredump. Last systemd message prior to the coredump:
Jul 24 04:41:18 localhost.localdomain systemd[661]: var-lib-nfs-rpc_pipefs.mount: Executing: /usr/bin/mount sunrpc /var/lib/nfs/rpc_pipefs -t rpc_pipefs

I've attached a portion of the log from boot. Full log available on request.

Expected results:
Successful boot, leaving the user at the graphical login

Additional info:
I was able to narrow the regression down to the range between kernel dist-git commits dc16ce7d36f (good, produced rc0.git2.1) and 6a5d7f80f2 (bad, produced rc0.git5.1). The following commits in that range would not compile, probably due to the soundwire issue (fix introduced in 6a5d7f80f2):
  - 037431cf9
  - 6cf9fb960
  - 9382c1533
  - 4b8512e91
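The narrowing described above is essentially a bisection over a commit range in which some commits cannot be tested because they fail to build. A minimal Python sketch of that kind of search follows, assuming a linear commit history, a monotonic regression (every commit after the first bad one is also bad), and a hypothetical `test(commit)` callback that returns "good", "bad", or "skip" for unbuildable commits. It is similar in spirit to `git bisect skip`, not the exact procedure used in this bug, and the `c0` to `c7` labels are placeholders rather than real dist-git commits.

```python
from typing import Callable, List, Sequence


def bisect_with_skips(commits: Sequence[str], test: Callable[[str], str]) -> List[str]:
    """Return the candidate first-bad commits.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    regression is monotonic. `test` (hypothetical) returns "good", "bad",
    or "skip" (untestable, e.g. the commit does not compile). When skipped
    commits border the regression, more than one candidate is returned.
    """
    good, bad = 0, len(commits) - 1
    skipped = set()
    while bad - good > 1:
        # Indices strictly between the good and bad bounds not yet skipped.
        untested = [i for i in range(good + 1, bad) if i not in skipped]
        if not untested:
            break  # Only unbuildable commits remain in the range.
        # Test the remaining commit closest to the midpoint.
        mid = (good + bad) // 2
        i = min(untested, key=lambda k: abs(k - mid))
        result = test(commits[i])
        if result == "good":
            good = i
        elif result == "bad":
            bad = i
        elif result == "skip":
            skipped.add(i)
        else:
            raise ValueError(f"unexpected test result: {result!r}")
    # The first bad commit lies somewhere in (good, bad].
    return [commits[i] for i in range(good + 1, bad + 1)]


if __name__ == "__main__":
    history = [f"c{i}" for i in range(8)]

    def fake_test(commit: str) -> str:
        n = int(commit[1:])
        if n in (3, 4):
            return "skip"  # Pretend these commits do not build.
        return "bad" if n >= 5 else "good"

    print(bisect_with_skips(history, fake_test))  # ['c3', 'c4', 'c5']
```

With c3 and c4 unbuildable and c5 as the first bad commit, the search can only narrow the culprit to those three commits. This is why commits that fail to compile have to be made buildable before the range can be narrowed further.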

Comment 1 Justin M. Forbes 2018-07-24 15:30:29 UTC
Adding it to the i686 blocker, the SIG should track this down and fix it.

Comment 2 Jeff Backus 2018-07-24 15:39:57 UTC
(In reply to Justin M. Forbes from comment #1)
> Adding it to the i686 blocker, the SIG should track this down and fix it.

Thanks, Justin. Working on it.

Comment 3 Jeff Backus 2018-07-25 16:27:49 UTC
Cherry-picked dist-git commit 6a5d7f80f2 onto the commits above that failed to compile. I was able to get 037431cf9 and 6cf9fb960 to compile, and they don't show the segfault bug. Commit 9382c1533 compiles and does show the segfault, so I suspect that is where the bug was introduced. I believe this dist-git commit corresponds to kernel commit 1c8c5a9d38f6...

Comment 4 Jeff Backus 2018-08-09 12:34:49 UTC
Finally managed to track it down to commit 24dea04767e6 in Linus's tree. Interestingly, the commit message is:

> Since LD_ABS/LD_IND instructions are now removed from the core and reimplemented through a combination of inlined BPF instructions and a slow-path helper, we can get rid of the complexity from x32 JIT.

I'll try contacting upstream and keep digging...

Comment 5 Jeff Backus 2018-08-09 21:33:45 UTC
Created attachment 1474835
Patch to fix udev segfault issue.

Comment 6 Jeff Backus 2018-08-09 21:38:47 UTC
Found the issue. Looks like upstream reduced the size of the stack when making the above mentioned JIT improvements. It appears that the JIT is overrunning its stack, inducing random crashes.

The patch I just added fixes the issue. Next week I'll work on submitting it upstream.

Comment 7 Laura Abbott 2018-08-10 12:08:53 UTC
That's a great find!

Comment 8 Laura Abbott 2018-10-01 21:17:47 UTC
We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 28 kernel bugs.
 
Fedora 28 has now been rebased to 4.18.10-300.fc28. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.
 
If you have moved on to Fedora 29, and are still experiencing this issue, please change the version to Fedora 29.
 
If you experience different issues, please open a new bug report for those.

Comment 9 Dominik 'Rathann' Mierzejewski 2018-10-04 17:59:01 UTC
I can confirm that this isn't occurring with 4.18.10-200.fc28.i686 running on Intel Atom N270.

Comment 10 Ben Cotton 2019-05-02 19:22:39 UTC
This message is a reminder that Fedora 28 is nearing its end of life.
On 2019-May-28 Fedora will stop maintaining and issuing updates for
Fedora 28. It is Fedora's policy to close all bug reports from releases
that are no longer maintained. At that time this bug will be closed as
EOL if it remains open with a Fedora 'version' of '28'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 28 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 11 Ben Cotton 2019-05-02 20:44:41 UTC

Comment 12 Ben Cotton 2019-05-28 22:44:58 UTC
Fedora 28 changed to end-of-life (EOL) status on 2019-05-28. Fedora 28 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

