Bug 724928

Summary: Reboot ends with kernel panic on systemd abort()
Product: Fedora
Reporter: Zdenek Kabelac <zkabelac>
Component: systemd
Assignee: Lennart Poettering <lpoetter>
Status: CLOSED CURRENTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: rawhide
CC: clydekunkel7734, harald, johannbg, lpoetter, mads, mailings, metherid, michal, mschmidt, nicolas.mailhot, notting, plautrba, robatino, sergei.litvinenko, tflink
Target Milestone: ---
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: AcceptedBlocker
Fixed In Version: systemd-31-2.fc16
Doc Type: Bug Fix
Last Closed: 2011-08-05 18:36:44 UTC
Bug Blocks: 713560    
Attachments:
shutdown panic from 3.0.0-1.fc16 kernel (flags: none)

Description Zdenek Kabelac 2011-07-22 10:44:28 UTC
Description of problem:

I'm not entirely sure where the problem is, but since systemd is the first
thing to crash, I'm reporting it as a systemd bug:

Final messages before kernel panic:

Detaching loop devices.
Detaching DM devices.
Successfully changed into root pivot.
Assertion 'close_nointr(fd) == 0' failed at src/util.c:274, function close_nointr_nofail(). Aborting().
systemd-shutdow[1] general protection ip:7f2138be2b77 sp:7fff4d21a940 error:0 in libc-2.14.90.so[7f2138bab000+19e000]
Kernel panic - not syncing: Attempted to kill init!
Pid: 1, comm: systemd-shutdow Not tainted 3.0.0-rc7-00186-gc9a28a5 #5
Call Trace:
 [<ffffffff814948c8>] panic+0x9b/0x1a7
 [<ffffffff810549bb>] ? do_exit+0x81b/0x950
 [<ffffffff81054a5a>] do_exit+0x8ba/0x950
 [<ffffffff81054e4f>] do_group_exit+0x4f/0xc0
 [<ffffffff81067b8e>] get_signal_to_deliver+0x3be/0x680
 [<ffffffff8100216f>] do_signal+0x6f/0x7a0
 [<ffffffff81065030>] ? check_kill_permission+0x240/0x240
 [<ffffffff814a3b09>] ? sub_preempt_count+0xa9/0xe0
 [<ffffffff814a03b2>] ? _raw_spin_unlock_irqrestore+0x42/0x80
 [<ffffffff810666e3>] ? force_sig_info+0xe3/0x100
 [<ffffffff81002925>] do_notify_resume+0x65/0x80
 [<ffffffff8129919e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff814a0aaa>] retint_signal+0x46/0x8c
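
The assertion in the log names two helpers from systemd's src/util.c, close_nointr() and close_nointr_nofail(). The C sketch below is a hypothetical reconstruction of the pattern those names imply, not the systemd source: close_retry_eintr() calls close() again when a signal interrupts it, and close_or_abort() asserts that the close succeeded. Any other error, for example EBADF from closing a descriptor twice, fails the assertion and calls abort(). In systemd-shutdown that process is PID 1, so, as the log shows, it ends up killed (here via a general protection fault inside libc) and the kernel panics with "Attempted to kill init!".

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper modelled on close_nointr(): call close() again
 * whenever it fails with EINTR. Returns 0 on success, or -1 with errno
 * set for any other failure (for example EBADF). */
int close_retry_eintr(int fd) {
    for (;;) {
        if (close(fd) == 0)
            return 0;
        if (errno != EINTR)
            return -1;
    }
}

/* Hypothetical helper modelled on close_nointr_nofail(): the caller
 * promises that closing cannot fail. If it does, assert() prints an
 * "Assertion ... failed" message and calls abort(). */
void close_or_abort(int fd) {
    int saved_errno = errno;        /* preserve the caller's errno */
    int r = close_retry_eintr(fd);

    assert(r == 0);
    (void) r;                       /* avoid an unused-variable warning under NDEBUG */
    errno = saved_errno;
}
```

Both function names are illustrative. Note that on Linux close() releases the descriptor even when it reports EINTR, so a retry loop like this one can itself run into EBADF; it is kept here only to mirror the helper's name.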


This happens with an upstream kernel, master commit id:
cf6ace16a3cd8b728fb0afa68368fd40bbeae19f
(a nearly final 3.0 kernel).

Is it a good idea to abort() the process with PID 1?

Version-Release number of selected component (if applicable):
systemd-30-1.fc16.x86_64

Comment 2 Tim Flink 2011-07-22 22:11:34 UTC
Discussed at the 2011-07-22 alpha blocker bug review meeting. Tentatively accepted as a Fedora 16 alpha blocker assuming that a proposal to modify the alpha release criteria is accepted.

The installer must be able to complete an installation using the entire disk, existing free space, or existing Linux partitions methods, with or without encryption or LVM enabled.

Comment 3 Tim Flink 2011-07-22 22:12:46 UTC
(In reply to comment #2)
> Discussed at the 2011-07-22 alpha blocker bug review meeting. Tentatively
> accepted as a Fedora 16 alpha blocker assuming that a proposal to modify the
> alpha release criteria is accepted.
> 
> The installer must be able to complete an installation using the entire disk,
> existing free space, or existing Linux partitions methods, with or without
> encryption or LVM enabled.

I put the wrong proposed criterion in the previous comment. It was supposed to read:

The system's mechanisms for shutting down, logging out and rebooting from a virtual console must work.

Comment 4 Michal Schmidt 2011-07-24 20:49:04 UTC
*** Bug 725158 has been marked as a duplicate of this bug. ***

Comment 5 Michal Schmidt 2011-07-24 20:51:39 UTC
*** Bug 725253 has been marked as a duplicate of this bug. ***

Comment 6 Michal Jaegermann 2011-07-24 21:45:03 UTC
Created attachment 514935 [details]
shutdown panic from 3.0.0-1.fc16 kernel

Hm, I got hit by the same problem, but with kernel-3.0.0-1.fc16. The call trace is somewhat different but quite similar; attached just in case. For whatever reason, systemd does not crash when I reboot with older kernels.

Comment 7 Harald Hoyer 2011-07-27 10:43:04 UTC
*** Bug 725999 has been marked as a duplicate of this bug. ***

Comment 8 Tim Flink 2011-07-28 21:39:05 UTC
The proposed criterion under which this was accepted as a blocker has changed to:

It must be possible to trigger a system shutdown using standard console commands, and the system must shut down in such a way that storage volumes (e.g. simple partitions, LVs and PVs, RAID arrays) are taken offline safely and the system's BIOS or EFI is correctly requested to power down the system.

Since the criterion has changed, moving back to proposed blocker from accepted.

Does this affect shutdown as well or is this just an issue with reboot? If I'm reading this right, it sounds like the systemd crash happens after storage devices have been taken offline safely.

Can someone confirm this?

Comment 9 Clyde E. Kunkel 2011-07-28 23:14:15 UTC
Both shutdown and reboot are affected. Fixed in systemd-31-2.fc16.

Comment 10 Michal Schmidt 2011-07-29 16:56:42 UTC
*** Bug 726462 has been marked as a duplicate of this bug. ***

Comment 11 Tim Flink 2011-07-29 18:02:36 UTC
Discussed in the 2011-07-29 blocker review meeting. Accepted as F16 alpha blocker due to violation of the following alpha release criterion [1]:

It must be possible to trigger a system shutdown using standard console
commands, and the system must shut down in such a way that storage volumes
(e.g. simple partitions, LVs and PVs, RAID arrays) are taken offline safely and
the system's BIOS or EFI is correctly requested to power down the system.

[1] https://fedoraproject.org/wiki/Fedora_16_Alpha_Release_Criteria

Comment 12 Tim Flink 2011-08-05 18:36:44 UTC
Verified as fixed in systemd-31-2.fc16. If the problem should re-appear, please re-open this bug.

Comment 13 Mads Kiilerich 2011-08-05 18:56:01 UTC
How can a kernel panic be fixed in systemd?

I agree that a workaround in systemd is fine for an alpha release criterion, but is it really acceptable that systemd can cause the kernel to panic? Isn't that the real bug that should be solved?

Comment 14 Zdenek Kabelac 2011-08-05 20:41:28 UTC
(In reply to comment #13)
> How can a kernel panic be fixed in systemd?
> 
> I agree that a workaround in systemd is fine for an alpha release criteria, but
> is it really acceptable that systemd can cause the kernel to panic? Isn't that
> the real bug that should be solved?

Well, aborting PID 1 isn't really a good idea; what else should the kernel do when init dies?
The bug is already fixed in the upstream systemd package.
It was clearly a userspace problem.
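
Comment 14's point is that PID 1 cannot afford to abort(), because the kernel has nothing to fall back on once init exits. One possible alternative, sketched here as an assumption rather than as the change that shipped in systemd-31-2.fc16 (this report does not say what that change was), is a close helper that logs failures and returns an error code instead of asserting. close_or_warn() is a hypothetical name.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical close helper for code that may run as PID 1.
 * It never aborts: on failure it logs to stderr and returns a
 * negative errno value; on success it returns 0. The caller's
 * errno is left unchanged either way. */
int close_or_warn(int fd) {
    int saved_errno = errno;
    int r = 0;

    /* No retry on EINTR: on Linux the descriptor has already been
     * released when close() reports EINTR. */
    if (close(fd) < 0) {
        r = -errno;
        /* Report and keep going; if PID 1 exited here instead, the
         * kernel would panic with "Attempted to kill init!". */
        fprintf(stderr, "close(%d) failed: %s (pid %ld)\n",
                fd, strerror(-r), (long) getpid());
    }

    errno = saved_errno;
    return r;
}
```

A caller in the shutdown path can then ignore or count the error and continue unmounting and detaching devices, instead of taking the whole machine down with it.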