701282 – poweroff() hangs

Bug 701282 - poweroff() hangs

Summary: poweroff() hangs

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	i686
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	Kernel Maintainer List
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2011-05-02 12:55 UTC by Anton Guda
Modified:	2011-05-07 17:51 UTC (History)
CC List:	12 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2011-05-06 09:02:45 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments	(Terms of Use)
log with delay - used root remount (sda7) as proposed (122.48 KB, text/plain) 2011-05-02 15:59 UTC, Anton Guda	no flags	Details
same as bad case - but used another unused partition (sda13) - no delay (122.47 KB, text/plain) 2011-05-02 16:01 UTC, Anton Guda	no flags	Details
dmesg - bad case with new timestamps (123.41 KB, text/plain) 2011-05-02 21:07 UTC, Anton Guda	no flags	Details
List of processes from the same script (3.11 KB, text/plain) 2011-05-02 21:08 UTC, Anton Guda	no flags	Details
cat of /proc/mount during shutdown (1.24 KB, text/plain) 2011-05-02 21:11 UTC, Anton Guda	no flags	Details
View All

Description Anton Guda 2011-05-02 12:55:03 UTC

Description of problem:
During poweroff (not reboot) both via poweroff command and acpid based button reaction, systemd sleep approx 3 min before shutdown. System w/o selinux, initrd, plymouth not used, boot to multi-user target. To rectify problem, I create debug file  /lib/systemd/system-shutdown/50-atu-ps.sh:

#!/bin/bash
/bin/mount /docs # vfat based partition
ps aux > /dosc/ps_aux.txt
/bin/umount /dosc

During poweroff, file was created (nothing unexpected in contents, only 3 proc active), but 3-min delay disappeared. After some experiments I found minimal set of actions to prevent delay:

mount -t tmpfs tmpfs /mnt/x
umount /mnt/x


Version-Release number of selected component (if applicable):
systemd-26-1 (same result from version 18, before I failed to use systemd at all)

How reproducible:
Always. On 4 hosts. some with separated /usr, some with combined.


Steps to Reproduce:
1. poweroff (or button) - wait 3 min - real powerdown.
2. create minimal workaround file (see above)
3. poweroff - real powerof w/o delay
  
Actual results:
see above

Expected results:
Better: fast poweroff.
More fun: output some results why it happens.

Additional info:

Comment 1 Lennart Poettering 2011-05-02 13:44:46 UTC

which systemd version is this? if it's older than 26, can you retry after updating?

if this continues to exist, please boot with "systemd.log_level=debug systemd.log_target=kmsg" on the kernel cmdline, and add a shell script like the following to  /lib/systemd/system-shutdown/:

#!/bin/sh
mount / -orw,remount
dmesg > /dmesg.shutdown
mount / -oro,remount

then, reproduce the issue and attach the file /dmesg.shutdown which is generated on shutdown here. This should give us a hint what might be timing out there.

Comment 2 Anton Guda 2011-05-02 15:59:45 UTC

Created attachment 496293 [details]
log with delay - used root remount (sda7) as proposed

Comment 3 Anton Guda 2011-05-02 16:01:39 UTC

Created attachment 496294 [details]
same as bad case - but used another unused partition (sda13) - no delay

Comment 4 Lennart Poettering 2011-05-02 16:56:21 UTC

Hmm, this lacks timestamps. Is this a homegrown kernel? The standard fedora kernel has timestamps, and I really would like to see them here. What did you do to disable the timestamps? Can you redo things with timestamps enabled please?

Seems /usr is busy for you. That's already an indication that something is really wrong.

Comment 5 Anton Guda 2011-05-02 21:07:44 UTC

Created attachment 496365 [details]
dmesg - bad case with new timestamps

I build a new kernel with timestamps (custom kernel required - some staged drivers(not in this host), some FS not in modules). The log ended at approx 70 sec. But delay is after this point - there is some messages a-la "broke IRQ 41" and SATA after - on 100 -120 sec. Total delay may be less than 3 min - but I dont know exect start point to measure.

Comment 6 Anton Guda 2011-05-02 21:08:38 UTC

Created attachment 496366 [details]
List of processes from the same script

Comment 7 Anton Guda 2011-05-02 21:11:46 UTC

Created attachment 496367 [details]
cat of /proc/mount during shutdown

As seen, not /usr nor /var is mounted at this point. And why reboot and poweroff is so different? How simple action - mount/umount a tmpfs can workaround this problem?

Comment 8 Lennart Poettering 2011-05-02 21:33:09 UTC

The dump from #6 shows that everything is perfectly in order and the shutdown so far used less than 10s.

Note that the shell scripts are executed as the very last thing in the shutdown process, right before calling reboot(). If this hang there, then this has nothing to do with systemd, but simply with the kernel.

Reassigning. But I figure the kernel folks will close this pretty quickly since you are using your own kernel, and it is probably tainted.

Comment 9 Kay Sievers 2011-05-02 21:59:24 UTC

Are you sure that you don't configure the kernel to run the ancient /sbin/hotplug fork bomb?
It is known to access the rootfs hundreds of times during bootup and shutdown, even when there is no tool with that name since a long time.

In that case, make sure you kernel has:
  CONFIG_UEVENT_HELPER_PATH=""

The kernel should probably be fixed to call:
  usermodehelper_disable()
when entering the reboot() call. Just like it's done for suspend.

Comment 10 Chuck Ebbert 2011-05-03 01:33:58 UTC

Can you boot with the kernel option "sysrq_always_enabled" and try to capture traces from sysrq-p and/or sysrq-l when it's hanging?

Comment 11 Anton Guda 2011-05-03 13:43:08 UTC

New kernel was build with CONFIG_UEVENT_HELPER_PATH="" (really is was an artefact "/sbin/hotplug"). All works - poweroff is fast now. Propose to close this bug. May it worth to mention this fact in systemd docs - as bug was visible only with it (upstart poweroffs w/o delay with bad kernel).
Thanks!

Comment 12 Kay Sievers 2011-05-03 14:13:12 UTC

Great news. Thanks for testing and report! Took us a long time to find out that the kernel is doing the bad thing here and not userspace. :)

In case you are in the mood, maybe you could test this patch below with the CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" again.

I'm going to submit this patch to fix the issue, and make also other possible kernel-forked helpers impossible during reboot().

It's just a single line. No worries if you don't have the time, I'll try to fine a setup that has this problem.

+++ b/kernel/sys.c
@@ -344,6 +344,7 @@ static void kernel_shutdown_prepare(
        blocking_notifier_call_chain(&reboot_notifier_list,
                (state == SYSTEM_HALT)?SYS_HALT:SYS_POWER_OFF, NULL);
        system_state = state;
+       usermodehelper_disable();
        device_shutdown();
 }
 /**

Comment 13 Anton Guda 2011-05-05 07:09:21 UTC

With this patch and CONFIG_UEVENT_HELPER_PATH="/bin/hotplug" shutdown is fast.
All works. What about the same change in  kernel_restart_prepare?

Comment 14 Kay Sievers 2011-05-05 09:46:30 UTC

(In reply to comment #13)
> With this patch and CONFIG_UEVENT_HELPER_PATH="/bin/hotplug" shutdown is fast.
> All works.

Great. Thanks for testing this. I'll submit that.

> What about the same change in  kernel_restart_prepare?

We would not need it there, because I/O on the disk does not really hurt us on reboot. But I'll add it there too, just to look nice.

Thanks again!

Comment 15 Chuck Ebbert 2011-05-06 09:02:45 UTC

We set CONFIG_UEVENT_HELPER_PATH="" in the standard Fedora config, so this is not a Fedora kernel bug.

Comment 16 Kay Sievers 2011-05-07 17:51:12 UTC

Patch is on the way:

http://git.kernel.org/?p=linux/kernel/git/gregkh/driver-core-2.6.git;a=commitdiff;h=b50fa7c8077c625919b1e0a75fc37b825f024518

Note You need to log in before you can comment on or make changes to this bug.