Bug 909720

Summary: systemd stops on a shutdown/reboot after an upgrade to Fedora 18
Product: Fedora
Component: systemd
Version: 18
Reporter: Michal Jaegermann <michal>
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
Hardware: Unspecified
OS: Unspecified
CC: fdeutsch, harald, johannbg, lnykryn, mattdm, maurizio.antillon, metherid, mschmidt, msekleta, notting, plautrba, red, systemd-maint, vpavlin
Fixed In Version: systemd-201-2.fc18.6
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-05-16 02:58:41 UTC
Bug Blocks: 960911
Attachments:
a shot of a screen which shows up when trying to reboot from level 3
screen messages on a stuck reboot with systemd.log_level=debug
sometimes you can see this
picture one of last screens before poweroff gets stuck
picture two of last screens before poweroff gets stuck
picture three of last screens before poweroff gets stuck
/etc/fstab from an affected machine
dmesg with a debugging information
results of 'systemctl dump'
an output from 'ps wwwwx -o pid,ppid,cgroup,cmd 1326'

Description Michal Jaegermann 2013-02-10 17:35:39 UTC
Created attachment 695869 [details]
a shot of a screen which shows up when trying to reboot from level 3

Description of problem:

A machine which had no trouble before being upgraded to Fedora 18 stopped shutting down and rebooting after the upgrade.  It goes through the initial motions and gets stuck.  When trying this from runlevel 3, the last line on the screen is something like "systemd-journald[251]: Received SIGUSR1" (a picture of such a screen is attached).  When booted in graphical mode there is really nothing to look at on any of the text consoles, but hitting "Alt-Ctrl-Del" a few times usually gets me back to a grub menu and the machine reboots again, regardless of whether I tried shutdown or reboot.  The same action at runlevel 3 does not help that way.

Running 'systemctl -f reboot', and similar, from a command line does have the desired effect but is clearly of no help in "normal circumstances".

I can find nothing in the logs that sheds any light on what happens.

Version-Release number of selected component (if applicable):
systemd-197-1.fc18.1

How reproducible:
always

Additional info:
This particular machine happens to have /var as a separate file system but /var/run and /var/lock are symlinks as expected.

Comment 1 Michal Jaegermann 2013-02-10 20:00:39 UTC
On the hunch that something is trying to read files from /var/run/ instead of /run/ while /var is already unmounted, I booted into emergency mode and added a /var/run symlink on / (/var/lock was already there).  Unfortunately this did not fix the issue.  Shutdown gets stuck at the same point as before.

There is a change, though.  Now, also when booted to runlevel 3, I can get the machine back to a grub menu with the help of "Alt-Ctrl-Del".  Still no poweroff, regardless.  Sometimes, when unsticking that process in this way, it is possible to glimpse some message about a watchdog, but it flicks by so fast that I am unable to read what it says.

Comment 2 Michal Jaegermann 2013-02-18 22:02:17 UTC
Apparently nobody wants to even acknowledge the problem, but I still cannot shut down or reboot.  Among the various machines I upgraded to F18, only this one is affected that way.  A separate /var looks like a red herring: I moved /var onto / (and adjusted /etc/fstab, of course) and nothing changed.  A reboot and/or poweroff, unless 'systemctl -f ...' is used, gets stuck as before.  That lasts until I hit Alt-Ctrl-Del, which forces a continuation (some messages scroll on the screen, but way too fast for me to catch even fragments), and I eventually get to a grub menu; the machine then reboots but never powers off.

Appending 'systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M' to the boot command line causes copious output to the screen which does not seem to be logged permanently anywhere.  An attempt to reboot ends with lines like these on the screen:

Started Trigger Flushing of Journal to Persistent Storage...
systemd-journald[xxx]: Received SIGUSR1
Started Trigger Flushing of Journal to Persistent Storage.

(A photo of the debug messages on the last screen is attached; the earlier output is not readable for me.)

Wherever this "Persistent Storage" is supposed to be, so far I have failed to find anything of the sort.  In /var/log/messages I can find everything from the last startup but nothing about shutting down.

BTW - suspend/resume cycle does work and does not seem to be affected.

Maybe another possible clue: whatever I try with journalctl, I cannot display anything earlier than the last boot.

Any ideas what is going on here and how to debug that?

Comment 3 Michal Jaegermann 2013-02-18 22:04:52 UTC
Created attachment 699168 [details]
screen messages on a stuck reboot with systemd.log_level=debug

Comment 4 Michal Jaegermann 2013-02-18 22:09:03 UTC
Created attachment 699169 [details]
sometimes you can see this

On occasion I noticed the following message on a console.  I have no idea how to trigger it, but it may show up (or not).  I cannot really tell how relevant it is here.

I thought that I attached that picture previously but apparently something went askew.

Comment 5 Michal Jaegermann 2013-02-19 00:21:56 UTC
Created attachment 699242 [details]
picture one of last screens before poweroff gets stuck

The following three pictures are photos of all available screens after 'poweroff' got stuck.  A machine was rebooted with the following added to a standard boot line:

    systemd.log_level=debug systemd.log_target=syslog-or-kmsg log_buf_len=1M enforcing=0 3

On the first of these three shots you can see:

   Stopped Power-Off
   ....
   Stopped target Shutdown

and what follows looks to me like indeed an attempt to restart the whole system, ending up with

   Mounted /usr
   Reached target Mounted Local File Systems.

culminating with

   systemd-journald[xyz]: Received SIGUSR1

and we are stuck.  Is all of this correct and expected?  It does not look that way to me.

Comment 6 Michal Jaegermann 2013-02-19 00:23:45 UTC
Created attachment 699244 [details]
picture two of last screens before poweroff gets stuck

Comment 7 Michal Jaegermann 2013-02-19 00:24:33 UTC
Created attachment 699246 [details]
picture three of last screens before poweroff gets stuck

Comment 8 Michal Schmidt 2013-02-19 13:33:24 UTC
The screenshots indicate that the shutdown sequence starts, but some other jobs that conflict with the shutdown jobs get enqueued.

The screenshots suggest it may be related to encrypted filesystems. Would you attach /etc/fstab, /etc/crypttab, and the output of 'lsblk'?

A text log would be more useful to work with than a series of screenshots, so please try to create one.  I suggest you enable the Debug Shell on tty9
(see http://freedesktop.org/wiki/Software/systemd/Debugging#Early_Debug_Shell), boot with "systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M", reproduce the problem, then switch to the debug shell and save some data to disk:
  dmesg > /dmesg.txt
  systemctl dump > /dump.txt
  sync && reboot -f
After reboot please attach the two files here. Thanks.

Comment 9 Davide Repetto 2013-02-20 19:53:42 UTC
I have the problem as well on a bunch of machines upgraded to F18.
I will try with a debug-shell later if you still need it.

lsblk
=====
NAME    MAJ:MIN RM   SIZE RO TYPE   MOUNTPOINT
sda       8:0    0 189,9G  0 disk   
├─sda1    8:1    0     2M  0 part   
├─sda2    8:2    0 515,7M  0 part   
│ └─md0   9:0    0 515,7M  0 raid1  /boot
├─sda3    8:3    0    32G  0 part   
│ └─md1   9:1    0    32G  0 raid10 /
├─sda4    8:4    0 155,4G  0 part   
│ └─md2   9:2    0 155,4G  0 raid10 /home
└─sda5    8:5    0     2G  0 part   [SWAP]
sdb       8:16   0 596,2G  0 disk   
├─sdb1    8:17   0     5M  0 part   
├─sdb2    8:18   0   518M  0 part   
├─sdb3    8:19   0    32G  0 part   
├─sdb4    8:20   0 560,8G  0 part   /hde
└─sdb5    8:21   0   2,9G  0 part   [SWAP]
sdc       8:32   0 189,9G  0 disk   
├─sdc1    8:33   0     2M  0 part   
├─sdc2    8:34   0 515,7M  0 part   
│ └─md0   9:0    0 515,7M  0 raid1  /boot
├─sdc3    8:35   0    32G  0 part   
│ └─md1   9:1    0    32G  0 raid10 /
├─sdc4    8:36   0 155,4G  0 part   
│ └─md2   9:2    0 155,4G  0 raid10 /home
└─sdc5    8:37   0     2G  0 part   [SWAP]
sdd       8:48   0 372,6G  0 disk   
└─sdd1    8:49   0 372,6G  0 part   /run/media/davide/emulator
sr0      11:0    1  1024M  0 rom    
sr1      11:1    1  1024M  0 rom    

#/etc/fstab
===========
UUID=a0c04a67-f856-4066-9402-138c3207401f /                       ext4    defaults        1 1
UUID=49df9fc5-1439-48a2-9775-8a14dc0f1cf7 /boot                   ext4    defaults        1 2
UUID=ee43009c-75f1-49f0-83e9-40bd79c2c973 /home                   ext4    defaults,noatime,data=writeback,commit=30,barrier=0      1 2
UUID=968df47b-39ee-47d3-8c90-0a4b1bfc6b2a /hde                    ext4    defaults,noatime,data=writeback,commit=120,barrier=0,noauto,comment=systemd.automount      1 2
UUID=ef92e405-da6b-46d9-affe-12ed8d46d5c0 /mnt/backup-boot        ext4    defaults,noauto,noatime,nodiratime,data=writeback,commit=360,barrier=0        1 0
UUID=e2c201d3-8cb3-4ba3-b9a9-18678ba3e477 /mnt/backup-root        ext4    defaults,noauto,noatime,nodiratime,data=writeback,commit=360,barrier=0        1 0
UUID=471531ca-3f8a-4066-bc56-15b8c7959d2d swap                    swap    defaults
UUID=92fcf421-9183-47c9-bd40-90f8e3c3c400 swap                    swap    defaults
UUID=817cf39b-743f-449f-bc38-104d6a5b632e swap                    swap    defaults

//smb/sys		/net/work		cifs	user=****,pass=****,domain=WORKGROUP,uid=davide,gid=users,noauto,comment=systemd.automount	0 0

Comment 10 Michal Jaegermann 2013-02-21 00:51:07 UTC
(In reply to comment #8)
> The screenshots indicate that the shutdown sequence starts, but some other
> jobs that conflict with the shutdown jobs get enqueued.

Yes, indeed.  As I wrote I cannot make any sense out of this.
 
> The screenshots suggest it may be related to encrypted filesystems.

Not sure where this is coming from, but on this machine there is not, and never was, any encrypted filesystem.  This particular hardware is a 1002HA Asus eeepc netbook.  It was originally installed with Fedora 12 (I think, or maybe even earlier) and it was upgraded, never reinstalled from scratch, to (at least) Fedora 14, 16 and 18, and it always worked fine, with the exception of this boo-boo after the last upgrade.

> Would
> you attach /etc/fstab, /etc/crypttab, and the output of 'lsblk'?

There is no /etc/crypttab.  An output of lsblk looks like this:

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 149.1G  0 disk 
├─sda1   8:1    0 203.9M  0 part /boot
├─sda2   8:2    0  31.4M  0 part 
├─sda3   8:3    0    10G  0 part /usr
├─sda4   8:4    0     1K  0 part 
├─sda5   8:5    0     4G  0 part /var/cache
├─sda6   8:6    0     3G  0 part /
├─sda7   8:7    0     2G  0 part [SWAP]
├─sda8   8:8    0 509.9M  0 part /usr/local
└─sda9   8:9    0 129.3G  0 part /home

'sda2' is some kind of an EFI partition which came that way from a manufacturer and which I never touched - just in case.
/etc/fstab attached.
 
> A text log would be more useful to work with than a series of screenshots,

I agree, and it is much quicker and more convenient to produce, but I did not know how to get such things in this situation. :-)

> http://freedesktop.org/wiki/Software/systemd/Debugging#Early_Debug_Shell),

Error 503 Service Unavailable

Service Unavailable
Guru Meditation:

XID: 1191354645

Varnish cache server

(whatever all that may mean).  It looks like the instructions at
https://fedoraproject.org/wiki/Systemd_early_debug_shell are at least equivalent.

>   dmesg > /dmesg.txt
>   systemctl dump > /dump.txt

Attached.  BTW - all available updates are applied and selinux is active but set to "permissive".

(In reply to comment #9 from Davide Repetto)
> I have the problem as well on a bunch of machines upgraded to F18.

So it appears that you have an even bigger headache.  I have a number of installations which went through the same F16->F18 upgrade, and only this particular one decided that it does not want to reboot anymore.  I lucked out here, as at least this one is not remote; some others are.

Comment 11 Michal Jaegermann 2013-02-21 00:52:34 UTC
Created attachment 700302 [details]
/etc/fstab from an affected machine

Comment 12 Michal Jaegermann 2013-02-21 00:54:18 UTC
Created attachment 700303 [details]
dmesg with a debugging information

reboot starts at 65.572214

Comment 13 Michal Jaegermann 2013-02-21 00:55:29 UTC
Created attachment 700304 [details]
results of 'systemctl dump'


Comment 14 Michal Jaegermann 2013-02-21 01:01:16 UTC
BTW - there is 'rd.luks=0' in dmesg output.  As comment #8 suggests "may be related to encrypted filesystems" I tried to boot without this option.  No change.

Comment 15 Michal Schmidt 2013-02-22 16:23:30 UTC
Michal,
in the logs I found that at some point during the shutdown a conflicting transaction gets installed. Apparently something calls for a restart of the autofs service:

 [   67.089143] systemd[1]: Enqueued job autofs.service/restart as 915

systemd then goes on to fulfil the requirement dependencies of autofs.service, which implies cancelling the shutdown jobs that are still pending.

I'm not sure where the autofs restart is called from. Perhaps it's in some network-related hook script. Please try to find out where it could be, by grepping for autofs under /etc...

I have now implemented a general mechanism upstream to avoid problems of this class:
http://cgit.freedesktop.org/systemd/systemd/commit/?id=23ade460e5a118daa575a961b405d089f95e0617

http://cgit.freedesktop.org/systemd/systemd/commit/?id=b85bdddafb321fa870b9250a2ff17040d6996061

Comment 16 Davide Repetto 2013-02-22 18:54:58 UTC
Hello Michal,
can you make a build we can test?

Comment 17 Michal Jaegermann 2013-02-22 20:39:03 UTC
(In reply to comment #15)
> Apparently something calls for a restart of the
> autofs service:
> 
>  [   67.089143] systemd[1]: Enqueued job autofs.service/restart as 915

> Please try to find out where it could be, by
> grepping for autofs under /etc...

Here is the catch.  While I am trying to resolve this problem:

# systemctl status autofs.service
autofs.service - Automounts filesystems on demand
	  Loaded: loaded (/usr/lib/systemd/system/autofs.service; disabled)
	  Active: inactive (dead)
# pgrep -f -l autofs
# pgrep -f -l automou
# ls /run/auto*
ls: cannot access /run/auto*: No such file or directory

So far so good.

# grep -ri autofs /etc/systemd /etc/rc.d/init.d/

# grep -ri autofs /etc | grep -v selinux
/etc/auto.misc:# Details may be found in the autofs(5) manpage
/etc/sysconfig/autofs:# Define default options for autofs.
/etc/sysconfig/autofs:#	     request autofs lookup SRV RRs for a domain of
/etc/sysconfig/autofs:# If no schema is set autofs will check each of the schemas
/etc/sysconfig/autofs:#AUTH_CONF_FILE="/etc/autofs_ldap_auth.conf"
/etc/sysconfig/autofs:# If the kernel supports using the autofs miscellanous device
/etc/gconf/gconf.xml.defaults/%gconf-tree.xml:					<longdesc>Whether to display information about all filesystems (including types like &apos;autofs&apos; and &apos;procfs&apos;). Useful for getting a list of all currently mounted filesystems.</longdesc>
/etc/auto.master:# For details of the format look at autofs(5).
/etc/auto.master:# Include /etc/auto.master.d/*.autofs
/etc/autofs_ldap_auth.conf:See autofs_ldap_auth.conf(5) for more information.
/etc/autofs_ldap_auth.conf:<autofs_ldap_sasl_conf
Binary file /etc/prelink.cache matches

Now I tried reboot, I am getting stuck and I have on a debug console:

# pgrep -f -l autofs
1645 /bin/systemctl restart autofs.service

In /run indeed three files showed up: /run/autofs.pid, /run/autofs-running (both look the same but their permissions are different) and
/run/autofs.fifo-net.  Boggles ....

If I enable autofs.service, which was actually active before this whole puzzle started, then it works fine, but the reboot outcome is exactly the same.

For an extra attraction, I have another, bigger laptop which went through the same F16->F18 upgrade, on which autofs is currently active, and which reboots and shuts down without any hint of a problem; nor, lucky me, was I hit by this on any other upgraded desktop machine.

Comment 18 Michal Jaegermann 2013-02-23 01:58:05 UTC
Maybe autofs is a blind alley after all?  Looking again at the dmesg output I attached, I see lines like these:

[   67.083333] systemd[1]: Installed new job home.mount/start as 922
[   67.083542] systemd[1]: Installed new job local-fs-pre.target/start as 930
[   67.083621] systemd[1]: Job shutdown.target/start finished, result=canceled
[   67.083807] systemd[1]: Job systemd-poweroff.service/start finished, result=canceled
[   67.084166] systemd[1]: Job poweroff.target/start finished, result=canceled
[   67.084325] systemd[1]: Job umount.target/start finished, result=canceled
[   67.084552] systemd[1]: Installed new job tmp.mount/start as 936
[   67.084703] systemd[1]: Installed new job boot.mount/start as 937
[   67.084775] systemd[1]: Installed new job systemd-fsck@dev-disk-by\x2duuid-627515fb\x2dc43a\x2d4046\x2d9e88\x2dd08db1387aa8.service/start as 938
[   67.084920] systemd[1]: Installed new job usr-local.mount/start as 945
[   67.084996] systemd[1]: Installed new job systemd-fsck@dev-disk-by\x2duuid-b6c83b78\x2d1941\x2d48ea\x2d9959\x2db1d97537e19d.service/start as 946
[   67.085139] systemd[1]: Installed new job usr.mount/start as 953
[   67.085203] systemd[1]: Installed new job systemd-fsck@dev-disk-by\x2duuid-d73b72e2\x2df494\x2d489d\x2d8149\x2df14732f5e181.service/start as 954
[   67.085325] systemd[1]: Installed new job var-cache.mount/start as 961
[   67.085409] systemd[1]: Installed new job systemd-fsck@dev-disk-by\x2duuid-0b6ce9a3\x2d7b6e\x2d4ff7\x2dbbcc\x2dfe5c8d7e28dd.service/start as 962
[   67.085571] systemd[1]: Installed new job systemd-remount-fs.service/start as 969
[   67.085725] systemd[1]: Installed new job fedora-storage-init.service/start as 970
[   67.085880] systemd[1]: Installed new job fedora-readonly.service/start as 975
[   67.085952] systemd[1]: Installed new job systemd-fsck-root.service/start as 976

I am not sure how to interpret all of this, but "systemd-fsck-root.service/start"?  On a shutdown?  Is this really correct?  Only later is there

[   67.089143] systemd[1]: Enqueued job autofs.service/restart as 915

This is indeed the only "Enqueued" in that poweroff sequence beyond "Enqueued job poweroff.target/start as 784" itself, but at 67.084166 there is "Job poweroff.target/start finished, result=canceled".  No idea why it was cancelled.

Comment 19 Michal Jaegermann 2013-02-23 22:30:15 UTC
(In reply to comment #15)

> I have now implemented a general mechanism upstream to avoid problems of this class:
> http://cgit.freedesktop.org/systemd/systemd/commit/?id=23ade460e5a118daa575a961b405d089f95e0617
> 
> http://cgit.freedesktop.org/systemd/systemd/commit/?id=b85bdddafb321fa870b9250a2ff17040d6996061

I tried to see if I can build my own patched binaries with these, but the two commits do not apply without rejects on top of the systemd-197-1.fc18.2 (koji) sources.  OTOH some patches from that rpm appear to be needed on top of the d6996061 snapshot, but others are rejected.  I am afraid that untangling that knot is too delicate a task.  Sigh!

Comment 20 Michal Jaegermann 2013-02-24 00:16:30 UTC
(In reply to comment #19)
> 
> I tried to see if I can hack my own binaries

Contrary to what I wrote in comment #19, I figured out how to modify the patches from comment #15 so they can be used with the systemd-197-1.fc18.2 sources.  After that, the recompiled systemd indeed no longer stops in the middle of a shutdown.

Only, as I understand it, this is really a workaround, and why some tasks suddenly restart when they should not is still very far from clear.  How do we know that the reverse will not happen, i.e. something should start but for mysterious reasons fails to do so?

Comment 21 Michal Schmidt 2013-02-25 15:26:20 UTC
(In reply to comment #17)
> Now I tried reboot, I am getting stuck and I have on a debug console:
> 
> # pgrep -f -l autofs
> 1645 /bin/systemctl restart autofs.service

This process is the key to understanding what's going on on your system. Please find out who called it ("ps -o pid,ppid,cgroup,cmd 1645" or just find it in "systemd-cgls").

(In reply to comment #18)
> I am not sure how to interpret all of this but
> "systemd-fsck-root.service/start"?  On a shutdown?  Is this really correct? 

Well, this is already due to the autofs.service/restart job being enqueued.

> Only later there is
> 
> [   67.089143] systemd[1]: Enqueued job autofs.service/restart as 915

This message concludes the activation of a transaction. The preceding installations of jobs were from the same transaction.

There should be a message at the start of the transaction calculation, but for an unknown reason it's not seen in your log:
systemd[1]: Trying to enqueue job autofs.service/restart/replace

(In reply to comment #20)
> Contrary to what I wrote in comment #19 I figured out how to modify patches
> from comment #15 so they can be used with systemd-197-1.fc18.2 sources. 
> After that a recompiled systemd indeed is not stopping anymore in the middle
> of a shutdown.

Thanks for testing this.

Comment 22 Michal Jaegermann 2013-02-26 00:55:19 UTC
(In reply to comment #21)

> Please find out who called it ("ps -o pid,ppid,cgroup,cmd 1645" or just find
> it in "systemd-cgls").

It appears to be systemd.  This is what I got in a debug shell from 'ps ...' after getting stuck (with systemd-197-1.fc18.1 reinstalled):

 1326     1 3:cpuacct,cpu:/system/autof /usr/sbin/automount --pid-file /run/autofs.pid

and systemd-cgls output is:

└─system
  ├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
  ├─autofs.service
  │ └─1326 /usr/sbin/automount --pid-file /run/autofs.pid
  ├─plymouth-poweroff.service
  │ └─1206 @usr/sbin/plymouthd --mode=shutdown --attach-to-session
  ├─NetworkManager.service
  │ └─695 /sbin/dhclient -d -sf /usr/libexec/nm-dhcp-client.action -pf /var/r...
  ├─systemd-journald.service
  │ └─269 /usr/lib/systemd/systemd-journald
  ├─systemd-udevd.service
  │ └─263 /usr/lib/systemd/systemd-udevd
  └─debug-shell.service
    ├─ 255 /bin/bash
    └─1367 systemd-cgls

That is possibly not the whole picture, because I have also seen this:

└─system
  ├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
  ├─yum-cron.service
  │ └─control
  │   ├─1207 /bin/bash /etc/rc.d/init.d/yum-cron stop
  │   ├─1211 /bin/bash /etc/rc.d/init.d/yum-cron stop
  │   ├─1212 /bin/sh /usr/local/bin/mkyumopts
  │   └─1223 /bin/systemctl restart autofs.service
.....

Only yum-cron.service timed out after a while; the whole shutdown moved a bit forward and then got stuck again on automount.  Note that at least some other machines which do not see this issue also have yum-cron.service enabled, and nothing of that sort happens there.

Comment 23 Michal Schmidt 2013-02-26 01:10:47 UTC
(In reply to comment #22)
>  1326     1 3:cpuacct,cpu:/system/autof /usr/sbin/automount --pid-file
> /run/autofs.pid

This is not the process I was interested in. I wanted to see what calls the /bin/systemctl process.


>   │   ├─1212 /bin/sh /usr/local/bin/mkyumopts

What does the mkyumopts process do? Where is it called from? Is this perhaps the parent of the /bin/systemctl process?

>   │   └─1223 /bin/systemctl restart autofs.service

Comment 24 Michal Jaegermann 2013-02-26 02:01:34 UTC
(In reply to comment #23)
> 
> This is not the process I was interested in. I wanted to see what calls the
> /bin/systemctl process.

Can you see that in the full output of 'ps wwwwx -o pid,ppid,cgroup,cmd 1326'?  Attached.  But that is all I got.

> >   │   ├─1212 /bin/sh /usr/local/bin/mkyumopts
> 
> What does the mkyumopts process do?

I knew that I should not muddy the picture. :-)  Forget about 'mkyumopts'.  It is a shell script which, depending on network parameters, generates different options for yum.  It just looks at the output of 'ip' and does not touch any file systems.  That is it.  In any case, my other laptop, which does not have shutdown issues with F18, uses exactly the same script.

>  Where is it called from? Is this perhaps
> the parent of the /bin/systemctl process?

This got called as a subprocess of something which timed out and vanished, and the whole thing got stuck firmly with it already out of the picture.

Comment 25 Michal Jaegermann 2013-02-26 02:02:51 UTC
Created attachment 702565 [details]
an output from 'ps wwwwx -o pid,ppid,cgroup,cmd 1326'

Comment 26 Michal Schmidt 2013-02-27 09:14:11 UTC
A possible way to capture what's running the systemctl command is:

mv /bin/systemctl /bin/systemctl.real

And create /bin/systemctl as an executable shell wrapper:

#!/bin/sh
# Log the caller's control group and command line to the kernel ring
# buffer, tagged "MMM" so it is easy to grep out of dmesg afterwards.
echo "MMM $(grep 'name=systemd' /proc/self/cgroup) $(sed 's/\x00/ /g' /proc/$PPID/cmdline) $0 $@" > /dev/kmsg
# Run the real systemctl, keeping the original invocation name.
exec -a "$(basename "$0")" systemctl.real "$@"


Then after reproducing the problem:
dmesg | grep MMM

Comment 27 Michal Jaegermann 2013-02-27 20:26:55 UTC
OK, I figured out what is happening in my case.

On shutdown systemd runs '/etc/rc.d/init.d/yum-cron stop'.  That script has this near the beginning:

test -f /etc/sysconfig/yum-cron && . /etc/sysconfig/yum-cron

and that happens always, really needed or not.  Due to my particular configuration this runs /usr/local/bin/mkyumopts, and thanks to paranoid checks somewhere else this, as a side effect, indeed restarts autofs, and the shutdown gets stuck.

The solution for me in this case turns out to be making /usr/local/bin/mkyumopts aware of whether it is called by sourcing /etc/sysconfig/yum-cron from /etc/rc.d/init.d/yum-cron (where mkyumopts is not needed at all) or from /etc/cron.daily/0yum-update.cron (definitely intended, and not affecting startup and shutdown operations).
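The caller-aware split described above can be sketched as a tiny shell helper.  This is a hypothetical illustration, not the reporter's actual script; the function name and path patterns are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch: decide whether mkyumopts should run, based on the
# script ($0) that sourced /etc/sysconfig/yum-cron.  The path patterns
# below are assumptions for illustration only.
should_run_mkyumopts() {
    case "$1" in
        */rc.d/init.d/yum-cron) return 1 ;;  # service start/stop: skip
        *) return 0 ;;                       # e.g. the daily cron job: run
    esac
}

# /etc/sysconfig/yum-cron could then contain something like:
#   should_run_mkyumopts "$0" && /usr/local/bin/mkyumopts
```

With such a guard the init script's unconditional sourcing of the sysconfig file no longer has the side effect of restarting services during shutdown.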

Interesting questions remain: why this was not a problem before the upgrade to F18, and why my other system with an apparently similar setup does not seem to show the issue.  Maybe some checks are not that paranoid there, but I have yet to look.

What hit Davide Repetto (comment #9) I have no idea.  It is not likely that my private hacks showed up on his machines. :-)

OTOH, the changes referenced in comment #15 do prevent such non-obvious interactions from waking up unwanted processes at the wrong moments.

Thanks for all the hints on how to debug the issue.

Comment 28 Jóhann B. Guðmundsson 2013-02-27 21:22:36 UTC
The Moodle maintainer duplicated the yum-cron initscript, so anyone that has moodle installed will likely hit this same issue...

Comment 29 Michal Jaegermann 2013-02-27 21:44:05 UTC
(In reply to comment #28)
> The Moodle maintainer duplicated the yum-cron initscript so anyone that have
> moodle install will likely hit this same issue...

Maybe not anyone.  I have /etc/sysconfig/yum-cron modified to run an extra script which dynamically generates, depending on the current network configuration, parameters for /etc/cron.daily/0yum-update.cron.  The same configuration file happens to be used by /etc/rc.d/init.d/yum-cron too, and there, as a side effect and again only in some network configurations, it was waking up autofs before I changed how my option-generating script is called.  Whether moodle may be doing something of that sort I have no idea.

Even with my fix for that particular issue, unintended interactions of that sort may still show up in other places.

Comment 30 Fedora Update System 2013-04-10 20:13:35 UTC
systemd-201-2.fc18.1 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/systemd-201-2.fc18.1

Comment 31 Fedora Update System 2013-04-11 23:26:32 UTC
Package systemd-201-2.fc18.2:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-201-2.fc18.2'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-5452/systemd-201-2.fc18.2
then log in and leave karma (feedback).

Comment 32 Fedora Update System 2013-04-16 00:01:04 UTC
Package systemd-201-2.fc18.4:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-201-2.fc18.4'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-5452/systemd-201-2.fc18.4
then log in and leave karma (feedback).

Comment 33 Davide Repetto 2013-04-17 06:41:26 UTC
Initially systemd-201-2.fc18.4 fixed the issue, but after yesterday's updates the machine(s) that had the problem before now again do not reboot properly.
The last message I see on the console says it reached target shutdown...

yum history info 140
========================
[root@dave ~]# LANG=C yum history info 140
Loaded plugins: auto-update-debuginfo, fastestmirror, langpacks, priorities, refresh-packagekit, remove-with-leaves, rpm-warm-cache,
              : tsflags, upgrade-helper, verify
Transaction ID : 140
Begin time     : Tue Apr 16 09:41:37 2013
Begin rpmdb    : 3835:244f93d127809e8ab1d52c44f91f16d01761a963
End time       :            09:49:38 2013 (8 minutes)
End rpmdb      : 3835:c6b951ae03006ea72268c8fa024515c0aa830ba9
User           :  <davide>
Return-Code    : Success
Command Line   : -y --skip-broken upgrade
Transaction performed with:
    Installed     rpm-4.10.3.1-1.fc18.i686                                @updates
    Updated       yum-3.4.3-53.fc18.noarch                                @updates
    Installed     yum-metadata-parser-1.1.4-7.fc18.i686                   @fedora
    Installed     yum-plugin-auto-update-debug-info-1.1.31-10.fc18.noarch @updates
    Installed     yum-plugin-fastestmirror-1.1.31-10.fc18.noarch          @updates
Packages Altered:
    Erase   kernel-3.8.4-202.fc18.i686               @updates
    Install kernel-3.8.7-201.fc18.i686               @updates
    Erase   kernel-devel-3.8.4-202.fc18.i686         @updates
    Install kernel-devel-3.8.7-201.fc18.i686         @updates
    Updated kernel-headers-3.8.6-203.fc18.i686       @updates
    Update                 3.8.7-201.fc18.i686       @updates
    Erase   kernel-modules-extra-3.8.4-202.fc18.i686 @updates
    Install kernel-modules-extra-3.8.7-201.fc18.i686 @updates
    Updated kernel-tools-3.8.6-203.fc18.i686         @updates
    Update               3.8.7-201.fc18.i686         @updates
    Updated kernel-tools-libs-3.8.6-203.fc18.i686    @updates
    Update                    3.8.7-201.fc18.i686    @updates
    Updated libestr-0.1.4-1.fc18.i686                @updates
    Update          0.1.5-1.fc18.i686                @updates
    Updated libgusb-0.1.4-1.fc18.i686                @fedora
    Update          0.1.6-1.fc18.i686                @updates
    Updated rsyslog-7.2.5-2.fc18.i686                @updates
    Update          7.2.6-1.fc18.i686                @updates
    Updated xorg-x11-drv-intel-2.21.5-1.fc18.i686    @updates
    Update                     2.21.6-1.fc18.i686    @updates
    Updated xorg-x11-drv-mga-1.6.1-2.fc18.i686       @fedora
    Update                   1.6.2-6.fc18.i686       @updates
    Updated yum-3.4.3-53.fc18.noarch                 @updates
    Update      3.4.3-54.fc18.noarch                 @updates
history info

Comment 34 Harald Hoyer 2013-04-17 09:24:02 UTC
(In reply to comment #33)
> Initially systemd-201-2.fc18.4 fixed the issue, but after yesterday's updates
> the machine(s) that had the problem before again do not reboot properly.
> The last message I see on the console says it reached target Shutdown...
> 
> yum history info 140
> ========================
> [root@dave ~]# LANG=C yum history info 140
> Loaded plugins: auto-update-debuginfo, fastestmirror, langpacks, priorities,
> refresh-packagekit, remove-with-leaves, rpm-warm-cache,
>               : tsflags, upgrade-helper, verify
> Transaction ID : 140
> Begin time     : Tue Apr 16 09:41:37 2013
> Begin rpmdb    : 3835:244f93d127809e8ab1d52c44f91f16d01761a963
> End time       :            09:49:38 2013 (8 minutes)
> End rpmdb      : 3835:c6b951ae03006ea72268c8fa024515c0aa830ba9
> User           :  <davide>
> Return-Code    : Success
> Command Line   : -y --skip-broken upgrade
> Transaction performed with:
>     Installed     rpm-4.10.3.1-1.fc18.i686                                @updates
>     Updated       yum-3.4.3-53.fc18.noarch                                @updates
>     Installed     yum-metadata-parser-1.1.4-7.fc18.i686                   @fedora
>     Installed     yum-plugin-auto-update-debug-info-1.1.31-10.fc18.noarch @updates
>     Installed     yum-plugin-fastestmirror-1.1.31-10.fc18.noarch          @updates
> Packages Altered:
>     Erase   kernel-3.8.4-202.fc18.i686               @updates
>     Install kernel-3.8.7-201.fc18.i686               @updates

This will generate a new initramfs.

Does "/run/initramfs/.need_shutdown" exist on your system?
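[Editorial note: the check Harald asks for can be scripted; a minimal sketch, where the helper name `check_flag` is purely illustrative. As I understand it, dracut creates this flag file when systemd-shutdown should pivot back into the initramfs to tear down storage cleanly.]

```shell
# Print "present" or "absent" depending on whether the given
# flag file exists. dracut drops /run/initramfs/.need_shutdown
# when the shutdown path should re-enter the initramfs.
check_flag() {
    if [ -e "$1" ]; then
        echo present
    else
        echo absent
    fi
}

check_flag /run/initramfs/.need_shutdown
```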

Comment 35 Fedora Update System 2013-04-18 02:38:50 UTC
Package systemd-201-2.fc18.5:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-201-2.fc18.5'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-5452/systemd-201-2.fc18.5
then log in and leave karma (feedback).

Comment 36 Davide Repetto 2013-04-19 10:13:00 UTC
(In reply to comment #34)
> This will generate a new initramfs.
> 
> Does "/run/initramfs/.need_shutdown" exist on your system?

OK. I tried regenerating the initramfs and checked for "/run/initramfs/.need_shutdown", which exists.

I also tried systemd-201-2.fc18.5 from updates testing.

Unfortunately the problem still persists.

Comment 37 Michal Schmidt 2013-04-19 12:41:38 UTC
Davide,
I believe you are seeing a different bug than Michal. Would you please file a new Bugzilla?
It would be interesting to test if commenting out the CIFS mount and/or the automounts in fstab helps.
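[Editorial note: commenting out such entries in /etc/fstab would look like the fragment below. The server, share, and mount-point names are hypothetical, not taken from the reporter's actual fstab; `x-systemd.automount` is the option that turns a plain mount into a systemd automount.]

```
# //server/share   /mnt/share  cifs  credentials=/root/.cifscreds  0 0
# /dev/sdb1        /mnt/data   ext4  defaults,x-systemd.automount  0 0
```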

Comment 38 Davide Repetto 2013-04-19 21:32:56 UTC
(In reply to comment #37)
Thanks for the reply, Michal.
After an hour of testing with different combinations I can say with good confidence that removing the CIFS mounts and/or the automounts in fstab does not help.
Rebooting to and from level 3 does not alleviate the problem either (I usually boot to level 5).

BTW, a couple of times I noticed a strange message:
[OK] unable to unmount /boot

Do you feel it is still better to file a separate bug?

Comment 39 Michal Schmidt 2013-04-19 22:21:48 UTC
Yes, I do. Please be prepared to use the debug-shell in a similar way as was suggested in this BZ.

Comment 40 Fedora Update System 2013-05-07 13:40:42 UTC
systemd-201-2.fc18.6 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/FEDORA-2013-5452/systemd-201-2.fc18.6

Comment 41 Fedora Update System 2013-05-09 10:02:56 UTC
Package systemd-201-2.fc18.6:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-201-2.fc18.6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-5452/systemd-201-2.fc18.6
then log in and leave karma (feedback).

Comment 42 Fedora Update System 2013-05-16 02:58:41 UTC
systemd-201-2.fc18.6 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.