Bug 1088619

Summary:

"a stop job is running for Session 1 of user antonio"

Product:

[Fedora] Fedora

Reporter:

antonio montagnani <antonio.montagnani>

Component:

systemd

Assignee:

systemd-maint

Status:

CLOSED EOL

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Priority:

unspecified

CC:

adam, ahmadsamir3891, anton4linux, awilliam, bruno, chhudson, dbranchini, germano.massullo, horsley1953, h.reindl, ian.rob.lee, igeorgex, jehan.procaccia, jhrozek, johannbg, jss, j, jusko, kparal, leho, lnykryn, mcatanzaro, msekleta, ohadlevy, plautrba, pmatilai, rmy, robinlee.sysu, rodrigorivascosta, sales, samuel-rhbugs, satellitgo, somlo, sta040, systemd-maint, technixp, tim, tuksgig, vpavlin, zbyszek, zing

Hardware:

Unspecified

OS:

Unspecified

Doc Type:

Bug Fix

Last Closed:

2015-06-29 20:08:38 UTC

Type:

Bug

Attachments:

Description	Flags
systemd chagelog	none
copy of messages file	none
My reboot script	none
journalctl log of a complete session which hits the bug	none
shutdown-log	none
updated reboot script	none
0900-allow_stop_jobs_to_be_killed_during_shutdown.patch	none

Description antonio montagnani 2014-04-16 21:28:41 UTC

Description of problem:
installed kernel from updates-testing

Version-Release number of selected component (if applicable):
systemd-208-16.fc20.x86_64

How reproducible:
always

Steps to Reproduce:
1. Shut down from inside a session

Actual results:
long shut-down waiting: a stop job is running for Session 1 of user antonio 

Expected results:
clean shut-down as with other kernels

Additional info:
it seems that the problem is for one user, a different user gets a clean shut-down. On my laptop everything seems o.k.

Comment 1 antonio montagnani 2014-04-17 04:58:28 UTC

I note that it happens only if the user session is not short, i.e. if I log in and immediately shut down the machine, everything is fine.

Comment 2 antonio montagnani 2014-04-23 17:22:52 UTC

updated to 3.14.1-200.fc20.x86_64 kernel from updates-testing: it works fine

Comment 3 antonio montagnani 2014-04-23 17:28:20 UTC

comment 2 should be deleted: no improvement during shut-down

Comment 4 antonio montagnani 2014-05-16 16:49:13 UTC

3.14.4-200.fc20.x86_64 kernel seems to have solved the issue...

Comment 5 antonio montagnani 2014-05-16 16:52:57 UTC

with reference to comment #4, the same issue re-appeared after four shut-downs... so that comment should be disregarded

Comment 6 antonio montagnani 2014-05-21 05:24:05 UTC

if I logoff and then shut-down everything is fine. Any tip in order to see what is preventing correct shut-down??

Comment 7 antonio montagnani 2014-06-03 21:21:44 UTC

Created attachment 901913
systemd chagelog

not sure if it can help... and I forgot who asked for this file

Comment 8 antonio montagnani 2014-06-08 09:35:35 UTC

can anybody confirm that after updating gnome-shell issue is solved??

Comment 9 antonio montagnani 2014-06-11 06:50:21 UTC

it happened also on my second system (that is a fully 64 bit F20 on a modern laptop). Comment #8 should be disregarded

Comment 10 antonio montagnani 2014-06-12 20:58:01 UTC

also upgrading to 208-17 in updates-testing doesn't solve this bug

Comment 11 antonio montagnani 2014-06-15 15:34:27 UTC

Created attachment 908919
copy of messages file


This part of the attached file is suspicious: it shows a delay of more than a minute (19:11:43 to 19:13:09) before session-1.scope is killed.

May 13 19:11:39 pcdesktop1 systemd: Stopped LSB: Init script for live image..
May 13 19:11:39 pcdesktop1 systemd: Stopped Network Manager.
May 13 19:11:39 pcdesktop1 systemd: systemd-logind.service: main process exited, code=exited, status=1/FAILURE
May 13 19:11:39 pcdesktop1 systemd: Stopped Login Service.
May 13 19:11:39 pcdesktop1 systemd: Unit systemd-logind.service entered failed state.
May 13 19:11:39 pcdesktop1 systemd: systemd-logind.service holdoff time over, scheduling restart.
May 13 19:11:39 pcdesktop1 systemd: Requested transaction contradicts existing jobs: Transaction is destructive.
May 13 19:11:39 pcdesktop1 systemd: systemd-logind.service failed to schedule restart job: Transaction is destructive.
May 13 19:11:39 pcdesktop1 systemd: Unit systemd-logind.service entered failed state.
May 13 19:11:39 pcdesktop1 systemd: Received SIGRTMIN+20 from PID 8214 (plymouthd).
May 13 19:11:39 pcdesktop1 systemd: Started Show Plymouth Power Off Screen.
May 13 19:11:40 pcdesktop1 systemd: Started Restore /run/initramfs.
May 13 19:11:43 pcdesktop1 systemd: Received SIGRTMIN+20 from PID 8214 (plymouthd).
May 13 19:13:09 pcdesktop1 systemd: session-1.scope stopping timed out. Killing.
May 13 19:13:09 pcdesktop1 systemd: Stopped Session 1 of user antonio.
May 13 19:13:09 pcdesktop1 systemd: Unit session-1.scope entered failed state.
May 13 19:13:09 pcdesktop1 systemd: Stopping user-1001.slice.
May 13 19:13:09 pcdesktop1 systemd: Removed slice user-1001.slice.
May 13 19:13:09 pcdesktop1 systemd: Stopping Permit User Sessions...
May 13 19:13:09 pcdesktop1 systemd: Stopped Permit User Sessions.
May 13 19:13:09 pcdesktop1 systemd: Stopping Basic System.
May 13 19:13:09 pcdesktop1 systemd: Stopped target Basic System.
May 13 19:13:09 pcdesktop1 systemd: Stopping Slices.
May 13 19:13:09 pcdesktop1 systemd: Stopped target Slices.
May 13 19:13:09 pcdesktop1 systemd: Stopping User and Session Slice.
May 13 19:13:09 pcdesktop1 systemd: Removed slice User and Session Slice.
May 13 19:13:09 pcdesktop1 systemd: Stopping Paths.
May 13 19:13:09 pcdesktop1 systemd: Stopped target Paths.
May 13 19:13:09 pcdesktop1 systemd: Stopping CUPS Printer Service Spool.
May 13 19:13:09 pcdesktop1 systemd: Stopped CUPS Printer Service Spool.
May 13 19:13:09 pcdesktop1 systemd: Stopping Forward Password Requests to Wall Directory Watch.
May 13 19:13:09 pcdesktop1 systemd: Stopped Forward Password Requests to Wall Directory Watch.
May 13 19:13:09 pcdesktop1 systemd: Stopping Timers.
May 13 19:13:09 pcdesktop1 systemd: Stopped target Timers.
May 13 19:13:09 pcdesktop1 systemd: Stopping Daily Cleanup of Temporary Directories.
May 13 19:13:09 pcdesktop1 systemd: Stopped Daily Cleanup of Temporary Directories.
May 13 19:13:09 pcdesktop1 systemd: Stopping dnf makecache timer.
May 13 19:13:09 pcdesktop1 systemd: Stopped dnf makecache timer.
May 13 19:13:09 pcdesktop1 systemd: Stopping Sockets.
May 13 19:13:09 pcdesktop1 systemd: Stopped target Sockets.
May 13 19:13:09 pcdesktop1 systemd: Stopping Open-iSCSI iscsiuio Socket.
May 13 19:13:09 pcdesktop1 systemd: Closed Open-iSCSI iscsiuio Socket.
May 13 19:13:09 pcdesktop1 systemd: Stopping CUPS Printing Service Sockets.
May 13 19:13:09 pcdesktop1 systemd: Closed CUPS Printing Service Sockets.
May 13 19:13:09 pcdesktop1 systemd: Stopping Open-iSCSI iscsid Socket.
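
A pause like the one in this excerpt can be found mechanically by looking for the largest gap between consecutive timestamps. The following is only a sketch: `largest_gap` is a hypothetical helper, it assumes the classic syslog prefix shown above, and because that prefix carries no year, 2014 is assumed from the date of this comment.

```python
from datetime import datetime

ASSUMED_YEAR = "2014"  # assumption: syslog lines carry no year


def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest pause
    between two consecutive log lines."""
    best = (0.0, None, None)
    prev_time, prev_line = None, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Classic syslog prefix, e.g. "May 13 19:11:43"
        stamp = " ".join(line.split()[:3])
        when = datetime.strptime(ASSUMED_YEAR + " " + stamp, "%Y %b %d %H:%M:%S")
        if prev_time is not None:
            gap = (when - prev_time).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = when, line
    return best


sample = [
    "May 13 19:11:40 pcdesktop1 systemd: Started Restore /run/initramfs.",
    "May 13 19:11:43 pcdesktop1 systemd: Received SIGRTMIN+20 from PID 8214 (plymouthd).",
    "May 13 19:13:09 pcdesktop1 systemd: session-1.scope stopping timed out. Killing.",
    "May 13 19:13:09 pcdesktop1 systemd: Stopped Session 1 of user antonio.",
]

seconds, before, after = largest_gap(sample)
print(f"{seconds:.0f}s pause before: {after}")
```

On the four sample lines the longest pause is the 86 seconds that end with the "stopping timed out. Killing." message.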

Comment 12 Lennart Poettering 2014-06-17 11:46:56 UTC

*** Bug 1107981 has been marked as a duplicate of this bug. ***

Comment 13 Krystian 2014-06-17 11:54:45 UTC

The bug(?) still exists on systemd-208-17.fc20.x86_64. Same issue.

@antonio montagnani: yum downgrade systemd to systemd-208-9.fc20.x86_64. On 208-9 everything is OK and we can stay on it until the problem is fixed.

Comment 14 Lennart Poettering 2014-06-17 12:06:26 UTC

*** Bug 1073714 has been marked as a duplicate of this bug. ***

Comment 15 antonio montagnani 2014-06-24 13:25:43 UTC

I upgraded to systemd-208-19.fc20.x86_64 from updates-testing and bug is still present

Comment 16 Elder Marco 2014-06-24 23:05:29 UTC

Yes, the same problem on my Dell Vostro 5470. Sometimes I can shut down the computer normally and sometimes I have to wait for ~2 min.

Comment 17 Sergio Basto 2014-06-25 14:31:42 UTC

I also have this problem.

Comment 18 Krystian 2014-06-27 22:13:13 UTC

Problem seems fixed to me after update to systemd-208-19.fc20.x86_64.


Basic info about my hardware if it is important: http://www.cnet.com/products/lenovo-b570e-4760-15-6-c-b800-windows-7-home-premium-64-bit-2-gb-ram-320-gb-hdd-series/specs/

Comment 19 Sergio Basto 2014-06-30 20:45:35 UTC

(In reply to Krystian from comment #18)
> Problem seems fixed to me after update to systemd-208-19.fc20.x86_64.

I think I saw one shutdown without problems and another with problems; the latter ended with a message: can't stop watchdogs ...

so I'm not sure whether it is fixed or not

Comment 20 Krystian 2014-06-30 21:23:24 UTC

I think I've seen that message too, but the laptop shut down in a few seconds. I hope it will not come back; for a few days now it has been quiet and normal.

Maybe it depends on specific hardware? I don't know. The bug was submitted 2014-04-16 and still exists, still with status NEW and no concrete proposals.

For me, if the problem returns I'll switch back to Fedora 19. F20 is not the best release: it doesn't have critical bugs, but there are some annoying issues from time to time.

Comment 21 antonio montagnani 2014-07-01 07:00:01 UTC

situation has improved, I have two systems:

1) desktop (AMD processor, SSD hard disk) is affected, but now bug appears sometimes, let me say one out of four instead of three out of four
2) laptop (Intel I3, standard hard disk) has been affected very very rarely, let me say two times in three months

I can't say that bug has been solved

Comment 22 Krystian 2014-07-01 19:12:29 UTC

Yes, the problem still exists; my luck just ran out :-)

Today (I updated the system yesterday) when I was shutting down my laptop, the bug returned with "a stop job is running...". Maybe it's a one-time incident, I thought, but I turned the laptop on again, logged in to GNOME 3, clicked shut down, and "a stop job is running..." occurred again and froze for about 1 min.

Nothing serious but annoying sometimes; that's why I've started hibernating instead of shutting down.

Comment 23 Tom Horsley 2014-07-01 19:34:28 UTC

Created attachment 913905
My reboot script

I've got this script stashed in /usr/local/bin/pre-reboot and I've got an
alias defined:

reboot is aliased to `/usr/local/bin/pre-reboot | sudo /bin/bash'

I've been rebooting with this (or variations as I refined it) for a while now and haven't had any hang problems (it is important to run it as the user that is logged in, not as root).

Comment 24 Saurav Sengupta 2014-07-04 10:02:33 UTC

I have kernel version 3.14.9-200.fc20 (x86_64) and systemd version 208-19.fc20 (x86_64), both installed from the updates repository (not updates-testing), as part of regular update, and still face this issue, though it occurs randomly, not always. In bug #1023820 and bug #1059476 it is stated that systemd-208-14.fc20 fixes this issue (or a similar issue with user@.service) but that is not the case here. I am running Fedora 20 natively, not in a virtual machine.

Comment 25 technixp 2014-07-05 19:31:29 UTC

I have same problem on my acer v3-772G.
No problems with desktop PC. 

I'm running almost same software on both.

Fedora updated from official updates repository.

Comment 26 Ron Yorston 2014-07-09 05:17:56 UTC

I have recently started to encounter intermittent delays during shutdown.  It first happened on July 4th and twice since then.  In each case the log contains something like:

Jul  8 21:54:30 vulcan systemd: Stopped Network Manager.
Jul  8 21:54:30 vulcan systemd: Received SIGRTMIN+20 from PID 4181 (plymouthd).
Jul  8 21:54:30 vulcan systemd: Started Show Plymouth Power Off Screen.
Jul  8 21:54:31 vulcan systemd: Started Restore /run/initramfs.
Jul  8 21:55:59 vulcan systemd: session-1.scope stopping timed out. Killing.
Jul  8 21:55:59 vulcan systemd: Stopped Session 1 of user rmy.
Jul  8 21:55:59 vulcan systemd: Unit session-1.scope entered failed state.

Comment 27 antonio montagnani 2014-07-09 07:01:27 UTC

..that confirms comment #11

Comment 28 Daniele Branchini 2014-07-09 09:25:08 UTC

my (maybe useless) 2 cents:

I have been working from the beginning of March on four different Fedora 20 x86_64 workstations, three of them have only one local user and never showed this bug symptoms, the only machine affected has multiple (ldap) users and nfs (v3) homes.

Comment 29 Tom Horsley 2014-07-09 11:31:04 UTC

More than one user logging in does indeed almost always seem to trigger this problem for me as well, but for the first time in a long time when I rebooted this morning to get a new kernel update active, it said waiting for user NNNN, where NNNN is my normal user that it doesn't usually hang on.

I think I need to revise my perl pre-reboot script one more time to find the entire process tree for any systemd --user PIDs and kill them all explicitly regardless of which user they are.

Comment 30 Elad Alfassa 2014-07-10 16:51:58 UTC

Isn't this a duplicate of bug #1059476 ?

Anyway, I see this here too with a single-user system.

Comment 31 antonio montagnani 2014-07-10 17:09:19 UTC

no, my system is a two-user system, and I do not know whether the second user is also affected, as that user rarely uses the computer

Comment 32 antonio montagnani 2014-07-23 17:24:19 UTC

latest systemd-208-20.fc20 doesn't solve the issue (is anybody working to solve it? the bug is three months old)

Comment 33 Adam Williamson 2014-07-31 00:48:50 UTC

Created attachment 922771
journalctl log of a complete session which hits the bug

I'm also seeing this issue, now, on a fresh Fedora 20 install with systemd-208-21.fc20. I have /var as a separate partition and the system is set up as a FreeIPA client. Attaching a full log of a short session where I just booted, logged in (via ssh), and shut down.

Comment 34 Adam Williamson 2014-07-31 01:04:07 UTC

Bug persists even with /var moved back to the root LV , so the issue seems to be remote users, not separate /var partition.

Comment 35 Tom Horsley 2014-07-31 11:20:27 UTC

Actually, I think the issue is more than one user, remote or not. I first saw this when I created a new user to test a fresh login in KDE, then deleted the user after logging in once. It spent 5 minutes waiting on the now deleted user.

Comment 36 Adam Williamson 2014-07-31 12:35:34 UTC

I have only ever logged into the freshly installed system as one user and root. It has system user accounts, of course, but every system does.

Comment 37 Ahmad Samir 2014-08-01 14:57:31 UTC

I see this issue too sometimes while shutting down, I usually use one user on this system. I tracked it down to systemd-208-16, reverting to any older one and I don't see that shutdown hang with the "a stop job is running for Session..." message.

Building systemd and not applying 0331-logind-given-that-we-can-now-relatively-safely-shutd.patch seems to mitigate the issue, i.e. the logs still have systemd-logind complaining about failing to stop session-*.scope... etc but the scope is killed almost instantly. From that patch:

From a0ef58e3e421909661c615ee6b067a9c2cd9f955 Mon Sep 17 00:00:00 2001
From: Michal Sekletar <msekleta>
Date: Tue, 4 Mar 2014 17:00:54 +0100
Subject: [PATCH] logind: given that we can now relatively safely shutdown
 sessions copes without working cgroup empty notifications there's no need to
 set the stop timeout of sessions scopes low

apparently the stop timeout still needs to be set very low.

Having to rebuild systemd locally every time it's updated in the repos is a pain I don't want to live with; supposedly one can configure the session scope unit via dbus but I couldn't figure out how. What I ended up doing was create a file /run/systemd/system/session-$SESSION_NO.scope.d/90-TimeoutStopUSec.conf with this in it:
[Scope]
TimeoutStopSec=500000us

then 'systemctl daemon-reload', this reverts the behaviour to what systemd < 208-16 did.
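
The drop-in described above can be generated per session number. This is only a sketch of that workaround, not a recommended fix: `session_scope_dropin` is a hypothetical helper that builds the path and contents exactly as quoted in this comment and deliberately writes nothing to disk (creating the file and running `systemctl daemon-reload` still has to be done as root).

```python
def session_scope_dropin(session_no, timeout_us=500000):
    """Build the path and contents of the session-scope drop-in from this comment.

    Nothing is written to disk; the caller decides what to do with the result.
    """
    path = (
        f"/run/systemd/system/session-{session_no}.scope.d/"
        "90-TimeoutStopUSec.conf"
    )
    content = f"[Scope]\nTimeoutStopSec={timeout_us}us\n"
    return path, content


path, content = session_scope_dropin(1)
print(path)
print(content, end="")
```

Since /run is a tmpfs, a file created this way does not survive a reboot and would have to be recreated for each new session.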

Comment 38 Michal Sekletar 2014-08-04 12:59:03 UTC

I don't think that reverting patch is a correct solution. We don't want such a short stop timeout for all scopes in general. Proper solution would be to identify misbehaving components delaying shutdown and fix them. Or maintainers of graphical session managers can set stop timeout for graphical session explicitly.

Comment 39 Sergio Basto 2014-08-04 13:15:06 UTC

(In reply to Michal Sekletar from comment #38)
> Proper solution would be to
> identify misbehaving components delaying shutdown and fix them. 

But we still don't have a clue what it is, do we? I still need to identify the misbehaving component(s).

Comment 40 Ahmad Samir 2014-08-04 13:38:48 UTC

(In reply to Michal Sekletar from comment #38)
> I don't think that reverting patch is a correct solution. We don't want such
> a short stop timeout for all scopes in general. Proper solution would be to
> identify misbehaving components delaying shutdown and fix them. Or
> maintainers of graphical session managers can set stop timeout for graphical
> session explicitly.

No, not a solution; I just found a workaround for a very irritating issue that's been bugging me for ~4 months - and thought I should post my findings a) maybe those findings can shed some light on what's causing this bug and b) others who're afflicted by the same issue can adapt that workaround if they want, that is until a proper fix is found.

Comment 41 Tom Horsley 2014-08-04 14:05:40 UTC

I'm 99.99999% sure it is these things:

 2003 ?        Ss     0:00 /usr/lib/systemd/systemd --user
 2009 ?        S      0:00  \_ (sd-pam)

The hangs started at the same time the user daemons showed up in systemd.

If a different user logs into the system, the user daemon for that user appears, but then never goes away even when that user logs out.

If I kill -9 all the "orphan" user daemon process trees on the system manually just before a reboot, I don't get the hangs or the stop job messages.

Also if I manually modify the pam config files that create these things to not create them, I also never get the hang (but have other problems like sound no longer working because these user daemons also replace ConsoleKit).
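
Finding those user daemons before a reboot can be scripted. A rough sketch, assuming ps output in the column layout shown above (PID, TTY, STAT, TIME, COMMAND); `user_manager_pids` is a hypothetical helper that only reports the PIDs and leaves any killing to the caller.

```python
def user_manager_pids(ps_output):
    """Return the PIDs of 'systemd --user' processes found in ps-style output."""
    pids = []
    for line in ps_output.splitlines():
        # Assumed columns: PID, TTY, STAT, TIME, COMMAND (command may contain spaces)
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        pid, command = fields[0], fields[4].strip()
        if pid.isdigit() and command.endswith("systemd --user"):
            pids.append(int(pid))
    return pids


sample = """\
 2003 ?        Ss     0:00 /usr/lib/systemd/systemd --user
 2009 ?        S      0:00  \\_ (sd-pam)
"""

print(user_manager_pids(sample))
```

The (sd-pam) child is skipped on purpose: it belongs to the same process tree and goes away with its parent.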

Comment 42 Chris Hudson 2014-08-05 14:05:01 UTC

I am seeing the same behavior. My f20 box is fully updated and is an IPA client. Attaching debug.

Comment 43 Chris Hudson 2014-08-05 14:05:47 UTC

Created attachment 924220
shutdown-log

Comment 44 Ian Lee 2014-08-06 13:28:59 UTC

I was having the same problem. I believe the issue for me at least and possibly other IPA/sssd users, is sssd shutting down too soon.

I added the following to sssd.service,

Before=systemd-user-sessions.service

So far this has worked, but I've only restarted a couple of times, so not conclusive

Comment 45 Chris Hudson 2014-08-06 13:39:25 UTC

Thanks, that workaround seems to work for me as well.

Comment 46 Jason Elwell 2014-08-07 12:39:56 UTC

Ian's workaround from comment#44 works great!  Thank you!!

Just for clarification, for those of us who might not be checked out on systemd, the file that needs to be modified is most likely:

/usr/lib/systemd/system/sssd.service

and you put the "Before" line in the [Unit] section.  You then must run:

systemctl daemon-reload

or it won't take effect before you reboot.
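
For anyone applying this to many machines, the edit can be scripted. A minimal sketch, assuming the unit file has a plain `[Unit]` header line; `add_before_ordering` is a hypothetical helper that works on the file's text and is safe to run twice.

```python
def add_before_ordering(unit_text, target="systemd-user-sessions.service"):
    """Add 'Before=<target>' directly under the [Unit] header if it is missing."""
    wanted = f"Before={target}"
    lines = unit_text.splitlines()
    if any(line.strip() == wanted for line in lines):
        return unit_text  # already present, nothing to do
    out = []
    for line in lines:
        out.append(line)
        if line.strip() == "[Unit]":
            out.append(wanted)
    return "\n".join(out) + "\n"


# Shortened stand-in for sssd.service; the [Service] body is omitted here.
sample = (
    "[Unit]\n"
    "Description=System Security Services Daemon\n"
    "After=syslog.target\n"
    "\n"
    "[Service]\n"
)

print(add_before_ordering(sample), end="")
```

Note that a file under /usr/lib/systemd/system belongs to the package, so a later sssd update may overwrite the change.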

Comment 47 Harald Reindl 2014-08-15 13:01:22 UTC

https://bugzilla.redhat.com/show_bug.cgi?id=1073714 was closed as duplicate

> the reboot hangs for what seems like forever (I've given it as 
> long as 5 minutes) with the cylon eyeball and a message about 
> a stop job running for user XXXX (where XXXX seems to alternate 
> between all the users that had logged in at one time or other - 
> most of them now logged out, so I have no idea why it would 
> need to wait for anything associated with that user

this still happens randomly after i upgraded production to F20
randomly means really randomly and can happen even shortly
after a machine is booted and restarted because of a small
change

most likely the dirty workaround "loginctl enable-linger" to prevent the log-flood of https://bugzilla.redhat.com/show_bug.cgi?id=1072368 is the underlying reason

honestly i want the time back where crond just started a command as a specified user without the init-system making a lot of action, noise and user-sessions around it, because that over-complication and binding everything to systemd brings ongoing troubles that never existed for years

Comment 48 Artur Flinta 2014-08-20 08:21:14 UTC

Unfortunately workaround from comment#44 and comment#46 didn't work for me. Once or twice system shutdown was lightning fast, but for most cases I still have over minute long waiting for „stop job is running...”

Comment 49 Tom Horsley 2014-08-20 12:13:29 UTC

(In reply to Artur Flinta from comment #48)
> Unfortunately workaround from comment#44 and comment#46 didn't work for me.
> Once or twice system shutdown was lightning fast, but for most cases I still
> have over minute long waiting for „stop job is running...”

Me either. I don't even use sssd, so it obviously can't be the problem.

I still use the reboot script from comment #23 as the only thing that works for me, and I've added code to it to "umount -l" all the NFS filesystems so I also don't hang forever if one of the servers is down when I reboot.

Comment 50 Harald Reindl 2014-08-20 12:25:52 UTC

it is *for sure* not any service
that is a systemd-problem

i faced that on the machine below much more often than on the other virtual servers, which run a lot more services than the reverse proxy and are in fact clones of the same golden master

crond.service
dbus.service
dnsmasq.service
fedora-readonly.service
getty
haveged.service
iptables.service
kmod-static-nodes.service
network.service
postfix.service
rsyslog.service
sshd.service
systemd-fsck-root.service
systemd-fsck@dev-disk-by\x2duuid-b834776d\x2d69d1\x2d49c6\x2d97c1\x2dd6d758a438f0.service
systemd-journald.service
systemd-logind.service
systemd-random-seed.service
systemd-remount-fs.service
systemd-sysctl.service
systemd-tmpfiles-setup-dev.service
systemd-tmpfiles-setup.service
systemd-udev-trigger.service
systemd-udevd.service
systemd-update-utmp.service
systemd-user-sessions.service
systemd-vconsole-setup.service
trafficserver.service
user
user
vmtoolsd.service
vnstat.service

Comment 51 procaccia 2014-08-26 16:16:56 UTC

I also have that problem. We are a school with 100+ multiuser Fedora 20 stations running sssd + LDAP and NFS. It is really annoying to wait more than 2 minutes for the system to reboot.
When a user reboots, he gets the "F/fedora logo"; only by pressing Escape do we get back to a console where "red stars" move left to right next to the message:
"A stop job is waiting on User Manager 14587"
Only minutes later does the system finally reboot (after a timeout, probably!?).
I tried to apply the comment 44 solution without success:

[root@b313-04 ~]# cat /usr/lib/systemd/system/sssd.service
[Unit]
Description=System Security Services Daemon
# SSSD will not be started until syslog is
After=syslog.target
Before=systemd-user-sessions.service

[root@b313-04 ~]# systemctl daemon-reload

Please, if there is no clean solution, let us know a workaround that "brute forces" the reboot without checking for users' running jobs.

Comment 52 Harald Reindl 2014-08-29 20:05:33 UTC

and i bet the reason is the same as why i can't kill that idiot systemd-session-process below

[root@proxy:~]$ ps aux | grep wwwcron
wwwcron    358  0.0  0.1  46560  4156 ?        Ss   21:52   0:00 /usr/lib/systemd/systemd --user

[root@proxy:~]$ > messages 

[root@proxy:~]$ kill 358

[root@proxy:~]$ kill 358

[root@proxy:~]$ cat messages 
Aug 29 22:04:18 proxy systemd[358]: Starting Exit the Session...
Aug 29 22:04:18 proxy systemd[4200]: Failed at step CHDIR spawning /usr/bin/kill: No such file or directory
Aug 29 22:04:18 proxy systemd[358]: systemd-exit.service: main process exited, code=exited, status=200/CHDIR
Aug 29 22:04:18 proxy systemd[358]: Failed to start Exit the Session.
Aug 29 22:04:18 proxy systemd[358]: Dependency failed for Exit the Session.
Aug 29 22:04:18 proxy systemd[358]: Unit systemd-exit.service entered failed state.
Aug 29 22:04:19 proxy systemd[358]: Starting Exit the Session...
Aug 29 22:04:19 proxy systemd[4204]: Failed at step CHDIR spawning /usr/bin/kill: No such file or directory
Aug 29 22:04:19 proxy systemd[358]: systemd-exit.service: main process exited, code=exited, status=200/CHDIR
Aug 29 22:04:19 proxy systemd[358]: Failed to start Exit the Session.
Aug 29 22:04:19 proxy systemd[358]: Dependency failed for Exit the Session.
Aug 29 22:04:19 proxy systemd[358]: Unit systemd-exit.service entered failed state.

Comment 53 Harald Reindl 2014-08-30 16:10:28 UTC

and that is *why behavior* like https://bugzilla.redhat.com/show_bug.cgi?id=1072368 is bullshit - any production machine affected by this bug hangs on user-sessions caused by "loginctl enable-linger" to not spit massive crap in the syslog all day long from cronjobs

i sit again in front of a machine trying to terminate the "wwwcron" user-session forever

fix that crap or stop flooding the log and so don't force users to enable linger

Comment 54 Sergio Basto 2014-08-30 16:35:12 UTC

loginctl disable-linger $USER 
and
loginctl disable-linger root 

seems to work around this bug

thanks for your investigation
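
To check which accounts have linger enabled before disabling it, something like the following may help. It is a sketch under one assumption: that logind records linger state as one marker file per user under /var/lib/systemd/linger. The path is passed as a parameter, and `lingering_users` is a hypothetical helper.

```python
import os


def lingering_users(linger_dir="/var/lib/systemd/linger"):
    """List users with linger enabled, assuming one marker file per user.

    Returns an empty list if the directory is missing or unreadable.
    """
    try:
        return sorted(os.listdir(linger_dir))
    except (FileNotFoundError, PermissionError):
        return []


for user in lingering_users():
    # Only print the command; running it is left to the administrator.
    print(f"loginctl disable-linger {user}")
```

Printing the commands instead of running them keeps the script harmless on machines where linger is wanted for some accounts.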

Comment 55 Harald Reindl 2014-08-30 16:48:18 UTC

fine, so we have the choice between an ignorant upstream (https://bugzilla.redhat.com/show_bug.cgi?id=1072368#c3) and this bug, and then developers wonder about the tone of users sometimes?

systemd bugs in Fedora in general lie around for months - what about this one, which has existed for many months? that logging would never have existed if developers took care and read their system logs - that such bugs ever hit users proves that https://bugzilla.redhat.com/show_bug.cgi?id=1072368 even makes the developers blind with all the useless noise

 https://bugzilla.redhat.com/show_bug.cgi?id=1010572#c25

Comment 56 Panu Matilainen 2014-09-04 14:59:52 UTC

FWIW, of my six systems this never occurs on the two using local accounts. All four in FreeIPA+sssd configuration exhibit the long long pause during shutdown.  The tweak to sssd.service from comment #44 cures it, thanks Ian! OTOH fiddling with the linger settings had absolutely no effect on these systems.

Comment 57 Orion Poplawski 2014-09-04 16:12:23 UTC

Re: comment #44 - has anyone filed a bug against sssd?

Comment 58 Harald Reindl 2014-09-04 16:57:24 UTC

who cares about sssd?

* not installed on any of my machines
* the affected user-sessions are crond only

Comment 59 Orion Poplawski 2014-09-04 17:00:05 UTC

Harald - A number of people have indicated that modifying sssd.service has helped them, so presumably they (and I) care.  That's why I asked.

Comment 60 Harald Reindl 2014-09-04 17:06:17 UTC

you did not understand what i say:

* the root cause is systemd
* don't shoot the messenger
* don't bury bugs by workarounds somewhere else
* don't report bugs on a place where the bug is not introduced

Comment 61 Panu Matilainen 2014-09-05 04:28:06 UTC

(In reply to Orion Poplawski from comment #57)
> Re: comment #44 - has anyone filed a bug against sssd?

I was about to, but see bug 1100752.

Comment 62 Jakub Hrozek 2014-09-05 12:40:57 UTC

Ian Lee kindly contributed a patch to sssd upstream adding:
Before=nss-user-lookup.target 
Wants=nss-user-lookup.target

Here is a test build with Ian's patch:
http://koji.fedoraproject.org/koji/taskinfo?taskID=7530061

If it indeed solves your problem, I'm going to make an official update.

Thanks for the patch, Ian!

Comment 63 Panu Matilainen 2014-09-05 15:46:57 UTC

(In reply to Jakub Hrozek from comment #62)
> Here is a test build with Ian's patch:
> http://koji.fedoraproject.org/koji/taskinfo?taskID=7530061
> 
> If it indeed solves your problem, I'm going to make an official update.

Briefly tested on one of my affected boxes, works fine for me. As in, "fixes the slow shutdown syndrome, no other unwanted side-effects noticed".

Comment 64 Sergio Basto 2014-09-06 00:09:46 UTC

Not for me, but I believe we have two bugs; like Harald Reindl, I don't have sssd running on my system.

Comment 65 Panu Matilainen 2014-09-06 05:43:14 UTC

Yup, there are (at least) two different bugs in play. One is sssd-related, the other is something else.

Comment 66 Adam Williamson 2014-09-06 07:07:01 UTC

If there are (at least) two bugs in play, then we need (at least) two bug reports. :)

Comment 67 Jakub Hrozek 2014-09-08 09:39:23 UTC

(In reply to Panu Matilainen from comment #63)
> (In reply to Jakub Hrozek from comment #62)
> > Here is a test build with Ian's patch:
> > http://koji.fedoraproject.org/koji/taskinfo?taskID=7530061
> > 
> > If it indeed solves your problem, I'm going to make an official update.
> 
> Briefly tested on one of my affected boxes, works fine for me. As in, "fixes
> the slow shutdown syndrome, no other unwanted side-effects noticed".

Thank you very much for testing. I pushed Ian's patch upstream and submitted a F-20 update for SSSD:
https://admin.fedoraproject.org/updates/sssd-1.11.6-3.fc20

For F-21 and rawhide, I'll just rebase to 1.12.1 later today most probably.

Comment 68 Rodrigo Rivas Costa 2014-09-11 16:17:12 UTC

This bug seems a duplicate of one I reported to ArchLinux some time ago [1].

It looks like it only affects systemd-208 and it is already fixed upstream.

After quite some debugging I determined that it was caused by a race (paradox?) in systemd-exit.service trying to kill its own grandfather or something like that.

I used this workaround, which worked every time: add a dummy "ExecStart=/usr/bin/true" before the kill command in systemd-exit.service. The dummy command may not be run, but the kill will.

    [Service]
    Type=oneshot
    ExecStart=/usr/bin/true
    ExecStart=/usr/bin/kill -s 58 $MANAGERPID

HTH

[1]: https://bugs.archlinux.org/task/38344

Comment 69 Ahmad Samir 2014-09-11 17:52:42 UTC

(In reply to Rodrigo Rivas Costa from comment #68)
> This bug seems a duplicate of one I reported to ArchLinux some time ago [1].
> 
> It looks like it only affects systemd-208 and it is already fixed upstream.
> 
> After quite some debugging I determined that it was caused by a race
> (paradox?) in systemd-exit.service trying to kill its own grandfather or
> something like that.
> 

You mean: http://lists.freedesktop.org/archives/systemd-devel/2014-January/016519.html
[...]

Comment 70 Harald Reindl 2014-09-11 17:58:42 UTC

don't tell me that the above from 2014-01 is not backported in Fedora 20

if that is true it proves once again the problem with the idea of having only major version numbers for a core component like systemd, instead of providing upstream bugfix releases like 208.1, 208.2, while hoping downstream distributions do the cherry picking right

i hate that "only major versions" attitude in general (even if the cherry picking would always work) because nobody but a few people knows what code is really on the systems of the users

the only reason for doing so is that upstream can move fast forward and believes it doesn't need to care about any downstream - fine if doing that with a browser like Chrome but not for a component with the same importance as the kernel itself

https://www.kernel.org/ shows a responsible upstream

Comment 71 Adam Williamson 2014-09-11 18:03:45 UTC

http://cgit.freedesktop.org/systemd/systemd-stable/log/?h=v208-stable

Comment 72 Harald Reindl 2014-09-11 18:18:13 UTC

http://cgit.freedesktop.org/systemd/systemd-stable/log/?h=v208-stable doesn't replace minor versions nor does it make the fedora changelogs useful

come on, tell me the *full* and *exact* changelog of systemd as released with F20 and systemd-208-21.fc20.x86_64 - that information just doesn't exist anywhere

with an upstream having point releases you could use upstream git and review the changes between the current and the previous minor release - currently you don't know anything as a user except that systemd related bugs in Fedora take months

don't you think the kernel developers have a good reason for minor releases and for maintaining more than one major release?

frankly if i were not hit multiple times each year by long existing systemd troubles i wouldn't care but the current situation proves that systemd or the way Fedora integrates systemd does something terribly wrong

Comment 73 Sergio Basto 2014-09-12 22:21:22 UTC

the patch in http://lists.freedesktop.org/archives/systemd-devel/2014-January/016519.html
is not applied even in master 

http://cgit.freedesktop.org/systemd/systemd/tree/src/core/dbus-service.c

doesn't have any entry SD_BUS_PROPERTY("WaitForMainPIDOnStop ...

Comment 74 Harald Reindl 2014-09-12 22:25:28 UTC

> is not applied even in master

cool, who cares about real world problems if they can include dhcpd, networkd and what not else instead of cleaning up and fixing the existing code.....

don't get me wrong - i love systemd for many things, but that careless attitude makes me terribly angry

Comment 75 Adam Williamson 2014-09-12 22:37:10 UTC

so, maybe the patch wasn't a good patch? anyone consider that possibility?

"cool, who cares about real world problems if they can include dhcpd, networkd and what not else instead of cleaning up and fixing the existing code....."

have you *seen* the number of commits on the 208-stable branch?

Comment 76 Harald Reindl 2014-09-12 22:44:25 UTC

yes, have *you* seen the date of http://lists.freedesktop.org/archives/systemd-devel/2014-January/016519.html

Comment 77 Sergio Basto 2014-09-13 01:20:08 UTC

the commit is this : 

http://cgit.freedesktop.org/systemd/systemd-stable/commit/?h=v208-stable&id=a2ca1771f9f8f2b8b5f08ab181e5cd9e6935f695

and is applied at least on systemd-208-21.fc20.x86_64, which has:

Build Date  : Thu 24 Jul 2014 14:37:54 WEST
Install Date: Wed 30 Jul 2014 23:07:01 WEST

I already tested it on 2014-09-06 and it wasn't fixed. So either the bug is not totally fixed or we have yet another bug ...

Comment 78 Harald Reindl 2014-09-13 08:26:00 UTC

i doubt that git cherry picking really works for such a large project, and i doubt that the cherry picks (even if they match the source) always have the needed changes in other code parts

in the meantime there may be border cases fixed somewhere else, unintentionally or not marked as "cherry", and we have a totally different behavior between 208 and upstream in exactly those border cases, rendering a specific patch useless

a responsible software project has minor releases - period

Comment 79 Harald Reindl 2014-10-06 19:41:38 UTC

are you developers aware that this crap affects any machine with multiple users, whether linger or not? it is a bad joke that after an upgrade to F20, machines that have served a ton of sftp-chroot accounts via nss-mysql for over 4 years now need a hard reset, or you have to wait for minutes until the reboot happens

Comment 80 Sergio Basto 2014-10-06 19:52:11 UTC

Hi, Harald Reindl

I think 2 bugs have already been fixed in this bug report:

1 - sssd. ( Start before systemd-user-sessions.service and nss-user-lookup.target ) 
2 - Temporary work around for slow shutdown due to unterminated user sessions.

so we have a 3rd bug (at least) to fix, which is users with linger

loginctl disable-linger $USER 
and
loginctl disable-linger root 

seems to work around this bug
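
To see which accounts have lingering enabled before running the commands above, a minimal Python sketch follows. It assumes that systemd-logind records lingering as one marker file per user under /var/lib/systemd/linger; the function name and the directory parameter are illustrative only and are not part of any systemd interface.

```python
import os

# Assumption: logind keeps one empty marker file per lingering user here.
LINGER_DIR = "/var/lib/systemd/linger"


def lingering_users(linger_dir=LINGER_DIR):
    """Return the sorted user names that have a linger marker file."""
    try:
        return sorted(os.listdir(linger_dir))
    except FileNotFoundError:
        # A missing directory is treated as "nobody has lingering enabled".
        return []


if __name__ == "__main__":
    for user in lingering_users():
        # Print the command from the comment above instead of running it.
        print(f"loginctl disable-linger {user}")
```

The script only prints the loginctl commands, so the list can be reviewed before anything is changed.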


So I think it is time to open a new bug ... 

and wait 2 minutes to shutdown is not the end of the world .

Comment 81 Harald Reindl 2014-10-06 19:56:16 UTC

if you had read my comment you would see that "disable-linger" is not a workaround (well, i was also the one who found out that disable-linger makes things better in some cases)

there is an underlying design bug which was not there in F19

> and wait 2 minutes to shutdown is not the end of the world

bullshit if you have 80 sftp-user sessions waiting two minutes in production, and that all worked fine long before systemd was part of the game, and nobody needed these "user-session" processes at all - it's the result of *bloatware* instead of keeping things simple

Comment 82 Tom Horsley 2014-10-06 20:54:56 UTC

Its not 2 minutes. It is at least 5 minutes.

I can get rid of the "user session" nonsense by editing all the pam config files to remove the systemd user session entries, but unfortunately there is apparently more to user sessions because other things stop working when I do that (like no more access to sound devices).

Perhaps the combination of eradicating user sessions and re-enabling ConsoleKit would work?

Comment 83 Harald Reindl 2014-10-06 21:01:00 UTC

> Perhaps the combination of eradicating user sessions 
> and re-enabling ConsoleKit would work?

perhaps stopping the development of new features and doing some QA upstream may help

* that problem was not there in F19
* all the user session code did not exist over years
* systemd was fine in F17/F18 and mostly in F19
* it gets worse with each fedora release

all that user session stuff and the log flooding did not exist all the years and was not missed, fine if someone has a use case for that stuff - but don't break things left and right for some features not missed over decades

and no - that is not being unfriendly - frankly they can put whatever they want into systemd as long as it does not break my long-running installations, doesn't bloat the core OS and doesn't introduce unneeded and potentially insecure code - having such troubles over many months tells me that there are also enough questionable things security-wise that nobody discovered or, even worse, discovered but not published

having such bugs over months smells like loss of control

Comment 84 Harald Reindl 2014-10-22 14:57:17 UTC

there are more rough edges in that context

why does "systemd --user" need to call "/usr/bin/kill" in reaction to the kill command itself, and why does it inherit the chroot of the sshd process?

that below is a system-session triggered by SSHD with openssh-chroot in combination with pam-mysql - obviously the user-manager inherits the sshd-chroot and so can't call the kill command

[root@sftp:~]$ ps aux | grep 29657
**** 29657  0.0  0.4  46580  4288 ?        Ss   Okt17   0:00 /usr/lib/systemd/systemd --user
root     31532  0.0  0.2 112684  2160 pts/0    S<+  16:51   0:00 /usr/bin/grep --color 29657

[root@sftp:~]$ kill 29657

[root@sftp:~]$ cat messages
Oct 22 16:51:48 sftp systemd: Failed at step CHDIR spawning /usr/bin/kill: No such file or directory
Oct 22 16:51:48 sftp systemd: systemd-exit.service: main process exited, code=exited, status=200/CHDIR
Oct 22 16:51:48 sftp systemd: Failed to start Exit the Session.
Oct 22 16:51:48 sftp systemd: Dependency failed for Exit the Session.
Oct 22 16:51:48 sftp systemd:
Oct 22 16:51:48 sftp systemd: Unit systemd-exit.service entered failed state.
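
For anyone collecting these failures across many hosts, a minimal Python sketch of how the log lines above could be summarized. The regular expressions and the function name are illustrative assumptions based only on the lines quoted in this comment, not a documented systemd log format.

```python
import re

# Matches the "Failed at step X spawning Y: reason" line quoted above.
SPAWN_FAIL = re.compile(r"Failed at step (\w+) spawning (\S+): (.+)$")
# Matches the "status=200/CHDIR" part of the "main process exited" line.
EXIT_STATUS = re.compile(r"status=(\d+)/(\w+)")


def summarize(lines):
    """Return (step, binary, reason, exit_code) found in the log lines."""
    step = binary = reason = code = None
    for line in lines:
        m = SPAWN_FAIL.search(line)
        if m:
            step, binary, reason = m.groups()
        m = EXIT_STATUS.search(line)
        if m:
            code = int(m.group(1))
    return step, binary, reason, code


LOG = [
    "Oct 22 16:51:48 sftp systemd: Failed at step CHDIR spawning /usr/bin/kill: No such file or directory",
    "Oct 22 16:51:48 sftp systemd: systemd-exit.service: main process exited, code=exited, status=200/CHDIR",
    "Oct 22 16:51:48 sftp systemd: Unit systemd-exit.service entered failed state.",
]

print(summarize(LOG))
```

The CHDIR step together with "No such file or directory" is consistent with the description above: the user manager runs inside the sftp chroot, where the path it needs does not exist.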

Comment 85 Sergio Basto 2014-10-22 15:43:00 UTC

(In reply to Tom Horsley from comment #82)
> Its not 2 minutes. It is at least 5 minutes.
> I can get rid of the "user session" nonsense by editing all the pam config
> files to remove the systemd user session entries, but unfortunately there is
> apparently more to user sessions because other things stop working when I do
> that (like no more access to sound devices).
> 
> Perhaps the combination of eradicating user sessions and re-enabling
> ConsoleKit would work?

I think I am reading this for the first time, and this is another issue; I just have a delay of at most 60 seconds on shutdown.
Like Harald Reindl, you seem to be in a context of several users, so please open another bug report and provide a way to reproduce the problem, if you can.

Thanks,

Comment 86 Alves 2014-11-11 01:05:43 UTC

This bug is still present in Fedora 21.
This is a brand new, nothing installed, LXC container. Originally from Fedora 20, I did yum-upgrade it. When I do poweroff, it takes at least 1.5 minutes.

[  OK  ] Stopped LSB: Bring up/down networking.
[    **] A stop job is running for User Manager for UID 0 (29s / 1min 30s)
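
The status line above already says how long systemd will wait. A minimal Python sketch of how it could be parsed into seconds follows; the regular expressions and function names are illustrative assumptions derived from this one console line.

```python
import re

# Seconds per unit for the time spans seen in the console line above.
UNIT_SECONDS = {"h": 3600, "min": 60, "s": 1}

STOP_JOB = re.compile(r"A stop job is running for (.+?) \((.+?) / (.+?)\)")


def span_to_seconds(text):
    """Convert a span such as '1min 30s' into seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)\s*(h|min|s)\b", text):
        total += int(value) * UNIT_SECONDS[unit]
    return total


def parse_stop_job(line):
    """Return (unit description, elapsed seconds, timeout seconds) or None."""
    m = STOP_JOB.search(line)
    if not m:
        return None
    what, elapsed, limit = m.groups()
    return what, span_to_seconds(elapsed), span_to_seconds(limit)


LINE = "[    **] A stop job is running for User Manager for UID 0 (29s / 1min 30s)"
print(parse_stop_job(LINE))
```

For this line the job in question is the user manager for UID 0, and the limit of 1min 30s matches the roughly 1.5 minute poweroff delay reported here.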

Does anybody have a workaround? This issue shows that it was never properly fixed in Fedora 19, 20 and now 21. It also means that Centos 7 suffers from the same defect. Systemd is not completely understood at this point, and people and companies are migrating to competing distributions which have not incorporated systemd.

Comment 87 Alves 2014-11-11 01:19:20 UTC

Also, this "workaround"
loginctl disable-linger root 
does not help a bit.

Comment 88 Tom Horsley 2014-11-11 02:16:50 UTC

I'm still using my script from comment 23 to shutdown, which seems to work most of the time (though I've tweaked it some since the version attached to this bug). I don't think centos 7 has this problem. I believe they cutoff systemd for that release before the user manager was added (which is when all this nonsense started).

Comment 89 Alves 2014-11-11 03:34:02 UTC

do you mind posting the script from comment #23? It does not work for me at all.
The script does a "print reboot" at its last line, but the reboot does not follow.
This is an LXC container so maybe something is missing.

Comment 90 Adam Williamson 2014-11-11 07:27:04 UTC

Alves: well, this is now a pretty big and messy bug report, it's difficult to talk about 'this bug'. There are probably three or four different bugs if you look at all the comments and symptoms and workarounds, and it's quite hard for a developer to make a dent in.

It would help if someone could do some really specific diagnosis on at least one of the bugs, but no-one has, yet.

Comment 91 antonio montagnani 2014-11-11 07:53:14 UTC

Adam: I am the original reporter (last April). I think that nobody worked seriously on this bug (as a test of maintainer quality, we should also report the time between a bug's report and its solution), so there is no driving force to give users any light on how to debug it..... waiting for any useful suggestion. In any case the bug now shows up less frequently, which is an additional problem

Comment 92 Tom Horsley 2014-11-11 11:11:23 UTC

(In reply to Alves from comment #89)
> do you mind posting the script from comment #23? It does not work for me at
> all.
> The script does a "print reboot" at its last line, but the reboot does not
> follow.
> This is an LXC container so maybe something is missing.

It prints a bunch of commands which need to be piped into a root shell, which is why the comment talks about setting up an alias to run the script and pipe to the shell.

Comment 93 Michael Catanzaro 2014-11-11 16:38:51 UTC

(In reply to Adam Williamson (Red Hat) from comment #90)
> Alves: well, this is now a pretty big and messy bug report, it's difficult
> to talk about 'this bug'. There are probably three or four different bugs if
> you look at all the comments and symptoms and workarounds, and it's quite
> hard for a developer to make a dent in.

I'll add one more: I've been affected by "this bug" for a while, and until very recently I had only one user account on my system.

Comment 94 Alves 2014-11-11 17:01:58 UTC

I had to abandon Fedora 21. It has a second bug, maybe from the LXC template. SSH does hang after "entering interactive", that is, after authentication. I could not make it work. Again, I found hundreds of people with similar issues. Most of them suggested to set "UseDNS no", but that did not help, and my name resolution was fine, so that is not the issue. This is still very experimental software, and in reality it seems that Red Hat could care less about Fedora. Centos 7 also has some bugs that have not been fixed. Red Hat is a shame.

Comment 95 Sergio Basto 2014-11-11 20:10:12 UTC

(In reply to Tom Horsley from comment #88)
> I believe they cutoff
> systemd for that release before the user manager was added (which is when
> all this nonsense started).

we should see if we can cutoff user manager of systemd , or something to fix this problem .

Comment 96 Harald Reindl 2014-11-11 20:15:19 UTC

agreed - "user manager" doesn't provide anything useful in most setups except the problems mentioned in that bug report and additional processes for every crond or ssh user

that can be a lot of user sessions in the case of nss-mysql for chrooted mass-user setups replacing FTP, and *exactly* those are problematic

that below just stinks - it's a user-manager process related to an ordinary sftp session with the user provided by nss-mysql - not a single problem over years until F20

systemd as core-component needs *minor releases* - period
_______________________________________________

[root@sftp:~]$ kill 29657

[root@sftp:~]$ cat messages
Oct 22 16:51:48 sftp systemd: Failed at step CHDIR spawning /usr/bin/kill: No such file or directory
Oct 22 16:51:48 sftp systemd: systemd-exit.service: main process exited, code=exited, status=200/CHDIR
Oct 22 16:51:48 sftp systemd: Failed to start Exit the Session.
Oct 22 16:51:48 sftp systemd: Dependency failed for Exit the Session.
Oct 22 16:51:48 sftp systemd:
Oct 22 16:51:48 sftp systemd: Unit systemd-exit.service entered failed state.

Comment 97 Tom Horsley 2014-11-11 22:45:28 UTC

(In reply to Sergio Monteiro Basto from comment #95)

> we should see if we can cutoff user manager of systemd , or something to fix
> this problem .

I did manage to eradicate user sessions by editing all the pam config files that include a pam_systemd.so file.

Unfortunately, all that does is disclose that in addition to user sessions, pam has apparently incorporated all the ConsoleKit functionality, so things like sound stop working when you disable pam user sessions.

I never tried re-enabling ConsoleKit after disabling systemd user sessions, that might work if you want to experiment, but I'm sure it will be futile in the end as the systemd fungus grows over yet more things that can't be turned back on.

Comment 98 Harald Reindl 2014-11-11 22:48:07 UTC

frankly systemd and Fedora should offer an option "don't use user sessions" to enable the same behavior as in F19 and all the years before

nobody can tell me, after 8 years of using only Linux in any context without "systemd user sessions" and missing nothing, that this part is now mandatory for really good reasons

Comment 99 Ahmad Samir 2014-11-12 17:28:31 UTC

(In reply to Adam Williamson (Red Hat) from comment #90)
> Alves: well, this is now a pretty big and messy bug report, it's difficult
> to talk about 'this bug'. There are probably three or four different bugs if
> you look at all the comments and symptoms and workarounds, and it's quite
> hard for a developer to make a dent in.
> 
> It would help if someone could do some really specific diagnosis on at least
> one of the bugs, but no-one has, yet.

I could be wrong but AFAIU, it's this issue: http://lists.freedesktop.org/archives/systemd-devel/2014-January/016519.html

Comment 100 Sergio Basto 2014-11-12 17:54:42 UTC

(In reply to Ahmad Samir from comment #99)

all that patch are in Fedora since Fev, systemd is pretty close to upstream , the bug was not fixed by that . also mention this in this ticket 


(In reply to Adam Williamson (Red Hat) from comment #90)
> Alves: well, this is now a pretty big and messy bug report, it's difficult
> to talk about 'this bug'. There are probably three or four different bugs if
> you look at all the comments and symptoms and workarounds, and it's quite
> hard for a developer to make a dent in.

Adam , 
We already fixed 2 or 3 bugs in this report, but the original bug in comment #0 is still present. It is not a big deal for me, because we just have to wait 30 more seconds to shut down, and I don't shut down much (I use suspend).

But I think we should nominate this bug as a blocker bug for the F21 release; it is the only way that I see to fix this.

This already affects RHEL 7 and Centos 7 as reported here, so it seems pretty serious to me and also affects our reputation.

Tks,

Comment 101 procaccia 2014-11-12 21:09:47 UTC

as reported in comment 51, we have +100 stations running F20 in dual boot with Windows7. Our students (+2000 engineering school users) reboot from one system to the other several times a day. So this bug is very painful. Users end up hitting the power switch to reboot from F20 to Windows. Sometimes that generates an fsck at reboot or, even worse, hardware damage.
We hope we'll get a fix for this bug or, if not, at least a workaround.
thanks

Comment 102 Sergio Basto 2014-11-12 23:03:59 UTC

(In reply to Tom Horsley from comment #97)
> (In reply to Sergio Monteiro Basto from comment #95)
> 
> > we should see if we can cutoff user manager of systemd , or something to fix
> > this problem .
> 
> I did manage to eradicate user sessions by editing all the pam config files
> that include a pam_systemd.so file.
> 
> Unfortunately, all that does is disclose that in addition to user sessions,
> pam has apparently incorporated all the ConsoleKit functionality, so things
> like sound stop working when you disable pam user sessions.
> 
> I never tried re-enabling ConsoleKit after disabling systemd user sessions,
> that might work if you want to experiment, but I'm sure it will be futile in
> the end as the systemd fungus grows over yet more things that can't be
> turned back on.

and shutdown works ?

Comment 103 Tom Horsley 2014-11-12 23:19:09 UTC

Yep, shutdown always worked fine with the user manager stuff disabled. I also never saw this problem until the user manager first showed up in fedora, so it is definitely related to the user sessions. The script I run to shutdown my system kills off all the user manager tasks (among others) before doing the shutdown, and I never have problems with hangs when I use that script.
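
The attached script is not reproduced here, so the following is only a minimal Python sketch of the same idea: find the `systemd --user` processes and print the commands for a root shell rather than executing them. The function names, the choice of `kill -TERM`, and the final `reboot` line are assumptions for illustration; the sample `ps aux` output is the one from comment 84.

```python
import os


def user_manager_pids(ps_output):
    """Return PIDs of `systemd --user` processes found in `ps aux` text."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # 11 columns, the last is the command line
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header or malformed line
        argv = fields[10].split()
        if os.path.basename(argv[0]) == "systemd" and "--user" in argv[1:]:
            pids.append(int(fields[1]))
    return pids


def shutdown_commands(ps_output):
    """Build the command list: stop every user manager, then reboot."""
    commands = [f"kill -TERM {pid}" for pid in user_manager_pids(ps_output)]
    commands.append("reboot")
    return commands


SAMPLE = """\
**** 29657  0.0  0.4  46580  4288 ?        Ss   Okt17   0:00 /usr/lib/systemd/systemd --user
root     31532  0.0  0.2 112684  2160 pts/0    S<+  16:51   0:00 /usr/bin/grep --color 29657
"""

print("\n".join(shutdown_commands(SAMPLE)))
```

Printing the commands keeps the script harmless when run by accident; piping its output into a root shell is what actually performs the shutdown.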

Comment 104 Michael Catanzaro 2014-11-13 01:15:33 UTC

(In reply to Sergio Monteiro Basto from comment #102)
> and shutdown works ?

Not for me, but I have no such fancy scripts.

Comment 105 Ahmad Samir 2014-11-13 05:06:57 UTC

(In reply to Sergio Monteiro Basto from comment #100)
> (In reply to Ahmad Samir from comment #99)
> 
> all that patch are in Fedora since Fev, systemd is pretty close to upstream
> , the bug was not fixed by that . also mention this in this ticket 
> 

That patch is not in upstream git, nor do I find it in Fedora git, I grep'ed for WaitForMainPIDOnStop in an F20 systemd git checkout, no results. So if you could point me to the upstream commit, that'd be great. (Still, I am not saying I am 100% sure it's the underlaying issue behind this bug report).
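
The grep check described above can be repeated on any checkout. A minimal Python sketch, assuming a plain substring search is enough; the function name is illustrative and the demo tree is a throwaway temporary directory, not a real systemd checkout.

```python
import os
import tempfile


def find_identifier(root, needle):
    """Return sorted (relative path, line number) pairs where needle occurs."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # skip git metadata, search only the tree
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as fh:
                    for lineno, line in enumerate(fh, 1):
                        if needle in line:
                            hits.append((os.path.relpath(path, root), lineno))
            except OSError:
                continue  # unreadable file, ignore it
    return sorted(hits)


# Demo on a tiny temporary tree that does not contain the property.
with tempfile.TemporaryDirectory() as demo:
    os.makedirs(os.path.join(demo, "src", "core"))
    with open(os.path.join(demo, "src", "core", "dbus-service.c"), "w") as fh:
        fh.write("/* no such property here */\n")
    print(find_identifier(demo, "WaitForMainPIDOnStop"))
```

An empty result on a real checkout would confirm the observation above that the mailing-list patch was never merged.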

Comment 106 Sergio Basto 2014-11-13 08:20:16 UTC

(In reply to Ahmad Samir from comment #105)
> (In reply to Sergio Monteiro Basto from comment #100)
> > (In reply to Ahmad Samir from comment #99)
> > 
> > all that patch are in Fedora since Fev, systemd is pretty close to upstream
> > , the bug was not fixed by that . also mention this in this ticket 
> > 
> 
> That patch is not in upstream git, nor do I find it in Fedora git, I grep'ed
> for WaitForMainPIDOnStop in an F20 systemd git checkout, no results. So if
> you could point me to the upstream commit, that'd be great. (Still, I am not
> saying I am 100% sure it's the underlaying issue behind this bug report).

you are right about some facts: the patch on the ML was not applied to systemd git; a different patch was applied, see comment #77

http://cgit.freedesktop.org/systemd/systemd-stable/commit/?h=v208-stable&id=a2ca1771f9f8f2b8b5f08ab181e5cd9e6935f695

the patch that was posted was not accepted or applied by the systemd team, and I have not tested it.

Comment 107 Ahmad Samir 2014-11-13 10:35:23 UTC

I was talking about this patch http://lists.freedesktop.org/archives/systemd-devel/2014-January/016571.html

The workaround/patch in c#77 didn't work, or only mitigates the issue but doesn't fix it.

Comment 108 Sergio Basto 2014-11-13 13:28:42 UTC

(In reply to Ahmad Samir from comment #107)
> I was talking about this patch
> http://lists.freedesktop.org/archives/systemd-devel/2014-January/016571.html
> 
> The workaround/patch in c#77 didn't work, or only mitigates the issue but
> doesn't fix it.

yes, we need to check if all patch works , sorry for my mistake .

Comment 109 Harald Reindl 2014-11-13 13:51:15 UTC

> The script I run to shutdown my system kills 
> off all the user manager tasks

impossible to work on the sftp-server here because in that context systemd is *buggy like hell* and tries to spawn /usr/bin/kill while the sftp session is using openssh's sftp-native-chroot and so there is no /usr/bin

WHY does systemd need to spawn kill in reaction to a SIGTERM sent to the PID of "/usr/lib/systemd/systemd --user"

[root@sftp:~]$ kill 29657

[root@sftp:~]$ cat messages
Oct 22 16:51:48 sftp systemd: Failed at step CHDIR spawning /usr/bin/kill: No such file or directory
Oct 22 16:51:48 sftp systemd: systemd-exit.service: main process exited, code=exited, status=200/CHDIR
Oct 22 16:51:48 sftp systemd: Failed to start Exit the Session.
Oct 22 16:51:48 sftp systemd: Dependency failed for Exit the Session.
Oct 22 16:51:48 sftp systemd:
Oct 22 16:51:48 sftp systemd: Unit systemd-exit.service entered failed state.

Comment 110 Tom Horsley 2014-11-13 15:03:09 UTC

Created attachment 957196 [details]
updated reboot script

With all the activity here, I figured I should update the reboot script I use with the latest version. This one also solves a completely separate problem with NFS timeouts causing insanely slow shutdown by doing umount -l on any system that seems to be mounted, but which I can't currently ping.
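
The attachment itself is not shown in this thread, so here is only a minimal Python sketch of the NFS part of that idea: read mount entries in /proc/mounts format, and print a lazy unmount command for every NFS mount whose server fails a reachability check. The function name, the injected `is_reachable` predicate, and the host names in the sample are hypothetical.

```python
def stale_nfs_umounts(mounts_text, is_reachable):
    """Return `umount -l` commands for NFS mounts with unreachable servers."""
    commands = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a mount entry
        device, mountpoint, fstype = fields[:3]
        if fstype not in ("nfs", "nfs4") or ":" not in device:
            continue  # only server:/path style NFS entries are considered
        host = device.split(":", 1)[0]
        if not is_reachable(host):
            commands.append(f"umount -l {mountpoint}")
    return commands


# Hypothetical sample in /proc/mounts format; filer2 is pretended to be down.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime 0 0
filer1:/export/home /home nfs4 rw,relatime 0 0
filer2:/export/data /data nfs rw,relatime 0 0
"""

print(stale_nfs_umounts(SAMPLE, lambda host: host != "filer2"))
```

In real use the predicate would wrap an actual ping of the server; it is passed in as a parameter here so the parsing can be tried without touching the network. Mount points containing escaped spaces and bracketed IPv6 server addresses are not handled by this sketch.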

Comment 111 Sergio Basto 2014-11-20 01:28:10 UTC

Created attachment 959207 [details]
0900-allow_stop_jobs_to_be_killed_during_shutdown.patch

Comment 112 Sergio Basto 2014-11-20 04:25:37 UTC

(In reply to Sergio Monteiro Basto from comment #111)
> Created attachment 959207 [details]
> 0900-allow_stop_jobs_to_be_killed_during_shutdown.patch

I tried this patch on top of systemd-208-28.fc20.x86_64 and it doesn't fix it: shutdown stopped on "a stop job is running ... of user" for 30 / 60 seconds. The funny thing was that the systemd stop messages changed; it entered a loop of 3 messages:
(1/3) "a stop job is running ... of user 0"
(2/3) "a stop job is running ... of user 500"
(3/3) "a stop job is running ... of user sergio"

I did a build here:
https://copr.fedoraproject.org/coprs/sergiomb/systemd-allow_stop_jobs_to_be_killed_during_shutdown/

if you want to try, you may need i386 and x86_64 packages so you may download from here : http://copr-be.cloud.fedoraproject.org/results/sergiomb/systemd-allow_stop_jobs_to_be_killed_during_shutdown/fedora-20-i386/systemd-208-29.sb.fc20/

and do a local install with:
yum install systemd-libs-208-29.sb.fc20.i686.rpm  libgudev1-208-29.sb.fc20.i686.rpm --nogpg

Comment 113 Tethys 2015-01-05 23:41:57 UTC

Still a problem with Fedora 21 :-(

macbeth:~% rpm -q systemd
systemd-216-12.fc21.x86_64

Comment 114 Harald Reindl 2015-01-05 23:49:49 UTC

it's a shame - since https://bugzilla.redhat.com/show_bug.cgi?id=1088619#c109 affects every reboot, i set an alias in the meantime - what the hell...

alias reboot='/usr/bin/systemctl stop crond.service; /usr/bin/sleep 3; /usr/bin/systemctl stop mysqld.service; /usr/bin/touch /forcefsck; /usr/bin/sync; /usr/sbin/reboot -f'

Comment 115 Sergio Basto 2015-02-25 16:30:19 UTC

I don't see this bug anymore. I got it on my laptop, but some days after my last comment (2014-11-20) this bug was fixed for me.
The strange thing is that it was not an update of systemd that fixed it.
I also don't have this problem with F21.
May we close this bug?

Comment 116 Harald Reindl 2015-02-25 16:34:06 UTC

https://bugzilla.redhat.com/show_bug.cgi?id=1088619#c109
https://bugzilla.redhat.com/show_bug.cgi?id=1185278

Comment 117 Fedora End Of Life 2015-05-29 11:35:48 UTC

This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora  'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 20 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version prior to this bug being closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 118 Fedora End Of Life 2015-06-29 20:08:38 UTC

Fedora 20 changed to end-of-life (EOL) status on 2015-06-23. Fedora 20 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 119 John 2015-11-13 18:03:21 UTC

This cluster@#^@! is back in Fedora 23, and it is impossible to fix.

I've disabled a million useless services,

I've masked all the crud services that systemd thinks are too important to be disabled.

I've removed pam_systemd.so from every file under /etc/pam.d

I've removed /usr/lib/systemd/system/user@.service
I've removed /usr/lib/systemd/system/user.slice

All this has done is push the problem further back, but it's always there. And somehow there is still garbage in the logs about user.slice, even though i've removed that file. So anyway, now the delay is at kdm.service:

Nov 14 04:00:13 Il-Duce systemd[1]: Removed slice user.slice.
Nov 14 04:00:13 Il-Duce systemd[1]: Stopping user.slice.
Nov 14 04:01:43 Il-Duce systemd[1]: kdm.service: State 'stop-final-sigterm' timed out. Killing.
Nov 14 04:01:43 Il-Duce systemd[1]: Stopped The KDE login manager.
Nov 14 04:01:43 Il-Duce audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=kdm comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
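
The length of the stall can be read straight off the timestamps in the excerpt above. A minimal Python sketch, assuming syslog-style timestamps without a year (2015 is supplied from the comment date) and an English month abbreviation; the function names and marker strings are illustrative.

```python
from datetime import datetime


def stamp(line, year=2015):
    """Parse the leading 'Mon DD HH:MM:SS' timestamp of a journal line."""
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def stop_delay(lines, start_marker, end_marker, year=2015):
    """Seconds between the first start_marker line and the next end_marker line."""
    start = None
    for line in lines:
        if start is None:
            if start_marker in line:
                start = stamp(line, year)
        elif end_marker in line:
            return (stamp(line, year) - start).total_seconds()
    return None  # one of the markers was not found


JOURNAL = [
    "Nov 14 04:00:13 Il-Duce systemd[1]: Removed slice user.slice.",
    "Nov 14 04:00:13 Il-Duce systemd[1]: Stopping user.slice.",
    "Nov 14 04:01:43 Il-Duce systemd[1]: kdm.service: State 'stop-final-sigterm' timed out. Killing.",
    "Nov 14 04:01:43 Il-Duce systemd[1]: Stopped The KDE login manager.",
]

print(stop_delay(JOURNAL, "Stopping user.slice", "timed out. Killing"))
```

For the excerpt above the gap is 90 seconds, which matches the minute and a half complained about in this comment and points at the stop timeout of kdm.service rather than at user.slice itself.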

Systemd was supposed to provide control over what is going on, instead, it sits there, like an unmovable, obstinate, silent, inscrutable #!@$!@%#.

Every time i shutdown, it's the same, I can say goodbye to a minute and a half of my life, thanks to systemd. And for what? I'm apparently waiting for systemd which is waiting for something that has already terminated, or never started in the first place, and systemd CANNOT EVEN TELL ME WHAT IT IS REALLY, ACTUALLY WAITING FOR.

This mess is completely unacceptable.

Comment 120 John 2015-11-13 18:11:09 UTC

I notice Poettering isn't even CC'ed in on this bug. Obviously he doesn't give a #@&*$^.

Comment 121 John 2015-11-13 18:13:32 UTC

If this is how system administrators are supposed to be managing their production servers then god #&@#^@ing help us.

Comment 122 Adam Williamson 2015-11-13 18:22:20 UTC

systemd-maint is an alias read by all the systemd maintainers. Please stop swearing and yelling, this is a bug tracker, and it doesn't help anyone solve anything. The sole pertinent information in the last three comments is "I have some kind of problem". That's all we can derive from it. You didn't even specify what 'this cluster@#^@!' *is*, in your case; IIRC at least three apparently different bugs have been reported here.

Comment 123 John 2015-11-13 18:29:49 UTC

If other people are not seeing this in fedora 23, then i can only assume the problem is caused by something leftover from upgrading through older fedoras.

So where is the systemd infrastructure to actually identify the root cause of the problem? There is nothing. All we have is a multitude of spaghettified targets multiplying into the hundreds...

ll /usr/lib/systemd/system/ | wc -l
422

How is this manageable in any real way?

Comment 124 Adam Williamson 2015-11-13 18:37:10 UTC

I have no idea if other people are seeing 'this' when you still haven't explained what 'this' is, exactly, for you. Even if I assume you're seeing some kind of 'stop job' message, it doesn't tell us much; that's a fairly generic message, the context you're seeing it in and the exact message you're seeing are important. At minimum we'd need to see your journal ideally with a *non* messed-with configuration.

Comment 125 John 2015-11-13 18:45:48 UTC

(In reply to awilliam from comment #122)
> You didn't
> even specify what 'this cluster@#^@!' *is*, in your case; IIRC at least
> three apparently different bugs have been reported here.

I've stated very clearly that the problem is a stop job that holds up boot for 1 minute 30 secs. As I've disabled more and more of the useless systemd crud running on my system, it's been pushed back through session-1.scope, user-@1000, and now it's at the kdm.service.

I've provided the journalctl for the relevant period, showing the failure at kdm.service.

I've wasted nearly 8 hours of my life on this already, today, and you tell me I haven't even specified what the problem is? How specific do you expect me to be? Do you expect me to solve this bug which has eluded systemd "developers" for more than a year and a half?

There have been claims that several of the bugs with similar symptoms have been fixed, so, it would probably be safe to assume that my problem is the one that HASN'T been fixed.

If you had any knowledge of this bug, you would realise that 99% of the problem is that NOONE KNOWS WHAT THE SPECIFIC PROBLEM IS, because the mess that is systemd hides the root cause completely.

If you cannot derive any more information from the 120+ comments in this bug, than: stupid users "have some kind of problem", then that's your problem, not mine.

I would appreciate it if you would keep silent if you don't have anything productive to contribute towards an actual resolution of this bug.

Comment 126 John 2015-11-13 18:49:51 UTC

correction to my comment above - should have said stop job that holds up shutdown.

Comment 127 John 2015-11-13 18:51:15 UTC

(In reply to awilliam from comment #124)
> I have no idea...

perhaps you should've just stopped there.

Comment 128 Tom Horsley 2015-11-13 18:51:59 UTC

Try this out. My fedora systems have been rebooting instantly using the scheme described here (need to build a few programs and setup some aliases, but you save enough time the first time you reboot to make up for it :-).

http://tomhorsley.com/game/punch.html

Comment 129 Adam Williamson 2015-11-13 19:13:15 UTC

"I've stated very clearly that the problem is a stop job that holds up boot for 1 minute 30 secs."

No you didn't. Your comment says nothing about any stop job.

The word 'stop' does not appear anywhere in https://bugzilla.redhat.com/show_bug.cgi?id=1088619#c119 , nor does anything about 1 minute 30 seconds.

"As I've disabled more and more of the useless systemd crud running on my system, it's been pushed back through session-1.scope, user-@1000, and now it's at the kdm.service."

See, when you yell and scream and fulminate and foam about 'useless systemd crud' and swear and pointlessly bring Lennart personally into your comments, it gives me no confidence that any of your 'debugging' steps was actually doing anything useful as opposed to confusing the issue. So it really doesn't seem like a good idea for me to try and debug this based on the tiny scrap of journal you posted in #c119 with no context nor any useful information on what you actually did to reach that point - no, "I've disabled a million useless services, I've masked all the crud services that systemd thinks are too important to be disabled." does not count as 'useful information'.

"I've provided the journalctl for the relevant period, showing the failure at kdm.service."

You provided a tiny extract with no context at all, *after* you admittedly performed all sorts of drastic surgery on your system but without telling us in any useful detail at all what you actually did. That just isn't any use at all for debugging purposes.

I would love to have 'something productive', but it is impossible for me or anyone else to make any kind of progress on figuring out what's actually happening in your case as long as you don't provide any useful information on it.

Comment 130 Adam Williamson 2015-11-13 19:15:46 UTC

Sorry, I meant to omit the paragraph "The word 'stop' does not appear anywhere in https://bugzilla.redhat.com/show_bug.cgi?id=1088619#c119 , nor does anything about 1 minute 30 seconds." but left it in by mistake. Your text does include the word 'stop', but it does not in fact say anything specifically about a stop job. And there's a passing reference to 'a minute and a half of my time' at the end, indeed, where it's easy to miss.

Comment 131 Tom Horsley 2015-11-13 19:16:24 UTC

The fact that my reboot script/program actually solves my problems leads me to this conclusion from experimental evidence:

1. systemd believes that "user daemons" will go away.

2. "user daemons" in fact, never go away. I have users defined as both local users and as NIS users. Any one of these who uses ssh to get into my machine leaves behind a systemd user daemon which never ever goes away.

3. systemd shutdown then waits for all users to be logged out by waiting for all the user daemons to go away. That never happens. The shutdown is delayed 5 minutes for systemd to timeout.

So here you have my theory about the actual source of the problem. Do I have proof? No. Do I have highly suspicious evidence? Yes. If I kill the user daemons first, the "stop job for user" shutdown delay doesn't happen.
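To make the "kill the user daemons first" step concrete, here is a minimal sketch. It is not the actual reboot script attached to this bug. It assumes the per-user manager shows up in the process list with a command line ending in `systemd --user`, and that the input is the text of something like `ps -eo pid,user,args`. The function name and the sample process list are made up for illustration.

```python
def find_user_managers(ps_lines):
    """Return (pid, user) pairs for processes that look like `systemd --user`.

    Assumption: each line has the form "PID USER COMMAND...", as printed by
    `ps -eo pid,user,args`. Header lines and malformed lines are skipped.
    """
    found = []
    for line in ps_lines:
        parts = line.split(None, 2)  # pid, user, full command line
        if len(parts) < 3:
            continue
        pid, user, cmd = parts
        argv = cmd.split()
        if not pid.isdigit() or not argv:
            continue
        # Match the per-user manager, not the system instance (PID 1).
        if argv[0].endswith("systemd") and "--user" in argv[1:]:
            found.append((int(pid), user))
    return found


# Hypothetical process list, for illustration only.
sample = [
    "  PID USER     COMMAND",
    "    1 root     /usr/lib/systemd/systemd --switched-root --system",
    " 1432 tom      /usr/lib/systemd/systemd --user",
    " 1440 tom      (sd-pam)",
    " 2210 nisuser  /usr/lib/systemd/systemd --user",
    " 2300 tom      sshd: tom@pts/0",
]
print(find_user_managers(sample))
```

The PIDs it returns are the ones a reboot wrapper would signal before asking systemd to shut down. Whether that is safe on a given system is a separate question.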

Comment 132 Adam Williamson 2015-11-13 19:18:53 UTC

Tom: what you have sounds very plausible for *your* case, sure. The point I'm trying to make is that we can't assume that anyone who has some kind of issue which can be described as 'it takes a long time to shut down and there's some text with the word stop in it' has the *same* issue as any other person in that situation. Even if we know some kind of 'stop job', specifically, is involved, that's a fairly generic concept in systemd; there could be all kinds of *different* scenarios where some 'stop job', somehow, gets stuck.

This is amply illustrated by the fact that *some* people who've posted here aren't seeing problems any more, and some people *are* still seeing problems. If there was only one issue, that obviously would not be the case.

Comment 133 Sergio Basto 2015-11-13 19:36:37 UTC

(In reply to John from comment #125) 
> I've provided the journalctl for the relevant period, showing the failure at
> kdm.service.

Don't use kdm.service. I believe that is a KDE problem and not a systemd problem; as I said in comment 115, the problem was solved without modifying or updating the systemd package.

You should understand that personal attacks and other comments like that don't help in solving the problem.

bye.

Comment 134 John 2015-11-13 19:37:50 UTC

You wouldn't believe it Tom, but, with the state my system is in now... even your prog/alias didn't do it... 

I've disabled pam_systemd.so in everything under /etc/pam.d, so I don't think I'm running any user junk now. Somehow, instead of fixing the problem, this has pushed it out to kdm, which runs as root and so won't be killed by your prog.

It's really late here now and this is now the 2nd night I've lost sleep over this. I'll carry on tomorrow. I've been avoiding systemd for years, because I knew it was going to be a mess like this. But this !@#! has woken the dragon.

Comment 135 John 2015-11-13 19:40:38 UTC

(In reply to awilliam from comment #130)
> Sorry, I meant to omit the paragraph "The word 'stop' does not appear
> anywhere in https://bugzilla.redhat.com/show_bug.cgi?id=1088619#c119 , nor
> does anything about 1 minute 30 seconds." but left it in by mistake. Your
> text does include the word 'stop', but it does not in fact say anything
> specifically about a stop job. And there's a passing reference to 'a minute
> and a half of my time' at the end, indeed, where it's easy to miss.

Why would I even be in here responding to this bug, if I did not have the problem being investigated in this bug?

I said the bug was back, thereby referring to this stop job issue, and I gave you the snippet of log which clearly shows a 1 minute 30 second stall, etc. It was pretty clear.

Comment 136 John 2015-11-13 19:46:32 UTC

(In reply to awilliam from comment #132)
> Tom: what you have sounds very plausible for *your* case, sure. The point
> I'm trying to make is that we can't assume that anyone who has some kind of
> issue which can be described as 'it takes a long time to shut down and
> there's some text with the word stop in it' has the *same* issue as any
> other person in that situation. Even if we know some kind of 'stop job',
> specifically, is involved, that's a fairly generic concept in systemd, there
> could be all kinds of *different* scenarios where some 'stop job', somehow,
> gets stuck.
> 
> This is amply illustrated by the fact that *some* people who've posted here
> aren't seeing problems any more, and some people *are* still seeing
> problems. If there was only one issue, that obviously would not be the case.

I think your comment here amply illustrates the real problem, which is that systemd doesn't provide enough information on what it is waiting for.

Even now, I've provided you with all the information systemd has provided me. It stops at that kdm.service. Kdm has exited, so what is it REALLY waiting for? How am I supposed to know, if systemd doesn't tell me?

Comment 137 John 2015-11-13 19:54:06 UTC

(In reply to Sergio Monteiro Basto from comment #133)
> don't use kdm.service , I believe that is a KDE problem and not a systemd
> problem, like I said in  Comment 115 , problem have been solved without
> modifying/update systemd package . 

Sergio, I don't think the problem is kdm.service. At first it looked like Plymouth, so I disabled and then masked that. Then it moved on to something else, and something else again: user, user.slice. Now that I've disabled all of those, it's at kdm.service. So I don't think kdm.service is the problem. The problem is systemd. But I'll switch from kdm to something else and see how I go... though I am not optimistic.

Comment 138 John 2015-11-13 20:19:15 UTC

Wow.

I switched to GDM, and didn't even get an X. So I booted into runlevel 1 and switched to LightDM. And that has done the trick. No delay on shutdown. Thanks for prodding me into that, Sergio.

If I now re-enable the user sessions and other stuff I've disabled, and the problem does not reappear, and it was a dodgy kdm.service this whole time - then I still hold systemd to blame, because the lengths I had to go to before kdm.service appeared as a potential issue were ridiculous. There was no sign it was related to KDM until I crippled practically every other service on my system.

Comment 139 John 2015-11-13 22:23:29 UTC

I've now re-enabled all the systemd user session stuff, and tested other display managers and the result is this: 

The xdm, lightdm, lxdm, sddm and kdm display managers all hang on shutdown - but only when my window manager is Enlightenment, which is what I've been using recently.

When I switch to another window manager, I've so far found that xdm and lightdm are both fine when used with the window manager from Xfce or from KDE (Plasma).

I haven't tested sddm, lxdm, or kdm with the window manager from Xfce or KDE, but I expect they'll be fine.


So that's great, isn't it. After all this, the problem was Enlightenment - and the fact that systemd makes it utterly impossible to know what's going on on a Linux system these days.

Of course no one involved with systemd will ever admit that systemd's lack of control and information was the major difficulty in diagnosing this - but it's a fact. And a disgrace.

Comment 140 John 2015-11-13 22:38:15 UTC

Systemd says "A stop job is running"

"A stop job is running"

I'll say that again.

"A stop job is running"

WTF use is that?

And journalctl points to kdm.service, user.slice, user-1000.slice, session-1.scope, and several things before that.

But the problem turns out to be enlightenment.

And yet I'm the one who gets lectured to, with: "You didn't even specify what 'this cluster@#^@!' *is*, in your case"

What a #@#!@@!&%@ joke.

The problem here is systemd.

Comment 141 John 2015-11-13 22:45:36 UTC

systemd - the system that saves you 20 seconds on boot, and then robs you of 18 hours trying to get a system to shut down cleanly when one god-forsaken application misbehaves.

Comment 142 Samuel Sieb 2015-11-13 23:19:50 UTC

Ok, we get the message that you're unhappy with systemd.  Go write a blog post or something, but stop ranting here.  It's not helping anyone and annoying the large list of people CCed to this *closed* bug.  If you still have a problem, file a new bug.

Comment 143 Krystian 2016-04-23 12:44:01 UTC

Well... I've encountered this bug on Fedora 23. It appeared from nowhere - just like that. It's my second attempt at F23 and, again, this bug happened.

Nothing was modified, nothing was done as root, etc. It just happened, and it has been haunting me since Fedora 20.

Comment 144 Adam Williamson 2016-04-27 19:33:05 UTC

There is no "this bug". There were probably a dozen different bugs in this report. Once again, the fact that two people both see a slow shutdown or reboot and a message with the word "stop" in it does not mean they are seeing the same bug. Please just report your bug separately and we will duplicate them where appropriate. Only follow up on someone else's report if they've described their case in detail and yours definitely sounds the same.

Comment 145 Krystian 2016-04-27 20:55:30 UTC

My bug is the same as the original reporter's. I've been fighting with "a stop job is running for Session 1 of user xyz" from Fedora 20 to F23.

You are right - I will report my bug separately or search for other active bugs about this.

Thank you for the response.

Comment 146 Adam Williamson 2016-04-27 22:01:08 UTC

Again, this is not enough to decide that bugs are the same. User sessions are huge things and any number of things could go wrong with stopping them. The 'user session' is basically in control of starting up and stopping your entire desktop.