Description of problem:
Over 1 minute hang on reboot/poweroff of a cleanly installed system.

Version-Release number of selected component (if applicable):
Fedora-20-Beta-TC6-x86_64-DVD.iso

How reproducible:
Uncertain. Installed systems seem to frequently, but not always, have this problem. This particular VM instance always reboots slowly.

Steps to Reproduce:
0. qemu/kvm, qcow2, minimal package set, installed to LVM Thin Provisioning using guided partitioning. Booted into the newly installed system.
1. reboot or poweroff

Actual results:
> 1 minute hang, then reboot.

Expected results:
Faster reboot.

Additional info:
Rebooting and looking at the journal, there are suspicious items in a row:

19:51:05 Started Restore /run/initramfs.
19:52:35 user@0.service stopping timed out. Killing.
19:52:35 Stopped User Manager for 0.
19:53:35 Unit user@0.service entered failed state.

sync && reboot -f reboots instantly.
Created attachment 816669 [details]
shutdown-log debug

Followed this: http://freedesktop.org/wiki/Software/systemd/Debugging/#index2h1a
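From memory, what that wiki page boils down to is roughly the following; treat the exact kernel parameters and script path as my assumptions rather than a quote of the page.

Add to the kernel command line:
  systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M

and drop an executable script (e.g. debug.sh) into /usr/lib/systemd/system-shutdown/:

#!/bin/sh
# Runs very late in shutdown; save the kernel ring buffer (which now also
# contains the systemd debug output) before the root fs goes read-only again.
mount -o remount,rw /
dmesg > /shutdown-log.txt
mount -o remount,ro /

After the hang, /shutdown-log.txt is what gets attached here.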
Created attachment 818926 [details]
virsh console output, systemd.log_level=debug etc.

I'm getting this from both Live Desktop Beta TC6 and RC2, but there it's "A stop job is running for User Manager for 1000" followed by "user@1000.service stopping timed out. Killing." Attaching virsh console log.
Same issue: a 2 minute hang when trying to shutdown/reboot a Fedora 20 Beta install on a KVM guest. It happens frequently, although not 100% of the time.

Snippet of my shutdown log (will attach full log):

[ 65.425420] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
[ 153.951503] systemd[1]: user stopping timed out. Killing.
[ 153.955456] systemd[1]: user changed stop-sigterm -> stop-sigkill
[ 153.955565] systemd[1]: Received SIGCHLD from PID 745 (systemd).
[ 153.955614] systemd[1]: Got SIGCHLD for process 745 (systemd)
[ 153.955661] systemd[1]: Child 745 died (code=killed, status=9/KILL)
[ 153.955664] systemd[1]: Child 745 belongs to user
[ 153.955673] systemd[1]: user: main process exited, code=killed, status=9/KILL
[ 153.955680] systemd[1]: user changed stop-sigkill -> failed
[ 153.955971] systemd[1]: Job user/stop finished, result=done
Created attachment 824690 [details]
hang on shutdown log
With systemd-208-8.fc20 and kernel-3.11.10-300.fc20 it has become more intermittent, but when it happens the hang is longer: over 1 minute, maybe 2 minutes.

Before the reboot, from that shell:

# loginctl user-status root
root (0)
       Since: Wed 2013-12-11 01:53:47 EST; 3min 33s ago
       State: active
    Sessions: 1
              ├─session-1.scope
              │ ├─1065 login -- root
              │ └─1183 -bash
              │   └─1216 loginctl user-status root
              │     └─1217 less
              └─user@0.service
                ├─1180 /usr/lib/systemd/systemd --user
                └─1185 (sd-pam)

# loginctl show-user 0
UID=0
GID=0
Name=root
Timestamp=Wed 2013-12-11 01:53:47 EST
TimestampMonotonic=21498632
RuntimePath=/run/user/0
Service=user@0.service
Slice=user-0.slice
State=active
Sessions=1
IdleHint=no
IdleSinceHint=0
IdleSinceHintMonotonic=0

# cat /run/systemd/users/0
# This is private data. Do not parse.
NAME=root
STATE=active
RUNTIME=/run/user/0
SERVICE=user@0.service
SLICE=user-0.slice
REALTIME=1386744827536712
MONOTONIC=21498632
SESSIONS=1
SEATS=seat0
ACTIVE_SESSIONS=1
ONLINE_SESSIONS=1
ACTIVE_SEATS=seat0
ONLINE_SEATS=seat0
Created attachment 837471 [details]
shutdown-log.txt

So this is happening a majority of the time with a clean installation of the Fedora 20 final RC1.1 DVD ISO plus a 3.13.x kernel:

3.13.0-0.rc3.git0.2.fc21.x86_64
systemd-208-9.fc20.x86_64

All I'm doing is booting to multi-user.target and ssh'ing into the machine once; after some time I reboot and get this hang fairly reliably.

This looks suspicious:

[ 777.611981] systemd[1]: Failed to destroy cgroup /user.slice/user-0.slice/session-1.scope: Device or resource busy

And there are big gaps between entries even though systemd is in debug mode:

[ 778.439765] systemd[1]: Got D-Bus request: org.freedesktop.DBus.Local.Disconnected() on /org/freedesktop/DBus/Local
[ 804.091526] perf samples too long (2527 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 853.846103] systemd[1]: Accepted connection on private bus.

And finally:

[ 867.678101] systemd[1]: user@0.service stopping timed out. Killing.
*** Bug 1031239 has been marked as a duplicate of this bug. ***
Possible duplicate bug 1044602.
Fixed upstream in http://cgit.freedesktop.org/systemd/systemd/commit/?id=63966da.
Just FYI: I created a new user and removed the initial temp user I had made during the install, and the system hung for a very, very long time when I rebooted, with the message:

A stop job is running for User Manager for 1000.

(where 1000 was the UID of the user I deleted while the system was up). This is on a newly installed Fedora 20 system (from the release DVD).
I have seen this error for uid 1001 (my uid) with a minimal install on a VirtualBox machine, so it's not related to X or any window manager. The only user-related processes running were:

root owned   sshd: user [priv]
user owned   /usr/lib/systemd/systemd --user
user owned   (sd-pam)
user owned   sshd: user@pts/0
user owned   -bash

I also saw it hang for about a minute on "User Manager for 0".
This is ridiculous. Apparently if I merely ssh into my F20 system using an NIS-defined user account, the next time I try to reboot (long after logging out of the ssh session) the system will hang on the stupid "stop job for UID" message, showing the UID of the NIS user I once logged in as. Is there some way to eradicate whatever "helpful" crap systemd is doing with these per-user things?
This bug is *really* a problem:

* if you are on a VT and reboot an F20 machine, your VT is dead
* if you reboot from Konsole, you need to close the window instead of just pressing cursor-up to re-connect after the few seconds the boot takes
* and the *most critical* issue: if you are working with scripts to control an infrastructure of 20 machines and want to reboot them all with a shell script containing reboot and sleep commands to reduce load on the host, you can simply forget it - "reboot-backup-machines.sh" is useless with F20 and RHEL7
I haven't tried this yet, but someone more desperate than me might want to give it a shot. This page: https://wiki.archlinux.org/index.php/Systemd/User implies that these systemd user units are started by PAM. Possibly, if the lines that say:

-session    optional    pam_systemd.so

in various files under /etc/pam.d were commented out, the stupid systemd user instances would never be created and systemd wouldn't wait forever for them to stop.
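A minimal sketch of that idea, purely as an untested assumption on my part (take backups first; the exact file names under /etc/pam.d vary between installs, and disabling pam_systemd may break anything that relies on the user session, e.g. XDG_RUNTIME_DIR):

# comment out every pam_systemd.so line in the PAM configuration, keeping .bak copies
for f in /etc/pam.d/*; do
    grep -q 'pam_systemd\.so' "$f" && sed -i.bak '/pam_systemd\.so/s/^/#/' "$f"
done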
You are completely wrong. Here you go: https://bugzilla.redhat.com/show_bug.cgi?id=1023788 - see especially https://bugzilla.redhat.com/show_bug.cgi?id=1023788#c2. sshd does not close the connection properly before the real shutdown happens, which results in the client trying to resume/keep-alive forever. The NetworkManager-centric approach is unacceptable; nobody on this planet needs NM on statically configured machines, and the world does not revolve around mobile devices.
systemd-208-11.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/systemd-208-11.fc20
systemd-208-11.fc20 has https://bugzilla.redhat.com/show_bug.cgi?id=1053315 as a regression and does not fix https://bugzilla.redhat.com/show_bug.cgi?id=1023788.
systemd-208-11.fc20 is unstable besides https://bugzilla.redhat.com/show_bug.cgi?id=1053315 - I guess on a desktop machine that could be a real problem; on a headless server it only floods the logs and log reports:

Jan 15 05:11:01 testserver systemd[1]: Unit systemd-logind.service entered failed state.
Jan 15 05:31:01 testserver systemd[1]: Unit systemd-logind.service entered failed state.
Jan 15 05:51:01 testserver systemd[1]: Unit systemd-logind.service entered failed state.
Jan 15 06:21:01 testserver systemd[1]: Unit systemd-logind.service entered failed state.
Jan 15 06:41:01 testserver systemd[1]: Unit systemd-logind.service entered failed state.
Jan 15 07:11:01 testserver systemd[1]: Unit systemd-logind.service entered failed state.
Jan 15 08:01:01 testserver systemd[1]: Unit systemd-logind.service entered failed state.
Jan 15 08:31:01 testserver systemd[1]: Unit systemd-logind.service entered failed state.
Jan 15 10:31:01 testserver systemd[1]: Unit systemd-logind.service entered failed state.
Package systemd-208-11.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.

Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-208-11.fc20'
as soon as you are able to.

Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-0902/systemd-208-11.fc20
then log in and leave karma (feedback).
*Why was this pushed to updates-testing* when the build is known to be broken? This forces updates-testing users from now on to use "yum --exclude=systemd\*,libgudev\* upgrade" :-(
Unpushed now. I don't think your issues were *caused* by changes in this update, but they probably *expose* them. I'll have to work on this a bit more.
thank you very much!
Maybe this is also a sign of what goes wrong, for someone who knows the systemd internals - at least "Reached target Sound Card" is questionable in the case of a reload, as is "Mounted Configuration File System".

systemd-208-9.fc20.x86_64

[root@srv-rhsoft:~]$ > /var/log/messages; systemctl daemon-reload; cat /var/log/messages
Jan 17 14:27:26 srv-rhsoft systemd[1]: Mounted Configuration File System.
Jan 17 14:27:26 srv-rhsoft systemd[1]: Reached target Sound Card.
Jan 17 14:27:26 srv-rhsoft systemd[1]: Found device /sys/devices/virtual/block/md0.
Jan 17 14:27:26 srv-rhsoft systemd[1]: Found device /dev/md0.
Jan 17 14:27:26 srv-rhsoft systemd[1]: Found device /dev/disk/by-id/md-name-localhost.localdomain:0.
Jan 17 14:27:26 srv-rhsoft systemd[1]: Found device /dev/disk/by-id/md-uuid-1d691642:baed26df:1d197496:4fb00ff8.
Jan 17 14:27:26 srv-rhsoft systemd[1]: Found device /dev/disk/by-label/boot.
Well, this is clever. Apparently pulseaudio now depends on these systemd user daemons, so the technique from comment #14 does solve the interminable waiting on stop jobs at reboot, but it has the side effect of disabling sound :-(.
We have stop jobs running for users, stop jobs running for NFS mounts on other systems that are down and not responding, etc. How about making systemd listen for Ctrl-Alt-Delete during shutdown/reboot processing and just terminate the damn stop jobs it is waiting on?
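Not what is being asked for, but shortening how long those stop jobs are allowed to run at least bounds the damage. A sketch of that stopgap, on the assumption that these hangs are simply running into the default 90 s stop timeout (verify before relying on it):

# /etc/systemd/system.conf
[Manager]
DefaultTimeoutStopSec=15s

# then: systemctl daemon-reexec (or reboot) to pick it up; units that set their
# own TimeoutStopSec are unaffected.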
I don't know if this helps, but at about 50 minutes of uptime I noticed this in /var/log/messages:

Jan 25 11:01:01 localhost systemd[1]: Starting user-0.slice.
Jan 25 11:01:01 localhost systemd[1]: Created slice user-0.slice.
Jan 25 11:01:01 localhost systemd[1]: Starting User Manager for 0...
Jan 25 11:01:01 localhost systemd[1]: Starting Session 2 of user root.
Jan 25 11:01:01 localhost systemd[1]: Started Session 2 of user root.
Jan 25 11:01:01 localhost systemd: Failed to open private bus connection: Failed to connect to socket /run/user/0/dbus/user_bus_socket: No such file or directory
Jan 25 11:01:01 localhost systemd: Starting Default.
Jan 25 11:01:01 localhost systemd: Reached target Default.
Jan 25 11:01:01 localhost systemd: Startup finished in 10ms.
Jan 25 11:01:01 localhost systemd[1]: Started User Manager for 0.
Also, I don't know if it's relevant, but before that I lost the ability to resolve internet URLs.
I wonder if this is related to bug 1043212? Perhaps this slow shutdown only happens if the system has been up across a cron.daily run? I get a lot of annoying timeouts and screwed up services after cron runs every night: bug 1057811, bug 1057618, bug 1057614.
@Tom Horsley: unlikely - I use cronjobs *heavily* and rarely notice the problem in this bug report, but I hit it *randomly* on my testserver VM, and it is completely unpredictable when and why it happens - I have never hit it on my real machines so far.

[root@srv-rhsoft:~]$ cat /etc/crontab | wc -l
48

[root@srv-rhsoft:~]$ ls /etc/cron.daily/
total 56K
-rwxr-xr-x 1 root root  434 2013-08-08 15:14 0logwatch
-rwxr-xr-x 1 root root 2,2K 2014-01-22 17:45 certwatch
-rwxr-xr-x 1 root root  364 2013-09-20 23:10 dbmail
-rwxr-xr-x 1 root root  589 2013-09-03 20:45 etckeeper
-rwxr-xr-x 1 root root  12K 2014-01-13 16:23 google-chrome
-rwxr-xr-x 1 root root  180 2012-08-01 12:56 logrotate
-rwxr-x--- 1 root root  192 2013-08-03 14:37 mlocate
-rwxr-xr-x 1 root root 1,8K 2013-08-03 12:44 rkhunter
-rwxr-xr-x 1 root root  113 2013-02-25 16:49 check-updates.cron
-rwxr-xr-x 1 root root  136 2012-01-10 07:49 cleanup-uploadtemp.cron
-rwxr-xr-x 1 root root  249 2010-08-20 16:52 contact-backup.cron
-rwxr-xr-x 1 root root  618 2013-08-03 11:32 man-db.cron

https://bugzilla.redhat.com/show_bug.cgi?id=1023788#c2 I can reproduce on *any* machine running F20/RHEL7-Beta at any time.
________________________________________

@cornel panceac: "Failed to connect to socket /run/user/0/dbus/user_bus_socket" appears every time a cronjob is started for a userid not used by any cronjob before, or whenever a user session is started: https://bugzilla.redhat.com/show_bug.cgi?id=1010572
*That* shows systemd in F20 has a serious problem, because this happens on any package update that triggers a "reload", and there are many: during the dist-upgrade a large part of the complete log was flooded with these messages. And there is no single reason why the target "Sound Card" should be reached again and again while MPD happily plays music without any interruption the whole time.
__________________________________________________________________

[root@srv-rhsoft:~]$ > /var/log/messages; systemctl daemon-reload; sleep 2; cat /var/log/messages
Jan 25 23:22:28 srv-rhsoft systemd[1]: Mounted Configuration File System.
Jan 25 23:22:28 srv-rhsoft systemd[1]: Reached target Sound Card.
Jan 25 23:22:28 srv-rhsoft systemd[1]: Found device /sys/devices/virtual/block/md0.
Jan 25 23:22:28 srv-rhsoft systemd[1]: Found device /dev/md0.
Jan 25 23:22:28 srv-rhsoft systemd[1]: Found device /dev/disk/by-id/md-name-localhost.localdomain:0.
Jan 25 23:22:28 srv-rhsoft systemd[1]: Found device /dev/disk/by-id/md-uuid-1d691642:baed26df:1d197496:4fb00ff8.
Jan 25 23:22:28 srv-rhsoft systemd[1]: Found device /dev/disk/by-label/boot.
Jan 25 23:22:28 srv-rhsoft systemd[1]: Found device /sys/devices/virtual/block/md2.
Jan 25 23:22:28 srv-rhsoft systemd[1]: Found device /dev/md2.
Jan 25 23:22:28 srv-rhsoft systemd[1]: Found device /dev/disk/by-id/md-name-localhost.localdomain:2.
Jan 25 23:22:28 srv-rhsoft systemd[1]: Found device /dev/disk/by-id/md-uuid-ea253255:cb915401:f32794ad:ce0fe396.
Jan 25 23:22:28 srv-rhsoft systemd[1]: Found device /dev/disk/by-label/data.
__________________________________________________________________

[root@srv-rhsoft:~]$ > /var/log/messages; systemctl daemon-reload; sleep 2; cat /var/log/messages
Jan 25 23:25:31 srv-rhsoft systemd[1]: Mounted Configuration File System.
Jan 25 23:25:31 srv-rhsoft systemd[1]: Reached target Sound Card.
Jan 25 23:25:31 srv-rhsoft systemd[1]: Found device /sys/devices/virtual/block/md0.
Jan 25 23:25:31 srv-rhsoft systemd[1]: Found device /dev/md0.
Jan 25 23:25:31 srv-rhsoft systemd[1]: Found device /dev/disk/by-id/md-name-localhost.localdomain:0.
Jan 25 23:25:31 srv-rhsoft systemd[1]: Found device /dev/disk/by-id/md-uuid-1d691642:baed26df:1d197496:4fb00ff8.
Jan 25 23:25:31 srv-rhsoft systemd[1]: Found device /dev/disk/by-label/boot.
Jan 25 23:25:31 srv-rhsoft systemd[1]: Found device /sys/devices/virtual/block/md2.
Jan 25 23:25:31 srv-rhsoft systemd[1]: Found device /dev/md2.
Jan 25 23:25:31 srv-rhsoft systemd[1]: Found device /dev/disk/by-id/md-name-localhost.localdomain:2.
Jan 25 23:25:31 srv-rhsoft systemd[1]: Found device /dev/disk/by-id/md-uuid-ea253255:cb915401:f32794ad:ce0fe396.
Jan 25 23:25:31 srv-rhsoft systemd[1]: Found device /dev/disk/by-label/data.
Here you see a machine that was recently rebooted. UID 48 is httpd (Apache); the log lines are definitely from the first run of this cronjob. "Reached target Default" is a joke - that target was reached *long before*, and not by starting a random cronjob.

[root@testserver:~]$ cat /etc/crontab | grep apache
*/15 * * * * apache nice -n 19 ionice -c 3 bash /scripts/cleanup-uploadtemp.cron
____________________________

Jan 25 23:15:01 testserver systemd[1]: Starting user-48.slice.
Jan 25 23:15:01 testserver systemd[1]: Created slice user-48.slice.
Jan 25 23:15:01 testserver systemd[1]: Starting User Manager for 48...
Jan 25 23:15:01 testserver systemd[1199]: Failed to open private bus connection: Failed to connect to socket /run/user/48/dbus/user_bus_socket: No such file or directory
Jan 25 23:15:01 testserver systemd[1199]: Starting Default.
Jan 25 23:15:01 testserver systemd[1199]: Reached target Default.
Jan 25 23:15:01 testserver systemd[1199]: Startup finished in 4ms.
Jan 25 23:15:01 testserver systemd[1]: Started User Manager for 48.
Jan 25 23:16:26 testserver systemd[1]: Starting Cleanup of Temporary Directories...
Jan 25 23:16:26 testserver systemd[1]: Started Cleanup of Temporary Directories.
I can reproduce this with 3.12 kernel but not 3.11.
http://cgit.freedesktop.org/systemd/systemd/commit/?id=63966da fixed this bug.
Commenting here as Adam posted a reference, and I wanted to share my experience fixing a similar bug.

(In reply to weiw333 from comment #33)
> http://cgit.freedesktop.org/systemd/systemd/commit/?id=63966da fixed this
> bug.

I don't think this is a full solution. I also have that patch in my build to address this; it's a bit hacky, but OK for a released version IMO.

We discussed the issue on the systemd mailing list. It relates to the killing spree when systemd (PID 1) sends TERM signals to everything. One of the processes it sends a TERM signal to is the systemd user session, which then in turn starts spawning processes calling kill itself; but after sending the TERM signal, PID 1 also sends KILL signals to other processes in the cgroup, which can (if the race is triggered) include the kill process that was spawned by systemd (user) to do its killing... bit of a complex one to explain!

This is fixed properly upstream by the KillMode=mixed patch (IIRC), but a temporary solution is similar to the fix in the httpd .service file Fedora currently ships. Some more comments on our patch here:
http://svnweb.mageia.org/packages/cauldron/systemd/current/SOURCES/0515-Temporary-work-around-for-slow-shutdown-due-to-unter.patch?revision=568430&view=markup

I've had it confirmed by some of my users that this solves the shutdown slowness for them. It might be unrelated, I guess, but it seems suspiciously similar to issues reported elsewhere and discussed at length on the systemd list.

HTHs
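For anyone who wants to try the same idea locally, a minimal sketch of my understanding of the approach (my own approximation, not the actual Fedora or Mageia patch, and it assumes the installed systemd already understands KillMode=mixed):

# hypothetical drop-in: /etc/systemd/system/user@.service.d/killmode.conf
[Service]
# SIGTERM goes only to the user manager itself; the rest of the cgroup is only
# SIGKILLed later, so PID 1 no longer races against the manager's own kill run.
KillMode=mixed

followed by "systemctl daemon-reload" to make the drop-in take effect.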
can we revert the commit that introduced this bug?
systemd-208-13.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/systemd-208-13.fc20
Forget it: https://bugzilla.redhat.com/show_bug.cgi?id=1053315

> The patches that Djalal lists are a proper fix, but they are
> rather intrusive, so I'm not trying to backport them now.
> Please use 'loginctl enable-linger 0' and 'loginctl enable-linger 48'
> for the affected users, as a work-around.

That is *unacceptable* if someone maintains a lot of machines with different cronjobs running as different users.

Feb 17 12:29:28 Updated: systemd-libs-208-13.fc20.x86_64
Feb 17 12:29:29 Updated: systemd-208-13.fc20.x86_64
Feb 17 12:29:29 Updated: libgudev1-208-13.fc20.x86_64
_______________________________________

Feb 17 12:45:01 testserver systemd[1]: Starting user-48.slice.
Feb 17 12:45:01 testserver systemd[1]: Created slice user-48.slice.
Feb 17 12:45:01 testserver systemd[1]: Starting User Manager for 48...
Feb 17 12:45:01 testserver systemd[1498]: Failed to open private bus connection: Failed to connect to socket /run/user/48/dbus/user_bus_socket: No such file or directory
Feb 17 12:45:01 testserver systemd[1498]: Starting Default.
Feb 17 12:45:01 testserver systemd[1498]: Reached target Default.
Feb 17 12:45:01 testserver systemd[1498]: Startup finished in 6ms.
Feb 17 12:45:01 testserver systemd[1]: Started User Manager for 48.
Feb 17 12:45:01 testserver systemd[1]: Stopping User Manager for 48...
Feb 17 12:45:01 testserver systemd[1498]: Stopping Default.
Feb 17 12:45:01 testserver systemd[1498]: Stopped target Default.
Feb 17 12:45:01 testserver systemd[1498]: Starting Shutdown.
Feb 17 12:45:01 testserver systemd[1498]: Reached target Shutdown.
Feb 17 12:45:01 testserver systemd[1498]: Starting Exit the Session...
Feb 17 12:45:01 testserver systemd[1498]: Received SIGRTMIN+24 from PID 1538 (kill).
Feb 17 12:45:01 testserver systemd[1]: Stopped User Manager for 48.
Feb 17 12:45:01 testserver systemd[1]: Stopping user-48.slice.
Feb 17 12:45:01 testserver systemd[1]: Removed slice user-48.slice.
Feb 17 12:45:02 testserver systemd[1]: Starting Cleanup of Temporary Directories...
Feb 17 12:45:03 testserver systemd[1]: Started Cleanup of Temporary Directories.
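(For anyone who does want to apply the suggested work-around across a fleet anyway, it can at least be scripted; a rough sketch, with the host names and UID list made up for illustration:)

# enable lingering for every UID that cronjobs run under, on each affected machine
for host in web01 web02 backup01; do
    ssh root@"$host" 'for uid in 0 48; do loginctl enable-linger "$uid"; done'
done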
> Enable/disable user lingering for one or more users.
> If enabled for a specific user, a user manager is
> spawned for the user at boot and kept around after
> logouts. This allows users who are not logged in to
> run long-running services

If that were true, the log above would not happen, because there is already a service running for uid 48 (apache): the webserver itself. The log above is from a cronjob running twice an hour as user apache.

[root@testserver:~]$ systemctl status httpd.service
httpd.service - Apache Webserver
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled)
   Active: active (running) since Mo 2014-02-17 12:30:12 CET; 28min ago
 Main PID: 769 (httpd)
@Harald: Note your log from comment 30 has the sound.target stuff in there on daemon-reload. This is caused by a patch that is now reverted, so that bit should be somewhat solved.

Regarding this actual bug (not related to cron): can you check whether my suggested workaround in comment 34 solves it for you? I think it should, and it is included in the new build mentioned above from a couple of days ago. There is also a more recent fix (i.e. from a few minutes ago) for the log spamming, which is a separate issue.
@Colin: these are not daemon-reloads, these are cronjobs:

Feb 17 12:45:01 testserver systemd[1498]: Stopping Default.
Feb 17 12:45:01 testserver systemd[1498]: Stopped target Default.
Feb 17 12:45:01 testserver systemd[1498]: Starting Shutdown.
Feb 17 12:45:01 testserver systemd[1498]: Reached target Shutdown.

https://bugzilla.redhat.com/show_bug.cgi?id=1023820#c34

Besides the fact that I do not use the Fedora httpd packages: on my rawhide VM there is no httpd enabled or running at all and I still faced this issue, so a change in httpd.service is unlikely to fix it. The shutdown problem of this bug report does not happen that often, but given that it is unlikely to be httpd-related, it is hardly the way to go to change random systemd units that work on F18/F19 because of breakage in F20/F21/RHEL7.

I will give http://koji.fedoraproject.org/koji/buildinfo?buildID=498731 a try on my testserver VM shortly.
____________________________________________________________

[root@rh:~]$ cat /usr/lib/systemd/system/httpd.service
[Unit]
Description=Apache Webserver
After=network.service haveged.service rngd.service

[Service]
Type=simple
EnvironmentFile=-/etc/sysconfig/httpd
Environment="PATH=/usr/sbin:/usr/bin"
ExecStart=/usr/sbin/httpd $OPTIONS -D FOREGROUND
ExecReload=/usr/sbin/httpd $OPTIONS -k graceful
Restart=always
RestartSec=1
UMask=006
PrivateTmp=yes
NoNewPrivileges=yes
CapabilityBoundingSet=CAP_DAC_OVERRIDE CAP_IPC_LOCK CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID
ReadOnlyDirectories=/etc
ReadOnlyDirectories=/usr
InaccessibleDirectories=/sys
InaccessibleDirectories=/proc
InaccessibleDirectories=/boot
InaccessibleDirectories=/home
InaccessibleDirectories=/root
InaccessibleDirectories=/usr/local/scripts
InaccessibleDirectories=/var/lib/rpm
InaccessibleDirectories=/var/lib/yum
InaccessibleDirectories=/var/spool

[Install]
WantedBy=multi-user.target
____________________________________________________________
systemd-208-14.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/systemd-208-14.fc20
systemd-208-14.fc20.x86_64 looks much better.

I can't confirm the hanging shutdown as fixed because I can rarely reproduce it (otherwise I would have become choleric long ago), but given that it does not bring new regressions for me I am open to giving positive karma, in the hope that others get happier than with the current state.

It no longer floods the log every time a cronjob is started, so it's an acceptable compromise, and it fixes https://bugzilla.redhat.com/show_bug.cgi?id=1053315.

However, these lines are annoying; the first must not happen at all, and the others are misleading, because the first call of a cronjob for a user hardly reaches target Default, since at that time it is already reached:

Feb 17 15:55:02 testserver systemd[1163]: Failed to open private bus connection: Failed to connect to socket /run/user/48/dbus/user_bus_socket: No such file or directory
Feb 17 15:55:02 testserver systemd[1163]: Starting Default.
Feb 17 15:55:02 testserver systemd[1163]: Reached target Default.
Feb 17 15:55:02 testserver systemd[1163]: Startup finished in 4ms.
____________________________________

[root@testserver:~]$ cat /var/log/cron
Feb 17 15:55:02 testserver CROND[1178]: (apache) CMD ( nice -n 19 ionice -c 3 bash /scripts/cleanup-uploadtemp.cron)
Feb 17 16:00:01 testserver CROND[1360]: (apache) CMD ( nice -n 19 ionice -c 3 bash /scripts/cleanup-uploadtemp.cron)
____________________________________

[root@testserver:~]$ cat /var/log/messages
Feb 17 15:55:02 testserver systemd[1]: Starting user-48.slice.
Feb 17 15:55:02 testserver systemd[1]: Created slice user-48.slice.
Feb 17 15:55:02 testserver systemd[1]: Starting User Manager for 48...
Feb 17 15:55:02 testserver systemd[1163]: Failed to open private bus connection: Failed to connect to socket /run/user/48/dbus/user_bus_socket: No such file or directory
Feb 17 15:55:02 testserver systemd[1163]: Starting Default.
Feb 17 15:55:02 testserver systemd[1163]: Reached target Default.
Feb 17 15:55:02 testserver systemd[1163]: Startup finished in 4ms.
Feb 17 15:55:02 testserver systemd[1]: Started User Manager for 48.
(In reply to Harald Reindl from comment #42)
> systemd-208-14.fc20.x86_64 looks much better.

Thanks for testing.

> However, these lines are annoying; the first must not happen at all, and the
> others are misleading, because the first call of a cronjob for a user hardly
> reaches target Default, since at that time it is already reached:
> Feb 17 15:55:02 testserver systemd[1163]: Failed to open private bus connection: Failed to connect to socket /run/user/48/dbus/user_bus_socket: No such file or directory
> Feb 17 15:55:02 testserver systemd[1163]: Starting Default.
> Feb 17 15:55:02 testserver systemd[1163]: Reached target Default.
> Feb 17 15:55:02 testserver systemd[1163]: Startup finished in 4ms.

Those are a bit misleading... You have to look at the PID to know whether it's the system instance or the user instance. I guess the descriptions should be improved to say something like "Default system target" and "Default user target".
(In reply to Harald Reindl from comment #40)
> the shutdown problem of this bug report does not happen that often, but given
> that it is unlikely to be httpd-related, it is hardly the way to go to change
> random systemd units working for F18/F19 because of breakage in F20/F21/RHEL7

For the record, I'm suggesting that the same workaround as is currently used in the httpd service also be applied to the user@.service unit. This has now been done in the update you're testing, but I wanted to make it clear it's not related to the httpd.service file in any way, other than using the same technique to work around an overly aggressive killing spree.
I understand that, but we *really* should avoid such workarounds in systemd units not shipped directly with the systemd package; this will end in a mess and unpredictable behavior all over the distribution - you never know what services somebody is running.

If "KillMode=mixed" is the solution for a large number of units, and in the best case brings no regressions for other services, then *that* should be the systemd default, and only packages with exceptions for good reasons should override it.

Even if making it the systemd default behavior solves most problems, the question remains whether this only masks the "overly aggressive killing spree" root problem instead of solving it - given that systemd upstream claims to do things the right way while all other components in the distribution need to be fixed left and right, this would just go in the wrong direction.
Harald: the point is we can't just take the correct upstream fixes (yet), because they're tied to all the instability that means there is not yet a 209 release. But we can't just sit around and wait for them, because we really need to fix #1043212 *now*.
@Adam: I understand that, *but* I also asked whether "KillMode=mixed" as the default behavior would solve this, instead of changing random units left and right.

> because they're tied to all the instability that means
> there is not yet a 209 release

Yes, and that is why I was such a bad person as to *ask* whether the systemd developers *may consider* adjusting their development process. For a core component of, by now, most distributions, release versioning like 207, 208, 209 is unacceptable when every downstream needs cherry-picking and backporting like hell in the meantime, instead of having upstream point releases like 208.1, 208.2, 208.3 as in the good old days, which would imply "this is bugfix-only with no expected new breakage".

Well, the answer from Lennart is pretty clear, and in my understanding it is "we are doing release-and-forget and do not care about downstream distributions and users, because that would possibly slow down the development of new features and major changes":

-------- Original Message --------
Subject: Re: [systemd-devel] https://bugzilla.redhat.com/show_bug.cgi?id=1047614
Date: Wed, 12 Feb 2014 21:19:02 +0100
From: Lennart Poettering <lennart>
Organization: Red Hat, Inc.
To: Reindl Harald <h.reindl>
CC: Mailing-List systemd <systemd-devel.org>

On Wed, 12.02.14 20:05, Reindl Harald (h.reindl) wrote:

> https://bugzilla.redhat.com/show_bug.cgi?id=1047614
>
> Product: Red Hat Enterprise Linux 7
> Component: systemd (Show other bugs)
> Version: 7.0
> Hardware: Unspecified Unspecified
> Priority urgent Severity high
>
> first reported more than 3 months ago
> https://bugzilla.redhat.com/show_bug.cgi?id=1023788
>
> maybe systemd-upstream should consider slow down development
> and spend more energy in quality and stability

Well, firstly, it's hardly your business how we spend our time. Secondly, this bug is fixed upstream. Thirdly, patches count more than complaining.
For the love of Pete, how many times do I have to say it? This is *bugzilla*, not a mailing list or forum. Please focus.
systemd-208-14.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
systemd-208-14.fc20 does NOT fix this hang. I installed (and rebooted) it a few days ago when it appeared in updates-testing. I just did a "yum update" today and got a new kernel, so I decided to reboot again, and once again it hung longer than I could stand to wait (I gave it about 5 minutes with no obvious signs of any progress like the disk rattling) with the cylon eyeball message about a stop job running for user XXXX. In fact, it kept switching back and forth between two different user numbers, one mine, another a different user who had been logged in using NIS during the weekend testing some software on fedora 20.
+1. Problem still occurs.
This bug is for user 0, not user XXXX. I'm not sure you've ever actually been seeing this bug.
> This bug is for user 0, not user XXXX.

And you think the underlying problem is different? user@.service is a template.

Rawhide's systemd-210 seems to fix a handful of the F20 bugs; on the other hand it re-opens https://bugzilla.redhat.com/show_bug.cgi?id=1053315#c31, which in general leaves a bad taste.

However, systemd-210 *closes* https://bugzilla.redhat.com/show_bug.cgi?id=1023788 without dirty hacks, as well as, maybe as a side effect, https://bugzilla.redhat.com/show_bug.cgi?id=1047614#c9.
"and you think the underlying problem is a different?" it seems a reasonable guess that the difference between user 0 (root) and user anything else (not root) is significant given the nature of the bug, yes.
OK, if per-user stop jobs are different, I've created a new Bug 1073714 for them.
I posted about a similar-sounding issue at https://github.com/systemd/systemd/issues/12702. Should I create a new BZ?

Failed to start User Manager for UID 0.

getent passwd 0
root:x:0:0:root:/root:/bin/bash

systemctl list-units --failed shows this:

  UNIT            LOAD   ACTIVE SUB    DESCRIPTION
● user@0.service  loaded failed failed User Manager for UID 0

and:

systemctl status user@0.service
● user@0.service - User Manager for UID 0
   Loaded: loaded (/usr/lib/systemd/system/user@.service; static; vendor preset: disabled)
   Active: failed (Result: protocol) since Tue 2019-06-18 09:57:56 EDT; 9min ago
     Docs: man:user@.service(5)
 Main PID: 22782 (code=exited, status=1/FAILURE)

Jun 18 09:57:56 ourdomain.edu systemd[1]: Starting User Manager for UID 0...
Jun 18 09:57:56 ourdomain.edu systemd[22782]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
Jun 18 09:57:56 ourdomain.edu systemd[22782]: bind(/run/user/0/systemd/notify) failed: Address already in use
Jun 18 09:57:56 ourdomain.edu systemd[22782]: Failed to fully start up daemon: Address already in use
Jun 18 09:57:56 ourdomain.edu systemd[22783]: pam_unix(systemd-user:session): session closed for user root
Jun 18 09:57:56 ourdomain.edu systemd[1]: user@0.service: Failed with result 'protocol'.
Jun 18 09:57:56 ourdomain.edu systemd[1]: Failed to start User Manager for UID 0.
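In case it helps triage before a new BZ gets filed, one quick way to see what is still holding that notify socket (assuming ss from iproute2 is available; the path is taken from the log above):

# show the unix socket and any owning process
ss -xap | grep '/run/user/0/systemd/notify'
# and check for a stale user manager that may still be bound to it
ps -eo pid,ppid,user,cmd | grep '[s]ystemd --user'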