Bug 761656
Summary: | TimeoutSec ignored for oneshot services |
---|---|
Product: | [Fedora] Fedora |
Component: | systemd |
Version: | 17 |
Hardware: | Unspecified |
OS: | Linux |
Status: | CLOSED ERRATA |
Severity: | unspecified |
Priority: | unspecified |
Reporter: | Michal Jaegermann <michal> |
Assignee: | Lukáš Nykrýn <lnykryn> |
QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
CC: | johannbg, lnykryn, lpoetter, metherid, mschmidt, notting, plautrba, rrauenza, stern, systemd-maint |
Doc Type: | Bug Fix |
Last Closed: | 2012-07-02 22:24:46 UTC |
Description
Michal Jaegermann
2011-12-08 19:29:17 UTC
I tried to issue 'systemctl daemon-reexec'. Immediately after that all getty stuff was "dead" but some 20 seconds later all of that was "active running". It is quite possible that systemd in tandem with plymouth-quit-wait.service is responsible for this bogosity.

Does "systemctl list-jobs" show anything?

(In reply to comment #2)
> Does "systemctl list-jobs" show anything?

    JOB UNIT                          TYPE  STATE
    101 getty.target                  start waiting
    102 getty                         start waiting
    104 getty                         start waiting
    106 getty                         start waiting
    108 getty                         start waiting
    113 getty                         start waiting
    139 plymouth...t-wait.service     start running

    7 jobs listed.

After 'systemctl stop plymouth-quit-wait.service' I have all getty jobs running and list-jobs has "0 jobs listed". It looks like this plymouth-quit-wait.service is waaay too patient.

Could you please paste the output of:

    systemctl status plymouth-quit.service prefdm.service

(In reply to comment #4)
> Could you please paste the output of:
> systemctl status plymouth-quit.service prefdm.service

Like this after a boot.

    plymouth-quit-wait.service - Wait for Plymouth Boot Screen to Quit
              Loaded: loaded (/lib/systemd/system/plymouth-quit-wait.service; static)
              Active: activating (start) since Fri, 16 Dec 2011 18:54:53 -0700; 2min 16s ago
            Main PID: 1178 (plymouth)
              CGroup: name=systemd:/system/plymouth-quit-wait.service
                      └ 1178 /bin/plymouth --wait

Hm, I can see in /lib/systemd/system/plymouth-quit-wait.service 'TimeoutSec=20' but it does not seem to be effective. What is worse, I can run 'systemctl disable plymouth-quit-wait.service' but that does not seem to be doing anything. I guess that as a last resort I can stop this insanity in rc.local.

A boot sequence seems to be getting more and more fragile with time. Just a few minutes ago I could not boot because udev went bonkers and started what looked like an infinite loop (but was just fine on the second try). Some machines are remote, you know?

This is not the output I was asking for.
Note there are two different services: plymouth-quit.service and plymouth-quit-wait.service. To debug the problem I need to see the output of:

    systemctl status plymouth-quit.service prefdm.service

...because one of them is responsible for telling plymouth to quit, depending on what the default target is.

It is a bug that TimeoutSec is not effective for oneshot services, but I'd like to find out why the timeout would be reached in the first place.

When using "systemctl {enable,disable} ..." on a unit without an [Install] section, there should be an error message "Unit files contain no applicable installation information.". I'll look into why it does not appear.

As a workaround, instead of rc.local, you could use 'systemctl mask ...' as a heavy-handed way to prevent certain services from being started by systemd.

(In reply to comment #6)
> To debug the problem I need to see the output of:
>
> systemctl status plymouth-quit.service prefdm.service

Ah, sorry. Misunderstanding. Here we go:

    # systemctl status plymouth-quit.service prefdm.service
    plymouth-quit.service - Terminate Plymouth Boot Screen
              Loaded: loaded (/lib/systemd/system/plymouth-quit.service; static)
              Active: inactive (dead)
              CGroup: name=systemd:/system/plymouth-quit.service

    prefdm.service - Display Manager
              Loaded: loaded (/lib/systemd/system/prefdm.service; enabled)
              Active: active (running) since Mon, 19 Dec 2011 09:34:42 -0700; 59s ago
            Main PID: 1179 (gdm-binary)
              CGroup: name=systemd:/system/prefdm.service
                      ├ 1179 /usr/sbin/gdm-binary -nodaemon
                      ├ 1219 /usr/sbin/gdm-binary -nodaemon
                      ├ 1221 /usr/bin/X :0 -br -audit 0 -auth /var/gdm/:0.Xauth -nolisten tc...
                      └ 1232 /usr/libexec/gdmlogin

> As a workaround, instead of rc.local, you could use 'systemctl mask ...'

Thanks. Good to know that. I realize that this is in 'man systemctl' but memorizing all documentation is not really feasible.

gdm is responsible for stopping plymouth. What is the version of the gdm package you have installed?
I don't have a program /usr/libexec/gdmlogin on my system.

(In reply to comment #8)
> gdm is responsible for stopping plymouth. What is the version of the gdm
> package you have installed? I don't have a program /usr/libexec/gdmlogin on my
> system.

I actually have a pretty old version of gdm there and I have good reasons for that. This is one of these reasons - bug 757570. All these pieces are getting way too tangled and buggy. It appears that I will have to find some reasonable workarounds.

The old version probably fails to talk to plymouth. Hack your /etc/X11/prefdm to call "plymouth quit" even for gdm.

(In reply to comment #10)
> The old version probably fails to talk to plymouth.

Yes. It is older than plymouth.

> Hack your /etc/X11/prefdm to call "plymouth quit" even for gdm.

Will do. Thanks! Still, a 25 minutes wait if something is "not expected" seems to be a tad excessive.

Is this still an issue on a fully updated release or can this bug be closed now?

The bug has not been fixed yet.

(In reply to comment #12)
> Is this still an issue on a fully updated release or can this bug be closed
> now?

Frankly, I do not have a good test case now, as I made sure in my current configuration not to trigger it; but according to comment 13 this bug is still there.

I have a similar problem (may or may not be related; it's hard to tell) on a system which does not run any GUI at all during bootup. The system log contains messages like this:

    Feb 13 11:47:18 iolanthe systemd-logind[884]: Failed to start unit: Unit autovt failed to load: File exists. See system logs and 'systemctl status autovt' for details.

and there's no login prompt. I didn't wait for more than a couple of minutes to see if anything would time out.

This is with a somewhat customized initramfs image; when booting with a standard initramfs the problem doesn't occur.
But in both situations I have:

    $ systemctl status autovt
    autovt
              Loaded: error (Reason: File exists)
              Active: inactive (dead)

What do these "File exists" errors refer to?

In fact, why is autovt being run at all? It looks like it's just a fossil remnant in systemd-logind, which should be changed to refer to getty@.service.

I'm also seeing this problem as of today's reboot:

    Apr 15 15:38:23 tendo systemd-logind[1284]: Failed to start unit: Unit autovt failed to load: File exists. See system logs and 'systemctl status autovt' for details.

    $ cat /etc/redhat-release
    Fedora release 16 (Verne)
    $ rpm -qf /lib/systemd/systemd-logind
    systemd-37-17.fc16.i686

This seems related to: https://bugzilla.redhat.com/show_bug.cgi?id=787252

stracing pid 1 (systemd) during a restart of this service, the only EEXIST is this:

    1 accept4(8, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 15
    1 fcntl64(15, F_GETFL) = 0x2 (flags O_RDWR)
    1 fcntl64(15, F_SETFL, O_RDWR|O_NONBLOCK) = 0
    1 getsockname(15, {sa_family=AF_FILE, path="/run/systemd/private"}, [23]) = 0
    1 epoll_ctl(4, EPOLL_CTL_ADD, 15, {0, {u32=149391768, u64=149391768}}) = 0
    1 epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=149677528, u64=149677528}}) = -1 EEXIST (File exists)

Also, downgrading to the previous systemd-36-3 did not fix it.

I am not running a GUI front end.. this is a text only console.

Wait.. I'm wrong!

    $ systemctl | grep getty
    getty loaded active running Getty on tty1
    getty loaded active running Getty on tty2
    getty loaded active running Getty on tty3
    getty loaded active running Getty on tty4
    getty loaded active running Getty on tty5
    getty loaded active running Getty on tty6
    getty.target loaded active active Login Prompts

And I can verify they are running on the console. So this error is actually irrelevant:

    $ sudo systemctl restart autovt
    Failed to issue method call: Unit autovt failed to load: File exists. See system logs and 'systemctl status autovt' for details.

Ok, let's upgrade again...
Still works, restart still fails, but is irrelevant. I'm also stumped now. Will report back if it stops working again, or on next reboot.

(In reply to comment #18)
> I am not running a GUI front end.. this is a text only console.
>
> Wait.. I'm wrong!

If you try 'systemctl --failed' is anything actually listed?

(In reply to comment #19)
> (In reply to comment #18)
>
> > I am not running a GUI front end.. this is a text only console.
> >
> > Wait.. I'm wrong!
>
> If you try 'systemctl --failed' is anything actually listed?

Yes, but I assumed they are unrelated; they are mostly stuff I haven't cleaned up in a while ...

    ]$ sudo systemctl --failed | perl -pe's/Rauenzahn//g' | perl -pe's/_rjr//'g
    UNIT LOAD ACTIVE SUB JOB DESCRIPTION
    cryptset...x2dlvol1.service loaded failed failed Cryptography Setup for luks-volBackup-lvol1
    cryptset...x2dlvol2.service loaded failed failed Cryptography Setup for luks-volBackup-lvol2
    dk-milter.service loaded failed failed SYSV: dk-filter is a daemon that hooks into sendmail and sign/verify mail according DomainKeys standard
    heyu.service loaded failed failed SYSV: heyu engine/relay
    mcstrans.service loaded failed failed SYSV: This starts the SELinux Context Translation System Daemon
    murmur.service loaded failed failed SYSV: murmur is the server for the Mumble
    network.service loaded failed failed LSB: Bring up/down networking
    nfs-secure-server.service loaded failed failed Secure NFS Server
    setterm.service loaded failed failed SYSV: sets term defaults

    LOAD = Reflects whether the unit definition was properly loaded.
    ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
    SUB = The low-level unit activation state, values depend on unit type.
    JOB = Pending job for the unit.

    9 units listed. Pass --all to see inactive units, too.

Not sure why it thinks network service isn't working. Networking is up.
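For anyone comparing listings like the one above across reboots, a small filter can reduce the table to bare unit names. This is only a sketch: `failed_units` is a hypothetical helper name, and it assumes the column layout shown in this report (UNIT, LOAD, ACTIVE, SUB as the first four whitespace-separated fields).

```shell
# Hypothetical helper (not part of systemd): print the names of units that a
# `systemctl --failed` listing reports as loaded but failed.
# Reads the listing on stdin; the header and legend lines are skipped because
# their second and third fields are not "loaded" and "failed".
failed_units() {
    awk '$2 == "loaded" && $3 == "failed" { print $1 }'
}

# Example (commented out so that sourcing this file has no side effects):
#   systemctl --failed | failed_units
```

Running 'systemctl status network.service' on one of the printed names would then be the place to look for why a unit such as network.service is marked failed while networking is up.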
Patch for TimeoutSec in oneshot services submitted to git -> http://cgit.freedesktop.org/systemd/systemd/commit/?id=98709151f3e782eb508ba15e2a12c0b46003f061 -> post

systemd-44-17.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/systemd-44-17.fc17

Package systemd-44-17.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-44-17.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-9960/systemd-44-17.fc17
then log in and leave karma (feedback).

systemd-44-17.fc17 has been pushed to the Fedora 17 stable repository. If problems still persist, please make note of it in this bug report.
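The behaviour this bug is about, a start command that is cut off once a time limit expires, can be illustrated outside systemd with GNU coreutils `timeout`. The sketch below is only an analogy under that assumption, not how systemd implements TimeoutSec; `run_with_limit` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of systemd or plymouth): run a command with an
# upper bound on its run time, roughly what TimeoutSec= is meant to enforce
# for the start command of a oneshot service.
run_with_limit() {
    limit="$1"
    shift
    # GNU coreutils `timeout` sends SIGTERM to the command once the limit
    # expires and then exits with status 124.
    timeout "$limit" "$@"
}

# Example (commented out so that sourcing this file has no side effects):
#   run_with_limit 20 /bin/plymouth --wait
```

An exit status of 124 from the helper means the limit was hit; any other status is the command's own.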