Bug 761656

Summary:	TimeoutSec ignored for oneshot services
Product:	[Fedora] Fedora	Reporter:	Michal Jaegermann <michal>
Component:	systemd	Assignee:	Lukáš Nykrýn <lnykryn>
Status:	CLOSED ERRATA	QA Contact:	Fedora Extras Quality Assurance <extras-qa>
Severity:	unspecified	Docs Contact:
Priority:	unspecified
Version:	17	CC:	johannbg, johannbg, lnykryn, lpoetter, metherid, mschmidt, notting, plautrba, rrauenza, stern, systemd-maint
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2012-07-02 22:24:46 UTC	Type:	---
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Michal Jaegermann 2011-12-08 19:29:17 UTC

Description of problem:

After booting an updated rawhide installation I getty processes are not running and I see:

# systemctl | grep getty
getty        loaded inactive   dead      start Getty on tty1
getty        loaded inactive   dead      start Getty on tty2
getty        loaded inactive   dead      start Getty on tty3
getty        loaded inactive   dead      start Getty on tty4
getty        loaded inactive   dead      start Getty on tty5
getty        loaded inactive   dead      start Getty on tty6
getty.target              loaded inactive   dead      start Login Prompts

In logs I can find entries like that:
... systemd-logind[818]: Failed to start unit: Unit autovt failed to load: File exists. See system logs and 'systemctl status autovt' for details.

There is not much more for "See system logs" and 
'systemctl status autovt' prints:

autovt
          Loaded: error (Reason: File exists)
          Active: inactive (dead)

Eh? Which "File exists"?
I get also:

# systemctl status getty.target
getty.target - Login Prompts
          Loaded: loaded (/lib/systemd/system/getty.target; enabled)
          Active: inactive (dead)

and

# systemctl status getty
getty - Getty on tty1
          Loaded: loaded (/lib/systemd/system/getty@.service; enabled)
          Active: inactive (dead) since Thu, 08 Dec 2011 11:50:10 -0700; 15min ago
         Process: 1319 ExecStart=/sbin/agetty %I 38400 (code=killed, signal=HUP)
          CGroup: name=systemd:/system/getty@.service/tty1

After all of this I may see if I am patient enough:

Dec  8 12:13:22 localhost systemd[1]: Reexecuting.

Why this is reexecuting I have not the slightest idea but suddenly after that:

systemctl | grep getty
getty        loaded active running       Getty on tty1
getty        loaded active running       Getty on tty2
getty        loaded active running       Getty on tty3
getty        loaded active running       Getty on tty4
getty        loaded active running       Getty on tty5
getty        loaded active running       Getty on tty6
getty.target              loaded active active        Login Prompts

That is around 25 minutes to start ttys.  25 seconds would be way too long.

Version-Release number of selected component (if applicable):
systemd-37-4.fc17.x86_64

How reproducible:
On every reboot

Comment 1 Michal Jaegermann 2011-12-08 19:41:21 UTC

I tried to issue 'systemctl daemon-reexec'.  Immediately after that all getty stuff was "dead" but some 20 seconds later all of that was "active running".  It is quite possible that systemd in tandem with plymouth-quit-wait.service is responsible for this bogosity.

Comment 2 Michal Schmidt 2011-12-16 02:11:40 UTC

Does "systemctl list-jobs" show anything?

Comment 3 Michal Jaegermann 2011-12-16 02:59:27 UTC

(In reply to comment #2)
> Does "systemctl list-jobs" show anything?

 JOB UNIT                      TYPE            STATE  
 101 getty.target              start           waiting
 102 getty        start           waiting
 104 getty        start           waiting
 106 getty        start           waiting
 108 getty        start           waiting
 113 getty        start           waiting
 139 plymouth...t-wait.service start           running

7 jobs listed.

After 'systemctl stop plymouth-quit-wait.service' I have all getty jobs running and list-jobs has "0 jobs listed".

It looks like that this plymouth-quit-wait.service is waaay too patient.

Comment 4 Michal Schmidt 2011-12-16 23:49:19 UTC

Could you please paste the output of:
systemctl status plymouth-quit.service prefdm.service

Comment 5 Michal Jaegermann 2011-12-17 02:19:52 UTC

(In reply to comment #4)
> Could you please paste the output of:
> systemctl status plymouth-quit.service prefdm.service

Like this after a boot.

plymouth-quit-wait.service - Wait for Plymouth Boot Screen to Quit
          Loaded: loaded (/lib/systemd/system/plymouth-quit-wait.service; static)
          Active: activating (start) since Fri, 16 Dec 2011 18:54:53 -0700; 2min 16s ago
        Main PID: 1178 (plymouth)
          CGroup: name=systemd:/system/plymouth-quit-wait.service
                  └ 1178 /bin/plymouth --wait

Hm, I can see in /lib/systemd/system/plymouth-quit-wait.service 'TimeoutSec=20' but it does not seem to be effective.  What is worse I can run
'systemctl disable plymouth-quit-wait.service' but that does not seem to be doing anything.  I guess that as the last resort I can stop this insanity in rc.local.

A boot sequence seems to be getting more and more fragile with time.  Just few minutes ago I could not boot because udev went bonkers and started what looked like an infinite loop (but was just fine on the second try).  Some machines are remote, you know?

Comment 6 Michal Schmidt 2011-12-19 06:19:03 UTC

This is not the output I was asking for. Note there are two different services: plymouth-quit.service and plymouth-quit-wait.service. To debug the problem I need to see the output of:

systemctl status plymouth-quit.service prefdm.service

...because one of them is responsible for telling plymouth to quit, depending on what the default target is.

It is a bug that TimeoutSec is not effective for oneshot services, but I'd like to find out why the timeout would be reached in the first place.


When using "systemctl {enable,disable} ..." on a unit without an [Install] section, there should be an error message "Unit files contain no applicable installation information.". I'll look into why it does not appear.

As a workaround, instead of rc.local, you could use 'systemctl mask ...' as a heavy-handed way to prevent certain services from being started by systemd.

Comment 7 Michal Jaegermann 2011-12-19 16:48:08 UTC

(In reply to comment #6)

> To debug the problem I need to see the output of:
> 
> systemctl status plymouth-quit.service prefdm.service

Ah, sorry.  Misunderstanding.  Here we go

# systemctl status plymouth-quit.service prefdm.service
plymouth-quit.service - Terminate Plymouth Boot Screen
          Loaded: loaded (/lib/systemd/system/plymouth-quit.service; static)
          Active: inactive (dead)
          CGroup: name=systemd:/system/plymouth-quit.service

prefdm.service - Display Manager
          Loaded: loaded (/lib/systemd/system/prefdm.service; enabled)
          Active: active (running) since Mon, 19 Dec 2011 09:34:42 -0700; 59s ago
        Main PID: 1179 (gdm-binary)
          CGroup: name=systemd:/system/prefdm.service
                  ├ 1179 /usr/sbin/gdm-binary -nodaemon
                  ├ 1219 /usr/sbin/gdm-binary -nodaemon
                  ├ 1221 /usr/bin/X :0 -br -audit 0 -auth /var/gdm/:0.Xauth -nolisten tc...
                  └ 1232 /usr/libexec/gdmlogin

> As a workaround, instead of rc.local, you could use 'systemctl mask ...'

Thanks. Good to know that.  I realize that this is on 'man systemctl' but memorizing all documenation is not really feasible.

Comment 8 Michal Schmidt 2011-12-22 02:00:47 UTC

gdm is responsible for stopping plymouth. What is the version of the gdm package you have installed? I don't have a program /usr/libexec/gdmlogin on my system.

Comment 9 Michal Jaegermann 2011-12-22 02:57:54 UTC

(In reply to comment #8)
> gdm is responsible for stopping plymouth. What is the version of the gdm
> package you have installed? I don't have a program /usr/libexec/gdmlogin on my
> system.

I actually have a pretty old version of gdm there and I have good reasons for that.  This is one of these reasons - bug 757570.

All these pieces are getting way too tangled and buggy.  It appears that I will have to find some reasonable workarounds.

Comment 10 Michal Schmidt 2011-12-22 23:01:17 UTC

The old version probably fails to talk to plymouth. Hack your /etc/X11/prefdm to call "plymouth quit" even for gdm.

Comment 11 Michal Jaegermann 2011-12-23 00:43:00 UTC

(In reply to comment #10)
> The old version probably fails to talk to plymouth.

Yes.  It is older than plymouth.

> Hack your /etc/X11/prefdm to call "plymouth quit" even for gdm.

Will do.  Thanks!  Still a 25 minutes wait if something is "not expected" seems to be tad excessive.

Comment 12 Jóhann B. Guðmundsson 2012-01-29 20:19:34 UTC

Is this still an issue on a fully updated release or can this bug be closed now?

Comment 13 Michal Schmidt 2012-01-30 09:46:04 UTC

The bug has not been fixed yet.

Comment 14 Michal Jaegermann 2012-01-30 16:58:34 UTC

(In reply to comment #12)
> Is this still an issue on a fully updated release or can this bug be closed
> now?

Frankly, I do not have now a good test case as I made sure in my current configuration not to trigger it; but accordingly to comment 13 this bug is still there.

Comment 15 Alan Stern 2012-02-13 17:37:49 UTC

I have a similar problem (may or may not be related; it's hard to tell) on a system which does not run any GUI at all during bootup.  The system log contains messages like this:

Feb 13 11:47:18 iolanthe systemd-logind[884]: Failed to start unit: Unit autovt failed to load: File exists. See system logs and 'systemctl status autovt' for details.

and there's no login prompt.  I didn't wait for more than a couple of minutes to see if anything would time out.  This is with a somewhat customized initramfs image; when booting with a standard imitramfs the problem doesn't occur.  But in both situations I have:

$  systemctl status autovt
autovt
          Loaded: error (Reason: File exists)
          Active: inactive (dead)

What do these "File exists" errors refer to?  In fact, why is autovt being run at all?  It looks like it's just a fossil remnant in systemd-logind, which should be changed to refer to getty@.service.

Comment 16 Rich Rauenzahn 2012-04-15 22:49:34 UTC

I'm also seeing this problem as of today's reboot:

Apr 15 15:38:23 tendo systemd-logind[1284]: Failed to start unit: Unit autovt failed to load: File exists. See system logs and 'systemctl status autovt' for details.


$ cat /etc/redhat-release
Fedora release 16 (Verne)
$ rpm -qf /lib/systemd/systemd-logind
systemd-37-17.fc16.i686

Comment 17 Rich Rauenzahn 2012-04-15 23:24:59 UTC

This seems related to: https://bugzilla.redhat.com/show_bug.cgi?id=787252

Comment 18 Rich Rauenzahn 2012-04-16 02:57:55 UTC

stracing pid 1 (systemd) during a restart of this service, the only EEXIST is this:

1     accept4(8, {sa_family=AF_FILE, NULL}, [2], SOCK_CLOEXEC) = 15             
1     fcntl64(15, F_GETFL)              = 0x2 (flags O_RDWR)                    
1     fcntl64(15, F_SETFL, O_RDWR|O_NONBLOCK) = 0                               
1     getsockname(15, {sa_family=AF_FILE, path="/run/systemd/private"}, [23]) = 0
1     epoll_ctl(4, EPOLL_CTL_ADD, 15, {0, {u32=149391768, u64=149391768}}) = 0  
1     epoll_ctl(4, EPOLL_CTL_ADD, 15, {EPOLLIN|EPOLLERR|EPOLLHUP, {u32=149677528, u64=149677528}}) = -1 EEXIST (File exists)

Also, downgrading to previous systemd-36-3 did not fix it.

I am not running an GUI front end.. this is a text only console.

Wait.. I'm wrong!  

$  systemctl | grep getty
getty        loaded active running       Getty on tty1
getty        loaded active running       Getty on tty2
getty        loaded active running       Getty on tty3
getty        loaded active running       Getty on tty4
getty        loaded active running       Getty on tty5
getty        loaded active running       Getty on tty6
getty.target              loaded active active        Login Prompts

And I can verify they are running on the console.

So this error is actually irrelevant:

$ sudo systemctl restart autovt
Failed to issue method call: Unit autovt failed to load: File exists. See system logs and 'systemctl status autovt' for details.

Ok, let's upgrade again...

Still works, restart still fails, but is irrelevant.

I'm also stumped now.  Will report back if it stops working again, or on next reboot.

Comment 19 Michal Jaegermann 2012-04-16 03:41:48 UTC

(In reply to comment #18)

> I am not running an GUI front end.. this is a text only console.
> 
> Wait.. I'm wrong!

If you try 'systemctl --failed' is anything actually listed?

Comment 20 Rich Rauenzahn 2012-04-16 15:38:57 UTC

(In reply to comment #19)
> (In reply to comment #18)
> 
> > I am not running an GUI front end.. this is a text only console.
> > 
> > Wait.. I'm wrong!
> 
> If you try 'systemctl --failed' is anything actually listed?

Yes, but I assumed unrelated and are mostly stuff I haven't cleaned up in a while ...


]$ sudo systemctl --failed | perl -pe's/Rauenzahn//g' | perl -pe's/_rjr//'g  
UNIT                        LOAD   ACTIVE SUB    JOB DESCRIPTION
cryptset...x2dlvol1.service loaded failed failed     Cryptography Setup for luks-volBackup-lvol1
cryptset...x2dlvol2.service loaded failed failed     Cryptography Setup for luks-volBackup-lvol2
dk-milter.service           loaded failed failed     SYSV: dk-filter is a daemon that hooks into sendmail and sign/verify mail according DomainKeys standard
heyu.service            loaded failed failed     SYSV: heyu engine/relay
mcstrans.service            loaded failed failed     SYSV: This starts the SELinux Context Translation System Daemon
murmur.service              loaded failed failed     SYSV: murmur is the server for the Mumble
network.service             loaded failed failed     LSB: Bring up/down networking
nfs-secure-server.service   loaded failed failed     Secure NFS Server
setterm.service         loaded failed failed     SYSV: sets term defaults

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
JOB    = Pending job for the unit.

9 units listed. Pass --all to see inactive units, too.

Not sure why it thinks network service isn't working.  Networking is up.

Comment 21 Lukáš Nykrýn 2012-06-18 07:28:21 UTC

patch for timeoutsec in oneshot services submitted to git->
http://cgit.freedesktop.org/systemd/systemd/commit/?id=98709151f3e782eb508ba15e2a12c0b46003f061 -> post

Comment 22 Fedora Update System 2012-06-26 12:09:32 UTC

systemd-44-17.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/systemd-44-17.fc17

Comment 23 Fedora Update System 2012-06-26 21:39:09 UTC

Package systemd-44-17.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing systemd-44-17.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-9960/systemd-44-17.fc17
then log in and leave karma (feedback).

Comment 24 Fedora Update System 2012-07-02 22:24:46 UTC

systemd-44-17.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.