Bug 1084401 - RFE: systemd.confirm_spawn=1 should wait forever
Summary: RFE: systemd.confirm_spawn=1 should wait forever
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: systemd-maint
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-04-04 09:11 UTC by Marko Myllynen
Modified: 2016-01-27 15:01 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-01-27 15:01:22 UTC
Type: Bug
Embargoed:



Description Marko Myllynen 2014-04-04 09:11:39 UTC
Description of problem:
systemd.confirm_spawn=1 should ask for confirmation when spawning processes, but it has a built-in timeout after which a positive response is assumed. This causes problems when an administrator boots a system with systemd.confirm_spawn=1 and is then interrupted while the system is booting: the boot continues slowly behind the administrator's back, although the intention was to confirm each step.

systemd.confirm_spawn=1 should not assume a positive response, no matter how long it takes to answer a question.

In the worst case, this seems to cause system boot to fail altogether. To reproduce, boot with systemd.confirm_spawn=1, go have a long lunch, and then see timeout errors for devices and services on the screen, along with a question asking whether dracut-emergency should be executed.
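
The difference between the current and the requested behaviour can be sketched as follows. This is only an illustration, not systemd's actual implementation: the function name `ask_confirm`, its parameters, and the accepted answers are all hypothetical.

```python
import os
import select


def ask_confirm(fd, timeout=None):
    """Return True to spawn the process, False to skip it.

    fd is a readable file descriptor standing in for the console.
    timeout is in seconds; None waits forever (the behaviour requested here).
    All names are hypothetical and only illustrate the idea.
    """
    # select() returns an empty ready list if the timeout expires without input.
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        # Timed out: mimic the reported behaviour and assume a positive response.
        return True
    answer = os.read(fd, 64).decode(errors="replace").strip().lower()
    return answer in ("y", "yes")
```

With a finite `timeout`, an unanswered question is treated as "yes" and the boot moves on; with `timeout=None`, the call blocks until the administrator answers.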

Version-Release number of selected component (if applicable):
systemd-208-15

Comment 1 Jóhann B. Guðmundsson 2014-04-04 10:01:51 UTC
If you boot with systemd.confirm_spawn=1, which you have to add manually, it is expected that you are present and debugging the output from the boot, not hanging out at the coffee machine, having lunch, or walking away to play ping pong with a coworker.

So from my point of view this is working as expected. In the worst case, you have to reboot the machine and go through the boot process again once you are present again.

Comment 2 Marko Myllynen 2014-04-04 10:05:26 UTC
(In reply to Jóhann B. Guðmundsson from comment #1)
> If you boot with systemd.confirm_spawn=1 which you manually have to add and
> boot into it is expected that you are present and debugging the output from
> the boot not that you are hanging out at the coffee machine or having a
> lunch or walk away to play ping pong with a coworker.

I actually came across this when trying to debug a system over an unreliable network connection.

When working over unreliable network connections, rebooting does not really help.

Comment 3 Jóhann B. Guðmundsson 2014-04-04 10:18:43 UTC
(In reply to Marko Myllynen from comment #2)
> (In reply to Jóhann B. Guðmundsson from comment #1)
> > If you boot with systemd.confirm_spawn=1 which you manually have to add and
> > boot into it is expected that you are present and debugging the output from
> > the boot not that you are hanging out at the coffee machine or having a
> > lunch or walk away to play ping pong with a coworker.
> 
> I actually came across this when trying to debug a system over an unreliable
> network connection.
> 
> When working over unreliable network connections rebooting not really help.

Interesting. How do you expect to be able to debug a system over an "unreliable network connection" (whatever that means)?

On that "unreliable network connection", how can you expect to be able to continue in a rescue shell if you can't reboot it anyway?

Comment 4 Marko Myllynen 2014-04-04 10:28:57 UTC
(In reply to Jóhann B. Guðmundsson from comment #3)
> (In reply to Marko Myllynen from comment #2)
> > (In reply to Jóhann B. Guðmundsson from comment #1)
> > > If you boot with systemd.confirm_spawn=1 which you manually have to add and
> > > boot into it is expected that you are present and debugging the output from
> > > the boot not that you are hanging out at the coffee machine or having a
> > > lunch or walk away to play ping pong with a coworker.
> > 
> > I actually came across this when trying to debug a system over an unreliable
> > network connection.
> > 
> > When working over unreliable network connections rebooting not really help.
> 
> Interesting how do you expect being able to debug a system over a
> "unreliable network connection" ( what ever that means )

A virtual guest on a remote host over VPN over 4G. If the network connection fails (due to VPN issues, mobile network issues, 4G modem issues, or whatever else), it'll take some time to recover and reconnect. The system being troubleshot should not do anything in the meantime.

What's the benefit of the current scheme compared to a timeout-less approach, btw?

Comment 5 Jóhann B. Guðmundsson 2014-04-04 10:45:50 UTC
(In reply to Marko Myllynen from comment #4)
> 
> What's the benefit of the current scheme compared to timeoutless approach,
> btw?

You won't risk leaving the computer hanging in an unknown state by waiting for user input indefinitely.

Comment 6 Marko Myllynen 2014-04-04 11:01:28 UTC
(In reply to Jóhann B. Guðmundsson from comment #5)
> > What's the benefit of the current scheme compared to timeoutless approach,
> > btw?
> 
> You wont risk leaving the computer hanging in unknown state by waiting for
> user input definitely

If the purpose of the timeout is to prevent the system from ending up in an unknown state, then it's not working, see comment 1.

I think ideally we would have options for both cases, and the current option would be fixed.

Comment 7 Karel Volný 2014-04-04 11:02:00 UTC
(In reply to Jóhann B. Guðmundsson from comment #5)
> You wont risk leaving the computer hanging in unknown state by waiting for
> user input definitely

Which brings exactly what benefits over leaving the computer in an unknown (maybe even more broken) state after the questions time out?

If Marko's example is not good enough, what about not going to lunch, but having to do some research during the bootup before deciding whether to confirm or deny the current question?

Comment 8 Marko Myllynen 2014-04-04 11:03:48 UTC
(In reply to Marko Myllynen from comment #6)
> If the purpose of the timeout is to prevent the system to end up in an
> unknown state, then it's not working, see comment 1.

Sorry, meant the description here, i.e., https://bugzilla.redhat.com/show_bug.cgi?id=1084401#c0.

Comment 9 David Howells 2014-04-04 13:16:15 UTC
(In reply to Jóhann B. Guðmundsson from comment #5)
> (In reply to Marko Myllynen from comment #4)
> > 
> > What's the benefit of the current scheme compared to timeoutless approach,
> > btw?
> 
> You wont risk leaving the computer hanging in unknown state by waiting for
> user input definitely

If you've supplied "systemd.confirm_spawn=1" on the command line then, yes, waiting indefinitely *is* the right thing to do.  That's what you've been instructed to do: do it.

Comment 10 Zbigniew Jędrzejewski-Szmek 2014-04-05 05:35:43 UTC
(In reply to David Howells from comment #9)
> (In reply to Jóhann B. Guðmundsson from comment #5)
> > (In reply to Marko Myllynen from comment #4)
> > > 
> > > What's the benefit of the current scheme compared to timeoutless approach,
> > > btw?
> > 
> > You wont risk leaving the computer hanging in unknown state by waiting for
> > user input definitely
> 
> If you've supplied "systemd.confirm_spawn=1" on the command line then, yes,
> waiting indefinitely *is* the right thing to do.  That's what you've been
> instructed to do: do it.
I guess that the timeout is motivated by the fact that sometimes keyboard input does not work and it might be preferable to continue... But this seems a bit of a corner case, and if that happens, one can just reboot and avoid specifying this option again. I agree that if one specifies confirm_spawn=1, we should wait forever. It's not only the case of leaving the boot unattended, but also of e.g. trying to debug things during boot by logging in at a different console or something like that.

Comment 11 Lennart Poettering 2014-06-17 11:54:38 UTC
Well, it wouldn't really work to drop this timeout, since we cannot suspend all timeouts system-wide. So even if the question in the foreground no longer times out, other bits might, and that would still make you very unhappy. That's why we time out the foreground question: so that the other timeouts that wait, in some way or another, for the job you are running don't expire too early.

Unless we invent a scheme for establishing a system-wide "stall" for all timeouts, I don't think we can do anything about this, and just dropping the timeout here will not buy you anything...
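
The interaction described above can be illustrated with a toy model. This is a sketch under simplifying assumptions: every job's timeout is assumed to start counting when the question appears, and the job names and limits are hypothetical, not taken from systemd.

```python
def expired_timeouts(confirm_wait, job_timeouts):
    """Return the names of jobs whose own timeout elapses during the wait.

    confirm_wait: seconds the foreground question stays unanswered.
    job_timeouts: mapping of job name -> timeout in seconds (hypothetical values).
    """
    return sorted(
        name for name, limit in job_timeouts.items() if limit <= confirm_wait
    )


# Hypothetical jobs and limits, chosen only for illustration.
jobs = {"dev-sda1.device": 90, "network.service": 120, "slow.service": 3600}

print(expired_timeouts(30, jobs))    # short pause: no job timeout has elapsed
print(expired_timeouts(1800, jobs))  # long lunch: the two shorter timeouts have elapsed
```

Even if the question itself waited forever, the jobs with shorter limits would still fail in this model, which is why removing only the foreground timeout would not be enough.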

Comment 12 Fedora End Of Life 2015-05-29 11:27:27 UTC
This message is a reminder that Fedora 20 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 20. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '20'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 20 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 13 Jan Synacek 2016-01-27 15:01:22 UTC
Since comment 11 suggests that this *might* be implemented in the future, I'm forwarding this to upstream. https://github.com/systemd/systemd/issues/2452

