Bug 807897 - RFE: Need the ability for respawn to be disabled programmatically
Status: NEW
Product: Fedora
Classification: Fedora
Component: systemd
Version: rawhide
Hardware: Unspecified Unspecified
Priority: unspecified
Severity: high
Assigned To: systemd-maint
QA Contact: Fedora Extras Quality Assurance
Keywords: Triaged
Depends On:
Blocks: systemd-RFE
Reported: 2012-03-29 01:19 EDT by Andrew Beekhof
Modified: 2014-03-16 23:40 EDT
Doc Type: Bug Fix
Description Andrew Beekhof 2012-03-29 01:19:18 EDT
Description of problem:

The final bug from discussions at LinuxCon Prague with Lennart and Kay.

It is common for high availability clusters to manage system services (whether SysV, upstart, or systemd).

In such scenarios, it would be problematic if systemd adhered to the respawn/recovery policy in the unit file.  Doing so would confuse the cluster and, depending on the timing, mask problems which would inhibit the recovery preferences configured by the admin in the cluster.

The use of override files was discussed and ultimately deemed unsuitable, as the behaviour should depend on who starts the service. E.g., taking the service away from the cluster and starting it manually should use the respawn policy from the unit file (and vice versa).

So the request is for there to be a provision that we can disable any configured respawn policy when we send a message via dbus to start a service.
Comment 1 Andrew Beekhof 2012-06-25 00:22:45 EDT
Changing the org.freedesktop.systemd1.Unit interface to allow
  <property name="OnFailure" type="as" access="read"/>
to be read/write would probably solve it.

Would that be possible?
Comment 2 Lennart Poettering 2013-01-14 16:22:03 EST
Hmm, so we now have a nice way to turn off restart of a service persistently (via a drop-in file in /etc), and from inside the service one-time (via RestartPreventExitStatus=). But you are looking for a way to turn this off from the outside, yet still only for a single run?
Comment 3 Andrew Beekhof 2013-01-15 02:46:59 EST
Essentially, yes.
Is comment #1 feasible?
Comment 4 Lennart Poettering 2013-01-15 17:57:10 EST
(In reply to comment #3)
> Essentially, yes.
> Is comment #1 feasible?

Well, it's not that easy. We drop all configuration when "systemctl daemon-reload" is invoked, and then reread all settings fresh. That would mean that if we made the property writable, the change would be lost on the next reload, which is probably not what you want.

Here's an idea. On git versions of systemd (i.e. F19 material, but also RHEL7) you can do the following:

# mkdir -p /run/systemd/system/mydaemon.service.d/
# cat > /run/systemd/system/mydaemon.service.d/50-turn-off-automatic-restart.conf << EOF
[Service]
Restart=no
EOF

i.e. by dropping in a simple config snippet you can alter mydaemon.service as required. By doing this in /run rather than in /etc, the change is only temporary: it is lost at the next reboot, but survives "systemctl daemon-reload".

That way the cluster software could create a simple file to turn off auto-restart, and then re-enable it by deleting that file again.

Does that make sense? Would that work for you?
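
The create-and-delete approach described above could be wrapped roughly as follows. This is only a sketch: the function names and the DROPIN_ROOT variable are assumptions made for illustration, not part of systemd. It only manages the file; comment 14 in this thread says the equivalent of "systemctl daemon-reload" is still needed afterwards.

```shell
#!/bin/sh
# Sketch only: DROPIN_ROOT and both function names are hypothetical.
# DROPIN_ROOT defaults to the runtime unit directory used in the example above.
DROPIN_ROOT="${DROPIN_ROOT:-/run/systemd/system}"
DROPIN_NAME="50-turn-off-automatic-restart.conf"

# Create the drop-in that overrides Restart= for a single unit.
disable_auto_restart() {
    unit="$1"
    dir="$DROPIN_ROOT/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nRestart=no\n' > "$dir/$DROPIN_NAME"
}

# Remove the drop-in again so the unit file's own Restart= policy applies.
enable_auto_restart() {
    unit="$1"
    rm -f "$DROPIN_ROOT/$unit.d/$DROPIN_NAME"
}
```

A caller would run "disable_auto_restart mydaemon.service" before taking over the service and "enable_auto_restart mydaemon.service" when handing it back.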
Comment 5 Andrew Beekhof 2013-02-06 01:19:41 EST
Sorry for the delay, some 16-node clusters were busy kicking my ass :)

The /run option is less worse than /etc, but still not as preferable as an API call.  Cluster software generally tries not to write to disk if it can be avoided.

We can probably make it work if we have to though.
Comment 6 Jóhann B. Guðmundsson 2013-02-06 03:17:05 EST
Hmm, isn't this being approached the wrong way? As I see it, the units are behaving correctly, and the only bad restart option is "always".

What's missing is a way for systemd to signal the HA/cluster applications that it has declared the service (or even whole targets and containers) dead, so that the cluster can take action. That would come with a knob in systemd.conf that switches the distribution's default systemd profile to an HA/cluster one, which would, for example, set the correct restart behaviour globally for units (rather than changing it in individual units/targets). Anyway, this is probably something that should be discussed at systemd/FAD Brno.
Comment 7 Jóhann B. Guðmundsson 2013-02-06 03:26:30 EST
(In reply to comment #6)
> Hmm is this not being approached wrong as I see it the units are behaving
> correctly and the only bad restart option is "always" 
> 
> What's missing, is for systemd to be able to signal the HA/Cluster
> applications that it has declare the service ( even whole targets and
> containers ) dead and it the cluster should take action, with a knob in
> systemd.conf which would alter distribution's default systemd profile to
> HA/Cluster one which for example set the correct restart behaviour global
> for units etc ( not be changing this in units/targets ) anyway this is
> probably something that needs/should be discussed at systemd/FAD BRNO.

OnFailure= should probably be extended to signal the HA/cluster software and/or another systemd instance running on another host to take action, for active/standby failover setups.
Comment 8 Andrew Beekhof 2013-02-06 23:26:02 EST
Currently we do polling to check the status of systemd based services.
So we do notice when services die, but finding out asynchronously would be even better.
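
For illustration, polling of this kind could look roughly like the sketch below. It assumes "systemctl is-active --quiet", which exits non-zero once a unit is no longer active; the function name and the STATUS_CMD variable are hypothetical and exist only so the check can be stubbed. It is not how any particular cluster stack implements its monitor.

```shell
#!/bin/sh
# Sketch only: STATUS_CMD and wait_until_inactive are hypothetical names.
STATUS_CMD="${STATUS_CMD:-systemctl is-active --quiet}"

# Poll a unit every $2 seconds (default 5) until it is no longer active.
# Returns 0 once the unit is seen inactive, or 1 if $3 polls (0 = no limit)
# all found it still active.
wait_until_inactive() {
    unit="$1"
    interval="${2:-5}"
    max_polls="${3:-0}"
    polls=0
    while $STATUS_CMD "$unit"; do
        polls=$((polls + 1))
        if [ "$max_polls" -gt 0 ] && [ "$polls" -ge "$max_polls" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}
```

The delay between a service dying and the next poll is exactly the window that an asynchronous notification would close.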

However, even if there is support for OnFailure=tell-the-cluster, ideally we'd still want a way to allow the cluster to set that programmatically.
Comment 9 Lennart Poettering 2013-03-07 21:02:43 EST
(In reply to comment #5)
> Sorry for the delay, some 16-node clusters were busy kicking my ass :)
> 
> The /run option is less worse than /etc, but still not as preferable as an
> API call.  Cluster software generally tries not to write to disk if it can
> be avoided.

Well, /run is not "disk", it's a tmpfs. So you are not actually writing to disk there... It's basically a way to communicate a runtime setting, not more...
Comment 10 Andrew Beekhof 2013-06-16 21:54:43 EDT
Is there a #define or pkg-config variable that holds the /run/systemd/system/ path?
Comment 11 Andrew Beekhof 2013-06-16 22:09:01 EDT
Also, is the equivalent of "systemctl daemon-reload" required after creating/deleting one of these files?
Comment 12 Harald Hoyer 2013-06-17 08:04:04 EDT
(In reply to Andrew Beekhof from comment #10)
> Is there a #define or pkg-config variable that holds the
> /run/systemd/system/ path?

Hmm, nothing except:

/usr/share/pkgconfig/systemd.pc:

systemdsystemunitpath=${systemdsystemconfdir}:/etc/systemd/system:/run/systemd/system:/usr/local/lib/systemd/system:${systemdsystemunitdir}:/usr/lib/systemd/system:/lib/systemd/system
Comment 13 Harald Hoyer 2013-06-17 08:05:01 EDT
(In reply to Harald Hoyer from comment #12)
> (In reply to Andrew Beekhof from comment #10)
> > Is there a #define or pkg-config variable that holds the
> > /run/systemd/system/ path?
> 
> Hmm, nothing except:
> 
> /usr/share/pkgconfig/systemd.pc:
> 
> systemdsystemunitpath=${systemdsystemconfdir}:/etc/systemd/system:/run/
> systemd/system:/usr/local/lib/systemd/system:${systemdsystemunitdir}:/usr/
> lib/systemd/system:/lib/systemd/system

/run/systemd is pretty much hardcoded in systemd, so I think it will never change.
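
Since the only pkg-config handle is the colon-separated systemdsystemunitpath variable, a build or install script could at least check that the runtime directory is part of it. A minimal sketch, with hypothetical function names, assuming pkg-config and systemd.pc are installed:

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical.

# Return 0 if directory $2 appears as a whole entry in the colon-separated list $1.
path_list_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the unit search path from systemd.pc; prints nothing if
# pkg-config or systemd.pc is not available.
systemd_unit_path() {
    pkg-config --variable=systemdsystemunitpath systemd 2>/dev/null
}
```

For example, path_list_contains "$(systemd_unit_path)" /run/systemd/system succeeds when the runtime directory is listed.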
Comment 14 Harald Hoyer 2013-06-17 08:05:39 EDT
(In reply to Andrew Beekhof from comment #11)
> Also, is the equivalent of "systemctl daemon-reload" required after
> creating/deleting one of these files?

yes
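
In script form, that means every change to a drop-in has to be followed by a reload. A rough sketch, where with_reload and RELOAD_CMD are hypothetical names introduced only for this example:

```shell
#!/bin/sh
# Sketch only: RELOAD_CMD is hypothetical and exists so the reload can be stubbed.
RELOAD_CMD="${RELOAD_CMD:-systemctl daemon-reload}"

# Run the given command (e.g. one that creates or deletes a drop-in file),
# then reload systemd's configuration only if that command succeeded.
with_reload() {
    "$@" || return 1
    $RELOAD_CMD
}
```

For example: with_reload rm -f /run/systemd/system/mydaemon.service.d/50-turn-off-automatic-restart.conf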
Comment 15 Andrew Beekhof 2013-06-19 00:20:45 EDT
(In reply to Harald Hoyer from comment #14)
> (In reply to Andrew Beekhof from comment #11)
> > Also, is the equivalent of "systemctl daemon-reload" required after
> > creating/deleting one of these files?
> 
> yes

That kinda sucks.

Isn't there any way to limit the scope to that one resource?
Otherwise that's a rather heavy stick to be throwing around.
