802748 – RFE: option to remove socket/fifo when the socket unit is stopped

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	systemd
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Assignee:	systemd-maint
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	systemd-RFE

Reported:	2012-03-13 12:36 UTC by Peter Rajnoha
Modified:	2015-12-02 00:21 UTC (History)
CC List:	10 users (show)
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2014-06-13 13:50:17 UTC
Type:	---
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Bugzilla	737264	0	unspecified	CLOSED	Provide native systemd service files	2021-02-22 00:41:40 UTC

Internal Links: 737264

Description Peter Rajnoha 2012-03-13 12:36:26 UTC

When "systemctl stop" is called for a socket unit (or the unit is stopped automatically because of dependencies defined between units), systemd should actually remove the socket/FIFO instead of leaving it in the file system.

This comes in handy when trying to quickly detect whether the daemon is running (or is prepared to be started through the systemd socket activation mechanism) by doing a simple "stat" on the socket/FIFO in the file system.

Otherwise, if the socket/FIFO stays in the file system after the socket unit is stopped, we need to do extra work to find out whether the socket is usable: we have to try connecting to it, which ends in an unnecessary error state if it is not usable.

Using "pid" files (which are deprecated anyway) is not an option either, since the pid file does not exist yet when systemd socket activation is used.

Comment 1 Michal Schmidt 2012-03-13 12:51:21 UTC

Not removing the socket/FIFO file from the filesystem is currently intentional.

This comment is in socket_close_fds():

                /* One little note: we should never delete any sockets
                 * in the file system here! After all some other
                 * process we spawned might still have a reference of
                 * this fd and wants to continue to use it. Therefore
                 * we delete sockets in the file system before we
                 * create a new one, not after we stopped using
                 * one! */

But I can see how leaving a FIFO file around may cause problems for unsuspecting clients.
I do not quite understand the reasoning given in the comment. If I do "systemctl stop foo.socket", I should not be surprised that no more clients can connect to the service. I do not see what the problem would be if we deleted the socket/FIFO files on socket unit stop.

Lennart, am I missing something?

Comment 2 Michal Schmidt 2012-03-13 13:05:48 UTC

This is especially a problem with FIFOs, because opening an unused FIFO will simply block indefinitely.

Comment 3 Lennart Poettering 2012-03-13 14:37:52 UTC

Consider this: a service A.service uses a socket A.socket. Now both are started, and then you stop A.socket but A.service continues to run. I think in that case the service should have every right to continue to use it, and we shouldn't break its current usage. In fact I think I originally deleted those sockets, but ran into a problem with the deletion behaviour, which is why I added this comment. Don't remember the problem in question, however.

I think it is quite crucial here to understand that "a socket unit is running" means "systemd will do activation for it". It does not mean "this socket exists in the file system".

Now, what I might be open to here, is to add an option "DeleteSockets=yes" or so (which defaults to no), to optionally enable the behaviour Peter asks for.

Comment 4 Lennart Poettering 2012-03-13 14:41:26 UTC

(In reply to comment #3)
> Consider this: a service A.service uses a socket A.socket. Now both are
> started, and then you stop A.socket but A.service continues to run. I think in
> that case the service should have every right to continue to use it, and we
> shouldn't break its current usage. In fact I think I originally deleted those
> sockets, but ran into a problem with the deletion behaviour which is why I added
> this comment. Don't remember the problem in question however.

to extend on this: we have a couple of services that you can run with and without socket activation. If we care about supporting those we probably should stay away as much as possible from deleting sockets unless we really have to.

Comment 5 Lennart Poettering 2012-03-13 14:43:48 UTC

(In reply to comment #2)
> This is especially a problem with FIFOs, because opening an unused FIFO will
> simply block indefinitely.

If you add O_NDELAY to your open flags when you open the FIFO, then the open won't block?
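
As an illustration of the non-blocking open suggested here, a minimal Python sketch (not part of the original thread). It assumes Linux FIFO semantics, where a write-only open with O_NONBLOCK (the POSIX name; O_NDELAY is the same flag on Linux) fails with ENXIO when no process has the FIFO open for reading. The helper name fifo_has_reader is hypothetical.

```python
import errno
import os


def fifo_has_reader(path: str) -> bool:
    """Return True if the FIFO at `path` can be opened for writing right now.

    Hypothetical helper: it reports whether some process currently holds
    the FIFO open for reading, without ever blocking.
    """
    try:
        # O_NONBLOCK makes a write-only open fail with ENXIO instead of
        # blocking when no reader has the FIFO open.
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as exc:
        if exc.errno in (errno.ENXIO, errno.ENOENT):
            # ENXIO: FIFO exists but has no reader; ENOENT: no FIFO at all.
            return False
        raise
    os.close(fd)
    return True
```

Like the stat() check, the result can be stale by the time the caller acts on it, so a real client would probably keep the descriptor from a successful open and write to it directly.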

Comment 6 Lennart Poettering 2012-03-13 15:02:28 UTC

(In reply to comment #3)
> Consider this: a service A.service uses a socket A.socket. Now both are
> started, and then you stop A.socket but A.service continues to run. I think in
> that case the service should have every right to continue to use it, and we
> shouldn't break its current usage. In fact I think I originally deleted those
> sockets, but ran into a problem with the deletion behaviour which is why I added
> this comment. Don't remember the problem in question however.

An example: udev has told systemd to listen on an AF_UNIX socket on its behalf. If now udev's socket unit is stopped it's not systemd's call to break udev, it must be udev's control when that happens. I.e. systemd is only working here on behalf of somebody else and hence should muck with the stuff as little as possible.

But yeah, I think an option for this, DeleteSocketsOnStop=yes, would make sense, so that we can cover Peter's use case. (But then again, I think it would make a lot more sense if clients simply made use of sockets instead of checking whether they exist beforehand, because that is necessarily racy. And using FIFOs for any serious communication is very questionable too, since FIFOs are vulnerable to multiple clients writing at the same time and getting their data interleaved. What are the DM tools doing there to avoid this problem, btw? It really sounds as if you should just use proper sockets instead of FIFOs and that your client should just connect to them, instead of having "first-stat()-then-connect()" logic.)
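
For illustration, a minimal Python sketch of the "just connect" approach (not part of the original thread). It assumes an AF_UNIX stream socket at a path chosen by the caller; the function name try_connect and the timeout parameter are hypothetical. A missing socket node and a stale node with no listener are both treated as "service not available", so no separate stat() step is needed.

```python
import socket


def try_connect(path: str, timeout: float = 1.0):
    """Connect to the AF_UNIX stream socket at `path`.

    Returns a connected socket, or None if nobody is listening there.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except (FileNotFoundError, ConnectionRefusedError):
        # ENOENT: no socket node in the file system.
        # ECONNREFUSED: the node is still there, but nothing listens on it.
        sock.close()
        return None
    except OSError:
        # Anything else (permissions, timeout, ...) is left to the caller.
        sock.close()
        raise
    return sock
```

The second error case is the one a stat() check cannot see: the file exists, yet the connection is refused.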

Comment 7 Peter Rajnoha 2012-03-13 15:27:27 UTC

(In reply to comment #6)
> But yeah, I think an option for this, DeleteSocketsOnStop=yes would make sense,

Yes, that would be fine.

> so that we can cover Peter's usecase. (But then again, I think it would make a
> lot more sense if clients simply make use of sockets instead of checking
> whether they exist before, because that is necessarily racy. And using FIFOs
> for any serious communication is very questionable too, since FIFOs are
> vulnerable to multiple clients writing at the same time and getting their stuff
> interleaved. What are the DM tools doing there to avoid this problem, btw?

(as discussed on #systemd)

Comment 8 Zbigniew Jędrzejewski-Szmek 2013-04-14 00:39:46 UTC

I think that DeleteSocketsOnStop=yes/no wouldn't really solve the proposed use case, as one would have to first check if DeleteSocketsOnStop is set for a specific .socket, and then do the "quick stat check".

If anything, it seems to me that systemd could be smart enough to know if anything is serving on a socket. I.e. if systemd has activated a service that has Accept=no, and the service is still active, then don't delete the socket. Otherwise delete it when the socket is stopped.
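
The rule proposed in this comment can be written down as a small decision function. This is only a Python sketch of the proposal as worded here, not systemd code; the function and parameter names are hypothetical.

```python
def should_remove_socket_node(accept: bool, service_active: bool) -> bool:
    """Decide whether to delete the socket/FIFO node when the .socket unit stops.

    accept:         value of the socket unit's Accept= setting
    service_active: whether the service activated by this socket is still active
    """
    if not accept and service_active:
        # With Accept=no the running service holds the listening socket
        # itself, so the node is kept for it.
        return False
    # Nothing is serving on the node any more: remove it.
    return True
```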

Comment 9 Lennart Poettering 2013-04-15 17:07:58 UTC

(In reply to comment #8)
> I think that DeleteSocketsOnStop=yes/no wouldn't really solve the proposed
> use case, as one would have to first check if DeleteSocketsOnStop is set for
> a specific .socket, and then do the "quick stat check".

Well, but since the .socket unit is usually written by the same folks who write the client library for this, they can rely on it being set.

Of course, doing this is racy, since there's a time window between when the process died and when systemd gets the SIGCHLD for it and can remove the socket. Which makes me a bit cool on adding this feature. We should do better than adding new functionality that is inherently racy from day 1.
 
> If anything, it seems to me that systemd could be smart enough to know if
> anything is serving on a socket. I.e. if systemd has activated a service
> that has Accept=no, and the service is still active, then don't delete the
> socket. Otherwise delete it when the socket is stopped.

Well, but that would open an additional race...

Not sure how much I like this. Maybe we should find a different way to handle this use case without mucking with the socket node...

Comment 10 Zbigniew Jędrzejewski-Szmek 2013-04-15 18:18:07 UTC

(In reply to comment #9)
> > If anything, it seems to me that systemd could be smart enough to know if
> > anything is serving on a socket. I.e. if systemd has activated a service
> > that has Accept=no, and the service is still active, then don't delete the
> > socket. Otherwise delete it when the socket is stopped.
> 
> Well, but that would open an additional race...
I don't see the race. systemd is running synchronously and in the .socket stop job it can check if the activated unit is active.

I see a different race (maybe that's what you're talking about): if the .service dies during .socket shutdown, we might *not* remove the FIFO or unix socket. But that's current behaviour, and it's not really a race, since the file will also be left behind if the .service dies later.
 
> Not sure how much I like this. Maybe we should find a different way to handle
> this usecase without mucking with the socket node...

Comment 11 Lennart Poettering 2013-05-06 17:08:31 UTC

(In reply to comment #10)
> (In reply to comment #9)
> > > If anything, it seems to me that systemd could be smart enough to know if
> > > anything is serving on a socket. I.e. if systemd has activated a service
> > > that has Accept=no, and the service is still active, then don't delete the
> > > socket. Otherwise delete it when the socket is stopped.
> > 
> > Well, but that would open an additional race...
> I don't see the race. systemd is running synchronously and in the .socket
> stop job it can check if the activated unit is active.

Well, it would be a race where clients would get different errors on shutdown... i.e. if they are quick enough they hang for a while and then, when systemd closes the socket, they will get ECONNRESET or so. Or, if they are slower, then they will get ECONNREFUSED or so...

Now, which error they get is probably not too important, but it does suck a bit if a client's connect() succeeded and then fails, rather than connect() failing right away, if you see what I mean...

Comment 12 Zbigniew Jędrzejewski-Szmek 2013-06-16 00:21:31 UTC

> Now, which error they get is probably not too important, but it does suck a
> bit if a client's connect() succeeded and then fails, rather than connect()
> failing right away, if you see what I mean...
Yeah, I see the trouble. And if the socket was deleted before the service was stopped? Are existing connections allowed to finish when the socket file is deleted?

Comment 13 Peter Rajnoha 2014-06-13 13:50:17 UTC

It seems this is now in the latest systemd version, 214, as the "RemoveOnStop" option. Thanks!
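
A sketch of how the option might be used in a socket unit (not part of the original thread). The unit file name and FIFO path are made up for illustration. ListenFIFO= and RemoveOnStop= are settings documented in systemd.socket(5), and RemoveOnStop= defaults to off.

```ini
# example.socket (hypothetical unit name and path)
[Socket]
ListenFIFO=/run/example/example.fifo
# Remove the FIFO node from the file system when this socket unit is stopped.
RemoveOnStop=yes
```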
