Bug 982376

Summary: systemd-notify doesn't work, since it lives too short
Product: Red Hat Enterprise Linux 7 Reporter: Chris Feist <cfeist>
Component: systemd Assignee: systemd-maint
Status: CLOSED DUPLICATE QA Contact: qe-baseos-daemons
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0 CC: apevec, btotty, cfeist, derekh, johannbg, jpazdziora, jruemker, jscotka, jsynacek, lnykryn, metherid, mschmidt, msekleta, plautrba, riehecky, rvokal, systemd-maint-list, systemd-maint, tlavigne
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: systemd-219 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 820448 Environment:
Last Closed: 2016-01-25 21:25:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1117877    
Bug Blocks: 959971, 812219, 1113520, 1203710    

Description Chris Feist 2013-07-08 20:43:40 UTC
+++ This bug was initially created as a clone of Bug #820448 +++

Using a test systemd service unit with
Type=notify

and a test script that contains
systemd-notify --ready
the notification sometimes succeeds and sometimes fails with the following message in /var/log/messages:
May 10 00:37:31 laptop systemd[1]: Cannot find unit for notify message of PID 3495.

When the command succeeds,
> sudo systemctl start test1.service

returns almost immediately; when it fails, it times out after 90 seconds.

Below are the files I am using, along with the package versions:

$ rpm -qa | grep -i -e systemd
systemd-units-37-19.fc16.x86_64
systemd-37-19.fc16.x86_64
systemd-sysv-37-19.fc16.x86_64

$ getenforce 
Permissive

$ cat /lib/systemd/system/test1.service 
[Unit]
Description=Tests

[Service]
User=derekh
Type=notify
ExecStart=/tmp/testservice
NotifyAccess=all

$ cat /tmp/testservice
#!/bin/bash

sleep 1

systemd-notify --ready

echo Sleeping
sleep 300
echo Done

--- Additional comment from Michal Schmidt on 2012-05-10 04:36:44 EDT ---

(In reply to comment #0)
> May 10 00:37:31 laptop systemd[1]: Cannot find unit for notify message of PID
> 3495.

systemd-notify sends a message to $NOTIFY_SOCKET and then exits.

When systemd receives the notification, the systemd-notify process may have already exited and been reaped by bash.

Ideally the cgroups membership information would be delivered with the message over the socket.

From http://0pointer.de/blog/projects/plumbers-wishlist-3.html:

AF_UNIX:

* An auxiliary meta data message for AF_UNIX called SCM_CGROUPS (or something like that), i.e. a way to attach sender cgroup membership to messages sent via AF_UNIX. This is useful in case services such as syslog shall be shared among various containers (or service cgroups), and the syslog implementation needs to be able to distinguish the sending cgroup in order to separate the logs on disk. Of course stm SCM_CREDENTIALS can be used to look up the PID of the sender followed by a check in /proc/$PID/cgroup, but that is necessarily racy, and actually a very real race in real life.
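The race described above can be illustrated with a short Python sketch. It is not systemd's actual code; the helper names `receive_with_pid` and `cgroup_of` are hypothetical, and it assumes a Linux receiver that has enabled SO_PASSCRED on its datagram socket. It shows the two-step lookup (sender PID from SCM_CREDENTIALS, then /proc/$PID/cgroup) and why the second step can come back empty once the sender has exited and been reaped.

```python
import os
import socket
import struct

# struct ucred on Linux: pid_t pid, uid_t uid, gid_t gid
UCRED = struct.Struct("iII")


def receive_with_pid(sock):
    """Receive one datagram; return (payload, sender PID or None).

    Assumes SO_PASSCRED was enabled on `sock` before the message was sent.
    """
    data, ancdata, _flags, _addr = sock.recvmsg(
        4096, socket.CMSG_SPACE(UCRED.size))
    for level, ctype, cdata in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_CREDENTIALS:
            pid, _uid, _gid = UCRED.unpack(cdata[:UCRED.size])
            return data, pid
    return data, None


def cgroup_of(pid):
    """Return the contents of /proc/<pid>/cgroup, or None if the process is gone.

    This is the racy step: a short-lived sender such as systemd-notify may
    already have exited and been reaped by the time this runs.
    """
    try:
        with open("/proc/%d/cgroup" % pid) as f:
            return f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
```

When `cgroup_of` returns None, the receiver has no way to tell which service sent the message, which would correspond to the "Cannot find unit for notify message of PID ..." log line in comment 0.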


As an alternative fix, the notification could be made synchronous.

--- Additional comment from Alan Pevec on 2012-06-27 13:41:54 EDT ---

(In reply to comment #1)
> systemd-notify sends a message to $NOTIFY_SOCKET and then exits.
> 
> When systemd receives the notification, the systemd-notify process may have
> already exited and been reaped by bash.

So, as a workaround, the daemon process itself should send READY=1 to $NOTIFY_SOCKET
instead of forking the "systemd-notify" command.
We need this for OpenStack daemons; here is _untested_ code in Python
(modeled after the sd_notify implementation in sd-daemon.c):

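import os
import socket

# Send READY=1 from the daemon's own PID so systemd can map it to the unit.
s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
e = os.getenv('NOTIFY_SOCKET')  # set by systemd for Type=notify units
s.connect(e)
s.sendall(b"READY=1")  # bytes, so this also works on Python 3
s.close()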

--- Additional comment from Alan Pevec on 2012-09-18 10:43:15 EDT ---

The script in comment 2 doesn't work with systemd-188-3.fc18, where NOTIFY_SOCKET is an abstract namespace socket:
http://cgit.freedesktop.org/systemd/systemd/commit/?id=29252e9e5bad3b0bcfc45d9bc761aee4b0ece1da

It needs special handling if the notification socket name starts with @: convert to bytes and replace the leading '@' with a 0 byte.

--- Additional comment from Alan Pevec on 2012-09-18 18:36:01 EDT ---

Patch for example script in comment 2:

 e = os.getenv('NOTIFY_SOCKET')
+if e.startswith('@'):
+    # abstract namespace socket
+    e = '\0%s' % e[1:]
 s.connect(e)
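
Putting the script from comment 2 and this patch together, a self-contained version might look like the sketch below. The function name `send_notify` and its `env` parameter are hypothetical (the latter exists only to make the function easy to exercise outside systemd); this is not the sd_notify implementation from sd-daemon.c, just a minimal Python sketch of the same idea, assuming Linux.

```python
import os
import socket


def send_notify(state, env=None):
    """Send `state` (e.g. "READY=1") to the socket named in NOTIFY_SOCKET.

    Returns False if NOTIFY_SOCKET is not set, True once the datagram is sent.
    `env` defaults to os.environ; passing a dict is only a testing convenience.
    """
    env = os.environ if env is None else env
    path = env.get("NOTIFY_SOCKET")
    if not path:
        return False  # not started by systemd with Type=notify

    if path.startswith("@"):
        # Abstract namespace socket: the leading '@' stands for a NUL byte.
        addr = b"\0" + os.fsencode(path[1:])
    else:
        addr = path

    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.sendto(state.encode(), addr)
    return True
```

A daemon would call `send_notify("READY=1")` from its own long-lived process once initialization is complete, so the sending PID still exists when systemd processes the message.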

--- Additional comment from Fedora End Of Life on 2013-04-03 13:44:40 EDT ---

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 4 RHEL Program Management 2014-03-24 05:51:25 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 7 Jan Synacek 2015-06-24 11:36:11 UTC
The original bug was fixed a long time ago.

Comment 8 Lukáš Nykrýn 2015-06-24 12:27:56 UTC
This still does not work from systemd's point of view.