Description of problem: sd_notify() does not work anymore with systemd-246.6-3.fc33.x86_64 and timeouts while waiting for messages from /run/systemd/notify: | watchdog.sh[5582]: Failed to invoke barrier: Connection timed out E.g. I have a service ---------- [Unit] Description=Local watchdog service After=local-fs.target [Service] Type=notify ExecStart=/usr/local/bin/watchdog.sh TimeoutStartSec=20 WatchdogSec=80 Restart=on-failure RestartSec=5 StartLimitInterval=5min StartLimitBurst=4 #StartLimitAction=reboot-force NotifyAccess=all [Install] WantedBy=multi-user.target ---------- where /usr/local/bin/watchdog.sh does a | systemd-notify --pid=$$ --ready This message is never received by systemd. E.g. starting the service shows | # strace -t -f -p 1 | ... | 5559 1606819757.618195 sendmsg(3, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, msg_namelen=22, msg_iov=[{iov_base="READY=1\nMAINPID=5558", iov_len=20}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, cmsg_data={pid=5558, uid=0, gid=0}}], msg_controllen=32, msg_flags=0}, MSG_NOSIGNAL) = 20 | ... | 5559 1606819757.618721 sendmsg(5, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, msg_namelen=22, msg_iov=[{iov_base="BARRIER=1", iov_len=9}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], msg_controllen=24, msg_flags=0}, MSG_NOSIGNAL) = 9 | ... | 5559 1606819757.618819 ppoll([{fd=3, events=0}], 1, {tv_sec=5, tv_nsec=0}, NULL, 8) = 0 (Timeout) | ... | 5559 1606819762.624536 writev(2, [{iov_base="\33[0;1;31m", iov_len=9}, {iov_base="Failed to invoke barrier: Connec"..., iov_len=46}, {iov_base="\33[0m", iov_len=4}, {iov_base="\n", iov_len=1}], 4) = 60 | ... | 5558 1606819762.629753 +++ exited with 1 +++ | 1 1606819762.629828 <... epoll_wait resumed>[{EPOLLIN, {u32=2106666096, u64=93868616920176}}, {EPOLLIN, {u32=2106666768, u64=93868616920848}}], 126, 20573) = 2 This means, systemd (pid 1) never saw messages receiving in /run/systemd/notify. But systemd listens on this socket | # lsof -p 1 | systemd 1 root 30u unix 0x00000000917b3fba 0t0 18667 /run/systemd/notify type=DGRAM Restarting systemd shows some messages in journalctl | # systemctl daemon-reexec | # journalctl | Dez 01 12:04:57 systemd[1]: Cannot find unit for notify message of PID 4175, ignoring. | Dez 01 12:04:57 systemd[1]: Cannot find unit for notify message of PID 4230, ignoring. | Dez 01 12:04:57 systemd[1]: Failed to receive notification message: Exchange full Problem has been shown during an update from F32 (where it worked flawlessly) to F33. Version-Release number of selected component (if applicable): systemd-246.6-3.fc33.x86_64 How reproducible: 100% Steps to Reproduce: 1. create services + the script as shown above 2. try to start the service Actual results: service fails to start
Reload systemd several times (system-ctl dameon-reload) until only "Failed to receive notification message: Exchange full" is reported but not not "Cannot find unit for notify message ..." shows that the next messages are accepted again. But then systemd enters the "Exchange full" state immediately: ----- 10199 sendmsg(3, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, msg_namelen=22, msg_iov=[{iov_base="READY=1\nMAINPID=10198", iov_len=21}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, cmsg_data={pid=10198, uid=0, gid=0}}], msg_controllen=32, msg_flags=0}, MSG_NOSIGNAL) = 21 10199 close(3) = 0 10199 pipe2( <unfinished ...> 1 <... epoll_wait resumed>[{EPOLLIN, {u32=342814432, u64=94051536662240}}], 137, 25099) = 1 10199 <... pipe2 resumed>[3, 4], O_CLOEXEC) = 0 10199 socket(AF_UNIX, SOCK_DGRAM|SOCK_CLOEXEC, 0) = 5 10199 getsockopt(5, SOL_SOCKET, SO_SNDBUF, <unfinished ...> 1 recvmsg(32, <unfinished ...> 10199 <... getsockopt resumed>[212992], [4]) = 0 10199 setsockopt(5, SOL_SOCKET, SO_SNDBUF, [8388608], 4) = 0 10199 getsockopt(5, SOL_SOCKET, SO_SNDBUF, <unfinished ...> 1 <... recvmsg resumed>{msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="READY=1\nMAINPID=10198", iov_len=4096}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, cmsg_data={pid=10198, uid=0, gid=0}}], msg_controllen=32, msg_flags=MSG_CMSG_CLOEXEC}, MSG_TRUNC|MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 21 10199 <... getsockopt resumed>[425984], [4]) = 0 10199 setsockopt(5, SOL_SOCKET, SO_SNDBUFFORCE, [8388608], 4) = 0 10199 getuid() = 0 1 openat(AT_FDCWD, "/proc/10198/cgroup", O_RDONLY|O_CLOEXEC <unfinished ...> 10199 geteuid() = 0 10199 getgid() = 0 10199 getegid() = 0 1 <... openat resumed>) = 86 10199 sendmsg(5, {msg_name={sa_family=AF_UNIX, sun_path="/run/systemd/notify"}, msg_namelen=22, msg_iov=[{iov_base="BARRIER=1", iov_len=9}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], msg_controllen=24, msg_flags=0}, MSG_NOSIGNAL <unfinished ...> 1 fstat(86, <unfinished ...> 10199 <... sendmsg resumed>) = 9 10199 close(5 <unfinished ...> 1 <... fstat resumed>{st_mode=S_IFREG|0444, st_size=0, ...}) = 0 10199 <... close resumed>) = 0 1 read(86, <unfinished ...> 10199 close(4) = 0 1 <... read resumed>"0::/system.slice/local-watchdog.service\n", 1024) = 40 10199 ppoll([{fd=3, events=0}], 1, {tv_sec=5, tv_nsec=0}, NULL, 8 <unfinished ...> 1 ioctl(86, TCGETS, 0x7fff9ee4f910) = -1 ENOTTY (Unpassender IOCTL (I/O-Control) für das Gerät) 1 read(86, "", 1024) = 0 1 close(86) = 0 1 sendmsg(31, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="PRIORITY=6\nSYSLOG_FACILITY=3\nCODE_FILE=src/core/job.c\nCODE_LINE=941\nCODE_FUNC=job_log_done_status_m"..., iov_len=132}, {iov_base="MESSAGE=Started Local watchdog service.", iov_len=39}, {iov_base="\n", iov_len=1}, {iov_base="JOB_ID=25283", iov_len=12}, {iov_base="\n", iov_len=1}, {iov_base="JOB_TYPE=start", iov_len=14}, {iov_base="\n", iov_len=1}, {iov_base="JOB_RESULT=done", iov_len=15}, {iov_base="\n", iov_len=1}, {iov_base="UNIT=local-watchdog.service", iov_len=27}, {iov_base="\n", iov_len=1}, {iov_base="INVOCATION_ID=2045e8e776b3405d9e577f57403837fe", iov_len=46}, {iov_base="\n", iov_len=1}, {iov_base="MESSAGE_ID=39f53479d3a045ac8e11786248231fbf", iov_len=43}, {iov_base="\n", iov_len=1}], msg_iovlen=15, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 335 1 sendmsg(26, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\0013\3\0\0\24\0\0\0\306\0\0\0\1\1o\09\0\0\0/org/freedesktop/systemd1/unit/local_2dwatchdog_2eservice\0\0\0\0\0\0\0\2\1s\0\37\0\0\0org"..., iov_len=216}, {iov_base=" \0\0\0org.freedesktop.systemd1.Service\0\0\0\0\374\1\0\0\0\0\0\0\7\0\0\0MainPID\0\1u\0\0\326'\0\0\0\0\0\0\n\0\0\0ControlPID\0\1u\0\0\0\0\0\0\0\n\0\0"..., iov_len=819}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 1035 1 sendmsg(26, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\244\3\0\0\25\0\0\0\306\0\0\0\1\1o\09\0\0\0/org/freedesktop/systemd1/unit/local_2dwatchdog_2eservice\0\0\0\0\0\0\0\2\1s\0\37\0\0\0org"..., iov_len=216}, {iov_base="\35\0\0\0org.freedesktop.systemd1.Unit\0\0\0\\\3\0\0\v\0\0\0ActiveState\0\1s\0\0\6\0\0\0active\0\0\f\0\0\0FreezerState\0\1s\0\7\0\0\0run"..., iov_len=932}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 1148 1 sendmsg(18, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\0013\3\0\0\327\20\1\0\236\0\0\0\1\1o\09\0\0\0/org/freedesktop/systemd1/unit/local_2dwatchdog_2eservice\0\0\0\0\0\0\0\2\1s\0\37\0\0\0org"..., iov_len=176}, {iov_base=" \0\0\0org.freedesktop.systemd1.Service\0\0\0\0\374\1\0\0\0\0\0\0\7\0\0\0MainPID\0\1u\0\0\326'\0\0\0\0\0\0\n\0\0\0ControlPID\0\1u\0\0\0\0\0\0\0\n\0\0"..., iov_len=819}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 995 1 sendmsg(18, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\244\3\0\0\330\20\1\0\236\0\0\0\1\1o\09\0\0\0/org/freedesktop/systemd1/unit/local_2dwatchdog_2eservice\0\0\0\0\0\0\0\2\1s\0\37\0\0\0org"..., iov_len=176}, {iov_base="\35\0\0\0org.freedesktop.systemd1.Unit\0\0\0\\\3\0\0\v\0\0\0ActiveState\0\1s\0\0\6\0\0\0active\0\0\f\0\0\0FreezerState\0\1s\0\7\0\0\0run"..., iov_len=932}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 1108 1 sendmsg(26, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1Q\0\0\0\26\0\0\0\242\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0"..., iov_len=184}, {iov_base="\303b\0\0#\0\0\0/org/freedesktop/systemd1/job/25283\0\26\0\0\0local-watchdog.service\0\0\4\0\0\0done\0", iov_len=81}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 265 1 sendmsg(18, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1Q\0\0\0\331\20\1\0z\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0 \0\0\0org.freedesktop.systemd1.Manager\0\0\0"..., iov_len=144}, {iov_base="\303b\0\0#\0\0\0/org/freedesktop/systemd1/job/25283\0\26\0\0\0local-watchdog.service\0\0\4\0\0\0done\0", iov_len=81}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 225 1 close(29) = 0 1 close(30) = 0 1 close(34) = 0 1 close(36) = 0 1 ioctl(0, TCGETS, 0x7fff9ee4b260) = -1 ENOTTY (Unpassender IOCTL (I/O-Control) für das Gerät) 1 ioctl(1, TCGETS, 0x7fff9ee4b260) = -1 ENOTTY (Unpassender IOCTL (I/O-Control) für das Gerät) 1 ioctl(2, TCGETS, 0x7fff9ee4b260) = -1 ENOTTY (Unpassender IOCTL (I/O-Control) für das Gerät) 1 sendto(12, {{len=124, type=0x46a /* AUDIT_??? */, flags=NLM_F_REQUEST|NLM_F_ACK, seq=24, pid=0}, "\x75\x6e\x69\x74\x3d\x6c\x6f\x63\x61\x6c\x2d\x77\x61\x74\x63\x68\x64\x6f\x67\x20\x63\x6f\x6d\x6d\x3d\x22\x73\x79\x73\x74\x65\x6d\x64\x22\x20\x65\x78\x65\x3d\x22\x2f\x75\x73\x72\x2f\x6c\x69\x62\x2f\x73\x79\x73\x74\x65\x6d\x64\x2f\x73\x79\x73\x74\x65\x6d\x64\x22\x20\x68\x6f\x73\x74\x6e\x61\x6d\x65\x3d\x3f\x20\x61\x64\x64\x72\x3d\x3f\x20\x74\x65\x72\x6d\x69\x6e\x61\x6c\x3d\x3f\x20\x72\x65\x73\x3d"...}, 124, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 124 1 poll([{fd=12, events=POLLIN}], 1, 500) = 1 ([{fd=12, revents=POLLIN}]) 1 recvfrom(12, {{len=36, type=NLMSG_ERROR, flags=NLM_F_CAPPED, seq=24, pid=-548192972}, {error=0, msg={len=124, type=0x46a /* AUDIT_??? */, flags=NLM_F_REQUEST|NLM_F_ACK, seq=24, pid=0}}}, 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, [12]) = 36 1 recvfrom(12, {{len=36, type=NLMSG_ERROR, flags=NLM_F_CAPPED, seq=24, pid=-548192972}, {error=0, msg={len=124, type=0x46a /* AUDIT_??? */, flags=NLM_F_REQUEST|NLM_F_ACK, seq=24, pid=0}}}, 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, [12]) = 36 1 socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 29 1 connect(29, {sa_family=AF_UNIX, sun_path=@"/org/freedesktop/plymouthd"}, 29) = -1 ECONNREFUSED (Verbindungsaufbau abgelehnt) 1 close(29) = 0 1 sendmsg(26, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\0013\3\0\0\27\0\0\0\306\0\0\0\1\1o\09\0\0\0/org/freedesktop/systemd1/unit/local_2dwatchdog_2eservice\0\0\0\0\0\0\0\2\1s\0\37\0\0\0org"..., iov_len=216}, {iov_base=" \0\0\0org.freedesktop.systemd1.Service\0\0\0\0\374\1\0\0\0\0\0\0\7\0\0\0MainPID\0\1u\0\0\326'\0\0\0\0\0\0\n\0\0\0ControlPID\0\1u\0\0\0\0\0\0\0\n\0\0"..., iov_len=819}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EPIPE (Datenübergabe unterbrochen (broken pipe)) 1 sendmsg(18, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\0013\3\0\0\332\20\1\0\236\0\0\0\1\1o\09\0\0\0/org/freedesktop/systemd1/unit/local_2dwatchdog_2eservice\0\0\0\0\0\0\0\2\1s\0\37\0\0\0org"..., iov_len=176}, {iov_base=" \0\0\0org.freedesktop.systemd1.Service\0\0\0\0\374\1\0\0\0\0\0\0\7\0\0\0MainPID\0\1u\0\0\326'\0\0\0\0\0\0\n\0\0\0ControlPID\0\1u\0\0\0\0\0\0\0\n\0\0"..., iov_len=819}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 995 1 sendmsg(18, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="l\4\1\1\204\3\0\0\333\20\1\0\236\0\0\0\1\1o\09\0\0\0/org/freedesktop/systemd1/unit/local_2dwatchdog_2eservice\0\0\0\0\0\0\0\2\1s\0\37\0\0\0org"..., iov_len=176}, {iov_base="\35\0\0\0org.freedesktop.systemd1.Unit\0\0\0<\3\0\0\v\0\0\0ActiveState\0\1s\0\0\6\0\0\0active\0\0\f\0\0\0FreezerState\0\1s\0\7\0\0\0run"..., iov_len=900}], msg_iovlen=2, msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 1076 1 gettid() = 1 1 epoll_ctl(4, EPOLL_CTL_MOD, 26, {0, {u32=344176544, u64=94051538024352}}) = 0 1 timerfd_settime(19, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=1}}, NULL) = 0 1 epoll_wait(4, [{EPOLLIN, {u32=342814432, u64=94051536662240}}, {EPOLLHUP, {u32=344176544, u64=94051538024352}}, {EPOLLIN, {u32=341727168, u64=94051535574976}}], 137, 25069) = 3 1 read(19, "\1\0\0\0\0\0\0\0", 8) = 8 1 recvmsg(32, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="BARRIER=1", iov_len=4096}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, cmsg_data={pid=10199, uid=0, gid=0}}], msg_controllen=32, msg_flags=MSG_CTRUNC|MSG_CMSG_CLOEXEC}, MSG_TRUNC|MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 9 10199 <... ppoll resumed>) = 1 ([{fd=3, revents=POLLHUP}], left {tv_sec=4, tv_nsec=998263952}) 1 sendmsg(31, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="PRIORITY=3\nSYSLOG_FACILITY=3\nCODE_FILE=src/core/manager.c\nCODE_LINE=2397\nCODE_FUNC=manager_dispatch"..., iov_len=136}, {iov_base="MESSAGE=", iov_len=8}, {iov_base="Failed to receive notification message: Exchange full", iov_len=53}, {iov_base="\n", iov_len=1}], msg_iovlen=4, msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL <unfinished ...> 10199 close(3) = 0 1 <... sendmsg resumed>) = 198 1 epoll_ctl(4, EPOLL_CTL_DEL, 32, NULL) = 0 1 gettid() = 1 1 timerfd_settime(19, TFD_TIMER_ABSTIME, {it_interval={tv_sec=0, tv_nsec=0}, it_value={tv_sec=0, tv_nsec=1}}, NULL) = 0 1 epoll_wait(4, [{EPOLLHUP, {u32=344176544, u64=94051538024352}}, {EPOLLIN, {u32=341727168, u64=94051535574976}}], 137, 0) = 2 1 read(19, "\1\0\0\0\0\0\0\0", 8) = 8 10199 exit_group(0 <unfinished ...> 1 epoll_ctl(4, EPOLL_CTL_DEL, 26, NULL <unfinished ...> 10199 <... exit_group resumed>) = ? 1 <... epoll_ctl resumed>) = 0 1 close(26) = 0 10199 +++ exited with 0 +++ ---- E.g. the last message reports the "Exchange full".
In | 1 recvmsg(32, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="BARRIER=1", iov_len=4096}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_SOCKET, cmsg_type=SCM_CREDENTIALS, cmsg_data={pid=10199, uid=0, gid=0}}], msg_controllen=32, msg_flags=MSG_CTRUNC|MSG_CMSG_CLOEXEC}, MSG_TRUNC|MSG_DONTWAIT|MSG_CMSG_CLOEXEC) = 9 the MSG_TRUNC is translated to -EXFULL. It seems that in manager_dispatch_notify_fd() there is not enough space in the buffer so that systemd chokes on sd_notify_barrier() and never recovers (which is the real bad thing which opens vectors for denial-of-service attacks).
A more simple reproducer: NOTIFY_SOCKET=/run/systemd/notify systemd-notify --ready NOTIFY_SOCKET=/run/systemd/notify systemd-notify --ready --> second notify hangs
Caused by a dont-audit SELinux rule | type=AVC msg=audit(1606850930.657:1138): avc: denied { write } for pid=1 comm="systemd" path="pipe:[657841]" dev="pipefs" ino=657841 scontext=system_u:system_r:init_t:s0 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=fifo_file permissive=0 But problem with completely disabling notify after such conditions still stands.
FEDORA-2021-dcc8b357ea has been submitted as an update to Fedora 33. https://bodhi.fedoraproject.org/updates/FEDORA-2021-dcc8b357ea
FEDORA-2021-dcc8b357ea has been pushed to the Fedora 33 testing repository. Soon you'll be able to install the update with the following command: `sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-dcc8b357ea` You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-dcc8b357ea See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.
FEDORA-2021-dcc8b357ea has been pushed to the Fedora 33 stable repository. If problem still persists, please make note of it in this bug report.