Bug 2051991

Summary: systemd complains that sendmail and sm-client PID files can't be read after service start
Product: Red Hat Enterprise Linux 8 Reporter: Jonathan Kamens <jik>
Component: systemd Assignee: systemd maint <systemd-maint>
Status: CLOSED WONTFIX QA Contact: Frantisek Sumsal <fsumsal>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: CentOS Stream CC: bstinson, fhrdina, jwboyer, systemd-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2023-08-08 07:28:35 UTC Type: Bug
Attachments:
Description Flags
patch none

Description Jonathan Kamens 2022-02-08 13:42:50 UTC
Created attachment 1859798
patch

Jan 17 18:25:32 jik4 systemd[1]: sm-client.service: Failed to parse PID from file /run/sm-client.pid: Invalid argument
Jan 17 20:25:03 jik4 systemd[1]: sendmail.service: Can't open PID file /run/sendmail.pid (yet?) after start: No such file or directory

The fix is to put a brief sleep in an ExecStartPost command in each service's unit file, giving the daemon time to fork and write its PID file from the child.

See the attached patch. It introduces delays that I've determined empirically (by restarting sendmail and sm-client daily with these delays in place for several weeks) to be long enough for the PID files to be created.

Comment 1 Jonathan Kamens 2022-02-12 21:18:57 UTC
0.2 seconds, the delay in my diff, apparently isn't a long enough sleep on reboot. I just rebooted and got the error about sendmail.pid not existing yet. I suggest using 0.3 instead of 0.2.

Comment 2 Jonathan Kamens 2022-02-19 20:28:19 UTC
Welp, apparently 0.3 isn't long enough either, so I suppose at least 0.4 seconds is necessary. Rounding up to 1 second probably wouldn't hurt anything and would leave a comfortable margin.

Comment 4 Jaroslav Škarvada 2023-07-13 16:12:17 UTC
Both of these are systemd issues.

> Jan 17 18:25:32 jik4 systemd[1]: sm-client.service: Failed to parse PID from file /run/sm-client.pid: Invalid argument

This is because /run/sm-client.pid has historically contained two lines:
# cat /run/sm-client.pid
6018
/usr/sbin/sendmail -L sm-msp-queue -Ac -q1h

I.e. the PID and the command. It has behaved this way for decades, and it's unlikely sendmail upstream would accept a change because of backward compatibility, so systemd should cope with it.
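
To illustrate what "coping with it" could mean, here is a minimal Python sketch of a tolerant reader that takes the PID from the first line only and ignores the command line that follows. It is not systemd's implementation; the function name read_pid_from_file is hypothetical and chosen only for this example.

```python
from pathlib import Path


def read_pid_from_file(path):
    """Return the PID found on the first line of a PID file.

    Hypothetical helper for illustration only: any lines after the first
    (such as the command line sendmail appends) are ignored.
    """
    lines = Path(path).read_text().splitlines()
    if not lines:
        raise ValueError(f"{path}: PID file is empty")
    first = lines[0].strip()
    # Accept only plain ASCII digits that form a positive integer.
    if not (first.isascii() and first.isdigit()) or int(first) <= 0:
        raise ValueError(f"{path}: first line is not a PID: {first!r}")
    return int(first)
```

For the two-line file shown above, read_pid_from_file("/run/sm-client.pid") would return 6018. An empty file raises ValueError, which is one way a reader could distinguish a file that has not been fully written yet.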

> Jan 17 20:25:03 jik4 systemd[1]: sendmail.service: Can't open PID file /run/sendmail.pid (yet?) after start: No such file or directory

This is because of the way the sendmail daemon is written: the parent process can exit before the child's PID is written. Again, it has behaved this way for decades, and it's unlikely sendmail upstream would accept a rewrite of the daemon's forking routine. AFAIK systemd contains some mitigations for legacy daemons that behave this way, so I don't know why they don't work in your case. I think that for a forking service systemd could wait for the PID file; a wait limit of, e.g., 1 second shouldn't cause any harm. Empirical sleeps in the service file aren't the way to go. Maybe this is just a diagnostic message from systemd saying that the PID file wasn't there when the parent process exited, and systemd will read the PID correctly once the file appears. In that case there could be a mechanism for silencing such diagnostic messages on production systems.
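
The difference between a fixed sleep and a bounded wait can be sketched in Python as follows. This is an illustration of the idea only, not how systemd behaves; the function name wait_for_pid_file and its timeout and interval parameters are assumptions made for this example.

```python
import os
import time


def wait_for_pid_file(path, timeout=1.0, interval=0.05):
    """Poll until `path` exists and is non-empty, or `timeout` seconds pass.

    Hypothetical helper for illustration only. Returns True if the file
    appeared in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            if os.path.getsize(path) > 0:
                return True
        except OSError:
            pass  # file not created yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Unlike a fixed sleep, this returns as soon as the file shows up, so a generous limit costs nothing in the common case and only delays the failure report when the file never appears.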

Comment 5 RHEL Program Management 2023-08-08 07:28:35 UTC
After evaluating this issue, there are no plans to address it further or fix it in an upcoming release.  Therefore, it is being closed.  If plans change such that this issue will be fixed in an upcoming release, then the bug can be reopened.