Bug 2028015 - postfix master.pid file not deleted at postfix stop if postfix had been started using postfix start command
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: postfix
Version: 8.4
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Jaroslav Škarvada
QA Contact: František Hrdina
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-12-01 10:16 UTC by Roger Sewell
Modified: 2023-07-07 12:45 UTC

Fixed In Version: postfix-3.5.8-4.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-10 15:29:44 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
Proposed fix (1014 bytes, patch)
2021-12-14 00:16 UTC, Jaroslav Škarvada


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | RHELPLAN-104427 | 0 | None | None | None | 2021-12-01 10:16:27 UTC
Red Hat Product Errata | RHBA-2022:2091 | 0 | None | None | None | 2022-05-10 15:29:47 UTC

Description Roger Sewell 2021-12-01 10:16:09 UTC
Description of problem: If postfix is started from the command line rather than at boot, then when postfix is stopped (whether by postfix stop or by shutdown) the master.pid file doesn't get deleted, so then at next boot postfix doesn't start (but does delete the master.pid file).


Version-Release number of selected component (if applicable):
postfix 2:3.5.8-1.el8


How reproducible: Always


Steps to Reproduce (as root):
1. If postfix is already running, execute postfix stop.

2. If the just-stopped postfix had been started at boot time, then no /var/spool/postfix/pid/master.pid file should exist, and none does. (If it had been started by postfix start from the command line, this file still exists even though the process whose pid it holds has stopped.)

3. Execute postfix start.

4. Note that /var/spool/postfix/pid/master.pid now contains the pid of the running postfix/master process.

5. Execute postfix stop.

6. Note that /var/spool/postfix/pid/master.pid still exists, even though the master process has been stopped. (Don't delete it yet though.)

7. Reboot the machine.

8. Note that postfix fails to start, leaving the message 

postfix/postfix-script[1959]: fatal: the Postfix mail system is already running

in /var/log/maillog .

9. Note, however, that /var/spool/postfix/pid/master.pid has now been deleted.

10. Reboot the machine again - this time postfix starts normally.

Actual results: As noted in point 6 above.


Expected results: Whenever postfix is stopped, no matter how it has been started, the master.pid file should be deleted.
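
The manual checks in steps 4 and 6 can be scripted. What follows is a minimal sketch, not part of Postfix: the helper name pidfile_state is made up for illustration, and it assumes it is run as root, because kill -0 against another user's process fails with a permission error and would be misreported as stale.

```shell
# Hypothetical helper (not shipped with Postfix): classify a pid file as
# absent, invalid, live or stale.
pidfile_state() {
    file=$1
    pid=
    if [ ! -e "$file" ]; then
        echo "absent"
        return 0
    fi
    # read strips the blanks that may pad the pid inside the file
    read -r pid < "$file" || true
    case $pid in
        ''|*[!0-9]*) echo "invalid"; return 0 ;;
    esac
    # kill -0 sends no signal; it only tests whether the pid can be signalled
    if kill -0 "$pid" 2>/dev/null; then
        echo "live"
    else
        echo "stale"
    fi
}

# Path taken from the report above.
pidfile_state /var/spool/postfix/pid/master.pid
```

After step 6 this would be expected to print "stale", which is the leftover file described under "Actual results".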


Additional info:

Comment 1 Jaroslav Škarvada 2021-12-08 20:32:19 UTC
It seems to be caused by the SELinux policy; try with 'setenforce 0'. Unfortunately, postfix-script doesn't remove the pid file upon stop. It uses flock to test for an exclusive lock to find out whether the pid file is valid. This is an upstream design decision. The following happens here:

- 'postfix start' creates and locks the /var/spool/postfix/pid/master.pid
- 'postfix stop' unlocks the /var/spool/postfix/pid/master.pid
- 'systemctl start postfix' runs 'postfix start'
- it runs '/usr/libexec/postfix/postfix-script start'
- it runs '/usr/libexec/postfix/master -t' which tries to exclusively lock the /var/spool/postfix/pid/master.pid
- the flock call fails because it's not allowed by the SELinux policy
- '/usr/libexec/postfix/master -t' thinks that the flock fails because the daemon is already running
- the condition propagates back to systemd, which reports the service as failed
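
The lock-based validity test in the list above can be imitated with flock(1) from util-linux. This is only an illustration of the idea, not Postfix's own code; the helper name pidfile_locked and the temporary file are assumptions made for the demo. Note that the helper treats any failure to lock as "locked", which is the same ambiguity that turns an SELinux denial into "already running".

```shell
# Hypothetical helper: succeed (exit 0) when an exclusive lock on the file
# cannot be obtained right now.
pidfile_locked() {
    # -n = do not wait, -x = exclusive lock. flock runs "true" only if it
    # got the lock, so a failure means "held by another process" OR
    # "locking was refused for some other reason".
    ! flock -n -x "$1" true 2>/dev/null
}

demo=$(mktemp)
flock -x "$demo" sleep 2 &      # stands in for a running master process
sleep 1
pidfile_locked "$demo" && echo "lock held: daemon looks running"
wait                            # the holder exits and the lock is released
pidfile_locked "$demo" || echo "no lock: leftover pid file is harmless"
rm -f "$demo"
```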

Caught AVCs and syscalls:
type=AVC msg=audit(1638995374.054:368): avc:  denied  { read write } for  pid=10520 comm="master" name="master.pid" dev="vda1" ino=14680387 scontext=system_u:system_r:postfix_master_t:s0 tcontext=unconfined_u:object_r:var_run_t:s0 tclass=file permissive=1
type=AVC msg=audit(1638995374.054:368): avc:  denied  { open } for  pid=10520 comm="master" path="/var/spool/postfix/pid/master.pid" dev="vda1" ino=14680387 scontext=system_u:system_r:postfix_master_t:s0 tcontext=unconfined_u:object_r:var_run_t:s0 tclass=file permissive=1
type=SYSCALL msg=audit(1638995374.054:368): arch=c000003e syscall=257 success=yes exit=11 a0=ffffff9c a1=5583463b8200 a2=2 a3=0 items=0 ppid=10513 pid=10520 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="master" exe="/usr/libexec/postfix/master" subj=system_u:system_r:postfix_master_t:s0 key=(null)ARCH=x86_64 SYSCALL=openat AUID="unset" UID="root" GID="root" EUID="root" SUID="root" FSUID="root" EGID="root" SGID="root" FSGID="root"
type=AVC msg=audit(1638995374.054:369): avc:  denied  { getattr } for  pid=10520 comm="master" path="/var/spool/postfix/pid/master.pid" dev="vda1" ino=14680387 scontext=system_u:system_r:postfix_master_t:s0 tcontext=unconfined_u:object_r:var_run_t:s0 tclass=file permissive=1
type=SYSCALL msg=audit(1638995374.054:369): arch=c000003e syscall=5 success=yes exit=0 a0=b a1=7fff8d35b030 a2=7fff8d35b030 a3=0 items=0 ppid=10513 pid=10520 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="master" exe="/usr/libexec/postfix/master" subj=system_u:system_r:postfix_master_t:s0 key=(null)ARCH=x86_64 SYSCALL=fstat AUID="unset" UID="root" GID="root" EUID="root" SUID="root" FSUID="root" EGID="root" SGID="root" FSGID="root"
type=AVC msg=audit(1638995374.054:370): avc:  denied  { lock } for  pid=10520 comm="master" path="/var/spool/postfix/pid/master.pid" dev="vda1" ino=14680387 scontext=system_u:system_r:postfix_master_t:s0 tcontext=unconfined_u:object_r:var_run_t:s0 tclass=file permissive=1
type=SYSCALL msg=audit(1638995374.054:370): arch=c000003e syscall=73 success=yes exit=0 a0=b a1=6 a2=7ff4c7d914e0 a3=0 items=0 ppid=10513 pid=10520 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="master" exe="/usr/libexec/postfix/master" subj=system_u:system_r:postfix_master_t:s0 key=(null)ARCH=x86_64 SYSCALL=flock AUID="unset" UID="root" GID="root" EUID="root" SUID="root" FSUID="root" EGID="root" SGID="root" FSGID="root"
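
To make records like the ones above easier to read, they can be condensed to one line per denial. The following is a rough sketch: summarise_avc is a hypothetical helper, and it assumes the field layout seen in the AVC lines of this report.

```shell
# Hypothetical helper: read audit records on stdin and print each AVC denial
# as "source_type -> target_type:class { permissions }".
# Non-AVC records (for example the SYSCALL lines) are skipped.
summarise_avc() {
    sed -n 's/.*avc: *denied *{ *\([^}]*[^ }]\) *}.*scontext=[^:]*:[^:]*:\([^:]*\):.*tcontext=[^:]*:[^:]*:\([^:]*\):.* tclass=\([^ ]*\).*/\2 -> \3:\4 { \1 }/p' |
        sort -u
}

# One record copied from the log above.
summarise_avc <<'EOF'
type=AVC msg=audit(1638995374.054:370): avc:  denied  { lock } for  pid=10520 comm="master" path="/var/spool/postfix/pid/master.pid" dev="vda1" ino=14680387 scontext=system_u:system_r:postfix_master_t:s0 tcontext=unconfined_u:object_r:var_run_t:s0 tclass=file permissive=1
EOF

# On a live system (as root, with the audit tools installed) the input
# could come from:  ausearch -m AVC -c master | summarise_avc
```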

Without SELinux it works as expected, i.e. the 'master -t' detects that the daemon is not running and systemd starts the postfix service despite the existing pid file.

Reassigning to the selinux-policy component.

Comment 2 Zdenek Pytela 2021-12-09 07:54:49 UTC
From the SELinux point of view, this is correct behaviour: the /usr/sbin/postfix command is executed in the caller's domain, which in this case is probably unconfined_t, so no file transitions apply to the pid file. When the postfix service is handled by systemctl start and stop, no problem is expected.

Comment 3 Roger Sewell 2021-12-09 10:31:44 UTC
First, I confirm that stopping and starting postfix using

systemctl stop postfix.service
systemctl start postfix.service

works fine and then the pid file is correctly handled.

Second, however, I would note that 

man postfix 

gives no suggestion that postfix stop and postfix start won't work properly. Given the above, it should contain at least a warning to use the systemctl commands for this purpose. Moreover, ideally, use of postfix stop or postfix start should either fail with an error message telling root to use the systemctl commands, or should work properly.

Comment 4 Zdenek Pytela 2021-12-09 13:00:01 UTC
Ok, reassigning back to postfix.

Comment 8 Jaroslav Škarvada 2021-12-13 13:33:36 UTC
After discussion with the SELinux guys, let's first try a workaround in the postfix service file that runs restorecon on master.pid.

Comment 9 Jaroslav Škarvada 2021-12-14 00:16:36 UTC
Created attachment 1846159 [details]
Proposed fix

Comment 10 Roger Sewell 2021-12-14 10:08:10 UTC
Jaroslav,

Thank you for posting this proposed fix.

It does partially work -- i.e. it removes the worst effect of this bug, which is that the postfix system didn't restart after a system reboot if postfix start and postfix stop had been used before. But it doesn't cause the tidy removal of the master.pid file after postfix stop. It may be that this is the effect you intended, and it is certainly a huge improvement on the previous situation, which left an obscure bug not fixed by a reboot.

What I did to test it: I inserted the line
ExecStartPre=-/usr/sbin/restorecon -R /var/spool/postfix/pid/master.pid
just before the other ExecStartPre lines in /usr/lib/systemd/system/postfix.service 
- i.e. I didn't recompile the package, just tested the effect of the final change to the system.
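
For anyone repeating this test, the same line could be carried in a systemd drop-in instead of an edit to the packaged unit file, which a package update may overwrite. This is a sketch under assumptions: the helper name and the drop-in file name are arbitrary, and entries from a drop-in are appended after the unit's existing ExecStartPre lines rather than placed before them as in the test above.

```shell
# Hypothetical helper: write a drop-in carrying the ExecStartPre line
# tested above into the directory given as $1.
write_restorecon_dropin() {
    dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/10-restorecon-pidfile.conf" <<'EOF'
[Service]
# The leading "-" tells systemd to ignore a failure of this command.
ExecStartPre=-/usr/sbin/restorecon -R /var/spool/postfix/pid/master.pid
EOF
}

# As root, on the real system:
#   write_restorecon_dropin /etc/systemd/system/postfix.service.d
#   systemctl daemon-reload
#   systemctl restart postfix.service
```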

I then started postfix.service by
systemctl stop postfix.service
systemctl daemon-reload
systemctl restart postfix.service

and looked at the pid file (ll is my alias for ls -l):
ll -Z /var/spool/postfix/pid/master.pid
-rw-------. 1 root root system_u:object_r:postfix_var_run_t:s0 33 Dec 14 09:36 /var/spool/postfix/pid/master.pid

Then I ran postfix stop -- so far so good:
ll -Z /var/spool/postfix/pid/master.pid
ls: cannot access '/var/spool/postfix/pid/master.pid': No such file or directory

Then I ran postfix start, followed by postfix stop (i.e. starting and stopping not via systemctl), and got
ll -Z /var/spool/postfix/pid/master.pid
-rw-------. 1 root root unconfined_u:object_r:var_run_t:s0 33 Dec 14 09:36 /var/spool/postfix/pid/master.pid
i.e. the file still exists and has the wrong SELinux context, when it should have been removed.
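
The two listings differ in the type field of the SELinux context (postfix_var_run_t versus var_run_t). A small sketch for checking this without reading ls output by eye follows; both helper names are hypothetical, and has_expected_type assumes GNU stat, whose %C format prints the context.

```shell
# Hypothetical helper: print the type (third field) of an SELinux context.
context_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Hypothetical check: does the file ($1) carry the expected type ($2)?
has_expected_type() {
    [ "$(context_type "$(stat -c %C "$1" 2>/dev/null)")" = "$2" ]
}

# The two contexts seen in the listings above:
context_type "system_u:object_r:postfix_var_run_t:s0"   # after systemctl start
context_type "unconfined_u:object_r:var_run_t:s0"       # after 'postfix start'

# On the affected system:
#   has_expected_type /var/spool/postfix/pid/master.pid postfix_var_run_t \
#       || echo "pid file has an unexpected label"
```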

But it does then restart on a system reboot or even just systemctl start postfix.service -- perhaps that was your only intention. I.e. it is a workaround which is certainly a huge improvement (and which I will leave in place), but it is not a clean fix.

Thank you,
Roger.

Comment 11 Jaroslav Škarvada 2021-12-15 17:11:07 UTC
(In reply to Roger Sewell from comment #10)

Is it a problem that the PID file isn't removed?

The problem is that there are two different approaches here:
1) postfix upstream: it does not delete the PID file, but checks its validity through locking
2) systemd: it deletes the invalid PID file

It's usually not a good approach to mix controlling services through systemd and non-systemd interfaces, and I think it's also unsupported in RHEL.

For 1) we would have to patch postfix to delete the PID file, which would be a divergence from the upstream behaviour, and that is something we are trying to avoid. Maybe an upstream discussion could be opened.

Comment 12 Roger Sewell 2021-12-15 17:18:31 UTC
Jaroslav,

No, it isn't a problem for me that the PID file isn't removed - it just didn't seem a clean solution. You have explained your dilemma, which I understand - I'm therefore happy with your fix.

Thank you for the explanation - I hadn't realised about the two different upstreams.

Roger.

Comment 13 Zdenek Pytela 2022-01-13 11:01:34 UTC
Switching the component to postfix, as that seems to be where the work continues now, but I will keep monitoring it in case selinux-policy needs updating, too.

Comment 15 Jaroslav Škarvada 2022-02-17 22:48:10 UTC
Cloning to RHEL-9.

Comment 16 Jaroslav Škarvada 2022-02-17 22:52:29 UTC
RHEL-9 lightweight clone bug 2055915.

Comment 23 errata-xmlrpc 2022-05-10 15:29:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (postfix bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2091

Comment 24 FS3000 2022-11-30 16:21:31 UTC
Just hit this bug, postfix-3.5.8-4.el8.x86_64 on 8.7.

Not sure if this matters, but I did "systemctl enable --now postfix" instead of just "systemctl start postfix".

Comment 25 Dora O’Fee 2023-07-07 12:45:46 UTC
(In reply to FS3000 from comment #24)

Please refer to https://bugzilla.redhat.com/show_bug.cgi?id=2162659#c3.

