Bug 1058654

Summary: systemd kills and restarts postgrey repeatedly
Product: [Fedora] Fedora Reporter: Joel Uckelman <uckelman>
Component: postgreyAssignee: Christopher Meng <i>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 20CC: i, mark, matthias, scarlet, vg.aetera
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: postgrey-1.34-12.fc20 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1055638 Environment:
Last Closed: 2014-02-15 06:57:30 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Joel Uckelman 2014-01-28 09:44:38 UTC
Description of problem:

When I start postgrey with 'systemctl start postgrey', it looks like systemd starts and then systemd kills it, over and over again several times per minute. Here's a single cycle of what I see in /var/log/messages:

Jan 28 10:35:36 localhost systemd: Starting Postfix Greylisting Service...
Jan 28 10:35:36 localhost postgrey: 2014/01/28-10:35:36 postgrey (type Net::Server::Multiplex) starting! pid(26215)
Jan 28 10:35:36 localhost postgrey: Binding to UNIX socket file "/var/spool/postfix/postgrey/socket"
Jan 28 10:35:36 localhost postgrey: Setting gid to "991 991"
Jan 28 10:35:36 localhost postgrey: Setting uid to "993"
Jan 28 10:35:41 localhost systemd: postgrey.service operation timed out. Terminating.
Jan 28 10:35:41 localhost postgrey: 2014/01/28-10:35:41 Server closing!
Jan 28 10:35:41 localhost systemd: Failed to start Postfix Greylisting Service.
Jan 28 10:35:41 localhost systemd: Unit postgrey.service entered failed state.
Jan 28 10:35:42 localhost systemd: postgrey.service holdoff time over, scheduling restart.
Jan 28 10:35:42 localhost systemd: Stopping Postfix Greylisting Service...

As far as I can tell, postgrey is starting up just fine, so it's unclear to me what's going on. I don't see the pidfile at /var/run/postgrey.pid---if that's not being written, is that causing systemd thinks postgrey didn't start?

Version-Release number of selected component (if applicable):

postgrey-1.34-11.fc20.noarch

How reproducible:

every time

Steps to Reproduce:
1. systemctl start postgrey
2.
3.

Actual results:

postgrey starts and is killed and starts and is killed...

Expected results:

postgrey should start once

Additional information:

I didn't have this problem in Fedora 19, but I'm not sure what version of postgrey I had there.

Comment 1 Christopher Meng 2014-01-28 10:09:41 UTC
I do change the service file with more strict rules for f20+.

One of the modification in the service file:

+TimeoutSec=5

This will try to restart the service if timed out. I'm not sure why you failed to start the service. 

Looking into the log I saw that it was trying to setuid and gid, yes I added these in f20, I thought it was useful but now it seems that there probably exists a problem... 

Can you do these in the service file and report again? 

1. Comment out the restart timeout line mentioned above, then try starting the service. 

2. Comment out 

--group=postgrey \ --user=postgrey \

And try again

Comment 2 Joel Uckelman 2014-01-28 10:47:04 UTC
(In reply to Christopher Meng from comment #1)
> I do change the service file with more strict rules for f20+.
> 
> One of the modification in the service file:
> 
> +TimeoutSec=5
> 
> This will try to restart the service if timed out. I'm not sure why you
> failed to start the service. 
> 
> Looking into the log I saw that it was trying to setuid and gid, yes I added
> these in f20, I thought it was useful but now it seems that there probably
> exists a problem... 
> 
> Can you do these in the service file and report again? 
> 
> 1. Comment out the restart timeout line mentioned above, then try starting
> the service. 

I commented out TimeoutSec in /usr/lib/systemd/system/postgrey.service.
Then:

systemctl dameon-reload
systemctl stop postgrey
systemctl start postgrey

'systemctl start postgrey' waits a long time and then reports failure:

"Job for postgrey.service failed. See 'systemctl status postgrey.service' and 'journalctl -xn' for details."

[root@one ~]# journalctl -xn
-- Logs begin at Sat 2014-01-25 13:51:44 CET, end at Tue 2014-01-28 11:39:28 CET
Jan 28 11:39:27 one postgrey[28130]: 2014/01/28-11:39:27 Server closing!
Jan 28 11:39:27 one systemd[1]: Failed to start Postfix Greylisting Service.
-- Subject: Unit postgrey.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit postgrey.service has failed.
-- 
-- The result is failed.
Jan 28 11:39:27 one systemd[1]: Unit postgrey.service entered failed state.
Jan 28 11:39:28 one systemd[1]: postgrey.service holdoff time over, scheduling r
Jan 28 11:39:28 one systemd[1]: Stopping Postfix Greylisting Service...
-- Subject: Unit postgrey.service has begun shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit postgrey.service has begun shutting down.
Jan 28 11:39:28 one systemd[1]: Starting Postfix Greylisting Service...
-- Subject: Unit postgrey.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit postgrey.service has begun starting up.
Jan 28 11:39:28 one postgrey[28137]: 2014/01/28-11:39:28 postgrey (type Net::Ser
Jan 28 11:39:28 one postgrey[28137]: Binding to UNIX socket file "/var/spool/pos
Jan 28 11:39:28 one postgrey[28137]: Setting gid to "991 991"
Jan 28 11:39:28 one postgrey[28137]: Setting uid to "993"


This appears just to make the kill-restart cycle longer.

> 2. Comment out 
> 
> --group=postgrey \ --user=postgrey \
> 
> And try again

I wasn't sure if you meant #1 and #2 to be applied independently or cumulatively. With #1 and #2 together, I get the same result as with #1 alone. With #2 alone, I get the same kill-restart cycle as I do with no modifications to postgrey.service.

Comment 3 Joel Uckelman 2014-01-28 10:47:30 UTC
Also: In no case do I see a pidfile being written to /var/run.

Comment 4 Mark Mielke 2014-02-01 04:58:57 UTC
If I start the command from the command line, I get a /var/run/postgrey.pid. But, if I ask systemd to start postgrey, it does not generate /var/run/postgrey.pid, and it seems likely that systemd is believing that postgrey didn't properly start as a result?

Comment 5 Mark Mielke 2014-02-01 06:51:19 UTC
It looks like it is the --greylist-text option that is causing the problem.

In particular, this seems to work:

 --greylist-text="Greylisted_for_%%s_seconds" \

But, this seems to fail:

 --greylist-text="Greylisted_for_%%s_seconds" \

I think there is an incorrect expectation around what type of content is allowed to be contained in ExecStartPre=. In particular, this seems to also fail:

 --greylist-text='Greylisted for %%s seconds' \

But, this WORKS:

 '--greylist-text=Greylisted for %%s seconds' \

I'm not a systemd expert, but I would say that:

 --parameter="value"

Is only valid as /bin/sh syntax and NOT valid as systemd ExecStartPre= syntax.

So, I suggest using this:

 '--greylist-text=Greylisted for %%s seconds' \

Comment 6 Mark Mielke 2014-02-01 06:52:26 UTC
Err typo above, sorry... the second failure should have been:

 --greylist-text="Greylisted for %%s seconds" \

That is, the original syntax in the current RPM version that fails...

Comment 7 Mark Mielke 2014-02-01 06:57:41 UTC
And, it's ExecStart=, not ExecStartPre=. Eyes bleary. Sorry for the noise, but didn't want to leave it incorrect.

Comment 8 Joel Uckelman 2014-02-02 20:34:48 UTC
(In reply to Mark Mielke from comment #5)
> 
> So, I suggest using this:
> 
>  '--greylist-text=Greylisted for %%s seconds' \

I concur, this does appear to work. The pidfile is created and systemd doesn't get into a loop of killing and restarting postgrey.

Comment 9 Christopher Meng 2014-02-03 01:37:23 UTC
*** Bug 1059738 has been marked as a duplicate of this bug. ***

Comment 10 Christopher Meng 2014-02-06 02:44:49 UTC
Hmm...

Is it a flaw in systemd? I think that I need to report a new bug against systemd.

Comment 11 Fedora Update System 2014-02-06 02:51:52 UTC
postgrey-1.34-12.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/postgrey-1.34-12.fc20

Comment 12 Fedora Update System 2014-02-07 03:07:48 UTC
Package postgrey-1.34-12.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing postgrey-1.34-12.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-2114/postgrey-1.34-12.fc20
then log in and leave karma (feedback).

Comment 13 Joel Uckelman 2014-02-10 14:09:05 UTC
The fix works for me.

Comment 14 Vladislav Grigoryev 2014-02-14 05:08:30 UTC
You can increase this update karma to force pushing to stable if you log in to admin.fedoraproject.org before leaving a feedback.

Comment 15 Fedora Update System 2014-02-15 06:57:30 UTC
postgrey-1.34-12.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.