1058654 – systemd kills and restarts postgrey repeatedly

Bug 1058654 - systemd kills and restarts postgrey repeatedly

Summary: systemd kills and restarts postgrey repeatedly

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	postgrey
Sub Component:
Version:	20
Hardware:	x86_64
OS:	Linux
Priority:	unspecified
Severity:	high
Target Milestone:	---
Assignee:	Christopher Meng
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1059738 (view as bug list)
Depends On:
Blocks:
TreeView+	depends on / blocked

Reported:	2014-01-28 09:44 UTC by Joel Uckelman
Modified:	2014-02-15 06:57 UTC (History)
CC List:	5 users (show)
Fixed In Version:	postgrey-1.34-12.fc20
Clone Of:	1055638
Environment:
Last Closed:	2014-02-15 06:57:30 UTC
Type:	Bug
Embargoed:
Dependent Products:

Attachments	(Terms of Use)

Description Joel Uckelman 2014-01-28 09:44:38 UTC

Description of problem:

When I start postgrey with 'systemctl start postgrey', it looks like systemd starts and then systemd kills it, over and over again several times per minute. Here's a single cycle of what I see in /var/log/messages:

Jan 28 10:35:36 localhost systemd: Starting Postfix Greylisting Service...
Jan 28 10:35:36 localhost postgrey: 2014/01/28-10:35:36 postgrey (type Net::Server::Multiplex) starting! pid(26215)
Jan 28 10:35:36 localhost postgrey: Binding to UNIX socket file "/var/spool/postfix/postgrey/socket"
Jan 28 10:35:36 localhost postgrey: Setting gid to "991 991"
Jan 28 10:35:36 localhost postgrey: Setting uid to "993"
Jan 28 10:35:41 localhost systemd: postgrey.service operation timed out. Terminating.
Jan 28 10:35:41 localhost postgrey: 2014/01/28-10:35:41 Server closing!
Jan 28 10:35:41 localhost systemd: Failed to start Postfix Greylisting Service.
Jan 28 10:35:41 localhost systemd: Unit postgrey.service entered failed state.
Jan 28 10:35:42 localhost systemd: postgrey.service holdoff time over, scheduling restart.
Jan 28 10:35:42 localhost systemd: Stopping Postfix Greylisting Service...

As far as I can tell, postgrey is starting up just fine, so it's unclear to me what's going on. I don't see the pidfile at /var/run/postgrey.pid---if that's not being written, is that causing systemd thinks postgrey didn't start?

Version-Release number of selected component (if applicable):

postgrey-1.34-11.fc20.noarch

How reproducible:

every time

Steps to Reproduce:
1. systemctl start postgrey
2.
3.

Actual results:

postgrey starts and is killed and starts and is killed...

Expected results:

postgrey should start once

Additional information:

I didn't have this problem in Fedora 19, but I'm not sure what version of postgrey I had there.

Comment 1 Christopher Meng 2014-01-28 10:09:41 UTC

I do change the service file with more strict rules for f20+.

One of the modification in the service file:

+TimeoutSec=5

This will try to restart the service if timed out. I'm not sure why you failed to start the service. 

Looking into the log I saw that it was trying to setuid and gid, yes I added these in f20, I thought it was useful but now it seems that there probably exists a problem... 

Can you do these in the service file and report again? 

1. Comment out the restart timeout line mentioned above, then try starting the service. 

2. Comment out 

--group=postgrey \ --user=postgrey \

And try again

Comment 2 Joel Uckelman 2014-01-28 10:47:04 UTC

(In reply to Christopher Meng from comment #1)
> I do change the service file with more strict rules for f20+.
> 
> One of the modification in the service file:
> 
> +TimeoutSec=5
> 
> This will try to restart the service if timed out. I'm not sure why you
> failed to start the service. 
> 
> Looking into the log I saw that it was trying to setuid and gid, yes I added
> these in f20, I thought it was useful but now it seems that there probably
> exists a problem... 
> 
> Can you do these in the service file and report again? 
> 
> 1. Comment out the restart timeout line mentioned above, then try starting
> the service. 

I commented out TimeoutSec in /usr/lib/systemd/system/postgrey.service.
Then:

systemctl dameon-reload
systemctl stop postgrey
systemctl start postgrey

'systemctl start postgrey' waits a long time and then reports failure:

"Job for postgrey.service failed. See 'systemctl status postgrey.service' and 'journalctl -xn' for details."

[root@one ~]# journalctl -xn
-- Logs begin at Sat 2014-01-25 13:51:44 CET, end at Tue 2014-01-28 11:39:28 CET
Jan 28 11:39:27 one postgrey[28130]: 2014/01/28-11:39:27 Server closing!
Jan 28 11:39:27 one systemd[1]: Failed to start Postfix Greylisting Service.
-- Subject: Unit postgrey.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit postgrey.service has failed.
-- 
-- The result is failed.
Jan 28 11:39:27 one systemd[1]: Unit postgrey.service entered failed state.
Jan 28 11:39:28 one systemd[1]: postgrey.service holdoff time over, scheduling r
Jan 28 11:39:28 one systemd[1]: Stopping Postfix Greylisting Service...
-- Subject: Unit postgrey.service has begun shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit postgrey.service has begun shutting down.
Jan 28 11:39:28 one systemd[1]: Starting Postfix Greylisting Service...
-- Subject: Unit postgrey.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit postgrey.service has begun starting up.
Jan 28 11:39:28 one postgrey[28137]: 2014/01/28-11:39:28 postgrey (type Net::Ser
Jan 28 11:39:28 one postgrey[28137]: Binding to UNIX socket file "/var/spool/pos
Jan 28 11:39:28 one postgrey[28137]: Setting gid to "991 991"
Jan 28 11:39:28 one postgrey[28137]: Setting uid to "993"


This appears just to make the kill-restart cycle longer.

> 2. Comment out 
> 
> --group=postgrey \ --user=postgrey \
> 
> And try again

I wasn't sure if you meant #1 and #2 to be applied independently or cumulatively. With #1 and #2 together, I get the same result as with #1 alone. With #2 alone, I get the same kill-restart cycle as I do with no modifications to postgrey.service.

Comment 3 Joel Uckelman 2014-01-28 10:47:30 UTC

Also: In no case do I see a pidfile being written to /var/run.

Comment 4 Mark Mielke 2014-02-01 04:58:57 UTC

If I start the command from the command line, I get a /var/run/postgrey.pid. But, if I ask systemd to start postgrey, it does not generate /var/run/postgrey.pid, and it seems likely that systemd is believing that postgrey didn't properly start as a result?

Comment 5 Mark Mielke 2014-02-01 06:51:19 UTC

It looks like it is the --greylist-text option that is causing the problem.

In particular, this seems to work:

 --greylist-text="Greylisted_for_%%s_seconds" \

But, this seems to fail:

 --greylist-text="Greylisted_for_%%s_seconds" \

I think there is an incorrect expectation around what type of content is allowed to be contained in ExecStartPre=. In particular, this seems to also fail:

 --greylist-text='Greylisted for %%s seconds' \

But, this WORKS:

 '--greylist-text=Greylisted for %%s seconds' \

I'm not a systemd expert, but I would say that:

 --parameter="value"

Is only valid as /bin/sh syntax and NOT valid as systemd ExecStartPre= syntax.

So, I suggest using this:

 '--greylist-text=Greylisted for %%s seconds' \

Comment 6 Mark Mielke 2014-02-01 06:52:26 UTC

Err typo above, sorry... the second failure should have been:

 --greylist-text="Greylisted for %%s seconds" \

That is, the original syntax in the current RPM version that fails...

Comment 7 Mark Mielke 2014-02-01 06:57:41 UTC

And, it's ExecStart=, not ExecStartPre=. Eyes bleary. Sorry for the noise, but didn't want to leave it incorrect.

Comment 8 Joel Uckelman 2014-02-02 20:34:48 UTC

(In reply to Mark Mielke from comment #5)
> 
> So, I suggest using this:
> 
>  '--greylist-text=Greylisted for %%s seconds' \

I concur, this does appear to work. The pidfile is created and systemd doesn't get into a loop of killing and restarting postgrey.

Comment 9 Christopher Meng 2014-02-03 01:37:23 UTC

*** Bug 1059738 has been marked as a duplicate of this bug. ***

Comment 10 Christopher Meng 2014-02-06 02:44:49 UTC

Hmm...

Is it a flaw in systemd? I think that I need to report a new bug against systemd.

Comment 11 Fedora Update System 2014-02-06 02:51:52 UTC

postgrey-1.34-12.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/postgrey-1.34-12.fc20

Comment 12 Fedora Update System 2014-02-07 03:07:48 UTC

Package postgrey-1.34-12.fc20:
* should fix your issue,
* was pushed to the Fedora 20 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing postgrey-1.34-12.fc20'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2014-2114/postgrey-1.34-12.fc20
then log in and leave karma (feedback).

Comment 13 Joel Uckelman 2014-02-10 14:09:05 UTC

The fix works for me.

Comment 14 Vladislav Grigoryev 2014-02-14 05:08:30 UTC

You can increase this update karma to force pushing to stable if you log in to admin.fedoraproject.org before leaving a feedback.

Comment 15 Fedora Update System 2014-02-15 06:57:30 UTC

postgrey-1.34-12.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Note You need to log in before you can comment on or make changes to this bug.