Bug 703215

Summary: [PATCH] fix PostgreSQL boot issue
Product: [Fedora] Fedora
Reporter: Michał Piotrowski <mkkp4x4>
Component: postgresql
Assignee: Tom Lane <tgl>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 15
CC: devrim, hhorak, notting, tgl
Hardware: Unspecified
OS: Unspecified
Fixed In Version: postgresql-9.0.4-2.fc15
Doc Type: Bug Fix
Last Closed: 2011-05-19 05:11:40 UTC
Attachments: init script patch (flags: none)

Description Michał Piotrowski 2011-05-09 16:51:09 UTC
Description of problem:
The PostgreSQL init script doesn't provide LSB information about boot order.

Comment 1 Michał Piotrowski 2011-05-09 16:51:49 UTC
Created attachment 497866
init script patch

Comment 2 Tom Lane 2011-05-09 18:20:22 UTC
The documentation I've been able to find about init blocks, such as
http://wiki.debian.org/LSBInitScripts
suggests that this patch is rather incomplete.  Shouldn't we have Default-Start and Default-Stop lines too?

Comment 3 Michał Piotrowski 2011-05-09 18:25:52 UTC
For F14, yes. But in systemd there is no such thing as a runlevel, so it seems to me that these lines are not needed. If you think otherwise, just add Default-Start and Default-Stop :)

Comment 4 Michał Piotrowski 2011-05-09 18:31:11 UTC
BTW. How about
Required-Start: $local_fs $network $syslog
Required-Stop: $local_fs $network $syslog
?
Most systemd services start after syslog.target, so adding this might be reasonable.

Comment 5 Tom Lane 2011-05-09 18:41:49 UTC
Yeah, I will add Default-Start and Default-Stop --- these scripts are not, in my mind, meant to be for systemd platforms only.

I had already come to the conclusion that $syslog would be a good idea.  I am also wondering about using $remote_fs instead of $local_fs --- it's not unheard of for people to put databases on storage outside /var, which is the only filesystem tree that's guaranteed to be mounted by $local_fs according to what I'm reading.  Also that Debian page suggests that $remote_fs is a better idea during shutdown.  Do you see any particular downside to that choice?

Comment 6 Michał Piotrowski 2011-05-09 18:51:54 UTC
(In reply to comment #5)
> Yeah, I will add Default-Start and Default-Stop --- these scripts are not, in
> my mind, meant to be for systemd platforms only.
> 
> I had already come to the conclusion that $syslog would be a good idea.  I am
> also wondering about using $remote_fs instead of $local_fs --- it's not unheard
> of for people to put databases on storage outside /var, which is the only
> filesystem tree that's guaranteed to be mounted by $local_fs according to what
> I'm reading.  Also that Debian page suggests that $remote_fs is a better idea
> during shutdown.  Do you see any particular downside to that choice?

No, I don't. Httpd uses both $remote_fs and $local_fs
Required-Start: $local_fs $remote_fs $network

Comment 7 Tom Lane 2011-05-09 19:14:53 UTC
Hmm ... that Debian page says "Scripts depending on $remote_fs do not need to depend on $local_fs".
However, it seems like a good conservative thing to depend on both, so I'll do it like httpd.

Comment 8 Tom Lane 2011-05-09 19:31:30 UTC
Uh oh ... experimentation proves that this:

### BEGIN INIT INFO
# Provides: postgresql
# Required-Start: $local_fs $remote_fs $network $syslog
# Required-Stop: $local_fs $remote_fs $network $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: start and stop PostgreSQL server
# Description: PostgreSQL database server
### END INIT INFO

does not actually fix the problem.  Does "$network" not include a guarantee that DNS service is up??

Comment 9 Bill Nottingham 2011-05-09 19:41:27 UTC
It depends.

$network can be provided by either the legacy init.d/network script, or NetworkManager.

init.d/network is synchronous; NM is not by default.

NM can be made synchronous by:

1) in F-14 and prior:

  setting NETWORKWAIT=yes in /etc/sysconfig/network

2) in F-15 and later:

  running 'systemctl enable NetworkManager-wait-online.service'
  (or 'chkconfig NetworkManager-wait-online on' - it maps to the systemctl command.)
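
As a rough sketch of how one might check the F-14 knob above: the helper name networkwait_is_set is hypothetical, and the default path is the /etc/sysconfig/network file mentioned in item 1. It only inspects the file; it does not change anything.

```shell
#!/bin/sh
# Hypothetical helper: succeed if NETWORKWAIT=yes is set in a sysconfig-style file.
# The path defaults to /etc/sysconfig/network (the F-14 location named above).
networkwait_is_set() {
    conf="${1:-/etc/sysconfig/network}"
    [ -r "$conf" ] || return 1
    # Accept NETWORKWAIT=yes with optional quotes; commented-out lines do not match.
    grep -Eq "^[[:space:]]*NETWORKWAIT=[\"']?yes[\"']?[[:space:]]*\$" "$conf"
}

# On F-15 and later the comparable check is the unit's enablement state:
#   systemctl is-enabled NetworkManager-wait-online.service
```

For example, `networkwait_is_set && echo "network wait is on"` would report the F-14 setting on a machine that has the file.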

Comment 10 Michał Piotrowski 2011-05-09 19:44:42 UTC
(In reply to comment #8)
> Uh oh ... experimentation proves that this:
> 
> ### BEGIN INIT INFO
> # Provides: postgresql
> # Required-Start: $local_fs $remote_fs $network $syslog
> # Required-Stop: $local_fs $remote_fs $network $syslog
> # Default-Start: 2 3 4 5
> # Default-Stop: 0 1 6
> # Short-Description: start and stop PostgreSQL server
> # Description: PostgreSQL database server
> ### END INIT INFO
> 
> does not actually fix the problem.  Does "$network" not include a guarantee
> that DNS service is up??

There is a $named in httpd. But I don't see any systemd target installed by default that will guarantee DNS service.

Comment 11 Tom Lane 2011-05-09 19:45:44 UTC
Actually, it doesn't look to me like systemd is paying any attention whatsoever to the LSB INIT block.
Maybe I'm missing some key detail about this.  For testing purposes, I'm just editing the scripts in /etc/rc.d/init.d/.  Do I need to take some other action to whack systemd over the head and get it to re-examine the scripts?  I tried chkconfig off and then on again, but no change ...

Comment 12 Tom Lane 2011-05-09 19:47:29 UTC
(In reply to comment #10)
> There is a $named in httpd. But I don't see any systemd target installed by
> default that will guarantee DNS service.

So, let me get this straight: systemd provides no way to guarantee that either a network connection or DNS service exists before a network-dependent service is started.  Why is this bug filed against postgresql and not systemd?

Comment 13 Michał Piotrowski 2011-05-09 19:53:02 UTC
(In reply to comment #11)
> Actually, it doesn't look to me like systemd is paying any attention whatsoever
> to the LSB INIT block.
> Maybe I'm missing some key detail about this.  For testing purposes, I'm just
> editing the scripts in /etc/rc.d/init.d/.  Do I need to take some other action
> to whack systemd over the head and get it to re-examine the scripts?  I tried
> chkconfig off and then on again, but no change ...

systemctl daemon-reload ?

Comment 14 Tom Lane 2011-05-09 19:59:09 UTC
(In reply to comment #13)
> systemctl daemon-reload ?

Nope, still no change in behavior.  mysqld is demonstrably being started before the machine's wireless port has been bound to any address, and postgresql is demonstrably being started before DNS service is available, despite each having 

# Required-Start: $local_fs $remote_fs $network $named $syslog

in its initscript.  I think this needs to be kicked over to the system side of the fence.

Comment 15 Michał Piotrowski 2011-05-09 20:18:56 UTC
Strange. A simple
Required-Start: $local_fs $network
fixes the MySQL problem here.

I don't see any problem with PostgreSQL, but I have
listen_addresses = 'localhost'
in config, so I guess that it works just fine with that.

Comment 16 Tom Lane 2011-05-09 20:29:34 UTC
What I'm testing for mysql is bind-address = machine's-IP-address on a laptop with a wireless connection (and a statically assigned IP, btw, so the hard-wired address is sensible).  Possibly it being wireless has something to do with the difference in results.

I too see that setting postgres' listen_addresses = 'localhost' works.  I've added the machine's actual DNS name beside that, and that doesn't work.  Again, it's possible that this is because of the wireless rather than wired connection, but I don't see that that lets systemd off the hook.  As far as I can see, both $network and $named are useless right now because they don't guarantee a damn thing.

Comment 17 Bill Nottingham 2011-05-09 20:39:27 UTC
$named is pretty useless, in general.

It could come from the 'hosts' entry in /etc/nsswitch.conf, which could point to files, or nis, or db, or any other nsswitch module.
Given that, it could come from a local, fully populated /etc/hosts.
Or it could come from a remote nameserver specified in /etc/resolv.conf.
That could be 'nameserver localhost', which means it would come from a locally-running named (or unbound, or whatever) instance.
Or it could come from NIS.
Or (insert requirements for any other nsswitch module)...

So, any implementation which wants to 'properly' set up $named, would need to parse all these files to determine what the system is actually using for named, and then dynamically provide this as appropriate. That's certainly not doable in the old SysV code, and likely way too complex to do in systemd.
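
To illustrate only the first step of that parsing, here is a minimal sketch that prints the sources listed on the 'hosts' line. The function name hosts_sources is hypothetical, and the sketch does not attempt the harder part described above (working out what each source then depends on).

```shell
#!/bin/sh
# Hypothetical helper: print the lookup sources on the "hosts:" line
# of an nsswitch-style file (defaults to /etc/nsswitch.conf).
hosts_sources() {
    conf="${1:-/etc/nsswitch.conf}"
    # Drop comments, find the hosts: entry, and print what follows the key.
    sed -e 's/#.*//' "$conf" |
        awk '$1 == "hosts:" { $1 = ""; sub(/^ +/, ""); print; exit }'
}
```

A result such as "files dns" already shows the problem: 'files' needs nothing beyond a local filesystem, while 'dns' needs whatever /etc/resolv.conf points at, so no single $named ordering covers both.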

Comment 18 Michał Piotrowski 2011-05-09 20:40:43 UTC
(In reply to comment #16)
> What I'm testing for mysql is bind-address = machine's-IP-address on a laptop
> with a wireless connection (and a statically assigned IP, btw, so the
> hard-wired address is sensible).  Possibly it being wireless has something to
> do with the difference in results.

I don't use wireless on this machine, but it uses an address from DHCP:

DEVICE="em1"
BOOTPROTO="dhcp"
ONBOOT=yes
NM_CONTROLLED="yes"
TYPE=Ethernet

Previously I used network.service and it worked flawlessly. Now I use NetworkManager.

> 
> I too see that setting postgres' listen_addresses = 'localhost' works.  I've
> added the machine's actual DNS name beside that, and that doesn't work.  Again,
> it's possible that this is because of the wireless rather than wired
> connection, but I don't see that that lets systemd off the hook.  As far as I
> can see, both $network and $named are useless right now because they don't
> guarantee a damn thing.

I think that Lennart should look at this problem.

Comment 19 Bill Nottingham 2011-05-09 20:49:06 UTC
(In reply to comment #17)
> $named is pretty useless, in general.

... which is why I tried (to no avail) to get it struck from the standard many years ago.

Similarly, $network is horribly underspecified. 

Does it mean:
- I have PF_INET?
- I have a working loopback address?
- I have a working public address?
- I have a working public IPv6 address?

Furthermore, (as evidenced here), the dependency in some cases isn't "I have a working public address", it's "I have this *specific* public address"; this can't be well specified with just "$network". A box may be dual-homed with em0 and em1 pointing to different networks. The actual dependency might be on the address that em1 obtains, yet the network-providing service will 'succeed' if em0 has an address but em1 doesn't.
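
A dependency on one *specific* address can at least be approximated by the service itself. The following is a sketch under assumptions, not anything systemd or the initscripts provide: the function names are hypothetical, and it relies on the one-line output format of 'ip -o -4 addr show', where the address/prefix is the fourth field.

```shell
#!/bin/sh
# Hypothetical helper: read "ip -o -4 addr show" style lines on stdin and
# succeed if the wanted IPv4 address (without prefix length) is listed.
addr_in_listing() {
    awk -v want="$1" '
        $3 == "inet" { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }
    '
}

# Hypothetical helper: poll once per second until IFACE carries ADDR,
# giving up after TRIES attempts (default 30).
wait_for_addr() {
    iface="$1"; addr="$2"; tries="${3:-30}"
    while [ "$tries" -gt 0 ]; do
        ip -o -4 addr show dev "$iface" 2>/dev/null | addr_in_listing "$addr" && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

In the dual-homed example, something like `wait_for_addr em1 192.168.1.10` (an illustrative address) would block on em1 specifically, regardless of whether em0 already has an address.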

Comment 20 Tom Lane 2011-05-09 21:20:25 UTC
I can see the potential value of having more-specific versions of these requirement symbols, but that's not really the point here.  The point is that systemd seems to have decided that it's okay to treat these symbols as not actually guaranteeing any level of service at all, and that's not acceptable under *any* useful definition of what they mean.

Comment 21 Bill Nottingham 2011-05-10 17:58:40 UTC
(In reply to comment #20)
> I can see the potential value of having more-specific versions of these
> requirement symbols, but that's not really the point here.  The point is that
> systemd seems to have decided that it's okay to treat these symbols as not
> actually guaranteeing any level of service at all, and that's not acceptable
> under *any* useful definition of what they mean.

I'm not seeing this in brief testing here.

Given a requirement on $network in a test initscript "foo", I see the following combinations (in all cases, using DHCP to get an address):

1) Legacy 'network' SysV script enabled, NM is not enabled:

'foo' is started after the network init script finishes; a proper IP exists.

2) NM is enabled, legacy 'network' SysV script is not enabled:

'foo' is started after NM finishes starting, but before IP configuration finishes. This is the previously mentioned NetworkManager behavior (it's not synchronous by default)

3) NM is enabled, legacy 'network' SysV script is also enabled:

Same as case #2 (as legacy network script is merely calling NM to do the work.)

4) NM is enabled, and the additional 'NetworkManager-wait-online.service' is also enabled:

'foo' is started after NM has configured an address on the device.

Everything seems to be working in accordance with comment #9. I can understand the argument about making NetworkManager synchronous instead of asynchronous, but that would be a change in behavior from prior releases. However, from everything I can see here, the issues that are being encountered would also show on Fedora 14 in situations where DHCP took excessively long.
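
For anyone wanting to repeat this kind of test, a probe script along these lines should do. This is a hypothetical reconstruction, not the actual "foo" script used above: the log path, the FOO_LOG variable, and the probe function are all assumptions. It records which IPv4 addresses exist at the moment the script is started.

```shell
#!/bin/sh
### BEGIN INIT INFO
# Provides: foo
# Required-Start: $network
# Required-Stop: $network
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: hypothetical probe for $network ordering
### END INIT INFO

# Log location is an assumption; override with FOO_LOG for testing.
LOG="${FOO_LOG:-/var/log/foo-probe.log}"

# Append a timestamp and the current IPv4 address list to the log.
probe() {
    {
        echo "foo started at $(date -u '+%Y-%m-%d %H:%M:%S UTC')"
        ip -o -4 addr show 2>/dev/null || echo "(ip command unavailable)"
    } >> "$LOG"
}

case "${1:-}" in
    start) probe ;;
    stop)  : ;;
    *)     echo "Usage: $0 {start|stop}" >&2 ;;
esac
```

If the log shows only the loopback address after boot, the script ran before IP configuration finished, which is what case 2 describes.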

Comment 22 Tom Lane 2011-05-10 19:00:17 UTC
OK, I tried "systemctl enable NetworkManager-wait-online.service" and that did indeed resolve the failures I was seeing.  I think that should be made the default.  The argument that "it's a change in behavior" seems pretty specious: systemd is creating a far larger change in behavior, such that configurations that used to work perfectly reliably will now have race conditions (and usually fail) if NM starts asynchronously.

Comment 23 Michał Piotrowski 2011-05-10 19:19:47 UTC
(In reply to comment #22)
> OK, I tried "systemctl enable NetworkManager-wait-online.service" and that did
> indeed resolve the failures I was seeing.  I think that should be made the
> default.  The argument that "it's a change in behavior" seems pretty specious:
> systemd is creating a far larger change in behavior, such that configurations
> that used to work perfectly reliably will now have race conditions (and usually
> fail) if NM starts asynchronously.

Use cases of NetworkManager-wait-online.service should be documented somewhere.

Comment 24 Bill Nottingham 2011-05-10 19:21:00 UTC
The race was already there ... it's just more likely to happen with parallelized services.

Comment 25 Fedora Update System 2011-05-11 03:08:47 UTC
postgresql-9.0.4-2.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/postgresql-9.0.4-2.fc15

Comment 26 Fedora Update System 2011-05-11 05:48:48 UTC
Package postgresql-9.0.4-2.fc15:
* should fix your issue,
* was pushed to the Fedora 15 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing postgresql-9.0.4-2.fc15'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/postgresql-9.0.4-2.fc15
then log in and leave karma (feedback).

Comment 27 Fedora Update System 2011-05-19 05:11:35 UTC
postgresql-9.0.4-2.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.