Bug 1257334
| Field | Value |
|---|---|
| Summary | Better detection of stale postmaster.pid (e.g. after power outage) |
| Product | Fedora |
| Component | postgresql |
| Version | rawhide |
| Hardware | x86_64 |
| OS | Linux |
| Status | CLOSED WONTFIX |
| Last Closed | 2024-05-30 17:38:28 UTC |
| Severity | low |
| Priority | unspecified |
| Keywords | FutureFeature |
| Type | Bug |
| Doc Type | Enhancement |
| Reporter | Radek Hladik <rhladik> |
| Assignee | Filip Januš <fjanus> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | anezbeda, devrim, hhorak, jmlich83, pkubat, praiskup, tgl |
Description
Radek Hladik
2015-08-26 20:27:32 UTC
Hmm, this sounds like a tough problem to solve. IMO, this situation might also happen if there was _just one_ PostgreSQL instance running on a Fedora box: due to the parallel nature of systemd, the PID of the old postmaster could already be in use by any other service. I don't think there is much we can (sanely) do here, but I tried to ping upstream about it: http://www.postgresql.org/message-id/1711927.hbtYs8Lf7C@nb.usersys.redhat.com

If the PID's been reused by a different service, it's not a problem. The postmaster will only think that a postmaster.pid file represents a live conflict if the PID mentioned in it is live, has the same userid as the new postmaster, and is not the postmaster's immediate parent process (typically pg_ctl). I think we also had a kluge in there once upon a time to detect the grandparent process, in case that was a shell also executing as the postgres user. I'm not sure if that kluge survived the transition to systemd, or whether it'd still be of use in systemd-land.

In the case described, much the safest solution would be to run the two postmasters under two different userids. If there's some step in systemd bootup that is responsible for blowing away pidfiles of other services, *and it is iron-clad guaranteed to run only at system boot and never again*, you could think about having that step also delete Postgres pidfiles. However, I've seen way too many people lose data due to implementing such a manual removal incorrectly: if there is any way for it to happen after system start, you risk removing a pidfile that corresponds to a live postmaster, and if you do that you will end up with a corrupt database eventually. Do not even think of just adding that to the "service start" sequence.

Ah, wait, the kluge I mentioned is in the postgres sources; it wasn't added by Red Hat. pg_ctl tells the postmaster the PID of its parent process.
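To make the rule described above concrete, here is a minimal Python sketch of the staleness test. It is an illustration written for this thread, not PostgreSQL's actual C code, and the names `ancestor_pids` and `lockfile_pid_is_live_conflict` are hypothetical. It treats the lockfile PID as stale if it is our own PID, our parent's (typically pg_ctl), or the grandparent PID that pg_ctl passes down in the `PG_GRANDPARENT_PID` environment variable, or if `kill(pid, 0)` fails with ESRCH (no such process) or EPERM (a process owned by a different userid). It assumes an unprivileged caller.

```python
import os
from typing import Mapping, Optional, Set


def ancestor_pids(env: Optional[Mapping[str, str]] = None) -> Set[int]:
    """PIDs that make a lockfile stale if it names them: our own PID, our
    parent's (typically pg_ctl), and the grandparent PID that pg_ctl passes
    down in PG_GRANDPARENT_PID."""
    env = os.environ if env is None else env
    pids = {os.getpid(), os.getppid()}
    raw = env.get("PG_GRANDPARENT_PID", "").strip()
    if raw.isdigit() and int(raw) > 0:  # unset or junk: no grandparent info
        pids.add(int(raw))
    return pids


def lockfile_pid_is_live_conflict(
    pid: int, env: Optional[Mapping[str, str]] = None
) -> bool:
    """Return True if `pid` (read from postmaster.pid) should block startup.

    Assumes an unprivileged caller: for root, kill() succeeds on any live
    PID, so the EPERM branch below would never fire.
    """
    if pid <= 0:
        # os.kill() with 0 or a negative PID would signal process groups.
        raise ValueError(f"invalid PID in lock file: {pid}")
    if pid in ancestor_pids(env):
        return False  # left over from a previous boot; the PID was reused
    try:
        os.kill(pid, 0)  # signal 0: existence and permission check only
    except ProcessLookupError:
        return False  # ESRCH: no such process, the file is stale
    except PermissionError:
        return False  # EPERM: a process owned by a different userid
    return True  # live process with our userid: assume a running postmaster
```

With two instances under the same userid, a `True` result can be a false positive: the PID may belong to the other instance's postmaster, which is exactly the situation in this report. Nothing in this test says which data directory that process is using.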
It's probably worth quoting the comment in miscinit.c:

     * If the PID in the lockfile is our own PID or our parent's or
     * grandparent's PID, then the file must be stale (probably left over from
     * a previous system boot cycle). We need to check this because of the
     * likelihood that a reboot will assign exactly the same PID as we had in
     * the previous reboot, or one that's only one or two counts larger and
     * hence the lockfile's PID now refers to an ancestor shell process. We
     * allow pg_ctl to pass down its parent shell PID (our grandparent PID)
     * via the environment variable PG_GRANDPARENT_PID; this is so that
     * launching the postmaster via pg_ctl can be just as reliable as
     * launching it directly. There is no provision for detecting
     * further-removed ancestor processes, but if the init script is written
     * carefully then all but the immediate parent shell will be root-owned
     * processes and so the kill test will fail with EPERM. Note that we
     * cannot get a false negative this way, because an existing postmaster
     * would surely never launch a competing postmaster or pg_ctl process
     * directly.

It'd be worth studying the systemd script with this in mind to make sure there's not more than one level of postgres-owned process above pg_ctl. However, the whole thing is risky anyway if you are launching multiple postmasters concurrently with identical userids; there's no way to tell that some other postmaster isn't actually interested in your data directory.

It is no problem for me to delete the stale PID file manually. And I will probably use the workaround with a different userid and change the uid of one of the instances. Or I was advised to use the "external_pid_file" option to move PID files to /var/run (where they should get deleted on system start), but I am not sure if this will work.

But one thing I do not understand.
I am not a skilled unix programmer, but I do not understand why pg_ctl cannot simply ask the "candidate" postmaster named in the PID file whether it is using the data directory in question. Or, instead of using a PID file, just use some socket in the data directory to "ping" the postmaster...

(In reply to Radek Hladik from comment #4)
> It is no problem for me to delete the stale PID file manually. And I will
> probably use the workaround with a different userid and change the uid of
> one of the instances.

I wouldn't call this a workaround; we should, however, document this somehow. README.rpm-dist seems to be the proper place for this..?

> Or I was advised to use the "external_pid_file" option to move PID files
> to /var/run (where they should get deleted on system start), but I am not
> sure if this will work.

You can also use the 'After=' construct in one of your service specifications, if you don't mind losing a bit of parallelism (for 'postgresql.service' and 'postgresql'):

    # mkdir /etc/systemd/system/postgresql.d
    # cd !$
    # cat >> 90-order.conf <<EOF
    [Unit]
    After=postgresql.service
    EOF

(might be worth documenting too)

> But one thing I do not understand. I am not a skilled unix programmer, but
> I do not understand why pg_ctl cannot simply ask the "candidate" postmaster
> named in the PID file whether it is using the data directory in question.
> Or, instead of using a PID file, just use some socket in the data directory
> to "ping" the postmaster...

We have only the 'PID' of the concurrent process. We know that it is some process under UID 26 (postgres); otherwise the startup would succeed. But we have no obvious place to ping for additional info (i.e. we do not even know what the datadir of that postmaster is).

Pavel

(In reply to Pavel Raiskup from comment #5)
> (In reply to Radek Hladik from comment #4)
> > It is no problem for me to delete the stale PID file manually. And I will
> > probably use the workaround with a different userid and change the uid of
> > one of the instances.
>
> I wouldn't call this a workaround; we should, however, document this
> somehow. README.rpm-dist seems to be the proper place for this..?

A good place would be the default systemd postgres unit file. It already contains an example of how to change the port, PGDATA, etc., so a note about this would be very useful.

> > Or I was advised to use the "external_pid_file" option to move PID files
> > to /var/run (where they should get deleted on system start), but I am not
> > sure if this will work.
>
> You can also use the 'After=' construct in one of your service
> specifications, if you don't mind losing a bit of parallelism (for
> 'postgresql.service' and 'postgresql'):
>
>     # mkdir /etc/systemd/system/postgresql.d
>     # cd !$
>     # cat >> 90-order.conf <<EOF
>     [Unit]
>     After=postgresql.service
>     EOF
>
> (might be worth documenting too)

I tried to do this the simplest yet most standard Fedora way. And to be honest, I am still quite unsure about the various side effects of ordering unit files (like circular dependencies between units, etc.). I am also not sure this would solve the problem in all cases: if something started before postgres launched one more or one fewer child process, we could run into matching PIDs too.

> > But one thing I do not understand. I am not a skilled unix programmer, but
> > I do not understand why pg_ctl cannot simply ask the "candidate"
> > postmaster named in the PID file whether it is using the data directory in
> > question. Or, instead of using a PID file, just use some socket in the data
> > directory to "ping" the postmaster...
>
> We have only the 'PID' of the concurrent process. We know that it is some
> process under UID 26 (postgres); otherwise the startup would succeed. But
> we have no obvious place to ping for additional info (i.e. we do not even
> know what the datadir of that postmaster is).
It's interesting that, being root and having someone's PID, you can send signals to the process, see what files it has open, read its memory, kill it and much, much more, but you cannot send it a simple ping: "hello, anybody out there?". But why can't you just have some additional info in the PID file, i.e. the datadir or the path to some sort of ping socket?

> Pavel

(In reply to Radek Hladik from comment #6)
> If something started before postgres launched one more or one fewer child
> process, we could run into matching PIDs too.

Right, sorry. The problem is not the processes started _before_ postgres (and its sub-processes) but rather the processes started concurrently, at the same time. This means that there is no _constant_ difference between the PID_A and PID_B values across reboots, even with the 'After=' construct.

> > We have only the 'PID' of the concurrent process. We know that it is some
> > process under UID 26 (postgres); otherwise the startup would succeed. But
> > we have no obvious place to ping for additional info (i.e. we do not even
> > know what the datadir of that postmaster is).
>
> It's interesting that, being root and having someone's PID

It is the postgres-owned process, not root, that knows the PID of the concurrent process (we do not parse postmaster.pid as root). Well, we could parse postmaster.pid somehow within ExecStartPre (or another root action), but it does not look like the best idea: systemd has had no knowledge of postmaster.pid so far, and that's fine, because postgres/postmaster is the right process to take care of it.

> but you cannot send it a simple ping: "hello, anybody out there?". But why
> can't you just have some additional info in the PID file, i.e. the datadir
> or the path to some sort of ping socket?

That all sounds like it could work, but I'm not aware of any obvious and _portable_ way to do that. And taking the overall risk into account, it is probably better to let the admin do some manual steps in situations like power-outage recovery.
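Pavel's point is that there is no obvious _portable_ way to ask the process behind a PID which data directory it serves. On Linux specifically, one non-portable heuristic exists: the postmaster changes its working directory to its data directory at startup, and a process running as the same user can usually read `/proc/<pid>/cwd`. The sketch below assumes Linux and uses hypothetical helper names. It reads the PID from the first line of `postmaster.pid` and compares that process's working directory with the data directory. It is diagnostic only and is not a safe basis for deleting the file automatically.

```python
import os
from typing import Optional


def read_lockfile_pid(datadir: str) -> int:
    """Return the PID stored on the first line of <datadir>/postmaster.pid."""
    path = os.path.join(datadir, "postmaster.pid")
    with open(path, encoding="utf-8") as f:
        return int(f.readline().strip())


def pid_cwd_is_datadir(pid: int, datadir: str) -> Optional[bool]:
    """Linux-only heuristic: is `datadir` the working directory of `pid`?

    Returns None when this cannot be determined: /proc is missing, the
    process is gone, or we lack permission (e.g. a different userid).
    """
    try:
        cwd = os.readlink(f"/proc/{pid}/cwd")
    except (FileNotFoundError, ProcessLookupError, PermissionError):
        return None
    return os.path.realpath(cwd) == os.path.realpath(datadir)


def describe_lockfile(datadir: str) -> str:
    """Summarize what the PID in postmaster.pid appears to be doing."""
    pid = read_lockfile_pid(datadir)
    match = pid_cwd_is_datadir(pid, datadir)
    if match is None:
        return f"PID {pid}: cannot tell (no such process or no access)"
    if match:
        return f"PID {pid}: working directory is this data directory"
    return f"PID {pid}: live, but its working directory is elsewhere"
```

Even a "working directory is elsewhere" result only narrows things down; as noted above, the decision to remove a lockfile is best left to the administrator.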
This package has changed maintainer in Fedora. Reassigning to the new maintainer of this component.

Based on what was said here and in the mailing-list thread, the safe choice would be to use different users. Manipulating PIDs is dangerous and should be done only manually, when we know that the PID is not attached to a running process. It also sounds like a very specific scenario in which this conflict would/could happen, and the safest choice is using a different configuration, based on the feedback here and in the aforementioned mailing list.
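The resolution recommends running each instance under its own userid. The postmaster refuses to start unless its data directory is owned by the user it runs as, so grouping data directories by owner is one rough way to spot instances that share a userid and could therefore mistake each other's PID for a live conflict. This is a minimal sketch with a hypothetical helper name; it only inspects ownership and changes nothing.

```python
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def datadirs_sharing_a_uid(datadirs: Iterable[str]) -> Dict[int, List[str]]:
    """Group data directories by owning uid and return only the groups
    with more than one directory, i.e. instances that would run under the
    same userid."""
    by_owner: Dict[int, List[str]] = defaultdict(list)
    for d in datadirs:
        by_owner[os.stat(d).st_uid].append(d)
    return {uid: dirs for uid, dirs in by_owner.items() if len(dirs) > 1}
```

For example, calling it with the default `/var/lib/pgsql/data` plus a second instance's directory (a hypothetical path) returns an empty dict only when the two are owned by different users.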