Bug 896161 - Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieved
Summary: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: postgresql
Version: 18
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Pavel Raiskup
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-01-16 18:41 UTC by George Machitidze
Modified: 2013-10-29 03:40 UTC (History)
4 users (show)

Fixed In Version: postgresql-9.2.5-1.fc18
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-10-29 03:40:58 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
pg_upgrade strace (30.42 KB, text/plain)
2013-01-16 22:24 UTC, George Machitidze
no flags Details
pg_upgrade strace (30.42 KB, text/plain)
2013-01-16 22:24 UTC, George Machitidze
no flags Details
Patch for RPM spec (275 bytes, patch)
2013-01-18 19:47 UTC, George Machitidze
no flags Details | Diff
Adjust the possibly running cluster a little bit earlier. (741 bytes, patch)
2013-08-12 09:03 UTC, Pavel Raiskup
no flags Details | Diff

Description George Machitidze 2013-01-16 18:41:28 UTC
Description of problem:
Upgraded F17 to F18, after tried to upgrade PostgreSQL in documented way:
Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails with invalid message "There seems to be a postmaster servicing the old cluster".

Version-Release number of selected component (if applicable):
postgresql-9.2.2-1.fc18.x86_64
postgresql-contrib-9.2.2-1.fc18.x86_64
postgresql-jdbc-9.2.1002-1.fc18.noarch
postgresql-libs-9.2.2-1.fc18.x86_64
postgresql-plperl-9.2.2-1.fc18.x86_64
postgresql-server-9.2.2-1.fc18.x86_64
postgresql-upgrade-9.2.2-1.fc18.x86_64

How reproducible:
Always reproducible for me

Steps to Reproduce:
1. Stop running postgresql 9.1 server and upgrade postgresql packages from 9.1 to 9.2 (or upgrade F17 to F18), or import 9.1 data dir
3. Run "postgreqsql-setup upgrade" (or pg_upgrade) for 9.1 to 9.2 upgrade
  
Actual results:
Update fails, stating "There seems to be a postmaster servicing the old cluster", while its shut down.

Expected results:
Upgrade should complete without errors

Additional info:
[root@giomacdesk ~]# rpm -qa|grep postgre|sort
postgresql-9.2.2-1.fc18.x86_64
postgresql-contrib-9.2.2-1.fc18.x86_64
postgresql-jdbc-9.2.1002-1.fc18.noarch
postgresql-libs-9.2.2-1.fc18.x86_64
postgresql-plperl-9.2.2-1.fc18.x86_64
postgresql-server-9.2.2-1.fc18.x86_64
postgresql-upgrade-9.2.2-1.fc18.x86_64
[root@giomacdesk ~]# postgresql-setup upgrade
Redirecting to /bin/systemctl stop  postgresql.service
Upgrading database: failed

See /var/lib/pgsql/pgupgrade.log for details.
[root@giomacdesk ~]# cat /var/lib/pgsql/pgupgrade.log
Performing Consistency Checks
-----------------------------
Checking current, bin, and data directories                 ok

There seems to be a postmaster servicing the old cluster.
Please shutdown that postmaster and try again.
Failure, exiting
[root@giomacdesk ~]# ps auxwf|grep master|grep -v grep
[root@giomacdesk ~]#

Comment 1 Tom Lane 2013-01-16 21:55:30 UTC
Hm, worksforme.  Is there anything non-default about your postgres-related unit files, configuration, etc?

Comment 2 George Machitidze 2013-01-16 22:18:16 UTC
No, everything is default, no files changed, no systemd unit files added manually for overrides:

[root@giomacdesk ~]# rpm -Va "postgre*"
[root@giomacdesk ~]# 

I've run pg_upgrade manually, without postgresql-setup script, same error/false output.

Comment 3 George Machitidze 2013-01-16 22:24:43 UTC
Created attachment 679882 [details]
pg_upgrade strace

strace output attached

Comment 4 George Machitidze 2013-01-16 22:24:44 UTC
Created attachment 679883 [details]
pg_upgrade strace

strace output attached

Comment 5 George Machitidze 2013-01-16 22:25:52 UTC
p.s.
[root@giomacdesk ~]# getenforce
Permissive

Comment 6 Tom Lane 2013-01-17 03:25:44 UTC
The strace shows that pg_upgrade is finding a postmaster.pid lock file in the old data directory:

open("/var/lib/pgsql/data-old/postmaster.pid", O_RDONLY) = 5

which seems to indicate that you didn't really shut that server down cleanly.

Applying pg_controldata to the old data directory would provide some more details about exactly what state it's in, although I'm afraid you might have to reinstall 9.1.7 to get a compatible pg_controldata version.  If pg_controldata claims the cluster is shut down cleanly, then we'd have to inquire into how come there's still a postmaster.pid file in it ... but if not, I think this is just pilot error.

Memo to self: this case suggests it'd be worth the disk space to have the postgresql-upgrade package include the older version of pg_controldata.

Comment 7 George Machitidze 2013-01-18 19:10:06 UTC
>open("/var/lib/pgsql/data-old/postmaster.pid", O_RDONLY) = 5
>which seems to indicate that you didn't really shut that server down cleanly.

I see, but It's pg_upgrade is responsible for all that, user interaction at that moment is not possible.

I will check status with pg_controldata and update the bug. I'm very sure DB was stopped cleanly.

Comment 8 George Machitidze 2013-01-18 19:47:56 UTC
Created attachment 682660 [details]
Patch for RPM spec

Comment 9 George Machitidze 2013-01-18 19:48:55 UTC
Surprisingly, DB was shut down according to both pg_controldata versions:

[root@giomacdesk SPECS]# /usr/lib64/pgsql/postgresql-9.1/bin/pg_controldata /var/lib/pgsql/data/
pg_control version number:            903
Catalog version number:               201105231
Database system identifier:           5779942941333618555
Database cluster state:               shut down
pg_control last modified:             Fri 04 Jan 2013 08:09:38 PM GET
Latest checkpoint location:           0/E0D56900
Prior checkpoint location:            0/E0D568A8
Latest checkpoint's REDO location:    0/E0D56900
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/907918
Latest checkpoint's NextOID:          142607
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        1858
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Fri 04 Jan 2013 08:09:38 PM GET
Minimum recovery ending location:     0/0
Backup start location:                0/0
Current wal_level setting:            minimal
Current max_connections setting:      150
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value
[root@giomacdesk SPECS]# pg_controldata /var/lib/pgsql/data/
WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy.

pg_control version number:            903
Catalog version number:               201105231
Database system identifier:           5779942941333618555
Database cluster state:               shut down
pg_control last modified:             Fri 04 Jan 2013 08:09:38 PM GET
Latest checkpoint location:           0/E0D56900
Prior checkpoint location:            0/E0D568A8
Latest checkpoint's REDO location:    0/E0D56900
Latest checkpoint's TimeLineID:       1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:          907918/142607
Latest checkpoint's NextOID:          1
Latest checkpoint's NextMultiXactId:  0
Latest checkpoint's NextMultiOffset:  1858
Latest checkpoint's oldestXID:        1
Latest checkpoint's oldestXID's DB:   1357315778
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Thu 01 Jan 1970 04:00:00 AM TBIT
Minimum recovery ending location:     0/0
Backup start location:                0/96
Backup end location:                  0/40
End-of-backup record required:        yes
Current wal_level setting:            minimal
Current max_connections setting:      0
Current max_prepared_xacts setting:   1093850759
Current max_locks_per_xact setting:   8192
Maximum data alignment:               131072
Database block size:                  64
Blocks per segment of large relation: 32
WAL block size:                       1996
Bytes per WAL segment:                65793
Maximum length of identifiers:        498519280
Maximum columns in an index:          0
Maximum size of a TOAST chunk:        0
Date/time type storage:               floating-point numbers
Float4 argument passing:              by reference
Float8 argument passing:              by reference
[root@giomacdesk SPECS]#

Comment 10 George Machitidze 2013-01-18 19:49:29 UTC
"WARNING: Calculated CRC checksum does not match value stored in file.
Either the file is corrupt, or it has a different layout than this program
is expecting.  The results below are untrustworthy."

Comment 11 Tom Lane 2013-01-18 20:13:59 UTC
(In reply to comment #10)
> "WARNING: Calculated CRC checksum does not match value stored in file.
> Either the file is corrupt, or it has a different layout than this program
> is expecting.  The results below are untrustworthy."

Well, that's expected when you're using the wrong pg_controldata version.

But anyway, if it *was* shut down, as appears to be the case from the pg_control file, we need to figure out what's that postmaster.pid file doing there.

What is in the postmaster.pid file, and how does its file mod date compare to that of the global/pg_control file?  Do you remember the exact sequence of events when you were shutting down the old version?

Comment 12 George Machitidze 2013-01-18 20:40:51 UTC
Only thing I did before this was complete FC17>FC18 upgrade via yum and reboot with fresh kernel.

Comment 13 George Machitidze 2013-01-18 22:16:30 UTC
>"What is in the postmaster.pid file, and how does its file mod date compare to that of >the global/pg_control file?"
It's internal thing, I really can't tell how upgrade process is working there. pid file will contain postmaster PID. Probably pg_upgrade is checking it instantly, without delay.
We need someone from PostgreSQL.
I will post in mailing list...

Comment 14 Fedora Admin XMLRPC Client 2013-05-09 14:06:44 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 15 Fedora Admin XMLRPC Client 2013-05-16 09:59:07 UTC
This package has changed ownership in the Fedora Package Database.  Reassigning to the new owner of this component.

Comment 16 Pavel Raiskup 2013-06-10 13:18:55 UTC
Based on discussion here:

http://www.postgresql.org/message-id/flat/E1TwKHs-0005yp-39@wrigleys.postgresql.org#E1TwKHs-0005yp-39@wrigleys.postgresql.org

The fix applied upstream does not do anything very special, IIRC.  The point is
to just check _for sure_ whether postmaster running, or not (by trying to run &
stop it).  But if you are sure that the postmaster is turned off & the pid file
still exists, it most probably ended in non-expected way (the pid file is
leftover which may be removed).

I think that back-porting of this patch (which is not so trivial) is not
important enough.  George, if you can not recall the correct steps to reproduce
this issue, I guess this bug should be closed INSUFFICIENT_DATA.

One thing is worth to do (at least rawhide) to package old version of
pg_controldata (it seems to be less than 100kB binary on x86_64, which is not
so big pain).

Thanks for your update on this bug,
Pavel

Comment 17 Pavel Raiskup 2013-06-10 13:34:51 UTC
(In reply to Pavel Raiskup from comment #16)
> One thing is worth to do (at least rawhide) to package old version of
> pg_controldata (it seems to be less than 100kB binary on x86_64, which is not
> so big pain).

Ok, it is already packaged Rawhide.

Comment 18 George Machitidze 2013-06-27 23:58:46 UTC
Here we go

[root@main lib]# rpm -q postgresql-upgrade
postgresql-upgrade-9.2.4-1.fc18.x86_64
[root@main lib]# rpm -ql postgresql-upgrade|grep controldata
/usr/lib64/pgsql/postgresql-9.1/bin/pg_controldata
[root@main lib]# postgresql-setup upgrade
Redirecting to /bin/systemctl stop  postgresql.service
Upgrading database: failed

See /var/lib/pgsql/pgupgrade.log for details.
[root@main lib]# cat /var/lib/pgsql/pgupgrade.log
Performing Consistency Checks
-----------------------------
Checking current, bin, and data directories                 ok
Checking cluster versions                                   ok

connection to database failed: fe_sendauth: no password supplied


could not connect to old postmaster started with the command:
"/usr/lib64/pgsql/postgresql-9.1/bin/pg_ctl" -w -l "pg_upgrade_server.log" -D "/var/lib/pgsql/data-old" -o "-p 5432 -b  -c listen_addresses='' -c unix_socket_permissions=0700 -c unix_socket_directory='/var/lib/pgsql'" start
Failure, exiting


After, added trust to pg_hba:
[root@main lib]# cat /var/lib/pgsql/pgupgrade.log
Performing Consistency Checks
-----------------------------
Checking current, bin, and data directories                 ok

There seems to be a postmaster servicing the old cluster.
Please shutdown that postmaster and try again.
Failure, exiting

Comment 19 Pavel Raiskup 2013-08-12 07:22:43 UTC
> [snip]
>
> After, added trust to pg_hba:
> [root@main lib]# cat /var/lib/pgsql/pgupgrade.log
> Performing Consistency Checks
> -----------------------------
> Checking current, bin, and data directories                 ok
> 
> There seems to be a postmaster servicing the old cluster.
> Please shutdown that postmaster and try again.
> Failure, exiting

Thanks for your reply.  The old postmaster should be stopped after the
authentication fail.  Workaround is obvious:  stop the old server by hand, fix
pg_hba.conf and run 'postgresql-setup upgrade' again.

Comment 20 Pavel Raiskup 2013-08-12 09:03:57 UTC
Created attachment 785618 [details]
Adjust the possibly running cluster a little bit earlier.

Believed fix in attachment.

Comment 21 Pavel Raiskup 2013-08-12 10:50:07 UTC
I'll probably issue an update for F18+ (if nobody complains) once upstream
confirms that this patch is OK:

http://www.postgresql.org/message-id/1376303849-17344-1-git-send-email-praiskup@redhat.com

Comment 22 Fedora Update System 2013-08-15 06:31:45 UTC
postgresql-9.2.4-2.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/postgresql-9.2.4-2.fc18

Comment 23 Fedora Update System 2013-08-15 06:37:17 UTC
postgresql-9.2.4-7.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/postgresql-9.2.4-7.fc19

Comment 24 Fedora Update System 2013-08-15 23:27:31 UTC
Package postgresql-9.2.4-2.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing postgresql-9.2.4-2.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-14828/postgresql-9.2.4-2.fc18
then log in and leave karma (feedback).

Comment 25 Fedora Update System 2013-08-18 00:37:58 UTC
postgresql-9.2.4-7.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 26 Fedora Update System 2013-10-17 14:21:37 UTC
postgresql-9.2.5-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/postgresql-9.2.5-1.fc18

Comment 27 Fedora Update System 2013-10-18 19:49:11 UTC
Package postgresql-9.2.5-1.fc18:
* should fix your issue,
* was pushed to the Fedora 18 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing postgresql-9.2.5-1.fc18'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2013-19316/postgresql-9.2.5-1.fc18
then log in and leave karma (feedback).

Comment 28 Fedora Update System 2013-10-29 03:40:58 UTC
postgresql-9.2.5-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.