Description of problem:
The summary says it all: when running the test suite, it failed on IA64. The problem is that the socket file is not created when PostgreSQL is started from a shell script. The same script works fine on other machines.

Version-Release number of selected component (if applicable):
postgresql84-server-8.4.2-4.el5.ia64

How reproducible:
Always (on ia64-5s-m2.test.redhat.com)

Steps to Reproduce:
1. Create the testing script:

.qa.[root@ia64-5s-m2 tps]# cat ptest.sh
#!/bin/bash
service postgresql stop
rm -rf /var/lib/pgsql
service postgresql initdb
sed -i -e 's/ident/trust/' /var/lib/pgsql/data/pg_hba.conf
service postgresql start
ls -l /tmp/.s*
su -c 'createdb CVE20093230' - postgres

2. ./ptest.sh

Actual results:
Stopping postgresql service: [ OK ]
Initializing database: [ OK ]
Starting postgresql service: [ OK ]
ls: /tmp/.s*: No such file or directory
createdb: could not connect to database postgres: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?

Expected results:
The same as when doing the same steps manually:

.qa.[root@ia64-5s-m2 tps]# service postgresql stop
Stopping postgresql service: [ OK ]
.qa.[root@ia64-5s-m2 tps]# rm -rf /var/lib/pgsql
.qa.[root@ia64-5s-m2 tps]# service postgresql initdb
Initializing database: [ OK ]
.qa.[root@ia64-5s-m2 tps]# sed -i -e 's/ident/trust/' /var/lib/pgsql/data/pg_hba.conf
.qa.[root@ia64-5s-m2 tps]# service postgresql start
Starting postgresql service: [ OK ]
.qa.[root@ia64-5s-m2 tps]# ls -l /tmp/.s*
srwxrwxrwx 1 postgres postgres  0 Jan 22 08:25 /tmp/.s.PGSQL.5432
-rw------- 1 postgres postgres 26 Jan 22 08:25 /tmp/.s.PGSQL.5432.lock
.qa.[root@ia64-5s-m2 tps]# su -c 'createdb CVE20093230' - postgres
.qa.[root@ia64-5s-m2 tps]# echo $?
0

Additional info:
I've never encountered such a problem with previous PostgreSQL versions, so I'm marking this as a regression.
I don't think this is a bug. Your script just isn't waiting long enough for the postmaster to be ready to accept connections. Try adding "sleep 2" or so after the service start step. We could make the init script wait till the postmaster is open for business, but then we'd get complaints about that: if there's any recovery to be done, this could require many seconds or even minutes, and people would be unhappy that the system initialization doesn't proceed.
(In reply to comment #1)
> I don't think this is a bug. Your script just isn't waiting long enough for
> the postmaster to be ready to accept connections. Try adding "sleep 2" or so
> after the service start step.

Ah, it didn't occur to me that it could be this, because all the other machines worked...

> We could make the init script wait till the postmaster is open for business,
> but then we'd get complaints about that: if there's any recovery to be done,
> this could require many seconds or even minutes, and people would be unhappy
> that the system initialization doesn't proceed.

OK, but then how does the user know that the database is up? I guess people will also be unhappy that the database doesn't work even though it has been started.

I've modified the script a bit:

.qa.[root@ia64-5s-m2 tps]# cat ptest.sh
#!/bin/bash
service postgresql stop
rm -rf /var/lib/pgsql
service postgresql initdb
sed -i -e 's/ident/trust/' /var/lib/pgsql/data/pg_hba.conf
service postgresql start
service postgresql status
echo $?
ls -l /tmp/.s*
su -c 'createdb CVE20093230' - postgres
sleep 10
service postgresql status
echo $?
ls -l /tmp/.s*
su -c 'createdb CVE20093230' - postgres
.qa.[root@ia64-5s-m2 tps]# ./ptest.sh
Stopping postgresql service: [ OK ]
Initializing database: [ OK ]
Starting postgresql service: [ OK ]
postmaster (pid 2352) is running...
0
ls: /tmp/.s*: No such file or directory
createdb: could not connect to database postgres: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
postmaster (pid 2411 2410 2409 2408 2406 2352) is running...
0
srwxrwxrwx 1 postgres postgres  0 Jan 26 05:18 /tmp/.s.PGSQL.5432
-rw------- 1 postgres postgres 25 Jan 26 05:18 /tmp/.s.PGSQL.5432.lock

So waiting helps, but we don't know how long to wait, because "status" returns 0 even when the database is not ready.

IMO, this can be split into two bugs:
* "start" and "status" for postgresql84 both report success even if the database is not ready
* postgresql84 startup takes ages on ia64 (the testing machine is idle; I don't see any reason why it takes several seconds when all other systems finish the startup before the next script command is executed)
[shrug...] The postgres init script has always worked that way, and I don't recall getting any complaints about it before. I'm disinclined to change it in a back branch under time pressure. If you want to file this as an RFE against Fedora rawhide, it would make sense to fool around with it there. I'm not sure whether the connection to ia64 is real or just a reflection of slow disks or something. Will try it on an RHTS machine.
Hmm, I can't reproduce any problem on an RHTS machine (hp-bl870c-02.rhts.eng.bos.redhat.com running last night's RHEL5.5 build). I can do

    service postgresql start; psql -l -U postgres

and it consistently succeeds, which is what I'd expect. The 2-second delay that's built into the init script is only intended to ensure that the postmaster has time to create its pid file, but in practice I'd expect it to be enough to reach full operational status except under extreme load or when there is crash recovery work to be done.

One possible explanation is that there's some evidence of a recent kernel performance regression that causes postgres to spend a lot longer than expected figuring out the timezone during startup; see bug #548403. I am not seeing that regression on this RHTS machine (with kernel 2.6.18-185.el5), but maybe you have a kernel that has it? If initdb takes more than 30 seconds, then you probably do have it.
(In reply to comment #6)
> One possible explanation is that there's some evidence of a recent kernel
> performance regression that causes postgres to spend a lot longer than expected
> figuring out the timezone during startup --- see bug #548403.

Yes, you are right, I'm hitting that bug. Trying with TZ=GMT fixes the issue, so I'm closing this as a duplicate. The kernel is 2.6.18-182:

.qa.[root@ia64-5s-m2 tps]# uname -a
Linux ia64-5s-m2.test.redhat.com 2.6.18-182.el5debug #1 SMP Tue Dec 15 22:23:38 EST 2009 ia64 ia64 ia64 GNU/Linux

*** This bug has been marked as a duplicate of bug 548403 ***