Bug 557749 - socket not created when started from a shell script on ia64
Summary: socket not created when started from a shell script on ia64
Keywords:
Status: CLOSED DUPLICATE of bug 548403
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: postgresql84
Version: 5.5
Hardware: ia64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Tom Lane
QA Contact: qe-baseos-daemons
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-01-22 13:40 UTC by Karel Volný
Modified: 2013-07-03 03:26 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-02-01 12:58:54 UTC
Target Upstream Version:
Embargoed:


Description Karel Volný 2010-01-22 13:40:37 UTC
Description of problem:
the summary says it all ...

when running the testsuite, it failed on IA64; the problem is that the socket file is not created when starting postgresql from a shell script

it works fine on other machines


Version-Release number of selected component (if applicable):
postgresql84-server-8.4.2-4.el5.ia64


How reproducible:
always (on ia64-5s-m2.test.redhat.com)


Steps to Reproduce:
1. create the testing script:
.qa.[root@ia64-5s-m2 tps]# cat ptest.sh
#!/bin/bash

service postgresql stop
rm -rf /var/lib/pgsql
service postgresql initdb
sed -i -e 's/ident/trust/' /var/lib/pgsql/data/pg_hba.conf
service postgresql start
ls -l /tmp/.s*
su -c 'createdb CVE20093230' - postgres

2. ./ptest.sh

 
Actual results:
Stopping postgresql service:                               [  OK  ]
Initializing database:                                     [  OK  ]
Starting postgresql service:                               [  OK  ]
ls: /tmp/.s*: No such file or directory
createdb: could not connect to database postgres: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?


Expected results:
the same as when doing these steps manually:

.qa.[root@ia64-5s-m2 tps]# service postgresql stop
Stopping postgresql service:                               [  OK  ]
.qa.[root@ia64-5s-m2 tps]# rm -rf /var/lib/pgsql
.qa.[root@ia64-5s-m2 tps]# service postgresql initdb
Initializing database:                                     [  OK  ]
.qa.[root@ia64-5s-m2 tps]# sed -i -e 's/ident/trust/' /var/lib/pgsql/data/pg_hba.conf
.qa.[root@ia64-5s-m2 tps]# service postgresql start
Starting postgresql service:                               [  OK  ]
.qa.[root@ia64-5s-m2 tps]# ls -l /tmp/.s*
srwxrwxrwx 1 postgres postgres  0 Jan 22 08:25 /tmp/.s.PGSQL.5432
-rw------- 1 postgres postgres 26 Jan 22 08:25 /tmp/.s.PGSQL.5432.lock
.qa.[root@ia64-5s-m2 tps]# su -c 'createdb CVE20093230' - postgres
.qa.[root@ia64-5s-m2 tps]# echo $?
0


Additional info:
I've never encountered such a problem with previous postgresql versions, so I'm marking this as a regression

Comment 1 Tom Lane 2010-01-22 14:54:49 UTC
I don't think this is a bug.  Your script just isn't waiting long enough for the postmaster to be ready to accept connections.  Try adding "sleep 2" or so after the service start step.

We could make the init script wait till the postmaster is open for business, but then we'd get complaints about that: if there's any recovery to be done, this could require many seconds or even minutes, and people would be unhappy that the system initialization doesn't proceed.
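A minimal sketch of the kind of wait suggested here, assuming a bash test script and the default socket path /tmp/.s.PGSQL.5432 seen in this report: instead of a fixed "sleep 2", poll for the postmaster's Unix-domain socket with an upper bound. The function name "wait_for_socket" and the 60-second default timeout are hypothetical choices, not anything provided by the init script.

```shell
#!/bin/bash
# Hypothetical helper: poll for the postmaster's socket instead of a fixed "sleep 2".
# Assumption: socket path /tmp/.s.PGSQL.5432 (as in this report); the 60 s default is arbitrary.

# wait_for_socket PATH [TIMEOUT_SECONDS]
# Returns 0 as soon as PATH exists and is a socket, 1 if TIMEOUT_SECONDS elapse first.
wait_for_socket() {
    sock_path=$1
    sock_timeout=${2:-60}
    sock_waited=0
    while [ "$sock_waited" -lt "$sock_timeout" ]; do
        if [ -S "$sock_path" ]; then
            return 0
        fi
        sleep 1
        sock_waited=$((sock_waited + 1))
    done
    # Final check so a socket created during the last sleep is not missed.
    [ -S "$sock_path" ]
}
```

In ptest.sh this would take the place of the bare "ls -l /tmp/.s*" check, e.g. "wait_for_socket /tmp/.s.PGSQL.5432 60 || exit 1", so the script waits only as long as the postmaster actually needs, up to the chosen limit.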

Comment 4 Karel Volný 2010-01-26 10:32:09 UTC
(In reply to comment #1)
> I don't think this is a bug.  Your script just isn't waiting long enough for
> the postmaster to be ready to accept connections.  Try adding "sleep 2" or so
> after the service start step.

ah, it didn't occur to me that it could be this, because all the other machines worked ...

> We could make the init script wait till the postmaster is open for business,
> but then we'd get complaints about that: if there's any recovery to be done,
> this could require many seconds or even minutes, and people would be unhappy
> that the system initialization doesn't proceed.

ok, but then how does the user know that the database is up? I guess people will also be unhappy that the database doesn't work even though it has been started

I've modified the script a bit:

.qa.[root@ia64-5s-m2 tps]# cat ptest.sh
#!/bin/bash

service postgresql stop
rm -rf /var/lib/pgsql
service postgresql initdb
sed -i -e 's/ident/trust/' /var/lib/pgsql/data/pg_hba.conf
service postgresql start
service postgresql status
echo $?
ls -l /tmp/.s*
su -c 'createdb CVE20093230' - postgres

sleep 10
service postgresql status
echo $?
ls -l /tmp/.s*
su -c 'createdb CVE20093230' - postgres


.qa.[root@ia64-5s-m2 tps]# ./ptest.sh
Stopping postgresql service:                               [  OK  ]
Initializing database:                                     [  OK  ]
Starting postgresql service:                               [  OK  ]
postmaster (pid 2352) is running...
0
ls: /tmp/.s*: No such file or directory
createdb: could not connect to database postgres: could not connect to server: No such file or directory
        Is the server running locally and accepting
        connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
postmaster (pid 2411 2410 2409 2408 2406 2352) is running...
0
srwxrwxrwx 1 postgres postgres  0 Jan 26 05:18 /tmp/.s.PGSQL.5432
-rw------- 1 postgres postgres 25 Jan 26 05:18 /tmp/.s.PGSQL.5432.lock


so we see that waiting helps, but we do not know how long to wait, as status returns 0 even if the database is not ready
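Since "status" only reports that the postmaster process exists, a more direct readiness check is to retry a real connection until it succeeds. A minimal sketch, assuming bash, root access to "su" into the postgres account, and the trust authentication that ptest.sh configures; "wait_until" and "pg_accepts_connections" are hypothetical helper names (pg_isready is not used because it is not part of 8.4). Probing with a query rather than looking for the socket file may also cover the case where the socket already exists but the server is still refusing connections, for example during crash recovery.

```shell
#!/bin/bash
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND (output discarded) once per second until it exits 0 (return 0)
# or TIMEOUT_SECONDS have been waited without success (return 1).
wait_until() {
    wu_timeout=$1
    shift
    wu_waited=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$wu_waited" -ge "$wu_timeout" ]; then
            return 1
        fi
        sleep 1
        wu_waited=$((wu_waited + 1))
    done
    return 0
}

# Hypothetical readiness probe: succeeds only once the server runs a query,
# not merely once "service postgresql status" sees a postmaster process.
pg_accepts_connections() {
    su -c "psql -q -t -c 'SELECT 1'" - postgres
}
```

Placed right before the createdb step, "wait_until 60 pg_accepts_connections || exit 1" would make the script wait for actual readiness; the 60-second limit is an arbitrary example value.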

IMO, this can be split into two bugs:

* both "start" and "status" of postgresql84 report success even if the database is not ready

* postgresql84 startup takes ages on ia64
(the testing machine is idle, so I don't see any reason why startup takes several seconds here, while on all other systems it finishes before the next script command is executed)

Comment 5 Tom Lane 2010-01-26 14:28:49 UTC
[shrug...]  The postgres init script has always worked that way, and I don't recall getting any complaints about it before.  I'm disinclined to change it in a back branch under time pressure.  If you want to file this as an RFE against Fedora rawhide, it would make sense to fool around with it there.

I'm not sure whether the connection to ia64 is real or just a reflection of slow disks or something.  Will try it on an RHTS machine.

Comment 6 Tom Lane 2010-01-26 20:10:28 UTC
Hmm, I can't reproduce any problem on an RHTS machine (hp-bl870c-02.rhts.eng.bos.redhat.com running last night's RHEL5.5 build).  I can do

service postgresql start; psql -l -U postgres

and it consistently succeeds, which is what I'd expect.  The 2-second delay that's built into the init script is only intended to ensure that the postmaster has time to create its pid file, but in practice I'd expect it to be enough to reach full operational status except under extreme load or when there is crash recovery work to be done.
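To put a number on "consistently succeeds", the start-then-connect test from this comment can be repeated and the failures counted. A minimal sketch assuming bash and root privileges; "count_failures" and "start_and_connect" are hypothetical names, and each trial stops the service again so that every start is a fresh one.

```shell
#!/bin/bash
# count_failures N COMMAND [ARGS...]
# Runs COMMAND N times (output discarded) and prints how many runs exited non-zero.
count_failures() {
    cf_runs=$1
    shift
    cf_failed=0
    cf_i=0
    while [ "$cf_i" -lt "$cf_runs" ]; do
        "$@" >/dev/null 2>&1 || cf_failed=$((cf_failed + 1))
        cf_i=$((cf_i + 1))
    done
    echo "$cf_failed"
}

# Hypothetical single trial mirroring this comment: start, connect immediately, stop.
# The trial's status is psql's, so a too-early connection attempt counts as a failure.
start_and_connect() {
    service postgresql start
    psql -l -U postgres
    sac_status=$?
    service postgresql stop
    return "$sac_status"
}
```

Running "count_failures 20 start_and_connect" should print 0 on a machine that behaves as described here; a non-zero count would point at a startup delay like the one originally reported.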

One possible explanation is that there's some evidence of a recent kernel performance regression that causes postgres to spend a lot longer than expected figuring out the timezone during startup --- see bug #548403.  I am not seeing that regression on this RHTS machine (with kernel 2.6.18-185.el5), but maybe you have a kernel that has it?  If the time to do initdb is more than 30 seconds then you probably do have it.
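The 30-second rule of thumb can be checked from a script by timing initdb. A minimal sketch assuming bash and a "date" that supports "+%s" (GNU date on RHEL 5 does); "elapsed_seconds" is a hypothetical helper, and the threshold is simply the figure given in this comment.

```shell
#!/bin/bash
# elapsed_seconds COMMAND [ARGS...]
# Runs COMMAND with its stdout redirected to stderr, prints the wall-clock seconds
# it took on stdout, and returns COMMAND's exit status.
elapsed_seconds() {
    es_start=$(date +%s)
    es_status=0
    "$@" >&2 || es_status=$?
    es_end=$(date +%s)
    echo $((es_end - es_start))
    return "$es_status"
}
```

After the "rm -rf /var/lib/pgsql" step of ptest.sh, "t=$(elapsed_seconds service postgresql initdb)" followed by a check such as [ "$t" -gt 30 ] would flag a kernel that probably has the regression from bug #548403. The timing has one-second resolution, which is plenty for a 30-second threshold.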

Comment 8 Karel Volný 2010-02-01 12:58:54 UTC
(In reply to comment #6)
> One possible explanation is that there's some evidence of a recent kernel
> performance regression that causes postgres to spend a lot longer than expected
> figuring out the timezone during startup --- see bug #548403.

yes, you are right: I'm hitting that bug. Trying with TZ=GMT fixes the issue, so I'm closing this as a duplicate

the kernel is 2.6.18-182:

.qa.[root@ia64-5s-m2 tps]# uname -a
Linux ia64-5s-m2.test.redhat.com 2.6.18-182.el5debug #1 SMP Tue Dec 15 22:23:38 EST 2009 ia64 ia64 ia64 GNU/Linux

*** This bug has been marked as a duplicate of bug 548403 ***

