Bug 1281268 - PostgreSQL image doesn't work with OpenShift
Status: CLOSED NOTABUG
Product: OpenShift Origin
Classification: Red Hat
Component: Image
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: ---
Assigned To: Martin Nagy
QA Contact: DeShuai Ma
Depends On:
Blocks:
Reported: 2015-11-12 03:24 EST by Martin Nagy
Modified: 2017-07-17 05:33 EDT
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-11-12 13:41:10 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Martin Nagy 2015-11-12 03:24:41 EST
[vagrant@openshiftdev ~]$ oc new-app -f /data/src/github.com/openshift/origin/examples/db-templates/postgresql-ephemeral-template.json 
[snip]
[vagrant@openshiftdev ~]$ oc get pods
NAME                 READY     STATUS             RESTARTS   AGE
postgresql-1-gth26   0/1       CrashLoopBackOff   5          3m

Sometimes the pod runs successfully, but if I kill the pod at least once, it goes into the crash loop.

Logs from the pod:
[vagrant@openshiftdev ~]$ oc logs -p postgresql-1-gth26
waiting for server to start....FATAL:  data directory "/var/lib/pgsql/data/userdata" has group or world access
DETAIL:  Permissions should be u=rwx (0700).
.... stopped waiting
pg_ctl: could not start server
Examine the log output.
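For reference, the startup failure above can be reproduced outside of PostgreSQL: the server refuses to run when its data directory is accessible by group or others. A minimal sketch of that check (a temporary directory stands in for the real /var/lib/pgsql/data/userdata):

```shell
# Sketch of the permission check PostgreSQL performs at startup: it refuses
# to run if the data directory has group or world access.
PGDATA=$(mktemp -d)
chmod 0750 "$PGDATA"   # simulate a volume created with group access

mode=$(stat -c '%a' "$PGDATA")
if [ "$mode" != "700" ]; then
    echo "FATAL: data directory \"$PGDATA\" has group or world access (mode $mode)"
    chmod 0700 "$PGDATA"   # the mode initdb would normally set
fi
```

In the bug above the directory permissions come from however the volume was provisioned rather than from initdb, which is why the check trips.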
Comment 1 Wenjing Zheng 2015-11-12 03:45:50 EST
Martin, we hit a similar issue when using the persistent template; see the bug comment here: https://bugzilla.redhat.com/show_bug.cgi?id=1276326#c46. The ephemeral template, however, works for us; it was tested today.
Comment 2 Martin Nagy 2015-11-12 03:58:39 EST
I restarted the pod using docker kill. Using `oc scale` has revealed another bug:
+ '[' '!' -f /var/lib/pgsql/data/userdata/postgresql.conf ']'
+ set_passwords
+ pg_ctl -w start -o '-h '\'''\'''
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start.... done
server started
+ [[ ,,simple_db, = *,simple_db,* ]]
+ psql --command 'ALTER USER "userHR2" WITH ENCRYPTED PASSWORD '\''uLXs7CicjOQHT1HP'\'';'
ERROR:  role "userHR2" does not exist


(this was with a debugging image, with set -x)
Comment 3 Martin Nagy 2015-11-12 06:44:06 EST
Wenjing, you are right, and there's a PR there to fix that, so sorry for the duplicate. But I'm still concerned about the ALTER USER bug. Adding Honza and Pavel.
Comment 4 Pavel Raiskup 2015-11-12 07:15:46 EST
Martin, what script set the wrong permissions on the */userdata directory?  That
directory should be created automatically (and with the correct permissions) by initdb.

(In reply to Martin Nagy from comment #2)
> I restarted the pod using docker kill. Using `oc scale` has revealed another
> bug:
> + '[' '!' -f /var/lib/pgsql/data/userdata/postgresql.conf ']'
> + set_passwords
> + pg_ctl -w start -o '-h '\'''\'''
> pg_ctl: another server might be running; trying to start server anyway

Pre-existing pidfile?

> waiting for server to start.... done
> server started
> + [[ ,,simple_db, = *,simple_db,* ]]
> + psql --command 'ALTER USER "userHR2" WITH ENCRYPTED PASSWORD
> '\''uLXs7CicjOQHT1HP'\'';'
> ERROR:  role "userHR2" does not exist
> 
> 
> (this was with a debugging image, with set -x)

Isn't this scenario tested?  I mean, haven't you killed the first
container too early?
Comment 5 Martin Nagy 2015-11-12 08:26:14 EST
I don't think I did, but that shouldn't matter, should it?
Comment 6 Pavel Raiskup 2015-11-12 08:32:06 EST
I doubt it's safe to kill the run-postgresql-master before the initialization
phase is done (I mean before it calls 'exec postgres "$@"').
Comment 7 Martin Nagy 2015-11-12 08:35:47 EST
Well, what if it segfaults? There has to be a way to solve this. Worst case, we can create an empty file right after we know that the database is initialized.
Comment 8 Pavel Raiskup 2015-11-12 08:39:24 EST
What segfaults?
Comment 9 Pavel Raiskup 2015-11-12 08:48:19 EST
What I meant is that if you kill the script in the middle of initializing,
you can't make safe assumptions about the container.  There is at least:

| source "${CONTAINER_SCRIPTS_PATH}/common.sh"
| set_pgdata
| check_env_vars
| generate_passwd_file
| generate_postgresql_config
| if [ ! -f "$PGDATA/postgresql.conf" ]; then
|   initialize_database
| fi
| set_passwords
| unset_env_vars
| exec postgres "$@"

What if you kill the container in the middle of set_passwords () method, or
anywhere else?

About the pre-existing pid-file, note the comment:
https://github.com/fedora-cloud/Fedora-Dockerfiles/blob/67e4a8a03fee1aa5ff2934e442b46714ba1a1d24/postgresql/root/usr/libexec/cont-postgresql-preexec#L17
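A stale-pidfile cleanup in the spirit of that preexec script might look roughly like this (a sketch, not the linked script verbatim; a short-lived background process supplies a PID that is guaranteed dead):

```shell
# Sketch: detect and remove a postmaster.pid left behind by an unclean
# container shutdown, so pg_ctl doesn't think another server is running.
PGDATA=$(mktemp -d)
PIDFILE="$PGDATA/postmaster.pid"

sleep 0.1 & dead_pid=$!
wait "$dead_pid"                 # process is now reaped, its PID is free
echo "$dead_pid" > "$PIDFILE"    # simulate the leftover pidfile

pid=$(head -n1 "$PIDFILE")
if ! kill -0 "$pid" 2>/dev/null; then
    echo "removing stale pidfile (PID $pid no longer running)"
    rm -f "$PIDFILE"
fi
```

Note that PID-liveness checks like this can be fooled by PID reuse; in a container, where the entrypoint controls all processes, the risk is small.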
Comment 10 Pavel Raiskup 2015-11-12 08:56:09 EST
(In reply to Pavel Raiskup from comment #9)
> What I meant is that if you kill the script in the middle of initializing,
> you can't make safe assumptions about the container.

Reading again, sorry:
s/about the container/about the data directory/

There should be a way to tell the environment (OpenShift in this case) from
within the container that the initialization phase succeeded -- then it is
safe to take the data directory as-is and start it with a new container.

Until this is done, you should rather throw the data directory away and start
from scratch.
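The idea from comments 7 and 10 can be sketched as a sentinel file: only trust an existing data directory if a marker proves a previous initialization ran to completion, otherwise wipe it. (The marker file name and the initialize_database stand-in below are hypothetical, not what the actual PR does.)

```shell
# Sketch: trust the data directory only if a sentinel file exists;
# a missing sentinel means a prior run was killed mid-initialization.
BASE=$(mktemp -d)
PGDATA="$BASE/userdata"
SENTINEL="$BASE/.init-complete"   # hypothetical marker file

initialize_database() {           # stand-in for the real initdb sequence
    mkdir -p "$PGDATA"
    chmod 0700 "$PGDATA"
    touch "$PGDATA/postgresql.conf"
}

mkdir -p "$PGDATA"                # simulate a run killed mid-initialization:
touch "$PGDATA/garbage"           # data exists, but no sentinel was written

if [ ! -f "$SENTINEL" ]; then
    rm -rf "$PGDATA"              # can't trust it: start from scratch
    initialize_database
    touch "$SENTINEL"             # written only after init fully succeeded
fi
```

The sentinel must be the last thing written, so a kill at any earlier point leaves it absent and forces a clean re-initialization on the next start.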
Comment 11 Martin Nagy 2015-11-12 13:34:46 EST
Pavel, agreed. Here's the PR: https://github.com/openshift/postgresql/pull/82
Comment 12 Ben Parees 2015-11-12 13:37:40 EST
Marking this low severity since it sounds like you only hit it if you interrupt initialization.  Please correct me if I'm wrong.
Comment 13 Ben Parees 2015-11-12 13:38:29 EST
Martin, did you only hit this issue in a slave?  Your original bug report was not a replication scenario, so your PR would have no impact on that scenario.
Comment 14 Ben Parees 2015-11-12 13:41:10 EST
The original reported issue here was already fixed by the EmptyDir resolution.
