Bug 1281268 - PostgreSQL image doesn't work with OpenShift
Summary: PostgreSQL image doesn't work with OpenShift
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OKD
Classification: Red Hat
Component: Image
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Assignee: Martin Nagy
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-11-12 08:24 UTC by Martin Nagy
Modified: 2017-07-17 09:33 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-12 18:41:10 UTC
Target Upstream Version:
Embargoed:



Description Martin Nagy 2015-11-12 08:24:41 UTC
[vagrant@openshiftdev ~]$ oc new-app -f /data/src/github.com/openshift/origin/examples/db-templates/postgresql-ephemeral-template.json 
[snip]
[vagrant@openshiftdev ~]$ oc get pods
NAME                 READY     STATUS             RESTARTS   AGE
postgresql-1-gth26   0/1       CrashLoopBackOff   5          3m

Sometimes the pod runs successfully, but if I kill the pod at least once, it goes into the crash loop.

Logs from the pod:
[vagrant@openshiftdev ~]$ oc logs -p postgresql-1-gth26
waiting for server to start....FATAL:  data directory "/var/lib/pgsql/data/userdata" has group or world access
DETAIL:  Permissions should be u=rwx (0700).
.... stopped waiting
pg_ctl: could not start server
Examine the log output.
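
For reference, the mode check behind the FATAL message above can be reproduced outside of PostgreSQL. This is only a minimal sketch, assuming GNU coreutils `stat`; the temporary directory stands in for /var/lib/pgsql/data/userdata, and the helper name `check_pgdata_mode` is hypothetical, not something from the image scripts.

```shell
#!/bin/bash
# Hypothetical helper: report whether a directory has the u=rwx (0700)
# mode that the server asks for in the log above.
check_pgdata_mode() {
  local dir=$1 mode
  mode=$(stat -c '%a' "$dir")   # GNU stat; prints the octal mode, e.g. 700
  if [ "$mode" = "700" ]; then
    echo "ok"
  else
    echo "bad mode $mode"
  fi
}

demo_dir=$(mktemp -d)           # stand-in for the real data directory
chmod 0775 "$demo_dir"          # group/world access, as in the failing pod
before=$(check_pgdata_mode "$demo_dir")
chmod 0700 "$demo_dir"          # the mode the server expects
after=$(check_pgdata_mode "$demo_dir")
echo "before: $before, after: $after"
rm -rf "$demo_dir"
```

The chmod here only illustrates the check. It does not explain what changed the mode on the volume in the first place.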

Comment 1 Wenjing Zheng 2015-11-12 08:45:50 UTC
Martin, we hit a similar issue when using the persistent template; please see the bug comment here: https://bugzilla.redhat.com/show_bug.cgi?id=1276326#c46. It works for us with the ephemeral template, though, which we tested today.

Comment 2 Martin Nagy 2015-11-12 08:58:39 UTC
I restarted the pod using docker kill. Using `oc scale` has revealed another bug:
+ '[' '!' -f /var/lib/pgsql/data/userdata/postgresql.conf ']'
+ set_passwords
+ pg_ctl -w start -o '-h '\'''\'''
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start.... done
server started
+ [[ ,,simple_db, = *,simple_db,* ]]
+ psql --command 'ALTER USER "userHR2" WITH ENCRYPTED PASSWORD '\''uLXs7CicjOQHT1HP'\'';'
ERROR:  role "userHR2" does not exist


(this was with a debugging image, with set -x)

Comment 3 Martin Nagy 2015-11-12 11:44:06 UTC
Wenjing, you are right, and there's a PR there to fix that, so sorry for the duplicate. But I'm still concerned about that ALTER USER bug. Adding Honza and Pavel.

Comment 4 Pavel Raiskup 2015-11-12 12:15:46 UTC
Martin, what script set the wrong permissions on */userdata directory?  That
should be created by initdb (and correctly) automatically.

(In reply to Martin Nagy from comment #2)
> I restarted the pod using docker kill. Using `oc scale` has revealed another
> bug:
> + '[' '!' -f /var/lib/pgsql/data/userdata/postgresql.conf ']'
> + set_passwords
> + pg_ctl -w start -o '-h '\'''\'''
> pg_ctl: another server might be running; trying to start server anyway

Pre-existing pidfile?

> waiting for server to start.... done
> server started
> + [[ ,,simple_db, = *,simple_db,* ]]
> + psql --command 'ALTER USER "userHR2" WITH ENCRYPTED PASSWORD
> '\''uLXs7CicjOQHT1HP'\'';'
> ERROR:  role "userHR2" does not exist
> 
> 
> (this was with a debugging image, with set -x)

Isn't this scenario tested?  I mean, didn't you kill the first
container too early?

Comment 5 Martin Nagy 2015-11-12 13:26:14 UTC
I don't think I did, but that shouldn't matter, should it?

Comment 6 Pavel Raiskup 2015-11-12 13:32:06 UTC
I doubt it's safe to kill the run-postgresql-master before the initialization
phase is done (I mean before it calls 'exec postgres "$@"').

Comment 7 Martin Nagy 2015-11-12 13:35:47 UTC
Well, what if it segfaults? There has to be a way to solve this. Worst case, we can create an empty file right after we know that the database is initialized.

Comment 8 Pavel Raiskup 2015-11-12 13:39:24 UTC
What segfaults?

Comment 9 Pavel Raiskup 2015-11-12 13:48:19 UTC
What I meant is that if you kill the script in the middle of initializing,
you can't do safe assumptions about the container.  There is at least:

| source "${CONTAINER_SCRIPTS_PATH}/common.sh"
| set_pgdata
| check_env_vars
| generate_passwd_file
| generate_postgresql_config
| if [ ! -f "$PGDATA/postgresql.conf" ]; then
|   initialize_database
| fi
| set_passwords
| unset_env_vars
| exec postgres "$@"

What if you kill the container in the middle of set_passwords () method, or
anywhere else?

About the pre-existing pid-file, note the comment:
https://github.com/fedora-cloud/Fedora-Dockerfiles/blob/67e4a8a03fee1aa5ff2934e442b46714ba1a1d24/postgresql/root/usr/libexec/cont-postgresql-preexec#L17
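
As far as I know, the "another server might be running" warning in comment 2 is what pg_ctl prints when it finds a leftover postmaster.pid in the data directory. Below is a rough sketch of a staleness check. It assumes only that the first line of postmaster.pid holds the server's PID; the helper name `pidfile_is_stale` and the temporary directory are made up for illustration. Inside a container a recorded PID can easily coincide with an unrelated live process, so a `kill -0` test like this is not sufficient on its own.

```shell
#!/bin/bash
# Hypothetical helper: return 0 when postmaster.pid is present but no
# process with the recorded PID can be signalled.
# Caveat: kill -0 also fails for a live process owned by another user.
pidfile_is_stale() {
  local pidfile=$1/postmaster.pid pid
  [ -f "$pidfile" ] || return 1          # no pid file, nothing to be stale
  read -r pid < "$pidfile"               # first line holds the PID
  ! kill -0 "$pid" 2>/dev/null           # cannot signal it => treat as stale
}

demo_dir=$(mktemp -d)                    # stand-in for the data directory

true & dead_pid=$!
wait "$dead_pid"                         # reaped, so this PID is gone now
echo "$dead_pid" > "$demo_dir/postmaster.pid"
if pidfile_is_stale "$demo_dir"; then stale_result=stale; else stale_result=live; fi

echo "$$" > "$demo_dir/postmaster.pid"   # our own PID is certainly alive
if pidfile_is_stale "$demo_dir"; then live_result=stale; else live_result=live; fi

echo "dead pid: $stale_result, own pid: $live_result"
rm -rf "$demo_dir"
```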

Comment 10 Pavel Raiskup 2015-11-12 13:56:09 UTC
(In reply to Pavel Raiskup from comment #9)
> What I meant is that if you kill the script in the middle of initializing,
> you can't do safe assumptions about the container.

Reading again, sorry:
s/about the container/about the data directory/

There should be a way to tell the environment (OpenShift in this case) from
within the container that the container initialization phase succeeded -- and
that you are safe to take the data directory as-is and start it against a new
container.

Until this is done, you should rather throw away the data directory and start
from scratch.
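
One way to approximate this without help from the platform is the empty-file idea from comment 7: write a marker as the very last initialization step, and treat a data directory without it as untrustworthy. A minimal sketch follows; the marker name `.init_done`, the function names, and the stand-in for the real initialization are all hypothetical, and this is not necessarily what the PR does.

```shell
#!/bin/bash
# Hypothetical sketch: the marker file proves initialization ran to the end.
MARKER=.init_done

fake_initialize_database() {             # stand-in for initdb + set_passwords
  mkdir -p "$1"
  echo "# config" > "$1/postgresql.conf"
}

prepare_data_dir() {
  local pgdata=$1
  if [ -f "$pgdata/$MARKER" ]; then
    echo "reuse"                         # previous init completed; keep data
    return
  fi
  rm -rf "${pgdata:?}"                   # missing or half-initialized: start over
  fake_initialize_database "$pgdata"
  touch "$pgdata/$MARKER"                # last step, only after all of the above
  echo "initialized"
}

base=$(mktemp -d)
pgdata=$base/userdata
mkdir -p "$pgdata"
touch "$pgdata/postgresql.conf"          # simulate an interrupted first run

first=$(prepare_data_dir "$pgdata")      # no marker, so wipe and re-init
second=$(prepare_data_dir "$pgdata")     # marker present, so reuse
echo "first: $first, second: $second"
rm -rf "$base"
```

Note that a check on postgresql.conf alone, as in the script quoted in comment 9, would have accepted the interrupted directory in this demo; the marker is what tells the two cases apart.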

Comment 11 Martin Nagy 2015-11-12 18:34:46 UTC
Pavel, agreed. Here's the PR: https://github.com/openshift/postgresql/pull/82

Comment 12 Ben Parees 2015-11-12 18:37:40 UTC
Marking this low severity since it sounds like you only hit it if you interrupt initialization.  Please correct me if I'm wrong.

Comment 13 Ben Parees 2015-11-12 18:38:29 UTC
Martin, did you only hit this issue in a slave?  Your original bug report was not a replication scenario, so your PR would have no impact on that scenario.

Comment 14 Ben Parees 2015-11-12 18:41:10 UTC
The original reported issue here was already fixed by the EmptyDir resolution.

