[vagrant@openshiftdev ~]$ oc new-app -f /data/src/github.com/openshift/origin/examples/db-templates/postgresql-ephemeral-template.json
[snip]
[vagrant@openshiftdev ~]$ oc get pods
NAME                 READY     STATUS             RESTARTS   AGE
postgresql-1-gth26   0/1       CrashLoopBackOff   5          3m

Sometimes the pod runs successfully, but if I kill the pod at least once, it goes into the crash loop. Logs from the pod:

[vagrant@openshiftdev ~]$ oc logs -p postgresql-1-gth26
waiting for server to start....FATAL: data directory "/var/lib/pgsql/data/userdata" has group or world access
DETAIL: Permissions should be u=rwx (0700).
.... stopped waiting
pg_ctl: could not start server
Examine the log output.
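For anyone reproducing this, the mode of the data directory can be checked before pg_ctl is started. The following is only a minimal sketch, assuming GNU stat is available in the image; has_group_or_world_access is a hypothetical helper name and is not part of the image scripts.

```shell
#!/bin/bash
# Sketch only: has_group_or_world_access is a hypothetical helper,
# not something shipped in the postgresql image.

# Returns 0 if the directory has any group or world permission bits set,
# 1 if it is accessible by the owner only, 2 if it cannot be stat'ed.
has_group_or_world_access() {
    local mode
    mode=$(stat -c '%a' "$1") || return 2   # GNU stat: octal mode, e.g. 755
    (( (8#$mode & 8#077) != 0 ))            # any of the g/o rwx bits set?
}
```

It could be called as `has_group_or_world_access /var/lib/pgsql/data/userdata && echo "mode is too permissive"` from a debugging image. This only reports the condition that the FATAL message complains about; it does not explain which step changed the mode.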
Martin, we hit a similar issue when using the persistent template; please see the bug comment here: https://bugzilla.redhat.com/show_bug.cgi?id=1276326#c46. It works for us with the ephemeral template, though, which was tested today.
I restarted the pod using docker kill. Using `oc scale` has revealed another bug:

+ '[' '!' -f /var/lib/pgsql/data/userdata/postgresql.conf ']'
+ set_passwords
+ pg_ctl -w start -o '-h '\'''\'''
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start.... done
server started
+ [[ ,,simple_db, = *,simple_db,* ]]
+ psql --command 'ALTER USER "userHR2" WITH ENCRYPTED PASSWORD '\''uLXs7CicjOQHT1HP'\'';'
ERROR: role "userHR2" does not exist

(this was with a debugging image, with set -x)
Wenjing, you are right, and there's already a PR to fix that, so sorry for the duplicate. But I'm still concerned about that ALTER USER bug. Adding Honza and Pavel.
Martin, what script set the wrong permissions on the */userdata directory? That should be created by initdb (and correctly) automatically.

(In reply to Martin Nagy from comment #2)
> I restarted the pod using docker kill. Using `oc scale` has revealed another
> bug:
> + '[' '!' -f /var/lib/pgsql/data/userdata/postgresql.conf ']'
> + set_passwords
> + pg_ctl -w start -o '-h '\'''\'''
> pg_ctl: another server might be running; trying to start server anyway

Pre-existing pidfile?

> waiting for server to start.... done
> server started
> + [[ ,,simple_db, = *,simple_db,* ]]
> + psql --command 'ALTER USER "userHR2" WITH ENCRYPTED PASSWORD
> '\''uLXs7CicjOQHT1HP'\'';'
> ERROR: role "userHR2" does not exist
>
>
> (this was with a debugging image, with set -x)

Isn't this scenario tested? I mean, haven't you killed the first container too early?
I don't think I did, but that shouldn't matter, should it?
I doubt it's safe to kill the run-postgresql-master before the initialization phase is done (I mean before it calls 'exec postgres "$@"').
Well, what if it segfaults? There has to be a way to solve this. Worst case, we can create an empty file right after we know that the database is initialized.
What segfaults?
What I meant is that if you kill the script in the middle of initializing, you can't make safe assumptions about the container. There is at least:

| source "${CONTAINER_SCRIPTS_PATH}/common.sh"
| set_pgdata
| check_env_vars
| generate_passwd_file
| generate_postgresql_config
| if [ ! -f "$PGDATA/postgresql.conf" ]; then
|   initialize_database
| fi
| set_passwords
| unset_env_vars
| exec postgres "$@"

What if you kill the container in the middle of the set_passwords () function, or anywhere else?

About the pre-existing pid-file, note the comment:
https://github.com/fedora-cloud/Fedora-Dockerfiles/blob/67e4a8a03fee1aa5ff2934e442b46714ba1a1d24/postgresql/root/usr/libexec/cont-postgresql-preexec#L17
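To illustrate the pid-file question, here is a rough sketch of how a leftover postmaster.pid could be checked. It assumes a Linux /proc filesystem and relies only on the fact that the first line of postmaster.pid holds the postmaster PID; pidfile_is_stale is a hypothetical helper, not taken from the linked script.

```shell
#!/bin/bash
# Sketch only: pidfile_is_stale is a hypothetical helper.

# Returns 0 when the pid file exists but no process with the recorded PID
# is running, 1 when there is no pid file or the process is still alive.
pidfile_is_stale() {
    local pidfile=$1 pid=""
    [ -f "$pidfile" ] || return 1       # no pid file, so nothing is stale
    read -r pid < "$pidfile" || true    # first line: PID of the postmaster
    case $pid in
        ''|*[!0-9]*) return 0 ;;        # empty or garbage counts as stale
    esac
    [ ! -d "/proc/$pid" ]               # stale if no such process exists
}
```

Note that in a freshly started container the PID namespace is new, so the recorded PID may coincide with an unrelated process. A "not stale" answer is therefore only a hint, which is presumably why pg_ctl prints "another server might be running" and tries to start anyway.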
(In reply to Pavel Raiskup from comment #9)
> What I meant is that if you kill the script in the middle of initializing,
> you can't make safe assumptions about the container.

Reading again, sorry: s/about the container/about the data directory/

There should be a way to tell the environment (OpenShift in this case) from within the container that the container initialization phase succeeded -- and that it is safe to take the data directory as-is and start it against a new container. Until this is done, you should rather throw away the data directory and start from scratch.
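The marker-file idea discussed above could look roughly like the sketch below. This only illustrates the approach and is not necessarily what the PR does; the marker name and the helpers is_initialized, mark_initialized and prepare_data_dir are all hypothetical, and GNU find is assumed.

```shell
#!/bin/bash
# Sketch only: the marker name and all three helpers are hypothetical.

INIT_MARKER=".initialization_complete"

# Succeeds if the directory was marked as fully initialized.
is_initialized() {
    [ -f "$1/$INIT_MARKER" ]
}

# Record that initialization finished. Meant to be called as the very
# last initialization step, right before 'exec postgres "$@"'.
mark_initialized() {
    touch "$1/$INIT_MARKER"
}

# Decide what to do with a (possibly pre-existing) data directory.
# Prints "reuse" if it was fully initialized; otherwise empties it
# and prints "init" so the caller starts from scratch.
prepare_data_dir() {
    local dir=$1
    if is_initialized "$dir"; then
        echo reuse
        return 0
    fi
    # Initialization never completed: discard whatever is left over.
    mkdir -p "$dir"
    find "$dir" -mindepth 1 -delete
    echo init
}
```

With this, the `[ ! -f "$PGDATA/postgresql.conf" ]` test would be replaced by a check on the marker, so a container killed in the middle of initialize_database or set_passwords is re-initialized on the next start instead of being reused half-done.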
Pavel, agreed. Here's the PR: https://github.com/openshift/postgresql/pull/82
Marking this low severity since it sounds like you only hit it if you interrupt initialization. Please correct me if I'm wrong.
Martin, did you only hit this issue in a slave? Your original bug report was not a replication scenario, so your PR would have no impact on that scenario.
The original reported issue here was already fixed by the EmptyDir resolution.