Description of problem:
The postgresql-92-rhel7 image with id c10e6b2e643e cannot start up on the AEP env, but can start on an EC2 instance.

Version-Release number of selected component (if applicable):

How reproducible:
always

Steps to Reproduce:
1. Create a project on the AEP env
2. oc process -f https://raw.githubusercontent.com/openshift/origin/master/examples/db-templates/postgresql-ephemeral-template.json | oc create -f -
3. Check the pod status

Actual results:
The pod status:

[vagrant@ose test]$ oc get pod
NAME                 READY     STATUS             RESTARTS   AGE
postgresql-1-qtnin   0/1       CrashLoopBackOff   14         44m

Log:

[vagrant@ose test]$ oc logs -f postgresql-1-qtnin
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /var/lib/pgsql/data/userdata/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
copying template1 to postgres ... ok

Success. You can now start the database server using:

    postgres -D /var/lib/pgsql/data/userdata
or
    pg_ctl -D /var/lib/pgsql/data/userdata -l logfile start

waiting for server to start.... done
server started
waiting for server to shut down.... done
server stopped
waiting for server to start....FATAL: data directory "/var/lib/pgsql/data/userdata" has group or world access
DETAIL: Permissions should be u=rwx (0700).
.... stopped waiting
pg_ctl: could not start server
Examine the log output.

Expected results:
Should start successfully.

Additional info:
Version-Release number of selected component (if applicable):
openshift v3.1.0.3
kubernetes v1.1.0-origin-1107-g4c8e6f4
oc v3.1.0.3
openshift3/postgresql-92-rhel7 c10e6b2e643e
There is different behavior occurring when creating directories in an EmptyDir volume vs ephemeral storage. EmptyDir volumes are getting a different default permission set and group ownership.

/var/lb/pgsql/data (not a typo) is mounted as EmptyDir:

bash-4.2$ mkdir /var/lb/pgsql/data/newdata
bash-4.2$ ls -l /var/lb/pgsql/data/
total 8
drwxr-sr-x. 2 1000040000 1000040000 4096 Nov 10 19:21 newdata

/var/lib/pgsql/data is ephemeral storage:

bash-4.2$ mkdir /var/lib/pgsql/data/newdata
bash-4.2$ ls -l /var/lib/pgsql/data/
total 8
drwxr-xr-x. 2 1000040000 root 4096 Nov 10 19:18 newdata

Having group-write permission on the created directory is causing postgres to throw an error (probably mysql and mongo too). We can possibly fix this in the images (modify the permissions after creating the dirs), but I'd like the storage team to take a look first to see if this is really how we want EmptyDir to behave (I assume/hope it doesn't match how NFS behaves...).
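
For reference, the image-side workaround mentioned above (modifying the permissions after creating the dirs) might look roughly like the sketch below. This is not the actual change made to any image: the function names has_group_or_world_access and make_private_dir are hypothetical, the scratch directory only simulates the EmptyDir listing, and GNU coreutils stat is assumed. The check reflects the rule in the postgres error message (no group or world permission bits; 0700).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from the postgresql image.

# Returns 0 if the path grants any permission to group or others,
# 1 if it does not, 2 if the mode could not be read.
has_group_or_world_access() {
    mode=$(stat -c '%a' "$1") || return 2   # octal mode, e.g. 2755 (GNU stat assumed)
    [ "$(( 0$mode & 077 ))" -ne 0 ]         # mask the group and other rwx bits
}

# Create a directory and strip group/other permissions, regardless of
# umask. A setgid bit inherited from the parent may remain, but it is
# outside the 077 mask checked above.
make_private_dir() {
    mkdir -p "$1" && chmod 0700 "$1"
}

# Demo in a scratch directory, simulating the drwxr-sr-x listing above.
scratch=$(mktemp -d)
mkdir "$scratch/newdata"
chmod 2755 "$scratch/newdata"

if has_group_or_world_access "$scratch/newdata"; then
    echo "before: group or world access present"
fi

make_private_dir "$scratch/newdata"

if ! has_group_or_world_access "$scratch/newdata"; then
    echo "after: no group or world access"
fi

rm -rf "$scratch"
```

Whether a fix like this belongs in the images or in the volume handling is the open question raised in this comment; the sketch only illustrates the image-side option.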
Reassigning to Paul Morie, as this is his feature and he understands the code.
Confirmed setting the fsGroup and supplementalGroups to RunAsAny allows the postgres image to work with an EmptyDir again.
The issue seen should be fixed with https://github.com/openshift/origin/pull/5839; please open a new bug if not. I am leaving this bug open to track the longer term issue raised here.
Verified with version atomic-openshift-3.1.0.4-1.git.0.064715c.el7aos

[root@openshift-137 ~]# oc get scc
NAME         PRIV    CAPS   HOSTDIR   SELINUX     RUNASUSER          FSGROUP    SUPGROUP   PRIORITY
anyuid       false   []     false     MustRunAs   RunAsAny           RunAsAny   RunAsAny   10
hostaccess   false   []     true      MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>
hostmount    false   []     true      MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>
nonroot      false   []     false     MustRunAs   MustRunAsNonRoot   RunAsAny   RunAsAny   <none>
privileged   true    []     true      RunAsAny    RunAsAny           RunAsAny   RunAsAny   <none>
restricted   false   []     false     MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2016:0070