Bug 1279744

Summary: postgresql-92-rhel7 cannot startup on AEP env
Product: OpenShift Container Platform Reporter: Wang Haoran <haowang>
Component: Storage Assignee: Paul Morie <pmorie>
Status: CLOSED ERRATA QA Contact: Liang Xia <lxia>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.0.0 CC: aos-bugs, bleanhar, bparees, jokerman, mmccomas, pruan, sdodson
Target Milestone: --- Keywords: UpcomingRelease
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: atomic-openshift-3.1.0.4-1.git.0.064715c.el7aos Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-01-26 19:17:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Wang Haoran 2015-11-10 07:53:48 UTC
Description of problem:

The postgresql-92-rhel7 image with ID c10e6b2e643e cannot start up in the AEP environment, but it can start on an EC2 instance.
Version-Release number of selected component (if applicable):


How reproducible:

always
Steps to Reproduce:
1. Create a project in the AEP environment.
2. oc process -f https://raw.githubusercontent.com/openshift/origin/master/examples/db-templates/postgresql-ephemeral-template.json | oc create -f -
3. Check the pod status.

Actual results:
the pod status:
[vagrant@ose test]$ oc get pod
NAME                 READY     STATUS             RESTARTS   AGE
postgresql-1-qtnin   0/1       CrashLoopBackOff   14         44m

log:
[vagrant@ose test]$ oc logs -f postgresql-1-qtnin
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
 
The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".
 
fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /var/lib/pgsql/data/userdata/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
 
WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
copying template1 to postgres ... ok
 
Success. You can now start the database server using:
 
    postgres -D /var/lib/pgsql/data/userdata
or
    pg_ctl -D /var/lib/pgsql/data/userdata -l logfile start
 
waiting for server to start.... done
server started
waiting for server to shut down.... done
server stopped
waiting for server to start....FATAL:  data directory "/var/lib/pgsql/data/userdata" has group or world access
DETAIL:  Permissions should be u=rwx (0700).
.... stopped waiting
pg_ctl: could not start server
Examine the log output.

Expected results:

The pod should start successfully.
Additional info:

Comment 1 Wang Haoran 2015-11-10 08:03:42 UTC
Version-Release number of selected component (if applicable):
openshift v3.1.0.3
kubernetes v1.1.0-origin-1107-g4c8e6f4
oc v3.1.0.3
openshift3/postgresql-92-rhel7    c10e6b2e643e

Comment 4 Ben Parees 2015-11-10 19:58:44 UTC
There is different behavior occurring when creating directories in an EmptyDir volume vs. ephemeral storage.

EmptyDir volumes are getting a different default permission set and group ownership:

/var/lb/pgsql/data (not a typo) is mounted as EmptyDir:
bash-4.2$ mkdir /var/lb/pgsql/data/newdata
bash-4.2$ ls -l /var/lb/pgsql/data/
total 8
drwxr-sr-x. 2 1000040000 1000040000 4096 Nov 10 19:21 newdata



/var/lib/pgsql/data is ephemeral storage:

bash-4.2$ mkdir /var/lib/pgsql/data/newdata
bash-4.2$ ls -l /var/lib/pgsql/data/
total 8
drwxr-xr-x.  2 1000040000 root 4096 Nov 10 19:18 newdata


Having group-write permission on the created directory is causing postgres to throw an error (probably mysql and mongo too).

We can possibly fix this in the images (modify the permissions after creating the dirs), but I'd like the storage team to take a look first to see if this is really how we want EmptyDir to behave (I assume/hope it doesn't match how NFS behaves...)
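
For illustration, the FATAL message in the log ("has group or world access", "Permissions should be u=rwx (0700)") corresponds to a check on the data directory's mode bits. The following is a minimal Python sketch of that kind of check, not PostgreSQL's actual code. The helper names and the temporary "newdata" directory are assumptions made for this example. It mimics the drwxr-sr-x mode seen on the EmptyDir volume and then applies the image-side workaround mentioned above (resetting the mode after creating the directory).

```python
import os
import stat
import tempfile


def has_group_or_world_access(path):
    """Return True if any group or other permission bit is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


def restrict_to_owner(path):
    """Set the mode of path to exactly u=rwx (0700)."""
    os.chmod(path, 0o700)


def demo():
    """Create a directory with group/world access, then restrict it.

    Returns (before, after): whether the check would complain before and
    after the mode is reset.
    """
    with tempfile.TemporaryDirectory() as parent:
        data_dir = os.path.join(parent, "newdata")  # hypothetical name
        os.mkdir(data_dir)
        # Mimic drwxr-sr-x. The OS may silently drop the setgid bit for an
        # unprivileged user; the group r-x bits are what the check sees.
        os.chmod(data_dir, 0o2755)
        before = has_group_or_world_access(data_dir)
        restrict_to_owner(data_dir)
        after = has_group_or_world_access(data_dir)
        return before, after


if __name__ == "__main__":
    print(demo())
```

Whether a mode reset inside the image sticks on a real EmptyDir volume depends on how the volume is set up by the platform, which is the question raised for the storage team.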

Comment 5 Mark Turansky 2015-11-10 20:03:41 UTC
Reassigning to Paul Morie, as this is his feature and he understands the code.

Comment 6 Ben Parees 2015-11-10 21:02:24 UTC
Confirmed that setting fsGroup and supplementalGroups to RunAsAny allows the postgres image to work with an EmptyDir again.

Comment 7 Dan McPherson 2015-11-11 00:26:21 UTC
The issue seen should be fixed with:

https://github.com/openshift/origin/pull/5839

Please open a new bug if not. I am leaving this bug to track the longer-term issue raised here.

Comment 8 Wang Haoran 2015-11-11 05:08:14 UTC
Verified with version atomic-openshift-3.1.0.4-1.git.0.064715c.el7aos
[root@openshift-137 ~]# oc get  scc
NAME         PRIV      CAPS      HOSTDIR   SELINUX     RUNASUSER          FSGROUP    SUPGROUP   PRIORITY
anyuid       false     []        false     MustRunAs   RunAsAny           RunAsAny   RunAsAny   10
hostaccess   false     []        true      MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>
hostmount    false     []        true      MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>
nonroot      false     []        false     MustRunAs   MustRunAsNonRoot   RunAsAny   RunAsAny   <none>
privileged   true      []        true      RunAsAny    RunAsAny           RunAsAny   RunAsAny   <none>
restricted   false     []        false     MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>
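
The table above can also be checked mechanically. This is a minimal Python sketch, assuming the whitespace-separated column layout shown in this comment and that no cell contains spaces. `parse_scc_table` and `non_runasany` are hypothetical helper names, not part of `oc`. The sample text is copied from the output above.

```python
SAMPLE = """\
NAME         PRIV      CAPS      HOSTDIR   SELINUX     RUNASUSER          FSGROUP    SUPGROUP   PRIORITY
anyuid       false     []        false     MustRunAs   RunAsAny           RunAsAny   RunAsAny   10
hostaccess   false     []        true      MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>
hostmount    false     []        true      MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>
nonroot      false     []        false     MustRunAs   MustRunAsNonRoot   RunAsAny   RunAsAny   <none>
privileged   true      []        true      RunAsAny    RunAsAny           RunAsAny   RunAsAny   <none>
restricted   false     []        false     MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>
"""


def parse_scc_table(text):
    """Parse `oc get scc` style output into a list of dicts keyed by header."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def non_runasany(rows):
    """Return names of SCCs whose FSGROUP or SUPGROUP is not RunAsAny."""
    return [
        row["NAME"]
        for row in rows
        if row["FSGROUP"] != "RunAsAny" or row["SUPGROUP"] != "RunAsAny"
    ]


if __name__ == "__main__":
    rows = parse_scc_table(SAMPLE)
    print(non_runasany(rows))
```

An empty result means every listed SCC uses RunAsAny for both strategies, which matches the verified state in this comment.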

Comment 10 errata-xmlrpc 2016-01-26 19:17:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0070