1279744 – postgresql-92-rhel7 cannot startup on AEP env

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Storage
Sub Component:
Version:	3.0.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Paul Morie
QA Contact:	Liang Xia
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2015-11-10 07:53 UTC by Wang Haoran
Modified:	2016-01-26 19:17 UTC (History)
CC List:	7 users (show)
Fixed In Version:	atomic-openshift-3.1.0.4-1.git.0.064715c.el7aos
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-01-26 19:17:10 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2016:0070	0	normal	SHIPPED_LIVE	Important: Red Hat OpenShift Enterprise 3.1.1 bug fix and enhancement update	2016-01-27 00:12:41 UTC

Description Wang Haoran 2015-11-10 07:53:48 UTC

Description of problem:

The postgresql-92-rhel7 image with ID c10e6b2e643e cannot start up in the AEP environment, but it can start on an EC2 instance.

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:
1. Create a project in the AEP environment.
2. oc process -f https://raw.githubusercontent.com/openshift/origin/master/examples/db-templates/postgresql-ephemeral-template.json | oc create -f -
3. Check the pod status.

Actual results:
the pod status:
[vagrant@ose test]$ oc get pod
NAME                 READY     STATUS             RESTARTS   AGE
postgresql-1-qtnin   0/1       CrashLoopBackOff   14         44m

log:
[vagrant@ose test]$ oc logs -f postgresql-1-qtnin
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

fixing permissions on existing directory /var/lib/pgsql/data/userdata ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 32MB
creating configuration files ... ok
creating template1 database in /var/lib/pgsql/data/userdata/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating collations ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.
copying template1 to postgres ... ok

Success. You can now start the database server using:

postgres -D /var/lib/pgsql/data/userdata
or
pg_ctl -D /var/lib/pgsql/data/userdata -l logfile start

waiting for server to start.... done
server started
waiting for server to shut down.... done
server stopped
waiting for server to start....FATAL: data directory "/var/lib/pgsql/data/userdata" has group or world access
DETAIL: Permissions should be u=rwx (0700).
.... stopped waiting
pg_ctl: could not start server
Examine the log output.
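
For reference, the FATAL message above comes from a mode check on the data directory: any group or world permission bit makes the server refuse to start. The following is a minimal shell sketch of an equivalent check, not PostgreSQL's own code. The function name check_pgdata_mode is hypothetical, and the sketch assumes GNU stat (for the -c '%a' format).

```shell
# Hypothetical helper (not part of the image or of PostgreSQL): approximates
# the startup check that rejects a data directory with group or world access.
check_pgdata_mode() {
    dir=$1
    # GNU stat: %a prints the octal mode, e.g. 700 or 2755 (assumes GNU coreutils).
    mode=$(stat -c '%a' "$dir") || return 2
    # The leading 0 makes the shell read the value as octal; masking with 077
    # isolates the group and world permission bits.
    if [ $(( 0$mode & 077 )) -ne 0 ]; then
        echo "$dir: mode $mode has group or world access; expected u=rwx (0700)" >&2
        return 1
    fi
    echo "$dir: mode $mode is acceptable"
}
```

Inside the container this could be run as: check_pgdata_mode /var/lib/pgsql/data/userdata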

Expected results:

The pod should start successfully.

Additional info:

Comment 1 Wang Haoran 2015-11-10 08:03:42 UTC

Version-Release number of selected component (if applicable):
openshift v3.1.0.3
kubernetes v1.1.0-origin-1107-g4c8e6f4
oc v3.1.0.3
openshift3/postgresql-92-rhel7    c10e6b2e643e

Comment 4 Ben Parees 2015-11-10 19:58:44 UTC

Creating directories behaves differently in an EmptyDir volume than in ephemeral storage.

EmptyDir volumes are getting a different default permission set and group ownership:

/var/lb/pgsql/data (not a typo) is mounted as EmptyDir:
bash-4.2$ mkdir /var/lb/pgsql/data/newdata
bash-4.2$ ls -l /var/lb/pgsql/data/
total 8
drwxr-sr-x. 2 1000040000 1000040000 4096 Nov 10 19:21 newdata



/var/lib/pgsql/data is ephemeral storage:

bash-4.2$ mkdir /var/lib/pgsql/data/newdata
bash-4.2$ ls -l /var/lib/pgsql/data/
total 8
drwxr-xr-x.  2 1000040000 root 4096 Nov 10 19:18 newdata


Having group-write permission on the created directory is causing postgres to throw an error (probably mysql and mongo too).

We can possibly fix this in the images (modify the permissions after creating the dirs), but I'd like the storage team to take a look first to see if this is really how we want EmptyDir to behave (I assume/hope it doesn't match how NFS behaves...)
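
The "s" in drwxr-sr-x in the EmptyDir listing is the setgid bit, and the directory is owned by the volume's group rather than root. One plausible reading, which the listings do not confirm because they do not show the parent directory's mode, is that the volume root itself is setgid, so new subdirectories inherit both the bit and the group. The sketch below reproduces that inheritance on a scratch directory. It assumes Linux semantics and GNU stat, and the function name show_setgid_inheritance is hypothetical.

```shell
# Hypothetical demo helper; assumes Linux setgid-directory semantics and GNU stat.
show_setgid_inheritance() {
    parent=$(mktemp -d) || return 1
    # Setgid bit plus group rwx on the parent, standing in for the volume root
    # (an assumption: the report does not show the real volume root's mode).
    chmod 2775 "$parent"
    # Create the child under a fixed umask so the resulting mode is predictable.
    ( umask 022 && mkdir "$parent/newdata" ) || return 1
    # Print the child's octal mode; a leading 2 means the setgid bit was inherited.
    stat -c '%a' "$parent/newdata"
    rm -rf "$parent"
}
```

Running show_setgid_inheritance prints the octal mode of the new subdirectory, which can be compared against the drwxr-sr-x entry above.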

Comment 5 Mark Turansky 2015-11-10 20:03:41 UTC

Reassigning to Paul Morie, as this is his feature and he understands the code.

Comment 6 Ben Parees 2015-11-10 21:02:24 UTC

Confirmed: setting fsGroup and supplementalGroups to RunAsAny allows the postgres image to work with an EmptyDir again.
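
The comment does not say which SCC was changed or how. As a rough sketch only, assuming the "restricted" SCC and the fsGroup/supplementalGroups "type" fields of the SecurityContextConstraints object, the change could be expressed as a patch document. The helper name scc_runasany_patch is hypothetical, and the oc patch line is left commented out because it needs cluster-admin access and was not run here.

```shell
# Hypothetical helper: prints a patch that sets both group strategies to RunAsAny.
scc_runasany_patch() {
    printf '{"fsGroup":{"type":"%s"},"supplementalGroups":{"type":"%s"}}\n' \
        RunAsAny RunAsAny
}

# Assumed usage against the "restricted" SCC (untested; requires cluster-admin):
# oc patch scc restricted -p "$(scc_runasany_patch)"
```

The result can be checked afterwards with oc get scc, where the FSGROUP and SUPGROUP columns should read RunAsAny.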

Comment 7 Dan McPherson 2015-11-11 00:26:21 UTC

The issue seen should be fixed with:

https://github.com/openshift/origin/pull/5839

Please open a new bug if not. I am leaving this bug open to track the longer-term issue raised here.

Comment 8 Wang Haoran 2015-11-11 05:08:14 UTC

Verified with version atomic-openshift-3.1.0.4-1.git.0.064715c.el7aos
[root@openshift-137 ~]# oc get  scc
NAME         PRIV      CAPS      HOSTDIR   SELINUX     RUNASUSER          FSGROUP    SUPGROUP   PRIORITY
anyuid       false     []        false     MustRunAs   RunAsAny           RunAsAny   RunAsAny   10
hostaccess   false     []        true      MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>
hostmount    false     []        true      MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>
nonroot      false     []        false     MustRunAs   MustRunAsNonRoot   RunAsAny   RunAsAny   <none>
privileged   true      []        true      RunAsAny    RunAsAny           RunAsAny   RunAsAny   <none>
restricted   false     []        false     MustRunAs   MustRunAsRange     RunAsAny   RunAsAny   <none>

Comment 10 errata-xmlrpc 2016-01-26 19:17:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:0070
