Bug 1326059 - OSP backed Cinder Persistent storage incorrect permissions for postgresql-persistent template
Summary: OSP backed Cinder Persistent storage incorrect permissions for postgresql-per...
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.1.0
Hardware: All
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Assignee: Paul Morie
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-04-11 16:55 UTC by Brett Thurber
Modified: 2018-01-03 22:08 UTC
CC: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-03 22:08:00 UTC
Target Upstream Version:
Embargoed:



Description Brett Thurber 2016-04-11 16:55:40 UTC
Description of problem:
When deploying the postgresql-persistent template using an OSP Cinder-backed PV, the pod fails to deploy.  The issue comes down to the PVC used within the pod not having the correct permissions for the psql user to create the database at /var/lib/pgsql/data, where the PVC is mounted.

Version-Release number of selected component (if applicable):
OSE 3.1

How reproducible:
Every time

Steps to Reproduce:
1. Create a persistent volume (PV) in OSE backed by an OSP Cinder volume (an illustrative PV definition follows these steps)
2. Deploy a new pod based on the postgresql-persistent template
3. A new PVC is created and bound to the PV, and the pod is started but fails to complete deployment
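
For reference, a minimal sketch of a Cinder-backed PV of the kind used in step 1; the name, size, and volume ID below are placeholders, not values from this report:

  # pv-cinder.yaml -- illustrative Cinder-backed PV (name, size, and volumeID are placeholders)
  apiVersion: v1
  kind: PersistentVolume
  metadata:
    name: cinder-pv0001
  spec:
    capacity:
      storage: 1Gi
    accessModes:
    - ReadWriteOnce
    persistentVolumeReclaimPolicy: Retain
    cinder:
      fsType: ext4
      volumeID: <cinder-volume-id>

After oc create -f pv-cinder.yaml, instantiating the postgresql-persistent template (for example with oc new-app postgresql-persistent) creates a PVC that binds to a matching PV.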

Actual results:
Pod fails to deploy due to incorrect permissions on the attached PVC.

Expected results:
The pod deploys successfully with a Cinder-backed PV/PVC.

Additional info:
There is a workaround for this issue when using standard NFS, documented here:  https://blog.openshift.com/deploy-gitlab-openshift/

This will not work for OSP Cinder volumes, however, because the export is owned by UID/GID 107:107, and changing that ownership manually will break OSP.

The correct solution is for Kubernetes to assign the proper postgres permissions to the volume mounted in the pod/container; 26:26 is the UID/GID required for the PV/PVC.
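
For context, the NFS workaround referenced above amounts to pre-setting ownership on the export before the PV is claimed, roughly along these lines (a sketch only; the export path is illustrative and the exact steps in the linked post may differ):

  # on the NFS server -- illustrative export path
  chown -R 26:26 /exports/pv0001
  chmod -R 0770 /exports/pv0001

With an OSP Cinder volume there is no equivalent export to re-own: the backing storage is owned 107:107 on the OSP side, and changing that would break OSP, as noted above.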

Comment 1 Brett Thurber 2016-04-14 04:28:24 UTC
Through additional testing across multiple database and application persistent templates, I was able to resolve this issue by temporarily setting SELinux to permissive.

[root@openshift-all-in-one ~]# oc logs -f jenkins-1-xihuo
Copying Jenkins configuration to /var/lib/jenkins ...
cp: cannot create regular file '/var/lib/jenkins/config.xml.tpl': Permission denied
cp: cannot create directory '/var/lib/jenkins/jobs': Permission denied
cp: cannot create directory '/var/lib/jenkins/users': Permission denied
mkdir: cannot create directory '/var/lib/jenkins/plugins': Permission denied
Copying 1 Jenkins plugins to /var/lib/jenkins ...
cp: cannot create regular file '/var/lib/jenkins/plugins/': Not a directory
Creating initial Jenkins 'admin' user ...
sed: can't read /var/lib/jenkins/users/admin/config.xml: No such file or directory
/usr/libexec/s2i/run: line 36: /var/lib/jenkins/password: Permission denied
touch: cannot touch '/var/lib/jenkins/configured': Permission denied
Running from: /usr/lib/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Apr 13, 2016 11:59:36 PM winstone.Logger logInternal
INFO: Beginning extraction from war file
Apr 13, 2016 11:59:36 PM winstone.Logger logInternal
INFO: Winstone shutdown successfully
Apr 13, 2016 11:59:36 PM winstone.Logger logInternal
SEVERE: Container startup failed
java.io.FileNotFoundException: /var/lib/jenkins/war/META-INF/MANIFEST.MF (No such file or directory)
	at java.io.FileOutputStream.open0(Native Method)
	at java.io.FileOutputStream.open(FileOutputStream.java:270)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
	at winstone.HostConfiguration.getWebRoot(HostConfiguration.java:280)
	at winstone.HostConfiguration.<init>(HostConfiguration.java:83)
	at winstone.HostGroup.initHost(HostGroup.java:66)
	at winstone.HostGroup.<init>(HostGroup.java:45)
	at winstone.Launcher.<init>(Launcher.java:143)
	at winstone.Launcher.main(Launcher.java:354)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at Main._main(Main.java:293)
	at Main.main(Main.java:98)

[root@openshift-all-in-one ~]# setenforce 0
[root@openshift-all-in-one ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28
[root@openshift-all-in-one ~]# oc logs -f jenkins-1-xihuo
Copying Jenkins configuration to /var/lib/jenkins ...
Copying 1 Jenkins plugins to /var/lib/jenkins ...
Creating initial Jenkins 'admin' user ...
Detected password change, updating Jenkins configuration ...
Processing Jenkins configuration (/var/lib/jenkins/config.xml.tpl) ...
Running from: /usr/lib/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Apr 14, 2016 12:02:32 AM winstone.Logger logInternal
INFO: Beginning extraction from war file
Apr 14, 2016 12:02:38 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: jetty-winstone-2.8
Apr 14, 2016 12:02:47 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: NO JSP Support for , did not find org.apache.jasper.servlet.JspServlet
Jenkins home directory: /var/lib/jenkins found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
Apr 14, 2016 12:02:55 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started SelectChannelConnector.0.0:8080
Apr 14, 2016 12:02:55 AM winstone.Logger logInternal
INFO: Winstone Servlet Engine v2.0 running: controlPort=disabled
Apr 14, 2016 12:02:56 AM jenkins.InitReactorRunner$1 onAttained
INFO: Started initialization
Apr 14, 2016 12:04:54 AM jenkins.InitReactorRunner$1 onAttained
INFO: Listed all plugins
Apr 14, 2016 12:04:55 AM jenkins.InitReactorRunner$1 onAttained
INFO: Prepared all plugins
Apr 14, 2016 12:04:55 AM jenkins.InitReactorRunner$1 onAttained
INFO: Started all plugins
Apr 14, 2016 12:04:56 AM jenkins.InitReactorRunner$1 onAttained
INFO: Augmented all extensions
Apr 14, 2016 12:05:48 AM jenkins.InitReactorRunner$1 onAttained
INFO: Loaded all jobs
Apr 14, 2016 12:05:49 AM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Download metadata
Apr 14, 2016 12:05:54 AM org.jenkinsci.main.modules.sshd.SSHD start
INFO: Started SSHD at port 59749
Apr 14, 2016 12:05:54 AM jenkins.InitReactorRunner$1 onAttained
INFO: Completed initialization
Apr 14, 2016 12:06:01 AM org.springframework.web.context.support.StaticWebApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@7913c5b2: display name [Root WebApplicationContext]; startup date [Thu Apr 14 00:06:01 EDT 2016]; root of context hierarchy
Apr 14, 2016 12:06:01 AM org.springframework.web.context.support.StaticWebApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@7913c5b2]: org.springframework.beans.factory.support.DefaultListableBeanFactory@3a8b29f4
Apr 14, 2016 12:06:01 AM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@3a8b29f4: defining beans [authenticationManager]; root of factory hierarchy
Apr 14, 2016 12:06:12 AM org.springframework.web.context.support.StaticWebApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@1c8744bc: display name [Root WebApplicationContext]; startup date [Thu Apr 14 00:06:12 EDT 2016]; root of context hierarchy
Apr 14, 2016 12:06:12 AM org.springframework.web.context.support.StaticWebApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@1c8744bc]: org.springframework.beans.factory.support.DefaultListableBeanFactory@57de46d
Apr 14, 2016 12:06:12 AM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@57de46d: defining beans [filter,legacy]; root of factory hierarchy
Apr 14, 2016 12:06:17 AM hudson.WebAppMain$3 run
INFO: Jenkins is fully up and running
Apr 14, 2016 12:06:46 AM hudson.model.UpdateSite updateData
INFO: Obtained the latest update center data file for UpdateSource default
Apr 14, 2016 12:06:47 AM hudson.model.DownloadService$Downloadable load
INFO: Obtained the updated data file for hudson.tasks.Maven.MavenInstaller
Apr 14, 2016 12:06:47 AM hudson.model.DownloadService$Downloadable load
INFO: Obtained the updated data file for hudson.tasks.Ant.AntInstaller
Apr 14, 2016 12:06:49 AM hudson.model.DownloadService$Downloadable load
INFO: Obtained the updated data file for hudson.tools.JDKInstaller
Apr 14, 2016 12:06:49 AM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished Download metadata. 59,902 ms

Removed RFE tag as this is a product bug.

Comment 2 Brett Thurber 2016-04-14 04:30:16 UTC
OSE 3.1 Persistent templates tested:

postgresql-persistent
mysql-persistent
jenkins-persistent

Comment 3 Paul Morie 2016-04-18 16:02:03 UTC
Hi Brett-

The NFS advice you referenced is not applicable to Cinder.  For context, the NFS guidance is necessary because there is no way to securely chown/chmod an NFS mount from the client side.

What you can do in 3.1 is specify an fsGroup manually in the pod security context.  See: https://docs.openshift.com/enterprise/3.1/install_config/persistent_storage/pod_security_context.html#fsgroup.  You should be able to use basically any group you want (example: 1000) because the postgresql image sets the permissions of the data dir itself.  What matters is having the right permissions to _create_ the directory in the volume being used.
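
A minimal sketch of that pod-level securityContext (the group ID 1000 is just the arbitrary example mentioned above, and the image name is a placeholder):

  # excerpt of a pod (or deployment config pod template) spec
  spec:
    securityContext:
      fsGroup: 1000        # any group works; the image fixes up the data dir itself
    containers:
    - name: postgresql
      image: <postgresql image from the template>
      volumeMounts:
      - name: data
        mountPath: /var/lib/pgsql/data

When the Cinder volume is mounted, its group ownership is set to the fsGroup (with the setgid bit), so the container can create the data directory regardless of its effective UID.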

In 3.2, fsGroup is set to be populated automatically by the 'restricted' SCC -- so you shouldn't have this problem with block devices like Cinder or AWS EBS.

For the record: the postgresql image uses NSS wrapper to run as user 'postgres' no matter what the effective UID is.

I'm not certain how your comments re: SELinux are related to this issue.

Comment 4 Brett Thurber 2016-04-18 16:11:24 UTC
(In reply to Paul Morie from comment #3)
> Hi Brett-
> 
> The NFS advice you referenced is not applicable to Cinder.  For context, the
> NFS guidance is necessary because there is not a way to securely chown/chmod
> a NFS mount from the client side.
> 
> What you can do in 3.1 is specify an fsgroup manually in the pod security
> context.  See:
> https://docs.openshift.com/enterprise/3.1/install_config/persistent_storage/
> pod_security_context.html#fsgroup.  You should be able to use basically any
> group you want (example: 1000) because the postgresq image sets the
> permissions of the data dir itself.  The need is to have the right
> permissions to _create_ the directory in the volume being used.
> 
> In 3.2, fsgroup is set to be populated automatically in the 'restricted' SCC
> -- so you shouldn't have this problem with block devices like cinder or aws
> ebs.
> 
> For the record: the postgresql image uses NSS wrapper to run as user
> 'postgres' no matter what the effective UID is.
> 
> I'm not certain how your comments re: SELinux are related to this issue.

Hi Paul.  Thanks for the explanation.  I am not sure the initial thought regarding NFS was the correct path.  After additional troubleshooting (https://bugzilla.redhat.com/show_bug.cgi?id=1326059#c1) it appears SELinux was preventing the PVC from being used.  Setting it to permissive allowed the operation to succeed.  I haven't had time to dig into the SELinux side further; however, it should be fairly easy to reproduce.

Comment 5 Sami Wagiaalla 2016-04-19 13:28:54 UTC
Hi Brett,

Could you please post the pod description after it has been submitted to the server:

oc -o yaml get pod <name of postgres pod>

Let's see if the pod has an SELinux label assigned.

Then get the SCC for the pod
oc -o yaml get pod <name of postgres pod> | grep scc
oc -o yaml get scc <name of scc from the above line>

Comment 6 Brett Thurber 2016-04-20 00:20:29 UTC
(In reply to Sami Wagiaalla from comment #5)
> Hi Brett,
> 
> Could you please post the pod description after you post it to the server:
> 
> oc -o yaml get pod <name of postgres pod>
> 
> lets see if the pod has an SELinux label assigned.
> 
> Then get the SCC for the pod
> oc -o yaml get pod <name of postgres pod> | grep scc
> oc -o yaml get scc <name of scc from the above line>

oc -o yaml get pod <name of postgres pod>
http://pastebin.test.redhat.com/367097

oc -o yaml get pod <name of postgres pod> | grep scc
openshift.io/scc: restricted

oc -o yaml get scc <name of scc from the above line>
http://pastebin.test.redhat.com/367098

Comment 7 Sami Wagiaalla 2016-04-20 14:08:03 UTC
Thanks for the info.

It looks like the pod does not have SELinux options under its security context; that is probably because OpenShift has not automatically populated these fields.

Let's try this to enable automatic assignment:
  oc edit scc restricted
  # set fsGroup type and seLinuxContext type to MustRunAs instead of RunAsAny
  # save and close... your pods should redeploy now

Can you access the volume now?
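
For reference, the relevant part of the restricted SCC after that edit looks roughly like this (other fields omitted):

  # excerpt of oc edit scc restricted after the change
  fsGroup:
    type: MustRunAs
  seLinuxContext:
    type: MustRunAs

With those strategies set to MustRunAs, OpenShift fills in an fsGroup and SELinux level from the project's allocated ranges when the pod is admitted.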

Comment 8 Brett Thurber 2016-04-22 20:12:57 UTC
(In reply to Sami Wagiaalla from comment #7)
> Thanks for the info.
> 
> It looks like the pod does not have selinux options under its security
> context, that is probably because openshift has not automatically populated
> these feilds.
> 
> Lets try this to enable automatic assignment:
>   oc edit scc restricted
>   # set fsGroup type and selinuxContext type to MustRunAs instead of RunAsAny
>   # save an close... your pods should redeploy now
>   
> can you access the volume now ?

Yes this worked when deploying the postgresql-persistent template.  :)

SELinux is set to enforcing as well.

Comment 9 Brett Thurber 2016-04-22 20:17:27 UTC
(In reply to Brett Thurber from comment #8)
> (In reply to Sami Wagiaalla from comment #7)
> > Thanks for the info.
> > 
> > It looks like the pod does not have selinux options under its security
> > context, that is probably because openshift has not automatically populated
> > these feilds.
> > 
> > Lets try this to enable automatic assignment:
> >   oc edit scc restricted
> >   # set fsGroup type and selinuxContext type to MustRunAs instead of RunAsAny
> >   # save an close... your pods should redeploy now
> >   
> > can you access the volume now ?
> 
> Yes this worked when deploying the postgresql-persistent template.  :)
> 
> selinux is set to enforcing as well.

I spoke too soon:
http://pastebin.test.redhat.com/368295

Same results after making the requested changes.

Comment 10 Sami Wagiaalla 2016-04-25 14:33:07 UTC
Hmm... Okay, let's do the same diagnosis as before just to make sure that part is working.

oc -o yaml get pod <name of postgres pod>
oc -o yaml get pod <name of postgres pod> | grep scc
oc -o yaml get scc <name of scc from the above line>

In addition, try to get the permissions of the actual volume:
run mount | grep cinder, or look at the pod description to find the path to the volume, then:

ls -laZ <path to volume>
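
For example (the path below is only illustrative of where OpenShift 3.x nodes typically mount Cinder-backed volumes; use the path from the mount output or the pod description):

  mount | grep cinder
  ls -laZ /var/lib/origin/openshift.local.volumes/pods/<pod-uid>/volumes/kubernetes.io~cinder/<pv-name>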

Comment 13 Brett Thurber 2016-05-18 23:16:49 UTC
oc -o yaml get pod <name of postgres pod>
http://pastebin.test.redhat.com/375552

oc -o yaml get pod <name of postgres pod> | grep scc
openshift.io/scc: restricted

oc -o yaml get scc restricted
http://pastebin.test.redhat.com/375553

ls -laZ <path to volume>
http://pastebin.test.redhat.com/375555

Comment 14 Bradley Childs 2018-01-03 22:08:00 UTC
Closing due to age.

