Description of problem:
When deploying the postgresql-persistent template using an OSP Cinder backed PV, the pod fails to deploy. The issue comes down to the PVC used within the pod not having the correct permissions for the postgres user to create the database at /var/lib/pgsql/data, where the PVC is mounted.

Version-Release number of selected component (if applicable):
OSE 3.1

How reproducible:
Every time

Steps to Reproduce:
1. Create a persistent volume in OSE backed by an OSP Cinder volume
2. Deploy a new pod based on the postgresql-persistent template
3. A new PVC is created from the PV and the pod is started, but it fails to complete deployment

Actual results:
The pod fails to deploy due to incorrect permissions on the attached PVC.

Expected results:
The pod deploys successfully with a Cinder-backed PV/PVC.

Additional info:
There is a workaround for this issue when using standard NFS, documented here: https://blog.openshift.com/deploy-gitlab-openshift/
This will not work for OSP Cinder volumes, however, as the export is owned by UID/GID 107:107, and if this is manually changed it will break OSP. The correct solution here is for Kubernetes to assign the proper postgres permissions to the volume mounted in the pod/container; 26:26 is the correct UID/GID needed for the PV/PVC.
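For reference, step 1 above corresponds to a PV definition along these lines. This is a minimal sketch only: the PV name, capacity, and Cinder volume ID are placeholders, not values from this report.

```yaml
# Minimal sketch of a Cinder-backed PV (step 1 above).
# The name, capacity, and volumeID below are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: cinder-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  cinder:
    fsType: ext4
    volumeID: "<cinder-volume-uuid>"
```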
Through additional testing across multiple database and application persistent templates, I was able to resolve this issue by temporarily disabling SELinux.

[root@openshift-all-in-one ~]# oc logs -f jenkins-1-xihuo
Copying Jenkins configuration to /var/lib/jenkins ...
cp: cannot create regular file '/var/lib/jenkins/config.xml.tpl': Permission denied
cp: cannot create directory '/var/lib/jenkins/jobs': Permission denied
cp: cannot create directory '/var/lib/jenkins/users': Permission denied
mkdir: cannot create directory '/var/lib/jenkins/plugins': Permission denied
Copying 1 Jenkins plugins to /var/lib/jenkins ...
cp: cannot create regular file '/var/lib/jenkins/plugins/': Not a directory
Creating initial Jenkins 'admin' user ...
sed: can't read /var/lib/jenkins/users/admin/config.xml: No such file or directory
/usr/libexec/s2i/run: line 36: /var/lib/jenkins/password: Permission denied
touch: cannot touch '/var/lib/jenkins/configured': Permission denied
Running from: /usr/lib/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Apr 13, 2016 11:59:36 PM winstone.Logger logInternal
INFO: Beginning extraction from war file
Apr 13, 2016 11:59:36 PM winstone.Logger logInternal
INFO: Winstone shutdown successfully
Apr 13, 2016 11:59:36 PM winstone.Logger logInternal
SEVERE: Container startup failed
java.io.FileNotFoundException: /var/lib/jenkins/war/META-INF/MANIFEST.MF (No such file or directory)
        at java.io.FileOutputStream.open0(Native Method)
        at java.io.FileOutputStream.open(FileOutputStream.java:270)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
        at winstone.HostConfiguration.getWebRoot(HostConfiguration.java:280)
        at winstone.HostConfiguration.<init>(HostConfiguration.java:83)
        at winstone.HostGroup.initHost(HostGroup.java:66)
        at winstone.HostGroup.<init>(HostGroup.java:45)
        at winstone.Launcher.<init>(Launcher.java:143)
        at winstone.Launcher.main(Launcher.java:354)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at Main._main(Main.java:293)
        at Main.main(Main.java:98)

[root@openshift-all-in-one ~]# setenforce 0
[root@openshift-all-in-one ~]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

[root@openshift-all-in-one ~]# oc logs -f jenkins-1-xihuo
Copying Jenkins configuration to /var/lib/jenkins ...
Copying 1 Jenkins plugins to /var/lib/jenkins ...
Creating initial Jenkins 'admin' user ...
Detected password change, updating Jenkins configuration ...
Processing Jenkins configuration (/var/lib/jenkins/config.xml.tpl) ...
Running from: /usr/lib/jenkins/jenkins.war
webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")
Apr 14, 2016 12:02:32 AM winstone.Logger logInternal
INFO: Beginning extraction from war file
Apr 14, 2016 12:02:38 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: jetty-winstone-2.8
Apr 14, 2016 12:02:47 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: NO JSP Support for , did not find org.apache.jasper.servlet.JspServlet
Jenkins home directory: /var/lib/jenkins found at: EnvVars.masterEnvVars.get("JENKINS_HOME")
Apr 14, 2016 12:02:55 AM org.eclipse.jetty.util.log.JavaUtilLog info
INFO: Started SelectChannelConnector.0.0:8080
Apr 14, 2016 12:02:55 AM winstone.Logger logInternal
INFO: Winstone Servlet Engine v2.0 running: controlPort=disabled
Apr 14, 2016 12:02:56 AM jenkins.InitReactorRunner$1 onAttained
INFO: Started initialization
Apr 14, 2016 12:04:54 AM jenkins.InitReactorRunner$1 onAttained
INFO: Listed all plugins
Apr 14, 2016 12:04:55 AM jenkins.InitReactorRunner$1 onAttained
INFO: Prepared all plugins
Apr 14, 2016 12:04:55 AM jenkins.InitReactorRunner$1 onAttained
INFO: Started all plugins
Apr 14, 2016 12:04:56 AM jenkins.InitReactorRunner$1 onAttained
INFO: Augmented all extensions
Apr 14, 2016 12:05:48 AM jenkins.InitReactorRunner$1 onAttained
INFO: Loaded all jobs
Apr 14, 2016 12:05:49 AM hudson.model.AsyncPeriodicWork$1 run
INFO: Started Download metadata
Apr 14, 2016 12:05:54 AM org.jenkinsci.main.modules.sshd.SSHD start
INFO: Started SSHD at port 59749
Apr 14, 2016 12:05:54 AM jenkins.InitReactorRunner$1 onAttained
INFO: Completed initialization
Apr 14, 2016 12:06:01 AM org.springframework.web.context.support.StaticWebApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@7913c5b2: display name [Root WebApplicationContext]; startup date [Thu Apr 14 00:06:01 EDT 2016]; root of context hierarchy
Apr 14, 2016 12:06:01 AM org.springframework.web.context.support.StaticWebApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@7913c5b2]: org.springframework.beans.factory.support.DefaultListableBeanFactory@3a8b29f4
Apr 14, 2016 12:06:01 AM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@3a8b29f4: defining beans [authenticationManager]; root of factory hierarchy
Apr 14, 2016 12:06:12 AM org.springframework.web.context.support.StaticWebApplicationContext prepareRefresh
INFO: Refreshing org.springframework.web.context.support.StaticWebApplicationContext@1c8744bc: display name [Root WebApplicationContext]; startup date [Thu Apr 14 00:06:12 EDT 2016]; root of context hierarchy
Apr 14, 2016 12:06:12 AM org.springframework.web.context.support.StaticWebApplicationContext obtainFreshBeanFactory
INFO: Bean factory for application context [org.springframework.web.context.support.StaticWebApplicationContext@1c8744bc]: org.springframework.beans.factory.support.DefaultListableBeanFactory@57de46d
Apr 14, 2016 12:06:12 AM org.springframework.beans.factory.support.DefaultListableBeanFactory preInstantiateSingletons
INFO: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@57de46d: defining beans [filter,legacy]; root of factory hierarchy
Apr 14, 2016 12:06:17 AM hudson.WebAppMain$3 run
INFO: Jenkins is fully up and running
Apr 14, 2016 12:06:46 AM hudson.model.UpdateSite updateData
INFO: Obtained the latest update center data file for UpdateSource default
Apr 14, 2016 12:06:47 AM hudson.model.DownloadService$Downloadable load
INFO: Obtained the updated data file for hudson.tasks.Maven.MavenInstaller
Apr 14, 2016 12:06:47 AM hudson.model.DownloadService$Downloadable load
INFO: Obtained the updated data file for hudson.tasks.Ant.AntInstaller
Apr 14, 2016 12:06:49 AM hudson.model.DownloadService$Downloadable load
INFO: Obtained the updated data file for hudson.tools.JDKInstaller
Apr 14, 2016 12:06:49 AM hudson.model.AsyncPeriodicWork$1 run
INFO: Finished Download metadata. 59,902 ms

Removed RFE tag as this is a product bug.
OSE 3.1 persistent templates tested:
- postgresql-persistent
- mysql-persistent
- jenkins-persistent
Hi Brett-

The NFS advice you referenced is not applicable to Cinder. For context, the NFS guidance is necessary because there is no way to securely chown/chmod an NFS mount from the client side.

What you can do in 3.1 is specify an fsGroup manually in the pod security context. See: https://docs.openshift.com/enterprise/3.1/install_config/persistent_storage/pod_security_context.html#fsgroup. You should be able to use basically any group you want (example: 1000) because the postgresql image sets the permissions of the data directory itself. What is needed is the right permissions to _create_ the directory in the volume being used.

In 3.2, fsGroup is set to be populated automatically in the 'restricted' SCC, so you shouldn't have this problem with block devices like Cinder or AWS EBS.

For the record: the postgresql image uses NSS wrapper to run as user 'postgres' no matter what the effective UID is.

I'm not certain how your comments re: SELinux are related to this issue.
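A minimal sketch of what setting fsGroup in the pod security context looks like. The pod, container, image, and claim names are placeholders, and 1000 is the example group value mentioned above:

```yaml
# Sketch: pod-level fsGroup so the mounted volume is group-writable
# by the container. All names below are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: postgresql
spec:
  securityContext:
    fsGroup: 1000        # example group from the comment above
  containers:
    - name: postgresql
      image: <postgresql-image>    # placeholder image reference
      volumeMounts:
        - name: postgresql-data
          mountPath: /var/lib/pgsql/data
  volumes:
    - name: postgresql-data
      persistentVolumeClaim:
        claimName: postgresql      # placeholder claim name
```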
(In reply to Paul Morie from comment #3)
> Hi Brett-
>
> The NFS advice you referenced is not applicable to Cinder. For context, the
> NFS guidance is necessary because there is not a way to securely chown/chmod
> a NFS mount from the client side.
>
> What you can do in 3.1 is specify an fsgroup manually in the pod security
> context. See:
> https://docs.openshift.com/enterprise/3.1/install_config/persistent_storage/
> pod_security_context.html#fsgroup. You should be able to use basically any
> group you want (example: 1000) because the postgresq image sets the
> permissions of the data dir itself. The need is to have the right
> permissions to _create_ the directory in the volume being used.
>
> In 3.2, fsgroup is set to be populated automatically in the 'restricted' SCC
> -- so you shouldn't have this problem with block devices like cinder or aws
> ebs.
>
> For the record: the postgresql image uses NSS wrapper to run as user
> 'postgres' no matter what the effective UID is.
>
> I'm not certain how your comments re: SELinux are related to this issue.

Hi Paul. Thanks for the explanation. I am not sure the initial thought regarding NFS was the correct path. After additional troubleshooting (https://bugzilla.redhat.com/show_bug.cgi?id=1326059#c1) it appears SELinux was preventing the PVC from being used. Setting SELinux to permissive allowed the operation to succeed. I haven't had time to dig into the SELinux side further; however, it should be fairly easy to reproduce.
Hi Brett,

Could you please post the pod description after you post it to the server:

oc -o yaml get pod <name of postgres pod>

Let's see if the pod has an SELinux label assigned.

Then get the SCC for the pod:

oc -o yaml get pod <name of postgres pod> | grep scc
oc -o yaml get scc <name of scc from the above line>
(In reply to Sami Wagiaalla from comment #5)
> Hi Brett,
>
> Could you please post the pod description after you post it to the server:
>
> oc -o yaml get pod <name of postgres pod>
>
> lets see if the pod has an SELinux label assigned.
>
> Then get the SCC for the pod
> oc -o yaml get pod <name of postgres pod> | grep scc
> oc -o yaml get scc <name of scc from the above line>

oc -o yaml get pod <name of postgres pod>
http://pastebin.test.redhat.com/367097

oc -o yaml get pod <name of postgres pod> | grep scc
openshift.io/scc: restricted

oc -o yaml get scc <name of scc from the above line>
http://pastebin.test.redhat.com/367098
Thanks for the info.

It looks like the pod does not have SELinux options under its security context; that is probably because OpenShift has not automatically populated these fields.

Let's try this to enable automatic assignment:

oc edit scc restricted
# set fsGroup type and seLinuxContext type to MustRunAs instead of RunAsAny
# save and close... your pods should redeploy now

Can you access the volume now?
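After the suggested edit, the relevant portion of the 'restricted' SCC would look roughly like this (all other SCC fields are omitted from this sketch):

```yaml
# Excerpt of the 'restricted' SCC after changing both strategies
# from RunAsAny to MustRunAs; every other field is left as-is.
fsGroup:
  type: MustRunAs
seLinuxContext:
  type: MustRunAs
```

With MustRunAs, OpenShift allocates the group and SELinux labels for the pod automatically instead of leaving them unset.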
(In reply to Sami Wagiaalla from comment #7)
> Thanks for the info.
>
> It looks like the pod does not have selinux options under its security
> context, that is probably because openshift has not automatically populated
> these feilds.
>
> Lets try this to enable automatic assignment:
> oc edit scc restricted
> # set fsGroup type and selinuxContext type to MustRunAs instead of RunAsAny
> # save an close... your pods should redeploy now
>
> can you access the volume now ?

Yes, this worked when deploying the postgresql-persistent template. :)

SELinux is set to enforcing as well.
(In reply to Brett Thurber from comment #8)
> (In reply to Sami Wagiaalla from comment #7)
> > Thanks for the info.
> >
> > It looks like the pod does not have selinux options under its security
> > context, that is probably because openshift has not automatically populated
> > these feilds.
> >
> > Lets try this to enable automatic assignment:
> > oc edit scc restricted
> > # set fsGroup type and selinuxContext type to MustRunAs instead of RunAsAny
> > # save an close... your pods should redeploy now
> >
> > can you access the volume now ?
>
> Yes this worked when deploying the postgresql-persistent template. :)
>
> selinux is set to enforcing as well.

I spoke too soon: http://pastebin.test.redhat.com/368295

Same results after making the requested changes.
Hmm... Okay, let's do the same diagnosis as before just to make sure that part is working:

oc -o yaml get pod <name of postgres pod>
oc -o yaml get pod <name of postgres pod> | grep scc
oc -o yaml get scc <name of scc from the above line>

In addition, try to get the permissions of the actual volume. Try doing:

mount | grep cinder

or look at the pod description to find the path to the volume. Then:

ls -laZ <path to volume>
oc -o yaml get pod <name of postgres pod>
http://pastebin.test.redhat.com/375552

oc -o yaml get pod <name of postgres pod> | grep scc
openshift.io/scc: restricted

oc -o yaml get scc restricted
http://pastebin.test.redhat.com/375553

ls -laZ <path to volume>
http://pastebin.test.redhat.com/375555
Closing due to age.