Description of problem:
If created with mongodb-persistent-template.json or postgresql-persistent-template.json, the pods keep restarting with errors in the pod log.

a. Error for postgresql:

waiting for server to start....FATAL: data directory "/var/lib/pgsql/data/userdata" has wrong ownership
HINT: The server must be started by the user that owns the data directory.

b. Error for mongodb:

=> Waiting for container IP address ...
172.17.0.10:27017
=> Waiting for MongoDB service startup ...
note: noprealloc may hurt performance in many applications
Mon Sep 7 03:09:30.991 [initandlisten] MongoDB starting : pid=25 port=27017 dbpath=/var/lib/mongodb/data 64-bit host=mongodb-1-f8zou
Mon Sep 7 03:09:30.992 [initandlisten] db version v2.4.9
Mon Sep 7 03:09:30.992 [initandlisten] git version: nogitversion
Mon Sep 7 03:09:30.992 [initandlisten] build info: Linux i-0001298a 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 BOOST_LIB_VERSION=1_53
Mon Sep 7 03:09:30.992 [initandlisten] allocator: tcmalloc
Mon Sep 7 03:09:30.992 [initandlisten] options: { config: "/var/lib/mongodb/mongodb.conf", dbpath: "/var/lib/mongodb/data", nohttpinterface: "true", noprealloc: "true", oplogSize: 64, pidfilepath: "/var/lib/mongodb/mongodb.pid", port: 27017, quiet: "true", smallfiles: "true" }
Mon Sep 7 03:09:30.998 [initandlisten] journal dir=/var/lib/mongodb/data/journal
Mon Sep 7 03:09:30.998 [initandlisten] recover : no journal files present, no recovery needed
Mon Sep 7 03:09:30.999 [initandlisten] info preallocateIsFaster couldn't run due to: couldn't open file /var/lib/mongodb/data/journal/tempLatencyTest for writing errno:1 Operation not permitted; returning false
Mon Sep 7 03:09:31.000 [initandlisten] exception in initAndListen: 13516 couldn't open file /var/lib/mongodb/data/journal/j._0 for writing errno:1 Operation not permitted, terminating
Mon Sep 7 03:09:31.000 dbexit:
Mon Sep 7 03:09:31.000 [initandlisten] shutdown: going to close listening sockets...
Mon Sep 7 03:09:31.001 [initandlisten] shutdown: going to flush diaglog...
Mon Sep 7 03:09:31.001 [initandlisten] shutdown: going to close sockets...
Mon Sep 7 03:09:31.001 [initandlisten] shutdown: waiting for fs preallocator...
Mon Sep 7 03:09:31.001 [initandlisten] shutdown: lock for final commit...
Mon Sep 7 03:09:31.001 [initandlisten] shutdown: final commit...
Mon Sep 7 03:09:31.001 [initandlisten] shutdown: closing all files...
Mon Sep 7 03:09:31.001 [initandlisten] closeAllFiles() finished
Mon Sep 7 03:09:31.001 [initandlisten] journalCleanup...
Mon Sep 7 03:09:31.001 [initandlisten] removeJournalFiles
Mon Sep 7 03:09:31.004 [initandlisten] shutdown: removing fs lock...
Mon Sep 7 03:09:31.006 dbexit: really exiting now
=> Waiting for MongoDB service startup ...
=> Waiting for MongoDB service startup ...
=> Waiting for MongoDB service startup ...

Version-Release number of selected component (if applicable):
openshift v1.0.5-264-g11321c4
kubernetes v1.1.0-alpha.0-1605-g44c91b1

How reproducible:
Always

Steps to Reproduce:
1. Create an NFS server
2. Create a persistent volume
3. Create a project
4. Process the persistent template and create the objects with it

Actual results:
Pods keep restarting with errors in the pod log.

Expected results:
Both pods should become ready and be accessible.

Additional info:
a. The mysql persistent template works well.
b. Steps used to set up the NFS server:
sudo yum install -y nfs-utils
sudo mkdir /myshare
sudo systemctl start nfs-server
cat /etc/exports
/myshare *(rw,all_squash)
sudo chown -R nfsnobody:nfsnobody /myshare
sudo chmod 777 /myshare
sudo exportfs -r

c. PV creation file:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv01
spec:
  capacity:
    storage: 512Mi
  accessModes:
  - ReadWriteOnce
  nfs:
    path: /myshare
    server: <nfs_server_ip>
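For concreteness, the reproduction (steps 2-4 above) boils down to something like the following sketch; the file name pv01.yaml, the project name, and the template location are assumptions, not details from this report:

# Sketch of the reproduction; adjust names/paths to your environment.
oc create -f pv01.yaml                       # step 2: register the NFS-backed PV
oc new-project dbtest                        # step 3: create a project
oc process -f mongodb-persistent-template.json | oc create -f -   # step 4
oc get pods -w                               # watch the database pod keep restarting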
The problem here is file ownership/permissions and PVs. When you hand a PV to a container, you have to make sure the user will be able to read/write to the PV (e.g., chmod a+rw /nfs/share/entry/path). Further down the road, there will also be a problem if you have files in your PV that were created/written/owned by user 1006 and you later try to reuse this PV in a container running as user 1007.

I have raised this issue before in conversations with Ben, and it seems that what we can do here is:
- Make it clear that whoever is providing the PV needs to take care of permissions;
- (maybe) Make OpenShift automatically set ownership of files and directories in the PV to the UID running the container.

As for the current testing scenario, please make sure that the NFS share has file permissions that allow writing files to it.
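For illustration of "take care of permissions" on the NFS side, a minimal check/fix might look like this (the path /myshare comes from the reproduction steps; everything else is an example, not a prescription):

ls -lan /myshare          # numeric UID/GID owning the export and its files
sudo chmod a+rw /myshare  # make sure whatever UID the container gets can write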
The postgresql and mongodb pods still keep restarting; here are the pod logs: http://fpaste.org/264657/44168833/
Below are my steps for setting up the NFS server and the PV file content: http://fpaste.org/264654/68790614/
This is the pod status when creating with https://raw.githubusercontent.com/openshift/postgresql/master/examples/replica/postgresql_replica.json: http://fpaste.org/264659/14416888/
Wenjing, are you running the NFS setup script multiple times? This looks like someone changed the ownership of the directory before the pod starts again, so maybe the "chown -R nfsnobody:nfsnobody" is the culprit here? If that is the case, making the chmod 777 recursive as well might help solve the problem (though a better solution would be not to run the script multiple times).

If the above does not help, can you please run 'ls -la /myshare' before the first start, after the first start, before the second start, after the second start, and so on? I find it unlikely that the images would change ownership/permissions. Another possibility would be that OpenShift doesn't run the containers with the same UID, though the MySQL image would not work in that case.
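For example, a small loop like this could capture the requested before/after snapshots around each restart (a sketch; the interval and log path are arbitrary):

while true; do
    date >> /tmp/myshare-perms.log
    ls -la /myshare >> /tmp/myshare-perms.log
    sleep 10
done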
As discussed on IRC, the mysql container runs its process as a user in the root group, so it does not hit this issue.
Ben, does it make sense that MySQL could run with a different user/group than MongoDB and PostgreSQL in a way that relates to file permissions? OpenShift must not be making any differentiation based on the contents of an image prior to instantiating it, I suppose.
Just to clarify: all three of them run with group ID 0; MySQL is not the exception here. MySQL just doesn't care about ownership or permissions. I'm not 100% sure that running with group ID 0 enables the process to touch anything on the filesystem, and a quick Google search suggests that it does not (we should probably still look into it anyway). Maybe the writing is enabled because of some NFS option.
Running as GID 0 gives you no special access (it's not like UID 0), except that in our s2i images we chgrp 0 / chmod g+rw the files so that as a GID 0 member you can modify them.
The problem is with our documentation on NFS [1]; I have already opened a PR [2]. Running as GID 0 wasn't the cause, the all_squash option was: it causes the NFS server to map all UIDs and GIDs to the anonymous user (nfsnobody).

Wenjing: the solution for this problem should be to remove the all_squash option from /etc/exports (see the export line sketched below); also see the PR [2].

[1] https://ci.openshift.redhat.com/openshift-docs-master-testing/latest/admin_guide/persistent_storage_nfs.html
[2] https://github.com/openshift/openshift-docs/pull/950
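For illustration, assuming the /myshare export from the reproduction steps, the corrected /etc/exports entry would look roughly like this (a sketch, not copied from the PR):

# /etc/exports -- drop all_squash; the default root_squash is fine
/myshare *(rw)

# re-export after editing
sudo exportfs -r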
After changing the NFS share export from "*(rw,all_squash)" to "*(rw)", mongodb works well. http://fpaste.org/265519/44186340/
Removing "all_squash" works for NFS server which is setup in fedora; but still failed to NFS server setup in rhel.So assign this bug back. This is my operations: http://fpaste.org/265543/44187402/
This is a fundamental issue with the configuration/usage of volumes, so I am sending it to the storage team to decide how best to resolve it (documentation or a legitimate bug).
When trying with a RHEL 7 NFS server without "all_squash", the issue cannot be reproduced.
Some discussion here: https://github.com/openshift/openshift-docs/pull/950
Moved to the storage team since the images work as long as all_squash is not in the export definition, but our docs suggest all_squash is needed to ensure write permission. For postgres, all_squash breaks things because postgres requires the files to be owned by the postgres user, and all_squash makes them owned by nobody. So we need a more consistent story around how to make volumes writable.
(02:50:28 PM) claytonc: all_squash is intended where you don't trust your clients at all
(02:50:31 PM) claytonc: that's not this case
(02:50:35 PM) claytonc: (the openshift case)
(02:50:46 PM) claytonc: in fact all_squash defeats project isolation on NFS
(02:50:49 PM) claytonc: on shared volumes
(02:50:55 PM) bparees: claytonc: so should we not recommend all_squash in our NFS docs?
(02:51:25 PM) bparees: claytonc: right now it's part of our NFS setup docs.
(02:51:39 PM) claytonc: bparees: i don't see a scenario where we would want it for an NFS mount that could span namespaces
(02:51:42 PM) claytonc: it's a security risk in that case
(02:52:07 PM) claytonc: if i have an export that i expose to two projects where i don't want people to overlap, in openshift i would simply create a pv and give it to both namespaces
(02:52:22 PM) claytonc: they would be forced by UID allocation on the nodes to be in the uid range allocated to their projects (which are disjoint)
(02:53:23 PM) claytonc: when all_squash is set, both projects would be under the same user
(02:53:31 PM) claytonc: which means project A and B would be able to see each other's files

Based on this discussion, I think this bug should be resolved by updating the docs (https://docs.openshift.org/latest/admin_guide/persistent_storage/persistent_storage_nfs.html#selinux-and-nfs-export-settings) to not tell users to set all_squash.
Clayton and I both agree (see the PR): there is a consistent story. The storage must be configured such that the SCC user has read/write access to it. NFS should never have all_squash (root_squash should be fine/irrelevant). Just as with a normal application, if you change the UID your processes run as (i.e., change the SCC user), you will need to change your storage. I believe @pmorie has the beginnings of some work to make UID management dynamic (mainly via supplemental groups); a rough illustration follows below.

As a 'simple' postgres workaround, if people REALLY want all_squash, I believe you could make UID -1 be the 'postgres' user inside the image as well. Although I should point out that some systems see nobody as a 16-bit -1 and some as a 32-bit -1, so you would really have to define the postgres user three times... I think that's a bad idea, and just setting up the storage for the user that needs access is the only sane option.
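For context only, here is a rough sketch of what the supplemental-groups direction looks like in a pod spec: the pod declares a supplementary GID matching the group that owns the NFS export, so whatever UID the pod is assigned can still write. The GID 5555, image, and claim name are placeholders, not values from this bug:

# Hypothetical pod fragment; GID 5555 and all names are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: mongodb-example
spec:
  securityContext:
    supplementalGroups: [5555]    # must match the group owning the NFS export
  containers:
  - name: mongodb
    image: openshift/mongodb-24-centos7
    volumeMounts:
    - name: data
      mountPath: /var/lib/mongodb/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mongodb-claim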
Removing the squash and chown instructions from the docs:
https://github.com/openshift/openshift-docs/pull/1123
https://github.com/openshift/origin/pull/5506
https://bugzilla.redhat.com/show_bug.cgi?id=1276326
Even just *(rw) cannot resolve the restarts now (it did before). QE have reported the above bug about postgresql; mongodb also has this issue, the log will be in the attachment.
Wenjing, it seems like Eric was able to start postgres with 777 permissions here: https://bugzilla.redhat.com/show_bug.cgi?id=1276326#c20. Not to recommend 777, but that suggests that, given the correct permissions and with all_squash removed, things should work. What errors are you getting now?
Created attachment 1089383 [details] mongodb with persistent storage start log
I tested mongodb again using the persistent template. The PV creation steps are the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1276326#c24, and it cannot start successfully. Please check the logs in comment 24.
I tested mongodb and psql with these differences:
1. Made sure: setsebool -P virt_use_nfs 1
2. Used the OpenShift master as the NFS server
Both mongodb and psql can now start successfully (a recap of the working setup is sketched below). Regarding comment 25: the NFS server is in the Beijing IDC and the OpenShift cluster is in MTK, so there is a network problem there. Please update the status to ON_QA.
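For reference, a minimal recap of the configuration that worked in this retest (a sketch; /myshare is reused from the earlier steps, everything else is illustrative):

# On each OpenShift node: allow containers to use NFS volumes under SELinux
sudo setsebool -P virt_use_nfs 1

# On the NFS server (the OpenShift master in this retest): export without all_squash
cat /etc/exports
/myshare *(rw)
sudo exportfs -r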
Verified as comment 26 said.