Bug 1260571 - [origin_devexp_625][origin_devexp_671]Mongo or postgresql pod keeps restarting when create from mongo-persistent-template.json or postgresql-persistent-template.json
Summary: [origin_devexp_625][origin_devexp_671]Mongo or postgresql pod keeps restartin...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OKD
Classification: Red Hat
Component: Storage
Version: 3.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Sami Wagiaalla
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-09-07 09:37 UTC by Wenjing Zheng
Modified: 2016-06-07 22:46 UTC (History)
CC: 15 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-23 21:13:39 UTC
Target Upstream Version:
Embargoed:
xiuwang: needinfo-


Attachments
mongodb with persistent storage start log (8.63 KB, text/plain)
2015-11-04 06:27 UTC, Wang Haoran

Description Wenjing Zheng 2015-09-07 09:37:09 UTC
Description of problem:
When creating from mongodb-persistent-template.json or postgresql-persistent-template.json, the pods keep restarting with errors in the pod log.
a. This is the error for postgresql:
waiting for server to start....FATAL:  data directory "/var/lib/pgsql/data/userdata" has wrong ownership
HINT:  The server must be started by the user that owns the data directory.
b. This is the error for mongodb:
=> Waiting for container IP address ... 172.17.0.10:27017
=> Waiting for MongoDB service startup  ...
note: noprealloc may hurt performance in many applications
Mon Sep  7 03:09:30.991 [initandlisten] MongoDB starting : pid=25 port=27017 dbpath=/var/lib/mongodb/data 64-bit host=mongodb-1-f8zou
Mon Sep  7 03:09:30.992 [initandlisten] db version v2.4.9
Mon Sep  7 03:09:30.992 [initandlisten] git version: nogitversion
Mon Sep  7 03:09:30.992 [initandlisten] build info: Linux i-0001298a 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 BOOST_LIB_VERSION=1_53
Mon Sep  7 03:09:30.992 [initandlisten] allocator: tcmalloc
Mon Sep  7 03:09:30.992 [initandlisten] options: { config: "/var/lib/mongodb/mongodb.conf", dbpath: "/var/lib/mongodb/data", nohttpinterface: "true", noprealloc: "true", oplogSize: 64, pidfilepath: "/var/lib/mongodb/mongodb.pid", port: 27017, quiet: "true", smallfiles: "true" }
Mon Sep  7 03:09:30.998 [initandlisten] journal dir=/var/lib/mongodb/data/journal
Mon Sep  7 03:09:30.998 [initandlisten] recover : no journal files present, no recovery needed
Mon Sep  7 03:09:30.999 [initandlisten] info preallocateIsFaster couldn't run due to: couldn't open file /var/lib/mongodb/data/journal/tempLatencyTest for writing errno:1 Operation not permitted; returning false
Mon Sep  7 03:09:31.000 [initandlisten] exception in initAndListen: 13516 couldn't open file /var/lib/mongodb/data/journal/j._0 for writing errno:1 Operation not permitted, terminating
Mon Sep  7 03:09:31.000 dbexit: 
Mon Sep  7 03:09:31.000 [initandlisten] shutdown: going to close listening sockets...
Mon Sep  7 03:09:31.001 [initandlisten] shutdown: going to flush diaglog...
Mon Sep  7 03:09:31.001 [initandlisten] shutdown: going to close sockets...
Mon Sep  7 03:09:31.001 [initandlisten] shutdown: waiting for fs preallocator...
Mon Sep  7 03:09:31.001 [initandlisten] shutdown: lock for final commit...
Mon Sep  7 03:09:31.001 [initandlisten] shutdown: final commit...
Mon Sep  7 03:09:31.001 [initandlisten] shutdown: closing all files...
Mon Sep  7 03:09:31.001 [initandlisten] closeAllFiles() finished
Mon Sep  7 03:09:31.001 [initandlisten] journalCleanup...
Mon Sep  7 03:09:31.001 [initandlisten] removeJournalFiles
Mon Sep  7 03:09:31.004 [initandlisten] shutdown: removing fs lock...
Mon Sep  7 03:09:31.006 dbexit: really exiting now
=> Waiting for MongoDB service startup  ...
=> Waiting for MongoDB service startup  ...
=> Waiting for MongoDB service startup  ...


Version-Release number of selected component (if applicable):
openshift v1.0.5-264-g11321c4
kubernetes v1.1.0-alpha.0-1605-g44c91b1


How reproducible:
always

Steps to Reproduce:
1. Create an NFS server
2. Create a persistent volume
3. Create a project
4. Process the persistent template and create from it (see the command sketch below)
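For reference, a minimal command sketch of steps 2-4 (the project name and file names are illustrative; the PV definition is the one from item c under "Additional info"):
# step 2: register the persistent volume (pv01.yaml holds the definition from item c below)
oc create -f pv01.yaml
# step 3: create a project
oc new-project db-test
# step 4: process the template and create the resulting objects
oc process -f mongodb-persistent-template.json | oc create -f -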

Actual results:
Pods keep restarting with errors in pod log

Expected results:
Both pods should become ready and be accessible.

Additional info:
a. The mysql persistent template works well.
b. Steps to set up the NFS server (see the verification sketch after item c):
sudo yum install -y nfs-utils
sudo mkdir /myshare
sudo systemctl start nfs-server
cat /etc/exports
/myshare *(rw,all_squash)
sudo chown -R nfsnobody:nfsnobody /myshare
sudo chmod 777 /myshare
sudo exportfs -r
c. PV definition file:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv01
spec:
  capacity:
    storage: 512Mi
  accessModes:
    - ReadWriteOnce
  nfs:
    path: /myshare
    server: <nfs_server_ip>
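A quick sanity check of the setup above (a sketch; run showmount from any client with nfs-utils installed, and the oc commands after processing the template):
# verify the export is actually published by the NFS server
showmount -e <nfs_server_ip>
# the PV should move from Available to Bound once the template's claim exists
oc get pv
oc get pvc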

Comment 1 Rodolfo Carvalho 2015-09-07 12:06:45 UTC
The problem here is file ownership/permissions and PVs.

When you hand in a PV to a container you have to make sure the user will be able to read/write to the PV (e.g., chmod a+rw /nfs/share/entry/path).

Further down, there will be a problem if you have files in your PV that were created/written/owned by user 1006, and later you try to reuse this PV in a container with user 1007.

I have raised this issue before in conversations with Ben, and it seems that what we can do here is:
- Make it clear that whoever is providing the PV needs to take care of permissions;
- (maybe) Make OpenShift automatically set ownership of files and directories in the PV to be owned by the UID running the container.


As for the current testing scenario, please make sure that the NFS share has file permissions that allow writing files to it.
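One way to see the mismatch described above is to compare the numeric owner of the share with the UID the container actually runs as; a sketch, assuming a pod named postgresql-1-abcde (hypothetical name; the exec only works while the container is briefly up):
# on the NFS server: numeric owner of the exported directory and its contents
ls -lan /myshare
# inside the container: the UID/GID the database process runs as
oc exec postgresql-1-abcde -- id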

Comment 2 Wenjing Zheng 2015-09-08 04:59:34 UTC
Postgresql and mongodb pods still keep restarting; here are the pod logs: http://fpaste.org/264657/44168833/

Below are my steps for setting up the NFS server and the PV file content:
http://fpaste.org/264654/68790614/

Comment 4 Martin Nagy 2015-09-08 12:16:56 UTC
Wenjing, are you running the script for nfs setup multiple times? This looks like someone changed the ownership of the directory before the pod starts again, so maybe the "chown -R nfsnobody:nfsnobody" is the culprit here?

If that is the case, making the chmod 777 recursive as well could maybe help solve the problem (though a better solution would be not to call the script multiple times).

If the above does not help, can you please run 'ls -la /myshare' before the first start, after the first start, before the second start, after the second start, and so on?

I find it unlikely that the images would change ownership/permissions. Another possibility would be that OpenShift doesn't run the containers with the same UID, though the MySQL image would not work in that case.
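To capture the listing requested above across restarts without watching the pod manually, something like this on the NFS server should do (plain shell sketch):
# record ownership/permissions of the share every 10 seconds, with timestamps
while true; do date; ls -lan /myshare; sleep 10; done | tee /tmp/myshare-ownership.log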

Comment 5 Wenjing Zheng 2015-09-09 09:57:48 UTC
As discussed in IRC, the mysql container runs its process as a user in the root group, so it does not hit this issue.

Comment 6 Rodolfo Carvalho 2015-09-09 10:49:31 UTC
Ben, does it make sense that MySQL could run with a different user/group than MongoDB and PostgreSQL in a way that relates to file permissions?

I suppose OpenShift does not differentiate based on the contents of an image prior to instantiating it.

Comment 7 Martin Nagy 2015-09-09 11:23:39 UTC
Just to clarify: all three of them run with group ID 0; MySQL is not the exception here. MySQL just doesn't care about ownership or permissions. I'm also not 100% sure that running with group ID 0 lets the process touch anything on the filesystem; a quick Google search suggests it does not (we should probably still look into it anyway). Maybe writing is enabled because of some NFS option.

Comment 8 Ben Parees 2015-09-09 14:05:48 UTC
Running as GID 0 gives you no special access (it's not like UID 0), except that in our s2i images we chgrp 0/chmod g+rw files so that, as a GID 0 member, you can modify them.

Comment 9 Martin Nagy 2015-09-09 14:30:46 UTC
The problem is with our documentation on NFS [1]; I have already opened a PR [2]. Running as GID 0 wasn't the cause, it was the all_squash option. It causes the NFS server to map all UIDs and GIDs to the anonymous user (nfsnobody).

Wenjing: The solution for this problem should be to remove the all_squash option from /etc/exports, also see the PR [2].

[1] https://ci.openshift.redhat.com/openshift-docs-master-testing/latest/admin_guide/persistent_storage_nfs.html
[2] https://github.com/openshift/openshift-docs/pull/950
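For clarity, the change described in this comment amounts to the following on the NFS server (a sketch based on the export from the original report):
# /etc/exports before:
#   /myshare *(rw,all_squash)
# /etc/exports after (all_squash removed):
/myshare *(rw)
# re-export so the change takes effect
sudo exportfs -r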

Comment 10 DeShuai Ma 2015-09-10 05:36:58 UTC
After changing the NFS export from "*(rw,all_squash)" to "*(rw)", mongodb works well.
http://fpaste.org/265519/44186340/

Comment 11 Wenjing Zheng 2015-09-10 08:34:15 UTC
Removing "all_squash" works for NFS server which is setup in fedora; but still failed to NFS server setup in rhel.So assign this bug back.
This is my operations: http://fpaste.org/265543/44187402/

Comment 12 Ben Parees 2015-09-10 15:09:14 UTC
This is a fundamental issue with the configuration/usage of volumes, so I am sending it to the storage team to decide how best to resolve it (documentation or a legitimate bug).

Comment 13 Wenjing Zheng 2015-09-16 11:51:56 UTC
When trying with a RHEL 7 NFS server without "all_squash", the issue cannot be reproduced.

Comment 14 Martin Nagy 2015-09-16 11:55:14 UTC
Some discussion here: https://github.com/openshift/openshift-docs/pull/950

Comment 16 Ben Parees 2015-10-02 09:50:32 UTC
Moved to storage team since the images work as long as all_squash is not in the export definition, but our docs suggest all_squash is needed to ensure write permission.

For postgres, all_squash breaks things because postgres requires the files to be owned by the postgres user, and all_squash makes them owned by nobody.

So we need a more consistent story around how to make volumes writable.

Comment 17 Ben Parees 2015-10-28 18:55:22 UTC
(02:50:28 PM) claytonc: all_squash is intended where you don't trust your clients at all
(02:50:31 PM) claytonc: that's not this case
(02:50:35 PM) claytonc: (the openshift case)
(02:50:46 PM) claytonc: in fact all_squash defeats project isolation on NFS
(02:50:49 PM) claytonc: on shared volumes
(02:50:55 PM) bparees: claytonc: so should we not recommend all_squash in our NFS docs?
(02:51:25 PM) bparees: claytonc: right now it's part of our NFS setup docs.
(02:51:39 PM) claytonc: bparees: i don't see a scenario where we would want it for an NFS mount that could span namespaces
(02:51:42 PM) claytonc: it's a security risk in that case
(02:52:07 PM) claytonc: if i have an export that i expose to two projects where i don't want people to overlap, in openshift i would simply create a pv and give it to both namespaces
(02:52:22 PM) claytonc: they would be forced by UID allocation on the nodes to be in the uid range allocated to their projects (which are disjoint)
(02:53:23 PM) claytonc: when all_squash is set, both projects would be under the same user
(02:53:31 PM) claytonc: which means project A and B would be able to see the projects of the other

Based on this discussion, I think this bug should be resolved by updating the docs (https://docs.openshift.org/latest/admin_guide/persistent_storage/persistent_storage_nfs.html#selinux-and-nfs-export-settings) to not tell users to set all_squash.
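The disjoint per-project UID ranges Clayton refers to are visible on the namespace annotations; a sketch, assuming two projects named project-a and project-b (hypothetical names), run as cluster admin:
# each project carries its own openshift.io/sa.scc.uid-range annotation
oc get namespace project-a -o yaml | grep sa.scc.uid-range
oc get namespace project-b -o yaml | grep sa.scc.uid-range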

Comment 18 Eric Paris 2015-10-28 20:21:18 UTC
Clayton and I both agree (see the PR) that there is a consistent story. The storage must be configured such that the SCC user has read/write access to the storage. NFS should never have all_squash. (root_squash should be fine/irrelevant.)

Just as with a normal application, if you change the UID your processes run as (i.e., change the SCC user), you will need to change your storage.

I believe @pmorie has some beginnings of work to make some of the UID management dynamic (mainly via supplemental groups).

As a 'simple' postgres workaround, if people REALLY want all_squash, I believe you could make UID -1 be the 'postgres' user inside the image as well. Although I should point out that some systems see nobody as a 16-bit -1 and some as a 32-bit -1, so you would really have to define the postgres user 3 times... I think that's a bad idea, and just setting up the storage for the user that needs access is the only sane option.
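In other words, a sketch of the "set up the storage for the user that needs access" approach, assuming the project's pods run as UID 1000050000 (an illustrative value; check the project's actual UID range):
# /etc/exports: keep root_squash (the default), do not add all_squash
/myshare *(rw,root_squash)
# make the export writable by the UID the pods actually run as
sudo chown -R 1000050000:0 /myshare
sudo exportfs -r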

Comment 21 Mark Turansky 2015-10-29 21:52:20 UTC
Removing squash and chown instructions from docs:

https://github.com/openshift/openshift-docs/pull/1123
https://github.com/openshift/origin/pull/5506

Comment 22 Wenjing Zheng 2015-11-02 09:01:02 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1276326

Even with just *(rw) the restarts are not resolved now (they were before). QE has reported the bug above for postgresql; mongodb also has this issue, and the log will be in the attachment.

Comment 23 Sami Wagiaalla 2015-11-03 21:41:50 UTC
Wenjing,

It seems like Eric was able to start postgres with 777 permissions here: https://bugzilla.redhat.com/show_bug.cgi?id=1276326#c20. Not to recommend 777, but that suggests that, given the correct permissions and with all_squash removed, things should work.

What are the errors you are getting now?

Comment 24 Wang Haoran 2015-11-04 06:27:53 UTC
Created attachment 1089383 [details]
mongodb with persistent storage start log

Comment 25 Wang Haoran 2015-11-04 06:30:23 UTC
I tested mongodb again using the persistent template; the PV creation steps are the same as in https://bugzilla.redhat.com/show_bug.cgi?id=1276326#c24, and it cannot start successfully.
Please check the logs in comment 24.

Comment 26 Wang Haoran 2015-11-04 09:08:29 UTC
I tested mongodb and psql again with the following differences:
1. Make sure: setsebool -P virt_use_nfs 1 (a quick check sketch follows below)
2. Use the OpenShift master as the NFS server

Both mongodb and psql can start successfully.

Regarding comment 25: the NFS server is in the Beijing IDC and the OpenShift cluster is in MTK, so there is a network problem there. Please update the status to ON_QA.
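A quick way to confirm the SELinux boolean from step 1 is in effect on the node (sketch):
# should report: virt_use_nfs --> on
getsebool virt_use_nfs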

Comment 27 Wang Haoran 2015-11-05 02:19:47 UTC
Verified as described in comment 26.

