Bug 1402257 - postgres userdata dir issues in openshift
Summary: postgres userdata dir issues in openshift
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: ImageStreams
Version: 3.3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Ben Parees
QA Contact: Wang Haoran
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-07 06:58 UTC by Jaspreet Kaur
Modified: 2020-02-14 18:15 UTC
CC List: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-06 14:59:58 UTC
Target Upstream Version:



Description Jaspreet Kaur 2016-12-07 06:58:31 UTC
Description of problem:

Using the default postgresql image with persistent storage always reports the error below:


waiting for server to start....FATAL:  data directory "/var/lib/pgsql/data/userdata" has wrong ownership
HINT:  The server must be started by the user that owns the data directory.
pg_ctl: could not start server
Examine the log output.
 stopped waiting



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a simple application or pod using postgresql.
2. Add persistent storage to it (a sketch of the object creation commands follows these steps).
3. NFS server configuration:

drwxrwxrwx.  3 nobody    nobody     100 Dec  6 17:53 vol1

cat /etc/exports
/var/export/vol1 *(rw,sync,all_squash)


 cat claim.json 
{
  "apiVersion": "v1",
  "kind": "PersistentVolumeClaim",
  "metadata": {
    "name": "postgresql"
  },
  "spec": {
    "accessModes": [ "ReadWriteMany" ],
    "resources": {
      "requests": {
        "storage": "1Gi"
      }
    }
  }
}

cat volume.json

{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": {
    "name": "php-volume"
  },
  "spec": {
    "capacity": {
        "storage": "1Gi"
        },
    "accessModes": [ "ReadWriteMany" ],
    "nfs": {
        "path": "/var/export/vol1",
        "server": "nfs.example.com"
    }
  }
}

 oc volume dc/postgresql --add --overwrite -t persistentVolumeClaim --claim-name=postgresql --name=postgresql-data
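
For reference, a minimal sketch of how the objects above would be created and the failure observed (hedged; the pod name placeholder is left for the reader to fill in):

oc create -f volume.json      # register the NFS-backed PV
oc create -f claim.json       # create the PVC the dc will mount
oc get pods                   # after the redeploy, find the postgresql pod
oc logs <postgresql-pod>      # shows the FATAL ownership error under Actual results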



Actual results: The postgresql pod fails to start:
waiting for server to start....FATAL:  data directory "/var/lib/pgsql/data/userdata" has wrong ownership
HINT:  The server must be started by the user that owns the data directory.
pg_ctl: could not start server
Examine the log output.
 stopped waiting



Expected results: It should have worked without any manual intervention when using the default Red Hat images.


Additional info: When using root_squash, the error below is seen:

chmod: changing permissions of '/var/lib/pgsql/data/userdata': Operation not permitted

Comment 1 Ben Parees 2016-12-07 15:24:01 UTC
You shouldn't be using all_squash with NFS volumes; you need to use root_squash.

all_squash is causing the ownership to get changed to nobody instead of the postgres user, which is why postgres complains.
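
A hedged example of the corrected export for the volume used here (root_squash is the NFS default once all_squash is dropped):

# /etc/exports on the NFS server
/var/export/vol1 *(rw,sync,root_squash)

exportfs -r    # re-export after editing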

Comment 2 Jaspreet Kaur 2016-12-08 05:08:26 UTC
Yes, we tried using root_squash, but with that we get the error below:

chmod: changing permissions of '/var/lib/pgsql/data/userdata': Operation not permitted

Comment 3 Wang Haoran 2016-12-08 07:34:25 UTC
(In reply to Jaspreet Kaur from comment #2)
> Yes, we tried using root_squash, but with that we get the error below:
> 
> chmod: changing permissions of '/var/lib/pgsql/data/userdata': Operation not
> permitted

Could you please try this:
mkdir /haowangpv
echo '/haowangpv *(rw)' >> /etc/exports
chown -R nfsnobody:nfsnobody /haowangpv
chmod 777 /haowangpv
exportfs -r


I wonder whether your NFS directory has the correct owner/permission setup.
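
One quick way to check that (an illustrative sketch, not something already run above) is to mount the export on the node and look at the numeric ownership the pod will actually see:

mount -t nfs nfs.example.com:/var/export/vol1 /mnt
ls -ldn /mnt                                  # numeric uid/gid and mode of the volume root
touch /mnt/permtest && ls -ln /mnt/permtest   # root's write shows the squashed uid (e.g. 65534)
umount /mnt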

Comment 4 Jaspreet Kaur 2016-12-08 10:33:37 UTC
Hello,

The issue is still reproducible after checking the permissions as above.


I think the user in the image is what is denying access; something to do with the image user.

cat /etc/exports
/var/export/regvol *(rw,sync,all_squash)
/var/export/vol1 *(rw)
/var/export/vol2 *(rw,sync,all_squash)
/var/export/vol5 *(rw,sync,all_squash)

Tried root_squash and all_squash; the issue still exists.

ls -la /var/export/
total 4
drwxr-xr-x.  6 root      root        52 Aug  9 20:49 .
drwxr-xr-x. 20 root      root      4096 Aug  3 20:10 ..
drwxrwxrwx.  2 nfsnobody nfsnobody    6 Aug  3 20:10 regvol
drwxrwxrwx.  3 nobody    nobody     100 Dec  6 17:53 vol1
drwxrwxrwx.  2 nobody    nobody      22 Aug  9 03:28 vol2
drwxrwxrwx.  2 nobody    nobody      37 Aug  9 21:04 vol5

The issue is readily reproducible with the postgres image.

Comment 5 Ben Parees 2016-12-08 13:41:59 UTC
You still have all_squash on 3 of those volumes, and I don't know which volume is being assigned to your postgres pod, so I can't say this proves the postgres volume has proper permissions. I can confirm this works correctly for other users, so, as Wang said, it's a permission issue with your NFS host volumes and/or export settings.

The postgres image creates the userdata directory within the mounted volume when it starts, so it will always be owned by the correct (container) user unless NFS is squashing ownership or otherwise preventing proper ownership settings.

I suggest you consider starting with a simple hostPath PV (with 777 permissions) to convince yourself the image works properly, and then revisit your NFS PV configuration.
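
A hedged sketch of such a test, in the same style as the JSON above (the name pg-hostpath-test and node path /tmp/pg-hostpath-test are illustrative, and the directory must exist on the node that will run the pod):

mkdir -p /tmp/pg-hostpath-test && chmod 777 /tmp/pg-hostpath-test   # on the node
cat > hostpath-pv.json <<'EOF'
{
  "apiVersion": "v1",
  "kind": "PersistentVolume",
  "metadata": { "name": "pg-hostpath-test" },
  "spec": {
    "capacity": { "storage": "1Gi" },
    "accessModes": [ "ReadWriteMany" ],
    "hostPath": { "path": "/tmp/pg-hostpath-test" }
  }
}
EOF
oc create -f hostpath-pv.json
# delete and recreate the postgresql PVC (or use a fresh one) so it binds to this
# PV, redeploy, and see whether the pod starts cleanly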

Alternatively, you can start a different image with one of these PVs mounted, rsh into it, do a mkdir yourself, and see what permissions the created directory ends up with. If it's not owned by your container user, then your NFS configuration is incorrect.
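
A sketch of that check (image, dc name, and mount path are illustrative, not from this report):

oc run nfs-test --image=registry.access.redhat.com/rhel7 --command -- sleep 3600
oc volume dc/nfs-test --add -t persistentVolumeClaim --claim-name=postgresql --name=test-data --mount-path=/mnt/test
oc rsh <nfs-test-pod>
# inside the pod:
id -u                          # uid the container actually runs as
mkdir /mnt/test/permcheck
ls -ldn /mnt/test/permcheck    # owner uid should match id -u; if not, NFS is squashing it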

I'm assigning this to the storage team in case they can help identify the NFS misconfiguration.

Comment 6 Matthew Wong 2016-12-08 21:48:15 UTC
This looks like the issue described here https://docs.openshift.org/latest/install_config/persistent_storage/persistent_storage_nfs.html#nfs-additional-config-and-troubleshooting, which links to this potential solution https://access.redhat.com/solutions/33455. Both the server and client (the node where the kubelet & pod are running) should have the proper domain in /etc/idmapd.conf.

Is there a message like "nss_getpwnam: name 'root' does not map into domain 'localdomain'" in /var/log/messages?
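
A quick way to check, assuming NFSv4 mounts (a hedged sketch; run on both the NFS server and the node hosting the pod):

grep nss_getpwnam /var/log/messages    # "does not map into domain ..." indicates the mismatch
grep -i domain /etc/idmapd.conf        # the Domain = ... value must match on server and node
nfsidmap -c                            # clear cached id mappings after changing it
systemctl restart nfs-idmapd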

