Bug 1586035 - [CephFS] Dynamic provision fails
Summary: [CephFS] Dynamic provision fails
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.10.z
Assignee: Christian Huffman
QA Contact: Jianwei Hou
Docs Contact: Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks: 1568345
 
Reported: 2018-06-05 11:08 UTC by Jianwei Hou
Modified: 2019-11-20 18:52 UTC
CC List: 9 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-11-20 18:52:06 UTC
Target Upstream Version:
Embargoed:



Description Jianwei Hou 2018-06-05 11:08:00 UTC
Description of problem:
CephFS dynamic provision fails. The provisioner is deployed with openshift3/cephfs-provisioner:v0.0.2-2. It used to work well in our last test.

# oc describe pvc cephfsc       
Name:          cephfsc
Namespace:     jhou
StorageClass:  cephfs
Status:        Pending
Volume:        
Labels:        <none>
Annotations:   control-plane.alpha.kubernetes.io/leader={"holderIdentity":"58aeb5a3-68a5-11e8-b4a0-0a580a800072","leaseDurationSeconds":15,"acquireTime":"2018-06-05T10:12:59Z","renewTime":"2018-06-05T10:13:01Z","lea...
               volume.beta.kubernetes.io/storage-class=cephfs
               volume.beta.kubernetes.io/storage-provisioner=ceph.com/cephfs
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      
Access Modes:  
Events:
  Type     Reason                Age              From                                                                                Message
  ----     ------                ----             ----                                                                                -------
  Normal   Provisioning          7s               ceph.com/cephfs cephfs-provisioner-dc-1-5kg95 58aeb5a3-68a5-11e8-b4a0-0a580a800072  External provisioner is provisioning volume for claim "jhou/cephfsc"
  Warning  ProvisioningFailed    7s               ceph.com/cephfs cephfs-provisioner-dc-1-5kg95 58aeb5a3-68a5-11e8-b4a0-0a580a800072  Failed to provision volume with StorageClass "cephfs": exit status 1
  Normal   ExternalProvisioning  2s (x5 over 7s)  persistentvolume-controller                                                         waiting for a volume to be created, either by external provisioner "ceph.com/cephfs" or manually created by system administrator


# oc logs cephfs-provisioner-dc-1-5kg95
E0605 10:13:27.343570       1 cephfs-provisioner.go:138] failed to provision share "kubernetes-dynamic-pvc-16a0a907-68a9-11e8-b4a0-0a580a800072" for "kubernetes-dynamic-user-16a0a977-68a9-11e8-b4a0-0a580a800072", err: exit status 1, output: Traceback (most recent call last):
  File "/lib64/python2.7/site.py", line 556, in <module>
    main()
  File "/lib64/python2.7/site.py", line 538, in main
    known_paths = addusersitepackages(known_paths)
  File "/lib64/python2.7/site.py", line 266, in addusersitepackages
    user_site = getusersitepackages()
  File "/lib64/python2.7/site.py", line 241, in getusersitepackages
    user_base = getuserbase() # this will also set USER_BASE
  File "/lib64/python2.7/site.py", line 231, in getuserbase
    USER_BASE = get_config_var('userbase')
  File "/lib64/python2.7/sysconfig.py", line 516, in get_config_var
    return get_config_vars().get(name)
  File "/lib64/python2.7/sysconfig.py", line 473, in get_config_vars
    _CONFIG_VARS['userbase'] = _getuserbase()
  File "/lib64/python2.7/sysconfig.py", line 187, in _getuserbase
    return env_base if env_base else joinuser("~", ".local")
  File "/lib64/python2.7/sysconfig.py", line 173, in joinuser
    return os.path.expanduser(os.path.join(*args))
  File "/lib64/python2.7/posixpath.py", line 269, in expanduser
    userhome = pwd.getpwuid(os.getuid()).pw_dir
KeyError: 'getpwuid(): uid not found: 1000000000'


Version-Release number of selected component (if applicable):
openshift v3.10.0-0.58.0
openshift3/cephfs-provisioner:v0.0.2-2

How reproducible:
Reproduced on two separate environments.

Steps to Reproduce:
1. Deploy the cephfs provisioner and create the StorageClass and PVC (dumps below).
2. The PV is never provisioned; the PVC events and the provisioner log show the errors in the description above.

Actual results:
The PV cannot be provisioned. The cephfs provisioner pod log shows it fails at:
```
  File "/lib64/python2.7/posixpath.py", line 269, in expanduser                                                                                                                                
    userhome = pwd.getpwuid(os.getuid()).pw_dir                                                                                                                                                
KeyError: 'getpwuid(): uid not found: 1000000000'
```

Inside the provisioner container:
>>> import os
>>> import pwd
>>> os.getuid()
1000000000
>>> pwd.getpwuid(os.getuid())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'getpwuid(): uid not found: 1000000000'
>>> 
sh-4.2$ id
uid=1000000000 gid=0(root) groups=0(root),1000000000
sh-4.2$ grep 1000000000 /etc/passwd
sh-4.2$ echo $?
1
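
Analysis: the traceback shows Python 2.7 falling back to pwd.getpwuid() inside os.path.expanduser("~") during interpreter startup (site.py computes the user site-packages base). Python only takes that fallback when HOME is absent from the environment, so the provisioner apparently execs its Python helper without HOME set, and because the arbitrary UID 1000000000 has no /etc/passwd entry the lookup raises the KeyError. A rough in-container check under that assumption (env -i mimics the stripped environment):

sh-4.2$ env -i /usr/bin/python -c 'pass'                            # no HOME: startup dies in site.py
Traceback (most recent call last):
  ...
KeyError: 'getpwuid(): uid not found: 1000000000'
sh-4.2$ env -i HOME=/tmp /usr/bin/python -c 'print "ok"'            # HOME short-circuits expanduser
ok
sh-4.2$ env -i PYTHONUSERBASE=/tmp /usr/bin/python -c 'print "ok"'  # PYTHONUSERBASE skips the expanduser path (sysconfig line 187)
ok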


Expected results:
CephFS PV successfully provisioned.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:
```
{
   "kind": "PersistentVolumeClaim",
   "apiVersion": "v1",
   "metadata": {
     "name": "cephfsc",
     "annotations": {
       "volume.beta.kubernetes.io/storage-class": "cephfs"
     }
   },
   "spec": {
     "accessModes": [
       "ReadWriteMany"
     ],
     "resources": {
       "requests": {
         "storage": "1Gi"
       }
     }
   }
}
```

StorageClass Dump (if StorageClass used by PV/PVC):
```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: 2018-04-24T09:25:47Z
  name: cephfs
  resourceVersion: "17294"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/cephfs
  uid: 78f308b9-47a1-11e8-9af9-0050569fe2f0
parameters:
  adminId: admin
  adminSecretName: cephrbd-secret
  adminSecretNamespace: default
  monitors: xxx:6789
provisioner: ceph.com/cephfs
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

Additional info:
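A possible mitigation, if the analysis above holds (untested here; it assumes the image tolerates HOME=/tmp and that the provisioner passes its environment through to the Python helper it execs), is to inject HOME so interpreter startup never needs a passwd entry for the arbitrary UID. The DC name is taken from the pod names above:

# oc set env dc/cephfs-provisioner-dc HOME=/tmp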

Comment 1 hchen 2018-06-06 14:34:31 UTC
The pod gets an additional security context:

# oc get pod cephfs-provisioner-dc-1-gkw8b -o yaml
.....
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      runAsUser: 1000000000
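
For reference, the runAsUser value is assigned from the project's pre-allocated UID range under the restricted SCC, which is why the UID has no /etc/passwd entry. A quick way to confirm the range (sketch; the value shown is illustrative):

# oc get namespace jhou -o yaml | grep sa.scc.uid-range
    openshift.io/sa.scc.uid-range: 1000000000/10000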

Comment 8 Traci Morrison 2018-07-23 15:01:26 UTC
Vikram, what is the status for this bug? I have an open bug that is dependent on this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1568345

Comment 10 Traci Morrison 2018-08-21 17:19:57 UTC
Vikram, do you want me to assign this bug to Joan?

Comment 11 Joan Hoyt 2019-01-24 13:16:39 UTC
Changed Assignee to chuffman, since Christian is now working on OCP Storage.

Comment 12 Stephen Cuppett 2019-11-20 18:52:06 UTC
OCP 3.6-3.10 is no longer on full support [1]. Marking CLOSED DEFERRED. If you have a customer case with a support exception or have reproduced on 3.11+, please reopen and include those details. When reopening, please set the Target Release to the appropriate version where needed.

[1]: https://access.redhat.com/support/policy/updates/openshift

