Description of problem:
CephFS dynamic provisioning fails. The provisioner is deployed with openshift3/cephfs-provisioner:v0.0.2-2. It worked in our last round of testing.

# oc describe pvc cephfsc
Name:          cephfsc
Namespace:     jhou
StorageClass:  cephfs
Status:        Pending
Volume:
Labels:        <none>
Annotations:   control-plane.alpha.kubernetes.io/leader={"holderIdentity":"58aeb5a3-68a5-11e8-b4a0-0a580a800072","leaseDurationSeconds":15,"acquireTime":"2018-06-05T10:12:59Z","renewTime":"2018-06-05T10:13:01Z","lea...
               volume.beta.kubernetes.io/storage-class=cephfs
               volume.beta.kubernetes.io/storage-provisioner=ceph.com/cephfs
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
Events:
  Type     Reason                Age              From                                                                                Message
  ----     ------                ----             ----                                                                                -------
  Normal   Provisioning          7s               ceph.com/cephfs cephfs-provisioner-dc-1-5kg95 58aeb5a3-68a5-11e8-b4a0-0a580a800072  External provisioner is provisioning volume for claim "jhou/cephfsc"
  Warning  ProvisioningFailed    7s               ceph.com/cephfs cephfs-provisioner-dc-1-5kg95 58aeb5a3-68a5-11e8-b4a0-0a580a800072  Failed to provision volume with StorageClass "cephfs": exit status 1
  Normal   ExternalProvisioning  2s (x5 over 7s)  persistentvolume-controller                                                         waiting for a volume to be created, either by external provisioner "ceph.com/cephfs" or manually created by system administrator

# oc logs cephfs-provisioner-dc-1-5kg95
E0605 10:13:27.343570       1 cephfs-provisioner.go:138] failed to provision share "kubernetes-dynamic-pvc-16a0a907-68a9-11e8-b4a0-0a580a800072" for "kubernetes-dynamic-user-16a0a977-68a9-11e8-b4a0-0a580a800072", err: exit status 1, output:
Traceback (most recent call last):
  File "/lib64/python2.7/site.py", line 556, in <module>
    main()
  File "/lib64/python2.7/site.py", line 538, in main
    known_paths = addusersitepackages(known_paths)
  File "/lib64/python2.7/site.py", line 266, in addusersitepackages
    user_site = getusersitepackages()
  File "/lib64/python2.7/site.py", line 241, in getusersitepackages
    user_base = getuserbase() # this will also set USER_BASE
  File "/lib64/python2.7/site.py", line 231, in getuserbase
    USER_BASE = get_config_var('userbase')
  File "/lib64/python2.7/sysconfig.py", line 516, in get_config_var
    return get_config_vars().get(name)
  File "/lib64/python2.7/sysconfig.py", line 473, in get_config_vars
    _CONFIG_VARS['userbase'] = _getuserbase()
  File "/lib64/python2.7/sysconfig.py", line 187, in _getuserbase
    return env_base if env_base else joinuser("~", ".local")
  File "/lib64/python2.7/sysconfig.py", line 173, in joinuser
    return os.path.expanduser(os.path.join(*args))
  File "/lib64/python2.7/posixpath.py", line 269, in expanduser
    userhome = pwd.getpwuid(os.getuid()).pw_dir
KeyError: 'getpwuid(): uid not found: 1000000000'

Version-Release number of selected component (if applicable):
openshift v3.10.0-0.58.0
openshift3/cephfs-provisioner:v0.0.2-2

How reproducible:
Reproduced on two of my environments.

Steps to Reproduce:
1. Deploy the cephfs provisioner; prepare a StorageClass and a PVC.
2. The PV cannot be provisioned; the error shown in the description is reported.

Actual results:
The PV cannot be provisioned.
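The traceback shows the Python 2 interpreter dying in site.py at startup, before the provisioner's helper script even runs: os.path.expanduser("~") only falls back to pwd.getpwuid() when HOME is unset, so exporting HOME into the container avoids that code path. A minimal stop-gap sketch, assuming the pod name implies a DeploymentConfig called cephfs-provisioner-dc in the jhou project (both are assumptions about this deployment):

```
# Set HOME so posixpath.expanduser("~") reads the environment variable
# instead of looking up uid 1000000000 in /etc/passwd.
# The value /tmp is arbitrary; any readable path satisfies this code path.
oc set env dc/cephfs-provisioner-dc HOME=/tmp -n jhou

# The env change triggers a new rollout; wait for the fresh pod.
oc rollout status dc/cephfs-provisioner-dc -n jhou
```

This only sidesteps the startup crash shown above; if the ceph tooling performs other user lookups, an image-side fix (sketched in a later comment) is still needed.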
The log of the cephfs provisioner pod shows that it breaks at:
```
  File "/lib64/python2.7/posixpath.py", line 269, in expanduser
    userhome = pwd.getpwuid(os.getuid()).pw_dir
KeyError: 'getpwuid(): uid not found: 1000000000'
```

Inside the provisioner container:

>>> import os
>>> import pwd
>>> os.getuid()
1000000000
>>> pwd.getpwuid(os.getuid())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'getpwuid(): uid not found: 1000000000'
>>>

sh-4.2$ id
uid=1000000000 gid=0(root) groups=0(root),1000000000
sh-4.2$ grep 1000000000 /etc/passwd
sh-4.2$ echo $?
1

Expected results:
CephFS PV successfully provisioned.

Master Log:

Node Log (of failed PODs):

PV Dump:

PVC Dump:
```
{
    "kind": "PersistentVolumeClaim",
    "apiVersion": "v1",
    "metadata": {
        "name": "cephfsc",
        "annotations": {
            "volume.beta.kubernetes.io/storage-class": "cephfs"
        }
    },
    "spec": {
        "accessModes": [
            "ReadWriteMany"
        ],
        "resources": {
            "requests": {
                "storage": "1Gi"
            }
        }
    }
}
```

StorageClass Dump (if StorageClass used by PV/PVC):
```
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: 2018-04-24T09:25:47Z
  name: cephfs
  resourceVersion: "17294"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/cephfs
  uid: 78f308b9-47a1-11e8-9af9-0050569fe2f0
parameters:
  adminId: admin
  adminSecretName: cephrbd-secret
  adminSecretNamespace: default
  monitors: xxx:6789
provisioner: ceph.com/cephfs
reclaimPolicy: Delete
volumeBindingMode: Immediate
```
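The empty grep confirms that the arbitrary UID assigned by OpenShift has no /etc/passwd entry, which is exactly what getpwuid() trips over. The usual image-side fix is the arbitrary-UID pattern from the OpenShift image guidelines: make /etc/passwd group-writable at build time (chmod g=u /etc/passwd) and have the entrypoint append an entry for the runtime UID. A sketch of such an entrypoint; the default user name and fallback home directory are illustrative, not taken from the actual image:

```
#!/bin/sh
# Entrypoint sketch: give the arbitrary runtime UID a passwd entry so that
# getpwuid()/expanduser("~") can resolve it. Assumes the image build ran
# `chmod g=u /etc/passwd`, so group 0 (which OpenShift-assigned UIDs belong
# to) may append to the file.
if ! whoami >/dev/null 2>&1; then
  if [ -w /etc/passwd ]; then
    echo "${USER_NAME:-provisioner}:x:$(id -u):0:${USER_NAME:-provisioner}:${HOME:-/tmp}:/sbin/nologin" >> /etc/passwd
  fi
fi
exec "$@"
```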
Additional info:
The pod gets an additional security context injected:

# oc get pod cephfs-provisioner-dc-1-gkw8b -o yaml
.....
  securityContext:
    capabilities:
      drop:
      - KILL
      - MKNOD
      - SETGID
      - SETUID
    runAsUser: 1000000000
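That securityContext (the dropped capabilities and the runAsUser taken from the project's UID range) is what the restricted SCC injects by default. As a test-only workaround, granting the anyuid SCC to the provisioner's service account lets the pod keep the UID baked into the image; the service account name below is an assumption, substitute whatever the DeploymentConfig actually uses:

```
# Let the provisioner pod run as the image's own user instead of an
# arbitrary project-range UID (loosens security; for testing only).
oc adm policy add-scc-to-user anyuid -z cephfs-provisioner -n jhou

# Redeploy so the new SCC applies to a fresh pod.
oc rollout latest dc/cephfs-provisioner-dc -n jhou
```

The image-side passwd fix sketched above remains the better long-term solution, since it keeps the provisioner runnable under the restricted SCC.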
Vikram, what is the status of this bug? I have an open bug that depends on this one: https://bugzilla.redhat.com/show_bug.cgi?id=1568345
Vikram, do you want me to assign this bug to Joan?
Changed Assignee to chuffman, since Christian is now working on OCP Storage.
OCP 3.6-3.10 is no longer on full support [1]. Marking CLOSED DEFERRED. If you have a customer case with a support exception, or have reproduced this on 3.11+, please reopen and include those details. When reopening, please set the Target Release to the appropriate version.

[1]: https://access.redhat.com/support/policy/updates/openshift