Created attachment 1607280 [details]
Must-gather output for OCP and OCS

Description of problem:

A pod fails to move to the "Running" state due to "FailedMount" events on a CSI-provisioned volume.

On investigating the CSI (Ceph RBD in this case) nodeplugin logs and state, the volume is seen to be mounted successfully (and within a reasonable time) on the node where the pod is scheduled. The problem hence seems to be that other parts of the system do not pick up the change in state and keep reporting "FailedMount" events. The end result is that the pod fails to move into the "Running" state.

Version-Release number of selected component (if applicable):

The setup was on AWS OCP dev clusters.

$ oc version
Client Version: version.Info{Major:"", Minor:"", GitVersion:"v4.2.0-alpha.0-21-g3daf746", GitCommit:"3daf746", GitTreeState:"clean", BuildDate:"2019-08-21T16:17:30Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.0+757c32e", GitCommit:"757c32e", GitTreeState:"clean", BuildDate:"2019-08-21T17:56:00Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
OpenShift Version: 4.2.0-0.ci-2019-08-22-111635

How reproducible:

Happens once in 2 to 3 runs of the reproducer given in the "Steps to Reproduce" section. It takes anywhere between 2 and 10 minutes to reproduce, when it occurs.

Steps to Reproduce:
1. Set up an OCP+OCS cluster, with Rook+Ceph and a storage class created for RBD (named csi-rbd).
2. Run the ocs-monkey tool (https://github.com/JohnStrunk/ocs-monkey) against this setup with the following CLI options:
   - ./workload_runner.py -l ./ -m ReadWriteOnce -n testws --oc /path/to/oc --ocs-namespace <rook-ceph namespace> -s csi-rbd
3. In about 2-10 minutes the workload runner will report a failure and start gathering OCS and OCP must-gather logs.

Actual results:

After the failure is reported and logs are gathered, and if the failure reported by ocs-monkey is of this nature,

2019-08-22 14:17:44,014 osio - ERROR - Unhealthy: testws/osio-worker-375766658
2019-08-22 14:17:44,015 root - INFO - starting log collection

inspect the pods in the namespace "testws" to see if any are in the state "ContainerCreating" and, further, whether their descriptions report "FailedMount" reasons in their events like below:

Events:
  Type     Reason                  Age                    From                                   Message
  ----     ------                  ----                   ----                                   -------
  Normal   Scheduled               11m                    default-scheduler                      Successfully assigned testws/osio-worker-375766658-7f4cf9698c-mjtgp to ip-10-0-149-116.ec2.internal
  Warning  FailedAttachVolume      11m                    attachdetach-controller                Multi-Attach error for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" Volume is already exclusively attached to one node and can't be attached to another
  Normal   SuccessfulAttachVolume  10m                    attachdetach-controller                AttachVolume.Attach succeeded for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570"
  Warning  FailedMount             2m10s (x4 over 8m58s)  kubelet, ip-10-0-149-116.ec2.internal  Unable to mount volumes for pod "osio-worker-375766658-7f4cf9698c-mjtgp_testws(643b767a-c4e7-11e9-a719-0ac717fcefd8)": timeout expired waiting for volumes to attach or mount for pod "testws"/"osio-worker-375766658-7f4cf9698c-mjtgp". list of unmounted volumes=[data]. list of unattached volumes=[temp data default-token-rxwbx]

Expected results:

The pod should not fail with "FailedMount" events when the CSI nodeplugin has succeeded with the actual mount.
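For anyone triaging a run that hits this, the checks used here were roughly the following (the pod and nodeplugin names are from this particular run and will differ between runs):

# list the workload pods and spot any stuck in ContainerCreating
$ oc get pods -n testws

# check the stuck pod's events for FailedMount/FailedAttachVolume
$ oc describe pod -n testws osio-worker-375766658-7f4cf9698c-mjtgp

# find the CSI RBD nodeplugin pod running on the same node as the stuck pod
$ oc get pods -n rook-ceph -o wide | grep csi-rbdplugin

# confirm the nodeplugin actually staged and published the volume
$ oc logs -n rook-ceph csi-rbdplugin-6lh65 -c csi-rbdplugin | grep -E 'NodeStageVolume|NodePublishVolume|successfully mounted'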
Master Log: Attached OCP must-gather and OCS must-gather

Node Log (of failed PODs): Attached OCP must-gather and OCS must-gather

PV Dump:

$ oc get pv -o yaml pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: rook-ceph.rbd.csi.ceph.com
  creationTimestamp: "2019-08-22T14:13:20Z"
  finalizers:
  - kubernetes.io/pv-protection
  name: pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570
  resourceVersion: "21693"
  selfLink: /api/v1/persistentvolumes/pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570
  uid: fe7227e4-c4e6-11e9-926c-02d05bd2f570
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 3Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: pvc-375766658
    namespace: testws
    resourceVersion: "21668"
    uid: fdd30dfe-c4e6-11e9-926c-02d05bd2f570
  csi:
    driver: rook-ceph.rbd.csi.ceph.com
    fsType: xfs
    nodeStageSecretRef:
      name: rook-ceph-csi
      namespace: rook-ceph
    volumeAttributes:
      clusterID: rook-ceph
      pool: replicapool
      storage.kubernetes.io/csiProvisionerIdentity: 1566482577706-8081-rook-ceph.rbd.csi.ceph.com
    volumeHandle: 0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d
  persistentVolumeReclaimPolicy: Delete
  storageClassName: csi-rbd
  volumeMode: Filesystem
status:
  phase: Bound

PVC Dump:

$ oc get pvc -n testws -o yaml pvc-375766658
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: "yes"
    pv.kubernetes.io/bound-by-controller: "yes"
    volume.beta.kubernetes.io/storage-provisioner: rook-ceph.rbd.csi.ceph.com
  creationTimestamp: "2019-08-22T14:13:19Z"
  finalizers:
  - kubernetes.io/pvc-protection
  name: pvc-375766658
  namespace: testws
  resourceVersion: "21697"
  selfLink: /api/v1/namespaces/testws/persistentvolumeclaims/pvc-375766658
  uid: fdd30dfe-c4e6-11e9-926c-02d05bd2f570
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  storageClassName: csi-rbd
  volumeMode: Filesystem
  volumeName: pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570
status:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 3Gi
  phase: Bound

StorageClass Dump (if StorageClass used by PV/PVC):

$ oc get sc/csi-rbd -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"storage.k8s.io/v1","kind":"StorageClass","metadata":{"annotations":{},"name":"csi-rbd"},"parameters":{"clusterID":"rook-ceph","csi.storage.k8s.io/fstype":"xfs","csi.storage.k8s.io/node-stage-secret-name":"rook-ceph-csi","csi.storage.k8s.io/node-stage-secret-namespace":"rook-ceph","csi.storage.k8s.io/provisioner-secret-name":"rook-ceph-csi","csi.storage.k8s.io/provisioner-secret-namespace":"rook-ceph","pool":"replicapool"},"provisioner":"rook-ceph.rbd.csi.ceph.com","reclaimPolicy":"Delete"}
  creationTimestamp: "2019-08-22T14:01:56Z"
  name: csi-rbd
  resourceVersion: "17408"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/csi-rbd
  uid: 66e6086b-c4e5-11e9-926c-02d05bd2f570
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/fstype: xfs
  csi.storage.k8s.io/node-stage-secret-name: rook-ceph-csi
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/provisioner-secret-name: rook-ceph-csi
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  pool: replicapool
provisioner: rook-ceph.rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

Additional info:

### Here is the full description of the application pod that failed ###

$ oc describe -n testws pod/osio-worker-375766658-7f4cf9698c-mjtgp
Name:               osio-worker-375766658-7f4cf9698c-mjtgp
Namespace:          testws
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-0-149-116.ec2.internal/10.0.149.116
Start Time:         Thu, 22 Aug 2019 10:16:10 -0400
Labels:             deployment-id=375766658
                    pod-template-hash=7f4cf9698c
Annotations:        openshift.io/scc: restricted
Status:             Pending
IP:
Controlled By:      ReplicaSet/osio-worker-375766658-7f4cf9698c
Containers:
  osio-workload:
    Container ID:
    Image:          quay.io/johnstrunk/osio-workload
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Args:
      --untar-rate
      10
      --rm-rate
      10
      --kernel-slots
      3
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Readiness:      exec [/health.sh] delay=5s timeout=1s period=10s #success=1 #failure=3
    Environment:    <none>
    Mounts:
      /data from data (rw)
      /status from temp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-rxwbx (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  temp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-375766658
    ReadOnly:   false
  default-token-rxwbx:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-rxwbx
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age                    From                                   Message
  ----     ------                  ----                   ----                                   -------
  Normal   Scheduled               11m                    default-scheduler                      Successfully assigned testws/osio-worker-375766658-7f4cf9698c-mjtgp to ip-10-0-149-116.ec2.internal
  Warning  FailedAttachVolume      11m                    attachdetach-controller                Multi-Attach error for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" Volume is already exclusively attached to one node and can't be attached to another
  Normal   SuccessfulAttachVolume  10m                    attachdetach-controller                AttachVolume.Attach succeeded for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570"
  Warning  FailedMount             2m10s (x4 over 8m58s)  kubelet, ip-10-0-149-116.ec2.internal  Unable to mount volumes for pod "osio-worker-375766658-7f4cf9698c-mjtgp_testws(643b767a-c4e7-11e9-a719-0ac717fcefd8)": timeout expired waiting for volumes to attach or mount for pod "testws"/"osio-worker-375766658-7f4cf9698c-mjtgp". list of unmounted volumes=[data]. list of unattached volumes=[temp data default-token-rxwbx]

### CSI RBD nodeplugin information ###

$ oc get pods -n rook-ceph -o wide | grep rbdplugin
csi-rbdplugin-6lh65           3/3   Running   0   25m   10.0.149.116   ip-10-0-149-116.ec2.internal   <none>   <none>
csi-rbdplugin-gmlpx           3/3   Running   0   25m   10.0.139.83    ip-10-0-139-83.ec2.internal    <none>   <none>
csi-rbdplugin-grpjn           3/3   Running   0   25m   10.0.171.100   ip-10-0-171-100.ec2.internal   <none>   <none>
csi-rbdplugin-provisioner-0   5/5   Running   0   25m   10.131.0.13    ip-10-0-171-100.ec2.internal   <none>   <none>

### Pertinent logs from the RBD nodeplugin when the volume was staged and published ###

Time stamp when the volume was staged: 14:16:37.683840

./ocs-monkey-173462618/ocs-must-gather/ceph/namespaces/rook-ceph/pods/csi-rbdplugin-6lh65/csi-rbdplugin/csi-rbdplugin/logs/current.log:2019-08-22T14:16:37.683960521Z I0822 14:16:37.683840 39231 log.go:50] ID: 110 rbd: successfully mounted stagingPath /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570/globalmount/0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d to targetPath /var/lib/kubelet/pods/643b767a-c4e7-11e9-a719-0ac717fcefd8/volumes/kubernetes.io~csi/pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570/mount

2019-08-22T14:16:30.683964044Z I0822 14:16:30.681870 39231 log.go:44] ID: 100 GRPC call: /csi.v1.Node/NodeStageVolume
2019-08-22T14:16:37.606735716Z I0822 14:16:37.600294 39231 log.go:44] ID: 100 GRPC response: {}
2019-08-22T14:16:37.620975755Z I0822 14:16:37.620492 39231 log.go:44] ID: 110 GRPC call: /csi.v1.Node/NodePublishVolume
2019-08-22T14:16:37.625522042Z I0822 14:16:37.620517 39231 log.go:44] ID: 110 GRPC request: {"staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570/globalmount","target_path":"/var/lib/kubelet/pods/643b767a-c4e7-11e9-a719-0ac717fcefd8/volumes/kubernetes.io~csi/pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570/mount","volume_capability":{"AccessType":{"Mount":{"fs_type":"xfs"}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"rook-ceph","pool":"replicapool","storage.kubernetes.io/csiProvisionerIdentity":"1566482577706-8081-rook-ceph.rbd.csi.ceph.com"},"volume_id":"0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d"}
2019-08-22T14:16:37.683960521Z I0822 14:16:37.683840 39231 log.go:50] ID: 110 rbd: successfully mounted stagingPath /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570/globalmount/0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d to targetPath /var/lib/kubelet/pods/643b767a-c4e7-11e9-a719-0ac717fcefd8/volumes/kubernetes.io~csi/pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570/mount
2019-08-22T14:16:37.684804717Z I0822 14:16:37.683869 39231 log.go:44] ID: 110 GRPC response: {}

### Partial kubelet logs from the node where the volume was staged and published ###

sh-4.4# journalctl -u kubelet > /tmp/kub.log
sh-4.4# cat /tmp/kub.log | grep fdd30dfe-c4e6-11e9-926c-02d05bd2f570

Aug 22 14:15:57 ip-10-0-149-116 hyperkube[1163]: I0822 14:15:57.964398 1163 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-pbhcp" (UID: "5abd666d-c4e7-11e9-a719-0ac717fcefd8")
Aug 22 14:15:57 ip-10-0-149-116 hyperkube[1163]: E0822 14:15:57.964525 1163 nestedpendingoperations.go:278] Operation for "\"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\"" failed. No retries permitted until 2019-08-22 14:15:58.464479379 +0000 UTC m=+1782.779281076 (durationBeforeRetry 500ms). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570\" (UniqueName: \"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\") pod \"osio-worker-375766658-7f4cf9698c-pbhcp\" (UID: \"5abd666d-c4e7-11e9-a719-0ac717fcefd8\") "
Aug 22 14:15:58 ip-10-0-149-116 hyperkube[1163]: I0822 14:15:58.537435 1163 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-pbhcp" (UID: "5abd666d-c4e7-11e9-a719-0ac717fcefd8")
Aug 22 14:15:58 ip-10-0-149-116 hyperkube[1163]: E0822 14:15:58.538046 1163 nestedpendingoperations.go:278] Operation for "\"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\"" failed. No retries permitted until 2019-08-22 14:15:59.538001315 +0000 UTC m=+1783.852803039 (durationBeforeRetry 1s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570\" (UniqueName: \"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\") pod \"osio-worker-375766658-7f4cf9698c-pbhcp\" (UID: \"5abd666d-c4e7-11e9-a719-0ac717fcefd8\") "
Aug 22 14:15:59 ip-10-0-149-116 hyperkube[1163]: I0822 14:15:59.548020 1163 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-pbhcp" (UID: "5abd666d-c4e7-11e9-a719-0ac717fcefd8")
Aug 22 14:15:59 ip-10-0-149-116 hyperkube[1163]: E0822 14:15:59.548268 1163 nestedpendingoperations.go:278] Operation for "\"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\"" failed. No retries permitted until 2019-08-22 14:16:01.548228037 +0000 UTC m=+1785.863029746 (durationBeforeRetry 2s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570\" (UniqueName: \"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\") pod \"osio-worker-375766658-7f4cf9698c-pbhcp\" (UID: \"5abd666d-c4e7-11e9-a719-0ac717fcefd8\") "
Aug 22 14:16:01 ip-10-0-149-116 hyperkube[1163]: I0822 14:16:01.605943 1163 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-pbhcp" (UID: "5abd666d-c4e7-11e9-a719-0ac717fcefd8")
Aug 22 14:16:01 ip-10-0-149-116 hyperkube[1163]: E0822 14:16:01.606188 1163 nestedpendingoperations.go:278] Operation for "\"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\"" failed. No retries permitted until 2019-08-22 14:16:05.606140163 +0000 UTC m=+1789.920941841 (durationBeforeRetry 4s). Error: "Volume has not been added to the list of VolumesInUse in the node's volume status for volume \"pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570\" (UniqueName: \"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\") pod \"osio-worker-375766658-7f4cf9698c-pbhcp\" (UID: \"5abd666d-c4e7-11e9-a719-0ac717fcefd8\") "
Aug 22 14:16:05 ip-10-0-149-116 hyperkube[1163]: I0822 14:16:05.651696 1163 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-pbhcp" (UID: "5abd666d-c4e7-11e9-a719-0ac717fcefd8")
Aug 22 14:16:06 ip-10-0-149-116 hyperkube[1163]: E0822 14:16:06.457283 1163 nestedpendingoperations.go:278] Operation for "\"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\"" failed. No retries permitted until 2019-08-22 14:16:14.457235875 +0000 UTC m=+1798.772037565 (durationBeforeRetry 8s). Error: "Volume not attached according to node status for volume \"pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570\" (UniqueName: \"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\") pod \"osio-worker-375766658-7f4cf9698c-pbhcp\" (UID: \"5abd666d-c4e7-11e9-a719-0ac717fcefd8\") "
Aug 22 14:16:14 ip-10-0-149-116 hyperkube[1163]: I0822 14:16:14.476106 1163 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-pbhcp" (UID: "5abd666d-c4e7-11e9-a719-0ac717fcefd8")
Aug 22 14:16:14 ip-10-0-149-116 hyperkube[1163]: E0822 14:16:14.485682 1163 nestedpendingoperations.go:278] Operation for "\"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\"" failed. No retries permitted until 2019-08-22 14:16:30.485636719 +0000 UTC m=+1814.800438468 (durationBeforeRetry 16s). Error: "Volume not attached according to node status for volume \"pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570\" (UniqueName: \"kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d\") pod \"osio-worker-375766658-7f4cf9698c-pbhcp\" (UID: \"5abd666d-c4e7-11e9-a719-0ac717fcefd8\") "
Aug 22 14:16:30 ip-10-0-149-116 hyperkube[1163]: I0822 14:16:30.553588 1163 reconciler.go:207] operationExecutor.VerifyControllerAttachedVolume started for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-mjtgp" (UID: "643b767a-c4e7-11e9-a719-0ac717fcefd8")
Aug 22 14:16:30 ip-10-0-149-116 hyperkube[1163]: I0822 14:16:30.560062 1163 operation_generator.go:1339] Controller attach succeeded for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-mjtgp" (UID: "643b767a-c4e7-11e9-a719-0ac717fcefd8") device path: "csi-c3f012e05fd3cff043645fbd2b3ba824aa055564194b2e1f55574787ac5a8dbe"
Aug 22 14:16:30 ip-10-0-149-116 hyperkube[1163]: I0822 14:16:30.657457 1163 reconciler.go:252] operationExecutor.MountVolume started for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-mjtgp" (UID: "643b767a-c4e7-11e9-a719-0ac717fcefd8")
Aug 22 14:16:30 ip-10-0-149-116 hyperkube[1163]: I0822 14:16:30.658345 1163 operation_generator.go:587] MountVolume.WaitForAttach entering for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-mjtgp" (UID: "643b767a-c4e7-11e9-a719-0ac717fcefd8") DevicePath "csi-c3f012e05fd3cff043645fbd2b3ba824aa055564194b2e1f55574787ac5a8dbe"
Aug 22 14:16:30 ip-10-0-149-116 hyperkube[1163]: I0822 14:16:30.665243 1163 operation_generator.go:596] MountVolume.WaitForAttach succeeded for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-mjtgp" (UID: "643b767a-c4e7-11e9-a719-0ac717fcefd8") DevicePath "csi-c3f012e05fd3cff043645fbd2b3ba824aa055564194b2e1f55574787ac5a8dbe"
Aug 22 14:16:37 ip-10-0-149-116 hyperkube[1163]: I0822 14:16:37.603104 1163 operation_generator.go:623] MountVolume.MountDevice succeeded for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-mjtgp" (UID: "643b767a-c4e7-11e9-a719-0ac717fcefd8") device mount path "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570/globalmount"
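Given the "Volume has not been added to the list of VolumesInUse" and "Volume not attached according to node status" errors above, the attach state can also be cross-checked from the API side. This is only a rough sketch of the checks used; the VolumeAttachment object name is generated by Kubernetes, so the grep goes by PV name:

# find the VolumeAttachment referencing this PV and check its attached status
$ oc get volumeattachments -o yaml | grep -B 10 -A 10 pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570

# check whether the PV shows up under volumesInUse / volumesAttached in the node status
$ oc get node ip-10-0-149-116.ec2.internal -o yaml | grep -A 3 -E 'volumesInUse|volumesAttached'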
The last log message from the kubelet was:

> Aug 22 14:16:37 ip-10-0-149-116 hyperkube[1163]: I0822 14:16:37.603104 1163 operation_generator.go:623] MountVolume.MountDevice succeeded for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-mjtgp" (UID: "643b767a-c4e7-11e9-a719-0ac717fcefd8") device mount path "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570/globalmount"

The following message should be something along the lines of:

> MountVolume.SetUp succeeded for volume...

Since that was the last message shown in the journalctl output, I believe the MountVolume.SetUp call was still running by the time the logs were taken. That's where the file ownership is set, so I believe this is a duplicate of bug #1745773.

Can you confirm it's a duplicate so we can close this?
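If the node is still around, a quick way to check whether SetUp eventually completed would be something like the following (just a sketch, grepping for the kubelet message mentioned above):

# on the node where the pod is scheduled, look for the SetUp completion message for this PV
sh-4.4# journalctl -u kubelet | grep "MountVolume.SetUp succeeded" | grep pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570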
(In reply to Fabio Bertinatto from comment #2)
> The last log message from the kubelet was:
>
> > Aug 22 14:16:37 ip-10-0-149-116 hyperkube[1163]: I0822 14:16:37.603104 1163 operation_generator.go:623] MountVolume.MountDevice succeeded for volume "pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570" (UniqueName: "kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com^0001-0009-rook-ceph-0000000000000001-fde9dd57-c4e6-11e9-ad2b-0a580a83000d") pod "osio-worker-375766658-7f4cf9698c-mjtgp" (UID: "643b767a-c4e7-11e9-a719-0ac717fcefd8") device mount path "/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-fdd30dfe-c4e6-11e9-926c-02d05bd2f570/globalmount"
>
> The following message should be something along the lines of:
>
> > MountVolume.SetUp succeeded for volume...
>
> Since that was the last message shown in the journalctl output, I believe the MountVolume.SetUp call was still running by the time the logs were taken. That's where the file ownership is set, so I believe this is a duplicate of bug #1745773.
>
> Can you confirm it's a duplicate so we can close this?

By the looks of it, I did not wait long enough to see whether the pod eventually came up. It also seems related to the other bug. Closing this as a duplicate; I will reopen/rekindle the conversation if I see a pod stuck for a much longer duration. Thanks for taking a look at this.

*** This bug has been marked as a duplicate of bug 1745773 ***