Description of problem:

Testing how long it takes to re-attach a PVC (with data) to pods that start on different nodes in the cluster.

The test consists of the following steps:

1. Create a pod/PVC.
2. Write some data. In this test the focus is on the number of inodes; the exact data is not important, so just create some files.
3. Delete the pod.
4. Create a new pod and re-attach the PVC to it.
5. Repeat steps 3 and 4 multiple times and let the pods start on different nodes in the cluster.

To reproduce the exact environment, follow these steps:
-----------------------------------------------------------------
$ git clone https://github.com/ShyamsundarR/ocs-monkey.git
$ cd ./ocs-monkey
$ git checkout attach-rate
$ source setup-env.sh
// edit conftest.py and old/test_attach_with_data.py - change SC name and number of files
$ pytest -v -m attachwithdata
// collect the data
$ deactivate
------------------------------------------------------------------
(A minimal standalone sketch of this measurement loop is included below, after the storage class dumps.)

Version-Release number of selected component (if applicable):

oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.ci-2019-10-07-210343   True

How reproducible:
Always. The issue is not tied to a specific storage class or file system: I tested gp2/ceph-rbd and xfs/ext4.

Actual results:
PVC re-attach is slow.

Expected results:
PVC re-attach should be faster.

StorageClass Dump (if StorageClass used by PV/PVC):

# oc describe pvc
Name:          mypvc
Namespace:     ns-443844244
StorageClass:  gp2
Status:        Bound
Volume:        pvc-972f0eea-ef4a-11e9-a2cf-0a6f6786a89c
Labels:        <none>
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
               volume.kubernetes.io/selected-node: ip-10-0-168-71.us-west-2.compute.internal
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      100Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Events:
  Type    Reason                 Age    From                                                                  Message
  ----    ------                 ----   ----                                                                  -------
  Normal  WaitForFirstConsumer   6m52s  persistentvolume-controller                                           waiting for first consumer to be created before binding
  Normal  Provisioning           6m51s  ebs.csi.aws.com_ip-10-0-138-54_02930534-ef48-11e9-a784-0674b8c8776c   External provisioner is provisioning volume for claim "ns-443844244/mypvc"
  Normal  ProvisioningSucceeded  6m51s  persistentvolume-controller                                           Successfully provisioned volume pvc-972f0eea-ef4a-11e9-a2cf-0a6f6786a89c using kubernetes.io/aws-ebs
Mounted By:    mypod-deployment-78b5ccff5-pfsjp

----

# oc describe pod
Name:               mypod-deployment-78b5ccff5-pfsjp
Namespace:          ns-443844244
Priority:           0
PriorityClassName:  <none>
Node:               ip-10-0-168-71.us-west-2.compute.internal/10.0.168.71
Start Time:         Tue, 15 Oct 2019 12:58:47 +0000
Labels:             app=mypod
                    pod-template-hash=78b5ccff5
Annotations:        openshift.io/scc: restricted
Status:             Pending
IP:
Controlled By:      ReplicaSet/mypod-deployment-78b5ccff5
Containers:
  busybox:
    Container ID:
    Image:          busybox
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Command:
      sleep
      99999
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /mnt from mypvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-x4bd7 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  mypvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  mypvc
    ReadOnly:   false
  default-token-x4bd7:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-x4bd7
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason                  Age  From                     Message
  ----     ------                  ---  ----                     -------
  Normal   Scheduled               25s  default-scheduler        Successfully assigned ns-443844244/mypod-deployment-78b5ccff5-pfsjp to ip-10-0-168-71.us-west-2.compute.internal
  Warning  FailedAttachVolume      25s  attachdetach-controller  Multi-Attach error for volume "pvc-972f0eea-ef4a-11e9-a2cf-0a6f6786a89c" Volume is already exclusively attached to one node and can't be attached to another
  Normal   SuccessfulAttachVolume  6s   attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-972f0eea-ef4a-11e9-a2cf-0a6f6786a89c"

---

# oc get sc gp2 -o yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2019-10-09T09:35:33Z"
  name: gp2
  ownerReferences:
  - apiVersion: v1
    kind: clusteroperator
    name: storage
    uid: 2c90c6ee-ea77-11e9-b6cb-0611eeebf350
  resourceVersion: "8385"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2
  uid: 2432529b-ea78-11e9-a365-02227df4d606
parameters:
  encrypted: "true"
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer

root@ip-172-31-59-125: ~/srangana/ocs-monkey # oc get sc rook-ceph-block -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2019-10-11T16:45:48Z"
  name: rook-ceph-block
  resourceVersion: "1384630"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/rook-ceph-block
  uid: 93f2b71b-ec46-11e9-b5c2-06d88a0a32a6
parameters:
  clusterID: rook-ceph
  csi.storage.k8s.io/fstype: xfs
  csi.storage.k8s.io/node-stage-secret-name: rook-ceph-csi
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/provisioner-secret-name: rook-ceph-csi
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  imageFeatures: layering
  imageFormat: "2"
  pool: replicapool
provisioner: rook-ceph.rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
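As promised above, a minimal standalone sketch of the measurement loop. This is not the actual ocs-monkey test: the pod/PVC names, iteration count and polling interval are illustrative, and it assumes an authenticated `oc` client and an already-bound PVC named "mypvc" in the current project.

--- measure_reattach.py (illustrative only) ---
import subprocess
import time

POD_YAML = """\
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "99999"]
    volumeMounts:
    - mountPath: /mnt
      name: mypvc
  volumes:
  - name: mypvc
    persistentVolumeClaim:
      claimName: mypvc
"""

def oc(*args, stdin=None):
    # Thin wrapper around the `oc` CLI; assumes an authenticated session.
    return subprocess.run(["oc", *args], input=stdin, text=True,
                          capture_output=True, check=True).stdout

def wait_until_running(pod, timeout=900):
    # Poll the pod phase; for a pod this simple the elapsed time is dominated
    # by volume detach/attach/mount rather than image pull or container start.
    start = time.time()
    while time.time() - start < timeout:
        if oc("get", "pod", pod, "-o", "jsonpath={.status.phase}") == "Running":
            return time.time() - start
        time.sleep(5)
    raise TimeoutError(f"{pod} not Running after {timeout}s")

# Re-attach loop: delete the pod, recreate it, and time how long the PVC takes
# to become usable again.  Node placement is left to the scheduler, so
# successive pods may land on different nodes.
for i in range(10):
    oc("delete", "pod", "mypod", "--ignore-not-found", "--wait=true")
    oc("apply", "-f", "-", stdin=POD_YAML)
    print(f"iteration {i}: pod Running after {wait_until_running('mypod'):.1f}s")
-----------------------------------------------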
Might be related to Bug #1515907. Did you use fsGroup or seLinuxOptions in the pod spec?
(In reply to Tomas Smetana from comment #2)
> Might be related to the Bug #1515907. Did you use fsGroup or seLinuxOptions
> in the pod spec?

Yes:

--- pod.yml ---
apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      openshift.io/scc: restricted
    creationTimestamp: "2019-10-16T12:38:43Z"
    generateName: mypod-deployment-78b5ccff5-
    labels:
      app: mypod
      pod-template-hash: 78b5ccff5
    name: mypod-deployment-78b5ccff5-f28kp
    namespace: ns-784595540
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: mypod-deployment-78b5ccff5
      uid: e373f086-f011-11e9-b5c2-06d88a0a32a6
    resourceVersion: "4256278"
    selfLink: /api/v1/namespaces/ns-784595540/pods/mypod-deployment-78b5ccff5-f28kp
    uid: e37703f1-f011-11e9-b5c2-06d88a0a32a6
  spec:
    containers:
    - command:
      - sleep
      - "99999"
      image: busybox
      imagePullPolicy: IfNotPresent
      name: busybox
      resources: {}
      securityContext:
        capabilities:
          drop:
          - KILL
          - MKNOD
          - SETGID
          - SETUID
        runAsUser: 1002670000
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /mnt
        name: mypvc
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: default-token-js57h
        readOnly: true
    dnsPolicy: ClusterFirst
    enableServiceLinks: true
    imagePullSecrets:
    - name: default-dockercfg-5q9rg
    nodeName: ip-10-0-153-179.us-west-2.compute.internal
    priority: 0
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext:
      fsGroup: 1002670000
      seLinuxOptions:
        level: s0:c52,c9
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 0
    tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
    volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: mypvc
    - name: default-token-js57h
      secret:
        defaultMode: 420
        secretName: default-token-js57h
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: "2019-10-16T12:38:43Z"
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: "2019-10-16T12:38:43Z"
      message: 'containers with unready status: [busybox]'
      reason: ContainersNotReady
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: "2019-10-16T12:38:43Z"
      message: 'containers with unready status: [busybox]'
      reason: ContainersNotReady
      status: "False"
      type: ContainersReady
    - lastProbeTime: null
      lastTransitionTime: "2019-10-16T12:38:43Z"
      status: "True"
      type: PodScheduled
    containerStatuses:
    - image: busybox
      imageID: ""
      lastState: {}
      name: busybox
      ready: false
      restartCount: 0
      state:
        waiting:
          reason: ContainerCreating
    hostIP: 10.0.153.179
    phase: Pending
    qosClass: BestEffort
    startTime: "2019-10-16T12:38:43Z"
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
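For what it's worth, fsGroup implies a recursive walk of the volume while it is being set up for the pod: group ownership and permissions are adjusted on every file and directory, so the cost grows with the number of inodes rather than with the amount of data, which would match what this test is measuring. A rough sketch of that per-inode work (the path and GID below are only examples taken from the pod above; the real work is done by the kubelet, not by a script like this):

--- fsgroup_walk_sketch.py (illustrative only) ---
import os
import time

def apply_fsgroup(root, gid):
    """Recursively change the group on every entry under root -- roughly the
    kind of per-inode work implied by setting fsGroup on a pod."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            os.chown(path, -1, gid)   # leave the owner unchanged, set the group
            count += 1
    return count

if __name__ == "__main__":
    # Run against a scratch directory, with enough privilege to chown, to see
    # how the walk time scales with the number of inodes.
    start = time.time()
    n = apply_fsgroup("/mnt", 1002670000)
    print(f"touched {n} inodes in {time.time() - start:.1f}s")
-----------------------------------------------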
Thanks Elvir. This is unfortunately still an unresolved issue. We know about it and it is being worked on upstream. I'll close this bug as a duplicate of the old one.

*** This bug has been marked as a duplicate of bug 1515907 ***