Description of problem
======================

When I try to install OCP/OCS 4.2 on a VMware cluster of the OCS QE team, the OCS installation fails because one PVC of rook-ceph-mon can't reach the Bound state (so no PV is provisioned), which seems to be caused by a virtual disk name conflict hit by the kubernetes.io/vsphere-volume storage provisioner.

Version-Release number of selected component
============================================

cluster channel: stable-4.2
cluster version: 4.2.18
cluster image: quay.io/openshift-release-dev/ocp-release@sha256:283a1625e18e0b6d7f354b1b022a0aeaab5598f2144ec484faf89e1ecb5c7498

storage namespace openshift-cluster-storage-operator

image quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d9dd509754b883dab8301bc1cd50d1b902de531d012817b27bfdda2cd28c782a
 * quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:d9dd509754b883dab8301bc1cd50d1b902de531d012817b27bfdda2cd28c782a

storage namespace openshift-storage

image quay.io/rhceph-dev/cephcsi:4.2-259.72ac53b6.release_4.2
 * quay.io/rhceph-dev/cephcsi@sha256:280b39004dc8983eba810f6707b15da229efd62af512839a4f14efc983df2110
image registry.redhat.io/openshift4/ose-csi-driver-registrar:4.2
 * registry.redhat.io/openshift4/ose-csi-driver-registrar@sha256:873565b29b5bda21326cc6bf58bfe4bdf59c4b1da2e5e813a300c8e4b7e7ab81
image registry.redhat.io/openshift4/ose-csi-external-attacher:4.2
 * registry.redhat.io/openshift4/ose-csi-external-attacher@sha256:aea27da675bdfbd7cb460030912c8386556e5f475eeba21878ac63337f97dff8
image registry.redhat.io/openshift4/ose-csi-external-provisioner-rhel7:4.2
 * registry.redhat.io/openshift4/ose-csi-external-provisioner-rhel7@sha256:ae616978273612f4721bd6de05b8bcae7da3b3faae7847e0c6ca75e1c6861e39
image quay.io/rhceph-dev/mcg-operator:2.0.10-63.e905e55.2.0
 * quay.io/rhceph-dev/mcg-operator@sha256:9b2df211528394938e4e4df648e746f54bf377962968d364febcfcec432074a0
image quay.io/rhceph-dev/ocs-operator:4.2-286.88017e9.release_4.2
 * quay.io/rhceph-dev/ocs-operator@sha256:54202499a2a7e87f5fa6afa28fe621fdb53b6e076c8417eda21c60816004c1be
image quay.io/rhceph-dev/rook-ceph:4.2-292.a26dc84c.ocs_4.2
 * quay.io/rhceph-dev/rook-ceph@sha256:9addc80eb620d5d48ab4a51d3525ea6149a19a9719be71d4e3263eca2e1ceb13

How reproducible
================

1/1

Steps to Reproduce
==================

1. Install OCP/OCS on a VMware cluster (I did this via the ocs-ci Jenkins deploy cluster job of the OCS QE team).
2. Check logs and status of OCS via the OCP Console.

Actual results
==============

OCP Console reports (via Administrator view, Home -> Dashboard):

> Cluster Health: ocs-storagecluster-cephcluster is not available

listing the following alert:

> Deployment openshift-storage/ocs-operator has not matched the expected number of replicas for longer than 15 minutes.
(see screenshot #1)

Digging deeper, I see that one ceph monitor failed to start:

```
$ oc describe pod/rook-ceph-mon-c-canary-58fb966bdd-nh299 -n openshift-storage > rook-ceph-mon-c-canary-58fb966bdd-nh299
$ grep ^Status rook-ceph-mon-c-canary-58fb966bdd-nh299
Status:        Pending
$ tail -4 rook-ceph-mon-c-canary-58fb966bdd-nh299
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  22s (x11 over 13m)  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
```

Checking what's wrong with the PVC reveals a problem on the vSphere storage provisioner side:

```
$ oc describe pvc/rook-ceph-mon-c -n openshift-storage > pvc.rook-ceph-mon-c
$ grep ^Status pvc.rook-ceph-mon-c
Status:        Pending
$ tail -5 pvc.rook-ceph-mon-c
Events:
  Type     Reason              Age                 From                         Message
  ----     ------              ----                ----                         -------
  Warning  ProvisioningFailed  35s (x29 over 49m)  persistentvolume-controller  Failed to provision volume with StorageClass "thin": Cannot complete the operation because the file or folder [DS-1] kubevols/mbukatov-dc2-clus-wdst7-dynamic-pvc-ec53589e-54c9-11ea-9272-005056be10c1.vmdk already exists
Mounted By:    rook-ceph-mon-c-canary-58fb966bdd-nh299
```

When I look in vSphere at what is going on in the DS-1 datastore, I see that there are 3 virtual disks, one for each PVC, but the one the k8s persistentvolume-controller complains about has zero size (see screenshot #2).

Expected results
================

The vSphere storage provisioner doesn't hit any virtual disk name conflict, and the OCS installation finishes without any problem.

Additional info
===============

It's safe to rule out the possibility that a virtual disk with the same name as the one persistentvolume-controller wanted to use for rook-ceph-mon-c was already present before the installation:

- I was checking DS-1 a day before because of a different issue, and I'm quite sure there were no mbukatov-dc2* disks left.
- The PVC name is AFAIK chosen randomly.

It's possible that this is related to (or even a root cause of) BZ 1802186, but the fact that in this case the OCS cluster hasn't been able to recover a few hours after installation (as was the case reported in BZ 1802186) suggests that these are 2 different problems.
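As a sanity check, the stuck claim's events and the contents of the kubevols folder can be compared directly. This is just a sketch: it assumes the govc CLI is configured to talk to the vCenter in question (GOVC_URL and credentials set in the environment), and the datastore name is taken from the provisioning error above.

```
# Show provisioning events for the stuck claim.
oc get events -n openshift-storage \
  --field-selector involvedObject.kind=PersistentVolumeClaim,involvedObject.name=rook-ceph-mon-c

# List the kubevols folder on the DS-1 datastore to spot the conflicting
# (zero-size) virtual disk; requires the govc CLI with GOVC_URL/credentials set.
govc datastore.ls -ds DS-1 -l kubevols
```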
Created attachment 1664842 [details] Output of oc describe pod/rook-ceph-mon-c-canary-58fb966bdd-nh299 -n openshift-storage
Created attachment 1664843 [details] Output of oc describe pvc/rook-ceph-mon-c -n openshift-storage
Created attachment 1664855 [details] Screenshot #1: Overview dashboard of OCP Console, with "ocs-storagecluster-cephcluster is not available" error.
Created attachment 1664856 [details] Screenshot #2: vCenter web UI with the file list of the kubevols directory on the DS-1 datastore (including the virtual disk for the affected PVC)
Additional info
===============

Master Log
----------

In the must-gather data, there are:

- masters/crio_service.log
- masters/kubelet_service.log

Node Log (of failed PODs)
-------------------------

Not applicable, the ceph pod was not started because of failed PVC provisioning.

PV Dump
-------

Not applicable, the PV was not provisioned:

```
$ oc get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                               STORAGECLASS   REASON   AGE
pvc-d6fe39f5-54c9-11ea-9272-005056be10c1   10Gi       RWO            Delete           Bound    openshift-storage/rook-ceph-mon-a   thin                    5h4m
pvc-e63d5b82-54c9-11ea-9272-005056be10c1   10Gi       RWO            Delete           Bound    openshift-storage/rook-ceph-mon-b   thin                    5h4m
```

PVC Dump
--------

```
$ oc get pvc -n openshift-storage
NAME              STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
rook-ceph-mon-a   Bound     pvc-d6fe39f5-54c9-11ea-9272-005056be10c1   10Gi       RWO            thin           5h2m
rook-ceph-mon-b   Bound     pvc-e63d5b82-54c9-11ea-9272-005056be10c1   10Gi       RWO            thin           5h2m
rook-ceph-mon-c   Pending                                                                        thin           5h1m
```

```
$ oc get pvc -n openshift-storage -o yaml
apiVersion: v1
items:
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
      volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/vsphere-volume
    creationTimestamp: "2020-02-21T16:47:26Z"
    finalizers:
    - kubernetes.io/pvc-protection
    labels:
      app: rook-ceph-mon
      ceph-version: 14.2.4
      ceph_daemon_id: a
      mon: a
      mon_cluster: openshift-storage
      rook-version: 4.2-292.a26dc84c.ocs_4.2
      rook_cluster: openshift-storage
    name: rook-ceph-mon-a
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ceph.rook.io/v1
      blockOwnerDeletion: true
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      uid: 7533e883-54c9-11ea-831c-005056be67e7
    resourceVersion: "27190"
    selfLink: /api/v1/namespaces/openshift-storage/persistentvolumeclaims/rook-ceph-mon-a
    uid: d6fe39f5-54c9-11ea-9272-005056be10c1
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
    storageClassName: thin
    volumeMode: Filesystem
    volumeName: pvc-d6fe39f5-54c9-11ea-9272-005056be10c1
  status:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 10Gi
    phase: Bound
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      pv.kubernetes.io/bind-completed: "yes"
      pv.kubernetes.io/bound-by-controller: "yes"
      volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/vsphere-volume
    creationTimestamp: "2020-02-21T16:47:51Z"
    finalizers:
    - kubernetes.io/pvc-protection
    labels:
      app: rook-ceph-mon
      ceph-version: 14.2.4
      ceph_daemon_id: b
      mon: b
      mon_cluster: openshift-storage
      rook-version: 4.2-292.a26dc84c.ocs_4.2
      rook_cluster: openshift-storage
    name: rook-ceph-mon-b
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ceph.rook.io/v1
      blockOwnerDeletion: true
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      uid: 7533e883-54c9-11ea-831c-005056be67e7
    resourceVersion: "27287"
    selfLink: /api/v1/namespaces/openshift-storage/persistentvolumeclaims/rook-ceph-mon-b
    uid: e63d5b82-54c9-11ea-9272-005056be10c1
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
    storageClassName: thin
    volumeMode: Filesystem
    volumeName: pvc-e63d5b82-54c9-11ea-9272-005056be10c1
  status:
    accessModes:
    - ReadWriteOnce
    capacity:
      storage: 10Gi
    phase: Bound
- apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    annotations:
      volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/vsphere-volume
    creationTimestamp: "2020-02-21T16:48:01Z"
    finalizers:
    - kubernetes.io/pvc-protection
    labels:
      app: rook-ceph-mon
      ceph-version: 14.2.4
      ceph_daemon_id: c
      mon: c
      mon_cluster: openshift-storage
      rook-version: 4.2-292.a26dc84c.ocs_4.2
      rook_cluster: openshift-storage
    name: rook-ceph-mon-c
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: ceph.rook.io/v1
      blockOwnerDeletion: true
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      uid: 7533e883-54c9-11ea-831c-005056be67e7
    resourceVersion: "27305"
    selfLink: /api/v1/namespaces/openshift-storage/persistentvolumeclaims/rook-ceph-mon-c
    uid: ec53589e-54c9-11ea-9272-005056be10c1
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 10Gi
    storageClassName: thin
    volumeMode: Filesystem
  status:
    phase: Pending
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
```

StorageClass Dump (if StorageClass used by PV/PVC)
--------------------------------------------------

```
$ oc get sc/thin -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  creationTimestamp: "2020-02-21T16:11:58Z"
  name: thin
  ownerReferences:
  - apiVersion: v1
    kind: clusteroperator
    name: storage
    uid: 7451b6dd-54c1-11ea-8c75-005056bec7d8
  resourceVersion: "12214"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/thin
  uid: e31b69ac-54c4-11ea-8cb8-005056be67e7
parameters:
  diskformat: thin
provisioner: kubernetes.io/vsphere-volume
reclaimPolicy: Delete
volumeBindingMode: Immediate
```
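To isolate whether the conflict is specific to the OCS/rook-ceph-mon workflow or reproducible with plain dynamic provisioning, a minimal standalone claim against the same "thin" StorageClass can be tried. This is a sketch only; the claim name, namespace, and size are arbitrary.

```
# Hypothetical standalone claim against the default "thin" StorageClass,
# to check whether kubernetes.io/vsphere-volume can provision a volume
# outside of the OCS/rook-ceph-mon workflow.
cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vsphere-provision-test
  namespace: openshift-storage
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: thin
  resources:
    requests:
      storage: 1Gi
EOF

# Watch until the claim reaches Bound (or shows ProvisioningFailed events).
oc get pvc/vsphere-provision-test -n openshift-storage -w
```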
I retried deploying an OCP/OCS 4.2 cluster in the same way today and haven't hit the issue, so I'm updating the "How reproducible" stats to 1/2.

That said, the openshift-storage.cephfs.csi.ceph.com storage provisioner failed in a similar way instead (even though the storage codebase is, IMHO, very different in each case). I'm going to report an OCS BZ for that tomorrow and will link it here.
(In reply to Martin Bukatovic from comment #7)
> That said, the openshift-storage.cephfs.csi.ceph.com storage provisioner
> failed in a similar way instead (even though the storage codebase is, IMHO,
> very different in each case). I'm going to report an OCS BZ for that
> tomorrow and will link it here.

The problem was caused by FailedScheduling because of insufficient CPU and memory on the worker nodes. This is not a bug.
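For reference, a quick way to confirm that pods were stuck on scheduling (resource pressure) rather than on volume provisioning might look like the following sketch; the worker node label is the standard OCP role label and is an assumption here.

```
# List scheduling-related events in the OCS namespace.
oc get events -n openshift-storage --field-selector reason=FailedScheduling

# Compare requested vs. allocatable resources on the worker nodes.
oc describe nodes -l node-role.kubernetes.io/worker | grep -A 8 'Allocated resources'
```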
I see you limited the number of CPUs for the worker nodes in the config you used here:
http://git.engineering.redhat.com/git/users/mbukatov/ocs-ci.git/tree/conf/deployment/vsphere/upi_1az_rhcos_3m_3w.mbukatov.yaml?h=vmware_deployment_config

Your cluster has 16 cores, and VMware is able to share all those cores across all VMs, I think. Have you tried to run with the whole 16 vCPU for the workers?

FYI: I've just seen some discussion that 12 vCPU should be enough for VMware:
https://bugzilla.redhat.com/show_bug.cgi?id=1806921
and the mail thread with the subject "OCS 4.2 on VMware constrained by hardware - can we reduce vCPU?" in the sme-storage mailing list.

Petr
(In reply to Petr Balogh from comment #9)
> I see you limited the number of CPUs for the worker nodes in the config you
> used here:
> http://git.engineering.redhat.com/git/users/mbukatov/ocs-ci.git/tree/conf/
> deployment/vsphere/upi_1az_rhcos_3m_3w.mbukatov.yaml?h=vmware_deployment_config
>
> Your cluster has 16 cores, and VMware is able to share all those cores across
> all VMs, I think. Have you tried to run with the whole 16 vCPU for the workers?

I'm using the DC2 C1 cluster, which has the following resources available for its virtual machines:

- 128 GB RAM
- 32 vCPUs
- VMFS datastore, 7.9 TB

Since the default cluster config was allocating just 4 CPUs per node, I was trying to increase the allocation. I didn't try to overcommit hardware allocations for the virtual machines, as the performance I see is already terrible.

That said, this is totally unrelated to this bug.
The first error in the logs is this one:

quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-485739555a8a5560b70d4b7be3a5a4b234d7dfe56f295c6ca8a257707a911a6b/namespaces/openshift-etcd/pods/etcd-member-control-plane-2/etcd-member/etcd-member/logs/current.log:2020-02-21T16:48:23.441557968Z 2020-02-21 16:48:23.441521 W | etcdserver: read-only range request "key:\"/kubernetes.io/persistentvolumes/pvc-ec53589e-54c9-11ea-9272-005056be10c1\" " with result "range_response_count:0 size:6" took too long (272.492327ms) to execute

So I wonder if this can't be a race caused by high latency on etcd (the logs are full of it). Does this occur every time, or is this error only intermittent?
(In reply to Tomas Smetana from comment #11)
> So I wonder if this can't be a race caused by high latency on etcd (the logs
> are full of it). Does this occur every time, or is this error only
> intermittent?

I agree that this looks like a race condition. Since the moment I reported this bug, I have tried to install OCP/OCS 4.2 an additional 3 times, and I haven't seen this problem again (that said, in one case the installation failed on something else), so the total "How reproducible?" stats are about 1/4.
After reviewing the logs from the must-gather, it looks like the server simply got congested and etcd stopped being reliable -- there are too many of these and similar errors, as in comment #11:

2020-02-21T16:35:08.696568454Z 2020-02-21 16:35:08.696558 W | etcdserver: failed to send out heartbeat on time (exceeded the 100ms timeout for 1.167541657s)
2020-02-21T16:35:08.696600807Z 2020-02-21 16:35:08.696584 W | etcdserver: server is likely overloaded

Could you please attach etcd pod logs (or a must-gather with those) from a cluster where things work normally? I'd like to see how frequently etcd complains there (if at all).
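For comparison, something like the following could be used to count these warnings in the etcd member logs on a running cluster. This is a sketch only; it assumes the OCP 4.2 etcd-member pod and container names seen in the must-gather above, which may differ in other releases.

```
# Count latency/overload warnings per etcd member pod.
for pod in $(oc get pods -n openshift-etcd -o name | grep etcd-member); do
  echo "== ${pod} =="
  oc logs -n openshift-etcd "${pod}" -c etcd-member \
    | grep -cE 'took too long|server is likely overloaded'
done
```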
There are two disk managers in the vSphere cloud provider. One of them actually checks for the "the file or folder ... already exists" error and ignores it:

https://github.com/openshift/origin/blob/release-4.2/vendor/k8s.io/kubernetes/pkg/cloudprovider/providers/vsphere/vclib/diskmanagers/vmdm.go#L136

However, the virtualDiskManager-based disk manager treats this error as a failure. Looks like an omission to me.
Upstream PR: https://github.com/kubernetes/kubernetes/pull/90638
Now vSphere storage provisioning failed on 4.6.
Verified on 4.6.0-0.nightly-2020-08-31-220837.

Launched 5 scripts at the same time, each script creating 30 PVCs, and checked that all 150 PVCs are Bound. Tested 10 times and all PVCs reached Bound, so changing the status to VERIFIED.
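For the record, a minimal sketch of that kind of parallel provisioning check (not the actual verification scripts; the namespace, claim names, and sizes here are illustrative) could look like this:

```
#!/bin/bash
# Hypothetical parallel PVC provisioning check against the default vSphere
# "thin" StorageClass: 5 batches x 30 claims, created concurrently.
NS=pvc-stress-test
oc create namespace "${NS}"

for batch in $(seq 1 5); do
  (
    for i in $(seq 1 30); do
      cat <<EOF | oc create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: stress-${batch}-${i}
  namespace: ${NS}
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: thin
  resources:
    requests:
      storage: 1Gi
EOF
    done
  ) &
done
wait

# All 150 claims should eventually report phase Bound; this prints the
# number of claims that are still not Bound.
oc get pvc -n "${NS}" --no-headers | grep -vc Bound
```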
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196