Bug 2188000
| Summary: | Missing osds and mon on provider - 0/9 nodes are available | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Filip Balák <fbalak> |
| Component: | odf-managed-service | Assignee: | Ohad <omitrani> |
| Status: | CLOSED NOTABUG | QA Contact: | Neha Berry <nberry> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.12 | CC: | ocs-bugs, odf-bz-bot |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-04-19 14:11:37 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
```
$ oc get pods -n openshift-storage
NAME                                                              READY   STATUS      RESTARTS   AGE
875729c50f952b4913290e018f605305b1f88af3654c1c625da1364b39zbxts   0/1     Completed   0          123m
managed-fusion-offering-catalog-vsxhd                             1/1     Running     0          126m
ocs-metrics-exporter-695dc5d6dc-bznk6                             1/1     Running     0          123m
ocs-operator-59cd8cd764-tfdt2                                     1/1     Running     0          123m
ocs-provider-server-7dcdbf87fc-lgrwq                              1/1     Running     0          122m
rook-ceph-crashcollector-86278db17f8b36c54319352a92416617-zf6gt   1/1     Running     0          112m
rook-ceph-crashcollector-93d940652afee5da0612a8bdb72a3bd4-qzq87   1/1     Running     0          112m
rook-ceph-mgr-a-86d6d7d46b-tskvs                                  2/2     Running     0          112m
rook-ceph-mon-a-54f64d7b95-4zmhs                                  2/2     Running     0          120m
rook-ceph-mon-b-5c9966b6dc-mpz7b                                  2/2     Running     0          120m
rook-ceph-mon-c-76f95dd57c-hs4tv                                  0/2     Pending     0          118m
rook-ceph-operator-66fd6f59f5-xjj49                               1/1     Running     0          122m
rook-ceph-osd-0-84945579cc-8vsgh                                  2/2     Running     0          112m
rook-ceph-osd-1-58bd6b9494-94nwn                                  2/2     Running     0          112m
rook-ceph-osd-prepare-default-0-data-06kthp-jvkqk                 0/1     Completed   0          112m
rook-ceph-osd-prepare-default-1-data-0clt8x-ltl4v                 0/1     Completed   0          112m
rook-ceph-osd-prepare-default-2-data-0m96dr-sgwfx                 0/1     Pending     0          112m
rook-ceph-tools-78d8f5799-l4zx6                                   1/1     Running     0          123m

$ oc get pods -n openshift-storage -o wide
NAME                                                              READY   STATUS      RESTARTS   AGE    IP             NODE                                        NOMINATED NODE   READINESS GATES
875729c50f952b4913290e018f605305b1f88af3654c1c625da1364b39zbxts   0/1     Completed   0          126m   10.129.2.93    ip-10-0-17-184.us-east-2.compute.internal   <none>           <none>
managed-fusion-offering-catalog-vsxhd                             1/1     Running     0          129m   10.128.2.64    ip-10-0-14-242.us-east-2.compute.internal   <none>           <none>
ocs-metrics-exporter-695dc5d6dc-bznk6                             1/1     Running     0          126m   10.129.2.109   ip-10-0-17-184.us-east-2.compute.internal   <none>           <none>
ocs-operator-59cd8cd764-tfdt2                                     1/1     Running     0          126m   10.129.2.107   ip-10-0-17-184.us-east-2.compute.internal   <none>           <none>
ocs-provider-server-7dcdbf87fc-lgrwq                              1/1     Running     0          125m   10.129.2.115   ip-10-0-17-184.us-east-2.compute.internal   <none>           <none>
rook-ceph-crashcollector-86278db17f8b36c54319352a92416617-zf6gt   1/1     Running     0          115m   10.0.14.242    ip-10-0-14-242.us-east-2.compute.internal   <none>           <none>
rook-ceph-crashcollector-93d940652afee5da0612a8bdb72a3bd4-qzq87   1/1     Running     0          115m   10.0.17.184    ip-10-0-17-184.us-east-2.compute.internal   <none>           <none>
rook-ceph-mgr-a-86d6d7d46b-tskvs                                  2/2     Running     0          115m   10.0.17.184    ip-10-0-17-184.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-a-54f64d7b95-4zmhs                                  2/2     Running     0          123m   10.0.17.184    ip-10-0-17-184.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-b-5c9966b6dc-mpz7b                                  2/2     Running     0          123m   10.0.14.242    ip-10-0-14-242.us-east-2.compute.internal   <none>           <none>
rook-ceph-mon-c-76f95dd57c-hs4tv                                  0/2     Pending     0          120m   <none>         <none>                                      <none>           <none>
rook-ceph-operator-66fd6f59f5-xjj49                               1/1     Running     0          125m   10.129.2.116   ip-10-0-17-184.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-0-84945579cc-8vsgh                                  2/2     Running     0          115m   10.0.14.242    ip-10-0-14-242.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-1-58bd6b9494-94nwn                                  2/2     Running     0          115m   10.0.17.184    ip-10-0-17-184.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-default-0-data-06kthp-jvkqk                 0/1     Completed   0          115m   10.0.14.242    ip-10-0-14-242.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-default-1-data-0clt8x-ltl4v                 0/1     Completed   0          115m   10.0.17.184    ip-10-0-17-184.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-prepare-default-2-data-0m96dr-sgwfx                 0/1     Pending     0          115m   <none>         <none>                                      <none>           <none>
rook-ceph-tools-78d8f5799-l4zx6                                   1/1     Running     0          125m   10.129.2.114   ip-10-0-17-184.us-east-2.compute.internal   <none>           <none>

$ oc get nodes -o wide
NAME                                        STATUS                     ROLES                  AGE    VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
ip-10-0-13-203.us-east-2.compute.internal   Ready                      control-plane,master   156m   v1.25.7+eab9cc9   10.0.13.203   <none>        Red Hat Enterprise Linux CoreOS 412.86.202303241612-0 (Ootpa)   4.18.0-372.49.1.el8_6.x86_64   cri-o://1.25.2-10.rhaos4.12.git0a083f9.el8
ip-10-0-14-22.us-east-2.compute.internal    Ready                      infra,worker           134m   v1.25.7+eab9cc9   10.0.14.22    <none>        Red Hat Enterprise Linux CoreOS 412.86.202303241612-0 (Ootpa)   4.18.0-372.49.1.el8_6.x86_64   cri-o://1.25.2-10.rhaos4.12.git0a083f9.el8
ip-10-0-14-242.us-east-2.compute.internal   Ready                      worker                 148m   v1.25.7+eab9cc9   10.0.14.242   <none>        Red Hat Enterprise Linux CoreOS 412.86.202303241612-0 (Ootpa)   4.18.0-372.49.1.el8_6.x86_64   cri-o://1.25.2-10.rhaos4.12.git0a083f9.el8
ip-10-0-16-158.us-east-2.compute.internal   Ready                      control-plane,master   156m   v1.25.7+eab9cc9   10.0.16.158   <none>        Red Hat Enterprise Linux CoreOS 412.86.202303241612-0 (Ootpa)   4.18.0-372.49.1.el8_6.x86_64   cri-o://1.25.2-10.rhaos4.12.git0a083f9.el8
ip-10-0-17-184.us-east-2.compute.internal   Ready                      worker                 145m   v1.25.7+eab9cc9   10.0.17.184   <none>        Red Hat Enterprise Linux CoreOS 412.86.202303241612-0 (Ootpa)   4.18.0-372.49.1.el8_6.x86_64   cri-o://1.25.2-10.rhaos4.12.git0a083f9.el8
ip-10-0-19-133.us-east-2.compute.internal   Ready                      infra,worker           135m   v1.25.7+eab9cc9   10.0.19.133   <none>        Red Hat Enterprise Linux CoreOS 412.86.202303241612-0 (Ootpa)   4.18.0-372.49.1.el8_6.x86_64   cri-o://1.25.2-10.rhaos4.12.git0a083f9.el8
ip-10-0-23-214.us-east-2.compute.internal   Ready                      infra,worker           135m   v1.25.7+eab9cc9   10.0.23.214   <none>        Red Hat Enterprise Linux CoreOS 412.86.202303241612-0 (Ootpa)   4.18.0-372.49.1.el8_6.x86_64   cri-o://1.25.2-10.rhaos4.12.git0a083f9.el8
ip-10-0-23-239.us-east-2.compute.internal   Ready,SchedulingDisabled   worker                 148m   v1.25.7+eab9cc9   10.0.23.239   <none>        Red Hat Enterprise Linux CoreOS 412.86.202303241612-0 (Ootpa)   4.18.0-372.49.1.el8_6.x86_64   cri-o://1.25.2-10.rhaos4.12.git0a083f9.el8
ip-10-0-23-91.us-east-2.compute.internal    Ready                      control-plane,master   156m   v1.25.7+eab9cc9   10.0.23.91    <none>        Red Hat Enterprise Linux CoreOS 412.86.202303241612-0 (Ootpa)   4.18.0-372.49.1.el8_6.x86_64   cri-o://1.25.2-10.rhaos4.12.git0a083f9.el8
```

This looks like an infrastructure problem where one of the nodes is degraded. --> CLOSED. It will be reopened if reproduced again.

```
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-abbd35f0db6c3dd53c35d134083b08a2   True      False      False      3              3                   3                     0                      159m
worker   rendered-worker-bd7605d18e1361d6e608cdd48564dac1   False     True       True       6              4                   4                     1                      159m
```
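Since the worker MachineConfigPool reports one degraded machine, a quick way to tie that to a specific node is to look for cordoned (`SchedulingDisabled`) entries in `oc get nodes`. A minimal sketch, run here against an abbreviated copy of the node listing above (on a live cluster you would pipe `oc get nodes` into the same `awk` filter instead of the here-doc):

```shell
# List nodes whose STATUS column includes SchedulingDisabled (i.e. cordoned).
# The here-doc is an abbreviated copy of the `oc get nodes` output above;
# on a live cluster pipe the real command output through the same awk filter.
cordoned=$(awk 'NR > 1 && $2 ~ /SchedulingDisabled/ {print $1}' <<'EOF'
NAME                                        STATUS                     ROLES                  AGE    VERSION
ip-10-0-13-203.us-east-2.compute.internal   Ready                      control-plane,master   156m   v1.25.7+eab9cc9
ip-10-0-14-22.us-east-2.compute.internal    Ready                      infra,worker           134m   v1.25.7+eab9cc9
ip-10-0-14-242.us-east-2.compute.internal   Ready                      worker                 148m   v1.25.7+eab9cc9
ip-10-0-23-239.us-east-2.compute.internal   Ready,SchedulingDisabled   worker                 148m   v1.25.7+eab9cc9
EOF
)
echo "$cordoned"   # prints ip-10-0-23-239.us-east-2.compute.internal
```

This matches the node the scheduler counted as "1 node(s) were unschedulable" in the FailedScheduling events below.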
Description of problem:
Installation according to the Fusion aaS guide [1] fails; some pods, including one of the monitors, are not deployed:

```
$ oc describe pod rook-ceph-mon-c-76f95dd57c-hs4tv -n openshift-storage
Name:                 rook-ceph-mon-c-76f95dd57c-hs4tv
Namespace:            openshift-storage
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 <none>
Labels:               app=rook-ceph-mon
                      app.kubernetes.io/component=cephclusters.ceph.rook.io
                      app.kubernetes.io/created-by=rook-ceph-operator
                      app.kubernetes.io/instance=c
                      app.kubernetes.io/managed-by=rook-ceph-operator
                      app.kubernetes.io/name=ceph-mon
                      app.kubernetes.io/part-of=ocs-storagecluster-cephcluster
                      ceph_daemon_id=c
                      ceph_daemon_type=mon
                      mon=c
                      mon_cluster=openshift-storage
                      pod-template-hash=76f95dd57c
                      pvc_name=rook-ceph-mon-c
                      pvc_size=50Gi
                      rook.io/operator-namespace=openshift-storage
                      rook_cluster=openshift-storage
Annotations:          openshift.io/scc: rook-ceph
Status:               Pending
(...)
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  51m                default-scheduler  0/9 nodes are available: 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/9 nodes are available: 9 Preemption is not helpful for scheduling.
  Warning  FailedScheduling  31m (x3 over 46m)  default-scheduler  0/9 nodes are available: 1 node(s) were unschedulable, 2 node(s) didn't match Pod's node affinity/selector, 3 node(s) had untolerated taint {node-role.kubernetes.io/infra: }, 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }. preemption: 0/9 nodes are available: 9 Preemption is not helpful for scheduling.
```

The nine rejections account for every node in the cluster: 3 masters and 3 infra nodes carry role taints the mon pod does not tolerate, 1 worker is cordoned, and the remaining 2 workers already host mon-a and mon-b (so they presumably fail the mon anti-affinity/selector check), leaving no node for mon-c.

Version-Release number of selected component (if applicable):
ROSA 4.12.12
quay.io/resoni/managed-fusion-agent-index:4.13.0-164

How reproducible:
1/1

Steps to Reproduce:
1. Deploy ODF on Fusion according to the guide [1]
2. Check storagecluster and cluster resources

Actual results:
Deployment of all ODF resources is blocked by pods that cannot be scheduled.

Expected results:
All ODF resources are deployed successfully.

Additional info:
[1] https://docs.google.com/document/d/1Jdx8czlMjbumvilw8nZ6LtvWOMAx3H4TfwoVwiBs0nE/edit#
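For context on the taint-related rejections in the FailedScheduling events: six of the nine nodes were excluded because the mon pod carries no toleration for their role taints. A pod-spec fragment that would admit a pod onto the infra-tainted nodes looks roughly like the sketch below. The taint key is taken from the scheduler message; the `NoSchedule` effect is an assumption (the event does not show it), and adding such a toleration is not the fix for this bug, whose cause was the cordoned/degraded worker.

```yaml
# Hypothetical pod-spec fragment: tolerate the infra role taint named in the
# FailedScheduling event. The NoSchedule effect is assumed, not confirmed by
# the event text. Shown only to explain the rejection count, not as a fix.
tolerations:
  - key: node-role.kubernetes.io/infra
    operator: Exists
    effect: NoSchedule
```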