Bug 2156988
| Summary: | Managed Service cluster with size 20 can not be installed | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Jilju Joy <jijoy> |
| Component: | odf-managed-service | Assignee: | Leela Venkaiah Gangavarapu <lgangava> |
| Status: | CLOSED DUPLICATE | QA Contact: | Neha Berry <nberry> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.10 | CC: | aeyal, lgangava, ocs-bugs, odf-bz-bot |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2023-01-02 11:59:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
*** This bug has been marked as a duplicate of bug 2131237 ***
Description of problem:

A Managed Services provider cluster with the QE addon (v2.0.11), which contains the topology changes related to ODFMS-55, cannot finish ODF addon installation and is stuck in the Installing state. The size parameter was set to 20.

```
$ rosa list addons -c jijoy-size20-pr | grep ocs-provider-qe
ocs-provider-qe    Red Hat OpenShift Data Foundation Managed Service Provider (QE)    installing
```

Some pods are in Pending state because resources are unavailable:

```
$ oc get pods | egrep -v '(Running|Completed)'
NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-managed-ocs-alertmanager-0                           0/2     Pending   0          110m
ocs-metrics-exporter-5dd96c885b-lf46k                             0/1     Pending   0          110m
rook-ceph-crashcollector-ip-10-0-142-35.ec2.internal-7d947nsp5c   0/1     Pending   0          110m
rook-ceph-crashcollector-ip-10-0-154-93.ec2.internal-56688gknf4   0/1     Pending   0          111m
rook-ceph-crashcollector-ip-10-0-160-72.ec2.internal-7f454ffxjm   0/1     Pending   0          109m
rook-ceph-osd-14-66c75f68dc-sxs7z                                 0/2     Pending   0          111m
rook-ceph-osd-6-55d85cccf8-kddcx                                  0/2     Pending   0          111m
rook-ceph-osd-7-74d599f4b6-pcx7j                                  0/2     Pending   0          111m
rook-ceph-osd-8-78bd587c97-mgz54                                  0/2     Pending   0          111m
rook-ceph-osd-9-5b68fc68b4-2p99f                                  0/2     Pending   0          111m
```

One node is in the "SchedulingDisabled" state.
```
$ oc get nodes
NAME                           STATUS                     ROLES          AGE    VERSION
ip-10-0-128-44.ec2.internal    Ready                      worker         120m   v1.23.12+8a6bfe4
ip-10-0-132-177.ec2.internal   Ready                      worker         120m   v1.23.12+8a6bfe4
ip-10-0-133-84.ec2.internal    Ready                      infra,worker   121m   v1.23.12+8a6bfe4
ip-10-0-136-188.ec2.internal   Ready                      master         140m   v1.23.12+8a6bfe4
ip-10-0-142-35.ec2.internal    Ready                      worker         132m   v1.23.12+8a6bfe4
ip-10-0-143-114.ec2.internal   Ready                      worker         120m   v1.23.12+8a6bfe4
ip-10-0-147-121.ec2.internal   Ready                      worker         120m   v1.23.12+8a6bfe4
ip-10-0-151-231.ec2.internal   Ready                      master         140m   v1.23.12+8a6bfe4
ip-10-0-153-87.ec2.internal    Ready                      worker         120m   v1.23.12+8a6bfe4
ip-10-0-154-93.ec2.internal    Ready                      worker         131m   v1.23.12+8a6bfe4
ip-10-0-155-208.ec2.internal   Ready                      worker         120m   v1.23.12+8a6bfe4
ip-10-0-157-56.ec2.internal    Ready                      infra,worker   121m   v1.23.12+8a6bfe4
ip-10-0-160-174.ec2.internal   Ready                      infra,worker   121m   v1.23.12+8a6bfe4
ip-10-0-160-72.ec2.internal    Ready,SchedulingDisabled   worker         120m   v1.23.12+8a6bfe4
ip-10-0-161-135.ec2.internal   Ready                      worker         134m   v1.23.12+8a6bfe4
ip-10-0-162-68.ec2.internal    Ready                      master         140m   v1.23.12+8a6bfe4
ip-10-0-164-82.ec2.internal    Ready                      worker         120m   v1.23.12+8a6bfe4
ip-10-0-168-82.ec2.internal    Ready                      worker         120m   v1.23.12+8a6bfe4
```

Events from one of the pending pods (rook-ceph-osd-14-66c75f68dc-sxs7z):

```
Warning  FailedScheduling  25m (x112 over 114m)  default-scheduler  0/18 nodes are available: 1 Insufficient memory, 1 node(s) were unschedulable, 2 Insufficient cpu, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 9 node(s) didn't match Pod's node affinity/selector.
```
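As a triage aid, the symptoms above (pods stuck in Pending, a cordoned node) can be pulled out of the listings with a couple of small shell helpers. This is only a sketch: the helper names are hypothetical, and the commented `oc` commands assume a session already logged in to the provider cluster.

```shell
#!/usr/bin/env bash
# Hypothetical triage helpers (illustrative names, not part of the addon).

# Count pods whose STATUS column is Pending in `oc get pods` output.
count_pending() {
  grep -c -E '[[:space:]]Pending[[:space:]]'
}

# Print the names of nodes whose STATUS contains SchedulingDisabled
# in `oc get nodes` output (column 1 = NAME, column 2 = STATUS).
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ {print $1}'
}

# Typical usage against a live cluster (assumes `oc` is logged in):
#   oc get pods -n openshift-storage | count_pending
#   oc get nodes | cordoned_nodes
#   oc describe node <node-name> | grep -A8 'Allocatable'   # free CPU/memory
```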
```
$ oc get csv
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.10.9                      NooBaa Operator               4.10.9            mcg-operator.v4.10.8                      Succeeded
observability-operator.v0.0.17            Observability Operator        0.0.17            observability-operator.v0.0.17-rc         Succeeded
ocs-operator.v4.10.9                      OpenShift Container Storage   4.10.9            ocs-operator.v4.10.8                      Installing
ocs-osd-deployer.v2.0.11                  OCS OSD Deployer              2.0.11            ocs-osd-deployer.v2.0.10                  Installing
odf-csi-addons-operator.v4.10.9           CSI Addons                    4.10.9            odf-csi-addons-operator.v4.10.8           Succeeded
odf-operator.v4.10.9                      OpenShift Data Foundation     4.10.9            odf-operator.v4.10.8                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.451-3df1ed1   Route Monitor Operator        0.1.451-3df1ed1   route-monitor-operator.v0.1.450-6e98c37   Succeeded
```

[Note: ocs-operator.v4.10.9 also showed the Failed state at times, and ocs-osd-deployer.v2.0.11 also showed Pending.]

managedocs status:

```
status:
  components:
    alertmanager:
      state: Pending
    prometheus:
      state: Ready
    storageCluster:
      state: Ready
```

OCS and OCP must-gather logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-size20-pr/jijoy-size20-pr_20221229T154719/logs/failed_testcase_ocs_logs_1672329486/deployment_ocs_logs/

===========================================================================
Version-Release number of selected component (if applicable):
ocs-osd-deployer.v2.0.11
odf-operator.v4.10.9
===========================================================================
How reproducible:
2/2
===========================================================================
Steps to Reproduce:
1. Deploy MS provider cluster with QE addon (v2.0.11)

Actual results:
Installation does not complete. Some pods remain in Pending state due to unavailable resources, and one node is in the "SchedulingDisabled" state.

Expected results:
Installation should complete successfully.

Additional info:
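The stuck CSVs above can be isolated the same way. The sketch below filters `oc get csv` output for anything not in the Succeeded phase; the helper name is hypothetical, and the commented commands assume a logged-in `oc` session on the provider cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustrative name): list CSVs from `oc get csv` output
# whose PHASE column (the last field) is not Succeeded, skipping the header.
stuck_csvs() {
  awk 'NR > 1 && $NF != "Succeeded" {print $1, $NF}'
}

# Against a live cluster one could also query the phases directly, e.g.:
#   oc get csv -n openshift-storage \
#     -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}'
#   oc get managedocs managedocs -o yaml   # the component states quoted above
```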