Description of problem:
Running https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-24078 on a bare metal deployment with an IPv6 control plane. After creating toleration.yaml, many pods are left in a non-Running state:

[kni@provisionhost-0 ~]$ oc get pods -A | grep -v Running | grep -v Complete
NAMESPACE                                               NAME                                                              READY   STATUS              RESTARTS   AGE
openshift-authentication-operator                       authentication-operator-b4d85bfd8-ncsfw                           0/1     ContainerCreating   0          71m
openshift-authentication                                oauth-openshift-bcb7f86db-tx9j4                                   0/1     ContainerCreating   0          71m
openshift-authentication                                oauth-openshift-bcb7f86db-x9s77                                   0/1     ContainerCreating   0          71m
openshift-cluster-node-tuning-operator                  cluster-node-tuning-operator-5d48686554-z26mp                     0/1     ContainerCreating   0          71m
openshift-cluster-samples-operator                      cluster-samples-operator-55b6755466-q6xrh                         0/2     ContainerCreating   0          71m
openshift-console                                       console-6d7c9f64f6-rhmdw                                          0/1     ContainerCreating   0          71m
openshift-console                                       downloads-664fc66646-cft8c                                        0/1     ContainerCreating   0          71m
openshift-image-registry                                cluster-image-registry-operator-6845546d69-vw97g                  0/2     ContainerCreating   0          71m
openshift-ingress-operator                              ingress-operator-d7fbcfd57-qk56n                                  0/2     ContainerCreating   0          71m
openshift-ingress                                       router-default-6c95df6b4d-b5gvp                                   0/1     Pending             0          71m
openshift-ingress                                       router-default-6c95df6b4d-j5755                                   0/1     Pending             0          71m
openshift-machine-api                                   machine-api-operator-7f8cf8f4cb-jph46                             0/2     ContainerCreating   0          71m
openshift-machine-config-operator                       etcd-quorum-guard-f66bdbcf5-gx9p4                                 0/1     Pending             0          71m
openshift-machine-config-operator                       etcd-quorum-guard-f66bdbcf5-rr84j                                 0/1     Pending             0          71m
openshift-machine-config-operator                       machine-config-controller-77d75cd78f-zj948                        0/1     ContainerCreating   0          71m
openshift-marketplace                                   marketplace-operator-8656745c5b-xqj74                             0/1     ContainerCreating   0          71m
openshift-monitoring                                    alertmanager-main-1                                               0/3     ContainerCreating   0          70m
openshift-monitoring                                    alertmanager-main-2                                               0/3     ContainerCreating   0          70m
openshift-monitoring                                    grafana-654988bdbb-7ql4t                                          0/2     ContainerCreating   0          65m
openshift-monitoring                                    grafana-795c64fd8d-jljbz                                          0/2     ContainerCreating   0          71m
openshift-monitoring                                    kube-state-metrics-64b5c49b85-c6t95                               0/3     ContainerCreating   0          65m
openshift-monitoring                                    openshift-state-metrics-5b45b55d4f-xl9df                          0/3     ContainerCreating   0          71m
openshift-monitoring                                    openshift-state-metrics-6d987dbcf7-jhm6r                          0/3     ContainerCreating   0          65m
openshift-monitoring                                    prometheus-adapter-74854f85d4-pv2bh                               0/1     ContainerCreating   0          65m
openshift-monitoring                                    prometheus-adapter-77bdd66c6b-r5xqd                               0/1     ContainerCreating   0          71m
openshift-monitoring                                    prometheus-adapter-77bdd66c6b-wxnkf                               0/1     ContainerCreating   0          71m
openshift-monitoring                                    prometheus-k8s-0                                                  0/7     ContainerCreating   0          70m
openshift-monitoring                                    prometheus-operator-5859b9c4cf-w5q58                              0/1     ContainerCreating   0          71m
openshift-monitoring                                    prometheus-operator-64fb7c8f9b-2qk8v                              0/1     ContainerCreating   0          65m
openshift-monitoring                                    thanos-querier-5fd69767b5-dwpdt                                   0/4     ContainerCreating   0          71m
openshift-must-gather-8hq7j                             must-gather-k626f                                                 0/1     Init:0/1            0          15m
openshift-operator-lifecycle-manager                    packageserver-66f75d869f-qdddm                                    0/1     ContainerCreating   0          5m52s
openshift-operator-lifecycle-manager                    packageserver-98859c6bc-nznfr                                     0/1     ContainerCreating   0          51s
openshift-service-ca-operator                           service-ca-operator-84c855cddd-v594p                              0/1     ContainerCreating   0          71m
openshift-service-ca                                    apiservice-cabundle-injector-578dc5b9fd-w9rkc                     0/1     ContainerCreating   0          71m
openshift-service-ca                                    configmap-cabundle-injector-846d56484b-2jqvp                      0/1     ContainerCreating   0          71m
openshift-service-catalog-apiserver-operator            openshift-service-catalog-apiserver-operator-d5c487cc-pbz8j       0/1     ContainerCreating   0          71m
openshift-service-catalog-controller-manager-operator   openshift-service-catalog-controller-manager-operator-785f6bbn6   0/1     ContainerCreating   0          71m

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2020-03-09-172027

How reproducible:
100%

Steps to Reproduce:
1. Deploy bare metal IPI with an IPv6 control plane (3 masters + 2 workers).
2. Run the steps described in https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-24078

Actual results:
Pods stuck in ContainerCreating or Pending state.

Expected results:
Pods are in Running state.

Additional info:
At this point must-gather gets stuck as described in BZ#1809614. Please let me know if there's any info I can pull manually from the cluster nodes.
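For reference, the filter used to produce the listing above can be exercised locally against canned `oc get pods -A` rows (the sample rows here are illustrative, taken from the report):

```shell
# `grep -v Running | grep -v Complete` keeps only pods that are neither
# Running nor Completed, which is how the stuck-pod list was produced.
printf '%s\n' \
  "openshift-dns dns-default-abcde 2/2 Running 0 71m" \
  "openshift-ingress router-default-6c95df6b4d-b5gvp 0/1 Pending 0 71m" \
  "openshift-authentication oauth-openshift-bcb7f86db-tx9j4 0/1 ContainerCreating 0 71m" \
  | grep -v Running | grep -v Complete
```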
Note that the pods actually get into ContainerCreating state after running:

for i in $(kubectl get node --no-headers | grep -v master-0.ocp-edge-cluster.qe.lab.redhat.com | awk '{print $1}'); do
  echo $i
  kubectl taint nodes $i monitoring=true:NoExecute
done
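The node selection in that loop can be sanity-checked without a cluster by substituting a canned node list for the `kubectl get node` output (only master-0 appears in the report; the other node names below are assumed from the 3-master/2-worker topology):

```shell
# Simulated `kubectl get node --no-headers | awk '{print $1}'` output.
# The grep excludes master-0, so every other node would receive the
# monitoring=true:NoExecute taint, evicting pods without a matching toleration.
nodes="master-0.ocp-edge-cluster.qe.lab.redhat.com
master-1.ocp-edge-cluster.qe.lab.redhat.com
master-2.ocp-edge-cluster.qe.lab.redhat.com
worker-0.ocp-edge-cluster.qe.lab.redhat.com
worker-1.ocp-edge-cluster.qe.lab.redhat.com"
for i in $(echo "$nodes" | grep -v master-0.ocp-edge-cluster.qe.lab.redhat.com); do
  echo "would taint: $i"
done
```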
(In reply to Marius Cornea from comment #1)
> Note that pods actually get into ContainerCreating state after:
>
> for i in $(kubectl get node --no-headers | grep -v
> master-0.ocp-edge-cluster.qe.lab.redhat.com | awk '{print $1}'); do echo $i;
> kubectl taint nodes $i monitoring=true:NoExecute; done

This is expected, since you did not add tolerations to the monitoring pods; see step 4 of the test case.
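For completeness, a toleration matching the monitoring=true:NoExecute taint would have this shape in the pod spec (this stanza is a sketch of what step 4 presumably adds, not the exact contents of toleration.yaml):

```yaml
# Pod spec fragment: tolerate the monitoring=true:NoExecute taint so the
# pod is neither evicted from nor blocked on the tainted nodes.
tolerations:
- key: "monitoring"
  operator: "Equal"
  value: "true"
  effect: "NoExecute"
```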
It seems to me the issue is what Junqi described in https://bugzilla.redhat.com/show_bug.cgi?id=1812219#c2

@Marius, please confirm that the tolerations were added properly.
(In reply to Pawel Krupa from comment #6)
> Seems to me like the issue is with what Junqi said in
> https://bugzilla.redhat.com/show_bug.cgi?id=1812219#c2
>
> @Marius please confirm that tolerations were added properly

Closing this as not a bug, since it is no longer relevant.