Bug 1812219

Summary: In an IPv6 bare metal deployment, making tolerations configurable for monitoring components with NoExecute leaves pods in ContainerCreating and Pending states
Product: OpenShift Container Platform
Reporter: Marius Cornea <mcornea>
Component: Monitoring
Assignee: Pawel Krupa <pkrupa>
Status: CLOSED NOTABUG
QA Contact: Junqi Zhao <juzhao>
Severity: low
Docs Contact:
Priority: unspecified
Version: 4.3.z
CC: alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania
Target Milestone: ---   
Target Release: 4.5.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-04-07 13:41:06 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Marius Cornea 2020-03-10 19:10:33 UTC
Description of problem:

Ran https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-24078 on a bare metal deployment with an IPv6 control plane. After creating toleration.yaml, many pods are left in a non-Running state:

[kni@provisionhost-0 ~]$ oc get pods -A | grep -v Running | grep -v Complete
NAMESPACE                                               NAME                                                                   READY   STATUS              RESTARTS   AGE
openshift-authentication-operator                       authentication-operator-b4d85bfd8-ncsfw                                0/1     ContainerCreating   0          71m
openshift-authentication                                oauth-openshift-bcb7f86db-tx9j4                                        0/1     ContainerCreating   0          71m
openshift-authentication                                oauth-openshift-bcb7f86db-x9s77                                        0/1     ContainerCreating   0          71m
openshift-cluster-node-tuning-operator                  cluster-node-tuning-operator-5d48686554-z26mp                          0/1     ContainerCreating   0          71m
openshift-cluster-samples-operator                      cluster-samples-operator-55b6755466-q6xrh                              0/2     ContainerCreating   0          71m
openshift-console                                       console-6d7c9f64f6-rhmdw                                               0/1     ContainerCreating   0          71m
openshift-console                                       downloads-664fc66646-cft8c                                             0/1     ContainerCreating   0          71m
openshift-image-registry                                cluster-image-registry-operator-6845546d69-vw97g                       0/2     ContainerCreating   0          71m
openshift-ingress-operator                              ingress-operator-d7fbcfd57-qk56n                                       0/2     ContainerCreating   0          71m
openshift-ingress                                       router-default-6c95df6b4d-b5gvp                                        0/1     Pending             0          71m
openshift-ingress                                       router-default-6c95df6b4d-j5755                                        0/1     Pending             0          71m
openshift-machine-api                                   machine-api-operator-7f8cf8f4cb-jph46                                  0/2     ContainerCreating   0          71m
openshift-machine-config-operator                       etcd-quorum-guard-f66bdbcf5-gx9p4                                      0/1     Pending             0          71m
openshift-machine-config-operator                       etcd-quorum-guard-f66bdbcf5-rr84j                                      0/1     Pending             0          71m
openshift-machine-config-operator                       machine-config-controller-77d75cd78f-zj948                             0/1     ContainerCreating   0          71m
openshift-marketplace                                   marketplace-operator-8656745c5b-xqj74                                  0/1     ContainerCreating   0          71m
openshift-monitoring                                    alertmanager-main-1                                                    0/3     ContainerCreating   0          70m
openshift-monitoring                                    alertmanager-main-2                                                    0/3     ContainerCreating   0          70m
openshift-monitoring                                    grafana-654988bdbb-7ql4t                                               0/2     ContainerCreating   0          65m
openshift-monitoring                                    grafana-795c64fd8d-jljbz                                               0/2     ContainerCreating   0          71m
openshift-monitoring                                    kube-state-metrics-64b5c49b85-c6t95                                    0/3     ContainerCreating   0          65m
openshift-monitoring                                    openshift-state-metrics-5b45b55d4f-xl9df                               0/3     ContainerCreating   0          71m
openshift-monitoring                                    openshift-state-metrics-6d987dbcf7-jhm6r                               0/3     ContainerCreating   0          65m
openshift-monitoring                                    prometheus-adapter-74854f85d4-pv2bh                                    0/1     ContainerCreating   0          65m
openshift-monitoring                                    prometheus-adapter-77bdd66c6b-r5xqd                                    0/1     ContainerCreating   0          71m
openshift-monitoring                                    prometheus-adapter-77bdd66c6b-wxnkf                                    0/1     ContainerCreating   0          71m
openshift-monitoring                                    prometheus-k8s-0                                                       0/7     ContainerCreating   0          70m
openshift-monitoring                                    prometheus-operator-5859b9c4cf-w5q58                                   0/1     ContainerCreating   0          71m
openshift-monitoring                                    prometheus-operator-64fb7c8f9b-2qk8v                                   0/1     ContainerCreating   0          65m
openshift-monitoring                                    thanos-querier-5fd69767b5-dwpdt                                        0/4     ContainerCreating   0          71m
openshift-must-gather-8hq7j                             must-gather-k626f                                                      0/1     Init:0/1            0          15m
openshift-operator-lifecycle-manager                    packageserver-66f75d869f-qdddm                                         0/1     ContainerCreating   0          5m52s
openshift-operator-lifecycle-manager                    packageserver-98859c6bc-nznfr                                          0/1     ContainerCreating   0          51s
openshift-service-ca-operator                           service-ca-operator-84c855cddd-v594p                                   0/1     ContainerCreating   0          71m
openshift-service-ca                                    apiservice-cabundle-injector-578dc5b9fd-w9rkc                          0/1     ContainerCreating   0          71m
openshift-service-ca                                    configmap-cabundle-injector-846d56484b-2jqvp                           0/1     ContainerCreating   0          71m
openshift-service-catalog-apiserver-operator            openshift-service-catalog-apiserver-operator-d5c487cc-pbz8j            0/1     ContainerCreating   0          71m
openshift-service-catalog-controller-manager-operator   openshift-service-catalog-controller-manager-operator-785f6bbn6        0/1     ContainerCreating   0          71m


Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2020-03-09-172027

How reproducible:
100%

Steps to Reproduce:
1. Deploy bare metal IPI with an IPv6 control plane (3 masters + 2 workers)
2. Run the steps described in https://polarion.engineering.redhat.com/polarion/#/project/OSE/workitem?id=OCP-24078

Actual results:
Pods are stuck in ContainerCreating or Pending state

Expected results:
Pods are in Running state

Additional info: 
At this point must-gather gets stuck as described in BZ#1809614

Please let me know if there's any info I can pull manually from the cluster nodes.

Comment 1 Marius Cornea 2020-03-10 22:33:00 UTC
Note that pods actually get into ContainerCreating state after:

for i in $(kubectl get node --no-headers | grep -v  master-0.ocp-edge-cluster.qe.lab.redhat.com | awk '{print $1}'); do echo $i;  kubectl taint nodes $i monitoring=true:NoExecute; done
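
For anyone who needs to back this out, a minimal sketch of undoing the taint could look like the following. It only prints the commands instead of running them. The function name print_untaint_commands is hypothetical and not part of the test case; the trailing "-" after the effect is the kubectl taint syntax for removing a taint.

```shell
#!/usr/bin/env bash
# Sketch only: prints the commands rather than running them.
# print_untaint_commands is a hypothetical helper, not part of the test case.

# Read node names on stdin and print one "remove taint" command per node.
# The trailing "-" after the effect tells kubectl to remove the taint.
print_untaint_commands() {
  local key="$1" effect="$2" node
  while read -r node; do
    # Skip blank lines so no malformed command is printed.
    [ -n "$node" ] && printf 'kubectl taint nodes %s %s:%s-\n' "$node" "$key" "$effect"
  done
  return 0
}
```

It can be fed from the same node listing as the loop above, for example: kubectl get node --no-headers | awk '{print $1}' | print_untaint_commands monitoring NoExecute. Nodes that do not carry the taint would make the printed command fail when run, so the output is worth reviewing before piping it to a shell.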

Comment 2 Junqi Zhao 2020-03-11 03:30:26 UTC
(In reply to Marius Cornea from comment #1)
> Note that pods actually get into ContainerCreating state after:
> 
> for i in $(kubectl get node --no-headers | grep -v 
> master-0.ocp-edge-cluster.qe.lab.redhat.com | awk '{print $1}'); do echo $i;
> kubectl taint nodes $i monitoring=true:NoExecute; done

This is expected, since you did not add tolerations to the monitoring pods; see step 4 of the test case.
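
For reference, a minimal sketch of what adding matching tolerations might look like. The toleration.yaml from the test case is not included in this report, so this assumes it is a cluster-monitoring-config ConfigMap in the openshift-monitoring namespace with a tolerations entry per component. The helper name render_monitoring_tolerations and the components passed to it are illustrative; the key, value and effect are taken from the taint in comment 1.

```shell
#!/usr/bin/env bash
# Sketch only: assumes tolerations are set through the
# cluster-monitoring-config ConfigMap in openshift-monitoring.
# render_monitoring_tolerations is a hypothetical helper name.

# Print a ConfigMap that adds one toleration to each listed component.
# Usage: render_monitoring_tolerations KEY VALUE EFFECT COMPONENT...
render_monitoring_tolerations() {
  local key="$1" value="$2" effect="$3" component
  shift 3
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
EOF
  # One block per component; the value is quoted so that "true" stays a string.
  for component in "$@"; do
    cat <<EOF
    ${component}:
      tolerations:
      - key: "${key}"
        operator: "Equal"
        value: "${value}"
        effect: "${effect}"
EOF
  done
}
```

The output can be reviewed and then piped to oc apply -f -, for example: render_monitoring_tolerations monitoring true NoExecute prometheusK8s alertmanagerMain grafana. Applying it would replace any existing config.yaml in that ConfigMap, so merge by hand if the cluster already has one.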

Comment 6 Pawel Krupa 2020-03-31 13:19:50 UTC
Seems to me like the issue is with what Junqi said in https://bugzilla.redhat.com/show_bug.cgi?id=1812219#c2

@Marius please confirm that tolerations were added properly

Comment 7 Marius Cornea 2020-04-07 13:41:06 UTC
(In reply to Pawel Krupa from comment #6)
> Seems to me like the issue is with what Junqi said in
> https://bugzilla.redhat.com/show_bug.cgi?id=1812219#c2
> 
> @Marius please confirm that tolerations were added properly

Closing this as NOTABUG since it is no longer relevant.