Bug 1995575 - clusteroperator/dns condition/Degraded status/True reason/DNS default is degraded
Keywords:
Status: CLOSED DUPLICATE of bug 1939723
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: aos-network-edge-staff
QA Contact: Hongan Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-19 12:42 UTC by Jan Chaloupka
Modified: 2022-08-04 22:39 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
job=periodic-ci-openshift-release-master-ci-4.9-e2e-aws-serial=all
Last Closed: 2021-08-19 16:21:01 UTC
Target Upstream Version:
Embargoed:



Description Jan Chaloupka 2021-08-19 12:42:02 UTC
From https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-serial/1427548375770730496:
```
1 unexpected clusteroperator state transitions during e2e test run

Aug 17 09:42:56.401 - 20s   E clusteroperator/dns condition/Degraded status/True reason/DNS default is degraded
```

From https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-serial/1427548375770730496/artifacts/e2e-aws-serial/gather-extra/artifacts/pods/openshift-dns-operator_dns-operator-b8d54b65-xfrz2_dns-operator.log:
```
time="2021-08-17T09:42:52Z" level=info msg="updated DNS default status:
old:
  v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764787149, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"False\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790164, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"All DNS and node-resolver pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786855, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786834, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}},
new:
  v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764787149, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790172, loc:(*time.Location)(0x243cac0)}}, Reason:\"Reconciling\", Message:\"Have 6 available DNS pods, want 7.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786855, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786834, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}"

time="2021-08-17T09:42:56Z" level=info msg="updated DNS default status:
old:
  v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764787149, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790172, loc:(*time.Location)(0x243cac0)}}, Reason:\"Reconciling\", Message:\"Have 6 available DNS pods, want 7.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786855, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786834, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}},

new:
  v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790176, loc:(*time.Location)(0x243cac0)}}, Reason:\"MaxUnavailableDNSPodsExceeded\", Message:\"Too many DNS pods are unavailable (2 > 1 max unavailable).\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790176, loc:(*time.Location)(0x243cac0)}}, Reason:\"Reconciling\", Message:\"Have 6 available DNS pods, want 8.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786855, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786834, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}"

time="2021-08-17T09:43:16Z" level=info msg="updated DNS default status:
old:
  v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790176, loc:(*time.Location)(0x243cac0)}}, Reason:\"MaxUnavailableDNSPodsExceeded\", Message:\"Too many DNS pods are unavailable (2 > 1 max unavailable).\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790176, loc:(*time.Location)(0x243cac0)}}, Reason:\"Reconciling\", Message:\"Have 6 available DNS pods, want 8.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786855, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786834, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}},
new:
  v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790196, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790196, loc:(*time.Location)(0x243cac0)}}, Reason:\"Reconciling\", Message:\"Have 7 available DNS pods, want 8.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786855, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786834, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}"

time="2021-08-17T09:43:30Z" level=info msg="updated DNS default status:
old:
  v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790196, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790196, loc:(*time.Location)(0x243cac0)}}, Reason:\"Reconciling\", Message:\"Have 7 available DNS pods, want 8.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786855, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786834, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}},
new:
  v1.DNSStatus{ClusterIP:\"172.30.0.10\", ClusterDomain:\"cluster.local\", Conditions:[]v1.OperatorCondition{v1.OperatorCondition{Type:\"Degraded\", Status:\"False\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790196, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"Enough DNS pods are available, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Progressing\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764790210, loc:(*time.Location)(0x243cac0)}}, Reason:\"Reconciling\", Message:\"Have 6 available DNS pods, want 7.\\nHave 7 available node-resolver pods, want 8.\"}, v1.OperatorCondition{Type:\"Available\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786855, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"The DNS daemonset has available pods, and the DNS service has a cluster IP address.\"}, v1.OperatorCondition{Type:\"Upgradeable\", Status:\"True\", LastTransitionTime:v1.Time{Time:time.Time{wall:0x0, ext:63764786834, loc:(*time.Location)(0x243cac0)}}, Reason:\"AsExpected\", Message:\"DNS Operator can be upgraded\"}}}"
```

- "MaxUnavailableDNSPodsExceeded: Too many DNS pods are unavailable (2 > 1 max unavailable)"
- "Reconciling: Have 6 available DNS pods, want 8"

Checking kubelet logs for all dns-default-XXXXX pods:
openshift-dns/dns-default-t5vwq
openshift-dns/dns-default-4bbbm
openshift-dns/dns-default-ghvj7
openshift-dns/dns-default-4m9m9
openshift-dns/dns-default-ps4jg
openshift-dns/dns-default-4b2jc
openshift-dns/dns-default-nlsr9
openshift-dns/dns-default-zk2jc (kubelet logs for this pod not available)
```
ip-10-0-171-241.us-east-2.compute.internal:7591:Aug 17 08:47:38.765247 ip-10-0-171-241 hyperkube[1360]: I0817 08:47:38.765223    1360 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-dns/dns-default-t5vwq"

ip-10-0-153-173.us-east-2.compute.internal:4568:Aug 17 09:43:16.820839 ip-10-0-153-173 hyperkube[1355]: I0817 09:43:16.820814    1355 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-dns/dns-default-4bbbm"
ip-10-0-153-173.us-east-2.compute.internal:6842:Aug 17 09:54:26.697898 ip-10-0-153-173 hyperkube[1355]: I0817 09:54:26.697863    1355 kubelet.go:2081] "SyncLoop DELETE" source="api" pods=[openshift-dns/dns-default-4bbbm]
ip-10-0-153-173.us-east-2.compute.internal:6870:Aug 17 09:54:26.890846 ip-10-0-153-173 crio[1316]: time="2021-08-17 09:54:26.890818400Z" level=info msg="Stopped container 6f11ca6abc7aa78abb094584a8f32bae41190a55806d6b9439e5a9413d754b72: openshift-dns/dns-default-4bbbm/kube-rbac-proxy" id=6fb5f315-b7cd-4cda-abae-8e5f506da196 name=/runtime.v1alpha2.RuntimeService/StopContainer
ip-10-0-153-173.us-east-2.compute.internal:7011:Aug 17 09:54:46.851871 ip-10-0-153-173 crio[1316]: time="2021-08-17 09:54:46.851831930Z" level=info msg="Stopped container f0df0bfeff6e8fba39d50a868bd4cae5cf50fd4d00b3b934c4a9161aeecf738b: openshift-dns/dns-default-4bbbm/dns" id=ef803c75-262b-4057-baa5-dc26c2e29046 name=/runtime.v1alpha2.RuntimeService/StopContainer

ip-10-0-153-173.us-east-2.compute.internal:7226:Aug 17 09:55:43.766897 ip-10-0-153-173 hyperkube[1355]: I0817 09:55:43.766867    1355 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-dns/dns-default-ghvj7"

ip-10-0-149-160.us-east-2.compute.internal:6349:Aug 17 08:47:38.765156 ip-10-0-149-160 hyperkube[1362]: I0817 08:47:38.765138    1362 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-dns/dns-default-4m9m9"

ip-10-0-153-121.us-east-2.compute.internal:4400:Aug 17 08:52:29.217828 ip-10-0-153-121 hyperkube[1368]: I0817 08:52:29.217784    1368 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-dns/dns-default-ps4jg"

ip-10-0-197-18.us-east-2.compute.internal:6295:Aug 17 08:47:35.798749 ip-10-0-197-18 hyperkube[1354]: I0817 08:47:35.798723    1354 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-dns/dns-default-4b2jc"

ip-10-0-222-193.us-east-2.compute.internal:4917:Aug 17 08:52:52.314510 ip-10-0-222-193 hyperkube[1351]: I0817 08:52:52.314484    1351 kubelet.go:2175] "SyncLoop (probe)" probe="readiness" status="ready" pod="openshift-dns/dns-default-nlsr9"
```

Sorted by the creation timestamp:
- <08:47:35.798749; ???>: dns-default-4b2jc
- <08:47:38.765156; ???>: dns-default-4m9m9
- <08:47:38.765223; ???>: dns-default-t5vwq
- <08:52:29.217828; ???>: dns-default-ps4jg
- <08:52:52.314510; ???>: dns-default-nlsr9
- <09:42:56.175977; 09:44:39.907036>: dns-default-zk2jc (timestamps from KCM logs)
- <09:43:16.820814; 09:54:26.697863>: dns-default-4bbbm
- <09:55:43.766867; ???>: dns-default-ghvj7

dns-default-4bbbm was deleted (at 09:54:26.697863) and replaced by dns-default-ghvj7 (at 09:55:43.766867).

From https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-serial/1427548375770730496/artifacts/e2e-aws-serial/gather-extra/artifacts/pods/openshift-machine-api_machine-api-controllers-55cdbf6848-p7tnl_machine-controller.log:
```
I0817 08:47:28.559467       1 controller.go:174] ci-op-31mwgb8m-170bf-f8td6-master-0: reconciling Machine
I0817 08:47:28.602129       1 controller.go:174] ci-op-31mwgb8m-170bf-f8td6-master-1: reconciling Machine
I0817 08:47:28.631648       1 controller.go:174] ci-op-31mwgb8m-170bf-f8td6-master-2: reconciling Machine

I0817 09:39:44.213344       1 controller.go:174] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-ddmwq: reconciling Machine
I0817 09:39:44.288082       1 reconciler.go:265] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-ddmwq: Instance does not exist
I0817 09:42:56.844204       1 actuator.go:159] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-ddmwq: actuator deleting machine
I0817 09:42:56.844912       1 reconciler.go:111] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-ddmwq: deleting machine
I0817 09:42:56.845143       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Node \"ip-10-0-248-166.us-east-2.compute.internal\" drained" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-ddmwq","uid":"4b1f824c-c3cf-4892-806b-8c7be0c1e6eb","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"66045"} "reason"="Deleted"

I0817 08:47:28.714582       1 controller.go:174] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-l6v6p: reconciling Machine
I0817 08:48:14.649421       1 reconciler.go:265] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-l6v6p: Instance does not exist
I0817 09:49:16.141156       1 machine_scope.go:140] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-l6v6p: Updating status
I0817 09:49:16.207629       1 machine_scope.go:167] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-l6v6p: finished calculating AWS status
I0817 09:49:16.207652       1 machine_scope.go:86] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-l6v6p: patching machine

I0817 08:47:28.652937       1 controller.go:174] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-f9z59: reconciling Machine
I0817 08:48:14.353852       1 reconciler.go:265] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-f9z59: Instance does not exist
I0817 09:42:57.337271       1 controller.go:438] drain successful for machine "ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-f9z59"
I0817 09:42:57.337293       1 actuator.go:159] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-f9z59: actuator deleting machine
I0817 09:42:57.337972       1 recorder.go:104] controller-runtime/manager/events "msg"="Normal"  "message"="Node \"ip-10-0-159-125.us-east-2.compute.internal\" drained" "object"={"kind":"Machine","namespace":"openshift-machine-api","name":"ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-f9z59","uid":"d8f655a5-09d8-41c5-aa6a-0d865f169286","apiVersion":"machine.openshift.io/v1beta1","resourceVersion":"66058"} "reason"="Deleted"

I0817 09:39:45.748890       1 controller.go:174] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-m4r7f: reconciling Machine
I0817 09:39:45.999192       1 reconciler.go:265] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-m4r7f: Instance does not exist
I0817 09:49:15.716395       1 machine_scope.go:140] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-m4r7f: Updating status
I0817 09:49:15.785140       1 machine_scope.go:167] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-m4r7f: finished calculating AWS status
I0817 09:49:15.785165       1 machine_scope.go:86] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-m4r7f: patching machine

I0817 08:47:28.674951       1 controller.go:174] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-mfv8p: reconciling Machine
I0817 08:48:14.486305       1 reconciler.go:265] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-mfv8p: Instance does not exist
I0817 09:49:15.923159       1 machine_scope.go:140] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-mfv8p: Updating status
I0817 09:49:15.994410       1 machine_scope.go:167] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-mfv8p: finished calculating AWS status
I0817 09:49:15.994431       1 machine_scope.go:86] ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-mfv8p: patching machine
```

Sorted by the reconciliation timestamp:
```
- <09:39:44.213344;09:42:56.845143>: ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-ddmwq
- <09:39:45.748890;???>: ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-m4r7f
- <08:47:28.559467;???>: ci-op-31mwgb8m-170bf-f8td6-master-0
- <08:47:28.602129;???>: ci-op-31mwgb8m-170bf-f8td6-master-1
- <08:47:28.631648;???>: ci-op-31mwgb8m-170bf-f8td6-master-2
- <08:47:28.652937;09:42:57.337972>: ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-f9z59
- <08:47:28.674951;???>: ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-mfv8p
- <08:47:28.714582;???>: ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-l6v6p
```

Machine-to-node mapping (based on https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.9-e2e-aws-serial/1427548375770730496/artifacts/e2e-aws-serial/gather-extra/artifacts/pods/openshift-machine-api_machine-api-controllers-55cdbf6848-p7tnl_nodelink-controller.log):
```
Found machine "ci-op-31mwgb8m-170bf-f8td6-master-0" for node "ip-10-0-171-241.us-east-2.compute.internal" with providerID "aws:///us-east-2c/i-02a34cc9bfea88383"
Found machine "ci-op-31mwgb8m-170bf-f8td6-master-1" for node "ip-10-0-197-18.us-east-2.compute.internal" with providerID "aws:///us-east-2b/i-04f54b08c1bebc344"
Found machine "ci-op-31mwgb8m-170bf-f8td6-master-2" for node "ip-10-0-149-160.us-east-2.compute.internal" with providerID "aws:///us-east-2c/i-0338f0a287ae99e53"
Found machine "ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-ddmwq" for node "ip-10-0-248-166.us-east-2.compute.internal" with providerID "aws:///us-east-2b/i-078edd17688a06336"
Found machine "ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-l6v6p" for node "ip-10-0-222-193.us-east-2.compute.internal" with providerID "aws:///us-east-2b/i-0a5b6a55ed6792c66"
Found machine "ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-f9z59" for node "ip-10-0-159-125.us-east-2.compute.internal" with providerID "aws:///us-east-2c/i-02a573a5abbd66500"
Found machine "ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-m4r7f" for node "ip-10-0-153-173.us-east-2.compute.internal" with providerID "aws:///us-east-2c/i-09c2314503f31634a"
Found machine "ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-mfv8p" for node "ip-10-0-153-121.us-east-2.compute.internal" with providerID "aws:///us-east-2c/i-0effad117910e69d7"
```
- ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-ddmwq -> ip-10-0-248-166.us-east-2.compute.internal
- ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-m4r7f -> ip-10-0-153-173.us-east-2.compute.internal
- ci-op-31mwgb8m-170bf-f8td6-master-0 -> ip-10-0-171-241.us-east-2.compute.internal
- ci-op-31mwgb8m-170bf-f8td6-master-1 -> ip-10-0-197-18.us-east-2.compute.internal
- ci-op-31mwgb8m-170bf-f8td6-master-2 -> ip-10-0-149-160.us-east-2.compute.internal
- ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-f9z59 -> ip-10-0-159-125.us-east-2.compute.internal
- ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-mfv8p -> ip-10-0-153-121.us-east-2.compute.internal
- ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-l6v6p -> ip-10-0-222-193.us-east-2.compute.internal

# Machine <creation;deletion> -> node (dns pods) mapping

- ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-ddmwq <09:39:44.213344;09:42:56.845143> -> ip-10-0-248-166.us-east-2.compute.internal (dns-default-zk2jc)
- ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-m4r7f <09:39:45.748890;???????????????> -> ip-10-0-153-173.us-east-2.compute.internal (dns-default-4bbbm, dns-default-ghvj7)
- ci-op-31mwgb8m-170bf-f8td6-master-0 <08:47:28.559467;???????????????> -> ip-10-0-171-241.us-east-2.compute.internal (dns-default-t5vwq)
- ci-op-31mwgb8m-170bf-f8td6-master-1 <08:47:28.602129;???????????????> -> ip-10-0-197-18.us-east-2.compute.internal (dns-default-4b2jc)
- ci-op-31mwgb8m-170bf-f8td6-master-2 <08:47:28.631648;???????????????> -> ip-10-0-149-160.us-east-2.compute.internal (dns-default-4m9m9)
- ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-f9z59 <08:47:28.652937;09:42:57.337972> -> ip-10-0-159-125.us-east-2.compute.internal
- ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2c-mfv8p <08:47:28.674951;???????????????> -> ip-10-0-153-121.us-east-2.compute.internal (dns-default-ps4jg)
- ci-op-31mwgb8m-170bf-f8td6-worker-us-east-2b-l6v6p <08:47:28.714582;???????????????> -> ip-10-0-222-193.us-east-2.compute.internal (dns-default-nlsr9)

# Conclusion:
The DNS operator changed the Degraded condition to True at 09:42:56, at which point the cluster had 8 nodes. Right after Degraded turned True, the 8th node was deleted (at 09:42:56.845143), and Degraded went back to False at 09:43:16. However, the operator had already been reporting 6 of 7 DNS pods available since 09:42:52. Nodes ip-10-0-153-173.us-east-2.compute.internal and ip-10-0-248-166.us-east-2.compute.internal were first observed by the KCM at 09:42:05.127375 and 09:42:10.128504, respectively. Because the two nodes appeared in such quick succession, the dns daemonset had only about 51 seconds after the 7th node appeared to bring up the new DNS pod, while the daemonset tolerates at most 1 unavailable pod. The dns-default-zk2jc pod was created at 09:42:56.175977, giving 6 available pods in total; the next pod, dns-default-4bbbm, became ready at 09:43:16.820814, after the Degraded condition had already flipped to True.

Expected behavior: either the DNS pods are created in time, or the operator recognizes that the pods are missing because new nodes were just added and waits a bit longer for the new DNS pods to spin up before reporting Degraded=True.
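
A hypothetical illustration of the second option, assuming the operator could look at node creation times and excuse DNS pods that are missing only because their node joined within a short grace period (the type, function names, and the 2-minute grace period are invented for this sketch):

```go
package main

import (
	"fmt"
	"time"
)

// node records when a node joined the cluster (hypothetical type for this sketch).
type node struct {
	name     string
	joinedAt time.Time
}

// countExcusedNodes counts nodes that joined within gracePeriod of now; a
// missing DNS pod on such a node would not count against maxUnavailable.
func countExcusedNodes(nodes []node, now time.Time, gracePeriod time.Duration) int {
	excused := 0
	for _, n := range nodes {
		if now.Sub(n.joinedAt) < gracePeriod {
			excused++
		}
	}
	return excused
}

// degradedWithGrace subtracts excused pods before comparing against maxUnavailable.
func degradedWithGrace(desired, available, maxUnavailable, excused int) bool {
	unavailable := desired - available
	return unavailable-excused > maxUnavailable
}

func main() {
	// Timestamps taken from the analysis above: two nodes joined roughly 50s
	// before the operator declared Degraded=True at 09:42:56.
	now := time.Date(2021, 8, 17, 9, 42, 56, 0, time.UTC)
	nodes := []node{
		{"ip-10-0-153-173.us-east-2.compute.internal", time.Date(2021, 8, 17, 9, 42, 5, 0, time.UTC)},
		{"ip-10-0-248-166.us-east-2.compute.internal", time.Date(2021, 8, 17, 9, 42, 10, 0, time.UTC)},
	}
	excused := countExcusedNodes(nodes, now, 2*time.Minute)
	fmt.Println(degradedWithGrace(8, 6, 1, excused)) // false: both missing pods are on brand-new nodes
}
```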

Other failures: https://search.ci.openshift.org/?search=clusteroperator%2Fdns+condition%2FDegraded+status%2FTrue+reason%2FDNS+default+is+degraded&maxAge=24h&context=1&type=junit&name=periodic-ci-openshift-release-master-ci-4.9-e2e-aws-serial&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Comment 1 Miciah Dashiel Butler Masters 2021-08-19 16:21:01 UTC

*** This bug has been marked as a duplicate of bug 1939723 ***

