Bug 1910417
| Summary: | e2e-metal-compact-4.7 failing test [sig-arch] Managed cluster should have no crashlooping pods in core namespaces over four minutes | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | jamo luhrsen <jluhrsen> |
| Component: | Monitoring | Assignee: | Sudha Ponnaganti <sponnaga> |
| Status: | CLOSED DUPLICATE | QA Contact: | Jianwei Hou <jhou> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.7 | CC: | alegrand, anpicker, aos-bugs, eparis, erooth, jokerman, kakkoyun, lcosic, mloibl, pkrupa, surbania, wking |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-12-23 21:41:16 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
jamo luhrsen
2020-12-23 19:45:42 UTC

Confirming the tls.crt crashloop cause in one of those jobs:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-compact-4.6/1341603354345738240/artifacts/e2e-metal/pods.json | jq -r '.items[] | .metadata as $m | .status.containerStatuses[] | select(.restartCount > 2) | $m.namespace + " " + $m.name + " " + .name + " " + (.restartCount | tostring) + "\n" + .lastState.terminated.message'
openshift-monitoring cluster-monitoring-operator-5f697b97cf-zxk8z kube-rbac-proxy 3
I1223 05:08:52.636459 1 main.go:188] Valid token audiences:
I1223 05:08:52.636553 1 main.go:261] Reading certificate files
F1223 05:08:52.636578 1 main.go:265] Failed to initialize certificate reloader: error loading certificates: error loading certificate: open /etc/tls/private/tls.crt: no such file or directory

Ah, looking at one of the 4.7 jobs [1]:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-compact-4.7/1334219109352607744/artifacts/e2e-metal/pods.json | jq -r '.items[] | .metadata as $m | .status.containerStatuses[] | select(.restartCount > 2) | $m.namespace + " " + $m.name + " " + .name + " " + (.restartCount | tostring) + "\n" + .lastState.terminated.message'
openshift-controller-manager-operator openshift-controller-manager-operator-7d84c45df8-25l9v openshift-controller-manager-operator 3
openshift-etcd-operator etcd-operator-5f8d959d79-vhjxg etcd-operator 3
openshift-kube-controller-manager-operator kube-controller-manager-operator-696fffdf8-2mvf5 kube-controller-manager-operator 3
openshift-kube-scheduler openshift-kube-scheduler-master-2.ci-op-44cfs12v-22d79.origin-ci-int-aws.dev.rhcloud.com kube-scheduler-recovery-controller 7
127.0.0.1:10443 0.0.0.0:* ' ']'
+ sleep 1
++ ss -Htanop '(' sport = 10443 ')'
+ '[' -n 'LISTEN 0 128 127.0.0.1:10443 0.0.0.0:* ' ']'
+ sleep 1
...
++ ss -Htanop '(' sport = 10443 ')'
+ '[' -n 'LISTEN 0 128 127.0.0.1:10443 0.0.0.0:* ' ']'
+ sleep 1
openshift-kube-storage-version-migrator-operator kube-storage-version-migrator-operator-c87ff95dc-4k6jh kube-storage-version-migrator-operator 3

*That* is bug 1908145. I'm going to guess this is a dup of that one (which is still POST), since this is a compact cluster. We can re-open if I'm wrong ;).

[1]: https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-metal-compact-4.7/1334219109352607744

*** This bug has been marked as a duplicate of bug 1908145 ***
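
For anyone who wants to repeat the triage above against a downloaded pods.json without jq, the same filter can be sketched in Python. This is only an illustrative sketch: the function name crashlooping_containers and the min_restarts parameter are made up here, the sample data is a hand-written fragment shaped like a Kubernetes PodList (only the first pod's names and message come from the output above; the "example-*" entries are hypothetical), and unlike the jq one-liner it silently skips pods that have no containerStatuses.

```python
import json


def crashlooping_containers(pod_list, min_restarts=2):
    """Return (namespace, pod, container, restarts, last_message) tuples for
    every container whose restartCount is greater than min_restarts.

    Mirrors the jq filter `select(.restartCount > 2)` used in the comment.
    """
    rows = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # Assumption: pods without containerStatuses are skipped, not an error.
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            if status.get("restartCount", 0) > min_restarts:
                terminated = (status.get("lastState") or {}).get("terminated") or {}
                rows.append((
                    meta.get("namespace"),
                    meta.get("name"),
                    status.get("name"),
                    status["restartCount"],
                    terminated.get("message"),  # None when no message was recorded
                ))
    return rows


# Hand-written sample; in practice: pod_list = json.load(open("pods.json"))
sample = json.loads("""
{
  "items": [
    {
      "metadata": {"namespace": "openshift-monitoring",
                   "name": "cluster-monitoring-operator-5f697b97cf-zxk8z"},
      "status": {"containerStatuses": [
        {"name": "kube-rbac-proxy", "restartCount": 3,
         "lastState": {"terminated": {"message":
           "open /etc/tls/private/tls.crt: no such file or directory"}}},
        {"name": "example-sidecar", "restartCount": 0, "lastState": {}}
      ]}
    },
    {
      "metadata": {"namespace": "example-namespace", "name": "example-pending-pod"},
      "status": {}
    }
  ]
}
""")

rows = crashlooping_containers(sample)
for namespace, pod, container, restarts, message in rows:
    print(namespace, pod, container, restarts)
    if message:
        print(message)
```

Raising min_restarts narrows the list to the noisiest containers; the comment above used a threshold of 2.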