Bug 1930537
Summary: | [sig-arch] Managed cluster should have no crashlooping pods in core namespaces over four minutes | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Miciah Dashiel Butler Masters <mmasters> | |
Component: | OLM | Assignee: | Joe Lanford <jlanford> | |
OLM sub component: | OperatorHub | QA Contact: | Tom Buskey <tbuskey> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | high | |||
Priority: | high | CC: | jdelft, nhale | |
Version: | 4.6 | |||
Target Milestone: | --- | |||
Target Release: | 4.6.z | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1937167 | Environment: | [sig-arch] Managed cluster should have no crashlooping pods in core namespaces over four minutes | |
Last Closed: | 2021-04-20 19:27:20 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | 1937170 | |||
Bug Blocks: |
Description
Miciah Dashiel Butler Masters
2021-02-19 05:43:45 UTC
Checking the logs for the given example, it looks like the test is failing pods that started -- and are still pending -- within seconds of the check:

    Feb 11 06:17:53.263: INFO: Pod status openshift-marketplace/community-operators-zxvzd:
    {
      "phase": "Pending",
      "conditions": [
        {
          "type": "Initialized",
          "status": "True",
          "lastProbeTime": null,
          "lastTransitionTime": "2021-02-11T06:17:45Z"
        },
        {
          "type": "Ready",
          "status": "False",
          "lastProbeTime": null,
          "lastTransitionTime": "2021-02-11T06:17:45Z",
          "reason": "ContainersNotReady",
          "message": "containers with unready status: [registry-server]"
        },
        {
          "type": "ContainersReady",
          "status": "False",
          "lastProbeTime": null,
          "lastTransitionTime": "2021-02-11T06:17:45Z",
          "reason": "ContainersNotReady",
          "message": "containers with unready status: [registry-server]"
        },
        {
          "type": "PodScheduled",
          "status": "True",
          "lastProbeTime": null,
          "lastTransitionTime": "2021-02-11T06:17:45Z"
        }
      ],
      "hostIP": "10.0.32.4",
      "startTime": "2021-02-11T06:17:45Z",
      "containerStatuses": [
        {
          "name": "registry-server",
          "state": {
            "waiting": {
              "reason": "ContainerCreating"
            }
          },
          "lastState": {},
          "ready": false,
          "restartCount": 0,
          "image": "registry.redhat.io/redhat/community-operator-index:latest",
          "imageID": ""
        }
      ],
      "qosClass": "Burstable"
    }
    Feb 11 06:17:53.272: INFO: Running AfterSuite actions on all nodes
    Feb 11 06:17:53.272: INFO: Running AfterSuite actions on node 1
    fail [github.com/openshift/origin/test/extended/operators/cluster.go:151]: Expected
        <[]string | len:1, cap:1>: [
            "Pod openshift-marketplace/community-operators-zxvzd was pending entire time: unknown error",
        ]
    to be empty

Checking the code, it seems like release-4.6 is missing some logic, which exists in master, to prevent such pods from failing the test:

- master: https://github.com/openshift/origin/blob/7e958d0a1fddefe8f47c50c40c33a9c5096f2d75/test/extended/operators/cluster.go#L140
- release-4.6: https://github.com/openshift/origin/blob/ae4a31dc9325a685e050768d80670071c242e6d8/test/extended/operators/cluster.go#L134

It looks like there's already an open PR against the tests in release-4.6: https://github.com/openshift/origin/pull/25600

If it looks good, maybe we can push it through.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6.25 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1153