Bug 1949991
Summary: | openshift-marketplace pods are crashlooping | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Oleg Bulatov <obulatov> |
Component: | OLM | Assignee: | Anik <anbhatta> |
OLM sub component: | OperatorHub | QA Contact: | Tom Buskey <tbuskey> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | urgent | CC: | anbhatta, bandrade, dsover, krizza, lakshmi.ravichandran1, nhale, rgudimet |
Version: | 4.8 | Keywords: | Reopened |
Target Milestone: | --- | ||
Target Release: | 4.8.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: |
[sig-arch] Managed cluster should have no crashlooping pods in core namespaces over four minutes [Suite:openshift/conformance/parallel]
|
|
Last Closed: | 2021-07-27 23:01:13 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Oleg Bulatov
2021-04-15 14:39:15 UTC
*** This bug has been marked as a duplicate of bug 1949337 ***

Closing 1949337 as the duplicate instead, and reopening this bug for tracking.

*** Bug 1949337 has been marked as a duplicate of this bug. ***

This looks like it was caused by bad default CatalogSource images built in the pipeline. It appears to be transient, however, and the images have been fixed since the last case of crashlooping pods. Running the marketplace-operator from latest master in a 4.8.0-0.ci-2021-04-20-092252 cluster did not show any crashlooping pods in the openshift-marketplace namespace:

```
$ oc get pods
NAME                                    READY   STATUS    RESTARTS   AGE
certified-operators-ttbgw               1/1     Running   0          121m
community-operators-htdwg               1/1     Running   0          121m
marketplace-operator-7fd8f5d9fb-tbzld   1/1     Running   0          17m
redhat-marketplace-6trz7                1/1     Running   0          121m
redhat-operators-p6n9q                  1/1     Running   0          105m
```

The problem is still there. Please don't close this BZ until the CI failure rate decreases:
https://triage.dptools.openshift.org/?text=openshift-marketplace&job=4.8&test=Managed%20cluster%20should%20have%20no%20crashlooping%20pods%20in%20core%20namespaces%20over%20four%20minutes

An example of a failed job:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-azure/1384791615964450816

Failure message:

    fail [github.com/openshift/origin/test/extended/operators/cluster.go:160]: Expected
        <[]string | len:1, cap:1>: [
            "Pod openshift-marketplace/community-operators-28lkj is not healthy: container registry-server exited with non-zero exit code",
        ]
    to be empty

Either these pods should be fixed so that they don't exit with non-zero exit codes, or an exception should be added for them [1]. As this pod is not present after the e2e tests have finished, it was most likely created (and deleted) by a test.

[1]: https://github.com/openshift/origin/blob/e945cb88da780e21c021b6c8b430454bcfb881cf/test/extended/operators/cluster.go#L47

Moving the priority of this BZ to urgent, given that this is blocking CI.

Since the pods are deleted after the tests, the test logs give no indication of why the pods were being killed in a loop, apart from the log stating that the registry container exited with exit code 2. Opened https://bugzilla.redhat.com/show_bug.cgi?id=1952238 to report logs from catalog pods back to the catalog operator on termination. Once the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1952238 goes in, we'll investigate the logs again.

*** Bug 1951617 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
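
For readers unfamiliar with the failing check, here is a minimal Python sketch of the idea discussed in the comments above: report any pod whose container exited with a non-zero exit code, unless the pod matches an exception. This is not the actual Go code in openshift/origin. The `ContainerExit` record, the `unhealthy_pods` function, and the exception pattern in the usage example are hypothetical names chosen for illustration. Only the message format is taken from the failure output quoted above.

```python
import re
from dataclasses import dataclass


@dataclass
class ContainerExit:
    """Hypothetical record of one terminated container."""
    namespace: str
    pod: str
    container: str
    exit_code: int


def unhealthy_pods(exits, exceptions=()):
    """Return one message per container that exited non-zero.

    Pods whose "namespace/pod" name matches any regex in `exceptions`
    are skipped (an assumed exception mechanism, for illustration only).
    """
    compiled = [re.compile(pattern) for pattern in exceptions]
    failures = []
    for e in exits:
        if e.exit_code == 0:
            continue  # clean exit, nothing to report
        full_name = f"{e.namespace}/{e.pod}"
        if any(p.search(full_name) for p in compiled):
            continue  # explicitly excepted pod
        failures.append(
            f"Pod {full_name} is not healthy: "
            f"container {e.container} exited with non-zero exit code"
        )
    return failures


if __name__ == "__main__":
    exits = [
        ContainerExit("openshift-marketplace", "community-operators-28lkj",
                      "registry-server", 2),
    ]
    # Without an exception, the pod is reported, as in the CI failure.
    print(unhealthy_pods(exits))
    # With a (hypothetical) exception pattern, the list is empty.
    print(unhealthy_pods(
        exits, exceptions=[r"^openshift-marketplace/community-operators-"]))
```

The two options raised in the comment map onto this sketch directly: fixing the pods removes the non-zero exits from the input, while adding an exception only hides them from the check.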