Bug 1977241
| Summary: | registry-server is crashlooping due to liveness probe failing | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Eran Cohen <ercohen> |
| Component: | OLM | Assignee: | Kevin Rizza <krizza> |
| OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> |
| Status: | CLOSED DUPLICATE | Docs Contact: | |
| Severity: | high | ||
| Priority: | unspecified | CC: | rfreiman |
| Version: | 4.8 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Last Closed: | 2021-06-29 12:42:42 UTC | Type: | Bug |
*** This bug has been marked as a duplicate of bug 1976326 ***
Description of problem:

This issue is blocking SNO CI.
On single-node OpenShift, the registry-server keeps failing its liveness probe, which leads to a crash loop.

events:
00:58:54 openshift-marketplace kubelet redhat-operators-ghmtz Created Created container registry-server
00:58:55 openshift-marketplace kubelet redhat-operators-ghmtz Started Started container registry-server
00:59:04 openshift-marketplace kubelet redhat-operators-ghmtz Killing Stopping container registry-server
00:59:04 openshift-marketplace kubelet community-operators-6x2kn Created Created container registry-server
00:59:05 openshift-marketplace kubelet community-operators-6x2kn Started Started container registry-server
00:59:16 openshift-marketplace kubelet community-operators-6x2kn Killing Stopping container registry-server

Kubelet log:
Jun 26 01:28:19.024704 ip-10-0-130-102 hyperkube[1665]: I0626 01:28:19.024669 1665 kuberuntime_manager.go:683] "Message for Container of pod" containerName="registry-server" containerStatusID={Type:cri-o ID:ddb48f03bc37d335c7d3ff17305c2f57a7b059132f6fcdec2b3e98c465df64f7} pod="openshift-marketplace/certified-operators-f2wrk" containerMessage="Container registry-server failed liveness probe, will be restarted"
Jun 26 01:28:19.024806 ip-10-0-130-102 hyperkube[1665]: I0626 01:28:19.024741 1665 kuberuntime_container.go:720] "Killing container with a grace period override" pod="openshift-marketplace/certified-operators-f2wrk" podUID=a984ce74-619f-4f08-ab75-f7b9ed00551d containerName="registry-server" containerID="cri-o://ddb48f03bc37d335c7d3ff17305c2f57a7b059132f6fcdec2b3e98c465df64f7" gracePeriod=30

This issue is failing a test in the SNO CI job:
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-single-node/1408576091152453632
https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-single-node/1407488936774733824

Version-Release number of selected component (if applicable):

4.8.0-0.nightly-2021-06-15-181825
* The issue still exists in 4.8.0-0.nightly-2021-06-28-165738

"name": "operator-marketplace",
"annotations": {
  "io.openshift.build.commit.id": "e39ff59d5abc3e27effc7b726329d06a37644f2e",
  "io.openshift.build.source-location": "https://github.com/operator-framework/operator-marketplace"
},
"from": {
  "kind": "DockerImage",
  "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3e3a8102eac7bac8dc67fb5303e7f842e4fbe44f4bf30c08178442e6ad68312d"

"name": "operator-registry",
"annotations": {
  "io.openshift.build.commit.id": "f25f670c03e849ba0fd53a56daa0d8a697f68d16",
  "io.openshift.build.source-location": "https://github.com/openshift/operator-framework-olm"
},
"from": {
  "kind": "DockerImage",
  "name": "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0e2b9a90811f8a25e6cc9c6c825098861e8d70004c19e5f8f9e0f5527c8d99be"

How reproducible:

100%

The following test fails in CI because of this issue:
https://testgrid.k8s.io/redhat-single-node#periodic-ci-openshift-release-master-nightly-4.8-e2e-aws-single-node&show-stale-tests=&include-filter-by-regex=AlertmanagerReceiversNotConfigured&include-filter-by-regex=%20Alerts%20shouldn't%20report%20any%20alerts%20in%20firing%20or%20pending%20state%20apart%20from%20Watchdog%20and%20AlertmanagerReceiversNotConfigured

Steps to Reproduce:

1. Run the 4.8-e2e-aws-single-node job

Actual results:

The marketplace registry-server is in a crash loop.

Expected results:

The pod does not crash.

Additional info:

This issue is prominent in the single-node e2e job, but it’s also happening across a bunch of other 4.8 suites:
https://search.ci.openshift.org/chart?search=KubePodCrashLooping.*registry-server&maxAge[…]=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
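
The events quoted in the description show each registry-server container being stopped only a few seconds after it starts. As a rough sketch for quantifying that from the event list, the Python snippet below computes the time between the Started and Killing events per pod. The timestamps and pod names are copied from the events in this report; the helper name `seconds_alive` and the tuple layout are assumptions made for illustration and are not part of any OpenShift tooling. It measures container lifetime only and says nothing about why the probe fails.

```python
from datetime import datetime

# (time, pod, reason) tuples copied from the events quoted in this report.
# The tuple layout is an assumption made for this sketch.
EVENTS = [
    ("00:58:55", "redhat-operators-ghmtz", "Started"),
    ("00:59:04", "redhat-operators-ghmtz", "Killing"),
    ("00:59:05", "community-operators-6x2kn", "Started"),
    ("00:59:16", "community-operators-6x2kn", "Killing"),
]


def seconds_alive(events):
    """Return {pod: seconds between its Started and Killing events}.

    Hypothetical helper: assumes events are in chronological order and
    that all timestamps fall on the same day.
    """
    started = {}
    lifetimes = {}
    for stamp, pod, reason in events:
        t = datetime.strptime(stamp, "%H:%M:%S")
        if reason == "Started":
            started[pod] = t
        elif reason == "Killing" and pod in started:
            lifetimes[pod] = int((t - started.pop(pod)).total_seconds())
    return lifetimes


if __name__ == "__main__":
    for pod, secs in seconds_alive(EVENTS).items():
        print(f"{pod}: killed {secs}s after start")
```

For the events above this reports 9 seconds for redhat-operators-ghmtz and 11 seconds for community-operators-6x2kn.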