Bug 1785534
| Summary: | [swift] image-registry crashes after changing managementState from Removed to Managed when removal of the old Swift container is forbidden | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Wenjing Zheng <wzheng> |
| Component: | Image Registry | Assignee: | Oleg Bulatov <obulatov> |
| Status: | CLOSED ERRATA | QA Contact: | Wenjing Zheng <wzheng> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | 4.3.0 | CC: | adam.kaplan, aos-bugs |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | 4.5.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: the image registry operator did not clean up the storage status when it removed storage. Consequence: when the registry was switched back to Managed, the operator could not detect that storage needed to be bootstrapped. Fix: clean up the storage status on removal. Result: the operator creates storage when the registry is switched back to the Managed state. (See the sketch after the table.) | Story Points: | --- |
| Clone Of: | | | |
| : | 1806757 (view as bug list) | Environment: | |
| Last Closed: | 2020-07-13 17:12:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1806757 | | |
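The Doc Text above summarizes the fix. Below is a minimal Go sketch of that idea, using simplified stand-in types and function names rather than the operator's actual API:

```go
// Minimal sketch of the fix described in the Doc Text: when the operator
// removes registry storage, it must also clear the mirrored storage status
// so that a later switch back to Managed re-runs storage bootstrapping.
// Types and field names here are simplified stand-ins, not the operator's API.
package main

import "fmt"

type SwiftStorage struct {
	Container string
}

type RegistryStatus struct {
	Storage        SwiftStorage // mirrors the storage the operator manages
	StorageManaged bool         // true if the operator created the container
}

// removeStorage deletes the Swift container and, crucially, resets the
// storage status. Before the fix, a stale (empty) container name survived
// in the status, so the Managed path passed it straight to the registry.
func removeStorage(status *RegistryStatus, deleteContainer func(string) error) error {
	if err := deleteContainer(status.Storage.Container); err != nil {
		return err
	}
	status.Storage = SwiftStorage{} // the fix: clean up the storage status
	status.StorageManaged = false
	return nil
}

// bootstrapNeeded is what the Managed path can now check: an empty status
// means "create a new container" rather than "run with no container".
func bootstrapNeeded(status *RegistryStatus) bool {
	return status.Storage.Container == ""
}

func main() {
	st := &RegistryStatus{Storage: SwiftStorage{Container: "registry-old"}, StorageManaged: true}
	if err := removeStorage(st, func(string) error { return nil }); err != nil {
		fmt.Println("removal failed:", err)
		return
	}
	fmt.Println("bootstrap needed:", bootstrapNeeded(st)) // true
}
```

With the status cleared on removal, an empty container name is read as "storage must be bootstrapped" instead of being handed to the registry process.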
Verified on 4.5.0-0.nightly-2020-03-25-223812.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409
Description of problem:

The image registry crashes if the managementState is changed from Removed to Managed:

```
[wzheng@openshift-qe 4.3]$ oc get pods
NAME                                               READY   STATUS             RESTARTS   AGE
cluster-image-registry-operator-84dd486bdd-ttf8t   2/2     Running            0          26m
image-registry-5d4f5f8d4f-2th4p                    0/1     CrashLoopBackOff   3          83s
image-registry-79966c8f54-8dl8h                    0/1     CrashLoopBackOff   3          83s
node-ca-2xjld                                      1/1     Running            0          21m
node-ca-m77kj                                      1/1     Running            0          23m
node-ca-mbzqc                                      1/1     Running            0          21m
node-ca-plxr9                                      1/1     Running            0          26m
node-ca-qhdc6                                      1/1     Running            0          26m
node-ca-qwc4x                                      1/1     Running            0          26m

[wzheng@openshift-qe 4.3]$ oc logs pods/image-registry-79966c8f54-8dl8h
time="2019-12-20T06:21:29.293241493Z" level=info msg="start registry" distribution_version=v2.6.0+unknown go.version=go1.12.12 openshift_version=v4.3.0-201912130552+a9364d4-dirty
time="2019-12-20T06:21:29.294077073Z" level=info msg="caching project quota objects with TTL 1m0s" go.version=go1.12.12
panic: No container parameter provided

goroutine 1 [running]:
github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/handlers.NewApp(0x1c03b20, 0xc0000c8058, 0xc0003f5c00, 0xc0003ca870)
	/go/src/github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/handlers/app.go:127 +0x31ac
github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware.NewApp(0x1c03b20, 0xc0000c8058, 0xc0003f5c00, 0x1c0aea0, 0xc0004d5560, 0x1c14d00)
	/go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware/app.go:96 +0x85
github.com/openshift/image-registry/pkg/dockerregistry/server.NewApp(0x1c03b20, 0xc0000c8058, 0x1bdbda0, 0xc00013c240, 0xc0003f5c00, 0xc0002997c0, 0x0, 0x0, 0x1, 0xc000052500)
	/go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/app.go:138 +0x2d4
github.com/openshift/image-registry/pkg/cmd/dockerregistry.NewServer(0x1c03b20, 0xc0000c8058, 0xc0003f5c00, 0xc0002997c0, 0x0, 0x0, 0x1c3ef20)
	/go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:210 +0x1c2
github.com/openshift/image-registry/pkg/cmd/dockerregistry.Execute(0x1bc63e0, 0xc00013c018)
	/go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:164 +0xa42
main.main()
	/go/src/github.com/openshift/image-registry/cmd/dockerregistry/main.go:93 +0x49c
```

After adding the container name back, the pod runs.

Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-12-13-180405

How reproducible:
Always

Steps to Reproduce:
1. Change managementState to Removed; the image-registry pod terminates and the "container" field is left empty, as below:

```yaml
storage:
  swift:
    authURL: https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13000/v3
    authVersion: "3"
    container: ""
    domain: redhat.com
    domainID: ""
    regionName: regionOne
    tenant: openshift-qe-jenkins
    tenantID: 542c6ebd48bf40fa857fc245c7572e30
```

2. Change managementState back to Managed.
3. Check the pod status.

Actual results:
The pod cannot start up, and its log contains the panic shown above (a sketch of the check behind it follows below).

Expected results:
The pod should start up.
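The panic in the pod log comes from storage-driver construction: docker/distribution validates the Swift driver's parameters while building the registry app (handlers.NewApp in the trace above) and aborts on error. A hedged sketch of that check, with illustrative function and parameter names rather than the vendored code:

```go
// Minimal sketch of the check behind "panic: No container parameter
// provided". docker/distribution validates storage-driver parameters while
// constructing the app and the registry panics on the resulting error.
// Names here are illustrative, not the vendored code.
package main

import (
	"errors"
	"fmt"
)

// fromParameters stands in for the Swift driver's parameter validation:
// an empty "container" value is treated the same as a missing one.
func fromParameters(params map[string]interface{}) error {
	container, ok := params["container"].(string)
	if !ok || container == "" {
		return errors.New("No container parameter provided")
	}
	return nil
}

func main() {
	// The Removed -> Managed transition left container: "" in the config,
	// so driver construction fails and the registry panics at startup,
	// which matches the CrashLoopBackOff seen in `oc get pods`.
	params := map[string]interface{}{"container": ""}
	if err := fromParameters(params); err != nil {
		panic(err)
	}
	fmt.Println("registry storage driver configured")
}
```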
Additional info:
1. Workaround: manually add the container name back; the pod will then run.
2. There are errors like the following in the operator log:

```
E1220 06:40:15.360862      12 controller.go:257] unable to sync: unable to remove storage: Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>, Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>, requeuing
E1220 06:40:17.262434      12 controller.go:257] unable to sync: unable to remove storage: Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>, Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>, requeuing
I1220 06:40:19.322896      12 controller.go:216] object changed: *v1.Config, Name=cluster (status=true): changed:status.conditions.0.lastTransitionTime={"2019-12-20T06:38:56Z" -> "2019-12-20T06:40:18Z"}, removed:status.conditions.0.message="Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>", changed:status.conditions.0.reason={"Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>" -> "Swift container Exists"}, changed:status.conditions.0.status={"Unknown" -> "True"}, changed:status.conditions.1.lastTransitionTime={"2019-12-20T06:39:09Z" -> "2019-12-20T06:40:19Z"}, changed:status.conditions.1.message={"The registry is removed" -> "The deployment does not have available replicas"}, changed:status.conditions.1.reason={"Removed" -> "NoReplicasAvailable"}, changed:status.conditions.1.status={"True" -> "False"}, changed:status.conditions.2.lastTransitionTime={"2019-12-20T06:40:17Z" -> "2019-12-20T06:40:19Z"}, changed:status.conditions.2.message={"All registry resources are removed" -> "The deployment has not completed"}, changed:status.conditions.2.reason={"Removed" -> "DeploymentNotCompleted"}, changed:status.conditions.2.status={"False" -> "True"}, changed:status.conditions.4.lastTransitionTime={"2019-12-20T06:38:56Z" -> "2019-12-20T06:40:19Z"}, removed:status.conditions.4.message="The registry is removed", removed:status.conditions.4.reason="Removed", changed:status.conditions.4.status={"True" -> "False"}, changed:status.observedGeneration={"11.000000" -> "12.000000"}
```
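For context, the "unable to sync ... requeuing" lines reflect the controller's usual retry behavior: the forbidden Swift DELETE is surfaced as a sync error and the work item is requeued for another attempt. A rough illustrative sketch of that pattern, not the operator's actual code:

```go
// Hedged sketch of the sync-loop behavior visible in the log above: a
// failed storage removal is logged as "unable to sync" and the key is
// requeued with a backoff instead of being dropped. Illustrative only.
package main

import (
	"errors"
	"fmt"
	"time"
)

// errForbidden stands in for the Swift 403 seen in the log; the URL is
// elided here on purpose.
var errForbidden = errors.New("Request forbidden: [DELETE .../AUTH_.../]")

// syncOnce mirrors one iteration of the loop: on error, log and requeue.
func syncOnce(removeStorage func() error, requeue func(time.Duration)) {
	if err := removeStorage(); err != nil {
		fmt.Printf("unable to sync: unable to remove storage: %v, requeuing\n", err)
		requeue(5 * time.Second) // backoff before the next attempt
	}
}

func main() {
	syncOnce(
		func() error { return errForbidden },
		func(d time.Duration) { fmt.Println("requeued after", d) },
	)
}
```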