Bug 1785534 - [swift] image-registry crashes when managementState is changed from Removed to Managed and removal of the old swift container is forbidden
Summary: [swift] image-registry crashes when managementState is changed from Removed to Managed and removal of the old swift container is forbidden
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.3.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Oleg Bulatov
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks: 1806757
 
Reported: 2019-12-20 06:58 UTC by Wenjing Zheng
Modified: 2020-07-13 17:13 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The image registry operator did not clean up the storage status when it removed the storage. Consequence: When the registry was switched back to Managed, the operator could not detect that storage needed to be bootstrapped. Fix: Clean up the storage status when storage is removed. Result: The operator creates storage when the registry is switched back to the Managed state.
Clone Of:
Clones: 1806757
Environment:
Last Closed: 2020-07-13 17:12:48 UTC
Target Upstream Version:
Embargoed:




Links
Github openshift cluster-image-registry-operator pull 463 (closed): Bug 1785534: fix regression (last updated 2020-12-03 15:52:32 UTC)
Github openshift cluster-image-registry-operator pull 474 (closed): Bug 1785534: remove storage status when storage is removed (last updated 2020-12-03 15:52:32 UTC)
Red Hat Product Errata RHBA-2020:2409 (last updated 2020-07-13 17:13:19 UTC)

Description Wenjing Zheng 2019-12-20 06:58:19 UTC
Description of problem:
The image registry crashes when managementState is changed from Removed to Managed:
[wzheng@openshift-qe 4.3]$ oc get pods
NAME                                               READY   STATUS             RESTARTS   AGE
cluster-image-registry-operator-84dd486bdd-ttf8t   2/2     Running            0          26m
image-registry-5d4f5f8d4f-2th4p                    0/1     CrashLoopBackOff   3          83s
image-registry-79966c8f54-8dl8h                    0/1     CrashLoopBackOff   3          83s
node-ca-2xjld                                      1/1     Running            0          21m
node-ca-m77kj                                      1/1     Running            0          23m
node-ca-mbzqc                                      1/1     Running            0          21m
node-ca-plxr9                                      1/1     Running            0          26m
node-ca-qhdc6                                      1/1     Running            0          26m
node-ca-qwc4x                                      1/1     Running            0          26m
[wzheng@openshift-qe 4.3]$ oc logs pods/image-registry-79966c8f54-8dl8h
time="2019-12-20T06:21:29.293241493Z" level=info msg="start registry" distribution_version=v2.6.0+unknown go.version=go1.12.12 openshift_version=v4.3.0-201912130552+a9364d4-dirty
time="2019-12-20T06:21:29.294077073Z" level=info msg="caching project quota objects with TTL 1m0s" go.version=go1.12.12
panic: No container parameter provided

goroutine 1 [running]:
github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/handlers.NewApp(0x1c03b20, 0xc0000c8058, 0xc0003f5c00, 0xc0003ca870)
    /go/src/github.com/openshift/image-registry/vendor/github.com/docker/distribution/registry/handlers/app.go:127 +0x31ac
github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware.NewApp(0x1c03b20, 0xc0000c8058, 0xc0003f5c00, 0x1c0aea0, 0xc0004d5560, 0x1c14d00)
    /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/supermiddleware/app.go:96 +0x85
github.com/openshift/image-registry/pkg/dockerregistry/server.NewApp(0x1c03b20, 0xc0000c8058, 0x1bdbda0, 0xc00013c240, 0xc0003f5c00, 0xc0002997c0, 0x0, 0x0, 0x1, 0xc000052500)
    /go/src/github.com/openshift/image-registry/pkg/dockerregistry/server/app.go:138 +0x2d4
github.com/openshift/image-registry/pkg/cmd/dockerregistry.NewServer(0x1c03b20, 0xc0000c8058, 0xc0003f5c00, 0xc0002997c0, 0x0, 0x0, 0x1c3ef20)
    /go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:210 +0x1c2
github.com/openshift/image-registry/pkg/cmd/dockerregistry.Execute(0x1bc63e0, 0xc00013c018)
    /go/src/github.com/openshift/image-registry/pkg/cmd/dockerregistry/dockerregistry.go:164 +0xa42
main.main()
    /go/src/github.com/openshift/image-registry/cmd/dockerregistry/main.go:93 +0x49c

After the container name is added back, the pod runs.


Version-Release number of selected component (if applicable):
4.3.0-0.nightly-2019-12-13-180405

How reproducible:
Always

Steps to Reproduce:
1. Change managementState to Removed; the image-registry pod terminates, and the "container" field in the storage configuration is left empty, as below:
  storage:
    swift:
      authURL: https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13000/v3
      authVersion: "3"
      container: ""
      domain: redhat.com
      domainID: ""
      regionName: regionOne
      tenant: openshift-qe-jenkins
      tenantID: 542c6ebd48bf40fa857fc245c7572e30
2. Change managementState back to Managed.
3. Check the pod status (example oc commands for these steps are shown below).
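
The managementState changes can be applied with oc patch against the operator configuration resource; a minimal sketch, assuming the standard configs.imageregistry.operator.openshift.io/cluster resource and the openshift-image-registry namespace:

# Step 1: switch the registry to Removed
oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"managementState":"Removed"}}'

# Step 2: switch back to Managed
oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"managementState":"Managed"}}'

# Step 3: check the pod status
oc get pods -n openshift-image-registry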

Actual results:
The pod cannot start up, and its log contains a panic.

Expected results:
The pod should start up.

Additional info:
1. Workaround: manually add the container name back and the pod will run (an example patch is shown after the operator log excerpts below);
2. The operator log contains errors such as:
E1220 06:40:15.360862      12 controller.go:257] unable to sync: unable to remove storage: Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>, Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>, requeuing
E1220 06:40:17.262434      12 controller.go:257] unable to sync: unable to remove storage: Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>, Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>, requeuing
I1220 06:40:19.322896      12 controller.go:216] object changed: *v1.Config, Name=cluster (status=true): changed:status.conditions.0.lastTransitionTime={"2019-12-20T06:38:56Z" -> "2019-12-20T06:40:18Z"}, removed:status.conditions.0.message="Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>", changed:status.conditions.0.reason={"Request forbidden: [DELETE https://rhos-d.infra.prod.upshift.rdu2.redhat.com:13808/v1/AUTH_542c6ebd48bf40fa857fc245c7572e30/], error message: <html><h1>Forbidden</h1><p>Access was denied to this resource.</p></html>" -> "Swift container Exists"}, changed:status.conditions.0.status={"Unknown" -> "True"}, changed:status.conditions.1.lastTransitionTime={"2019-12-20T06:39:09Z" -> "2019-12-20T06:40:19Z"}, changed:status.conditions.1.message={"The registry is removed" -> "The deployment does not have available replicas"}, changed:status.conditions.1.reason={"Removed" -> "NoReplicasAvailable"}, changed:status.conditions.1.status={"True" -> "False"}, changed:status.conditions.2.lastTransitionTime={"2019-12-20T06:40:17Z" -> "2019-12-20T06:40:19Z"}, changed:status.conditions.2.message={"All registry resources are removed" -> "The deployment has not completed"}, changed:status.conditions.2.reason={"Removed" -> "DeploymentNotCompleted"}, changed:status.conditions.2.status={"False" -> "True"}, changed:status.conditions.4.lastTransitionTime={"2019-12-20T06:38:56Z" -> "2019-12-20T06:40:19Z"}, removed:status.conditions.4.message="The registry is removed", removed:status.conditions.4.reason="Removed", changed:status.conditions.4.status={"True" -> "False"}, changed:status.observedGeneration={"11.000000" -> "12.000000"}
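
For the workaround in item 1, the swift container name can be restored with a patch along these lines (a sketch; <original-container-name> is a placeholder for the container name that was configured before the registry was removed):

# Restore the swift container name in the operator configuration
oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"storage":{"swift":{"container":"<original-container-name>"}}}}'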

Comment 10 Wenjing Zheng 2020-03-26 03:18:33 UTC
Verified on 4.5.0-0.nightly-2020-03-25-223812.
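
A sketch of how the fix can be checked (an assumption about the verification steps, not the recorded QE procedure): after switching managementState to Removed, the storage status reported by the operator should be cleared, so switching back to Managed bootstraps a new container:

# Inspect the storage status after the registry is removed; with the fix it should be empty
oc get configs.imageregistry.operator.openshift.io/cluster -o jsonpath='{.status.storage}'

# Switch back to Managed and confirm the registry pods come up
oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"managementState":"Managed"}}'
oc get pods -n openshift-image-registry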

Comment 12 errata-xmlrpc 2020-07-13 17:12:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

