Bug 1671816 - Error: secrets "image-registry-private-configuration" not found
Summary: Error: secrets "image-registry-private-configuration" not found
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image Registry
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Ben Parees
QA Contact: Wenjing Zheng
URL:
Whiteboard:
Depends On:
Blocks: 1664187
 
Reported: 2019-02-01 18:06 UTC by jooho lee
Modified: 2019-10-22 15:16 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:42:30 UTC
Target Upstream Version:


Attachments
image registry operator log (8.69 KB, text/plain)
2019-02-01 18:53 UTC, jooho lee


Links
Red Hat Product Errata RHBA-2019:0758 (last updated 2019-06-04 10:43:52 UTC)

Description jooho lee 2019-02-01 18:06:21 UTC
Description of problem:
After an OCP 4 installation on AWS, I checked the pods and found that the image-registry pods have issues.

The error message is "Error: secrets "image-registry-private-configuration" not found", and the secret does not exist in the project.

How can I create the secret?
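
(For reference, a minimal sketch of how to confirm this symptom, assuming the default openshift-image-registry namespace; the pod name is a placeholder and will differ per cluster:)

~~~
# List the registry pods; describing a failing pod should show the
# missing-secret error in its events.
$ oc get pods -n openshift-image-registry
$ oc describe pod <image-registry-pod-name> -n openshift-image-registry

# Confirm whether the secret exists; on an affected cluster this returns NotFound.
$ oc get secret image-registry-private-configuration -n openshift-image-registry
~~~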

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create a cluster with the installer
2.
3.

Actual results:
The image-registry pod fails to run.


Expected results:
The image-registry pod should run without errors.


Additional info:

Comment 1 Ben Parees 2019-02-01 18:25:30 UTC
The secret should be created automatically, please provide the registry operator pod logs and the registry config resource yaml.
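
(A sketch of how those might be collected, assuming the operator runs as the cluster-image-registry-operator deployment in the openshift-image-registry namespace, as the events later in this bug suggest:)

~~~
# Registry operator pod logs:
$ oc logs deployment/cluster-image-registry-operator -n openshift-image-registry

# Registry config resource (the exact command is also given in comment 4 below):
$ oc get configs.imageregistry.operator.openshift.io/instance -o yaml
~~~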

Comment 2 jooho lee 2019-02-01 18:53:02 UTC
Hi Ben,

I just recreated the cluster to confirm this issue.

It turns out I can reproduce the issue every time.

I will attach the operator pod logs, but what is the registry config resource YAML file?

-Jooho

Comment 3 jooho lee 2019-02-01 18:53:34 UTC
Created attachment 1525927 [details]
image registry operator log

Comment 4 Ben Parees 2019-02-01 19:03:08 UTC
You can retrieve the registry config via:

$ oc get configs.imageregistry.operator.openshift.io/instance -o yaml

Comment 5 jooho lee 2019-02-01 19:07:04 UTC
Thanks.

Here is the data.

~~~
apiVersion: imageregistry.operator.openshift.io/v1
kind: Config
metadata:
  creationTimestamp: 2019-02-01T18:49:33Z
  finalizers:
  - imageregistry.operator.openshift.io/finalizer
  generation: 1
  name: instance
  resourceVersion: "18351"
  selfLink: /apis/imageregistry.operator.openshift.io/v1/configs/instance
  uid: 1d2d64a3-2652-11e9-b5fa-0207f390ebee
spec:
  httpSecret: 61ad07f5cdb8105fc626806f9bfb0172702534ef81f7137c02cc3804431364fd41b2d6a1357768b99392784e337eecd61a29bbe61ede42e1e20721c53caf4482
  logging: 2
  managementState: Managed
  proxy: {}
  replicas: 1
  requests:
    read: {}
    write: {}
  storage:
    s3: {}
status:
  conditions:
  - lastTransitionTime: 2019-02-01T18:49:33Z
    message: Deployment does not have available replicas
    status: "False"
    type: Available
  - lastTransitionTime: 2019-02-01T18:49:33Z
    message: 'Unable to apply resources: unable to sync secrets: timed out waiting
      for the condition'
    status: "True"
    type: Progressing
  - lastTransitionTime: 2019-02-01T18:49:33Z
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-02-01T18:49:33Z
    status: "False"
    type: Removed
  generations: null
  internalRegistryHostname: ""
  observedGeneration: 1
  readyReplicas: 0
  storage: {}
  storageManaged: false
  version: ""
~~~

Comment 6 Ben Parees 2019-02-01 19:33:52 UTC
This cluster seems significantly old; we've fixed several issues related to this since v4.0.0-0.148.0.0-dirty.


Please install a newer cluster version (looks like .153 is the latest).

Comment 7 jooho lee 2019-02-01 19:37:13 UTC
I use the OpenShift installer 0.11.0, which is the latest.

Should I change the cluster version manually?

If so, please let me know how to change it.

Thanks,
Jooho Lee

Comment 8 Ben Parees 2019-02-01 19:42:39 UTC
> I use openshift installer 0.11.0 that is the latest.
> Should I change the cluster version manually?
> If so, please let me know how to change it.


That would be a question for the install team; maybe Trevor can answer it.  On Origin it's simply a matter of using the master openshift-installer branch.  Not sure what the OCP process is.

Comment 9 jooho lee 2019-02-01 19:53:17 UTC
It looks like OpenShift tried to update the operator, but it failed with this error:

```
Unable to apply resources: unable to apply objects: failed to create object *v1.Image, Name=cluster: images.config.openshift.io "cluster" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update
```

Is there a way to change it manually?

-Jooho

Comment 10 Ben Parees 2019-02-01 20:03:29 UTC
No, that is an issue in the operator itself which is fixed in newer releases.  The only resolution is to move to a newer openshift release.

Comment 11 jooho lee 2019-02-01 22:50:11 UTC
@Trevor,

I tried to use the latest version after building the tool on my end.

This build is using 4.0.0-24-g29c4cc2-dirty but still has the issue.

Moreover, it has more issues.

1. Cluster Settings shows the following message:

Could not retrieve updates. Unable to retrieve available updates: Get http://localhost:8080/graph: dial tcp [::1]:8080: connect: connection refused


2. machine-config-operator fails with the following error:

 FailingFailed when progressing towards 3.11.0-543-g6c3e3e6a-dirty because: error syncing: timed out waiting for the condition during syncRequiredMachineConfigPools: error pool master is not ready. status: (total: 3, updated: 0, unavailable: 1)


-Jooho

Comment 12 W. Trevor King 2019-02-06 03:26:31 UTC
> I use openshift installer 0.11.0 that is the latest.

That should have installed 4.0.0-0.2.  And we just cut installer v0.12.0, pinning update payload 4.0.0-0.3.  How were you getting v4.0.0-0.148.0.0-dirty?  Anyhow, try 0.12.0, which monitors CVO progress and should make debugging operator issues easier.
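
(For watching that progress once the cluster is reachable, something along these lines should work; `oc get clusterversion` is the same command used for verification in comment 23:)

~~~
# Overall install/update progress as reported by the cluster-version operator:
$ oc get clusterversion
$ oc describe clusterversion version

# Per-operator status, including the image registry operator:
$ oc get clusteroperators
~~~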

Comment 13 Ben Parees 2019-02-06 03:50:38 UTC
I think the 0.148.0 is the operator version.  How or why our operator version information differs from the payload version, I'm not sure (I need to learn more about our release process...).

Comment 14 W. Trevor King 2019-02-06 03:50:58 UTC
Ah, v4.0.0-0.148.0.0-dirty is the operator version.  Installer v0.12.0's pinned update payload 4.0.0-0.3 looks like it pins registry operator v4.0.0-0.150.0.0-dirty [1].  In comment 6, Ben places the fix between 148 and 153, so I'm not sure if it has the fix or not.  Ben, was the fix in [2]?

[1]: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/3674/artifacts/release-e2e-aws/clusteroperators.json
[2]: https://github.com/openshift/cluster-image-registry-operator/pull/170

Comment 15 W. Trevor King 2019-02-06 03:54:37 UTC
> How/why our operator version information is different from the payload version i'm not sure (I need to learn more about our release process...).

I don't know where operator versions come from.  Clayton picks update-payload versions when he pushes to quay.io.  The installer team picks installer versions when we push installer tags to GitHub.

Comment 16 Ben Parees 2019-02-06 04:01:07 UTC
The operator versions come from something the ART team does when they tag the commits in dist-git during the release build process.  (The operator code itself uses the current git tag to determine its version.)

I'm surprised 0.12 is still picking up such an old version, given that 0.153 was the latest four days ago and 0.12 was just created.  Is there a lengthy QE/vetting process?


In any case, I believe this in particular was the credential minter issue, in which the credential minter's creds/secrets were being deleted by garbage collection because the cred minter operator was setting ownerRefs that crossed namespaces.  Devan fixed it (the fix went into the credential operator), so he'd have to tell us which specific dist-git tag included the fix.
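
(For context on why GC removes the secret: Kubernetes garbage collection resolves an ownerReference only within the dependent's own namespace, so a dependent whose owner cannot be found there gets cleaned up.  A sketch of how one might inspect the ownerReferences on the secret, once it exists on a healthy cluster; names are taken from this bug:)

~~~
# Prints the secret's ownerReferences, if any:
$ oc get secret image-registry-private-configuration \
    -n openshift-image-registry \
    -o jsonpath='{.metadata.ownerReferences}{"\n"}'
~~~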

Comment 17 W. Trevor King 2019-02-06 04:31:26 UTC
> I'm surprised 0.12 is picking up such an old version still, given that 0.153 was the latest 4 days ago and 0.12 was just created?  Is there a lengthy QE/vetting process?

There is a QE process (although less this week with the new year).  But that update payload was selected on Friday when it was fairly young, and the OCP builds haven't passed CI since then [1].  I expect the release pipeline will tighten up as we get used to ART releases.

[1]: https://openshift-release.svc.ci.openshift.org

Comment 18 jooho lee 2019-02-06 22:25:02 UTC
I tested with 0.12.0, but this issue is still around.

~~~
I0206 22:13:05.498224       1 main.go:24] Cluster Image Registry Operator Version: v4.0.0-0.150.0.0-dirty
~~~

Comment 19 jooho lee 2019-02-06 22:34:15 UTC
These are the event logs:

~~~
LAST SEEN   TYPE      REASON              KIND         MESSAGE
19m         Normal    Scheduled           Pod          Successfully assigned openshift-image-registry/cluster-image-registry-operator-6d6b45bfdf-qq76v to ip-10-0-36-239.us-east-2.compute.internal
19m         Normal    Pulling             Pod          pulling image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:419bbec1250bbd9214b18059d1621f28d3cdcac5a7e757cf3ada69a7e0b55679"
19m         Normal    Pulled              Pod          Successfully pulled image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:419bbec1250bbd9214b18059d1621f28d3cdcac5a7e757cf3ada69a7e0b55679"
19m         Normal    Created             Pod          Created container
19m         Normal    Started             Pod          Started container
20m         Warning   FailedCreate        ReplicaSet   Error creating: No API token found for service account "cluster-image-registry-operator", retry after the token is automatically created and added to the service account
19m         Normal    SuccessfulCreate    ReplicaSet   Created pod: cluster-image-registry-operator-6d6b45bfdf-qq76v
20m         Normal    ScalingReplicaSet   Deployment   Scaled up replica set cluster-image-registry-operator-6d6b45bfdf to 1
~~~
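
(These look like the default `oc get events` output; a hedged guess at the command, with an optional sort flag that is standard for `oc get` and orders the events chronologically:)

~~~
$ oc get events -n openshift-image-registry --sort-by=.lastTimestamp
~~~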

Comment 20 W. Trevor King 2019-02-06 23:11:24 UTC
> In any case I believe this in particular was the credential minter issue in which the credential minter creds/secrets were getting deleted by garbage collection because the cred minter operator was tagging ownerrefs that crossed namespaces.

That sounds like [1].  Recent credentials operator master commits:

  $ git log --first-parent --oneline -7 origin/master
  0798caf Merge pull request #28 from dgoodwin/updated-docs
  0d85290 Merge pull request #27 from dgoodwin/oom-memory-bump
  36a39e8 Merge pull request #25 from joelddiaz/byo-aws-verbs
  8567ac5 Merge pull request #26 from dgoodwin/secret-hotloop
  df8cdda Merge pull request #16 from Miciah/NE-140-openshift-ingress-change-namespace-and-permission
  7b5bec4 Merge pull request #23 from joelddiaz/update-controller-runtime
  94ce207 Merge pull request #15 from joelddiaz/cr-conditions
  $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-01-30-145955 | grep credential
    cloud-credential-operator                     https://github.com/openshift/cloud-credential-operator                     94ce2075731d1d031f0e36664e49887c13c75ca5

so yeah, 2019-01-30-145955 was too early.  Sampling [2]:

  $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-01-31-184459 | grep credential
    cloud-credential-operator                     https://github.com/openshift/cloud-credential-operator                     94ce2075731d1d031f0e36664e49887c13c75ca5
  $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-02-06-035427
  error: image does not exist
  $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-02-06-214833
  error: image does not exist
  $ oc adm release info --commits registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-02-06-225535
  error: image does not exist

So it looks like there are no OCP builds with the fixed credential operator, and recent OCP update payloads are missing entirely despite showing up in [2].  The ART team understood the problem, but it sounded unresolved as of yesterday.  I'm not sure if there's a tracking issue for it or not.

[1]: https://github.com/openshift/cloud-credential-operator/commit/22b0b0781a799b83765a589bad1a74e200932862
[2]: https://openshift-release.svc.ci.openshift.org/

Comment 23 XiuJuan Wang 2019-02-28 10:00:19 UTC
Can't reproduce this bug with installer version
             name=openshift/ose-installer
             release=1
             version=v4.0.6

$ oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE     STATUS
version   4.0.0-0.nightly-2019-02-28-054829   True        False         37m       Cluster version is 4.0.0-0.nightly-2019-02-28-054829

$ oc get secret  -n openshift-image-registry   | grep image
cluster-image-registry-operator-dockercfg-hqbkg   kubernetes.io/dockercfg               1         50m
cluster-image-registry-operator-token-dw8cf       kubernetes.io/service-account-token   3         52m
cluster-image-registry-operator-token-mk5b8       kubernetes.io/service-account-token   3         52m
image-registry-private-configuration              Opaque                                2         51m
image-registry-tls                                kubernetes.io/tls                     2         51m

Comment 26 errata-xmlrpc 2019-06-04 10:42:30 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

