Description of problem:

When I installed the cluster and went to see the image-registry pods, I found them in "CreateContainerConfigError" state. However, other pods are up and running.

# oc get pods
NAME                                               READY   STATUS                       RESTARTS   AGE
cluster-image-registry-operator-684499b66b-4x5nd   1/1     Running                      0          3h
image-registry-5d74797757-b2llp                    0/1     CreateContainerConfigError   0          33m
image-registry-897d8cbdd-6khfl                     0/1     CreateContainerConfigError   0

----------------------------------------

# oc get configs.imageregistry.operator.openshift.io/instance -o yaml -n openshift-image-registry
apiVersion: imageregistry.operator.openshift.io/v1
kind: Config
metadata:
  creationTimestamp: 2019-01-29T07:52:51Z
  finalizers:
  - imageregistry.operator.openshift.io/finalizer
  generation: 1
  name: instance
  resourceVersion: "16328"
  selfLink: /apis/imageregistry.operator.openshift.io/v1/configs/instance
  uid: e107e541-239a-11e9-ab16-02f627312978
spec:
  httpSecret: 4edf243191160ae3563d2681b80654734c53210e99241e6e077b2ea8ac03c53a0859fbedbb0aec96461f87dbb4e6ddf942998a74c447dce3f446ca932
  logging: 2
  managementState: Managed
  proxy: {}
  replicas: 1
  requests:
    read: {}
    write: {}
  storage:
    s3: {}
status:
  conditions:
  - lastTransitionTime: 2019-01-29T07:52:52Z
    message: Deployment does not have available replicas
    status: "False"
    type: Available
  - lastTransitionTime: 2019-01-29T07:52:52Z
    message: 'Unable to apply resources: unable to sync secrets: timed out waiting for the condition'
    status: "True"
    type: Progressing
  - lastTransitionTime: 2019-01-29T07:52:52Z
    status: "False"
    type: Failing
  - lastTransitionTime: 2019-01-29T07:52:52Z
    status: "False"
    type: Removed
  generations: null
  internalRegistryHostname: ""
  observedGeneration: 1
  readyReplicas: 0
  storage: {}
  storageManaged: false
  version: ""

----------------------------------------

# oc get events
LAST SEEN   FIRST SEEN   COUNT   NAME                                               KIND   SUBOBJECT                   TYPE      REASON   SOURCE                  MESSAGE
1h          1h           12      image-registry-5d74797757-b2llp.157e4b3329abc697   Pod    spec.containers{registry}   Warning   Failed   kubelet, xyz.internal   Error: secrets "image-registry-private-configuration" not found

----------------------------------------

Version-Release number of selected component (if applicable):
oc v4.0.0-0.147.0
kubernetes v1.11.0+dde478551e

Installer:
$ openshift-install version
./openshift-install v0.11.0

Expected results:
image-registry pods should be up and running as soon as the cluster is installed.

Additional info:
https://github.com/openshift/installer/issues/1138
Oleg, I'm not sure if this would be resolved by your recent reordering of the secret syncing step... That said, this should have resolved itself soon after anyway via a re-deploy once the secret was created. Rutvik, what platform were you installing on? AWS, Libvirt, or something else?
Hello Ben, thanks for the follow-up. It's AWS, linux-amd64.
The code that could produce 'unable to sync secrets' was removed from master today. But if it hadn't been, we would need the operator's logs to know what the operator is doing.
Rutvik, and it never proceeded? We would have expected this to potentially be the initial state, but then the secret should have been created and the registry deployed.
Looking at the code before Oleg's changes, the only place I can see where we'd get a timeout like that is s3's syncSecrets, which polls for the cloud credentials... so this would mean the credential minter never populated the credentials for us. I don't know what the code looks like now, but if this happens again we need to make sure we report a better error than just a timeout. Assigning to Corey to make this report a better error when we can't locate the credentials.
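For illustration, a minimal sketch of that kind of poll (not the operator's actual syncSecrets code, and written against the client-go signatures of that era, without context arguments); the secret name and namespace are the ones mentioned later in this bug, and the point is only that the timeout path can name the secret it was waiting on instead of returning a bare "timed out waiting for the condition":
~~~
// Hypothetical sketch, not the cluster-image-registry-operator's real code.
package credpoll

import (
	"fmt"
	"time"

	kerrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForCloudCredentials polls until the credential minter has created the
// secret (per this bug: "installer-cloud-credentials" in the
// openshift-image-registry namespace) or the timeout expires.
func waitForCloudCredentials(client kubernetes.Interface, namespace, name string, timeout time.Duration) error {
	err := wait.PollImmediate(5*time.Second, timeout, func() (bool, error) {
		_, getErr := client.CoreV1().Secrets(namespace).Get(name, metav1.GetOptions{})
		if kerrors.IsNotFound(getErr) {
			return false, nil // not minted yet, keep polling
		}
		if getErr != nil {
			return false, getErr // unexpected API error, stop polling
		}
		return true, nil
	})
	if err == wait.ErrWaitTimeout {
		// Report which secret was missing rather than the generic timeout message.
		return fmt.Errorf("timed out waiting for cloud credentials secret %s/%s to be created by the credential minter", namespace, name)
	}
	return err
}
~~~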
Hi Ben Parees, this is an issue while installing on AWS with the latest installer, 0.11.0; there is no error with the registry deployment on 0.10.1. In 0.11.0 the below secret is missing:

# oc describe secret image-registry-private-configuration
Name:         image-registry-private-configuration
Namespace:    openshift-image-registry
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
REGISTRY_STORAGE_S3_ACCESSKEY:  20 bytes
REGISTRY_STORAGE_S3_SECRETKEY:  40 bytes

Logs from the non-working environment:

# journalctl -u kubelet.service
Jan 29 15:42:08 ip hyperkube[4426]: E0129 15:42:08.299166    4426 kuberuntime_manager.go:737] container start failed: CreateContainerConfigError: secrets "image-registry-private-configuration" not found
Jan 29 15:42:08 ip hyperkube[4426]: E0129 15:42:08.299213    4426 pod_workers.go:186] Error syncing pod 3f384ab3-23db-11e9-a944-0661877c94b8 ("image-registry-64c69c9b8d-7hjmf_openshift-image-registry(3f384ab3-23db-11e9-a944-0661877c94b8)"), skipping: failed to "StartContainer" for "registry" with CreateContainerConfigError: "secrets \"image-registry-private-configuration\" not found"

# oc logs -f cluster-image-registry-operator-684499b66b-wbm6b
I0129 15:33:47.277921       1 main.go:24] Cluster Image Registry Operator Version: v4.0.0-0.148.0.0-dirty
I0129 15:33:47.278018       1 main.go:25] Go Version: go1.10.3
I0129 15:33:47.278028       1 main.go:26] Go OS/Arch: linux/amd64
I0129 15:33:47.299125       1 controller.go:371] waiting for informer caches to sync
I0129 15:33:48.502130       1 controller.go:380] started events processor
E0129 15:38:48.534646       1 controller.go:208] unable to sync: unable to sync secrets: timed out waiting for the condition, requeuing
W0129 15:39:16.339626       1 reflector.go:272] k8s.io/client-go/informers/factory.go:130: watch of *v1.ConfigMap ended with: too old resource version: 90931 (94049)
W0129 15:43:29.321787       1 reflector.go:272] k8s.io/client-go/informers/factory.go:130: watch of *v1.ConfigMap ended with: too old resource version: 90931 (96485)
E0129 15:43:48.570537       1 controller.go:208] unable to sync: unable to sync secrets: timed out waiting for the condition, requeuing
W0129 15:45:10.396731       1 reflector.go:272] github.com/openshift/client-go/route/informers/externalversions/factory.go:101: watch of *v1.Route ended with: very short watch: github.com/openshift/client-go/route/informers/externalversions/factory.go:101: Unexpected watch close - watch lasted less than a second and no items received
I'm pretty sure this is indicative that the credential minter is not minting our creds. Can you look at all the secrets that exist in openshift-image-registry? I believe there should be one from the credential minter, named "installer-cloud-credentials". And then work with Corey Daley and Devan Goodwin.
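In case it helps whoever checks this, a rough client-go sketch of that inspection, equivalent to `oc get secrets -n openshift-image-registry`; the kubeconfig location and admin access are assumptions, and the secret name is the one mentioned above:
~~~
// Hypothetical diagnostic: list the secrets in openshift-image-registry and
// report whether the credential minter's secret exists.
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes ~/.kube/config points at the affected cluster.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	secrets, err := client.CoreV1().Secrets("openshift-image-registry").List(metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	found := false
	for _, s := range secrets.Items {
		fmt.Println(s.Name)
		if s.Name == "installer-cloud-credentials" {
			found = true
		}
	}
	if !found {
		fmt.Println("installer-cloud-credentials is missing: the credential minter has not minted the registry's credentials")
	}
}
~~~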
cleaning up the error reporting here, but it doesn't address the underlying failure: https://github.com/openshift/cluster-image-registry-operator/pull/176
Devan resolved some issues with the cred minter, so I'm going to optimistically put this back to QE. https://github.com/openshift/cloud-credential-operator/pull/26
Hello Ben, the pods are still in the same state they were in after installation. Let me know if you need any more information from that environment.
The last release of origin was built 12 hours ago, so this fix probably hasn't made it yet into what you are running (if you are using the installer).
Rutvik, if you want to look at anything, you can look at the same thing I asked Abhishek in comment 7: https://bugzilla.redhat.com/show_bug.cgi?id=1670357#c7 I expect you will either not see the credential at all, or, if you watch repeatedly, you'll see the credential being deleted and recreated repeatedly.
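A rough sketch of that repeated check as a client-go watch (roughly what `oc get secrets -n openshift-image-registry -w` would show): alternating DELETED/ADDED events for installer-cloud-credentials would confirm the delete/recreate churn. The kubeconfig location is an assumption:
~~~
// Hypothetical watch on the minted credentials secret only.
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Watch only the secret the credential minter should create.
	w, err := client.CoreV1().Secrets("openshift-image-registry").Watch(metav1.ListOptions{
		FieldSelector: fields.OneTermEqualSelector("metadata.name", "installer-cloud-credentials").String(),
	})
	if err != nil {
		panic(err)
	}
	// Repeated DELETED followed by ADDED events here would show the secret
	// being deleted and recreated over and over.
	for event := range w.ResultChan() {
		fmt.Printf("%s installer-cloud-credentials\n", event.Type)
	}
}
~~~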
Hi! I have just deployed the cluster to verify, and here is the list of all the secrets present on my cluster deployed with the latest installer:
~~~
$ oc get secrets
NAME                                              TYPE                                  DATA   AGE
builder-dockercfg-mkz67                           kubernetes.io/dockercfg               1      2h
builder-token-fbsxd                               kubernetes.io/service-account-token   3      2h
builder-token-r6xjd                               kubernetes.io/service-account-token   3      2h
cluster-image-registry-operator-dockercfg-rsqs7   kubernetes.io/dockercfg               1      2h
cluster-image-registry-operator-token-svhs9       kubernetes.io/service-account-token   3      2h
cluster-image-registry-operator-token-z9zd6       kubernetes.io/service-account-token   3      2h
default-dockercfg-zcmkh                           kubernetes.io/dockercfg               1      2h
default-token-v26hv                               kubernetes.io/service-account-token   3      2h
default-token-zsg4x                               kubernetes.io/service-account-token   3      2h
deployer-dockercfg-6mzcc                          kubernetes.io/dockercfg               1      2h
deployer-token-9vd49                              kubernetes.io/service-account-token   3      2h
deployer-token-qw87f                              kubernetes.io/service-account-token   3      2h
image-registry                                    Opaque                                0      2h
image-registry-tls                                kubernetes.io/tls                     2      2h
node-ca-dockercfg-dbh8r                           kubernetes.io/dockercfg               1      2h
node-ca-token-flfl2                               kubernetes.io/service-account-token   3      2h
node-ca-token-hv8mx                               kubernetes.io/service-account-token   3      2h
registry-dockercfg-gvwhr                          kubernetes.io/dockercfg               1      2h
registry-token-lq9vx                              kubernetes.io/service-account-token   3      2h
registry-token-xbjs4                              kubernetes.io/service-account-token   3      2h
~~~
The secret does not exist. What information is expected in that secret? Can we manually create it? Thanks.
Sorry, but you're just going to have to wait for the credential minter fix to make it into a build you can pick up. I'm assigning this to Devan so it comes back to him if the issue is not resolved with his fix. There's nothing that can be done from a registry operator perspective, short of configuring the operator to use a custom secret for the AWS credentials, and I don't think that's a productive use of anyone's time right now.
Hi Ben, I have been monitoring this for the last couple of days and the "installer-cloud-credentials" secret does not exist, and I can't even see it being recreated. The list of secrets present in my cluster is the same as mentioned above by Sudarshan in #13. Nothing has changed since installation.
> I have been monitoring this for the last couple of days and the "installer-cloud-credentials" secret does not exist, and I can't even see it being recreated.

Yup, that's the bug, or a form of it. That secret should exist in the openshift-image-registry namespace. Hopefully Devan's fix resolves it. Thanks.
I can see image-registry-private-configuration and installer-cloud-credentials as below:

[wzheng@openshift-qe 4.0]$ oc get secrets
NAME                                              TYPE                                  DATA   AGE
builder-dockercfg-swllb                           kubernetes.io/dockercfg               1      28h
builder-token-bh5k2                               kubernetes.io/service-account-token   3      28h
builder-token-pkmqg                               kubernetes.io/service-account-token   3      28h
cluster-image-registry-operator-dockercfg-cwd2z   kubernetes.io/dockercfg               1      28h
cluster-image-registry-operator-token-dlmxf       kubernetes.io/service-account-token   3      28h
cluster-image-registry-operator-token-sdb7w       kubernetes.io/service-account-token   3      28h
default-dockercfg-dnwmm                           kubernetes.io/dockercfg               1      28h
default-token-7q4nq                               kubernetes.io/service-account-token   3      28h
default-token-k4pkq                               kubernetes.io/service-account-token   3      28h
deployer-dockercfg-f55l7                          kubernetes.io/dockercfg               1      28h
deployer-token-dwszl                              kubernetes.io/service-account-token   3      28h
deployer-token-htns5                              kubernetes.io/service-account-token   3      28h
image-registry-private-configuration              Opaque                                2      61m
image-registry-tls                                kubernetes.io/tls                     2      28h
installer-cloud-credentials                       Opaque                                2      132m
node-ca-dockercfg-g9dcr                           kubernetes.io/dockercfg               1      28h
node-ca-token-2vfxb                               kubernetes.io/service-account-token   3      28h
node-ca-token-w74zf                               kubernetes.io/service-account-token   3      28h
registry-dockercfg-299jw                          kubernetes.io/dockercfg               1      28h
registry-token-drjkf                              kubernetes.io/service-account-token   3      28h
registry-token-llpw5                              kubernetes.io/service-account-token   3      28h

[wzheng@openshift-qe 4.0]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-17-024922   True        False         28h     Error while reconciling 4.0.0-0.nightly-2019-02-17-024922: the update could not be applied
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758