Bug 1670357
| Summary: | image-registry pod stuck in "CreateContainerConfigError" | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Rutvik <rkshirsa> |
| Component: | Image Registry | Assignee: | Oleg Bulatov <obulatov> |
| Status: | CLOSED ERRATA | QA Contact: | Wenjing Zheng <wzheng> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.1.0 | CC: | aabhishe, aos-bugs, bparees, cdaley, gucore, rkshirsa, suchaudh |
| Target Milestone: | --- | | |
| Target Release: | 4.1.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:42:19 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1664187 | | |
Description (reported by Rutvik, 2019-01-29 11:44:09 UTC)
Oleg, not sure if this would be resolved by your recent reordering of the secret syncing step... That said, this should have resolved soon after anyway via a re-deploy once the secret was created. Rutvik, what platform were you installing on? AWS, Libvirt, or something else?

Hello Ben, thanks for the follow-up. It's AWS and linux-amd64.

The code that could produce 'unable to sync secrets' was removed from master today. But if it hadn't been, we would have needed the operator's logs to know what the operator is doing.

Rutvik, and it never proceeded? We would have expected this to potentially be the initial state, but then the secret should have been created and the registry deployed. Looking at the code before Oleg's changes, the only place where I can see that we'd get a timeout like that is s3's syncsecrets, which polls for the cloud credentials... so this would mean the credential minter never populated the credentials for us. I don't know what the code looks like now, but we need to make sure that if this happens again we report a better error than just a timeout. Assigning to Corey to fix this to report a better error when we can't locate the credentials.
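The poll-and-timeout behavior described above (syncsecrets polling for the cloud credentials until it gives up) can be sketched roughly as follows. This is an illustration, not the operator's actual code: `have_secret` and `wait_for_secret` are hypothetical names, and `have_secret` is a stub standing in for a real check such as `oc get secret installer-cloud-credentials -n openshift-image-registry`.

```shell
# Illustrative sketch only, not the operator's real implementation.
# Poll for a secret until it appears or a bounded number of attempts elapses;
# on timeout, surface the same kind of error seen in the operator logs above.
have_secret() {
  # Stub: in a real cluster this would be something like
  #   oc get secret installer-cloud-credentials -n openshift-image-registry
  false
}

wait_for_secret() {
  attempts=$1
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if have_secret; then
      echo "secret found"
      return 0
    fi
    i=$((i + 1))
    # a real poller would sleep between attempts
  done
  echo "unable to sync secrets: timed out waiting for the condition"
  return 1
}

wait_for_secret 3 || true
```

Because the stub never finds the secret, the loop always exhausts its attempts, which mirrors the situation in this bug: the credential never appears, so the operator can only report a timeout.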
Hi Ben Parees, this is an issue while installing on AWS with the latest installer (0.11.0); there is no error with the registry deployment in 0.10.1. In 0.11.0 the secret below is missing:

~~~
# oc describe secret image-registry-private-configuration
Name:         image-registry-private-configuration
Namespace:    openshift-image-registry
Labels:       <none>
Annotations:  <none>

Type:  Opaque

Data
====
REGISTRY_STORAGE_S3_ACCESSKEY:  20 bytes
REGISTRY_STORAGE_S3_SECRETKEY:  40 bytes
~~~

Logs from the non-working environment:

~~~
# journalctl -u kubelet.service
Jan 29 15:42:08 ip hyperkube[4426]: E0129 15:42:08.299166    4426 kuberuntime_manager.go:737] container start failed: CreateContainerConfigError: secrets "image-registry-private-configuration" not found
Jan 29 15:42:08 ip hyperkube[4426]: E0129 15:42:08.299213    4426 pod_workers.go:186] Error syncing pod 3f384ab3-23db-11e9-a944-0661877c94b8 ("image-registry-64c69c9b8d-7hjmf_openshift-image-registry(3f384ab3-23db-11e9-a944-0661877c94b8)"), skipping: failed to "StartContainer" for "registry" with CreateContainerConfigError: "secrets \"image-registry-private-configuration\" not found"
~~~

~~~
# oc logs -f cluster-image-registry-operator-684499b66b-wbm6b
I0129 15:33:47.277921       1 main.go:24] Cluster Image Registry Operator Version: v4.0.0-0.148.0.0-dirty
I0129 15:33:47.278018       1 main.go:25] Go Version: go1.10.3
I0129 15:33:47.278028       1 main.go:26] Go OS/Arch: linux/amd64
I0129 15:33:47.299125       1 controller.go:371] waiting for informer caches to sync
I0129 15:33:48.502130       1 controller.go:380] started events processor
E0129 15:38:48.534646       1 controller.go:208] unable to sync: unable to sync secrets: timed out waiting for the condition, requeuing
W0129 15:39:16.339626       1 reflector.go:272] k8s.io/client-go/informers/factory.go:130: watch of *v1.ConfigMap ended with: too old resource version: 90931 (94049)
W0129 15:43:29.321787       1 reflector.go:272] k8s.io/client-go/informers/factory.go:130: watch of *v1.ConfigMap ended with: too old resource version: 90931 (96485)
E0129 15:43:48.570537       1 controller.go:208] unable to sync: unable to sync secrets: timed out waiting for the condition, requeuing
W0129 15:45:10.396731       1 reflector.go:272] github.com/openshift/client-go/route/informers/externalversions/factory.go:101: watch of *v1.Route ended with: very short watch: github.com/openshift/client-go/route/informers/externalversions/factory.go:101: Unexpected watch close - watch lasted less than a second and no items received
~~~

I'm pretty sure this is indicative that the credential minter is not minting our creds. Can you look at all the secrets that exist in openshift-image-registry? I believe there should be one from the credential minter, named "installer-cloud-credentials". Then work with Corey Daley and Devan Goodwin.

Cleaning up the error reporting here, but it doesn't address the underlying failure: https://github.com/openshift/cluster-image-registry-operator/pull/176

Devan resolved some issues with the cred minter, so I'm going to optimistically put this back to QE. https://github.com/openshift/cloud-credential-operator/pull/26

Hello Ben, the pods are still in the same position as they were after installation. Let me know if you need any more information from that environment.

The last release for origin was built 12 hours ago, so this fix probably hasn't made it into what you are running yet (if you are using the installer). Rutvik, if you want to look at anything, you can look at the same thing I asked Abhishek in comment 7: https://bugzilla.redhat.com/show_bug.cgi?id=1670357#c7 I expect you will either not see a credential at all, or, if you watch repeatedly, you'll see the credential being deleted and recreated repeatedly.

Hi!
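The "watch repeatedly" check suggested above can be made concrete by sampling the secret's UID over time: a changing UID means the secret is being deleted and recreated. This is a hedged sketch with hypothetical names; `get_secret_uid` is a stub that simulates churn, standing in for a real lookup like `oc get secret installer-cloud-credentials -n openshift-image-registry -o jsonpath='{.metadata.uid}'`.

```shell
# Sketch: detect delete-and-recreate churn by comparing the secret's UID
# across samples. The stub returns a fresh UID each time, as if the secret
# were recreated between every sample; a stable secret would show 0 changes.
get_secret_uid() {
  echo "uid-$1"   # stub: real code would query the cluster here
}

prev=""
changes=0
for i in 1 2 3 4 5; do
  cur=$(get_secret_uid "$i")
  if [ -n "$prev" ] && [ "$cur" != "$prev" ]; then
    changes=$((changes + 1))
  fi
  prev=$cur
done
echo "uid changes observed: $changes"
```

With the churning stub, 5 samples produce 4 UID changes; against a healthy cluster the same loop (with a real lookup and a sleep between samples) would report none.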
I have just deployed the cluster to verify, and here is the list of all the secrets present on my cluster deployed with the latest installer:

~~~
$ oc get secrets
NAME                                              TYPE                                  DATA   AGE
builder-dockercfg-mkz67                           kubernetes.io/dockercfg               1      2h
builder-token-fbsxd                               kubernetes.io/service-account-token   3      2h
builder-token-r6xjd                               kubernetes.io/service-account-token   3      2h
cluster-image-registry-operator-dockercfg-rsqs7   kubernetes.io/dockercfg               1      2h
cluster-image-registry-operator-token-svhs9       kubernetes.io/service-account-token   3      2h
cluster-image-registry-operator-token-z9zd6       kubernetes.io/service-account-token   3      2h
default-dockercfg-zcmkh                           kubernetes.io/dockercfg               1      2h
default-token-v26hv                               kubernetes.io/service-account-token   3      2h
default-token-zsg4x                               kubernetes.io/service-account-token   3      2h
deployer-dockercfg-6mzcc                          kubernetes.io/dockercfg               1      2h
deployer-token-9vd49                              kubernetes.io/service-account-token   3      2h
deployer-token-qw87f                              kubernetes.io/service-account-token   3      2h
image-registry                                    Opaque                                0      2h
image-registry-tls                                kubernetes.io/tls                     2      2h
node-ca-dockercfg-dbh8r                           kubernetes.io/dockercfg               1      2h
node-ca-token-flfl2                               kubernetes.io/service-account-token   3      2h
node-ca-token-hv8mx                               kubernetes.io/service-account-token   3      2h
registry-dockercfg-gvwhr                          kubernetes.io/dockercfg               1      2h
registry-token-lq9vx                              kubernetes.io/service-account-token   3      2h
registry-token-xbjs4                              kubernetes.io/service-account-token   3      2h
~~~

The secret does not exist. What information is expected in that secret? Can we manually create it? Thanks.

Sorry, but you're just going to have to wait for the credential minter fix to make it into a build for you to pick up. I'm assigning this to Devan so it comes back to him if the issue is not resolved with his fix. There's nothing that can be done from a registry operator perspective, short of configuring the operator to use a custom secret for the AWS credentials, and I don't think that's a productive use of anyone's time right now.
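For reference on the "what would that secret contain?" question: based on the two data keys shown in the earlier `oc describe` output, a hand-built secret would look roughly like the sketch below. The key values are placeholders, not real credentials, and as noted above, waiting for the credential-minter fix is the recommended path rather than creating this by hand.

```shell
# Hypothetical sketch only: the shape of a manually created
# image-registry-private-configuration secret, inferred from the two
# REGISTRY_STORAGE_S3_* data keys seen in the earlier describe output.
# Values are placeholders; this is not an endorsed workaround.
cat > registry-secret.yaml <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: image-registry-private-configuration
  namespace: openshift-image-registry
type: Opaque
stringData:
  REGISTRY_STORAGE_S3_ACCESSKEY: REPLACE_WITH_ACCESS_KEY
  REGISTRY_STORAGE_S3_SECRETKEY: REPLACE_WITH_SECRET_KEY
EOF
echo "wrote $(grep -c 'REGISTRY_STORAGE_S3' registry-secret.yaml) credential keys"
# Applying it would be: oc apply -f registry-secret.yaml
```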
Hi Ben, I have been monitoring this for the last couple of days, and the secret "installer-cloud-credentials" does not exist; I can't even see it in a recreation state. The list of secrets present in my cluster is the same as the one mentioned above by Sudarshan in #13. Nothing has changed after installation.

> I was monitoring this from the last couple of days and this secret "installer-cloud-credentials" does not exists and can't even see it in recreation state.

Yup, that's the bug, or a form of it. That secret should exist in the openshift-image-registry namespace. Hopefully Devan's fix resolves it. Thanks.
I can see image-registry-private-configuration and installer-cloud-credentials as below:

~~~
[wzheng@openshift-qe 4.0]$ oc get secrets
NAME                                              TYPE                                  DATA   AGE
builder-dockercfg-swllb                           kubernetes.io/dockercfg               1      28h
builder-token-bh5k2                               kubernetes.io/service-account-token   3      28h
builder-token-pkmqg                               kubernetes.io/service-account-token   3      28h
cluster-image-registry-operator-dockercfg-cwd2z   kubernetes.io/dockercfg               1      28h
cluster-image-registry-operator-token-dlmxf       kubernetes.io/service-account-token   3      28h
cluster-image-registry-operator-token-sdb7w       kubernetes.io/service-account-token   3      28h
default-dockercfg-dnwmm                           kubernetes.io/dockercfg               1      28h
default-token-7q4nq                               kubernetes.io/service-account-token   3      28h
default-token-k4pkq                               kubernetes.io/service-account-token   3      28h
deployer-dockercfg-f55l7                          kubernetes.io/dockercfg               1      28h
deployer-token-dwszl                              kubernetes.io/service-account-token   3      28h
deployer-token-htns5                              kubernetes.io/service-account-token   3      28h
image-registry-private-configuration              Opaque                                2      61m
image-registry-tls                                kubernetes.io/tls                     2      28h
installer-cloud-credentials                       Opaque                                2      132m
node-ca-dockercfg-g9dcr                           kubernetes.io/dockercfg               1      28h
node-ca-token-2vfxb                               kubernetes.io/service-account-token   3      28h
node-ca-token-w74zf                               kubernetes.io/service-account-token   3      28h
registry-dockercfg-299jw                          kubernetes.io/dockercfg               1      28h
registry-token-drjkf                              kubernetes.io/service-account-token   3      28h
registry-token-llpw5                              kubernetes.io/service-account-token   3      28h
~~~

~~~
[wzheng@openshift-qe 4.0]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-17-024922   True        False         28h     Error while reconciling 4.0.0-0.nightly-2019-02-17-024922: the update could not be applied
~~~

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:0758